Create a Data Source

Creating a data source is the first step in setting up data ingestion in Nexla. Data sources define the connection to external systems and configure how data should be extracted and processed.

Required Fields

When creating a data source, you must provide:

  • name: A descriptive name for the data source
  • source_type: The connector type (e.g., s3, mysql, rest)
  • data_credentials_id: ID of an existing credentials resource (or pass a data_credentials object inline instead)
  • source_config: Connector-specific configuration settings

API Endpoint

Create Source: Request
POST /data_sources

Example Request Body:

{
  "name": "Example S3 Data Source",
  "source_type": "s3",
  "data_credentials_id": 5001,
  "source_config": {
    "bucket": "my-data-bucket",
    "prefix": "daily/",
    "file_pattern": "*.csv"
  }
}
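As a sketch, the request above could be sent from Python as follows. The API host and bearer-token header are assumptions; substitute your own Nexla endpoint and access token.

```python
import json
import urllib.request

# Hypothetical API host and token -- replace with your own values.
API_BASE = "https://api.example.com"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"

# Request body from the example above.
payload = {
    "name": "Example S3 Data Source",
    "source_type": "s3",
    "data_credentials_id": 5001,
    "source_config": {
        "bucket": "my-data-bucket",
        "prefix": "daily/",
        "file_pattern": "*.csv",
    },
}

def create_data_source(body: dict) -> urllib.request.Request:
    """Build the POST /data_sources request (not sent here)."""
    return urllib.request.Request(
        url=f"{API_BASE}/data_sources",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {ACCESS_TOKEN}",
        },
        method="POST",
    )

req = create_data_source(payload)
```

Sending the request with `urllib.request.urlopen(req)` (or any HTTP client) should return the data source object shown in the response section below.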

Response Structure

A successful creation returns the complete data source object:

Create Source: Response
{
  "id": 5001,
  "owner_id": 2,
  "org_id": 1,
  "name": "Example S3 Data Source",
  "description": null,
  "status": "INIT",
  "source_type": "s3",
  "source_config": {
    "bucket": "my-data-bucket",
    "prefix": "daily/",
    "file_pattern": "*.csv"
  },
  "data_credentials_id": 5001,
  "flow_type": "streaming",
  "data_sets": [],
  "created_at": "2023-01-15T10:30:00.000Z",
  "updated_at": "2023-01-15T10:30:00.000Z"
}

Creating with Existing Credentials

When you have already created data credentials, you can reference them by ID when creating a data source. This approach is recommended for production environments where credentials are managed separately.

Using Credential ID

Create with Existing Credentials: Request
{
  "name": "Production MySQL Source",
  "source_type": "mysql",
  "data_credentials_id": 5001,
  "source_config": {
    "host": "db.example.com",
    "port": 3306,
    "database": "customers",
    "incremental_column": "updated_at"
  }
}

Benefits of Existing Credentials

  • Security: Credentials are managed centrally and encrypted
  • Reusability: Same credentials can be used for multiple sources
  • Audit Trail: Clear tracking of credential usage and access
  • Rotation: Easier to update credentials across multiple sources
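Reusability in practice: a minimal sketch building two source payloads that reference the same credential record. The source names and configs are illustrative, not real resources.

```python
# One credential record, referenced by two hypothetical sources.
SHARED_CREDENTIALS_ID = 5001

def make_mysql_source(name: str, database: str) -> dict:
    """Build a data-source payload that reuses the shared credentials."""
    return {
        "name": name,
        "source_type": "mysql",
        "data_credentials_id": SHARED_CREDENTIALS_ID,
        "source_config": {"host": "db.example.com", "database": database},
    }

orders = make_mysql_source("Orders Source", "orders")
customers = make_mysql_source("Customers Source", "customers")
```

Because both payloads point at the same credential ID, rotating that one credential record updates access for both sources at once.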

Creating with Inline Credentials

For development and testing, you can create credentials inline with the data source. This approach is convenient but should be used carefully in production environments.

Inline Credentials Example

Create with Inline Credentials: Request
{
  "name": "Test FTP Source",
  "source_type": "ftp",
  "data_credentials": {
    "name": "FTP Test Credentials",
    "credentials_type": "ftp",
    "credentials": {
      "host": "ftp.example.com",
      "username": "testuser",
      "password": "testpass",
      "port": 21
    }
  },
  "source_config": {
    "path": "/data/",
    "file_pattern": "*.txt"
  }
}

When to Use Inline Credentials

  • Development: Quick setup for testing and development
  • Temporary Sources: Short-lived or experimental data sources
  • Demo Purposes: Demonstrations and proof-of-concept work
  • CI/CD: Automated testing and deployment scenarios
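The two credential styles differ only in which key the request body carries. A sketch contrasting them, with field names taken from the examples above:

```python
def with_existing_credentials(payload: dict, credentials_id: int) -> dict:
    """Reference a previously created credential resource by ID."""
    return {**payload, "data_credentials_id": credentials_id}

def with_inline_credentials(payload: dict, credentials: dict) -> dict:
    """Embed a full credentials object (development/testing only)."""
    return {**payload, "data_credentials": credentials}

base = {
    "name": "Test FTP Source",
    "source_type": "ftp",
    "source_config": {"path": "/data/", "file_pattern": "*.txt"},
}

prod = with_existing_credentials(base, 5001)
dev = with_inline_credentials(base, {
    "name": "FTP Test Credentials",
    "credentials_type": "ftp",
    "credentials": {"host": "ftp.example.com", "username": "testuser",
                    "password": "testpass", "port": 21},
})
```

Everything else in the payload stays the same, which makes it easy to start with inline credentials in development and switch to a credential ID for production.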

Source Configuration

Each connector type requires specific configuration parameters. The source_config object contains these connector-specific settings.

Common Configuration Elements

  • Connection Details: Host, port, database, bucket, or endpoint information
  • Authentication: Additional authentication parameters beyond credentials
  • Data Selection: Paths, file patterns, table names, or query filters
  • Scheduling: Poll intervals, cron expressions, or event triggers
  • Schema Detection: Automatic or manual schema identification settings

Connector-Specific Examples

S3 Configuration:

{
  "bucket": "my-data-bucket",
  "prefix": "daily/",
  "file_pattern": "*.csv",
  "region": "us-east-1",
  "compression": "gzip"
}

MySQL Configuration:

{
  "host": "db.example.com",
  "port": 3306,
  "database": "customers",
  "incremental_column": "updated_at",
  "query_timeout": 300
}

REST API Configuration:

{
  "base_url": "https://api.example.com",
  "endpoint": "/v1/data",
  "method": "GET",
  "headers": {
    "Accept": "application/json"
  },
  "rate_limit": 100
}
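Because each connector expects different source_config keys, a client-side pre-check can catch omissions before the API call. The required-key sets below are inferred from the examples in this section, not an official schema:

```python
# Minimal required keys per connector, inferred from the examples above.
REQUIRED_CONFIG_KEYS = {
    "s3": {"bucket"},
    "mysql": {"host", "database"},
    "rest": {"base_url"},
}

def missing_config_keys(source_type: str, source_config: dict) -> set:
    """Return required keys absent from source_config (empty set if OK)."""
    return REQUIRED_CONFIG_KEYS.get(source_type, set()) - source_config.keys()
```

For example, an S3 config that only sets a prefix would report `{"bucket"}` as missing, while a complete config reports an empty set.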

Flow Type Configuration

Data sources can be configured with different flow types that optimize performance for specific use cases:

Available Flow Types

  • streaming (default): Standard streaming data processing
  • in_memory: High-performance in-memory processing
  • replication: Data replication and synchronization

Setting Flow Type

Create with Flow Type: Request
{
  "name": "High-Performance Source",
  "source_type": "mysql",
  "data_credentials_id": 5001,
  "flow_type": "in_memory",
  "source_config": {
    "host": "db.example.com",
    "database": "analytics"
  }
}

Code Container Integration

Data sources can be enhanced with custom code containers for advanced data processing:

Code Container Configuration

Create with Code Container: Request
{
  "name": "Custom Processing Source",
  "source_type": "rest",
  "data_credentials_id": 5001,
  "code_container": {
    "name": "Custom REST Processor",
    "code_type": "python",
    "code": "def process_data(data): return data.upper()",
    "resource_type": "source_custom"
  },
  "source_config": {
    "base_url": "https://api.example.com"
  }
}

Code Container Benefits

  • Custom Logic: Implement source-specific data processing
  • Data Transformation: Clean, filter, or enrich data at the source
  • Format Conversion: Convert data to standard formats
  • Validation: Ensure data quality before processing
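The one-line process_data in the example above handles a single string. For record-shaped payloads, a container function might look more like this sketch; the list-of-dicts record shape is an assumption for illustration:

```python
# Hypothetical container function: normalize string fields in each record.
def process_data(records):
    """Strip whitespace and upper-case string values; pass other types through."""
    cleaned = []
    for record in records:
        cleaned.append({
            key: value.strip().upper() if isinstance(value, str) else value
            for key, value in record.items()
        })
    return cleaned

result = process_data([{"name": " alice ", "age": 30}])
```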

Post-Creation Steps

After creating a data source, you typically need to:

1. Verify Configuration

Check that the source configuration is correct:

GET /data_sources/{source_id}

2. Test Connection

Verify connectivity and credentials:

PUT /data_sources/{source_id}/test

3. Activate Source

Start data ingestion:

PUT /data_sources/{source_id}/activate

4. Monitor Performance

Track ingestion rates and data quality:

GET /data_sources/{source_id}/metrics
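The four steps above can be sketched as an ordered list of request templates expanded for a concrete source ID. The HTTP calls themselves are omitted; the endpoint paths follow this section:

```python
# Ordered (method, path-template) pairs for the post-creation workflow.
POST_CREATION_STEPS = [
    ("GET", "/data_sources/{source_id}"),           # 1. verify configuration
    ("PUT", "/data_sources/{source_id}/test"),      # 2. test connection
    ("PUT", "/data_sources/{source_id}/activate"),  # 3. activate source
    ("GET", "/data_sources/{source_id}/metrics"),   # 4. monitor performance
]

def post_creation_requests(source_id: int) -> list:
    """Expand the step templates for a concrete source ID."""
    return [(method, path.format(source_id=source_id))
            for method, path in POST_CREATION_STEPS]

steps = post_creation_requests(5001)
```

Running the steps in this order matters: activating a source whose connection test fails typically just surfaces the same error during ingestion.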

Best Practices

To ensure successful data source creation:

  1. Use Descriptive Names: Choose names that clearly identify the source purpose
  2. Secure Credentials: Store sensitive information in dedicated credential resources
  3. Validate Configuration: Test source settings before production use
  4. Plan for Scale: Consider data volume and processing requirements
  5. Document Settings: Maintain clear documentation of configuration choices
  6. Monitor Health: Set up alerts for source performance and errors

Error Handling

Common creation errors and solutions:

  • Invalid Source Type: Ensure the connector type is supported
  • Missing Credentials: Provide valid data credentials or inline credentials
  • Configuration Errors: Verify source_config parameters for the connector type
  • Permission Issues: Ensure you have access to create data sources
  • Duplicate Names: Use unique names within your organization
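Several of these errors can be caught client-side before the request is sent. A sketch that checks the required fields listed at the top of this page; the problem messages are illustrative, not the API's actual error responses:

```python
def validate_source_payload(payload: dict) -> list:
    """Return a list of problems found in a create-source payload."""
    problems = []
    if not payload.get("name"):
        problems.append("name is required")
    if not payload.get("source_type"):
        problems.append("source_type is required")
    # Either a credential reference or an inline credentials object must be present.
    if "data_credentials_id" not in payload and "data_credentials" not in payload:
        problems.append("provide data_credentials_id or an inline data_credentials object")
    if not isinstance(payload.get("source_config"), dict):
        problems.append("source_config must be an object")
    return problems

ok = validate_source_payload({
    "name": "Example S3 Data Source",
    "source_type": "s3",
    "data_credentials_id": 5001,
    "source_config": {"bucket": "my-data-bucket"},
})
bad = validate_source_payload({"name": "Broken Source"})
```

An empty list means the payload passes the local checks; server-side errors such as permission issues or duplicate names can still occur and must be handled from the API response.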