Create a Data Source
Creating a data source is the first step in setting up data ingestion in Nexla. Data sources define the connection to external systems and configure how data should be extracted and processed.
Required Fields
When creating a data source, you must provide:
- name: A descriptive name for the data source
- source_type: The connector type (e.g., s3, mysql, rest)
- data_credentials_id: The ID of an existing data credentials resource (alternatively, pass a data_credentials object to create credentials inline)
- source_config: Connector-specific configuration settings
API Endpoint
POST /data_sources
Example Request Body:
{
  "name": "Example S3 Data Source",
  "source_type": "s3",
  "data_credentials_id": 5001,
  "source_config": {
    "bucket": "my-data-bucket",
    "prefix": "daily/",
    "file_pattern": "*.csv"
  }
}
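For orientation, here is a minimal sketch of issuing this request with Python's requests library. The base URL and bearer-token header are placeholder assumptions; substitute the endpoint and authentication scheme for your Nexla environment.

import requests

BASE_URL = "https://api.example-nexla-instance.com"  # assumed; use your deployment's API URL
HEADERS = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}  # assumed auth scheme

payload = {
    "name": "Example S3 Data Source",
    "source_type": "s3",
    "data_credentials_id": 5001,
    "source_config": {
        "bucket": "my-data-bucket",
        "prefix": "daily/",
        "file_pattern": "*.csv",
    },
}

# POST /data_sources and parse the created resource from the response.
response = requests.post(f"{BASE_URL}/data_sources", json=payload, headers=HEADERS)
response.raise_for_status()
source = response.json()
print(source["id"], source["status"])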
Response Structure
A successful creation returns the complete data source object:
{
  "id": 5001,
  "owner_id": 2,
  "org_id": 1,
  "name": "Example S3 Data Source",
  "description": null,
  "status": "INIT",
  "source_type": "s3",
  "source_config": {
    "bucket": "my-data-bucket",
    "prefix": "daily/",
    "file_pattern": "*.csv"
  },
  "data_credentials_id": 5001,
  "flow_type": "streaming",
  "data_sets": [],
  "created_at": "2023-01-15T10:30:00.000Z",
  "updated_at": "2023-01-15T10:30:00.000Z"
}
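Note that the source is created in INIT status; ingestion begins only after the source is activated (see Post-Creation Steps below).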
Creating with Existing Credentials
When you have already created data credentials, you can reference them by ID when creating a data source. This approach is recommended for production environments where credentials are managed separately.
Using Credential ID
{
  "name": "Production MySQL Source",
  "source_type": "mysql",
  "data_credentials_id": 5001,
  "source_config": {
    "host": "db.example.com",
    "port": 3306,
    "database": "customers",
    "incremental_column": "updated_at"
  }
}
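To illustrate the reuse pattern (see the benefits list below), here is a minimal Python sketch that creates two sources backed by the same credential ID. The base URL, token, and second database name are placeholder assumptions.

import requests

BASE_URL = "https://api.example-nexla-instance.com"  # assumed
HEADERS = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}  # assumed auth scheme

CREDENTIALS_ID = 5001  # an existing, centrally managed credential

# The same credential ID can back multiple data sources.
for database in ("customers", "orders"):
    payload = {
        "name": f"Production MySQL Source ({database})",
        "source_type": "mysql",
        "data_credentials_id": CREDENTIALS_ID,
        "source_config": {
            "host": "db.example.com",
            "port": 3306,
            "database": database,
            "incremental_column": "updated_at",
        },
    }
    resp = requests.post(f"{BASE_URL}/data_sources", json=payload, headers=HEADERS)
    resp.raise_for_status()
    print(f"created source {resp.json()['id']} for {database}")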
Benefits of Existing Credentials
- Security: Credentials are managed centrally and encrypted
- Reusability: Same credentials can be used for multiple sources
- Audit Trail: Clear tracking of credential usage and access
- Rotation: Easier to update credentials across multiple sources
Creating with Inline Credentials
For development and testing, you can create credentials inline with the data source. This approach is convenient but should be used carefully in production environments.
Inline Credentials Example
{
  "name": "Test FTP Source",
  "source_type": "ftp",
  "data_credentials": {
    "name": "FTP Test Credentials",
    "credentials_type": "ftp",
    "credentials": {
      "host": "ftp.example.com",
      "username": "testuser",
      "password": "testpass",
      "port": 21
    }
  },
  "source_config": {
    "path": "/data/",
    "file_pattern": "*.txt"
  }
}
When to Use Inline Credentials
- Development: Quick setup for testing and development
- Temporary Sources: Short-lived or experimental data sources
- Demo Purposes: Demonstrations and proof-of-concept work
- CI/CD: Automated testing and deployment scenarios
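Even in these scenarios, avoid hard-coding secrets. One approach, sketched below in Python, is to assemble the inline credentials object from environment variables at run time; the variable names, base URL, and auth header are illustrative assumptions.

import os
import requests

BASE_URL = "https://api.example-nexla-instance.com"  # assumed
HEADERS = {"Authorization": f"Bearer {os.environ['NEXLA_TOKEN']}"}  # assumed auth scheme

# Pull secrets from the environment (e.g. a CI/CD secret store)
# rather than embedding them in the request body.
payload = {
    "name": "Test FTP Source",
    "source_type": "ftp",
    "data_credentials": {
        "name": "FTP Test Credentials",
        "credentials_type": "ftp",
        "credentials": {
            "host": os.environ["FTP_HOST"],
            "username": os.environ["FTP_USER"],
            "password": os.environ["FTP_PASSWORD"],
            "port": 21,
        },
    },
    "source_config": {"path": "/data/", "file_pattern": "*.txt"},
}

resp = requests.post(f"{BASE_URL}/data_sources", json=payload, headers=HEADERS)
resp.raise_for_status()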
Source Configuration
Each connector type requires specific configuration parameters. The source_config object contains these connector-specific settings.
Common Configuration Elements
- Connection Details: Host, port, database, bucket, or endpoint information
- Authentication: Additional authentication parameters beyond credentials
- Data Selection: Paths, file patterns, table names, or query filters
- Scheduling: Poll intervals, cron expressions, or event triggers
- Schema Detection: Automatic or manual schema identification settings
Connector-Specific Examples
S3 Configuration:
{
  "bucket": "my-data-bucket",
  "prefix": "daily/",
  "file_pattern": "*.csv",
  "region": "us-east-1",
  "compression": "gzip"
}
MySQL Configuration:
{
  "host": "db.example.com",
  "port": 3306,
  "database": "customers",
  "incremental_column": "updated_at",
  "query_timeout": 300
}
REST API Configuration:
{
  "base_url": "https://api.example.com",
  "endpoint": "/v1/data",
  "method": "GET",
  "headers": {
    "Accept": "application/json"
  },
  "rate_limit": 100
}
Flow Type Configuration
Data sources can be configured with different flow types that optimize performance for specific use cases:
Available Flow Types
- streaming (default): Standard streaming data processing
- in_memory: High-performance in-memory processing
- replication: Data replication and synchronization
Setting Flow Type
{
  "name": "High-Performance Source",
  "source_type": "mysql",
  "data_credentials_id": 5001,
  "flow_type": "in_memory",
  "source_config": {
    "host": "db.example.com",
    "database": "analytics"
  }
}
Code Container Integration
Data sources can be enhanced with custom code containers for advanced data processing:
Code Container Configuration
{
  "name": "Custom Processing Source",
  "source_type": "rest",
  "data_credentials_id": 5001,
  "code_container": {
    "name": "Custom REST Processor",
    "code_type": "python",
    "code": "def process_data(data): return data.upper()",
    "resource_type": "source_custom"
  },
  "source_config": {
    "base_url": "https://api.example.com"
  }
}
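One practical detail: the code field is a JSON string, so multi-line functions need their newlines escaped. The Python sketch below builds such a payload and lets the client library handle the escaping. The processing function itself is illustrative only; confirm the exact function signature Nexla expects in the code container documentation.

import textwrap
import requests

BASE_URL = "https://api.example-nexla-instance.com"  # assumed
HEADERS = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}  # assumed auth scheme

# Multi-line processing code; requests serializes the payload to JSON,
# escaping the embedded newlines automatically.
custom_code = textwrap.dedent("""\
    def process_data(data):
        # Illustrative only: upper-case string values in a record dict.
        return {k: v.upper() if isinstance(v, str) else v for k, v in data.items()}
""")

payload = {
    "name": "Custom Processing Source",
    "source_type": "rest",
    "data_credentials_id": 5001,
    "code_container": {
        "name": "Custom REST Processor",
        "code_type": "python",
        "code": custom_code,
        "resource_type": "source_custom",
    },
    "source_config": {"base_url": "https://api.example.com"},
}

resp = requests.post(f"{BASE_URL}/data_sources", json=payload, headers=HEADERS)
resp.raise_for_status()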
Code Container Benefits
- Custom Logic: Implement source-specific data processing
- Data Transformation: Clean, filter, or enrich data at the source
- Format Conversion: Convert data to standard formats
- Validation: Ensure data quality before processing
Post-Creation Steps
After creating a data source, you typically need to:
1. Verify Configuration
Check that the source configuration is correct:
GET /data_sources/{source_id}
2. Test Connection
Verify connectivity and credentials:
PUT /data_sources/{source_id}/test
3. Activate Source
Start data ingestion:
PUT /data_sources/{source_id}/activate
4. Monitor Performance
Track ingestion rates and data quality:
GET /data_sources/{source_id}/metrics
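Tied together, the four steps might look like the following Python sketch, using the endpoints listed above; the base URL and auth header are placeholders.

import requests

BASE_URL = "https://api.example-nexla-instance.com"  # assumed
HEADERS = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}  # assumed auth scheme
source_id = 5001

# 1. Verify configuration
config = requests.get(f"{BASE_URL}/data_sources/{source_id}", headers=HEADERS)
config.raise_for_status()
print(config.json()["source_config"])

# 2. Test connection and credentials
test = requests.put(f"{BASE_URL}/data_sources/{source_id}/test", headers=HEADERS)
test.raise_for_status()

# 3. Activate the source to start ingestion
activate = requests.put(f"{BASE_URL}/data_sources/{source_id}/activate", headers=HEADERS)
activate.raise_for_status()

# 4. Monitor ingestion metrics
metrics = requests.get(f"{BASE_URL}/data_sources/{source_id}/metrics", headers=HEADERS)
metrics.raise_for_status()
print(metrics.json())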
Best Practices
To ensure successful data source creation:
- Use Descriptive Names: Choose names that clearly identify the source purpose
- Secure Credentials: Store sensitive information in dedicated credential resources
- Validate Configuration: Test source settings before production use
- Plan for Scale: Consider data volume and processing requirements
- Document Settings: Maintain clear documentation of configuration choices
- Monitor Health: Set up alerts for source performance and errors
Error Handling
Common creation errors and solutions:
- Invalid Source Type: Ensure the connector type is supported
- Missing Credentials: Provide valid data credentials or inline credentials
- Configuration Errors: Verify source_config parameters for the connector type
- Permission Issues: Ensure you have access to create data sources
- Duplicate Names: Use unique names within your organization
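To make these failures easier to diagnose, creation calls can be wrapped with explicit error reporting. A sketch follows; the error body shape is deployment-specific, so it assumes only a non-2xx status code and tries JSON before falling back to raw text.

import requests

BASE_URL = "https://api.example-nexla-instance.com"  # assumed
HEADERS = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}  # assumed auth scheme

def create_data_source(payload: dict) -> dict:
    """POST /data_sources, raising a readable error on failure."""
    resp = requests.post(f"{BASE_URL}/data_sources", json=payload, headers=HEADERS)
    if not resp.ok:
        # Error body shape is an assumption; fall back to raw text.
        try:
            detail = resp.json()
        except ValueError:
            detail = resp.text
        raise RuntimeError(f"data source creation failed ({resp.status_code}): {detail}")
    return resp.json()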