Control Ingestion
Data source ingestion is controlled through API endpoints that let you start, stop, and manage data collection on demand. These controls take effect immediately without altering a source's configured schedule or settings.
Activate and Pause Source
Data sources can be activated to begin ingesting data immediately or paused to stop ongoing collection. These controls work alongside scheduled ingestion, so you can intervene manually without redefining the schedule.
Activate Source
Trigger immediate data ingestion by activating a data source:
PUT /data_sources/{source_id}/activate
Example with curl:
curl -X PUT https://api.nexla.io/data_sources/5001/activate \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Accept: application/vnd.nexla.api.v1+json"
Pause Source
Stop ongoing data ingestion by pausing a data source:
PUT /data_sources/{source_id}/pause
Example with curl:
curl -X PUT https://api.nexla.io/data_sources/5001/pause \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Accept: application/vnd.nexla.api.v1+json"
Activation Behavior
When you activate or pause a source:
- Immediate Effect: The change takes effect immediately
- Scheduled Operations: The configured schedule is preserved; scheduled ingestion runs while the source is active and is suspended while it is paused
- Flow Integration: The change affects all downstream data flows built on the source
- Status Updates: The source's status is updated to reflect its current state
Re-ingest Files
For file-based data sources, you can trigger re-ingestion of specific files. This is useful for reprocessing files that failed ingestion or for handling updated files.
Re-ingest Endpoint
POST /data_sources/{source_id}/file/ingest
Example Request Body:
{
  "file": "daily/customer_data_2023-01-15.csv"
}
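For example, combining the endpoint and request body above, a re-ingestion call for source 5001 might look like this (the file path is illustrative):
curl -X POST https://api.nexla.io/data_sources/5001/file/ingest \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Content-Type: application/json" \
  -H "Accept: application/vnd.nexla.api.v1+json" \
  -d '{"file": "daily/customer_data_2023-01-15.csv"}'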
File Path Requirements
The file path must:
- Start from Source Root: Begin with the location specified in source configuration
- Match File Patterns: Conform to any file pattern filters in source_config
- Be Accessible: Exist and be readable by the source credentials
Response Structure
{
  "status": "ok",
  "message": "File ingestion initiated",
  "file": "daily/customer_data_2023-01-15.csv",
  "source_id": 5001
}
Validate Source Configuration
Source configuration validation checks that your data source settings are complete and correct before ingestion is attempted, catching configuration errors before they cause ingestion failures.
Validation Endpoint
POST /data_sources/{source_id}/config/validate
Example with curl:
curl -X POST https://api.nexla.io/data_sources/5001/config/validate \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Content-Type: application/json"
Optional Configuration Override
You can validate a different configuration without updating the source:
{
  "source_config": {
    "bucket": "test-bucket",
    "prefix": "test/",
    "file_pattern": "*.csv"
  }
}
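For example, the override above can be submitted as the request body of the validation call (bucket and path values are illustrative):
curl -X POST https://api.nexla.io/data_sources/5001/config/validate \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Content-Type: application/json" \
  -H "Accept: application/vnd.nexla.api.v1+json" \
  -d '{"source_config": {"bucket": "test-bucket", "prefix": "test/", "file_pattern": "*.csv"}}'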
Validation Response
The validation endpoint provides detailed feedback about configuration issues:
{
  "status": "ok",
  "output": [
    {
      "name": "bucket",
      "value": "my-data-bucket",
      "errors": [],
      "visible": true,
      "recommendedValues": []
    },
    {
      "name": "prefix",
      "value": "daily/",
      "errors": [],
      "visible": true,
      "recommendedValues": []
    },
    {
      "name": "file_pattern",
      "value": "*.csv",
      "errors": [],
      "visible": true,
      "recommendedValues": []
    }
  ]
}
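To surface only the fields that have problems, you can filter the response for entries whose errors array is non-empty. A minimal sketch using jq, assuming the response shape shown above:
curl -s -X POST https://api.nexla.io/data_sources/5001/config/validate \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Content-Type: application/json" \
  | jq '.output[] | select(.errors | length > 0) | {name, errors}'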
Validation Errors
Common validation issues include:
- Missing Required Fields: Essential configuration parameters not provided
- Invalid Values: Values outside acceptable ranges or formats
- Credential Issues: Authentication or access problems
- Path Problems: Invalid file paths or bucket references
Test Source Connection
Before activating a source, you can test the connection to ensure credentials and configuration are working correctly.
Test Endpoint
PUT /data_sources/{source_id}/test
Example with curl:
curl -X PUT https://api.nexla.io/data_sources/5001/test \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Accept: application/vnd.nexla.api.v1+json"
Test Response
{
  "status": "ok",
  "message": "Connection test successful",
  "details": {
    "connection": "established",
    "authentication": "verified",
    "access": "confirmed"
  }
}
Ingestion Scheduling
Data sources support various scheduling options for automated data collection:
Cron-based Scheduling
Configure automatic ingestion using cron expressions; the schedule below runs every six hours:
{
  "source_config": {
    "poll_schedule": "0 */6 * * *",
    "timezone": "UTC"
  }
}
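A schedule like this is set by updating the source configuration. The sketch below assumes configuration updates go through PUT /data_sources/{source_id} with a source_config payload; that endpoint is assumed here rather than documented in this section:
# Assumption: PUT /data_sources/{source_id} accepts source_config updates
curl -X PUT https://api.nexla.io/data_sources/5001 \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Content-Type: application/json" \
  -H "Accept: application/vnd.nexla.api.v1+json" \
  -d '{"source_config": {"poll_schedule": "0 */6 * * *", "timezone": "UTC"}}'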
Event-driven Scheduling
Trigger ingestion based on external events:
{
  "source_config": {
    "trigger_type": "webhook",
    "webhook_url": "https://api.example.com/trigger"
  }
}
Manual Control
Override scheduled operations with manual controls (a typical sequence is sketched after this list):
- Immediate Ingestion: Activate source for immediate data collection
- Scheduled Override: Pause scheduled operations temporarily
- Resume Operations: Restart scheduled ingestion after manual control
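A typical manual-override sequence pauses the source, performs the needed intervention, and then reactivates it so scheduled ingestion resumes:
# Take the source offline (suspends scheduled ingestion)
curl -X PUT https://api.nexla.io/data_sources/5001/pause \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Accept: application/vnd.nexla.api.v1+json"

# ... perform manual work, e.g. re-ingest a corrected file ...

# Resume normal operation
curl -X PUT https://api.nexla.io/data_sources/5001/activate \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Accept: application/vnd.nexla.api.v1+json"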
Monitoring Ingestion
Track the performance and health of your data sources:
Status Monitoring
Monitor source status through:
- API Endpoints: Check current status and health (see the status-check sketch after this list)
- Flow Integration: Monitor impact on data flows
- Performance Metrics: Track ingestion rates and volumes
- Error Logs: Review and resolve ingestion issues
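For a quick status check from the command line, a sketch like the following can be used. It assumes the source-detail endpoint GET /data_sources/{source_id} returns the source record with a status field:
curl -s https://api.nexla.io/data_sources/5001 \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Accept: application/vnd.nexla.api.v1+json" \
  | jq '{id, status}'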
Health Checks
Regular health checks include the following (a combined check is sketched after the list):
- Connection Status: Verify connectivity to source systems
- Credential Validity: Ensure authentication still works
- Configuration Integrity: Validate source settings
- Performance Metrics: Monitor ingestion efficiency
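The connection test and configuration validation endpoints shown earlier can be combined into a simple scripted health check; a minimal sketch, assuming the response shapes documented above:
# Run the connection test and configuration validation, then flag any non-ok status
test_status=$(curl -s -X PUT https://api.nexla.io/data_sources/5001/test \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Accept: application/vnd.nexla.api.v1+json" | jq -r '.status')
validate_status=$(curl -s -X POST https://api.nexla.io/data_sources/5001/config/validate \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Content-Type: application/json" | jq -r '.status')
if [ "$test_status" != "ok" ] || [ "$validate_status" != "ok" ]; then
  echo "Source 5001 health check failed: test=$test_status validate=$validate_status"
fi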
Best Practices
To ensure effective ingestion control:
- Test Before Production: Validate configurations in test environments
- Monitor Performance: Track ingestion rates and error patterns
- Use Validation: Validate configurations before activation
- Plan Scheduling: Design efficient ingestion schedules
- Handle Errors: Implement proper error handling and retry logic
- Document Changes: Keep track of configuration modifications
Error Handling
Common ingestion control errors and solutions:
- Connection Failures: Check network connectivity and credentials
- Configuration Errors: Validate source_config parameters
- Permission Issues: Verify access rights to source systems
- File Access Problems: Ensure file paths and permissions are correct
- Rate Limiting: Handle API rate limits and throttling, for example with a retry-and-backoff loop as sketched below
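For transient failures such as rate limiting, wrapping control calls in a simple retry loop with backoff is usually enough. A minimal sketch (retry counts, backoff intervals, and status handling are illustrative):
# Retry activation up to three times, backing off between attempts
for attempt in 1 2 3; do
  code=$(curl -s -o /dev/null -w "%{http_code}" -X PUT \
    https://api.nexla.io/data_sources/5001/activate \
    -H "Authorization: Bearer <Access-Token>" \
    -H "Accept: application/vnd.nexla.api.v1+json")
  if [ "$code" = "200" ]; then
    echo "Source activated"
    break
  fi
  echo "Attempt $attempt returned HTTP $code; retrying..."
  sleep $((attempt * 10))
done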