# Monitor a Data Source

Data source monitoring provides comprehensive visibility into the performance, health, and operational status of your data ingestion processes. These monitoring capabilities help you track data quality, identify issues, and optimize ingestion performance.
## Lifetime Ingestion Metrics

Lifetime metrics cover all data ingested by a source since its creation, giving you a complete picture of the source's historical performance and data volume.
### Lifetime Metrics Endpoint

```
GET /data_sources/{source_id}/metrics
```

Example with curl:

```bash
curl https://api.nexla.io/data_sources/5001/metrics \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Accept: application/vnd.nexla.api.v1+json"
```
### Response Structure

```json
{
  "status": 200,
  "metrics": {
    "records": 1250000,
    "size": 2048576000,
    "files_processed": 1500,
    "errors": 25,
    "last_ingestion": "2023-01-15T10:30:00.000Z"
  }
}
```
### Lifetime Metrics Benefits

These metrics help you understand:

- **Total Data Volume**: Complete picture of data processed
- **Historical Performance**: Long-term trends and patterns
- **Error Rates**: Overall reliability and quality
- **Resource Utilization**: Data processing efficiency
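As a quick illustration, the lifetime endpoint above can be called and an overall error rate derived from its response. This is a minimal sketch: the helper names `fetch_lifetime_metrics` and `error_rate` are hypothetical, and the base URL is taken from the curl example.

```python
import json
import urllib.request

API_BASE = "https://api.nexla.io"  # base URL from the curl example above

def fetch_lifetime_metrics(source_id: int, token: str) -> dict:
    """GET /data_sources/{source_id}/metrics and return the metrics object."""
    req = urllib.request.Request(
        f"{API_BASE}/data_sources/{source_id}/metrics",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.nexla.api.v1+json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["metrics"]

def error_rate(metrics: dict) -> float:
    """Errors as a fraction of total records (0.0 when there are no records)."""
    records = metrics.get("records", 0)
    return metrics.get("errors", 0) / records if records else 0.0

# Using the sample lifetime response above:
sample = {"records": 1250000, "size": 2048576000,
          "files_processed": 1500, "errors": 25}
print(f"lifetime error rate: {error_rate(sample):.4%}")  # 0.0020%
```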
## Aggregated Ingestion Metrics

Aggregated metrics provide time-based views of ingestion performance, allowing you to analyze trends, identify patterns, and monitor daily, weekly, or monthly performance.
### Daily Aggregation Endpoint

```
GET /data_sources/{source_id}/metrics?aggregate=1
```

Optional query parameters:

```
?from=2023-01-01T00:00:00&to=2023-01-31T23:59:59&page=1&size=100
```
### Response Structure

```json
{
  "status": 200,
  "metrics": [
    {
      "time": "2023-01-15",
      "records": 53054,
      "size": 12476341,
      "files_processed": 45,
      "errors": 2
    },
    {
      "time": "2023-01-16",
      "records": 66618,
      "size": 15829589,
      "files_processed": 52,
      "errors": 0
    },
    {
      "time": "2023-01-17",
      "records": 25832,
      "size": 6645994,
      "files_processed": 18,
      "errors": 1
    }
  ],
  "pagination": {
    "page": 1,
    "size": 100,
    "total": 31
  }
}
```
### Aggregation Granularity

You can adjust the time granularity of metrics:

- **Daily**: `?aggregate=1` (default)
- **Hourly**: `?aggregate=hour`
- **Weekly**: `?aggregate=week`
- **Monthly**: `?aggregate=month`
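For instance, the chosen granularity can be dropped into the query string, and the daily buckets from the sample response can be rolled up client-side. A minimal sketch; `metrics_url` and `summarize_daily` are hypothetical helper names:

```python
from urllib.parse import urlencode

API_BASE = "https://api.nexla.io"  # base URL from the curl example above

def metrics_url(source_id: int, granularity: str = "1", **window) -> str:
    """Build an aggregated-metrics URL, e.g. granularity='week'."""
    params = {"aggregate": granularity, **window}
    return f"{API_BASE}/data_sources/{source_id}/metrics?{urlencode(params)}"

def summarize_daily(buckets: list) -> dict:
    """Roll a list of aggregate buckets up into simple totals."""
    return {
        "total_records": sum(b["records"] for b in buckets),
        "total_errors": sum(b["errors"] for b in buckets),
        "peak_day": max(buckets, key=lambda b: b["records"])["time"],
    }

# The three sample buckets from the response above:
sample = [
    {"time": "2023-01-15", "records": 53054, "errors": 2},
    {"time": "2023-01-16", "records": 66618, "errors": 0},
    {"time": "2023-01-17", "records": 25832, "errors": 1},
]
print(summarize_daily(sample))
# {'total_records': 145504, 'total_errors': 3, 'peak_day': '2023-01-16'}
```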
## Ingestion Run Metrics

Ingestion run metrics provide detailed information about individual ingestion cycles, giving you visibility into the performance of each data collection event.
### Run Summary Endpoint

```
GET /data_sources/{source_id}/metrics/run_summary
```

Optional query parameters:

```
?from=2023-01-15T00:00:00&to=2023-01-15T23:59:59&page=1&size=50
```
### Response Structure

```json
{
  "status": 200,
  "metrics": {
    "1673776800000": {
      "run_id": "1673776800000",
      "start_time": "2023-01-15T06:00:00.000Z",
      "end_time": "2023-01-15T06:15:00.000Z",
      "records": 1364,
      "size": 971330,
      "files_processed": 12,
      "errors": 0,
      "status": "completed"
    },
    "1673780400000": {
      "run_id": "1673780400000",
      "start_time": "2023-01-15T07:00:00.000Z",
      "end_time": "2023-01-15T07:08:00.000Z",
      "records": 330,
      "size": 235029,
      "files_processed": 3,
      "errors": 0,
      "status": "completed"
    }
  }
}
```
### Run Metrics Analysis

Use run metrics to analyze:

- **Performance Patterns**: Identify slow or fast ingestion cycles
- **Error Tracking**: Monitor error rates per ingestion run
- **Resource Usage**: Track processing time and efficiency
- **Scheduling Optimization**: Optimize ingestion frequency
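The `start_time` and `end_time` fields in each run make duration analysis straightforward. A sketch using the two sample runs above (`run_duration_seconds` and `slowest_runs` are hypothetical helper names):

```python
from datetime import datetime

def _parse(ts: str) -> datetime:
    # fromisoformat in older Python versions does not accept a trailing 'Z'
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def run_duration_seconds(run: dict) -> float:
    """Wall-clock duration of a single ingestion run."""
    return (_parse(run["end_time"]) - _parse(run["start_time"])).total_seconds()

def slowest_runs(runs: dict, n: int = 5) -> list:
    """Run IDs ordered by duration, longest first."""
    return sorted(runs, key=lambda rid: run_duration_seconds(runs[rid]),
                  reverse=True)[:n]

# The two sample runs from the response above:
runs = {
    "1673776800000": {"start_time": "2023-01-15T06:00:00.000Z",
                      "end_time": "2023-01-15T06:15:00.000Z"},
    "1673780400000": {"start_time": "2023-01-15T07:00:00.000Z",
                      "end_time": "2023-01-15T07:08:00.000Z"},
}
print(slowest_runs(runs))  # ['1673776800000', '1673780400000'] (900 s vs 480 s)
```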
## File Processing Metrics

For file-based sources, you can monitor detailed statistics about file processing, including success rates, failure reasons, and processing status.
### File Statistics Endpoint

```
GET /data_sources/{source_id}/metrics/files_stats
```

Optional query parameters:

```
?from=2023-01-15T00:00:00&to=2023-01-15T23:59:59&status=all
```
### Response Structure

```json
{
  "status": 200,
  "file_stats": {
    "total_files": 150,
    "processed": 142,
    "failed": 5,
    "queued": 3,
    "processing": 0,
    "status_breakdown": {
      "success": 142,
      "validation_error": 3,
      "format_error": 2,
      "access_denied": 0
    }
  },
  "recent_files": [
    {
      "file_path": "daily/customer_data_2023-01-15.csv",
      "status": "processed",
      "records": 1250,
      "size": 256000,
      "processing_time": 45,
      "timestamp": "2023-01-15T10:30:00.000Z"
    }
  ]
}
```
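A success rate can be derived from the `file_stats` object; queued and in-flight files are excluded since they have not finished. A minimal sketch with a hypothetical `file_success_rate` helper:

```python
def file_success_rate(stats: dict) -> float:
    """Fraction of finished files (processed + failed) that succeeded."""
    finished = stats["processed"] + stats["failed"]
    return stats["processed"] / finished if finished else 1.0

# Sample file_stats values from the response above:
stats = {"total_files": 150, "processed": 142, "failed": 5,
         "queued": 3, "processing": 0}
print(f"{file_success_rate(stats):.1%}")  # 96.6%
```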
## Real-time Monitoring

Monitor data sources in real-time to get immediate alerts and status updates.

### Health Check Endpoint

```
GET /data_sources/{source_id}/health
```
### Health Status Response

```json
{
  "status": "healthy",
  "last_check": "2023-01-15T10:30:00.000Z",
  "connection_status": "connected",
  "credential_status": "valid",
  "last_successful_ingestion": "2023-01-15T10:00:00.000Z",
  "ingestion_lag": 1800,
  "alerts": []
}
```
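A health response like the one above can be reduced to a single go/no-go check, for example with a maximum acceptable `ingestion_lag` (in seconds). The `health_ok` helper and the one-hour default are assumptions for illustration:

```python
def health_ok(health: dict, max_lag_seconds: int = 3600) -> bool:
    """True when the source is connected, credentials are valid,
    ingestion lag is within bounds, and no alerts are active."""
    return (
        health.get("status") == "healthy"
        and health.get("connection_status") == "connected"
        and health.get("credential_status") == "valid"
        and health.get("ingestion_lag", 0) <= max_lag_seconds
        and not health.get("alerts")
    )

# The sample health response above (lag of 1800 s) passes a 1-hour limit:
sample = {"status": "healthy", "connection_status": "connected",
          "credential_status": "valid", "ingestion_lag": 1800, "alerts": []}
print(health_ok(sample))                        # True
print(health_ok(sample, max_lag_seconds=900))   # False
```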
## Performance Analytics

Analyze ingestion performance to optimize your data collection processes.

### Performance Metrics

Track key performance indicators:

- **Ingestion Rate**: Records processed per second
- **Throughput**: Data volume processed per time unit
- **Latency**: Time from data availability to processing
- **Efficiency**: Resource utilization and cost per record
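The first two indicators can be computed directly from a run's record count, byte size, and duration. A sketch using the first sample run above (1364 records, 971330 bytes, 15 minutes); `ingestion_kpis` is a hypothetical helper:

```python
def ingestion_kpis(records: int, size_bytes: int, duration_s: float) -> dict:
    """Ingestion rate (records/s) and throughput (MB/s) for one run."""
    return {
        "records_per_sec": records / duration_s,
        "mb_per_sec": size_bytes / duration_s / 1_048_576,
    }

print(ingestion_kpis(1364, 971330, 900))
```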
### Performance Optimization

Use metrics to:

- **Adjust Scheduling**: Optimize ingestion frequency
- **Scale Resources**: Add or remove processing capacity
- **Improve Configuration**: Fine-tune source settings
- **Monitor Costs**: Track resource consumption
## Alerting and Notifications

Set up automated alerts for monitoring critical issues:

### Alert Types

- **Ingestion Failures**: Failed data collection attempts
- **Performance Degradation**: Slower than expected processing
- **Data Quality Issues**: High error rates or validation failures
- **Resource Constraints**: Memory, CPU, or storage limitations

### Alert Configuration

Configure alerts through:

- **Thresholds**: Set performance and error rate limits
- **Channels**: Email, webhook, or integration notifications
- **Escalation**: Automatic escalation for critical issues
- **Suppression**: Temporarily disable alerts during maintenance
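A threshold check can be run client-side against aggregated metrics, for example flagging days whose error rate exceeds a configured limit. A minimal sketch with a hypothetical `days_over_threshold` helper, using the daily buckets from the aggregated example:

```python
def days_over_threshold(buckets: list, max_error_rate: float) -> list:
    """Days whose error rate (errors / records) exceeds the limit."""
    flagged = []
    for b in buckets:
        rate = b["errors"] / b["records"] if b["records"] else 0.0
        if rate > max_error_rate:
            flagged.append(b["time"])
    return flagged

buckets = [
    {"time": "2023-01-15", "records": 53054, "errors": 2},
    {"time": "2023-01-16", "records": 66618, "errors": 0},
    {"time": "2023-01-17", "records": 25832, "errors": 1},
]
print(days_over_threshold(buckets, 3.8e-5))  # ['2023-01-17']
```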
## Data Quality Monitoring

Monitor the quality and consistency of ingested data:

### Quality Metrics

- **Completeness**: Percentage of non-null values
- **Accuracy**: Data validation against business rules
- **Consistency**: Format and value uniformity
- **Timeliness**: Data freshness and update frequency
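Completeness, the simplest of these, is just the fraction of records carrying a non-null value for a given field. A sketch with a hypothetical `completeness` helper and made-up sample rows:

```python
def completeness(records: list, field: str) -> float:
    """Fraction of records where `field` is present and non-null."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)

rows = [{"email": "a@example.com"}, {"email": None}, {}]
print(completeness(rows, "email"))  # 0.3333333333333333
```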
### Quality Alerts

Set up alerts for:

- **Data Drift**: Unexpected changes in data structure
- **Anomalies**: Unusual patterns or values
- **Missing Data**: Gaps in expected data collection
- **Validation Failures**: Data that doesn't meet quality standards
## Best Practices

To maximize the value of data source monitoring:

- **Set Baselines**: Establish normal performance ranges
- **Monitor Trends**: Track performance over time
- **Set Alerts**: Configure automated notifications
- **Analyze Patterns**: Identify recurring issues
- **Optimize Continuously**: Use insights to improve performance
- **Document Issues**: Keep records of problems and solutions
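One common way to turn a baseline into an automated check is a simple standard-deviation rule over historical values. This is a sketch of that idea, not a Nexla feature; the helper name and the 3-sigma default are assumptions:

```python
from statistics import mean, stdev

def deviates_from_baseline(history: list, latest: float, k: float = 3.0) -> bool:
    """Flag `latest` when it falls more than k standard deviations
    from the mean of the historical values."""
    if len(history) < 2:
        return False  # not enough history to form a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) > k * sigma

# Hypothetical daily record counts as the baseline:
daily_records = [53054, 66618, 25832, 48990, 51200]
print(deviates_from_baseline(daily_records, 52000))   # False
print(deviates_from_baseline(daily_records, 500000))  # True
```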
## Error Handling

Common monitoring issues and solutions:

- **Missing Metrics**: Verify the source is active and collecting data
- **Inconsistent Data**: Check for configuration changes or source issues
- **Performance Degradation**: Investigate resource constraints or bottlenecks
- **Connection Failures**: Verify credentials and network connectivity
- **Data Quality Issues**: Review validation rules and data sources
## Related Operations

After monitoring your data sources, you may need to:

### Update Configuration

```
PUT /data_sources/{source_id}
```

### Test Connection

```
PUT /data_sources/{source_id}/test
```

### Validate Configuration

```
POST /data_sources/{source_id}/config/validate
```

### Activate or Pause

```
PUT /data_sources/{source_id}/activate
PUT /data_sources/{source_id}/pause
```