Monitor a Data Source

Data source monitoring provides comprehensive visibility into the performance, health, and operational status of your data ingestion processes. These monitoring capabilities help you track data quality, identify issues, and optimize ingestion performance.

Lifetime Ingestion Metrics

Lifetime metrics provide a cumulative view of all data ingested by a source since its creation, giving you a complete picture of the source's historical performance and data volume.

Lifetime Metrics Endpoint

Lifetime Ingestion Metrics: Request
GET /data_sources/{source_id}/metrics

Example with curl:

curl https://api.nexla.io/data_sources/5001/metrics \
  -H "Authorization: Bearer <Access-Token>" \
  -H "Accept: application/vnd.nexla.api.v1+json"

Response Structure

Lifetime Ingestion Metrics: Response
{
  "status": 200,
  "metrics": {
    "records": 1250000,
    "size": 2048576000,
    "files_processed": 1500,
    "errors": 25,
    "last_ingestion": "2023-01-15T10:30:00.000Z"
  }
}
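Once fetched, the lifetime payload can be summarized client-side. A minimal sketch, using the field names and values from the sample response above:

```python
# Summarize a lifetime-metrics response (structure from the sample above).
lifetime = {
    "status": 200,
    "metrics": {
        "records": 1250000,
        "size": 2048576000,
        "files_processed": 1500,
        "errors": 25,
        "last_ingestion": "2023-01-15T10:30:00.000Z",
    },
}

m = lifetime["metrics"]
error_rate = m["errors"] / m["records"]       # fraction of records that errored
avg_record_bytes = m["size"] / m["records"]   # mean record size in bytes

print(f"error rate: {error_rate:.6%}")
print(f"avg record size: {avg_record_bytes:.1f} bytes")
```

Derived ratios like these are often more useful for alerting than the raw counters, since they stay comparable as the source grows.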

Lifetime Metrics Benefits

These metrics help you understand:

  • Total Data Volume: Complete picture of data processed
  • Historical Performance: Long-term trends and patterns
  • Error Rates: Overall reliability and quality
  • Resource Utilization: Data processing efficiency

Aggregated Ingestion Metrics

Aggregated metrics provide time-based views of ingestion performance, allowing you to analyze trends, identify patterns, and monitor daily, weekly, or monthly performance.

Daily Aggregation Endpoint

Daily Ingestion Metrics: Request
GET /data_sources/{source_id}/metrics?aggregate=1

Optional Query Parameters:

?from=2023-01-01T00:00:00&to=2023-01-31T23:59:59&page=1&size=100

Response Structure

Daily Ingestion Metrics: Response
{
  "status": 200,
  "metrics": [
    {
      "time": "2023-01-15",
      "records": 53054,
      "size": 12476341,
      "files_processed": 45,
      "errors": 2
    },
    {
      "time": "2023-01-16",
      "records": 66618,
      "size": 15829589,
      "files_processed": 52,
      "errors": 0
    },
    {
      "time": "2023-01-17",
      "records": 25832,
      "size": 6645994,
      "files_processed": 18,
      "errors": 1
    }
  ],
  "pagination": {
    "page": 1,
    "size": 100,
    "total": 31
  }
}
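The daily array is straightforward to roll up for trend analysis. A short sketch over the three sample days shown above:

```python
# Roll up the daily metrics array (values from the sample response above).
daily = [
    {"time": "2023-01-15", "records": 53054, "size": 12476341, "files_processed": 45, "errors": 2},
    {"time": "2023-01-16", "records": 66618, "size": 15829589, "files_processed": 52, "errors": 0},
    {"time": "2023-01-17", "records": 25832, "size": 6645994, "files_processed": 18, "errors": 1},
]

total_records = sum(d["records"] for d in daily)
days_with_errors = [d["time"] for d in daily if d["errors"] > 0]
busiest_day = max(daily, key=lambda d: d["records"])["time"]
```

Remember that the response is paginated (`pagination.total` above is 31 days), so a full month requires walking the `page` parameter and concatenating the arrays.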

Aggregation Granularity

You can adjust the time granularity of metrics:

  • Daily: ?aggregate=1 (default)
  • Hourly: ?aggregate=hour
  • Weekly: ?aggregate=week
  • Monthly: ?aggregate=month
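Granularity and time-window parameters combine in a single query string. A minimal sketch of assembling such a URL, using the parameter names shown on this page (note that `urlencode` percent-encodes the `:` characters in the timestamps, which is equivalent to the literal form shown earlier):

```python
from urllib.parse import urlencode

# Build a metrics URL with weekly granularity and a time window
# (parameter names taken from the examples on this page).
base = "https://api.nexla.io/data_sources/5001/metrics"
params = {
    "aggregate": "week",          # 1 = daily (default), or hour/week/month
    "from": "2023-01-01T00:00:00",
    "to": "2023-01-31T23:59:59",
    "page": 1,
    "size": 100,
}
url = f"{base}?{urlencode(params)}"
```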

Ingestion Run Metrics

Ingestion run metrics provide detailed information about individual ingestion cycles, giving you visibility into the performance of each data collection event.

Run Summary Endpoint

Ingestion Run Metrics: Request
GET /data_sources/{source_id}/metrics/run_summary

Optional Query Parameters:

?from=2023-01-15T00:00:00&to=2023-01-15T23:59:59&page=1&size=50

Response Structure

Ingestion Run Metrics: Response
{
  "status": 200,
  "metrics": {
    "1673776800000": {
      "run_id": "1673776800000",
      "start_time": "2023-01-15T06:00:00.000Z",
      "end_time": "2023-01-15T06:15:00.000Z",
      "records": 1364,
      "size": 971330,
      "files_processed": 12,
      "errors": 0,
      "status": "completed"
    },
    "1673780400000": {
      "run_id": "1673780400000",
      "start_time": "2023-01-15T07:00:00.000Z",
      "end_time": "2023-01-15T07:08:00.000Z",
      "records": 330,
      "size": 235029,
      "files_processed": 3,
      "errors": 0,
      "status": "completed"
    }
  }
}
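Each run entry carries its own start and end timestamps, so per-run duration and throughput can be derived directly. A sketch using the first run from the sample response above:

```python
from datetime import datetime

# Derive duration and throughput for one run from the run-summary sample above.
run = {
    "run_id": "1673776800000",
    "start_time": "2023-01-15T06:00:00.000Z",
    "end_time": "2023-01-15T06:15:00.000Z",
    "records": 1364,
    "size": 971330,
    "errors": 0,
    "status": "completed",
}

# The trailing "Z" is rewritten as "+0000" so strptime's %z can parse it.
fmt = "%Y-%m-%dT%H:%M:%S.%f%z"
start = datetime.strptime(run["start_time"].replace("Z", "+0000"), fmt)
end = datetime.strptime(run["end_time"].replace("Z", "+0000"), fmt)

duration_s = (end - start).total_seconds()   # 15-minute run = 900 seconds
records_per_sec = run["records"] / duration_s
```

Comparing `records_per_sec` across runs is a quick way to spot the slow or fast ingestion cycles mentioned below.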

Run Metrics Analysis

Use run metrics to analyze:

  • Performance Patterns: Identify slow or fast ingestion cycles
  • Error Tracking: Monitor error rates per ingestion run
  • Resource Usage: Track processing time and efficiency
  • Scheduling Optimization: Optimize ingestion frequency

File Processing Metrics

For file-based sources, you can monitor detailed statistics about file processing, including success rates, failure reasons, and processing status.

File Statistics Endpoint

File Processing Metrics: Request
GET /data_sources/{source_id}/metrics/files_stats

Optional Query Parameters:

?from=2023-01-15T00:00:00&to=2023-01-15T23:59:59&status=all

Response Structure

File Processing Metrics: Response
{
  "status": 200,
  "file_stats": {
    "total_files": 150,
    "processed": 142,
    "failed": 5,
    "queued": 3,
    "processing": 0,
    "status_breakdown": {
      "success": 142,
      "validation_error": 3,
      "format_error": 2,
      "access_denied": 0
    }
  },
  "recent_files": [
    {
      "file_path": "daily/customer_data_2023-01-15.csv",
      "status": "processed",
      "records": 1250,
      "size": 256000,
      "processing_time": 45,
      "timestamp": "2023-01-15T10:30:00.000Z"
    }
  ]
}
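The `file_stats` counters support a simple success-rate and backlog check. A minimal sketch over the sample values above:

```python
# Success rate and backlog from the file_stats sample above.
file_stats = {
    "total_files": 150,
    "processed": 142,
    "failed": 5,
    "queued": 3,
    "processing": 0,
}

success_rate = file_stats["processed"] / file_stats["total_files"]
backlog = file_stats["queued"] + file_stats["processing"]  # files awaiting work
```

A falling `success_rate` or a growing `backlog` across polls is usually the earliest sign that a file-based source needs attention.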

Real-time Monitoring

Monitor data sources in real-time to get immediate alerts and status updates.

Health Check Endpoint

Source Health Check: Request
GET /data_sources/{source_id}/health

Health Status Response

Source Health Status: Response
{
  "status": "healthy",
  "last_check": "2023-01-15T10:30:00.000Z",
  "connection_status": "connected",
  "credential_status": "valid",
  "last_successful_ingestion": "2023-01-15T10:00:00.000Z",
  "ingestion_lag": 1800,
  "alerts": []
}
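A polling loop can reduce this payload to a single go/no-go decision. A minimal sketch; the one-hour lag threshold here is an illustrative choice, not an API default:

```python
# Flag a source whose health response warrants attention.
# MAX_LAG_SECONDS is an illustrative threshold, not an API default.
health = {
    "status": "healthy",
    "connection_status": "connected",
    "credential_status": "valid",
    "ingestion_lag": 1800,   # seconds since last successful ingestion
    "alerts": [],
}

MAX_LAG_SECONDS = 3600

needs_attention = (
    health["status"] != "healthy"
    or health["connection_status"] != "connected"
    or health["credential_status"] != "valid"
    or health["ingestion_lag"] > MAX_LAG_SECONDS
    or bool(health["alerts"])
)
```

Pick the lag threshold from the source's ingestion schedule: a source that runs hourly should tolerate roughly an hour of lag, while a streaming source should tolerate far less.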

Performance Analytics

Analyze ingestion performance to optimize your data collection processes.

Performance Metrics

Track key performance indicators:

  • Ingestion Rate: Records processed per second
  • Throughput: Data volume processed per time unit
  • Latency: Time from data availability to processing
  • Efficiency: Resource utilization and cost per record

Performance Optimization

Use metrics to:

  • Adjust Scheduling: Optimize ingestion frequency
  • Scale Resources: Add or remove processing capacity
  • Improve Configuration: Fine-tune source settings
  • Monitor Costs: Track resource consumption

Alerting and Notifications

Set up automated alerts to catch critical issues:

Alert Types

  • Ingestion Failures: Failed data collection attempts
  • Performance Degradation: Slower than expected processing
  • Data Quality Issues: High error rates or validation failures
  • Resource Constraints: Memory, CPU, or storage limitations

Alert Configuration

Configure alerts through:

  • Thresholds: Set performance and error rate limits
  • Channels: Email, webhook, or integration notifications
  • Escalation: Automatic escalation for critical issues
  • Suppression: Temporarily disable alerts during maintenance
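Threshold-based alerting amounts to comparing each metrics window against configured limits. A minimal sketch; the threshold values are illustrative, and the sample day is taken from the daily metrics shown earlier:

```python
# Evaluate illustrative alert thresholds against one day's metrics
# (the sample day comes from the daily metrics response earlier on this page).
THRESHOLDS = {"max_error_rate": 0.001, "min_records": 1000}

day = {"time": "2023-01-15", "records": 53054, "errors": 2}

alerts = []
if day["errors"] / day["records"] > THRESHOLDS["max_error_rate"]:
    alerts.append("error rate above limit")
if day["records"] < THRESHOLDS["min_records"]:
    alerts.append("record volume below expected minimum")

# An empty list means the day is within its configured limits.
```

In practice the non-empty list would be routed to a channel (email, webhook) and suppressed during maintenance windows, as described above.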

Data Quality Monitoring

Monitor the quality and consistency of ingested data:

Quality Metrics

  • Completeness: Percentage of non-null values
  • Accuracy: Data validation against business rules
  • Consistency: Format and value uniformity
  • Timeliness: Data freshness and update frequency
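Completeness, the first metric above, is simply the share of non-null values per field. A minimal sketch over a hypothetical three-record sample (the field names are illustrative):

```python
# Completeness: share of non-null values per field in a sample of records.
# Field names and values here are hypothetical, for illustration only.
records = [
    {"id": 1, "email": "a@example.com", "phone": None},
    {"id": 2, "email": None, "phone": "555-0100"},
    {"id": 3, "email": "c@example.com", "phone": "555-0101"},
]

fields = ["id", "email", "phone"]
completeness = {
    f: sum(r.get(f) is not None for r in records) / len(records) for f in fields
}
```

Tracking these ratios per field over time turns a vague "data quality" goal into a concrete, alertable number.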

Quality Alerts

Set up alerts for:

  • Data Drift: Unexpected changes in data structure
  • Anomalies: Unusual patterns or values
  • Missing Data: Gaps in expected data collection
  • Validation Failures: Data that doesn't meet quality standards

Best Practices

To maximize the value of data source monitoring:

  1. Set Baselines: Establish normal performance ranges
  2. Monitor Trends: Track performance over time
  3. Set Alerts: Configure automated notifications
  4. Analyze Patterns: Identify recurring issues
  5. Optimize Continuously: Use insights to improve performance
  6. Document Issues: Keep records of problems and solutions

Error Handling

Common monitoring issues and solutions:

  • Missing Metrics: Verify source is active and collecting data
  • Inconsistent Data: Check for configuration changes or source issues
  • Performance Degradation: Investigate resource constraints or bottlenecks
  • Connection Failures: Verify credentials and network connectivity
  • Data Quality Issues: Review validation rules and data sources

Next Steps

After monitoring your data sources, you may need to:

Update Configuration

PUT /data_sources/{source_id}

Test Connection

PUT /data_sources/{source_id}/test

Validate Configuration

POST /data_sources/{source_id}/config/validate

Activate or Pause

PUT /data_sources/{source_id}/activate
PUT /data_sources/{source_id}/pause