
List/View Data Sources

Listing and viewing data sources in Nexla lets you understand your data ingestion architecture, monitor source status, and manage data pipelines. The API exposes each source's metadata, configuration, and operational details.

List All Data Sources

The /data_sources endpoint lists every data source accessible to your account.

API Endpoint

To retrieve all accessible data sources:

List All Data Sources: Request
GET /data_sources

Response Structure

The response is a JSON array with one object per source. Each object includes the source's configuration, status, and metadata.

List All Data Sources: Response
[
{
"id": 1001,
"name": "Customer Data Source",
"description": "Customer data ingestion from S3",
"connector_type": "s3",
"status": "ACTIVE",
"owner": {
"id": 42,
"full_name": "John Smith"
},
"org": {
"id": 101,
"name": "Acme Corporation"
},
"access_roles": ["owner"],
"data_credentials": {
"id": 5001,
"name": "S3 Credentials",
"credentials_type": "s3"
},
"source_config": {
"path": "customer-data-bucket/raw",
"file_pattern": "*.csv",
"recursive": true
},
"flow_type": "streaming",
"created_at": "2023-01-15T10:30:00.000Z",
"updated_at": "2023-01-15T15:45:00.000Z"
},
{
"id": 1002,
"name": "Sales Data Source",
"description": "Sales data from PostgreSQL database",
"connector_type": "postgres",
"status": "PAUSED",
"owner": {
"id": 42,
"full_name": "John Smith"
},
"org": {
"id": 101,
"name": "Acme Corporation"
},
"access_roles": ["owner"],
"data_credentials": {
"id": 5002,
"name": "PostgreSQL Credentials",
"credentials_type": "postgres"
},
"source_config": {
"host": "db.example.com",
"port": 5432,
"database": "sales_db",
"table": "transactions"
},
"flow_type": "in_memory",
"created_at": "2023-01-14T14:20:00.000Z",
"updated_at": "2023-01-15T12:15:00.000Z"
}
]
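
As a rough illustration of working with this response, the Python sketch below parses a trimmed copy of the sample array and summarizes it by status. The helper name summarize_sources is hypothetical, and the inline JSON stands in for a live API call.

```python
import json
from collections import Counter

# Trimmed copy of the sample response above (two sources).
SAMPLE_RESPONSE = """
[
  {"id": 1001, "name": "Customer Data Source", "connector_type": "s3", "status": "ACTIVE"},
  {"id": 1002, "name": "Sales Data Source", "connector_type": "postgres", "status": "PAUSED"}
]
"""

def summarize_sources(sources):
    """Return (count per status, one descriptive line per source)."""
    by_status = Counter(s["status"] for s in sources)
    lines = [
        f'{s["id"]}: {s["name"]} [{s["connector_type"]}/{s["status"]}]'
        for s in sources
    ]
    return by_status, lines

sources = json.loads(SAMPLE_RESPONSE)
by_status, lines = summarize_sources(sources)
for line in lines:
    print(line)
```

In real use, the list passed to summarize_sources would be the decoded body of GET /data_sources.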

Show Source by ID

To retrieve a single data source, append its ID to the /data_sources path.

Source by ID Endpoint

GET /data_sources/{source_id}
Show Source by ID: Request
GET /data_sources/1001

Source by ID Response

The response is a single JSON object describing the requested source, including its configuration and status details.

Show Source by ID: Response
{
"id": 1001,
"name": "Customer Data Source",
"description": "Customer data ingestion from S3",
"connector_type": "s3",
"status": "ACTIVE",
"owner": {
"id": 42,
"full_name": "John Smith"
},
"org": {
"id": 101,
"name": "Acme Corporation"
},
"access_roles": ["owner"],
"data_credentials": {
"id": 5001,
"name": "S3 Credentials",
"credentials_type": "s3"
},
"source_config": {
"path": "customer-data-bucket/raw",
"file_pattern": "*.csv",
"recursive": true,
"batch_size": 1000,
"polling_interval": "5m"
},
"flow_type": "streaming",
"data_set": {
"id": 3001,
"name": "Customer Analytics Dataset"
},
"created_at": "2023-01-15T10:30:00.000Z",
"updated_at": "2023-01-15T15:45:00.000Z"
}
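
The following Python sketch shows one way the request for a single source might be constructed with the standard library. The base URL, the bearer-token Authorization header, and the function name build_source_request are assumptions for illustration; this page does not specify the API host or authentication scheme.

```python
import urllib.request

# Hypothetical values: the API host and auth scheme are not given on this page.
BASE_URL = "https://api.example.invalid"
TOKEN = "YOUR_ACCESS_TOKEN"

def build_source_request(source_id, base_url=BASE_URL, token=TOKEN):
    """Build (but do not send) a GET request for /data_sources/{source_id}."""
    if not isinstance(source_id, int) or source_id <= 0:
        raise ValueError("source_id must be a positive integer")
    return urllib.request.Request(
        f"{base_url}/data_sources/{source_id}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )

req = build_source_request(1001)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; the sketch stops short of that so it can run without network access.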

Expand Parameter

Use the expand query parameter to include details of related resources in the response.

Expand Options

The expand parameter accepts the following values:

GET /data_sources?expand=data_credentials
GET /data_sources?expand=data_set
GET /data_sources?expand=flow
GET /data_sources?expand=all
Expand Credentials: Request
GET /data_sources?expand=data_credentials
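
A small Python sketch of building these request paths is shown below. It validates the value against the four options listed above; the helper name expand_path is hypothetical, and whether several expand values can be combined in one request is not covered here.

```python
from urllib.parse import urlencode

# Expand values listed in this section.
EXPAND_OPTIONS = {"data_credentials", "data_set", "flow", "all"}

def expand_path(option, source_path="/data_sources"):
    """Return the request path with a validated expand query parameter."""
    if option not in EXPAND_OPTIONS:
        raise ValueError(f"unsupported expand option: {option!r}")
    return f"{source_path}?{urlencode({'expand': option})}"

print(expand_path("data_credentials"))
```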

Expanded Response

The expanded response includes additional detail for each related resource, such as the verification status of the credentials in the example below.

Expanded Response: Example
[
{
"id": 1001,
"name": "Customer Data Source",
"connector_type": "s3",
"status": "ACTIVE",
"data_credentials": {
"id": 5001,
"name": "S3 Credentials",
"credentials_type": "s3",
"verified_status": "200 Ok",
"created_at": "2023-01-10T09:00:00.000Z"
},
"data_set": {
"id": 3001,
"name": "Customer Analytics Dataset",
"status": "ACTIVE",
"record_count": 1500000
},
"flow": {
"id": 2001,
"status": "ACTIVE",
"flow_type": "streaming"
}
}
]
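
One way to consume an expanded response is to flatten the nested objects into a single row per source, as in the Python sketch below. The function name related_summary and the choice of fields are illustrative; the sketch treats every related object as optional, since which objects appear depends on the expand value used.

```python
import json

# Trimmed copy of the expanded sample response above.
EXPANDED = json.loads("""
[{"id": 1001, "name": "Customer Data Source", "status": "ACTIVE",
  "data_credentials": {"id": 5001, "verified_status": "200 Ok"},
  "data_set": {"id": 3001, "status": "ACTIVE", "record_count": 1500000},
  "flow": {"id": 2001, "status": "ACTIVE", "flow_type": "streaming"}}]
""")

def related_summary(source):
    """Flatten the expanded sub-objects into one row; missing parts become None."""
    creds = source.get("data_credentials") or {}
    data_set = source.get("data_set") or {}
    flow = source.get("flow") or {}
    return {
        "source_id": source["id"],
        "credentials_status": creds.get("verified_status"),
        "record_count": data_set.get("record_count"),
        "flow_status": flow.get("status"),
    }

rows = [related_summary(s) for s in EXPANDED]
print(rows[0])
```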

Source Status and Monitoring

Each data source reports a status that indicates its operational state.

Status Types

  • ACTIVE: Source is actively ingesting data
  • PAUSED: Source is temporarily stopped
  • ERROR: Source encountered an error and stopped
  • INACTIVE: Source is not currently processing
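
The four statuses above can be modeled as an enumeration so that unexpected values fail loudly. The Python sketch below is illustrative only; the names SourceStatus, is_ingesting, and needs_attention are hypothetical and not part of the API.

```python
from enum import Enum

class SourceStatus(str, Enum):
    """Status values listed above."""
    ACTIVE = "ACTIVE"
    PAUSED = "PAUSED"
    ERROR = "ERROR"
    INACTIVE = "INACTIVE"

def is_ingesting(status):
    """True only for ACTIVE; raises ValueError for an unknown status string."""
    return SourceStatus(status) is SourceStatus.ACTIVE

def needs_attention(sources):
    """Return the ids of sources whose status is ERROR."""
    return [s["id"] for s in sources if SourceStatus(s["status"]) is SourceStatus.ERROR]

sample = [
    {"id": 1001, "status": "ACTIVE"},
    {"id": 1002, "status": "PAUSED"},
    {"id": 1003, "status": "ERROR"},
]
print(needs_attention(sample))
```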

Status Monitoring

Monitor source status through the API:

GET /data_sources/{source_id}/status
GET /data_sources/{source_id}/metrics
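
A polling loop is one possible way to use the status endpoint. The Python sketch below takes the fetch step as a caller-supplied function, because the response format of the status endpoint is not shown on this page; wait_for_status and its parameters are hypothetical names.

```python
import time

def wait_for_status(fetch_status, target="ACTIVE", attempts=5, delay_s=2.0,
                    sleep=time.sleep):
    """Poll fetch_status() until it returns target or ERROR, or attempts run out.

    fetch_status is any zero-argument callable returning a status string; in
    real use it would wrap GET /data_sources/{source_id}/status.
    """
    last = None
    for attempt in range(attempts):
        last = fetch_status()
        if last in (target, "ERROR"):
            return last
        if attempt < attempts - 1:
            sleep(delay_s)  # wait before the next poll
    return last

# Demonstration with a canned sequence instead of a live API call.
canned = iter(["INACTIVE", "INACTIVE", "ACTIVE"])
result = wait_for_status(lambda: next(canned), sleep=lambda _: None)
print(result)
```

Stopping early on ERROR avoids waiting out the full attempt budget for a source that has already failed.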

Source Configuration

The contents of source_config depend on the source's connector type.

File System Configuration

{
"path": "bucket-name/folder",
"file_pattern": "*.csv",
"recursive": true,
"batch_size": 1000
}

Database Configuration

{
"host": "db.example.com",
"port": 5432,
"database": "analytics_db",
"table": "customer_data"
}

API Configuration

{
"base_url": "https://api.example.com",
"endpoint": "/customers",
"auth_type": "bearer",
"polling_interval": "1h"
}
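
A client could check a source_config object for expected keys before relying on it. The Python sketch below infers the required keys from the three examples above; this is not an official schema, and the family names and the missing_keys helper are hypothetical.

```python
# Required keys per configuration family, inferred from the examples above
# (an assumption, not an official schema).
REQUIRED_KEYS = {
    "file_system": {"path", "file_pattern"},
    "database": {"host", "port", "database", "table"},
    "api": {"base_url", "endpoint", "auth_type"},
}

def missing_keys(family, source_config):
    """Return the sorted required keys absent from source_config."""
    if family not in REQUIRED_KEYS:
        raise ValueError(f"unknown configuration family: {family!r}")
    return sorted(REQUIRED_KEYS[family] - set(source_config))

db_config = {"host": "db.example.com", "port": 5432, "database": "analytics_db"}
print(missing_keys("database", db_config))
```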

Integration with Data Flows

Data sources are integral components of Nexla's data flow architecture.

Flow Relationships

  • Origin Node: Sources serve as the starting point for data flows
  • Data Processing: Sources feed data into Nexsets for transformation
  • Flow Control: Source status affects entire flow operation

Flow Management

View and control a source's flow through the following source endpoints:

GET /data_sources/{source_id}/flow
PUT /data_sources/{source_id}/flow/activate
PUT /data_sources/{source_id}/flow/pause
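
The three flow endpoints above can be captured in a small lookup table, as in this Python sketch. The action names ("show", "activate", "pause") and the flow_request helper are hypothetical labels chosen for the example; only the methods and paths come from this page.

```python
# (HTTP method, path template) for each flow operation listed above.
FLOW_ACTIONS = {
    "show": ("GET", "/data_sources/{source_id}/flow"),
    "activate": ("PUT", "/data_sources/{source_id}/flow/activate"),
    "pause": ("PUT", "/data_sources/{source_id}/flow/pause"),
}

def flow_request(action, source_id):
    """Return (method, path) for a flow action on the given source."""
    if action not in FLOW_ACTIONS:
        raise ValueError(f"unknown flow action: {action!r}")
    method, template = FLOW_ACTIONS[action]
    return method, template.format(source_id=int(source_id))

print(flow_request("pause", 1001))
```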

Best Practices

To effectively list and view data sources:

  1. Use Pagination: Implement pagination for large source collections
  2. Filter by Status: Focus on active or problematic sources
  3. Monitor Performance: Track source metrics and performance
  4. Organize by Type: Group sources by connector type for management
  5. Regular Review: Periodically review source configurations and access
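
The pagination advice in item 1 might look like the Python sketch below. This page does not document the pagination parameters, so the page and per_page names are assumptions, and the fetch step is a caller-supplied function rather than a real API call.

```python
def iter_all_sources(fetch_page, per_page=100):
    """Yield every source, requesting pages until a short or empty page arrives.

    fetch_page(page, per_page) is a caller-supplied callable returning one page
    as a list. The page/per_page parameter names are hypothetical.
    """
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        yield from batch
        if len(batch) < per_page:
            return  # a short page means there is nothing left to fetch
        page += 1

# Stand-in for the API: five fake sources served two per page.
FAKE = [{"id": 1000 + i} for i in range(5)]

def fake_fetch(page, per_page):
    start = (page - 1) * per_page
    return FAKE[start:start + per_page]

ids = [s["id"] for s in iter_all_sources(fake_fetch, per_page=2)]
print(ids)
```

Because the function is a generator, callers can stop iterating early without fetching the remaining pages.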

Error Handling

Common source listing issues and solutions:

  • Permission Denied: Ensure you have appropriate access rights
  • Invalid Source ID: Verify the source ID exists and is accessible
  • Organization Issues: Check organization membership and access
  • Resource Not Found: Confirm the requested source exists
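
A client might translate HTTP status codes into the issues listed above, as in this Python sketch. The status codes themselves are standard HTTP, but which code the API returns for each issue is an assumption, and explain_error is a hypothetical helper.

```python
# Mapping from standard HTTP status codes to the issues listed above.
# Which code the API returns for each issue is an assumption.
ISSUE_BY_STATUS = {
    401: "Permission Denied: check that your credentials are valid",
    403: "Permission Denied: ensure you have appropriate access rights",
    404: "Resource Not Found: confirm the source ID exists and is accessible",
}

def explain_error(status_code):
    """Return a troubleshooting hint for an HTTP error status code."""
    return ISSUE_BY_STATUS.get(status_code, f"Unexpected HTTP status {status_code}")

print(explain_error(404))
```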

After viewing sources, you may need to:

Control Source Status

PUT /data_sources/{source_id}/activate
PUT /data_sources/{source_id}/pause

Update Source Configuration

PUT /data_sources/{source_id}

Monitor Source Performance

GET /data_sources/{source_id}/metrics
GET /data_sources/{source_id}/logs

Manage Source Access

GET /data_sources/{source_id}/access
PUT /data_sources/{source_id}/access