
Data Sources

Data sources are the foundation of data ingestion in the Nexla platform. They define where data comes from, how it should be accessed, and the configuration needed to extract data efficiently. Each data source represents a connection to an external system, database, or service that contains data you want to process.

Core Concepts

Data sources in Nexla provide a unified interface for accessing data from various systems and formats. They handle the complexity of authentication, connection management, and data extraction, allowing you to focus on data processing rather than infrastructure concerns.

Source Types

Nexla supports a wide range of data source types, each optimized for specific data systems:

  • Database Sources: MySQL, PostgreSQL, SQL Server, Oracle, Snowflake, BigQuery, and more
  • Cloud Storage: AWS S3, Google Cloud Storage, Azure Blob Storage, Box, Dropbox
  • Streaming Platforms: Kafka, Confluent Kafka, Google Pub/Sub, Azure Event Hubs
  • APIs: REST APIs, SOAP services, custom endpoints
  • File Systems: FTP, SFTP, WebDAV, local file uploads
  • SaaS Applications: Salesforce, HubSpot, Marketo, and other business applications

Key Components

Every data source consists of several essential components:

  1. Authentication: Data credentials that securely store connection information
  2. Configuration: Source-specific settings that define how to access and extract data
  3. Scheduling: Ingestion schedules for automated data collection
  4. Monitoring: Health checks and performance metrics
  5. Data Sets: Automatically detected schemas and data structures
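
Taken together, these components typically appear as fields on a single source record. The sketch below shows one plausible shape only; every field name here is an assumption for illustration, not the exact Nexla API schema.

```python
# Hypothetical shape of a data source record, tying the five components
# above together. All field names are illustrative assumptions.
example_source = {
    "id": 5001,
    "name": "orders-mysql",
    "source_type": "mysql",                        # connector type
    "data_credentials_id": 1201,                   # 1. Authentication: stored credential
    "source_config": {                             # 2. Configuration: connector-specific settings
        "database": "shop",
        "table": "orders",
    },
    "ingestion_schedule": {"frequency": "daily"},  # 3. Scheduling: automated collection
    "status": "ACTIVE",                            # 4. Monitoring: current health/state
    "data_sets": [9001, 9002],                     # 5. Data Sets: detected schemas/collections
}
```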

Data Source Lifecycle

Data sources follow a defined lifecycle that ensures reliable data ingestion:

Creation and Configuration

When you create a data source, you specify the following (see the request sketch after this list):

  • Source Type: The connector type (e.g., s3, mysql, rest)
  • Data Credentials: Authentication and connection details
  • Source Configuration: Connector-specific settings and parameters
  • Ingestion Schedule: When and how often to collect data
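
A minimal creation call might look like the sketch below, which posts to the POST /data_sources endpoint listed later on this page. The base URL, bearer-token auth, and request-body field names are assumptions for illustration; consult your Nexla API reference for the exact contract.

```python
import requests

# Placeholders: substitute your Nexla API host and access token.
NEXLA_API_BASE = "https://<your-nexla-api-host>"
HEADERS = {
    "Authorization": "Bearer <access-token>",
    "Content-Type": "application/json",
}

# Assumed request-body field names, mirroring the items above.
new_source = {
    "name": "daily-orders-bucket",
    "source_type": "s3",                           # connector type (e.g., s3, mysql, rest)
    "data_credentials_id": 1201,                   # previously created data credential
    "source_config": {                             # connector-specific settings
        "bucket": "example-orders",
        "prefix": "exports/daily/",
    },
    "ingestion_schedule": {"frequency": "daily"},  # when and how often to collect data
}

resp = requests.post(f"{NEXLA_API_BASE}/data_sources", json=new_source, headers=HEADERS)
resp.raise_for_status()
print("created data source:", resp.json().get("id"))
```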

Activation and Monitoring

Once configured, a data source can be managed in several ways (example API calls follow this list):

  • Activated: Started to begin data ingestion
  • Paused: Temporarily stopped while maintaining configuration
  • Monitored: Tracked for performance and health metrics
  • Updated: Modified to change configuration or credentials
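
The sketch below exercises these operations using the activate, pause, get, and update endpoints listed under API Endpoints. The base URL, token, and the schedule field name in the update call are placeholders or assumptions.

```python
import requests

NEXLA_API_BASE = "https://<your-nexla-api-host>"   # placeholder
HEADERS = {"Authorization": "Bearer <access-token>"}
SOURCE_ID = 5001                                   # example id

# Activate: start data ingestion.
requests.put(f"{NEXLA_API_BASE}/data_sources/{SOURCE_ID}/activate", headers=HEADERS).raise_for_status()

# Pause: stop ingestion while keeping the configuration intact.
requests.put(f"{NEXLA_API_BASE}/data_sources/{SOURCE_ID}/pause", headers=HEADERS).raise_for_status()

# Monitor: read the source back and inspect its current state.
source = requests.get(f"{NEXLA_API_BASE}/data_sources/{SOURCE_ID}", headers=HEADERS).json()
print("status:", source.get("status"))

# Update: change configuration in place (field name is an assumption).
requests.put(
    f"{NEXLA_API_BASE}/data_sources/{SOURCE_ID}",
    json={"ingestion_schedule": {"frequency": "hourly"}},
    headers=HEADERS,
).raise_for_status()
```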

Data Processing

Active data sources:

  • Extract Data: Pull data from the source system
  • Detect Schemas: Automatically identify data structures
  • Create Data Sets: Generate organized data collections
  • Trigger Flows: Initiate data processing pipelines
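
One way to observe these steps is to read the source back after activation and check what it has detected. The sketch below assumes the GET /data_sources/{id} response exposes a status field and the ids of the data sets created from detected schemas; both field names are assumptions.

```python
import requests

NEXLA_API_BASE = "https://<your-nexla-api-host>"   # placeholder
HEADERS = {"Authorization": "Bearer <access-token>"}

source = requests.get(f"{NEXLA_API_BASE}/data_sources/5001", headers=HEADERS).json()

# Assumed fields: "status" for ingestion health, "data_sets" for the
# collections generated from automatically detected schemas.
print("ingestion status:", source.get("status"))
print("data sets created:", len(source.get("data_sets", [])))
```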

Integration with Data Flows

Data sources are the starting point for data flows. They provide the raw data that flows through your processing pipeline:

  • Origin Nodes: Data sources serve as the origin nodes in flow structures
  • Automatic Detection: Schemas are automatically detected and data sets are created
  • Flow Management: Sources can be activated, paused, and managed as part of larger flows
  • Resource Association: Sources are linked to data sets, credentials, and other flow resources
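
As a hedged illustration of these associations, the sketch below walks from an origin source to its linked credential and data sets. It assumes the source record exposes those links by id and that a parallel /data_sets/{id} endpoint exists; neither assumption is confirmed by this page.

```python
import requests

NEXLA_API_BASE = "https://<your-nexla-api-host>"   # placeholder
HEADERS = {"Authorization": "Bearer <access-token>"}

# The source is the origin node; follow its links to associated resources.
source = requests.get(f"{NEXLA_API_BASE}/data_sources/5001", headers=HEADERS).json()
print("credential backing this origin node:", source.get("data_credentials_id"))

# Assumed endpoint: /data_sets/{id} for the data sets detected from this source.
for data_set_id in source.get("data_sets", []):
    data_set = requests.get(f"{NEXLA_API_BASE}/data_sets/{data_set_id}", headers=HEADERS).json()
    print("downstream data set:", data_set.get("id"), data_set.get("name"))
```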

Best Practices

To ensure optimal performance and reliability:

  1. Use Appropriate Credentials: Store sensitive connection information securely
  2. Configure Efficient Schedules: Balance data freshness with system resources
  3. Monitor Performance: Track ingestion rates and error patterns
  4. Plan for Scale: Consider data volume growth and processing requirements
  5. Test Configurations: Validate source settings before production use

API Endpoints

The data sources API provides endpoints for managing data sources throughout their lifecycle (a small client sketch follows this list):

  • List Sources: GET /data_sources - Retrieve all accessible sources
  • Create Source: POST /data_sources - Set up new data sources
  • Get Source: GET /data_sources/{id} - Retrieve specific source details
  • Update Source: PUT /data_sources/{id} - Modify source configuration
  • Activate Source: PUT /data_sources/{id}/activate - Start data ingestion
  • Pause Source: PUT /data_sources/{id}/pause - Stop data ingestion
  • Copy Source: POST /data_sources/{id}/copy - Duplicate existing sources

For detailed information about specific operations, see the individual documentation pages for each data source management task.