
Data Sources

Data sources are the foundation of data ingestion in the Nexla platform. They define where data comes from, how it should be accessed, and the configuration needed to extract data efficiently. Each data source represents a connection to an external system, database, or service that contains data you want to process.

Core Concepts

Data sources in Nexla provide a unified interface for accessing data from various systems and formats. They handle the complexity of authentication, connection management, and data extraction, allowing you to focus on data processing rather than infrastructure concerns.

Source Types

Nexla supports a wide range of data source types, each optimized for specific data systems:

  • Database Sources: MySQL, PostgreSQL, SQL Server, Oracle, Snowflake, BigQuery, and more
  • Cloud Storage: AWS S3, Google Cloud Storage, Azure Blob Storage, Box, Dropbox
  • Streaming Platforms: Kafka, Confluent Kafka, Google Pub/Sub, Azure Event Hubs
  • APIs: REST APIs, SOAP services, custom endpoints
  • File Systems: FTP, SFTP, WebDAV, local file uploads
  • SaaS Applications: Salesforce, HubSpot, Marketo, and other business applications

Key Components

Every data source consists of several essential components:

  1. Authentication: Data credentials that securely store connection information
  2. Configuration: Source-specific settings that define how to access and extract data
  3. Scheduling: Ingestion schedules for automated data collection
  4. Monitoring: Health checks and performance metrics
  5. Data Sets: Automatically detected schemas and data structures
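
Taken together, these components typically appear as fields on a single source record. The sketch below shows one plausible shape only; every field name here is an assumption for illustration, not the exact Nexla API schema.

```python
# Hypothetical shape of a data source record, tying the five components
# above together. All field names are illustrative assumptions.
example_source = {
    "id": 5001,
    "name": "orders-mysql",
    "source_type": "mysql",                        # connector type
    "data_credentials_id": 1201,                   # 1. Authentication: stored credential
    "source_config": {                             # 2. Configuration: connector-specific settings
        "database": "shop",
        "table": "orders",
    },
    "ingestion_schedule": {"frequency": "daily"},  # 3. Scheduling: automated collection
    "status": "ACTIVE",                            # 4. Monitoring: current health/state
    "data_sets": [9001, 9002],                     # 5. Data Sets: detected schemas/collections
}
```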

Data Source Lifecycle

Data sources follow a defined lifecycle that ensures reliable data ingestion:

Creation and Configuration

When you create a data source, you specify the following (see the request sketch after this list):

  • Source Type: The connector type (e.g., s3, mysql, rest)
  • Data Credentials: Authentication and connection details
  • Source Configuration: Connector-specific settings and parameters
  • Ingestion Schedule: When and how often to collect data
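
A minimal creation call might look like the sketch below, which posts to the POST /data_sources endpoint listed later on this page. The base URL, bearer-token auth, and request-body field names are assumptions for illustration; consult your Nexla API reference for the exact contract.

```python
import requests

# Placeholders: substitute your Nexla API host and access token.
NEXLA_API_BASE = "https://<your-nexla-api-host>"
HEADERS = {
    "Authorization": "Bearer <access-token>",
    "Content-Type": "application/json",
}

# Assumed request-body field names, mirroring the items above.
new_source = {
    "name": "daily-orders-bucket",
    "source_type": "s3",                           # connector type (e.g., s3, mysql, rest)
    "data_credentials_id": 1201,                   # previously created data credential
    "source_config": {                             # connector-specific settings
        "bucket": "example-orders",
        "prefix": "exports/daily/",
    },
    "ingestion_schedule": {"frequency": "daily"},  # when and how often to collect data
}

resp = requests.post(f"{NEXLA_API_BASE}/data_sources", json=new_source, headers=HEADERS)
resp.raise_for_status()
print("created data source:", resp.json().get("id"))
```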

Activation and Monitoring

Once configured, a data source can be managed in several ways (example API calls follow this list):

  • Activated: Started to begin data ingestion
  • Paused: Temporarily stopped while maintaining configuration
  • Monitored: Tracked for performance and health metrics
  • Updated: Modified to change configuration or credentials
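
The sketch below exercises these operations using the activate, pause, get, and update endpoints listed under API Endpoints. The base URL, token, and the schedule field name in the update call are placeholders or assumptions.

```python
import requests

NEXLA_API_BASE = "https://<your-nexla-api-host>"   # placeholder
HEADERS = {"Authorization": "Bearer <access-token>"}
SOURCE_ID = 5001                                   # example id

# Activate: start data ingestion.
requests.put(f"{NEXLA_API_BASE}/data_sources/{SOURCE_ID}/activate", headers=HEADERS).raise_for_status()

# Pause: stop ingestion while keeping the configuration intact.
requests.put(f"{NEXLA_API_BASE}/data_sources/{SOURCE_ID}/pause", headers=HEADERS).raise_for_status()

# Monitor: read the source back and inspect its current state.
source = requests.get(f"{NEXLA_API_BASE}/data_sources/{SOURCE_ID}", headers=HEADERS).json()
print("status:", source.get("status"))

# Update: change configuration in place (field name is an assumption).
requests.put(
    f"{NEXLA_API_BASE}/data_sources/{SOURCE_ID}",
    json={"ingestion_schedule": {"frequency": "hourly"}},
    headers=HEADERS,
).raise_for_status()
```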

Data Processing

Active data sources:

  • Extract Data: Pull data from the source system
  • Detect Schemas: Automatically identify data structures
  • Create Data Sets: Generate organized data collections
  • Trigger Flows: Initiate data processing pipelines
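
One way to observe these steps is to read the source back after activation and check what it has detected. The sketch below assumes the GET /data_sources/{id} response exposes a status field and the ids of the data sets created from detected schemas; both field names are assumptions.

```python
import requests

NEXLA_API_BASE = "https://<your-nexla-api-host>"   # placeholder
HEADERS = {"Authorization": "Bearer <access-token>"}

source = requests.get(f"{NEXLA_API_BASE}/data_sources/5001", headers=HEADERS).json()

# Assumed fields: "status" for ingestion health, "data_sets" for the
# collections generated from automatically detected schemas.
print("ingestion status:", source.get("status"))
print("data sets created:", len(source.get("data_sets", [])))
```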

Integration with Data Flows

Data sources are the starting point for data flows. They provide the raw data that flows through your processing pipeline:

  • Origin Nodes: Data sources serve as the origin nodes in flow structures
  • Automatic Detection: Schemas are automatically detected and data sets are created
  • Flow Management: Sources can be activated, paused, and managed as part of larger flows
  • Resource Association: Sources are linked to data sets, credentials, and other flow resources
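
As a hedged illustration of these associations, the sketch below walks from an origin source to its linked credential and data sets. It assumes the source record exposes those links by id and that a parallel /data_sets/{id} endpoint exists; neither assumption is confirmed by this page.

```python
import requests

NEXLA_API_BASE = "https://<your-nexla-api-host>"   # placeholder
HEADERS = {"Authorization": "Bearer <access-token>"}

# The source is the origin node; follow its links to associated resources.
source = requests.get(f"{NEXLA_API_BASE}/data_sources/5001", headers=HEADERS).json()
print("credential backing this origin node:", source.get("data_credentials_id"))

# Assumed endpoint: /data_sets/{id} for the data sets detected from this source.
for data_set_id in source.get("data_sets", []):
    data_set = requests.get(f"{NEXLA_API_BASE}/data_sets/{data_set_id}", headers=HEADERS).json()
    print("downstream data set:", data_set.get("id"), data_set.get("name"))
```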

Best Practices

To ensure optimal performance and reliability:

  1. Use Appropriate Credentials: Store sensitive connection information securely
  2. Configure Efficient Schedules: Balance data freshness with system resources
  3. Monitor Performance: Track ingestion rates and error patterns
  4. Plan for Scale: Consider data volume growth and processing requirements
  5. Test Configurations: Validate source settings before production use

API Endpoints

The data sources API provides endpoints for managing data sources throughout their lifecycle (a small client sketch follows this list):

  • List Sources: GET /data_sources - Retrieve all accessible sources
  • Create Source: POST /data_sources - Set up new data sources
  • Get Source: GET /data_sources/{id} - Retrieve specific source details
  • Update Source: PUT /data_sources/{id} - Modify source configuration
  • Activate Source: PUT /data_sources/{id}/activate - Start data ingestion
  • Pause Source: PUT /data_sources/{id}/pause - Stop data ingestion
  • Copy Source: POST /data_sources/{id}/copy - Duplicate existing sources

For detailed information about specific operations, see the individual documentation pages for each data source management task.