Elasticsearch Data Source

The Elasticsearch connector enables you to ingest data from your Elasticsearch indices, allowing you to retrieve documents, perform searches, and extract data from your Elasticsearch cluster. This connector is particularly useful for applications that need to extract indexed data, sync Elasticsearch data to other systems, or analyze search results. Follow the instructions below to create a new data flow that ingests data from an Elasticsearch source in Nexla.

Create a New Data Flow

  1. To create a new data flow, navigate to the Integrate section, and click the New Data Flow button. Then, select the desired flow type from the list, and click the Create button.

  2. Select the Elasticsearch connector tile from the list of available connectors. Then, select the credential that will be used to connect to the Elasticsearch instance, and click Next; or, create a new Elasticsearch credential for use in this flow.

  3. In Nexla, Elasticsearch data sources can be created using pre-built endpoint templates, which expedite source setup for common Elasticsearch endpoints. Each template is designed specifically for the corresponding Elasticsearch endpoint, making source configuration easy and efficient.
    • To configure this source using a template, follow the instructions in Configure Using a Template.

    Elasticsearch sources can also be configured manually, allowing you to ingest data from Elasticsearch endpoints not included in the pre-built templates or apply further customizations to exactly suit your needs.
    • To configure this source manually, follow the instructions in Configure Manually.

Configure Using a Template

Nexla provides pre-built templates for rapidly configuring data sources that ingest data from common Elasticsearch endpoints. Each template is tailored to one Elasticsearch endpoint, so only the settings relevant to that endpoint need to be filled in.

Endpoint Settings

  • From the Endpoint pulldown menu, select the endpoint from which this source will fetch data. Available endpoint templates are listed in the expandable boxes below. Click on an endpoint to see more information about it and how to configure your data source for this endpoint.

    Search Documents

    This endpoint performs a search query on an Elasticsearch index to retrieve matching documents. Use this endpoint when you need to search for documents, filter results, or extract data based on specific criteria.

    • Enter the search query in the Query field. This should be a valid Elasticsearch query in JSON format. You can use match queries, term queries, range queries, and other Elasticsearch query types. Leave this field empty to retrieve all documents (up to API limits).
    • Optionally, enter the maximum number of results to return in the Size field. The default is 10. This controls how many documents are returned per request.
    • Optionally, enter the starting offset for pagination in the From field. The default is 0. This allows you to paginate through large result sets.
    • Enter a schedule in the Schedule field to specify when this data source should run. The schedule uses cron expression format (e.g., 0 6 * * * for daily at 6 AM).
    • This endpoint sends a GET request to /your-index/_search, with the query passed in the request body or as query parameters.

    Elasticsearch search queries support a wide range of query types including match, term, range, bool, and more. The query syntax follows Elasticsearch's Query DSL. For complete information about Elasticsearch search queries, see the Elasticsearch Search API Documentation.
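As a rough illustration of the settings above, the following Python sketch builds a bool query that combines a match clause with a range filter, prints it as JSON suitable for pasting into the Query field, and computes a From value for a given page. The field names (`status`, `created_at`) and the `page_offset` helper are hypothetical and not part of Nexla or Elasticsearch; adjust them to your own index.

```python
import json

# Hypothetical field names; replace them with fields that exist in your index.
query = {
    "bool": {
        "must": [{"match": {"status": "active"}}],
        "filter": [{"range": {"created_at": {"gte": "now-7d/d"}}}],
    }
}


def page_offset(page: int, size: int = 10) -> int:
    """Return the From value for a zero-based page number."""
    if page < 0 or size <= 0:
        raise ValueError("page must be >= 0 and size must be > 0")
    return page * size


# JSON text for the Query field.
query_json = json.dumps(query, indent=2)
print(query_json)

# Values for the Size and From fields when fetching the third page of 50 results.
print("Size:", 50, "From:", page_offset(page=2, size=50))
```

Keep in mind that Elasticsearch, by default, rejects searches where From plus Size exceeds 10,000 (the `index.max_result_window` setting), so offset-based paging is best suited to moderately sized result sets.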

    Get Document by ID

    This endpoint retrieves a specific document from an Elasticsearch index by its document ID. Use this endpoint when you need to retrieve a specific document, fetch document metadata, or get a single document by its unique identifier.

    • Enter the Document ID in the Document ID field. This should be the exact ID of the document you want to retrieve from the Elasticsearch index.
    • Enter a schedule in the Schedule field to specify when this data source should run. The schedule uses cron expression format.
    • This endpoint sends a GET request to /your-index/_doc/{document_id} to retrieve the specified document.

    Document IDs in Elasticsearch are unique identifiers for each document in an index. You can find document IDs from search results or when indexing documents. For complete information about getting documents by ID, see the Elasticsearch Get API Documentation.
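To make the request path above concrete, here is a minimal Python sketch that assembles a `_doc` URL from a base URL, an index name, and a document ID. The `document_url` helper and the example host are assumptions made for illustration. The sketch percent-encodes the ID, which matters when an ID contains characters such as `/`, `#`, or spaces; whether the Document ID field in Nexla applies this encoding for you is not covered here, so test with a known ID.

```python
from urllib.parse import quote


def document_url(base_url: str, index: str, doc_id: str) -> str:
    """Build the Get API URL for one document, percent-encoding the ID."""
    base = base_url.rstrip("/")
    return f"{base}/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"


# Hypothetical host, index, and IDs.
print(document_url("https://your-cluster.example.com", "your-index", "abc123"))
print(document_url("https://your-cluster.example.com", "your-index", "order/2024 #17"))
```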

Endpoint Testing

Once the selected endpoint template has been configured, Nexla can retrieve a sample of the data that will be fetched according to the current settings. This allows users to verify that the source is configured correctly before saving.

  • To test the current endpoint configuration, click the Test button to the right of the endpoint selection menu. Sample data will be fetched and displayed in the Endpoint Test Result panel on the right.

  • If the sample data is not as expected, review the selected endpoint and associated settings, and make any necessary adjustments. Then, click the Test button again, and check the sample data to ensure that the correct information is displayed.

Configure Manually

Elasticsearch data sources can be manually configured to ingest data from any valid Elasticsearch API endpoint. Manual configuration provides maximum flexibility for accessing endpoints not covered by pre-built templates or when you need custom API configurations.

With manual configuration, you can also create more complex Elasticsearch sources, such as sources that use chained API calls to fetch data from multiple endpoints or sources that require custom authentication headers or request parameters.

API Method

  1. To manually configure this source, select the Advanced tab at the top of the configuration screen.

  2. Select the API method that will be used for calls to the Elasticsearch API from the Method pulldown menu. The most common methods are:

    • GET: For retrieving data from the API (most common for Elasticsearch data sources)
    • POST: For search queries with complex request bodies

API Endpoint URL

  1. In the Set API URL field, enter the URL of the Elasticsearch API endpoint from which this source will fetch data. This should be the complete URL, including the protocol (https://) and any required path parameters.

Elasticsearch API URLs typically follow the format https://your-cluster.es.amazonaws.com/your-index/_search for search operations, or https://your-cluster.es.amazonaws.com/your-index/_doc/{document_id} for getting specific documents. Replace your-cluster.es.amazonaws.com with the host of your Elasticsearch cluster, your-index with your index name, and {document_id} with the ID of the document to retrieve. Ensure that the API endpoint URL is correct and accessible with your current credentials. You can test the endpoint using the Test button after configuring the URL. For complete information about Elasticsearch API endpoints, see the Elasticsearch API Documentation.
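Before pasting a URL into the Set API URL field, a quick local check can catch the most common slips. The sketch below is one possible pre-flight check, not something Nexla runs: the `check_endpoint_url` helper is hypothetical, and it only looks for a missing protocol, a missing host, and a leftover `{placeholder}` in the path.

```python
from urllib.parse import urlparse


def check_endpoint_url(url: str) -> list:
    """Return a list of problems found in a candidate API URL (empty if none)."""
    problems = []
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        problems.append("URL must start with http:// or https://")
    if not parts.netloc:
        problems.append("URL is missing a host")
    if "{" in parts.path or "}" in parts.path:
        problems.append("path still contains an unfilled {placeholder}")
    return problems


for candidate in (
    "https://your-cluster.es.amazonaws.com/your-index/_search",
    "your-cluster.es.amazonaws.com/your-index/_search",
    "https://your-cluster.es.amazonaws.com/your-index/_doc/{document_id}",
):
    print(candidate, "->", check_endpoint_url(candidate) or "looks OK")
```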

Request Headers

  1. If Nexla should include any additional request headers in API calls to this source, enter the headers and their corresponding values as comma-separated pairs in the Request Headers field (e.g., header1:value1,header2:value2).

You do not need to include authentication headers (basic authentication or API key headers) as these are automatically included from your credentials. However, you may need to include additional headers for specific Elasticsearch API features. The Content-Type header should be set to application/json for most Elasticsearch API requests.
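The `header1:value1,header2:value2` format can be produced from a mapping with a few lines of Python. The `format_request_headers` helper below is a hypothetical convenience, not a Nexla function. How the Request Headers field treats commas or colons inside a header name or value is not stated here, so this sketch conservatively rejects names containing either character and values containing a comma.

```python
def format_request_headers(headers: dict) -> str:
    """Join header names and values into 'name:value,name:value' form."""
    for name, value in headers.items():
        if "," in name or ":" in name:
            raise ValueError(f"header name {name!r} contains ',' or ':'")
        if "," in value:
            raise ValueError(f"value for {name!r} contains ','")
    return ",".join(f"{name}:{value}" for name, value in headers.items())


print(format_request_headers({
    "Content-Type": "application/json",
    "Accept": "application/json",
}))
```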

Response Data Path

  1. In the Response Data Path field, enter the JSON path expression that identifies the location of the data array in the API response. This path tells Nexla where to find the array of records in the JSON response.

For Elasticsearch search responses, the data path is typically $.hits.hits[*], which extracts each individual hit from the hits array. For single-document responses, use $ to extract the entire document object. In both cases, the _source field of each hit or document contains the actual document data, while sibling fields such as _index and _id are metadata. JSON path expressions use dot notation and array indexing to navigate the response structure. For complete information about Elasticsearch API response formats, see the Elasticsearch API Documentation.
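To show what the data path selects, the following Python sketch walks a hand-written sample shaped like an Elasticsearch search response. The sample values are invented for illustration. Plain dictionary access stands in for the JSON path expressions; a path such as `$.hits.hits[*]._source` would be the analogue of the second step, assuming the path syntax accepted by the Response Data Path field supports it.

```python
# Hand-written sample shaped like an Elasticsearch search response.
sample_response = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_index": "your-index", "_id": "1", "_score": 1.0, "_source": {"name": "alpha"}},
            {"_index": "your-index", "_id": "2", "_score": 0.8, "_source": {"name": "beta"}},
        ],
    },
}

# Equivalent of $.hits.hits[*]: one record per hit, metadata included.
records = sample_response["hits"]["hits"]

# Only the document bodies, taken from each hit's _source field.
documents = [hit["_source"] for hit in records]

print(len(records), "records")
print(documents)
```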

Schedule

  1. Enter a schedule in the Schedule field to specify when this data source should run. The schedule uses cron expression format to define the frequency and timing of data ingestion.

Common cron expressions include: 0 6 * * * for daily at 6 AM, 0 */6 * * * for every 6 hours, and 0 0 * * 0 for weekly on Sunday at midnight. For more information about cron expressions, see the Nexla documentation on scheduling.
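The three expressions above all use the five-field cron layout (minute, hour, day of month, month, day of week). As a small aid for reading them, this Python sketch labels each field; the `describe_cron` helper is hypothetical and only splits the expression, without validating value ranges or covering any extended cron syntax the Schedule field may accept.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict:
    """Label the five whitespace-separated fields of a cron expression."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


for expr in ("0 6 * * *", "0 */6 * * *", "0 0 * * 0"):
    print(expr, "->", describe_cron(expr))
```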