Skip to main content

Dremio Data Source

The Dremio connector enables you to ingest catalog metadata, dataset lineage, Reflection definitions, and SQL job results from your Dremio lakehouse into Nexla. Follow the instructions below to create a new data flow that ingests data from a Dremio source in Nexla.
dremio_api.png

Dremio

Create a New Data Flow

  1. To create a new data flow, navigate to the Integrate section, and click the New Data Flow button. Then, select the desired flow type from the list, and click the Create button.

  2. Select the Dremio connector tile from the list of available connectors. Then, select the credential that will be used to connect to the Dremio instance, and click Next; or, create a new Dremio credential for use in this flow.

  3. In Nexla, Dremio data sources can be created using pre-built endpoint templates, which expedite source setup for common Dremio API endpoints. Each template is designed specifically for the corresponding Dremio API endpoint, making source configuration easy and efficient.
    • To configure this source using a template, follow the instructions in Configure Using a Template.

    Dremio sources can also be configured manually, allowing you to ingest data from Dremio API endpoints not included in the pre-built templates or apply further customizations to exactly suit your needs.
    • To configure this source manually, follow the instructions in Configure Manually.

Configure Using a Template

Nexla provides pre-built templates that can be used to rapidly configure data sources to ingest data from common Dremio API endpoints. Each template is designed specifically for the corresponding Dremio endpoint, making data source setup easy and efficient.

Endpoint Settings

  • Select the endpoint from which this source will fetch data from the Endpoint pulldown menu. Available endpoint templates are listed in the expandable boxes below. Click on an endpoint to see more information about it and how to configure your data source for this endpoint.

List Catalog Root

Returns the list of top-level catalog containers in the Dremio instance, including all registered sources, spaces, and the home space. Use this endpoint to discover what data assets are available at the root of the Dremio catalog—useful for building inventory reports, auditing registered data sources, or navigating the catalog structure programmatically.

  • No additional configuration parameters are required for this endpoint. Select it from the Endpoint menu and proceed directly to endpoint testing.
  • Each record returned by this endpoint represents a top-level catalog container and includes its entity type (SOURCE, SPACE, HOME), display name, and opaque catalog ID. The catalog ID can be used in subsequent calls to the Get Catalog Entity by ID or Get Catalog Entity Lineage endpoints.

Additional reference information is available in the Dremio Catalog API documentation.

Get Catalog Entity by ID

Retrieves a specific catalog entity—such as a source, space, folder, physical dataset, or virtual dataset (view)—using its opaque catalog ID. Use this endpoint when you have an entity's ID and need to fetch its full metadata, including schema information, access controls, and configuration details.

  • Enter the opaque catalog ID of the entity you want to retrieve in the Catalog Entity ID field. This ID is a system-generated string identifier unique to each catalog object in Dremio.

    • Catalog entity IDs can be found in the response from the List Catalog Root or Get Catalog Entity by Path endpoints, or by inspecting the URL when viewing an entity in the Dremio console.
    • Example ID format: 3a4b5c6d-7e8f-9012-abcd-ef1234567890

Additional reference information is available in the Dremio Catalog API documentation.

Get Catalog Entity by Path

Looks up a catalog entity using its human-readable dot-separated path rather than its opaque ID. This is useful when you know the logical path to a dataset or space in the Dremio catalog (e.g., the path as it appears in the Dremio SQL editor) but do not have the entity's system-generated ID.

  • Enter the dot-separated catalog path of the entity in the Catalog Path field. The path mirrors the hierarchy you see in the Dremio UI.

    • For a top-level source called MyS3Source, enter: MyS3Source
    • For a folder called SalesData inside a space called Analytics, enter: Analytics.SalesData
    • For a virtual dataset (view) called MonthlyRevenue inside that folder, enter: Analytics.SalesData.MonthlyRevenue

Additional reference information is available in the Dremio Catalog API documentation.

Get Catalog Entity Lineage

Retrieves the upstream and downstream data lineage for a specific dataset or virtual dataset (view) in the Dremio catalog. Use this endpoint to understand data provenance, trace dependencies between views, or audit which source datasets feed into downstream analytics views.

  • Enter the opaque catalog ID of the dataset whose lineage you want to retrieve in the Catalog Entity ID field.

    • This must be the ID of a physical dataset or virtual dataset (view). Lineage is not available for spaces, folders, or sources directly.
    • You can obtain the catalog ID by first querying the List Catalog Root or Get Catalog Entity by Path endpoint.
  • The response includes lists of upstream parents (datasets that feed into this entity) and downstream children (datasets or views that depend on this entity).

Additional reference information is available in the Dremio Catalog API documentation.

List Reflections

Returns all Reflections configured in the Dremio instance. Reflections are Dremio's query acceleration mechanism—they are pre-computed, Apache Arrow-formatted materializations of datasets or aggregations that dramatically speed up query execution. Use this endpoint to inventory acceleration configurations, monitor Reflection health, or audit which datasets have active materializations.

  • No additional configuration parameters are required for this endpoint. Select it from the Endpoint menu and proceed directly to endpoint testing.
  • The response includes all raw Reflections (full dataset materializations) and aggregation Reflections (pre-computed aggregates) defined across the Dremio instance. Each Reflection record includes its type, status, associated dataset path, and configuration details such as display fields, partition fields, and sort fields.

Additional reference information is available in the Dremio Reflections API documentation.

Get Job Status

Polls the current status of a previously submitted SQL job. Dremio processes SQL queries asynchronously—when a query is submitted via the SQL API, it returns a job ID that can be used to check the job's progress. Use this endpoint to monitor whether a job is RUNNING, COMPLETED, FAILED, or CANCELED before attempting to retrieve its results.

  • Enter the job ID of the SQL job you want to check in the Job ID field. Job IDs are returned by the Submit SQL Query destination endpoint when a query is submitted to Dremio.
  • The response includes the job state, submission time, start time, end time (if completed), and error messages (if the job failed).

Additional reference information is available in the Dremio Job API documentation.

Get Job Results

Retrieves the paginated result rows for a completed SQL job. After a job submitted via the Submit SQL Query endpoint reaches COMPLETED status, use this endpoint to fetch the actual query result data. Nexla automatically handles pagination using Dremio's offset-based paging (100 rows per page by default) to retrieve all available result rows.

  • Enter the job ID of the completed SQL job in the Job ID field. This must be the ID of a job that has reached COMPLETED status; jobs that are still RUNNING or have FAILED will not return result rows.
  • Nexla will automatically paginate through the results using the offset and limit query parameters, fetching pages of 100 rows until all results have been retrieved.
  • The result rows are returned in the rows array of each paginated response. Each row is a JSON object with field names matching the columns of the executed SQL query.

Verify that your SQL job has COMPLETED before selecting this endpoint. Use the Get Job Status endpoint to check job state. Additional reference information is available in the Dremio Job API documentation.

Endpoint Testing

Once the selected endpoint template has been configured, Nexla can retrieve a sample of the data that will be fetched according to the current settings. This allows users to verify that the source is configured correctly before saving.

  • To test the current endpoint configuration, click the Test button to the right of the endpoint selection menu. Sample data will be fetched & displayed in the Endpoint Test Result panel on the right.

  • If the sample data is not as expected, review the selected endpoint and associated settings, and make any necessary adjustments. Then, click the Test button again, and check the sample data to ensure that the correct information is displayed.

Configure Manually

Dremio data sources can be manually configured to ingest data from any valid Dremio API endpoint. Manual configuration provides maximum flexibility for accessing endpoints not covered by pre-built templates or when you need custom API configurations.

With manual configuration, you can also create more complex Dremio sources, such as sources that chain multiple API calls together, apply custom request headers, or use Nexla macros to dynamically parameterize API calls based on date ranges or values from other Nexla data sources.

API Method

  1. To manually configure this source, select the Advanced tab at the top of the configuration screen.

  2. Select the API method that will be used for calls to the Dremio API from the Method pulldown menu. The most common methods for Dremio are:

    • GET: For retrieving catalog entities, Reflections, job status, and job results
    • POST: For submitting SQL queries or creating catalog entities and Reflections

API Endpoint URL

  1. Enter the URL of the Dremio API endpoint from which this source will fetch data in the Set API URL field. All Dremio REST API v3 endpoints follow the pattern {base_url}/api/v3/{resource_path}.

    Common endpoint URL patterns include:

    • Catalog root: {base_url}/api/v3/catalog
    • Catalog entity by ID: {base_url}/api/v3/catalog/{id}
    • Catalog entity by path: {base_url}/api/v3/catalog/by-path/{path}
    • Reflections list: {base_url}/api/v3/reflection
    • Job status: {base_url}/api/v3/job/{job-id}
    • Job results: {base_url}/api/v3/job/{job-id}/results

The base URL for your Dremio instance is configured in your Dremio credential. You do not need to re-enter credentials or authentication headers here—Nexla automatically includes the Authorization: Bearer {token} header from your saved credential in all API calls.

Date/Time Macros (API URL)

Optional

Optionally, the API URL can be customized using macros—all macros added to the API URL will be converted into values when Nexla executes the API call. Macros are dynamic placeholders that allow you to create flexible API endpoints that can adapt to different time periods or data requirements.

Date/time macros are particularly useful when querying Dremio job results for jobs submitted within a certain time window, or when constructing Dremio API paths that reference time-partitioned datasets.

  1. To add a macro, type { at the appropriate position in the API URL (within the Set API URL field), and select the desired macro from the dropdown list.

    • {now} – The current datetime
    • {now-1} – The datetime one time unit before the current datetime
    • {now+1} – The datetime one time unit after the current datetime
    • custom – Datetime macros can reference any number of time units before or after the current datetime—for example, enter (now-4) to indicate the datetime four time units before the current datetime
  2. Select the format that will be applied to datetime macros from the Date Format for Date/Time Macro pulldown menu. This format will be applied to the base datetime value of the macro—i.e., the value of {now} in {now-1}.

  3. Select the datetime unit that will be used to perform mathematical operations in the included macro(s) from the Time Unit for Operations pulldown menu—for example, for the macro {now-1}, when Day is selected, {now-1} will be converted to the datetime one day before the current datetime.

Lookup-Based Macros (API URL)

Optional

Column values from existing lookups can also be included as macros in the API URL. Lookup-based macros allow you to reference data from previously configured data sources or lookups, enabling dynamic API endpoints that adapt based on existing data.

Lookup-based macros are useful when building Dremio sources that retrieve job results for job IDs collected by another Nexla source, or when constructing catalog paths that reference entity IDs returned by an earlier Nexla flow step.

  1. To include a lookup column value macro, select the relevant lookup from the Add Lookups to Supported Macros pulldown menu.

  2. Type { at the appropriate position in the API URL, and select the lookup column-based macro from the dropdown list. Lookup-based macros are automatically populated into the macro list when a lookup is selected in the Add Lookups to Supported Macros pulldown menu.

Path to Data

Optional

If only a subset of the data returned by the Dremio API endpoint is needed, you can designate the portion of the response that should be included in the Nexset(s) produced from this source by specifying the path to the relevant data within the response.

For example, the Dremio Catalog root endpoint returns an object with a top-level data array containing catalog entities. By entering the JSON path $.data[*], you instruct Nexla to treat each element in that array as a separate record, rather than treating the entire response as a single record.

Path to Data is essential when Dremio API responses have nested structures. The pre-built templates already configure the correct data paths for each supported endpoint. When configuring manually, refer to the Dremio API documentation to understand the response structure for your chosen endpoint.

  • To specify which data should be treated as relevant in responses from this source, enter the path to the relevant data in the Set Path to Data in Response field.

    • For responses in JSON format, enter the JSON path that points to the object or array that should be treated as relevant data. JSON paths use dot notation (e.g., $.data[*] to access an array of items within a data object).

    • For responses in XML format, enter the XPath that points to the object/array containing relevant data.

    Path to Data Examples for Dremio:

    For the Catalog root endpoint (/api/v3/catalog), enter $.data[*] to extract each catalog container as a separate record.
    For the Job Results endpoint (/api/v3/job/{'{id}'}/results), enter $.rows[*] to extract each result row as a separate record.

Autogenerate Path Suggestions

Nexla can also autogenerate data path suggestions based on the response from the API endpoint. These suggested paths can be used as-is or modified to exactly suit your needs.

  • To use this feature, click the Test button next to the Set API URL field to fetch a sample response from the API endpoint. Suggested data paths generated based on the content & format of the response will be displayed in the Suggestions box below the Set Path to Data in Response field.

  • Click on a suggestion to automatically populate the Set Path to Data in Response field with the corresponding path. The populated path can be modified directly within the field if further customization is needed.

    PathSuggestions.png

Metadata

If metadata is included in the Dremio API response but is located outside of the defined path to relevant data, you can configure Nexla to include this data as common metadata in each record.

For example, when fetching job results from Dremio, the response includes a rowCount field at the top level, alongside the rows array. If you have set the data path to $.rows[*], you can additionally specify a metadata path to capture top-level fields such as rowCount and include them with each result row record.

Metadata paths are particularly useful for preserving Dremio API response context such as total result counts, job execution metadata, or schema information that applies to all records in the response.

  • To specify the location of metadata that should be included with each record, enter the path to the relevant metadata in the Path to Metadata in Response field.

    • For responses in JSON format, enter the JSON path to the object or array that contains the metadata.

Request Headers

Optional
  • If Nexla should include any additional request headers in API calls to this source, enter the headers & corresponding values as comma-separated pairs in the Request Headers field (e.g., header1:value1,header2:value2). Additional headers may be required for Dremio deployments that use reverse proxies, API gateways, or custom authentication middleware.

    You do not need to include the Authorization header here—it is automatically added by Nexla using the Personal Access Token from your saved Dremio credential.

Endpoint Testing

After configuring all settings for the selected endpoint, Nexla can retrieve a sample of the data that will be fetched according to the current configuration. This allows users to verify that the source is configured correctly before saving.

  • To test the current endpoint configuration, click the Test button to the right of the endpoint selection menu. Sample data will be fetched & displayed in the Endpoint Test Result panel on the right.

  • If the sample data is not as expected, review the selected endpoint and associated settings, and make any necessary adjustments. Then, click the Test button again, and check the sample data to ensure that the correct information is displayed.

Save & Activate the Source

  1. Once all of the relevant steps in the above sections have been completed, click the Create button in the upper right corner of the screen to save and create the new Dremio data source. Nexla will now begin ingesting data from the configured endpoint and will organize any data that it finds into one or more Nexsets.