
Data Source

Follow the instructions below to create a new data flow that ingests data from a Collibra source in Nexla.

Create a New Data Flow

  1. To create a new data flow, navigate to the Integrate section, and click the New Data Flow button. Then, select the desired flow type from the list, and click the Create button.

  2. Select the Collibra connector tile from the list of available connectors. Then, select the credential that will be used to connect to the Collibra instance, and click Next; or, create a new Collibra credential for use in this flow.

  3. In Nexla, Collibra data sources can be created using pre-built endpoint templates, which expedite source setup for common Collibra endpoints. Each template is designed specifically for the corresponding Collibra endpoint, making source configuration easy and efficient.
    • To configure this source using a template, follow the instructions in Configure Using a Template.

    Collibra sources can also be configured manually, allowing you to ingest data from Collibra endpoints not included in the pre-built templates or apply further customizations to exactly suit your needs.
    • To configure this source manually, follow the instructions in Configure Manually.

Configure Using a Template

Nexla provides pre-built templates that can be used to rapidly configure data sources to ingest data from common Collibra endpoints. Each template is designed specifically for the corresponding Collibra endpoint, making data source setup easy and efficient.

Endpoint Settings

  • From the Endpoint pulldown menu, select the endpoint from which this source will fetch data. Each available endpoint template is described below, along with information about the endpoint and instructions for configuring your data source to use it.

    List Search Views

    This endpoint lists all search views in the account. Use this endpoint when you need to access search view information, search configurations, or available search views in your Collibra account.

    • This endpoint automatically retrieves all search views from your Collibra account. No additional configuration is required beyond selecting this endpoint template.
    • The endpoint sends GET requests to https://{environment_url}/rest/2.0/search/views, where {environment_url} is the Collibra environment URL from your credential configuration. Nexla constructs this URL automatically.
    • The endpoint does not use pagination and returns all search views in a single request.
    • The endpoint will return all search views in your account. The response data is extracted from the results array in the API response ($.results[*]), with each search view record processed individually.

    The endpoint uses a static URL (iteration.type: static.url) and does not require pagination. The response data path is $.results[*], which extracts all items from the results array in the API response. For detailed information about listing search views, see the Collibra API documentation.
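    For reference outside Nexla, the request this template issues can be sketched with the Python standard library. This is a minimal sketch, not Nexla's implementation: it assumes the credential's environment URL is a bare host name (the hypothetical example.collibra.com) and that Basic Authentication is used, as described under Configure Manually. The function names are hypothetical.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    # Basic Authentication: base64-encode "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_search_views_request(environment_url: str, username: str,
                               password: str) -> urllib.request.Request:
    # Assumption: environment_url is a bare host such as "example.collibra.com".
    url = f"https://{environment_url}/rest/2.0/search/views"
    headers = {
        "Authorization": basic_auth_header(username, password),
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def extract_results(payload: dict) -> list:
    # Equivalent of the $.results[*] data path: one record per array item.
    return list(payload.get("results", []))
```

    To call a live instance, pass the request to urllib.request.urlopen and feed json.load(response) to extract_results.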

    Search

    This endpoint performs a search and returns a list of resources that meet the search criteria defined in the request body. Use this endpoint when you need to search for specific resources, assets, or data elements in your Collibra account based on custom search criteria.

    • Enter the search criteria in the Search criteria field. This should be a JSON payload that defines the search parameters, filters, and criteria for the search operation. The search criteria should follow the Collibra API specification for search requests.

    • Select the API version number from the Version Number dropdown menu. Available options are:

      • 2.0: Collibra API v2.0 (default, recommended)
      • 1.0: Collibra API v1.0 (legacy)

      You can also add custom version numbers if needed. The default value is 2.0 if not specified.

    • The endpoint sends POST requests to https://{environment_url}/rest/{version_number}/search, where {environment_url} is the Collibra environment URL from your credential configuration and {version_number} is the selected API version. Nexla constructs this URL automatically.
    • The endpoint uses offset-based pagination, automatically fetching additional pages as needed using the offset and limit query parameters. The endpoint starts from offset 0 and continues fetching pages until all available results have been retrieved. By default, the endpoint retrieves up to 20 items per page.
    • The endpoint will return all resources matching the search criteria. The response data is extracted from the results array in the API response ($.results[*]), with each resource record processed individually.

    The search criteria must be properly formatted JSON that matches the Collibra API specification for search requests. Nexla handles pagination automatically using offset-based paging (iteration.type: paging.incrementing.offset): it starts at offset 0, requests up to 20 items per page by default (page.expected.rows: 20), and increments the offset until no more data is returned. The response data path is $.results[*], which extracts all items from the results array in the API response. For detailed information about performing searches, see the Collibra API documentation.
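    The offset-based paging described above can be sketched as a small loop. This is a hedged illustration, not Nexla's implementation: fetch_page is a hypothetical callable standing in for the POST request (it receives the offset and limit and returns the parsed JSON response), and the loop stops at the first page that returns fewer than limit items, which also covers an empty page.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode


def search_page_url(environment_url: str, version: str, offset: int, limit: int) -> str:
    # Offset and limit are passed as query parameters, as described for this template.
    query = urlencode({"offset": offset, "limit": limit})
    return f"https://{environment_url}/rest/{version}/search?{query}"


def fetch_all(fetch_page: Callable[[int, int], dict], limit: int = 20) -> Iterator[dict]:
    # Start at offset 0 and advance by one page size per request.
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        results = page.get("results", [])  # $.results[*]
        yield from results
        # A short (or empty) page means there is no more data to fetch.
        if len(results) < limit:
            break
        offset += limit
```

    With 45 matching resources and the default page size of 20, this loop requests offsets 0, 20, and 40, and stops after the third page returns only 5 items.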

    Read Sample Data

    This endpoint reads the available sample data from the Collibra cloud repository or Edge cache depending on how the data is collected. Use this endpoint when you need to access sample data, data previews, or data samples for specific assets in your Collibra catalog.

    • Enter the asset ID for which you want to retrieve sample data in the Asset ID field. This is the unique identifier of the asset in your Collibra catalog for which you want to read sample data.

    • The endpoint sends GET requests to https://{environment_url}/rest/catalogSampling/v1/samples/{asset_id}, where {environment_url} is the Collibra environment URL from your credential configuration and {asset_id} is the asset ID you provide. Nexla constructs this URL automatically.
    • The endpoint does not use pagination and returns the complete sample data in a single request.
    • The endpoint will return sample data for the specified asset. The response data is extracted from the root-level object in the API response ($), and Nexla will process the entire response structure.

    Asset IDs can be obtained from other Collibra API endpoints, such as the Search endpoint, which returns assets with their corresponding IDs. The endpoint uses a static URL (iteration.type: static.url) and does not require pagination. The response data path is $, which extracts the entire root-level object from the API response. For detailed information about reading sample data, see the Collibra API documentation.
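    The URL this template requests, and the effect of the $ data path, can be reproduced as follows. This is a minimal sketch with hypothetical function names: the asset ID is percent-encoded defensively in case it contains reserved characters, and the $ path is modelled as wrapping the whole response object in a one-element list so that it is processed as a single record.

```python
from urllib.parse import quote


def sample_data_url(environment_url: str, asset_id: str) -> str:
    # Percent-encode the ID so characters such as "/" cannot alter the path.
    encoded_id = quote(asset_id, safe="")
    return f"https://{environment_url}/rest/catalogSampling/v1/samples/{encoded_id}"


def extract_root(payload: dict) -> list:
    # Data path "$": the entire root-level object is treated as one record.
    return [payload]
```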

    List Catalog Database Details

    This endpoint lists catalog database details from the Collibra catalog database API. Use this endpoint when you need to access database metadata, database configurations, or database details from your Collibra catalog.

    • Enter the resource type you want to retrieve in the Resource Type field. This should be the type of catalog database resource you want to list (e.g., databases, schemas, tables). The resource type determines which catalog database details will be retrieved.

    • The endpoint sends GET requests to https://{environment_url}/rest/catalogDatabase/v1/{resource_type}, where {environment_url} is the Collibra environment URL from your credential configuration and {resource_type} is the resource type you provide. Nexla constructs this URL automatically.
    • The endpoint does not use pagination and returns all catalog database details for the specified resource type in a single request.
    • The endpoint will return all catalog database details for the specified resource type. The response data is extracted from the results array in the API response ($.results[*]), with each database detail record processed individually.

    Resource types should match the Collibra catalog database API specification. Common resource types include databases, schemas, tables, and columns. The endpoint uses a static URL (iteration.type: static.url) and does not require pagination. The response data path is $.results[*], which extracts all items from the results array in the API response. For detailed information about listing catalog database details, see the Collibra API documentation.
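    The full single-request flow for this template (build the URL, authenticate, read the response, apply $.results[*]) can be sketched end to end. This is a hypothetical helper, not Nexla's implementation; it assumes Basic Authentication, and the opener parameter is an illustrative hook that defaults to urllib.request.urlopen so the function can be exercised without a live Collibra instance.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def list_catalog_database(environment_url: str, resource_type: str, username: str,
                          password: str, opener=urllib.request.urlopen) -> list:
    # Build https://{environment_url}/rest/catalogDatabase/v1/{resource_type}.
    url = (f"https://{environment_url}/rest/catalogDatabase/v1/"
           f"{quote(resource_type, safe='')}")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}, method="GET")
    # Single request, no pagination: read the whole response at once.
    with opener(request) as response:
        payload = json.load(response)
    # Data path $.results[*]: one record per item in the results array.
    return list(payload.get("results", []))
```

    For example, list_catalog_database("example.collibra.com", "databases", user, password) would request the databases resource type; the host name is a placeholder.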

Endpoint Testing

Once the selected endpoint template has been configured, Nexla can retrieve a sample of the data that will be fetched according to the current settings. This allows users to verify that the source is configured correctly before saving.

  • To test the current endpoint configuration, click the Test button to the right of the endpoint selection menu. Sample data will be fetched and displayed in the Endpoint Test Result panel on the right.

  • If the sample data is not as expected, review the selected endpoint and associated settings, and make any necessary adjustments. Then, click the Test button again, and check the sample data to ensure that the correct information is displayed.

Configure Manually

Collibra data sources can be manually configured to ingest data from any valid Collibra API endpoint. Manual configuration provides maximum flexibility, whether you need to access endpoints not covered by the pre-built templates or require custom API configurations.

With manual configuration, you can also create more complex Collibra sources, such as sources that use chained API calls to fetch data from multiple endpoints or sources that require custom authentication headers or request parameters.

API Method

  1. To manually configure this source, select the Advanced tab at the top of the configuration screen.

  2. Select the API method that will be used for calls to the Collibra API from the Method pulldown menu. The most common methods are:

    • GET: For retrieving data from the API
    • POST: For sending data to the API or triggering actions (e.g., search operations)
    • PUT: For updating existing data
    • PATCH: For partial updates to existing data
    • DELETE: For removing data

API Endpoint URL

  1. Enter the URL of the Collibra API endpoint from which this source will fetch data in the Set API URL field. This should be the complete URL including the protocol (https://) and any required path parameters. Collibra API endpoints typically follow the pattern https://{environment_url}/rest/{version}/{endpoint_path} where {environment_url} is your Collibra environment URL.

Ensure the API endpoint URL is correct and accessible with your current credentials. You can test the endpoint using the Test button after configuring the URL. The endpoint requires Basic Authentication, which is handled automatically by your credential configuration. For detailed information about Collibra API endpoints and available APIs, see the Collibra API documentation.
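When checking a manually configured call outside Nexla, for example to confirm a URL before entering it in the Set API URL field, a request with any of the methods listed above can be built the same way. build_request is a hypothetical helper, not part of Nexla or Collibra; it assumes Basic Authentication and a JSON request body for methods such as POST.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_request(method: str, url: str, username: str, password: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    # Require the https:// protocol, matching the https://{environment_url}/rest/... pattern.
    if not url.startswith("https://"):
        raise ValueError("Collibra API URLs should include the https:// protocol")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    data = None
    if body is not None:
        # Serialize the JSON payload, e.g. search criteria for a POST request.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method.upper())
```

Passing the returned request to urllib.request.urlopen sends it; a 401 response usually points to a credential problem rather than a URL problem.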

Path to Data

  1. Enter the JSON path that identifies the location of the relevant data within the API response in the Path to Data field. JSON paths use dot notation to navigate through nested JSON structures.

    • For example, if your API response has the structure {"results": [...]}, you would enter $.results[*] to extract all items from the results array.
    • Use $[*] to extract all items from a root-level array.
    • Use $ to extract the entire root-level object.

JSON paths are case-sensitive and must match the exact structure of your API response. Collibra API responses typically use a results array to contain the actual data for list endpoints, or a root-level object for single resource endpoints. Use the Test button to verify that your JSON path correctly extracts the desired data from the API response.
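To reason about which records a path will produce, the three path forms above can be modelled with a tiny evaluator. This sketch supports only $, $[*], and dot-separated keys optionally ending in [*] (such as $.results[*]); it is an illustrative subset, not Nexla's JSONPath engine, and full JSONPath also supports indices, filters, and recursive descent.

```python
from typing import Any, List


def extract(payload: Any, path: str) -> List[Any]:
    """Return the records selected by a small subset of JSONPath."""
    if not path.startswith("$"):
        raise ValueError("JSON paths must start with $")
    rest = path[1:]
    wildcard = rest.endswith("[*]")
    if wildcard:
        rest = rest[:-3]
    node = payload
    # Walk dot-separated keys; lookups are case-sensitive, as in the API response.
    for key in filter(None, rest.split(".")):
        node = node[key]
    # "[*]" yields each array item as a separate record; otherwise one record.
    return list(node) if wildcard else [node]
```

For the response {"results": [{"id": 1}, {"id": 2}]}, $.results[*] yields two records, $ yields one record containing the whole object, and $.Results[*] fails because the key does not match case-sensitively.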

Save the Data Source

  1. Once all configuration steps have been completed, click the Save button to save your data source configuration.

  2. The data source will now be available in your data flow and will begin ingesting data according to the configured schedule and endpoint settings.