Data Source

Follow the instructions below to create a new data flow that ingests data from a Pinecone API source in Nexla.

Pinecone API

Create a New Data Flow

  1. To create a new data flow, navigate to the Integrate section, and click the New Data Flow button. Then, select the desired flow type from the list, and click the Create button.

  2. Select the Pinecone API connector tile from the list of available connectors. Then, either select the credential that will be used to connect to your Pinecone account and click Next, or create a new Pinecone API credential for use in this flow.

  3. In Nexla, Pinecone API data sources can be created using pre-built endpoint templates, which expedite source setup for common Pinecone endpoints. Each template is designed specifically for the corresponding Pinecone endpoint, making source configuration easy and efficient.
    • To configure this source using a template, follow the instructions in Configure Using a Template.

    Pinecone API sources can also be configured manually, allowing you to ingest data from Pinecone endpoints not included in the pre-built templates or apply further customizations to exactly suit your needs.
    • To configure this source manually, follow the instructions in Configure Manually.

Configure Using a Template

Nexla provides pre-built templates that can be used to rapidly configure data sources to ingest data from common Pinecone endpoints. Each template is designed specifically for the corresponding Pinecone endpoint, making data source setup easy and efficient.

Endpoint Settings

  • From the Endpoint pulldown menu, select the endpoint from which this source will fetch data. Available endpoint templates are listed in the expandable boxes below. Click on an endpoint to see more information about it and how to configure your data source for this endpoint.

    Query Vectors

    This endpoint template searches a namespace using a query vector from your Pinecone index. Use this template when you need to perform similarity search to find vectors that are most similar to a query vector, which is useful for recommendation systems, search functionality, and other AI-powered applications.

    • Enter the query vector in the Vector field as a JSON array of numbers (e.g., [0.1, 0.2, 0.3, ...]). The array must have the same length as the dimension of the index being queried. Pinecone compares this vector against the vectors in your index to find the most similar ones.
    • Enter the number of results to return in the Top K field. This should be the number of most similar vectors you want to retrieve (default: 100). The Top K value determines how many results will be returned for each query.
    • Select whether to include vector values in the Include Values field. Available options include true (include vector values in the response) and false (do not include vector values, default: false). Including values increases response size but provides the full vector data.
    • Select whether to include vector metadata in the Include Metadata field. Available options include true (include vector metadata in the response, default: true) and false (do not include vector metadata). Including metadata provides additional information associated with each vector.

    This endpoint returns the Top K vectors most similar to the query vector, ranked by the similarity metric configured for your index. The query vector must match the dimension of your index.

    For detailed information about vector queries, similarity search, API response structures, and available query parameters, see the Pinecone API documentation.
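As an illustration of how the settings above fit together, the following minimal Python sketch assembles a query request body and checks the vector length before sending anything. The field names (vector, topK, includeValues, includeMetadata) follow Pinecone's REST query request format; the helper name build_query_body and the index_dimension argument are hypothetical, and how Nexla maps the template fields onto the request internally is an assumption.

```python
import json


def build_query_body(vector, top_k=100, include_values=False,
                     include_metadata=True, index_dimension=None):
    """Assemble a JSON-serializable body for a vector query (illustrative helper)."""
    # The query vector must be as long as the index dimension.
    if index_dimension is not None and len(vector) != index_dimension:
        raise ValueError(
            f"query vector has {len(vector)} values, index expects {index_dimension}"
        )
    return {
        "vector": [float(x) for x in vector],
        "topK": int(top_k),
        "includeValues": bool(include_values),
        "includeMetadata": bool(include_metadata),
    }


# Example: a 3-dimensional index (assumed) queried for the 5 nearest vectors.
body = build_query_body([0.1, 0.2, 0.3], top_k=5, index_dimension=3)
print(json.dumps(body))
```

Checking the dimension locally is only a convenience; the Test button remains the way to confirm that the configured source returns the expected data.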

    List Vector IDs

    This endpoint template lists the IDs of vectors in a single namespace from your Pinecone index. This is supported only for serverless indices. Use this template when you need to retrieve a list of all vector IDs in a namespace, which is useful for inventory management, data discovery, or batch operations.

    • Enter the namespace in the Namespace field. This should be the namespace name for which you want to list vector IDs. The namespace determines which vectors' IDs will be listed. Leave this field empty to list vectors in the default namespace.
    • Enter an optional prefix in the Prefix field. This should be a prefix to limit the results to IDs with a common prefix. The prefix is useful for filtering vector IDs by a common naming pattern. Leave this field empty to list all vector IDs in the namespace.

    This endpoint is supported only for serverless indices. It uses token-based pagination to handle large datasets efficiently: Nexla automatically fetches subsequent pages of data by following the pagination token returned in each API response.

    For detailed information about listing vector IDs, namespace management, pagination, and serverless index support, see the Pinecone API documentation.

    Query Vectors By ID

    This endpoint template searches a namespace using the unique ID of a vector as a query vector from your Pinecone index. Use this template when you need to find vectors similar to a specific vector by its ID, which is useful for finding related content or recommendations based on existing vectors.

    • Enter the vector ID in the Vector ID field. This should be the unique identifier of the vector you want to use as a query. The vector ID determines which vector will be used as the query vector to find similar vectors.
    • Enter the number of results to return in the Top K field. This should be the number of most similar vectors you want to retrieve. The Top K value determines how many results will be returned for each query.
    • Select whether to include vector values in the Include Values field. Available options include true (include vector values in the response) and false (do not include vector values). Including values increases response size but provides the full vector data.
    • Select whether to include vector metadata in the Include Metadata field. Available options include true (include vector metadata in the response) and false (do not include vector metadata). Including metadata provides additional information associated with each vector.
    • Enter the namespace in the Namespace field. This should be the namespace name where you want to query vectors. The namespace determines which namespace will be searched. Leave this field empty to query vectors in the default namespace.

    This endpoint uses the vector stored under the specified ID as the query vector and returns the Top K most similar vectors, ranked by the similarity metric configured for your index.

    For detailed information about querying vectors by ID, similarity search, API response structures, and available query parameters, see the Pinecone API documentation.
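A query by ID differs from a query by vector mainly in that the request carries an id instead of a vector, plus an optional namespace. The sketch below shows one way such a body could be assembled, assuming Pinecone's REST field names (id, topK, includeValues, includeMetadata, namespace). The helper name is hypothetical, and leaving out the namespace key when the field is empty is an assumption about how the default namespace is selected.

```python
def build_query_by_id_body(vector_id, top_k, include_values,
                           include_metadata, namespace=""):
    """Assemble a query body that uses a stored vector's ID as the query (illustrative)."""
    if not vector_id:
        raise ValueError("vector_id must be a non-empty string")
    body = {
        "id": vector_id,
        "topK": int(top_k),
        "includeValues": bool(include_values),
        "includeMetadata": bool(include_metadata),
    }
    # An empty Namespace field means the default namespace, so the key is omitted.
    if namespace:
        body["namespace"] = namespace
    return body


# "vec-42" and "products" are made-up values for illustration only.
default_ns_body = build_query_by_id_body("vec-42", 10, False, True)
named_ns_body = build_query_by_id_body("vec-42", 10, False, True, namespace="products")
print(default_ns_body)
print(named_ns_body)
```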

Endpoint Testing

Once the selected endpoint template has been configured, Nexla can retrieve a sample of the data that will be fetched according to the current settings. This allows users to verify that the source is configured correctly before saving.

  • To test the current endpoint configuration, click the Test button to the right of the endpoint selection menu. Sample data will be fetched and displayed in the Endpoint Test Result panel on the right.

  • If the sample data is not as expected, review the selected endpoint and associated settings, and make any necessary adjustments. Then, click the Test button again, and check the sample data to ensure that the correct information is displayed.

Configure Manually

Pinecone API data sources can be manually configured to ingest data from any valid Pinecone API endpoint. Manual configuration provides maximum flexibility when you need to access endpoints not covered by the pre-built templates or to apply custom API configurations.

With manual configuration, you can also create more complex Pinecone API sources, such as sources that use chained API calls to fetch related data or sources that require custom query parameters or filters.

API Method

  1. To manually configure this source, select the Advanced tab at the top of the configuration screen.

  2. Select the API method that will be used for calls to the Pinecone API from the Method pulldown menu. The Pinecone API typically uses the GET method for retrieving data and the POST method for querying vectors.

API Endpoint URL

  1. Enter the URL of the Pinecone API endpoint from which this source will fetch data in the Set API URL field. This should be the complete URL including your index host URL (from your credential), the API path (e.g., /query, /vectors/list), and any required query parameters. Include any required path parameters in the URL.

Ensure that the API endpoint URL is correct and accessible with your current credentials. After entering the URL, you can verify it by clicking the Test button.
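To make the URL structure concrete, here is a minimal Python sketch that joins an index host, an API path, and optional query parameters into one complete URL. The host name shown is a made-up placeholder, the helper build_endpoint_url is hypothetical, and dropping empty parameters is a design choice of this sketch rather than documented Nexla behavior.

```python
from urllib.parse import urlencode, urlsplit


def build_endpoint_url(index_host, api_path, query_params=None):
    """Join an index host, an API path, and query parameters into one URL (illustrative)."""
    host = index_host.strip().rstrip("/")
    if "://" not in host:
        host = "https://" + host
    if not urlsplit(host).netloc:
        raise ValueError("index_host must contain a host name")
    path = "/" + api_path.strip("/")
    # Skip parameters that were left empty, such as an unset namespace.
    params = {k: v for k, v in (query_params or {}).items() if v not in (None, "")}
    query = "?" + urlencode(params) if params else ""
    return host + path + query


# Placeholder host; use the index host URL from your own credential.
list_url = build_endpoint_url(
    "example-index.svc.example.pinecone.io",
    "/vectors/list",
    {"namespace": "", "prefix": "doc1#"},
)
print(list_url)
```

Note that urlencode percent-encodes reserved characters, so a prefix such as doc1# is sent as doc1%23 rather than being interpreted as a URL fragment.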

Response Data Path

  1. Enter a JSONPath expression in the Response Data Path field to specify which part of the API response Nexla should treat as the relevant data. For query endpoints, use $.matches[*] to extract every entry in the matches array. For list endpoints, use $.vectors[*] to extract every entry in the vectors array, each of which holds a vector ID.

The JSONPath expression determines which data Nexla extracts and processes, so it must match the structure of the response returned by your specific endpoint. Pinecone API response structures differ from one endpoint to another.
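The effect of the two expressions mentioned above can be sketched in a few lines of Python. The helper below is not a JSONPath engine: it is a hypothetical stand-in that handles only paths shaped like $.key[*], and the sample responses are made-up values arranged in the matches and vectors layouts described in this section.

```python
def extract_records(response, data_path):
    """Resolve a path shaped like $.key[*] against a parsed response (illustrative only)."""
    if not (data_path.startswith("$.") and data_path.endswith("[*]")):
        raise ValueError("only paths shaped like $.key[*] are handled in this sketch")
    node = response
    for key in data_path[2:-3].split("."):
        node = node[key]
    if not isinstance(node, list):
        raise TypeError(f"{data_path} does not point at an array")
    return list(node)


# Made-up sample responses for a query endpoint and a list endpoint.
query_response = {
    "matches": [
        {"id": "vec-1", "score": 0.92, "metadata": {"genre": "drama"}},
        {"id": "vec-2", "score": 0.87, "metadata": {"genre": "comedy"}},
    ],
    "namespace": "",
}
list_response = {
    "vectors": [{"id": "doc1#chunk1"}, {"id": "doc1#chunk2"}],
    "pagination": {"next": "example-token"},
}

match_ids = [m["id"] for m in extract_records(query_response, "$.matches[*]")]
vector_ids = [v["id"] for v in extract_records(list_response, "$.vectors[*]")]
print(match_ids, vector_ids)
```

Each element returned by the path becomes one record, which is why a path that points at the wrong level of the response yields either a single oversized record or none at all.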

Pagination (if applicable)

  1. If your endpoint supports pagination, configure the pagination settings in the Pagination section. Some Pinecone API endpoints use token-based pagination, in which the token is passed in the paginationToken parameter. Select the pagination type that matches your endpoint's pagination mechanism.

  2. Configure the pagination parameters based on your selected pagination type. For token-based pagination, specify the JSONPath expression to the next token in the response ($.pagination.next) and the parameter name (paginationToken).
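The token-based pagination configured above amounts to a simple loop: read the token at $.pagination.next, send it back as paginationToken, and stop when no token is returned. The Python sketch below imitates that loop against an in-memory stub. The names fetch_all_ids, fake_fetch_page, and max_pages are hypothetical, and the stub pages are made-up data; this is a model of the mechanism, not Nexla's actual implementation.

```python
def fetch_all_ids(fetch_page, max_pages=1000):
    """Follow pagination tokens until none is returned, collecting vector IDs."""
    ids = []
    token = None
    for _ in range(max_pages):  # safety cap in case a token repeats forever
        params = {"paginationToken": token} if token else {}
        page = fetch_page(params)
        ids.extend(v["id"] for v in page.get("vectors", []))
        # Equivalent of the JSONPath expression $.pagination.next
        token = (page.get("pagination") or {}).get("next")
        if not token:
            break
    return ids


# In-memory stand-in for the API: two pages, the second without a next token.
PAGES = {
    None: {"vectors": [{"id": "a"}, {"id": "b"}], "pagination": {"next": "t1"}},
    "t1": {"vectors": [{"id": "c"}]},
}


def fake_fetch_page(params):
    return PAGES[params.get("paginationToken")]


all_ids = fetch_all_ids(fake_fetch_page)
print(all_ids)
```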

Save the Data Source

  1. Once all of the relevant steps in the above sections have been completed, click the Next button to proceed with the rest of the data flow configuration, or click Save to save the data source configuration for later use.