Pinecone (Native)

Nexla's bi-directional connectors can both send data to and receive data from any data system. This means that once a user has created or gained access to a credential for any data system, building any data flow to ingest data from or send data to a location within that data system requires only a few simple steps.

pinecone.png
Pinecone API Connector

For instructions pertaining to the Pinecone API connector, see the Pinecone API connector guide.


1. Credentials

This section provides information about and step-by-step instructions for creating a new Pinecone (native) credential in Nexla.

Note

Some actions are performed in the Pinecone console before creating the credential in the Nexla UI.


Generate Pinecone API Key

Pinecone requires an API key to authenticate with and make calls to the Pinecone API. This key will be stored within the Pinecone credential in Nexla and used to connect to your Pinecone project. Follow the steps below to generate a new API key in the Pinecone console:

  1. Log into the Pinecone console, and select the project that will be accessed with Nexla.

  2. Navigate to the API keys screen, and click the Create API key button.

CreateKey.png
  3. Enter a name for the API key, and click the Create API key button to generate the key. Using a name that describes the purpose of the API key, such as Nexla or Nexla ProjectName, is recommended for record-keeping.
CreateKey2.png
  4. Copy the newly generated API key for use when creating the Pinecone credential in Nexla.
CreateKey3.png

Create the Pinecone Credential

After generating the Pinecone API key, log into your Nexla account, and follow the steps below to create a new Pinecone credential.

  1. In the Integrate screen, click the New Data Flow button; then, select the FlexFlow data flow type, and click Create.

  2. Select the Pinecone connector; then, in the Authenticate screen, click the Add Credential tile.

AddCred.png
  3. Enter a name for the credential in the Credential Name field, along with a brief, informative description in the Credential Description field.

  4. Paste the API key generated in the previous section into the Pinecone API Key field.

  5. Click the Save button to create the credential, and continue creating the data source (beginning with step # in the Data Source section below). The credential will also now appear in a tile on the Authenticate screen during data source/destination creation and can be used to create additional data sources and/or destinations for this Pinecone database.


2. Data Source

Data sources can easily be configured to ingest data from any Pinecone database index accessible to a credential in the Nexla account. Pinecone data sources can be configured to perform a variety of query operations, each with additional settings available to further refine the data that will be ingested.

  1. Navigate to the Integrate screen, and click the New Data Flow button. Then, select the FlexFlow data flow type, and click Create.

  2. Select the Pinecone connector tile. Then, in the Authenticate screen, select the Pinecone credential that will be used to connect to the data source.

    Pinecone Credentials

    The Authenticate screen displays all Pinecone credentials accessible to the user's account. Be sure to select the credential corresponding to the Pinecone project that will be accessed in this data flow.

    To create a new Pinecone credential, follow the steps in the Credentials section above.


Configure the Data Source

  1. Enter a name for the data source in the Name field, and provide a brief, informative description of the source in the Description field.

    Resource Descriptions

    Resource descriptions should provide information about the resource purpose, data freshness, etc. that can help the owner and other users efficiently understand and utilize the resource.

NameDesc.png
  2. Specify the index within the Pinecone database that will be queried with this data source by entering the index name in the Index field.
Index.png
  3. Enter the namespace within the Pinecone database that will be queried with this data source in the Namespace field. To create this data source without specifying a namespace, leave this field blank; queries will then be performed within the default namespace.
Namespace.png
  4. Select the type of query operation that will be performed for this data source from the Query Type pulldown menu. Then, click the link for the selected query type in the list below, and follow the instructions to complete query setup.

    • Fetch Similar Vectors: Retrieve vectors similar to a provided dense or sparse vector within the database
    • Fetch Vectors: Retrieve all vectors or a subset of vectors from the database
    • Fetch Vector IDs: Retrieve a set of vector IDs from the database
QueryType.png

Fetch Similar Vectors

SimilarVectors.png
  1. Enter the number of similar vectors that will be fetched for this data source in the Top K Similar Vectors field. Vectors will be ranked from most to least similar according to the configured data source settings, and the top K most similar vectors will be included in the resulting Nexset.

  2. Optional: Similar vector query results can be further refined by filtering according to metadata parameters. To apply a filter to the query results, enter the filter as a JSON-formatted string in the Search Filter field.

    For example, the filter { "vec_id":{ "$lte": 100 }} could be used to include only vectors with ID values less than or equal to 100 in the query results.

    Search Filters

    For more information about Pinecone metadata parameters and metadata querying language, see this Pinecone documentation.


  3. Use the Search By Criteria pulldown menu to specify the type of similarity search that will be performed for this data source.

    • Dense Vector: Perform the search according to dense vectors
    • Vector ID: Perform the search according to vector IDs

▷   When Dense Vector is selected:

DenseVector.png
  • In the Dense Vector field, enter the dense vector that will be used for the similarity search as a list of float values (e.g., 0.1,0.2,0.5,0.4).
  • In the Sparse Vector Indices field, list the indices of the non-zero values in the sparse vector that will be used for the similarity search.
  • In the Sparse Vector Values field, list the values corresponding to the indices entered above.

▷   When Vector ID is selected:

VectorID.png
  • Enter the unique identifier of the vector that will be used for the similarity search in the Vector Identifier field.
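
The field values described above can be prepared programmatically before pasting them into the Nexla UI. The following is a minimal sketch, not part of Nexla or Pinecone: the helper names are hypothetical, and the comma-separated format for the sparse vector fields is an assumption based on the dense vector example (0.1,0.2,0.5,0.4).

```python
import json


def search_filter_string(metadata_filter):
    """Serialize a metadata filter to a JSON-formatted string for the Search Filter field."""
    return json.dumps(metadata_filter)


def dense_vector_string(values):
    """Join float values with commas, matching the 0.1,0.2,0.5,0.4 example."""
    return ",".join(str(float(v)) for v in values)


def sparse_fields(sparse):
    """Split a {index: value} mapping into index and value lists in the same order.

    Zero values are dropped because only non-zero entries are listed.
    The comma-separated output format is an assumption.
    """
    items = sorted((i, v) for i, v in sparse.items() if v != 0)
    indices = ",".join(str(i) for i, _ in items)
    values = ",".join(str(float(v)) for _, v in items)
    return indices, values


filter_text = search_filter_string({"vec_id": {"$lte": 100}})
dense_text = dense_vector_string([0.1, 0.2, 0.5, 0.4])
indices_text, values_text = sparse_fields({7: 0.25, 2: 0.5, 9: 0.0})

print(filter_text)
print(dense_text)
print(indices_text, values_text)
```

Generating the filter with a JSON serializer rather than typing it by hand avoids quoting and bracket mistakes in the Search Filter field.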

Fetch Vectors

FetchVectors.png
  • Enter the vector ID prefix designating which vectors will be retrieved from the database in the Pinecone Prefix field. The prefix should be entered with no spaces.

Fetch Vector IDs

FetchVectorIDs.png
  • Enter the vector ID prefix designating which vector IDs will be retrieved from the database in the Pinecone Prefix field. The prefix should be entered with no spaces.
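
To preview which IDs a given prefix would select, the matching rule can be approximated locally. This is a rough sketch under stated assumptions: the sample IDs and the function name are hypothetical, and it assumes that prefix matching is a plain "starts with" comparison and that an empty prefix matches every ID.

```python
def ids_matching_prefix(vector_ids, prefix):
    """Return the IDs that start with the given prefix.

    Assumption: an empty prefix matches every ID.
    """
    if " " in prefix:
        raise ValueError("the prefix should be entered with no spaces")
    return [vid for vid in vector_ids if vid.startswith(prefix)]


# Hypothetical IDs that encode a document and a chunk number
sample_ids = ["doc1#chunk1", "doc1#chunk2", "doc2#chunk1"]
matched = ids_matching_prefix(sample_ids, "doc1#")
print(matched)
```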

Nexset Creation Settings

Optional settings available in the Nexset Creation panel can be used to further refine which data will be included in the Nexset created from this data source.

NexsetCreation.png
  • Nexla can include the values of the vectors retrieved from the Pinecone database in the Nexset. To include these values, select the Include Values? checkbox. To retrieve vector information without including the vector values in the produced Nexset, clear this checkbox.

  • Metadata information for all retrieved vectors can also be included in the produced Nexset. To include vector metadata, select the Include Metadata? checkbox. To exclude vector metadata from the Nexset, clear this checkbox.


Scheduling

In the Schedule Ingestion panel, scan scheduling options can be used to define the frequency at which the Pinecone database project will be queried and scanned for new data and/or changes. Any new data/changes identified during a scan will then be processed into the detected Nexset.

  • By default, when a new data source is created, Nexla is configured to scan the source for data changes once every day. To continue with this option, no further selections are required.

  • To define how often Nexla should scan the data source for data changes, select an option from the Ingestion Frequency pulldown menu under the Scheduling settings section.

    • When options such as Every N Days or Every N Hours are selected, a secondary pulldown menu will be populated. Select the appropriate value of N from this menu.
ScanSched.png
  • To specify the time at which Nexla should scan the source for new data changes, use the pulldown menu(s) to the right of the Ingestion Frequency menu. These time menus vary according to the selected scan frequency.
ScanTime.png

Save & Activate the Data Source

After configuring all required settings and any desired additional options, click Create in the top right corner of the screen to save & activate the data source.

Create2.png

Once the data source is created, Nexla will automatically scan it for data according to the configured settings. Identified data will be organized into a Nexset, which is a logical data product that is immediately ready to be sent to a destination.


Data Feed Macros

The properties shown in the table below can be used as macros (variables) in the data feed URL for a Pinecone data source.

Pinecone Source Parameters Available as Macros

Parameter                | Description                                                    | Default Value
database                 | Name of the index/database                                     | None
collection               | Name of the namespace/collection                               | Empty
query_type               | Search query type                                              | similarity_search
search_by                | Search parameters used for similarity search                   | dense_vector
vector_id                | Vector ID used for the search                                  | None
dense_vector             | Dense vector used for the search                               | None
sparse_vector_indices    | Sparse vector indices used for the search                      | Empty
sparse_vector_values     | Sparse vector values used for the search                       | Empty
topK                     | Number of vectors to be fetched                                | 20
pinecone.prefix          | Prefix of the vector IDs to be fetched                         | Empty
pinecone.filter          | Allows vector search limitation based on metadata              | Empty
pinecone.includeValues   | Determines whether vector values are included in the response  | true
pinecone.includeMetadata | Determines whether metadata is included in the response        | true
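
For reference when working with these parameters, the defaults in the table can be expressed as a simple lookup. This is only an illustrative sketch: the dictionary and function names are hypothetical, the Python types chosen for each default (for example, an empty string for "Empty") are assumptions, and it does not represent Nexla's macro syntax or configuration format.

```python
# Defaults transcribed from the table above; the value types are assumptions.
DEFAULTS = {
    "database": None,
    "collection": "",
    "query_type": "similarity_search",
    "search_by": "dense_vector",
    "vector_id": None,
    "dense_vector": None,
    "sparse_vector_indices": "",
    "sparse_vector_values": "",
    "topK": 20,
    "pinecone.prefix": "",
    "pinecone.filter": "",
    "pinecone.includeValues": True,
    "pinecone.includeMetadata": True,
}


def effective_parameters(overrides):
    """Merge user-supplied values over the defaults, rejecting unknown names."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"unknown parameter(s): {', '.join(unknown)}")
    return {**DEFAULTS, **overrides}


params = effective_parameters({"database": "my-index", "topK": 5})
print(params["topK"], params["query_type"])
```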

3. Data Destination

Pinecone destinations can be configured to send Nexset data to any Pinecone database index accessible to a credential in the Nexla account.

Sending Data to Pinecone

Data sent to Pinecone must be correctly formatted for processing by the Pinecone database, as detailed in the sections below.


Prerequisite: Data Formatting

Before sending Nexset data to a Pinecone destination, ensure that the data is in the correct vector format.

Pinecone expects the following parameters for each record:

  • id – Unique record identifier (string or number) for the index namespace
  • dense_vector – Data content as dense vector values
  • metadata – Optional field containing key–value pairs that provide additional information or context about the record
  • sparse_indices – Optional field containing sparse vector indices, used to facilitate hybrid semantic–keyword searching
  • sparse_values – Optional field containing sparse vector values, used to facilitate hybrid semantic–keyword searching
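
The record shape listed above can be checked before data is sent to the destination. The following is a minimal sketch of such a check, written for illustration only: the function names are hypothetical, and the rule that sparse_indices and sparse_values must appear together with equal lengths is an assumption rather than something stated in this guide.

```python
def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_record(record):
    """Return a list of problems; an empty list means the record has the expected shape."""
    problems = []

    rec_id = record.get("id")
    if not (isinstance(rec_id, str) and rec_id) and not _is_number(rec_id):
        problems.append("id must be a non-empty string or a number")

    dense = record.get("dense_vector")
    if not isinstance(dense, list) or not dense or not all(_is_number(v) for v in dense):
        problems.append("dense_vector must be a non-empty list of numbers")

    metadata = record.get("metadata")
    if metadata is not None and not isinstance(metadata, dict):
        problems.append("metadata must be a set of key-value pairs")

    # Assumption: the two sparse fields are only meaningful as a pair
    indices = record.get("sparse_indices")
    values = record.get("sparse_values")
    if (indices is None) != (values is None):
        problems.append("sparse_indices and sparse_values must be provided together")
    elif indices is not None and len(indices) != len(values):
        problems.append("sparse_indices and sparse_values must have the same length")

    return problems


good = {"id": "rec-1", "dense_vector": [0.1, 0.2], "metadata": {"subject": "history"}}
bad = {"id": "", "dense_vector": [], "sparse_indices": [1, 4]}
print(validate_record(good))
print(validate_record(bad))
```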

Example Nexset Schema Formatted for Pinecone

Schema.png

Nexset transformations can be used to easily convert any ingested data into the required Pinecone vector format. To learn more about vectorizing ingested data, see the Sending Text Data to Vector Databases tutorial.


Create & Configure the Destination

After ensuring that the Nexset data is correctly formatted for the Pinecone database, follow the steps below to create a new Pinecone destination in Nexla.

  1. Navigate to the Integrate screen, and locate the Nexset that will be sent to the Pinecone database. Click the + icon to open the Nexset menu, and select Send to Destination.
SendNexset.png
  2. Select the Pinecone connector tile; then, in the Authenticate screen, select the credential that will be used to authenticate to the Pinecone database, and click Next.
SelectCred2.png
  3. Enter a name for the destination in the Name field, and provide a brief, informative description in the Description field.
NameDesc2.png

Data Location

Settings in the Data Location panel are used to specify the location within the Pinecone database where the Nexset data will be stored.

DataLocation.png
  1. Enter the name of the Pinecone index where the Nexset vectors will be stored in the Index field.

  2. Enter the namespace in which the Nexset vectors will be stored within the selected Pinecone database in the Namespace field. To store the Nexset vectors in the default namespace, leave this field blank.


Data Format

In most cases, using Nexset transformations to convert data into the required vector format prior to sending it to a Pinecone destination is the recommended and simplest approach; therefore, most users can skip this section.

However, for cases in which transforms are not easily applied and/or would be unnecessarily complex—such as when records contain different metadata fields that may not be known in advance—Nexla provides the option to manually configure the vector mapping structure for the destination in the Data Format panel.

DataFormat2.png
  • To define how the Nexset attributes should be mapped to the Pinecone vector fields, enter the appropriate mapping structure in the Vector Mapping field. The vector map must be entered as a valid JSON object, including curly brackets.

    Vector Mapping

    For example, in a Nexset containing records with the structure shown below, the vector ID values are contained in the identifier attribute, and the dense vector values are located in the values attribute nested within the data attribute.

    {
      "identifier": "...",
      "data": {
        "values": [...]
      }
    }

    To correctly map the Nexset data to the Pinecone vector fields, the following vector mapping structure should be entered in the Vector Mapping field:

    {
      "id_field": "identifier",
      "dense_values_field": "$.data.values"
    }
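
To make the effect of this mapping concrete, the sketch below shows one way the two fields could be resolved against a record. It is an illustration under stated assumptions, not Nexla's implementation: the function names are hypothetical, the sample record values are invented, and the path handling only covers plain attribute names and simple "$.a.b" paths.

```python
def resolve(record, path):
    """Look up a plain attribute name or a simple '$.a.b' style path in a record."""
    keys = path[2:].split(".") if path.startswith("$.") else [path]
    value = record
    for key in keys:
        value = value[key]
    return value


def apply_vector_mapping(record, mapping):
    """Extract the vector ID and dense values named by the mapping."""
    return {
        "id": resolve(record, mapping["id_field"]),
        "dense_vector": resolve(record, mapping["dense_values_field"]),
    }


# Hypothetical record following the structure shown above
record = {"identifier": "vec-1", "data": {"values": [0.1, 0.2, 0.3]}}
mapping = {"id_field": "identifier", "dense_values_field": "$.data.values"}

mapped = apply_vector_mapping(record, mapping)
print(mapped)
```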

Metadata Mapping

  • By default, all Nexset data outside the defined dense_values_field and id_field attributes will be passed to the Pinecone database as metadata. However, a metadata_mapping array can be included in the vector mapping structure to identify which Nexset data should be passed as metadata.

    Metadata Mapping

    For example, in a Nexset containing records with the structure shown below, data in the text and subject attributes should be passed as metadata, along with the tags ready and simple. Other tags and data inside status should not be included.

    {
      "id": "1",
      "vector": [...],
      "text": "some text",
      "subject": "history",
      "tags": {
        "ready": true,
        "simple": false,
        "other": false,
        ...
      },
      "status": {
        "error": "0",
        "message": "OK"
      }
    }

    To correctly pass the desired metadata along with the vector data to the Pinecone database, the following vector mapping structure should be entered in the Vector Mapping field:

    {
      "dense_values_field": "vector",
      "metadata_mapping": [
        { "type": "DELETE", "field": "status" },
        { "type": "DELETE", "field": "tags" },
        { "type": "SELECT", "field": "ready", "from": "$.tags.ready" },
        { "type": "SELECT", "field": "simple", "from": "$.tags.simple" }
      ]
    }
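
The sketch below walks through how this mapping could produce the desired metadata for the example record. It is an interpretation for illustration, not Nexla's implementation: the function names are hypothetical, the sample vector values are invented, the ID attribute is assumed to default to "id" when id_field is omitted, and only top-level id and dense value attributes are handled.

```python
import copy


def resolve(record, path):
    """Look up a plain attribute name or a simple '$.a.b' style path in a record."""
    keys = path[2:].split(".") if path.startswith("$.") else [path]
    value = record
    for key in keys:
        value = value[key]
    return value


def build_metadata(record, mapping):
    """Apply DELETE and SELECT rules to everything outside the id and dense value fields."""
    id_field = mapping.get("id_field", "id")  # assumption: "id" is the default
    dense_field = mapping.get("dense_values_field")
    metadata = {
        key: copy.deepcopy(value)
        for key, value in record.items()
        if key not in (id_field, dense_field)
    }
    for rule in mapping.get("metadata_mapping", []):
        if rule["type"] == "DELETE":
            metadata.pop(rule["field"], None)
        elif rule["type"] == "SELECT":
            metadata[rule["field"]] = resolve(record, rule["from"])
        else:
            raise ValueError(f"unsupported rule type: {rule['type']}")
    return metadata


record = {
    "id": "1",
    "vector": [0.1, 0.2],
    "text": "some text",
    "subject": "history",
    "tags": {"ready": True, "simple": False, "other": False},
    "status": {"error": "0", "message": "OK"},
}
mapping = {
    "dense_values_field": "vector",
    "metadata_mapping": [
        {"type": "DELETE", "field": "status"},
        {"type": "DELETE", "field": "tags"},
        {"type": "SELECT", "field": "ready", "from": "$.tags.ready"},
        {"type": "SELECT", "field": "simple", "from": "$.tags.simple"},
    ],
}

metadata = build_metadata(record, mapping)
print(metadata)
```

In this reading, the SELECT rules read from the original record, so deleting tags first does not prevent the ready and simple values from being selected afterward.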

Performance Customization

In the Performance Customization panel, Nexla can be configured to send multiple upserts to the Pinecone database in parallel. Performing parallel upserts can increase throughput.

Performance.png
  • To enable parallel upserts for this Pinecone destination, enter the number of parallel upserts that should be performed in the Upsert Parallelism field.
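
As a conceptual illustration of why parallel upserts can increase throughput, the sketch below splits records into batches and submits them concurrently. It does not call Pinecone or reflect how Nexla performs upserts internally: fake_upsert is a hypothetical stand-in for a real upsert request, and the batch size and function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(records, size):
    """Yield consecutive batches of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def fake_upsert(batch):
    """Hypothetical stand-in for an upsert request; returns the number of records sent."""
    return len(batch)


def parallel_upsert(records, batch_size, parallelism, upsert=fake_upsert):
    """Send batches using up to `parallelism` concurrent workers and return the total count."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return sum(pool.map(upsert, chunked(records, batch_size)))


records = [{"id": str(i), "dense_vector": [0.0, 1.0]} for i in range(10)]
total = parallel_upsert(records, batch_size=3, parallelism=4)
print(total)
```

Here the parallelism argument plays the role of the Upsert Parallelism value: it caps how many batches are in flight at the same time.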

Save & Activate the Destination

  1. After configuring all necessary settings for the Pinecone destination, click Done in the upper right corner of the screen to save and create the destination.

    Important: Data Movement

    Data will not begin to flow into the destination until it is activated, as shown in the following step.


  2. Once created, the destination must be activated to begin the flow of data into the destination. To activate the destination, click the + icon to open the destination menu, and select Activate.

ActivateDest.png