Vespa Data Source

The Vespa connector enables you to access Vespa's Document API to store, retrieve, update, and search documents with vector embeddings. This connector is particularly useful for applications that need to perform vector similarity search, build recommendation systems, implement real-time search capabilities, or manage large-scale document stores with advanced querying features. Follow the instructions below to create a new data flow that ingests data from a Vespa source in Nexla.


Create a New Data Flow

To create a new data flow, navigate to the Integrate section, and click the New Data Flow button. Then, select the desired flow type from the list, and click the Create button.
Select the Vespa connector tile from the list of available connectors. Then, select the credential that will be used to connect to the Vespa API, and click Next; or, create a new Vespa credential for use in this flow.
In Nexla, Vespa data sources can be created using pre-built endpoint templates, which expedite source setup for common Vespa API endpoints. Each template is designed specifically for the corresponding Vespa API endpoint, making source configuration easy and efficient.
• To configure this source using a template, follow the instructions in Configure Using a Template.

Vespa sources can also be configured manually, allowing you to ingest data from Vespa API endpoints not included in the pre-built templates or apply further customizations to exactly suit your needs.
• To configure this source manually, follow the instructions in Configure Manually.

Configure Using a Template

Nexla provides pre-built templates that can be used to rapidly configure data sources to ingest data from common Vespa API endpoints. Each template is designed specifically for the corresponding Vespa API endpoint, making data source setup easy and efficient.

Endpoint Settings

From the Endpoint pulldown menu, select the endpoint from which this source will fetch data. The available endpoint templates are described below, along with the settings required to configure each one.
Retrieve Document by ID
This endpoint retrieves documents by ID using Vespa's document API. Use this endpoint when you need to fetch specific documents, retrieve document details, or access documents by their unique identifiers.
Enter the namespace in the Namespace field. The default value is default. This identifies the document namespace in Vespa.
Enter the document type in the Document Type field. The default value is msmarco. This is the document type defined by your Vespa schema.
Enter the document ID in the Document ID field. This is the unique identifier for the document you want to retrieve.
The Retrieve Document by ID endpoint uses GET requests to retrieve documents from the Vespa Document API. The endpoint returns the complete document data including all fields and metadata. For more information about the Retrieve Document by ID endpoint, refer to the Vespa Document API Reference.
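As a rough illustration of what this template requests, the sketch below builds a Document API URL from the three settings above and reads the fields from a response. The host vespa.example.com, the document ID, and the field names in the sample response are placeholders, and the helper document_url is hypothetical; it is not part of Nexla or Vespa.

```python
import json
from urllib.parse import quote


def document_url(base_url, namespace, document_type, document_id, api_version="v1"):
    """Build a Document API URL of the form
    {base_url}/document/{api_version}/{namespace}/{document_type}/docid/{document_id}.
    """
    parts = [namespace, document_type, "docid", document_id]
    # Percent-encode each path segment so IDs with spaces or slashes stay valid.
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{base_url.rstrip('/')}/document/{api_version}/{path}"


# Placeholder host and document ID; "default" and "msmarco" are the template defaults.
url = document_url("https://vespa.example.com", "default", "msmarco", "doc 42")
print(url)

# Illustrative response body for a single-document GET (field values are made up).
sample_response = json.loads("""
{
  "pathId": "/document/v1/default/msmarco/docid/doc%2042",
  "id": "id:default:msmarco::doc 42",
  "fields": {"title": "Example title", "body": "Example body text"}
}
""")

# The document's own data sits under "fields"; "id" and "pathId" are metadata.
fields = sample_response["fields"]
print(fields["title"])
```

Percent-encoding the document ID is a defensive choice in this sketch; whether Nexla encodes the value entered in the Document ID field is not stated here.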
Vector Search with Custom Body
This endpoint enables vector-based semantic searches using custom JSON queries via Vespa's search API. Use this endpoint when you need to perform vector similarity search, implement semantic search, or query documents using vector embeddings.
Enter the custom query body in JSON format in the Body field. This should include your search query with vector search parameters, filters, and ranking expressions.
The Vector Search with Custom Body endpoint uses POST requests to execute custom search queries against the Vespa search API. The endpoint supports vector similarity search, filtering, and advanced ranking. For more information about vector search, refer to the Vespa Ranking Documentation.
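The following sketch shows one possible shape for the JSON entered in the Body field, generated here with Python so the structure is easy to inspect. It assumes a schema with a tensor field named embedding, a query tensor named q, and a rank profile named semantic; all three names are hypothetical and must match your own Vespa application.

```python
import json


def vector_search_body(query_vector, target_hits=10, hits=10):
    """Return a search body that runs a nearest-neighbor query."""
    # "embedding" (document field) and "q" (query tensor) are assumed names.
    yql = (
        "select * from sources * where "
        f"{{targetHits: {target_hits}}}nearestNeighbor(embedding, q)"
    )
    return {
        "yql": yql,
        "hits": hits,
        "ranking": "semantic",  # assumed rank profile name
        "input.query(q)": query_vector,
    }


body = vector_search_body([0.1, 0.2, 0.3], target_hits=5, hits=5)
body_json = json.dumps(body, indent=2)
print(body_json)  # paste the printed JSON into the Body field
```

The vector length must match the tensor dimension declared in the schema; the three-element vector here is only for readability.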

Endpoint Testing

Once the selected endpoint template has been configured, Nexla can retrieve a sample of the data that will be fetched according to the current settings. This allows users to verify that the source is configured correctly before saving.

To test the current endpoint configuration, click the Test button to the right of the endpoint selection menu. Sample data will be fetched and displayed in the Endpoint Test Result panel on the right.
If the sample data is not as expected, review the selected endpoint and associated settings, and make any necessary adjustments. Then, click the Test button again, and check the sample data to ensure that the correct information is displayed.

Configure Manually

Vespa data sources can be manually configured to ingest data from any valid Vespa API endpoint. Manual configuration provides maximum flexibility for accessing endpoints not covered by pre-built templates or when you need custom API configurations.

With manual configuration, you can also create more complex Vespa sources, such as sources that use custom search queries, sources that access multiple document types, or sources that require custom authentication headers or request parameters.

API Method

To manually configure this source, select the Advanced tab at the top of the configuration screen.
Select the API method that will be used for calls to the Vespa API from the Method pulldown menu. The most common methods are:
- GET: For retrieving documents from the Document API
- POST: For executing search queries via the Search API

API Endpoint URL

In the Set API URL field, enter the URL of the Vespa API endpoint from which this source will fetch data. This should be the complete URL, including the protocol (https://) and any required path parameters. Vespa API endpoints typically follow these patterns:
- Document API: {base_url}/document/{api_version}/{namespace}/{document_type}/docid/{document_id}
- Search API: {base_url}/search/

Ensure that the API endpoint URL is correct and accessible with your current credentials. You can test the endpoint using the Test button after configuring the URL. The endpoint URL should use the base URL and API version configured in your credential. This connector authenticates to the Vespa API with a Bearer token, which is included automatically from your credential.
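To check an endpoint URL outside Nexla, an equivalent request can be assembled by hand. The sketch below only builds the request object and does not send it; the host and token are placeholders, and build_request is a hypothetical helper rather than anything provided by Nexla.

```python
from urllib.request import Request


def build_request(url, token, method="GET", body=None):
    """Assemble a request carrying the Bearer token described above."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    if body is not None:
        # Search requests send a JSON body, so declare its content type.
        headers["Content-Type"] = "application/json"
    return Request(url, data=body, headers=headers, method=method)


req = build_request(
    "https://vespa.example.com/search/",  # placeholder base URL
    "PLACEHOLDER_TOKEN",                  # never hard-code a real token
    method="POST",
    body=b'{"yql": "select * from sources * where true"}',
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the object to urllib.request.urlopen would send it; that step is left out so the example runs without network access.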

Path to Data

Optional

If only a subset of the data returned by the API endpoint is needed, you can designate the part(s) of the response that should be included in the Nexset(s) produced from this source by specifying the path to the relevant data within the response. This is particularly useful when API responses contain metadata, pagination information, or other data that you don't need for your analysis.

Path to Data is essential when API responses have nested structures. Without specifying the correct path, Nexla might not be able to properly parse and organize your data into usable records. For Vespa API responses, common paths include $ for the entire document, $.root.children[*] for search results, or $.fields for document fields.

To specify which data should be treated as relevant in responses from this source, enter the path to the relevant data in the Set Path to Data in Response field.
- For responses in JSON format, enter the JSON path that points to the object or array that should be treated as relevant data. JSON paths use dot notation (e.g., $.root.children to access search results).
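The effect of a path such as $.root.children[*] can be pictured with a small stand-in. The sketch below walks a made-up search response with plain dictionary lookups; it imitates the result of the path and is not Nexla's JSONPath engine. The document IDs, relevance scores, and field values are invented.

```python
# Illustrative search response: hits sit under root.children, while
# root.fields.totalCount is metadata that is usually not wanted as a record.
sample = {
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {"totalCount": 2},
        "children": [
            {"id": "id:default:msmarco::1", "relevance": 0.87,
             "fields": {"title": "First"}},
            {"id": "id:default:msmarco::2", "relevance": 0.52,
             "fields": {"title": "Second"}},
        ],
    }
}


def select_path(response, keys):
    """Follow a list of keys into a nested dict, like a simple dotted path."""
    node = response
    for key in keys:
        node = node[key]
    return node


# Equivalent in spirit to $.root.children[*]: each hit becomes one record.
records = select_path(sample, ["root", "children"])
titles = [record["fields"]["title"] for record in records]
print(titles)
```

With the path set to $ instead, the whole response, including the totalCount metadata, would be treated as a single record.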

Request Headers

Optional

If Nexla should include any additional request headers in API calls to this source, enter the headers and their corresponding values as comma-separated pairs in the Request Headers field (e.g., header1:value1,header2:value2). Additional headers are often required for API versioning, content type specifications, or custom authentication requirements.

You do not need to include any headers already present in the credentials. Common headers like Authorization, Content-Type, and Accept are typically handled automatically by Nexla based on your credential configuration. For Vespa, the Authorization header with Bearer token is automatically included from your credential.
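To make the header1:value1,header2:value2 format concrete, the sketch below splits such a string into name and value pairs. It illustrates the format only; how Nexla itself parses the field is not documented here, and parse_header_pairs is a hypothetical helper.

```python
def parse_header_pairs(raw):
    """Split 'name1:value1,name2:value2' into a dict of headers."""
    headers = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma or empty segment
        # Split on the first colon only, so values may contain colons.
        name, sep, value = pair.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"expected 'name:value', got {pair!r}")
        headers[name.strip()] = value.strip()
    return headers


headers = parse_header_pairs("header1:value1, header2:value2")
print(headers)
```

A value that itself contains a comma would be split incorrectly by this simple approach, so such values are best avoided in a comma-separated field.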

Request Body

Optional

If the API endpoint requires a request body (which is common for POST requests to the Vespa Search API), enter the request body in the Request Body field. The request body should be formatted as JSON and include your search query with vector search parameters, filters, and ranking expressions.

For Vespa search queries, the request body typically includes a yql field for the query string, hits for the number of results to return, ranking for the rank profile to apply, and queryProfile for the query profile to use. Refer to the Vespa documentation for the complete list of supported query parameters.
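A minimal sketch of assembling such a body follows. It includes only the parameters that are set, so optional ones are left out of the JSON. The title field in the query and the rank profile name bm25_profile are hypothetical and depend on your schema; build_search_body is not part of Nexla or Vespa.

```python
import json


def build_search_body(yql, hits=10, rank_profile=None, query_profile=None):
    """Return a search request body containing only the parameters provided."""
    body = {"yql": yql, "hits": hits}
    if rank_profile is not None:
        body["ranking"] = rank_profile
    if query_profile is not None:
        body["queryProfile"] = query_profile
    return body


body = build_search_body(
    'select * from msmarco where title contains "vespa"',  # assumed field "title"
    hits=20,
    rank_profile="bm25_profile",  # assumed rank profile name
)
print(json.dumps(body, indent=2))  # paste the printed JSON into Request Body
```

Generating the JSON with json.dumps rather than typing it by hand avoids quoting mistakes, which matter here because YQL strings contain double quotes of their own.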

Endpoint Testing

After configuring all settings for the selected endpoint, Nexla can retrieve a sample of the data that will be fetched according to the current configuration. This allows users to verify that the source is configured correctly before saving.

To test the current endpoint configuration, click the Test button to the right of the endpoint selection menu. Sample data will be fetched and displayed in the Endpoint Test Result panel on the right.
If the sample data is not as expected, review the selected endpoint and associated settings, and make any necessary adjustments. Then, click the Test button again, and check the sample data to ensure that the correct information is displayed.

Save & Activate the Source

Once all of the relevant steps in the above sections have been completed, click the Create button in the upper right corner of the screen to save and create the new Vespa data source. Nexla will now begin ingesting data from the configured endpoint and will organize any data that it finds into one or more Nexsets.
