
Jina DeepSearch Data Source

The Jina DeepSearch connector enables you to interact with Jina's DeepSearch API for intelligent web search, iterative reasoning, and comprehensive information retrieval. This connector is particularly useful for applications that need to perform deep web searches, gather information from multiple sources, build research assistants, or create AI-powered search and discovery systems. Follow the instructions below to create a new data flow that ingests data from a Jina DeepSearch source in Nexla.

Create a New Data Flow

  1. To create a new data flow, navigate to the Integrate section, and click the New Data Flow button. Then, select the desired flow type from the list, and click the Create button.

  2. Select the Jina DeepSearch connector tile from the list of available connectors. Then, select the credential that will be used to connect to the Jina DeepSearch API, and click Next; or, create a new Jina DeepSearch credential for use in this flow.

  3. In Nexla, Jina DeepSearch data sources can be created using pre-built endpoint templates, which expedite source setup for common Jina DeepSearch API endpoints. Each template is designed specifically for the corresponding Jina DeepSearch API endpoint, making source configuration easy and efficient.
    • To configure this source using a template, follow the instructions in Configure Using a Template.

    Jina DeepSearch sources can also be configured manually, allowing you to ingest data from Jina DeepSearch API endpoints not included in the pre-built templates or apply further customizations to exactly suit your needs.
    • To configure this source manually, follow the instructions in Configure Manually.

Configure Using a Template

Nexla provides pre-built templates that can be used to rapidly configure data sources to ingest data from common Jina DeepSearch API endpoints. Each template is designed specifically for the corresponding Jina DeepSearch API endpoint, making data source setup easy and efficient.

Endpoint Settings

  • Select the endpoint from which this source will fetch data from the Endpoint pulldown menu. Available endpoint templates are listed in the expandable boxes below. Click on an endpoint to see more information about it and how to configure your data source for this endpoint.

    Chat Completions

    This endpoint generates chat-based completions using Jina's DeepSearch API, which performs iterative web searches to find and synthesize information. Use this endpoint when you need to perform deep web searches, gather information from multiple sources, build research assistants, or create AI-powered search and discovery systems.

    • Enter the model ID to use in the Model field. The default value is jina-deepsearch-v1, which is the current DeepSearch model. You can specify a different model if your organization uses a specific model version.
    • Enter an array of message objects for the chat in the Messages field. Messages should be formatted as a JSON array, for example: [{"role": "user", "content": "Hello!"}]. Each message object should have a role field (typically "user" or "assistant") and a content field containing the message text.
    • Optionally, specify whether to stream the response in the Stream field. The default value is false. Set to true if you want to receive streaming responses.
    • Optionally, specify the level of reasoning effort in the Reasoning Effort field. Valid values are low, medium, or high. The default value is medium. Higher reasoning effort may provide more thorough search results but may take longer to process.
    • Optionally, enter the maximum number of tokens allowed for the DeepSearch process in the Budget Tokens field. The default value is 100000. This controls the computational budget for the search and reasoning process.
    • Optionally, enter the maximum number of retries for solving a problem in the Max Attempts field. The default value is 3. This controls how many times the API will attempt to find an answer before giving up.
    • Optionally, specify whether to force the model to take further steps even for trivial queries in the No Direct Answer field. The default value is false. Set to true to force deeper search and reasoning.
    • Optionally, enter the maximum number of URLs to include in the final answer in the Max Returned URLs field. The default value is 5. This controls how many source URLs are included in the response.
    • Optionally, specify whether to enable structured outputs matching a supplied JSON schema in the Structured Output field. The default value is false.
    • Optionally, enter a list of domains given higher priority for content retrieval in the Good Domains field. This should be formatted as a JSON array, for example: ["example.com", "researchgate.net"].
    • Optionally, enter a list of domains to be excluded from content retrieval in the Bad Domains field. This should be formatted as a JSON array.
    • Optionally, enter a list of domains to be exclusively included in content retrieval in the Only Domains field. This should be formatted as a JSON array.
    • Optionally, specify the language of the answer in the Answer Language field. Use ISO 639-1 language codes (e.g., en for English, fr for French). The default value is en.

    The Chat Completions endpoint uses POST requests to send chat messages to the Jina DeepSearch API, which then performs iterative web searches and reasoning to generate comprehensive answers. The endpoint returns chat completions with synthesized information from multiple web sources. For more information about the Chat Completions endpoint, refer to the Jina DeepSearch Documentation.
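The field rules above can be sketched in Python as a quick pre-check before entering values in the template. This is a minimal illustration, not part of Nexla or the Jina API: the helper names (check_messages, build_settings) are hypothetical, the keys reasoning_effort, budget_tokens, and max_attempts follow the parameter names used in the Request Body section, and the stream key is an assumption based on the Stream field label.

```python
import json

# Defaults as listed for the Chat Completions template fields.
# The "stream" key name is an assumption based on the field label.
DEFAULTS = {
    "model": "jina-deepsearch-v1",
    "stream": False,
    "reasoning_effort": "medium",
    "budget_tokens": 100000,
    "max_attempts": 3,
}
VALID_EFFORTS = {"low", "medium", "high"}


def check_messages(messages_json: str) -> list:
    """Parse the Messages field value and check each message object."""
    messages = json.loads(messages_json)
    if not isinstance(messages, list) or not messages:
        raise ValueError("Messages must be a non-empty JSON array")
    for msg in messages:
        if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
            raise ValueError("Each message needs a role and a content field")
    return messages


def build_settings(messages_json: str, **overrides) -> dict:
    """Merge user-supplied values over the documented defaults (hypothetical helper)."""
    settings = {**DEFAULTS, **overrides}
    if settings["reasoning_effort"] not in VALID_EFFORTS:
        raise ValueError("reasoning_effort must be low, medium, or high")
    settings["messages"] = check_messages(messages_json)
    return settings


settings = build_settings(
    '[{"role": "user", "content": "Hello!"}]',
    reasoning_effort="high",
)
print(json.dumps(settings, indent=2))
```

Running a check like this before saving the source can catch a malformed Messages array or an invalid Reasoning Effort value early, without spending DeepSearch tokens on a failed test call.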

Endpoint Testing

Once the selected endpoint template has been configured, Nexla can retrieve a sample of the data that will be fetched according to the current settings. This allows users to verify that the source is configured correctly before saving.

  • To test the current endpoint configuration, click the Test button to the right of the endpoint selection menu. Sample data will be fetched & displayed in the Endpoint Test Result panel on the right.

  • If the sample data is not as expected, review the selected endpoint and associated settings, and make any necessary adjustments. Then, click the Test button again, and check the sample data to ensure that the correct information is displayed.

Configure Manually

Jina DeepSearch data sources can be manually configured to ingest data from any valid Jina DeepSearch API endpoint. Manual configuration provides maximum flexibility for accessing endpoints not covered by pre-built templates or when you need custom API configurations.

With manual configuration, you can also create more complex Jina DeepSearch sources, such as sources that use chained API calls to fetch data from multiple endpoints or sources that require custom authentication headers or request parameters.

API Method

  1. To manually configure this source, select the Advanced tab at the top of the configuration screen.

  2. Select the API method that will be used for calls to the Jina DeepSearch API from the Method pulldown menu. Most Jina DeepSearch endpoints use POST:

    • POST: For sending chat completion requests to the API

API Endpoint URL

  1. Enter the URL of the Jina DeepSearch API endpoint from which this source will fetch data in the Set API URL field. This should be the complete URL, including the protocol (https://) and any required path parameters. Jina DeepSearch API endpoints typically follow the pattern {base_url}/{api_version}/chat/completions, where {base_url} is usually https://deepsearch.jina.ai and {api_version} is usually v1.

Ensure the API endpoint URL is correct and accessible with your current credentials. You can test the endpoint using the Test button after configuring the URL. The endpoint URL should match the base URL and API version configured in your credential.
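
As an illustration of the URL pattern described above, the following Python sketch assembles the endpoint URL from its parts. The function name build_endpoint_url is hypothetical; the default values are the typical base URL and API version named in this section, and your credential may use different ones.

```python
def build_endpoint_url(
    base_url: str = "https://deepsearch.jina.ai",
    api_version: str = "v1",
) -> str:
    """Join {base_url}/{api_version}/chat/completions without doubled slashes."""
    return f"{base_url.rstrip('/')}/{api_version.strip('/')}/chat/completions"


url = build_endpoint_url()
print(url)
```

The resulting string is what would be entered in the Set API URL field, assuming the typical base URL and version apply to your account.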

Path to Data

Optional

If only a subset of the data returned by the API endpoint is needed, you can designate the part(s) of the response that should be included in the Nexset(s) produced from this source by specifying the path to the relevant data within the response. This is particularly useful when API responses contain metadata, pagination information, or other data that you don't need for your analysis.

For example, when a request fetches chat completions, the API typically returns the choices data along with metadata. By entering the path to the relevant data, you can configure Nexla to extract the specific content you need.

Path to Data is essential when API responses have nested structures. Without specifying the correct path, Nexla might not be able to properly parse and organize your data into usable records. For Jina DeepSearch API responses, common paths include $.choices[*].message.content for chat completion content.

  • To specify which data should be treated as relevant in responses from this source, enter the path to the relevant data in the Set Path to Data in Response field.

    • For responses in JSON format, enter the JSON path that points to the object or array that should be treated as relevant data. JSON paths use dot notation (e.g., $.choices to access the choices array).
    Path to Data Example:

    If the API response is in JSON format and includes a choices array that contains message content, the path to the data would be entered as $.choices[*].message.content.
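
To make the effect of this path concrete, the sketch below applies the same selection in plain Python. The sample response is a hypothetical, trimmed stand-in shaped like a chat completion (real responses carry more fields), and extract_contents is an illustrative helper, not how Nexla evaluates paths internally.

```python
# Hypothetical, trimmed response shaped like a chat completion.
sample_response = {
    "id": "example-id",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Example answer text."},
        }
    ],
    "usage": {"total_tokens": 1234},
}


def extract_contents(response: dict) -> list:
    """Plain-Python equivalent of the path $.choices[*].message.content."""
    return [choice["message"]["content"] for choice in response.get("choices", [])]


records = extract_contents(sample_response)
print(records)
```

Only the message content is kept; the id and usage metadata fall outside the path and are left out of the resulting records.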

Autogenerate Path Suggestions

Nexla can also autogenerate data path suggestions based on the response from the API endpoint. These suggested paths can be used as-is or modified to exactly suit your needs.

  • To use this feature, click the Test button next to the Set API URL field to fetch a sample response from the API endpoint. Suggested data paths generated based on the content & format of the response will be displayed in the Suggestions box below the Set Path to Data in Response field.

  • Click on a suggestion to automatically populate the Set Path to Data in Response field with the corresponding path. The populated path can be modified directly within the field if further customization is needed.

Request Headers

Optional
  • If Nexla should include any additional request headers in API calls to this source, enter the headers & corresponding values as comma-separated pairs in the Request Headers field (e.g., header1:value1,header2:value2). Additional headers are often required for API versioning, content type specifications, or custom authentication requirements.

    You do not need to include any headers already present in the credentials. Common headers like Authorization, Content-Type, and Accept are typically handled automatically by Nexla based on your credential configuration. For Jina DeepSearch, you can add Content-Type:application/json to state the request format explicitly if your credential does not already set it.
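
The comma-separated pair format can be pictured with the short Python sketch below, which splits a Request Headers value into name and value pairs. The parse_headers helper is hypothetical and only illustrates the format; it is an assumption that pairs are split on commas and on the first colon, and it does not handle header values that themselves contain commas.

```python
def parse_headers(raw: str) -> dict:
    """Split 'name1:value1,name2:value2' into a dict of header names and values."""
    headers = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue  # ignore empty segments such as a trailing comma
        name, sep, value = pair.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"Expected name:value, got {pair!r}")
        headers[name.strip()] = value.strip()
    return headers


headers = parse_headers("Content-Type:application/json,header2:value2")
print(headers)
```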

Request Body

Optional
  • If the API endpoint requires a request body (which is common for POST requests to Jina DeepSearch), enter the request body in the Request Body field. The request body should be formatted as JSON and include the necessary parameters for the chat completion request, such as the model, messages, reasoning_effort, budget_tokens, and other optional parameters.

    For Jina DeepSearch chat completion requests, the request body typically includes a model field (e.g., "jina-deepsearch-v1"), a messages field containing an array of message objects, and optionally fields like reasoning_effort, budget_tokens, max_attempts, and domain filtering options.
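
A request body along these lines can be produced with the Python sketch below. It uses only the parameter names mentioned in this section, with the default values listed for the Chat Completions template; the message text is a placeholder, and the exact set of fields your request needs is an assumption to verify against the Jina DeepSearch Documentation.

```python
import json

# Parameter names come from this section; values are the template defaults.
request_body = {
    "model": "jina-deepsearch-v1",
    "messages": [{"role": "user", "content": "Hello!"}],
    "reasoning_effort": "medium",
    "budget_tokens": 100000,
    "max_attempts": 3,
}

# Serialize to the JSON text that would be pasted into the Request Body field.
body_text = json.dumps(request_body, indent=2)
print(body_text)
```

Serializing with json.dumps ensures double-quoted keys and lowercase booleans, which hand-written JSON often gets wrong.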

Endpoint Testing

After configuring all settings for the selected endpoint, Nexla can retrieve a sample of the data that will be fetched according to the current configuration. This allows users to verify that the source is configured correctly before saving.

  • To test the current endpoint configuration, click the Test button to the right of the endpoint selection menu. Sample data will be fetched & displayed in the Endpoint Test Result panel on the right.

  • If the sample data is not as expected, review the selected endpoint and associated settings, and make any necessary adjustments. Then, click the Test button again, and check the sample data to ensure that the correct information is displayed.

Save & Activate the Source

  1. Once all of the relevant steps in the above sections have been completed, click the Create button in the upper right corner of the screen to save and create the new Jina DeepSearch data source. Nexla will now begin ingesting data from the configured endpoint and will organize any data that it finds into one or more Nexsets.