Agentic RAG Query
The Agentic RAG endpoint uses an AI agent to search across one or more nexsets, reason over the retrieved data, and generate a natural language answer with inline citations. The agent dynamically decides which nexsets to query, what search terms to use, and how to combine results.
The endpoint supports both synchronous JSON responses and real-time Server-Sent Events (SSE) streaming. Multi-turn conversations are supported via session IDs.
Endpoint: POST /v2/agentic-rag
Pipeline Flow
The Agentic RAG pipeline processes each request through these steps:
- Verify auth — Validate the Authorization header
- Load admin token — Decrypt the service key for downstream Nexla API calls
- Parallel request preparation — Resolve LLM credentials, embedding credentials (if provided), and dataset metadata
- Dataset queryable check — Determine if each nexset supports SQL, Pinecone, DataFeed, or static fallback
- Parallel context enrichment — Resolve filters from user_context and registered schemas, load conversation history, load Pinecone credentials
- Build per-nexset filters — Combine ACL filters, access rules/scopes, and pre-retrieval filters
- Tool routing — Route each nexset to the appropriate tool (SQL, Pinecone, DataFeed, or static)
- Build agent — Construct the agent with the resolved model, system prompt, and nexset tools
- Agent reasoning loop — The agent interprets the user's intent, calls relevant tools (first round in parallel), and may perform a second targeted retrieval round (max 2 rounds, 8 tool calls total)
- Synthesize and return — Generate the final answer from cross-nexset evidence, attach citations, and return the response
Authentication
All requests require an Authorization header. See the GenAI RAG API overview for full details.
Request
Content-Type: application/json
Top-Level Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
user_prompt | string | Yes | — | The natural language question to answer |
system_prompt | string | No | null | Additional request-specific instructions appended to the built-in V2 agent system prompt. This is additive, not a replacement, and is not persisted across later session turns — resend on follow-ups if the same instruction should keep applying |
nexsets | array | Yes | — | Nexset IDs to search. Accepts plain string IDs (["123"]), integers ([123]), or full NexsetSpec objects |
user_context | UserContext | Yes | — | User identity and access control context |
llm_config | LLMConfig | Yes | — | LLM credential configuration |
embedding_config | EmbeddingConfig | No | null | Embedding model credential configuration. When omitted, the embedding model is inferred from the nexset |
stream | boolean | No | false | When true, the response is an SSE stream |
debug | boolean | No | false | When true, includes diagnostic information in the response. Ignored when stream: true — the two modes are mutually exclusive |
cache_policy | string | No | default | Cache behavior: default (read + write), refresh (skip reads, overwrite), or bypass (skip reads and writes) |
skip_cache | boolean | No | false | Legacy bypass flag. When true, takes precedence over cache_policy and maps to bypass |
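As a sketch, the required fields above can be assembled and validated client-side before sending. build_agentic_rag_payload is an illustrative helper, not part of any SDK; it enforces the two cheapest server-side rejections (empty user_prompt, empty nexsets) locally:

```python
def build_agentic_rag_payload(user_prompt, nexsets, user_id, credential_id,
                              stream=False, cache_policy="default"):
    """Assemble a minimal /v2/agentic-rag request body.

    Fails fast on the two cheapest server-side rejections: an empty
    user_prompt (422) and an empty nexsets list (400).
    """
    if not user_prompt or not user_prompt.strip():
        raise ValueError("user_prompt must be non-empty")
    if not nexsets:
        raise ValueError("at least one nexset is required")
    return {
        "user_prompt": user_prompt,
        # Plain IDs are normalized to strings; dict entries are assumed to be
        # full NexsetSpec objects and passed through untouched.
        "nexsets": [n if isinstance(n, dict) else str(n) for n in nexsets],
        "user_context": {"user_id": user_id},
        "llm_config": {"credential_id": credential_id},
        "stream": stream,
        "cache_policy": cache_policy,
    }
```

The resulting dict can be serialized as the JSON request body directly.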
NexsetSpec
Each entry in the nexsets array can be a plain string/integer ID or a full object:
| Field | Type | Required | Description |
|---|---|---|---|
id | string | Yes | The unique nexset identifier |
filters | NexsetFilters | No | Per-nexset filters for access control and pre-retrieval narrowing |
NexsetFilters
| Field | Type | Required | Description |
|---|---|---|---|
acl_filter | array of FilterCondition | No | Access control filters. Restricts results based on authorization rules. Multiple conditions are combined with AND logic |
pre_filter | array of FilterCondition | No | Pre-retrieval metadata filters. Narrows the search space before semantic retrieval. Multiple conditions are combined with AND logic |
FilterCondition
| Field | Type | Required | Description |
|---|---|---|---|
key | string | Yes | The metadata field name to filter on (e.g., "tenant_id", "document_type") |
operator | string | Yes | The comparison operator. See Filter Operators |
value | any | Depends on operator | The value(s) to compare against. See Filter Operators for type requirements per operator |
UserContext
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
user_id | string | Yes | — | Unique identifier for the requesting user. Overridden by the user_id claim when the request is authenticated with a JWT |
session_id | string | No | null | Session ID for multi-turn conversation continuity. See Multi-Turn Conversations |
access_rules | object | No | null | Policy-level access gates. Keys and valid values are defined at filter registration time |
access_scope | object | No | null | Data ownership scope. Values are always treated as arrays with IN semantics |
filters | object | No | null | Page or session context filters. String values use EQ matching; array values use IN matching |
LLMConfig
| Field | Type | Required | Description |
|---|---|---|---|
credential_id | string | Yes | Credential ID for the LLM |
model | string | No | Model name override. When omitted, inferred from the credential |
provider | string | No | Provider override (e.g., "openai", "anthropic", "google", "azure", "mistral"). When omitted, inferred from the credential |
EmbeddingConfig
| Field | Type | Required | Description |
|---|---|---|---|
credential_id | string | Yes | Credential ID for the embedding model |
model | string | No | Embedding model name override |
provider | string | No | Provider override |
Response — JSON (Non-Streaming)
Returned when stream is false (default).
Content-Type: application/json
Response Fields
| Field | Type | Always Present | Description |
|---|---|---|---|
answer | string | Yes | The generated answer. Contains [N] markers referencing entries in the citations array |
citations | array of Citation | Yes | Source metadata for each cited reference |
usage | Usage | Yes | Token usage statistics for the request |
cost | Cost or null | Yes | Estimated cost breakdown for the request (LLM + embedding). null when cost accounting is unavailable for the selected credentials |
model | string | Yes | The LLM model name used |
provider | string | Yes | The LLM provider used |
warnings | array of Warning | No | Present when recoverable degradations occurred. Absent on fully successful requests |
intermediate_responses | object | No | Present only when debug: true. Contains: tool_calls (array of {tool_name, tool_call_id, display_name, arguments, result}), thinking (string or null), all_sources (array), skipped_filter_keys (array of strings), and agent_duration_ms (integer) |
Citation
| Field | Type | Description |
|---|---|---|
index | integer | 1-based citation number matching [N] markers in the answer text |
nexset_id | string | The nexset this citation originated from |
nexset_name | string | Display name of the source nexset |
nexset_source_type | string or null | Underlying source type of the nexset (e.g., FILE, SQL_DATABASE, STATIC) |
nexset_connector_type | string or null | Connector identifier used to reach the source (e.g., s3, postgres, snowflake) |
nexset_tags | array of string | Free-form tags attached to the nexset. Empty array when none are set |
document_id | string or null | Document identifier, if available |
source_url | string or null | URL of the source document, if available |
title | string or null | Title of the source document. Falls back to nexset name if no title is available |
page_numbers | array | Page numbers where the cited content appears |
bounding_boxes | array | Bounding box coordinates for the cited content (e.g., for PDFs) |
relevance_score | float or null | Semantic similarity score of the best-matching chunk for this source |
Per-org citation processors may add additional fields (e.g., chunks, citation_id) to each citation. The fields above are the contract guaranteed for every caller; extra per-org fields are additive only and never break the base shape.
Usage
| Field | Type | Description |
|---|---|---|
requests | integer or null | Number of LLM and tool-call requests made |
tool_calls | integer or null | Number of tool invocations the agent performed |
input_tokens | integer or null | Total input tokens consumed |
output_tokens | integer or null | Total output tokens generated |
cache_read_tokens | integer or null | Prompt-cache read tokens (for providers that report them, e.g., Anthropic) |
cache_write_tokens | integer or null | Prompt-cache write tokens |
total_tokens | integer or null | Sum of input and output tokens |
details | object or null | Provider-specific token breakdowns, when available |
Cost
| Field | Type | Description |
|---|---|---|
llm_cost | string or null | Estimated LLM cost for the request, as a decimal string in the reported currency |
embedding_cost | string or null | Estimated embedding cost for the request |
total_cost | string or null | Sum of LLM and embedding cost |
currency | string | ISO 4217 currency code (currently USD) |
Warnings
Non-fatal degradations are surfaced via an optional top-level warnings array. The array is absent on a fully successful request and present (with at least one entry) when any recoverable degradation occurred.
| Field | Type | Description |
|---|---|---|
code | string | Warning code (see table below) |
message | string | Human-readable description of the degradation |
| Code | Meaning |
|---|---|
SESSION_HISTORY_UNAVAILABLE | Prior-turn conversation history could not be loaded. This request proceeded without it |
SESSION_HISTORY_SAVE_FAILED | This turn's conversation could not be saved. Subsequent turns in the same session may not include it |
CITATIONS_UNRESOLVED | The answer contains citation markers ([1], [2], …) but the citations array is empty — citation resolution failed |
USAGE_METRICS_UNAVAILABLE | Usage and cost metrics could not be computed; the usage / cost fields may be empty or partial |
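A caller can branch on these codes before trusting the answer. A minimal sketch in Python (the response shape follows the tables above; the handling policy is illustrative):

```python
def warning_codes(response):
    """Collect warning codes from a /v2/agentic-rag JSON response.

    warnings is absent on fully successful requests, so a missing key
    is treated the same as an empty list.
    """
    return {w["code"] for w in response.get("warnings", [])}


def citations_trustworthy(response):
    """False when the answer contains [N] markers that resolved to nothing."""
    return "CITATIONS_UNRESOLVED" not in warning_codes(response)
```

When citations_trustworthy is false, a reasonable client hides or strips the [N] markers rather than rendering dangling references.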
Example JSON Response
{
"answer": "The Q2 sales figures show a 12% increase over Q1, reaching $4.2M in total revenue [1]. The Western region contributed the most growth at 18% [2].",
"citations": [
{
"index": 1,
"nexset_id": "10000",
"nexset_name": "Sales Reports",
"nexset_source_type": "FILE",
"nexset_connector_type": "s3",
"nexset_tags": ["quarterly", "revenue"],
"document_id": "doc-q2-2025",
"source_url": null,
"title": "Q2 2025 Revenue Summary",
"page_numbers": [3],
"bounding_boxes": [],
"relevance_score": 0.94
},
{
"index": 2,
"nexset_id": "10000",
"nexset_name": "Sales Reports",
"nexset_source_type": "FILE",
"nexset_connector_type": "s3",
"nexset_tags": [],
"document_id": "doc-regional-breakdown",
"source_url": null,
"title": "Regional Sales Breakdown",
"page_numbers": [1, 2],
"bounding_boxes": [],
"relevance_score": 0.87
}
],
"usage": {
"requests": 3,
"tool_calls": 2,
"input_tokens": 1250,
"output_tokens": 340,
"cache_read_tokens": 0,
"cache_write_tokens": 0,
"total_tokens": 1590,
"details": null
},
"cost": {
"llm_cost": "0.0147",
"embedding_cost": "0.0002",
"total_cost": "0.0149",
"currency": "USD"
},
"model": "gpt-4o",
"provider": "openai"
}
Response — SSE (Streaming)
Returned when stream is true.
Content-Type: text/event-stream
Headers: Cache-Control: no-cache, Connection: keep-alive, X-Accel-Buffering: no
Each event follows the standard SSE format:
event: <event_type>
data: <json_payload>
Event Lifecycle
The typical event sequence for a successful request:
message_start
[tool_call_start -> tool_call_delta* -> tool_call_result]* (zero or more tool cycles)
[thinking_delta]* (interleaved with tool cycles)
generation_start
content_block_start
[content_block_delta | inline_citation]* (interleaved text and citations)
content_block_stop
citation_block
message_delta
message_stop
The agent may perform multiple tool-call cycles before generating the final answer. Each cycle queries a different nexset or refines the search.
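The event/data pairs above can be consumed with a small parser. This is a minimal sketch for this endpoint's two-line events; a production client should use a real SSE library that also handles comments, multi-line data fields, and reconnection:

```python
import json

def parse_sse(raw):
    """Parse a raw SSE body into (event_type, payload_dict) pairs."""
    events = []
    event_type = None
    for line in raw.splitlines():
        if line.startswith("event: "):
            event_type = line[len("event: "):].strip()
        elif line.startswith("data: ") and event_type is not None:
            events.append((event_type, json.loads(line[len("data: "):])))
            event_type = None
    return events
```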
Event Types
message_start
Emitted once at the start of the stream. Contains the message ID and model info.
{
"type": "message_start",
"message": {
"id": "msg_abc123...",
"type": "message",
"role": "assistant",
"content": [],
"model": "gpt-4o",
"stop_reason": null,
"usage": { "input_tokens": null, "output_tokens": null }
}
}
tool_call_start
Emitted when the agent begins a tool call (nexset search).
{
"type": "tool_call_start",
"tool_name": "search_sales_reports_10000",
"tool_call_id": "call_abc123",
"display_name": "Sales Reports"
}
In deployments with extended tool metadata enabled, tool_call_start events may also include input_query (the search query) and tool_args (partial arguments) on a best-effort basis. These fields are optional and should not be relied upon for critical logic.
tool_call_delta
Streamed fragments of the tool call arguments.
{
"type": "tool_call_delta",
"args_delta": "{\"query\": \"Q2 sales\"}"
}
tool_call_result
The result returned by a tool call. Content is serialized as JSON and truncated to approximately 500 KB; if truncation occurs, the server appends a trailing "... [truncated]" marker inside the serialized payload.
{
"type": "tool_call_result",
"tool_call_id": "call_abc123",
"content": { "nexset_id": "10000", "total_results": 5, "chunks": [...] }
}
thinking_delta
Model reasoning content (emitted only for models that support extended thinking, e.g., Anthropic Claude with thinking enabled). thinking_delta events can be interleaved with tool-call events — they may appear during tool-call cycles, not only as a contiguous preamble.
{
"type": "thinking_delta",
"thinking": "Let me search for Q2 sales data..."
}
generation_start
Signals that the agent has finished tool calls and is generating the final answer.
{
"type": "generation_start"
}
content_block_start
Marks the beginning of a text content block.
{
"type": "content_block_start",
"index": 0,
"content_block": { "type": "text", "text": "" }
}
content_block_delta
An incremental text fragment of the answer.
{
"type": "content_block_delta",
"index": 0,
"delta": { "type": "text_delta", "text": "The Q2 sales figures show" }
}
inline_citation
Emitted when a [N] citation marker is detected in the text stream. Sent only on the first occurrence of each citation index.
{
"type": "inline_citation",
"citation_index": 1,
"source": {
"index": 1,
"nexset_id": "10000",
"nexset_name": "Sales Reports",
"document_id": "doc-q2-2025",
"source_url": null,
"title": "Q2 2025 Revenue Summary",
"page_numbers": [3],
"bounding_boxes": [],
"relevance_score": 0.94
}
}
content_block_stop
Marks the end of a text content block.
{
"type": "content_block_stop",
"index": 0
}
citation_block
Emitted after all content blocks. Contains the complete list of cited sources.
{
"type": "citation_block",
"citations": [
{
"index": 1,
"nexset_id": "10000",
"nexset_name": "Sales Reports",
"document_id": "doc-q2-2025",
"title": "Q2 2025 Revenue Summary",
"relevance_score": 0.94
}
]
}
message_delta
Emitted near the end of the stream. Contains the stop reason and final token usage.
{
"type": "message_delta",
"delta": { "stop_reason": "end_turn", "stop_sequence": null },
"usage": { "input_tokens": 1250, "output_tokens": 340 }
}
In deployments with cost accounting enabled, message_delta events also include a cost object matching the Cost schema from the non-streaming response.
stop_reason values:
"end_turn"— Successful completion"error"— The stream terminated due to an error
message_stop
Final event in the stream.
{
"type": "message_stop"
}
error
Emitted when an error occurs during streaming.
{
"type": "error",
"error": { "type": "server_error", "message": "Agent stream timed out after 300s" }
}
When an error occurs mid-stream, the server attempts to close any open content blocks, emit the error event, and then send message_delta (with stop_reason: "error") and message_stop to cleanly terminate the stream.
Streaming Error Types
Mid-stream errors are emitted as SSE error events with a machine-parseable error.type value:
| error_type | Equivalent HTTP | Meaning |
|---|---|---|
tool_limit_exceeded | 429 | Agent-side: tool-call / request budget hit |
llm_usage_limit_exceeded | 429 | LLM provider rate-limited the request |
upstream_llm_error | 502 | Upstream LLM/HTTP error other than rate-limit |
agent_run_failed | 500 | Unclassified agent execution error |
stream_timeout | — | SSE stream wall-clock timeout reached (300s) |
queue_failure | — | Internal SSE event queue failure |
all_tools_failed | 502 | Every data-source tool call returned an error |
For all_tools_failed and queue_failure, content deltas may have already been flushed to the client before the error event arrives — the agent can produce plausible text from zero evidence, or a queue fault can occur after useful content has streamed. Clients must treat any error event followed by message_delta with stop_reason=error as invalidating the preceding content block. Surface an explicit error to the end user rather than rendering the partial answer, even if the rendered text looks complete.
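The invalidation rule above can be implemented by buffering deltas and only committing the text once the stream ends cleanly. A minimal sketch (illustrative client policy, not part of the API):

```python
class StreamBuffer:
    """Accumulate answer text from a stream, discarding it on error.

    Any error event, or a message_delta with stop_reason "error",
    invalidates all previously streamed text even if it reads as a
    complete answer.
    """

    def __init__(self):
        self.parts = []
        self.errored = False

    def on_event(self, event_type, payload):
        if event_type == "content_block_delta":
            self.parts.append(payload["delta"]["text"])
        elif event_type == "error":
            self.errored = True
        elif event_type == "message_delta":
            if payload.get("delta", {}).get("stop_reason") == "error":
                self.errored = True

    def final_answer(self):
        # None signals the caller to surface an error, not the partial text.
        return None if self.errored else "".join(self.parts)
```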
Filters
Filters control which data the agent can access during retrieval. There are two layers:
ACL Filters (Access Control)
Restrict which records a user is authorized to see. Applied as hard constraints — results that don't match are excluded regardless of relevance. Configured via nexsets[].filters.acl_filter.
Pre-Retrieval Filters
Narrow the search space before semantic retrieval. Used for scoping queries to specific document types, date ranges, or other metadata dimensions. Configured via nexsets[].filters.pre_filter.
Filter Resolution
Filters can be applied in two ways:
- Explicit per-nexset filters — Passed directly in nexsets[].filters using FilterCondition objects
- Server-side resolution from user context — When user_context contains access_rules, access_scope, or filters, the server resolves these values against registered filter schemas for each nexset and generates the appropriate filter conditions automatically
Both sources are merged. Explicit filters and server-resolved filters are combined with AND logic.
Filter Operators
| Operator | Value Type | Description |
|---|---|---|
EQ | single value | Equals |
NEQ | single value | Not equals |
IN | array | Value is in the provided list |
NOT_IN | array | Value is not in the provided list |
GT | single value | Greater than |
GTE | single value | Greater than or equal |
LT | single value | Less than |
LTE | single value | Less than or equal |
CONTAINS | single value | Field contains the value (substring match) |
NOT_CONTAINS | single value | Field does not contain the value |
EXISTS | (ignored) | Field exists and is not null |
NOT_EXISTS | (ignored) | Field does not exist or is null |
BETWEEN | array of two values [min, max] | Value is within the inclusive range |
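For example, a pre_filter mixing several of these operators might look like the following (field names are illustrative; the value for NOT_EXISTS is ignored per the table above):

```json
[
  { "key": "document_type", "operator": "IN", "value": ["lease", "amendment"] },
  { "key": "effective_date", "operator": "BETWEEN", "value": ["2024-01-01", "2024-12-31"] },
  { "key": "archived_at", "operator": "NOT_EXISTS", "value": null }
]
```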
Filter Value Semantics at Query Time
When using server-side resolution (no explicit operator in user_context), the value type determines the operator:
| Value Type | Inferred Operator | Example |
|---|---|---|
| String | EQ (equals) | "tenant_id": "17001" |
| Array | IN (contains) | "property_id": ["42001", "99001"] |
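The inference rule fits in a tiny function. This is an illustrative client-side mirror of the server-side behavior described above, not the server's actual code:

```python
def infer_condition(key, value):
    """Map one user_context filter entry to a FilterCondition dict:
    strings get EQ, arrays get IN."""
    if isinstance(value, (list, tuple)):
        return {"key": key, "operator": "IN", "value": list(value)}
    return {"key": key, "operator": "EQ", "value": value}
```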
Execution Order
Nexla applies filters in deterministic order — the caller does not control this:
- access_rules — hard gate, fail fast if role or policy is violated
- access_scope — applied as data-level restriction across all relevant nexsets
- filters — intersected with access_scope result, applied per nexset based on registration
- Embedding / retrieval — vector and/or SQL based on nexset type
- LLM generation — answer synthesized from retrieved context
Multi-Turn Conversations
To maintain conversation context across multiple requests, set user_context.session_id to a stable identifier.
- The server persists conversation history and includes it in subsequent requests within the same session
- Sessions are scoped by the combination of your API key, user_id, and session_id
- Sessions expire after 7 days of inactivity
- If conversation history storage is unavailable on load, the request proceeds without history (HTTP 200) and the response carries a warnings[] entry with code SESSION_HISTORY_UNAVAILABLE. If a save fails after the answer is generated, the response carries SESSION_HISTORY_SAVE_FAILED and subsequent turns may miss this turn's context
- To start a fresh conversation, use a new session_id
- system_prompt is request-scoped: it is appended to the built-in V2 agent prompt for this request only and is not persisted in conversation history. Resend it on follow-up turns if the same instruction should keep applying
Cache Management
Two operator-facing endpoints exist to evict server-side cache entries when the upstream data they reference has changed. Both require the same Authorization credential as /v2/agentic-rag and operate on entries scoped to the caller's service-key namespace.
POST /v2/agentic-rag/cache/clear
Clears every nexset-scoped bucket for a single nexset in one shot.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
nexset_id | string | Yes (when clear_all=false) | Nexset to clear |
clear_all | boolean | No | Reserved. Currently rejected with 403 — global clears are not exposed on this endpoint |
Response:
{
"status": "ok",
"scope": "nexset",
"nexset_id": "10000",
"deleted": {
"pinecone": 3,
"filter_schema": 1,
"normalization_map": 1,
"dataset_info": 1
}
}
A pinecone_partial: true field may additionally appear in the response when a Redis SCAN-based delete was truncated before completing.
POST /v2/agentic-rag/cache/invalidate
Targets specific buckets rather than every nexset-scoped bucket.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
buckets | array | Yes | One or more bucket names from the table below |
nexset_id | string | Conditional | Required when buckets contains pinecone, filter_schema, normalization_map, or dataset_info |
credential_id | string | Conditional | Required when buckets contains credentials |
credential_mode | string | No | llm or embedding. When omitted, both modes are cleared for the given credential_id |
Cache buckets:
| Bucket | Scope Key | Description |
|---|---|---|
credentials | credential_id (+ optional credential_mode) | Cached LLM/embedding credential resolutions |
pinecone | nexset_id | Cached Pinecone query results for the nexset (pattern delete) |
filter_schema | nexset_id | Cached filter schema rows for the nexset |
normalization_map | nexset_id | Cached field normalization map for the nexset |
dataset_info | nexset_id | Cached dataset metadata (name, schema, source/connector type) used by the resolver |
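For example, to evict only the cached filter schema and dataset metadata for one nexset, the request body would be (IDs are placeholders):

```json
{
  "buckets": ["filter_schema", "dataset_info"],
  "nexset_id": "10000"
}
```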
Response:
{
"status": "ok",
"deleted": {
"filter_schema": 1,
"dataset_info": 1
}
}
Unsupported bucket names are rejected with HTTP 400.
Cache Policy
The cache_policy request field on POST /v2/agentic-rag controls how the server uses its execution caches for a single request:
| Value | Behavior |
|---|---|
default | Read from cache when present; write fresh entries on misses. Standard production behavior |
refresh | Skip cache reads, force live fetches, then overwrite the cache with the fresh result. Use after upstream data has changed but you do not want to manually invalidate |
bypass | Skip cache reads and writes. Each cache lookup is a live fetch and nothing is persisted. Use for one-off debugging or for callers that should never touch shared cache state |
The legacy skip_cache: true flag is still accepted and maps to bypass. It takes precedence over cache_policy if both are provided.
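The precedence rule can be stated as a one-liner. A sketch of how the server resolves the effective policy from a request body, for illustration:

```python
def effective_cache_policy(body):
    """Resolve the cache policy applied for one request body: the legacy
    skip_cache=true flag wins over cache_policy and maps to bypass;
    otherwise cache_policy applies, defaulting to "default"."""
    if body.get("skip_cache"):
        return "bypass"
    return body.get("cache_policy", "default")
```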
Limits and Timeouts
| Limit | Value | Description |
|---|---|---|
| Tool calls per request | 8 | Maximum number of search operations the agent can perform in a single request. Exceeding this raises HTTP 429 (non-streaming) or emits an SSE error event with error_type=tool_limit_exceeded (streaming) |
| Stream timeout | 300 seconds | Maximum wall-clock time for an SSE stream. The stream is terminated with error_type=stream_timeout if no events are produced within this window |
Error Reference
Error responses are returned as standard HTTP error responses with a JSON body:
{
"detail": "Error message describing the issue"
}
| Status | Condition | Detail | Retryable |
|---|---|---|---|
| 400 | No nexsets provided | "At least one nexset is required" | No |
| 401 | Missing or invalid Authorization header; JWT missing org_id | "org_id missing from JWT claims" (JWT case) | No |
| 403 | LLM credentials could not be resolved to a valid API key | "LLM credentials required: credential_id must resolve to valid API key" | No |
| 422 | Model name cannot be determined from the credential or request; user_prompt empty or whitespace-only | Varies | No |
| 429 | Agent exceeded its tool-call / request budget (agent-side limit) | "Agent exceeded tool-call / request limit" | Yes — split the query or reduce scope |
| 429 | LLM provider rate-limited the request (provider-side quota) | "LLM provider rate-limited the request" (passes through upstream Retry-After when present) | Yes — honor Retry-After |
| 500 | Internal server error (e.g., credential resolution failure, unclassified agent run error) | Varies | Maybe |
| 502 | Upstream LLM provider error during agent run; OR all data-source tools failed for this request | Varies | Maybe |
| 503 | Filter enforcement service unavailable, or any nexset's filter schema could not be loaded (fail-closed to prevent unauthorized access) | "Server-side filter enforcement is temporarily unavailable..." | Yes |
Retry Guidance
There is no explicit retryable field in error responses. Use standard HTTP semantics:
- 400, 401, 403, 422 — client-side issues. Fix the request before retrying
- 429 — rate-limited. Both agent-side ("Agent exceeded tool-call / request limit") and provider-side ("LLM provider rate-limited the request") are retryable. Honor the Retry-After header if present. For the agent-side case, consider splitting the query
- 500, 502 — may be transient. Retry with exponential backoff
- 503 — explicitly transient. Retry after a short delay
For streaming responses, errors that occur after the stream has opened are delivered as typed error SSE events rather than HTTP status codes (HTTP 200 has already been sent). The stream terminates cleanly with message_delta (stop_reason=error) and message_stop. See Streaming Error Types for the typed error_type values.
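The guidance above can be collapsed into a small classifier. This is an illustrative client policy, not part of the API contract:

```python
def retry_strategy(status, retry_after=None):
    """Classify an HTTP error status into a retry decision.

    Returns (action, delay_seconds) where action is one of
    "no_retry", "retry_after", or "backoff".
    """
    if status in (400, 401, 403, 422):
        return ("no_retry", None)          # client-side issue: fix the request first
    if status == 429:
        delay = float(retry_after) if retry_after else 1.0
        return ("retry_after", delay)      # honor Retry-After when present
    if status in (500, 502):
        return ("backoff", 1.0)            # exponential backoff, starting at 1s
    if status == 503:
        return ("retry_after", 2.0)        # explicitly transient
    return ("no_retry", None)
```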
Examples
Minimal Request (Non-Streaming)
curl -X POST https://api-genai.nexla.io/v2/agentic-rag \
-H "Content-Type: application/json" \
-H "Authorization: Basic YOUR_API_KEY" \
-d '{
"user_prompt": "What are the latest sales figures for Q2?",
"system_prompt": "Focus on a concise executive summary.",
"nexsets": ["10000", "10001"],
"user_context": {
"user_id": "user-123"
},
"llm_config": {
"credential_id": "cred-456"
}
}'
Request with Filters and Streaming
curl -N -X POST https://api-genai.nexla.io/v2/agentic-rag \
-H "Content-Type: application/json" \
-H "Authorization: Basic YOUR_API_KEY" \
-d '{
"user_prompt": "Find lease renewal terms for building A",
"system_prompt": "Return the result as short bullet points.",
"nexsets": [
{
"id": "10000",
"filters": {
"acl_filter": [
{ "key": "tenant_id", "operator": "EQ", "value": "tenant-1" }
],
"pre_filter": [
{ "key": "document_type", "operator": "EQ", "value": "lease" }
]
}
}
],
"user_context": {
"user_id": "user-123",
"session_id": "session-abc",
"access_rules": { "tenant_id": "tenant-1" }
},
"llm_config": {
"credential_id": "cred-456",
"model": "gpt-4o",
"provider": "openai"
},
"stream": true
}'
Multi-Turn Conversation
Send the first question:
curl -X POST https://api-genai.nexla.io/v2/agentic-rag \
-H "Content-Type: application/json" \
-H "Authorization: Basic YOUR_API_KEY" \
-d '{
"user_prompt": "What were Q2 sales?",
"system_prompt": "Respond in executive-summary style.",
"nexsets": ["10000"],
"user_context": {
"user_id": "user-123",
"session_id": "conv-001"
},
"llm_config": { "credential_id": "cred-456" }
}'
Then ask a follow-up using the same session_id:
curl -X POST https://api-genai.nexla.io/v2/agentic-rag \
-H "Content-Type: application/json" \
-H "Authorization: Basic YOUR_API_KEY" \
-d '{
"user_prompt": "How does that compare to Q1?",
"nexsets": ["10000"],
"user_context": {
"user_id": "user-123",
"session_id": "conv-001"
},
"llm_config": { "credential_id": "cred-456" }
}'
The agent will have access to the conversation history and understand "that" refers to Q2 sales. If you want the same request-specific system_prompt to apply on the follow-up, include it again in the second request.
Example SSE Stream
event: message_start
data: {"type":"message_start","message":{"id":"msg_a1b2c3","type":"message","role":"assistant","content":[],"model":"gpt-4o","stop_reason":null,"usage":{"input_tokens":null,"output_tokens":null}}}
event: tool_call_start
data: {"type":"tool_call_start","tool_name":"search_sales_reports_10000","tool_call_id":"call_x1","display_name":"Sales Reports"}
event: tool_call_delta
data: {"type":"tool_call_delta","args_delta":"{\"query\":\"Q2 sales figures\"}"}
event: tool_call_result
data: {"type":"tool_call_result","tool_call_id":"call_x1","content":{"nexset_id":"10000","total_results":5,"chunks":[{"text":"Q2 revenue reached $4.2M...","citation_index":1}]}}
event: generation_start
data: {"type":"generation_start"}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"The Q2 sales figures show a 12% increase "}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"over Q1, reaching $4.2M in total revenue [1]."}}
event: inline_citation
data: {"type":"inline_citation","citation_index":1,"source":{"index":1,"nexset_id":"10000","nexset_name":"Sales Reports","document_id":"doc-q2-2025","source_url":null,"title":"Q2 2025 Revenue Summary","page_numbers":[3],"bounding_boxes":[],"relevance_score":0.94}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: citation_block
data: {"type":"citation_block","citations":[{"index":1,"nexset_id":"10000","nexset_name":"Sales Reports","document_id":"doc-q2-2025","source_url":null,"title":"Q2 2025 Revenue Summary","page_numbers":[3],"bounding_boxes":[],"relevance_score":0.94}]}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":1250,"output_tokens":340}}
event: message_stop
data: {"type":"message_stop"}