Agentic RAG Query

The Agentic RAG endpoint uses an AI agent to search across one or more nexsets, reason over the retrieved data, and generate a natural language answer with inline citations. The agent dynamically decides which nexsets to query, what search terms to use, and how to combine results.

The endpoint supports both synchronous JSON responses and real-time Server-Sent Events (SSE) streaming. Multi-turn conversations are supported via session IDs.

Endpoint: POST /v2/agentic-rag

Pipeline Flow

The Agentic RAG pipeline processes each request through these steps:

  1. Verify auth — Validate the Authorization header
  2. Load admin token — Decrypt the service key for downstream Nexla API calls
  3. Parallel request preparation — Resolve LLM credentials, embedding credentials (if provided), and dataset metadata
  4. Dataset queryable check — Determine if each nexset supports SQL, Pinecone, DataFeed, or static fallback
  5. Parallel context enrichment — Resolve filters from user_context and registered schemas, load conversation history, load Pinecone credentials
  6. Build per-nexset filters — Combine ACL filters, access rules/scopes, and pre-retrieval filters
  7. Tool routing — Route each nexset to the appropriate tool (SQL, Pinecone, DataFeed, or static)
  8. Build agent — Construct the agent with the resolved model, system prompt, and nexset tools
  9. Agent reasoning loop — The agent interprets the user's intent, calls relevant tools (first round in parallel), and may perform a second targeted retrieval round (max 2 rounds, 8 tool calls total)
  10. Synthesize and return — Generate the final answer from cross-nexset evidence, attach citations, and return the response

Authentication

All requests require an Authorization header. See the GenAI RAG API overview for full details.

Request

Content-Type: application/json

Top-Level Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| user_prompt | string | Yes | | The natural language question to answer |
| system_prompt | string | No | null | Additional request-specific instructions appended to the built-in V2 agent system prompt. This is additive, not a replacement, and is not persisted across later session turns — resend on follow-ups if the same instruction should keep applying |
| nexsets | array | Yes | | Nexset IDs to search. Accepts plain string IDs (["123"]), integers ([123]), or full NexsetSpec objects |
| user_context | UserContext | Yes | | User identity and access control context |
| llm_config | LLMConfig | Yes | | LLM credential configuration |
| embedding_config | EmbeddingConfig | No | null | Embedding model credential configuration. When omitted, the embedding model is inferred from the nexset |
| stream | boolean | No | false | When true, the response is an SSE stream |
| debug | boolean | No | false | When true, includes diagnostic information in the response. Ignored when stream: true — the two modes are mutually exclusive |
| cache_policy | string | No | default | Cache behavior: default (read + write), refresh (skip reads, overwrite), or bypass (skip reads and writes) |
| skip_cache | boolean | No | false | Legacy bypass flag. When true, takes precedence over cache_policy and maps to bypass |

NexsetSpec

Each entry in the nexsets array can be a plain string/integer ID or a full object:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | The unique nexset identifier |
| filters | NexsetFilters | No | Per-nexset filters for access control and pre-retrieval narrowing |

NexsetFilters

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| acl_filter | array of FilterCondition | No | Access control filters. Restricts results based on authorization rules. Multiple conditions are combined with AND logic |
| pre_filter | array of FilterCondition | No | Pre-retrieval metadata filters. Narrows the search space before semantic retrieval. Multiple conditions are combined with AND logic |

FilterCondition

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| key | string | Yes | The metadata field name to filter on (e.g., "tenant_id", "document_type") |
| operator | string | Yes | The comparison operator. See Filter Operators |
| value | any | Depends on operator | The value(s) to compare against. See Filter Operators for type requirements per operator |
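Putting the NexsetSpec, NexsetFilters, and FilterCondition tables together, a filtered nexset entry can be assembled as plain dictionaries. A minimal Python sketch — the helper functions are illustrative, not part of any SDK; only the field names come from the tables above:

```python
def filter_condition(key, operator, value=None):
    """Build one FilterCondition dict; EXISTS/NOT_EXISTS ignore the value."""
    cond = {"key": key, "operator": operator}
    if operator not in ("EXISTS", "NOT_EXISTS"):
        cond["value"] = value
    return cond

def nexset_spec(nexset_id, acl=None, pre=None):
    """Build a NexsetSpec entry; omits the filters key when no filters are given."""
    spec = {"id": str(nexset_id)}
    filters = {}
    if acl:
        filters["acl_filter"] = acl
    if pre:
        filters["pre_filter"] = pre
    if filters:
        spec["filters"] = filters
    return spec

spec = nexset_spec(
    10000,
    acl=[filter_condition("tenant_id", "EQ", "tenant-1")],
    pre=[filter_condition("document_type", "IN", ["lease", "amendment"])],
)
```

A bare integer or string ID remains valid in the nexsets array; the object form is only needed when attaching filters.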

UserContext

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| user_id | string | Yes | | Unique identifier for the requesting user. Overridden by the user_id claim when the request is authenticated with a JWT |
| session_id | string | No | null | Session ID for multi-turn conversation continuity. See Multi-Turn Conversations |
| access_rules | object | No | null | Policy-level access gates. Keys and valid values are defined at filter registration time |
| access_scope | object | No | null | Data ownership scope. Values are always treated as arrays with IN semantics |
| filters | object | No | null | Page or session context filters. String values use EQ matching; array values use IN matching |

LLMConfig

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| credential_id | string | Yes | Credential ID for the LLM |
| model | string | No | Model name override. When omitted, inferred from the credential |
| provider | string | No | Provider override (e.g., "openai", "anthropic", "google", "azure", "mistral"). When omitted, inferred from the credential |

EmbeddingConfig

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| credential_id | string | Yes | Credential ID for the embedding model |
| model | string | No | Embedding model name override |
| provider | string | No | Provider override |

Response — JSON (Non-Streaming)

Returned when stream is false (default).

Content-Type: application/json

Response Fields

| Field | Type | Always Present | Description |
| --- | --- | --- | --- |
| answer | string | Yes | The generated answer. Contains [N] markers referencing entries in the citations array |
| citations | array of Citation | Yes | Source metadata for each cited reference |
| usage | Usage | Yes | Token usage statistics for the request |
| cost | Cost or null | Yes | Estimated cost breakdown for the request (LLM + embedding). null when cost accounting is unavailable for the selected credentials |
| model | string | Yes | The LLM model name used |
| provider | string | Yes | The LLM provider used |
| warnings | array of Warning | No | Present when recoverable degradations occurred. Absent on fully successful requests |
| intermediate_responses | object | No | Present only when debug: true. Contains: tool_calls (array of {tool_name, tool_call_id, display_name, arguments, result}), thinking (string or null), all_sources (array), skipped_filter_keys (array of strings), and agent_duration_ms (integer) |

Citation

| Field | Type | Description |
| --- | --- | --- |
| index | integer | 1-based citation number matching [N] markers in the answer text |
| nexset_id | string | The nexset this citation originated from |
| nexset_name | string | Display name of the source nexset |
| nexset_source_type | string or null | Underlying source type of the nexset (e.g., FILE, SQL_DATABASE, STATIC) |
| nexset_connector_type | string or null | Connector identifier used to reach the source (e.g., s3, postgres, snowflake) |
| nexset_tags | array of string | Free-form tags attached to the nexset. Empty array when none are set |
| document_id | string or null | Document identifier, if available |
| source_url | string or null | URL of the source document, if available |
| title | string or null | Title of the source document. Falls back to nexset name if no title is available |
| page_numbers | array | Page numbers where the cited content appears |
| bounding_boxes | array | Bounding box coordinates for the cited content (e.g., for PDFs) |
| relevance_score | float or null | Semantic similarity score of the best-matching chunk for this source |
Note: Per-org citation processors may add additional fields (e.g., chunks, citation_id) to each citation. The fields above are the contract guaranteed for every caller; extra per-org fields are additive only and never break the base shape.

Usage

| Field | Type | Description |
| --- | --- | --- |
| requests | integer or null | Number of LLM and tool-call requests made |
| tool_calls | integer or null | Number of tool invocations the agent performed |
| input_tokens | integer or null | Total input tokens consumed |
| output_tokens | integer or null | Total output tokens generated |
| cache_read_tokens | integer or null | Prompt-cache read tokens (for providers that report them, e.g., Anthropic) |
| cache_write_tokens | integer or null | Prompt-cache write tokens |
| total_tokens | integer or null | Sum of input and output tokens |
| details | object or null | Provider-specific token breakdowns, when available |

Cost

| Field | Type | Description |
| --- | --- | --- |
| llm_cost | string or null | Estimated LLM cost for the request, as a decimal string in the reported currency |
| embedding_cost | string or null | Estimated embedding cost for the request |
| total_cost | string or null | Sum of LLM and embedding cost |
| currency | string | ISO 4217 currency code (currently USD) |

Warnings

Non-fatal degradations are surfaced via an optional top-level warnings array. The array is absent on a fully successful request and present (with at least one entry) when any recoverable degradation occurred.

| Field | Type | Description |
| --- | --- | --- |
| code | string | Warning code (see table below) |
| message | string | Human-readable description of the degradation |

| Code | Meaning |
| --- | --- |
| SESSION_HISTORY_UNAVAILABLE | Prior-turn conversation history could not be loaded. This request proceeded without it |
| SESSION_HISTORY_SAVE_FAILED | This turn's conversation could not be saved. Subsequent turns in the same session may not include it |
| CITATIONS_UNRESOLVED | The answer contains citation markers ([1], [2], …) but the citations array is empty — citation resolution failed |
| USAGE_METRICS_UNAVAILABLE | Usage and cost metrics could not be computed; the usage / cost fields may be empty or partial |
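A client can triage the optional warnings array with a small helper. This is an illustrative sketch, not an SDK function; the codes come from the table above, and CITATIONS_UNRESOLVED is singled out because it means [N] markers in the answer have no backing sources:

```python
def triage_warnings(response):
    """Return (warning_codes, citations_broken) for a parsed JSON response.

    warnings is absent on fully successful requests, so .get() defaults to [].
    """
    codes = [w["code"] for w in response.get("warnings", [])]
    # If citation resolution failed, render the answer as plain text
    # instead of trying to link [N] markers to sources.
    return codes, "CITATIONS_UNRESOLVED" in codes
```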

Example JSON Response

```json
{
  "answer": "The Q2 sales figures show a 12% increase over Q1, reaching $4.2M in total revenue [1]. The Western region contributed the most growth at 18% [2].",
  "citations": [
    {
      "index": 1,
      "nexset_id": "10000",
      "nexset_name": "Sales Reports",
      "nexset_source_type": "FILE",
      "nexset_connector_type": "s3",
      "nexset_tags": ["quarterly", "revenue"],
      "document_id": "doc-q2-2025",
      "source_url": null,
      "title": "Q2 2025 Revenue Summary",
      "page_numbers": [3],
      "bounding_boxes": [],
      "relevance_score": 0.94
    },
    {
      "index": 2,
      "nexset_id": "10000",
      "nexset_name": "Sales Reports",
      "nexset_source_type": "FILE",
      "nexset_connector_type": "s3",
      "nexset_tags": [],
      "document_id": "doc-regional-breakdown",
      "source_url": null,
      "title": "Regional Sales Breakdown",
      "page_numbers": [1, 2],
      "bounding_boxes": [],
      "relevance_score": 0.87
    }
  ],
  "usage": {
    "requests": 3,
    "tool_calls": 2,
    "input_tokens": 1250,
    "output_tokens": 340,
    "cache_read_tokens": 0,
    "cache_write_tokens": 0,
    "total_tokens": 1590,
    "details": null
  },
  "cost": {
    "llm_cost": "0.0147",
    "embedding_cost": "0.0002",
    "total_cost": "0.0149",
    "currency": "USD"
  },
  "model": "gpt-4o",
  "provider": "openai"
}
```

Response — SSE (Streaming)

Returned when stream is true.

Content-Type: text/event-stream
Headers: Cache-Control: no-cache, Connection: keep-alive, X-Accel-Buffering: no

Each event follows the standard SSE format:

```text
event: <event_type>
data: <json_payload>
```

Event Lifecycle

The typical event sequence for a successful request:

```text
message_start
[tool_call_start -> tool_call_delta* -> tool_call_result]* (zero or more tool cycles)
[thinking_delta]* (interleaved with tool cycles)
generation_start
content_block_start
[content_block_delta | inline_citation]* (interleaved text and citations)
content_block_stop
citation_block
message_delta
message_stop
```

The agent may perform multiple tool-call cycles before generating the final answer. Each cycle queries a different nexset or refines the search.
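This lifecycle can be consumed with a minimal SSE parser. The sketch below splits a raw text buffer into (event_type, payload) tuples; it assumes the event/data framing shown in this document (one event: line, one data: line, blank-line terminated) and is illustrative, not a production SSE client:

```python
import json

def parse_sse(raw):
    """Parse a raw SSE text buffer into a list of (event_type, payload) tuples."""
    events = []
    event_type, data_lines = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif not line.strip() and event_type is not None:
            # Blank line terminates the event; data is a JSON payload here.
            events.append((event_type, json.loads("\n".join(data_lines))))
            event_type, data_lines = None, []
    if event_type is not None and data_lines:
        # Flush a trailing event that lacked a final blank line.
        events.append((event_type, json.loads("\n".join(data_lines))))
    return events
```

A real client would read the response body incrementally rather than buffering the whole stream, but the framing logic is the same.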

Event Types

message_start

Emitted once at the start of the stream. Contains the message ID and model info.

```json
{
  "type": "message_start",
  "message": {
    "id": "msg_abc123...",
    "type": "message",
    "role": "assistant",
    "content": [],
    "model": "gpt-4o",
    "stop_reason": null,
    "usage": { "input_tokens": null, "output_tokens": null }
  }
}
```

tool_call_start

Emitted when the agent begins a tool call (nexset search).

```json
{
  "type": "tool_call_start",
  "tool_name": "search_sales_reports_10000",
  "tool_call_id": "call_abc123",
  "display_name": "Sales Reports"
}
```
Note: In deployments with extended tool metadata enabled, tool_call_start events may also include input_query (the search query) and tool_args (partial arguments) on a best-effort basis. These fields are optional and should not be relied upon for critical logic.

tool_call_delta

Streamed fragments of the tool call arguments.

```json
{
  "type": "tool_call_delta",
  "args_delta": "{\"query\": \"Q2 sales\"}"
}
```

tool_call_result

The result returned by a tool call. Content is serialized as JSON and truncated to approximately 500 KB; if truncation occurs, the server appends a trailing "... [truncated]" marker inside the serialized payload.

```json
{
  "type": "tool_call_result",
  "tool_call_id": "call_abc123",
  "content": { "nexset_id": "10000", "total_results": 5, "chunks": [...] }
}
```
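Because truncation happens inside the serialized payload, a client inspecting tool results as strings can check for the trailing marker before trusting them. A hedged sketch (the helper name is mine):

```python
def result_truncated(serialized_content):
    """True when a tool_call_result payload hit the ~500 KB cap.

    The server appends a trailing '... [truncated]' marker inside the
    serialized payload when truncation occurs.
    """
    return serialized_content.rstrip().endswith("... [truncated]")
```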

thinking_delta

Model reasoning content (emitted only for models that support extended thinking, e.g., Anthropic Claude with thinking enabled). thinking_delta events can be interleaved with tool-call events — they may appear during tool-call cycles, not only as a contiguous preamble.

```json
{
  "type": "thinking_delta",
  "thinking": "Let me search for Q2 sales data..."
}
```

generation_start

Signals that the agent has finished tool calls and is generating the final answer.

```json
{
  "type": "generation_start"
}
```

content_block_start

Marks the beginning of a text content block.

```json
{
  "type": "content_block_start",
  "index": 0,
  "content_block": { "type": "text", "text": "" }
}
```

content_block_delta

An incremental text fragment of the answer.

```json
{
  "type": "content_block_delta",
  "index": 0,
  "delta": { "type": "text_delta", "text": "The Q2 sales figures show" }
}
```

inline_citation

Emitted when a [N] citation marker is detected in the text stream. Sent only on the first occurrence of each citation index.

```json
{
  "type": "inline_citation",
  "citation_index": 1,
  "source": {
    "index": 1,
    "nexset_id": "10000",
    "nexset_name": "Sales Reports",
    "document_id": "doc-q2-2025",
    "source_url": null,
    "title": "Q2 2025 Revenue Summary",
    "page_numbers": [3],
    "bounding_boxes": [],
    "relevance_score": 0.94
  }
}
```

content_block_stop

Marks the end of a text content block.

```json
{
  "type": "content_block_stop",
  "index": 0
}
```

citation_block

Emitted after all content blocks. Contains the complete list of cited sources.

```json
{
  "type": "citation_block",
  "citations": [
    {
      "index": 1,
      "nexset_id": "10000",
      "nexset_name": "Sales Reports",
      "document_id": "doc-q2-2025",
      "title": "Q2 2025 Revenue Summary",
      "relevance_score": 0.94
    }
  ]
}
```

message_delta

Emitted near the end of the stream. Contains the stop reason and final token usage.

```json
{
  "type": "message_delta",
  "delta": { "stop_reason": "end_turn", "stop_sequence": null },
  "usage": { "input_tokens": 1250, "output_tokens": 340 }
}
```
Note: In deployments with cost accounting enabled, message_delta events also include a cost object matching the Cost schema from the non-streaming response.

stop_reason values:

  • "end_turn" — Successful completion
  • "error" — The stream terminated due to an error

message_stop

Final event in the stream.

```json
{
  "type": "message_stop"
}
```

error

Emitted when an error occurs during streaming.

```json
{
  "type": "error",
  "error": { "type": "server_error", "message": "Agent stream timed out after 300s" }
}
```

When an error occurs mid-stream, the server attempts to close any open content blocks, emit the error event, and then send message_delta (with stop_reason: "error") and message_stop to cleanly terminate the stream.

Streaming Error Types

Mid-stream errors are emitted as SSE error events with a machine-parseable error.type value:

| error_type | Equivalent HTTP | Meaning |
| --- | --- | --- |
| tool_limit_exceeded | 429 | Agent-side: tool-call / request budget hit |
| llm_usage_limit_exceeded | 429 | LLM provider rate-limited the request |
| upstream_llm_error | 502 | Upstream LLM/HTTP error other than rate-limit |
| agent_run_failed | 500 | Unclassified agent execution error |
| stream_timeout | n/a | SSE stream wall-clock timeout reached (300s) |
| queue_failure | n/a | Internal SSE event queue failure |
| all_tools_failed | 502 | Every data-source tool call returned an error |
Mid-stream error UX contract

For all_tools_failed and queue_failure, content deltas may have already been flushed to the client before the error event arrives — the agent can produce plausible text from zero evidence, or a queue fault can occur after useful content has streamed. Clients must treat any error event followed by message_delta with stop_reason=error as invalidating the preceding content block. Surface an explicit error to the end user rather than rendering the partial answer, even if the rendered text looks complete.
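The contract above can be implemented client-side by buffering content deltas and committing them only on clean termination. This sketch is illustrative (the class name and structure are my own); the key point is that an error event, or message_delta with stop_reason "error", invalidates all buffered text:

```python
class StreamBuffer:
    """Buffer streamed answer text; discard it if the turn is invalidated."""

    def __init__(self):
        self.parts = []
        self.invalidated = False

    def feed(self, event_type, payload):
        if event_type == "content_block_delta":
            self.parts.append(payload["delta"]["text"])
        elif event_type == "error":
            self.invalidated = True
        elif event_type == "message_delta":
            if payload.get("delta", {}).get("stop_reason") == "error":
                self.invalidated = True

    def final_answer(self):
        # Per the contract: never render partial text after an error,
        # even if the rendered text looks complete.
        return None if self.invalidated else "".join(self.parts)
```

A UI built on this pattern can still show deltas optimistically as they arrive, as long as it replaces them with an explicit error state when final_answer() returns None.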

Filters

Filters control which data the agent can access during retrieval. There are two layers:

ACL Filters (Access Control)

Restrict which records a user is authorized to see. Applied as hard constraints — results that don't match are excluded regardless of relevance. Configured via nexsets[].filters.acl_filter.

Pre-Retrieval Filters

Narrow the search space before semantic retrieval. Used for scoping queries to specific document types, date ranges, or other metadata dimensions. Configured via nexsets[].filters.pre_filter.

Filter Resolution

Filters can be applied in two ways:

  1. Explicit per-nexset filters — Passed directly in nexsets[].filters using FilterCondition objects.
  2. Server-side resolution from user context — When user_context contains access_rules, access_scope, or filters, the server resolves these values against registered filter schemas for each nexset and generates the appropriate filter conditions automatically.

Both sources are merged. Explicit filters and server-resolved filters are combined with AND logic.

Filter Operators

| Operator | Value Type | Description |
| --- | --- | --- |
| EQ | single value | Equals |
| NEQ | single value | Not equals |
| IN | array | Value is in the provided list |
| NOT_IN | array | Value is not in the provided list |
| GT | single value | Greater than |
| GTE | single value | Greater than or equal |
| LT | single value | Less than |
| LTE | single value | Less than or equal |
| CONTAINS | single value | Field contains the value (substring match) |
| NOT_CONTAINS | single value | Field does not contain the value |
| EXISTS | (ignored) | Field exists and is not null |
| NOT_EXISTS | (ignored) | Field does not exist or is null |
| BETWEEN | array of two values [min, max] | Value is within the inclusive range |
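To sanity-check filter definitions before sending them, the operator semantics in this table can be mirrored locally against sample records. This is an illustrative sketch only — the server, not the client, enforces filters during retrieval:

```python
def evaluate(record, cond):
    """Evaluate one FilterCondition dict against a flat record dict."""
    key, op = cond["key"], cond["operator"]
    present = record.get(key) is not None
    if op == "EXISTS":
        return present
    if op == "NOT_EXISTS":
        return not present
    value = cond.get("value")
    field = record.get(key)
    if op == "EQ":
        return field == value
    if op == "NEQ":
        return field != value
    if op == "IN":
        return field in value
    if op == "NOT_IN":
        return field not in value
    if op in ("GT", "GTE", "LT", "LTE"):
        if field is None:
            return False
        return {"GT": field > value, "GTE": field >= value,
                "LT": field < value, "LTE": field <= value}[op]
    if op == "CONTAINS":
        return field is not None and value in field
    if op == "NOT_CONTAINS":
        return field is None or value not in field
    if op == "BETWEEN":
        lo, hi = value  # inclusive range
        return field is not None and lo <= field <= hi
    raise ValueError(f"unknown operator: {op}")
```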

Filter Value Semantics at Query Time

When using server-side resolution (no explicit operator in user_context), the value type determines the operator:

| Value Type | Inferred Operator | Example |
| --- | --- | --- |
| String | EQ (equals) | "tenant_id": "17001" |
| Array | IN (contains) | "property_id": ["42001", "99001"] |
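The inference rule reduces to a one-line type check. A sketch of how a client could expand user_context filters into explicit FilterCondition dicts to preview what the server will resolve (function names are my own; the EQ/IN mapping is from the table above):

```python
def infer_operator(value):
    """Server-side resolution: list -> IN, anything else -> EQ."""
    return "IN" if isinstance(value, list) else "EQ"

def resolve_context_filters(ctx_filters):
    """Expand a user_context 'filters' object into FilterCondition dicts."""
    return [
        {"key": key, "operator": infer_operator(value), "value": value}
        for key, value in ctx_filters.items()
    ]
```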

Execution Order

Nexla applies filters in deterministic order — the caller does not control this:

  1. access_rules — hard gate, fail fast if role or policy is violated
  2. access_scope — applied as data-level restriction across all relevant nexsets
  3. filters — intersected with access_scope result, applied per nexset based on registration
  4. Embedding / retrieval — vector and/or SQL based on nexset type
  5. LLM generation — answer synthesized from retrieved context

Multi-Turn Conversations

To maintain conversation context across multiple requests, set user_context.session_id to a stable identifier.

  • The server persists conversation history and includes it in subsequent requests within the same session
  • Sessions are scoped by the combination of your API key, user_id, and session_id
  • Sessions expire after 7 days of inactivity
  • If conversation history storage is unavailable on load, the request proceeds without history (HTTP 200) and the response carries a warnings[] entry with code SESSION_HISTORY_UNAVAILABLE. If a save fails after the answer is generated, the response carries SESSION_HISTORY_SAVE_FAILED and subsequent turns may miss this turn's context
  • To start a fresh conversation, use a new session_id
  • system_prompt is request-scoped: it is appended to the built-in V2 agent prompt for this request only and is not persisted in conversation history. Resend it on follow-up turns if the same instruction should keep applying

Cache Management

Two operator-facing endpoints exist to evict server-side cache entries when the upstream data they reference has changed. Both require the same Authorization credential as /v2/agentic-rag and operate on entries scoped to the caller's service-key namespace.

POST /v2/agentic-rag/cache/clear

Clears every nexset-scoped bucket for a single nexset in one shot.

Request body:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| nexset_id | string | Yes (when clear_all=false) | Nexset to clear |
| clear_all | boolean | No | Reserved. Currently rejected with 403 — global clears are not exposed on this endpoint |

Response:

```json
{
  "status": "ok",
  "scope": "nexset",
  "nexset_id": "10000",
  "deleted": {
    "pinecone": 3,
    "filter_schema": 1,
    "normalization_map": 1,
    "dataset_info": 1
  }
}
```

A pinecone_partial: true flag may appear in the response when a Redis SCAN-based delete was truncated before covering all matching keys.

POST /v2/agentic-rag/cache/invalidate

Targets specific buckets rather than every nexset-scoped bucket.

Request body:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| buckets | array | Yes | One or more bucket names from the table below |
| nexset_id | string | Conditional | Required when buckets contains pinecone, filter_schema, normalization_map, or dataset_info |
| credential_id | string | Conditional | Required when buckets contains credentials |
| credential_mode | string | No | llm or embedding. When omitted, both modes are cleared for the given credential_id |

Cache buckets:

| Bucket | Scope Key | Description |
| --- | --- | --- |
| credentials | credential_id (+ optional credential_mode) | Cached LLM/embedding credential resolutions |
| pinecone | nexset_id | Cached Pinecone query results for the nexset (pattern delete) |
| filter_schema | nexset_id | Cached filter schema rows for the nexset |
| normalization_map | nexset_id | Cached field normalization map for the nexset |
| dataset_info | nexset_id | Cached dataset metadata (name, schema, source/connector type) used by the resolver |

Response:

```json
{
  "status": "ok",
  "deleted": {
    "filter_schema": 1,
    "dataset_info": 1
  }
}
```

Unsupported bucket names are rejected with HTTP 400.
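A client can pre-validate an invalidate payload against the bucket and scope-key rules above before POSTing. A hedged sketch — helper names are illustrative, and the server remains the source of truth for validation:

```python
NEXSET_BUCKETS = {"pinecone", "filter_schema", "normalization_map", "dataset_info"}
ALL_BUCKETS = NEXSET_BUCKETS | {"credentials"}

def validate_invalidate(body):
    """Return an error string mirroring the documented rules, or None if valid."""
    buckets = set(body.get("buckets", []))
    if not buckets:
        return "buckets is required"
    unknown = buckets - ALL_BUCKETS
    if unknown:
        # The server rejects these with HTTP 400.
        return f"unsupported buckets: {sorted(unknown)}"
    if buckets & NEXSET_BUCKETS and "nexset_id" not in body:
        return "nexset_id required for nexset-scoped buckets"
    if "credentials" in buckets and "credential_id" not in body:
        return "credential_id required for the credentials bucket"
    return None
```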

Cache Policy

The cache_policy request field on POST /v2/agentic-rag controls how the server uses its execution caches for a single request:

| Value | Behavior |
| --- | --- |
| default | Read from cache when present; write fresh entries on misses. Standard production behavior |
| refresh | Skip cache reads, force live fetches, then overwrite the cache with the fresh result. Use after upstream data has changed but you do not want to manually invalidate |
| bypass | Skip cache reads and writes. Each cache lookup is a live fetch and nothing is persisted. Use for one-off debugging or for callers that should never touch shared cache state |

The legacy skip_cache: true flag is still accepted and maps to bypass. It takes precedence over cache_policy if both are provided.
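The precedence rule can be expressed as a small pure function mirroring the documented behavior (the function itself is illustrative):

```python
def effective_cache_policy(body):
    """skip_cache: true overrides cache_policy and maps to bypass (legacy)."""
    if body.get("skip_cache"):
        return "bypass"
    return body.get("cache_policy", "default")
```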

Limits and Timeouts

| Limit | Value | Description |
| --- | --- | --- |
| Tool calls per request | 8 | Maximum number of search operations the agent can perform in a single request. Exceeding this raises HTTP 429 (non-streaming) or emits an SSE error event with error_type=tool_limit_exceeded (streaming) |
| Stream timeout | 300 seconds | Maximum wall-clock time for an SSE stream. The stream is terminated with error_type=stream_timeout if no events are produced within this window |

Error Reference

Errors are returned as standard HTTP responses with a JSON body:

```json
{
  "detail": "Error message describing the issue"
}
```
| Status | Condition | Detail | Retryable |
| --- | --- | --- | --- |
| 400 | No nexsets provided | "At least one nexset is required" | No |
| 401 | Missing or invalid Authorization header; JWT missing org_id | "org_id missing from JWT claims" (JWT case) | No |
| 403 | LLM credentials could not be resolved to a valid API key | "LLM credentials required: credential_id must resolve to valid API key" | No |
| 422 | Model name cannot be determined from the credential or request; user_prompt empty or whitespace-only | Varies | No |
| 429 | Agent exceeded its tool-call / request budget (agent-side limit) | "Agent exceeded tool-call / request limit" | Yes — split the query or reduce scope |
| 429 | LLM provider rate-limited the request (provider-side quota) | "LLM provider rate-limited the request" (passes through upstream Retry-After when present) | Yes — honor Retry-After |
| 500 | Internal server error (e.g., credential resolution failure, unclassified agent run error) | Varies | Maybe |
| 502 | Upstream LLM provider error during agent run; OR all data-source tools failed for this request | Varies | Maybe |
| 503 | Filter enforcement service unavailable, or any nexset's filter schema could not be loaded (fail-closed to prevent unauthorized access) | "Server-side filter enforcement is temporarily unavailable..." | Yes |

Retry Guidance

There is no explicit retryable field in error responses. Use standard HTTP semantics:

  • 400, 401, 403, 422 — client-side issues. Fix the request before retrying
  • 429 — rate-limited. Both agent-side ("Agent exceeded tool-call / request limit") and provider-side ("LLM provider rate-limited the request") are retryable. Honor the Retry-After header if present. For the agent-side case, consider splitting the query
  • 500, 502 — may be transient. Retry with exponential backoff
  • 503 — explicitly transient. Retry after a short delay

For streaming responses, errors that occur after the stream has opened are delivered as typed error SSE events rather than HTTP status codes (HTTP 200 has already been sent). The stream terminates cleanly with message_delta (stop_reason=error) and message_stop. See Streaming Error Types for the typed error_type values.
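The retry guidance above can be sketched as a backoff helper. The function name, base delay, and cap are my own choices, not part of the API; the status-code classification follows the guidance in this section:

```python
import random

def retry_delay(status, attempt, retry_after=None, base=1.0, cap=30.0):
    """Return seconds to wait before retry attempt `attempt`, or None if not retryable."""
    if status in (400, 401, 403, 422):
        return None  # client-side: fix the request before retrying
    if status == 429 and retry_after is not None:
        return float(retry_after)  # honor the upstream Retry-After header
    if status in (429, 500, 502, 503):
        # Exponential backoff with jitter to avoid thundering herds.
        return min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)
    return None
```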

Examples

Minimal Request (Non-Streaming)

```shell
curl -X POST https://api-genai.nexla.io/v2/agentic-rag \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic YOUR_API_KEY" \
  -d '{
    "user_prompt": "What are the latest sales figures for Q2?",
    "system_prompt": "Focus on a concise executive summary.",
    "nexsets": ["10000", "10001"],
    "user_context": {
      "user_id": "user-123"
    },
    "llm_config": {
      "credential_id": "cred-456"
    }
  }'
```

Request with Filters and Streaming

```shell
curl -N -X POST https://api-genai.nexla.io/v2/agentic-rag \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic YOUR_API_KEY" \
  -d '{
    "user_prompt": "Find lease renewal terms for building A",
    "system_prompt": "Return the result as short bullet points.",
    "nexsets": [
      {
        "id": "10000",
        "filters": {
          "acl_filter": [
            { "key": "tenant_id", "operator": "EQ", "value": "tenant-1" }
          ],
          "pre_filter": [
            { "key": "document_type", "operator": "EQ", "value": "lease" }
          ]
        }
      }
    ],
    "user_context": {
      "user_id": "user-123",
      "session_id": "session-abc",
      "access_rules": { "tenant_id": "tenant-1" }
    },
    "llm_config": {
      "credential_id": "cred-456",
      "model": "gpt-4o",
      "provider": "openai"
    },
    "stream": true
  }'
```

Multi-Turn Conversation

Send the first question:

```shell
curl -X POST https://api-genai.nexla.io/v2/agentic-rag \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic YOUR_API_KEY" \
  -d '{
    "user_prompt": "What were Q2 sales?",
    "system_prompt": "Respond in executive-summary style.",
    "nexsets": ["10000"],
    "user_context": {
      "user_id": "user-123",
      "session_id": "conv-001"
    },
    "llm_config": { "credential_id": "cred-456" }
  }'
```

Then ask a follow-up using the same session_id:

```shell
curl -X POST https://api-genai.nexla.io/v2/agentic-rag \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic YOUR_API_KEY" \
  -d '{
    "user_prompt": "How does that compare to Q1?",
    "nexsets": ["10000"],
    "user_context": {
      "user_id": "user-123",
      "session_id": "conv-001"
    },
    "llm_config": { "credential_id": "cred-456" }
  }'
```

The agent will have access to the conversation history and understand "that" refers to Q2 sales. If you want the same request-specific system_prompt to apply on the follow-up, include it again in the second request.

Example SSE Stream

```text
event: message_start
data: {"type":"message_start","message":{"id":"msg_a1b2c3","type":"message","role":"assistant","content":[],"model":"gpt-4o","stop_reason":null,"usage":{"input_tokens":null,"output_tokens":null}}}

event: tool_call_start
data: {"type":"tool_call_start","tool_name":"search_sales_reports_10000","tool_call_id":"call_x1","display_name":"Sales Reports"}

event: tool_call_delta
data: {"type":"tool_call_delta","args_delta":"{\"query\":\"Q2 sales figures\"}"}

event: tool_call_result
data: {"type":"tool_call_result","tool_call_id":"call_x1","content":{"nexset_id":"10000","total_results":5,"chunks":[{"text":"Q2 revenue reached $4.2M...","citation_index":1}]}}

event: generation_start
data: {"type":"generation_start"}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"The Q2 sales figures show a 12% increase "}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"over Q1, reaching $4.2M in total revenue [1]."}}

event: inline_citation
data: {"type":"inline_citation","citation_index":1,"source":{"index":1,"nexset_id":"10000","nexset_name":"Sales Reports","document_id":"doc-q2-2025","source_url":null,"title":"Q2 2025 Revenue Summary","page_numbers":[3],"bounding_boxes":[],"relevance_score":0.94}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: citation_block
data: {"type":"citation_block","citations":[{"index":1,"nexset_id":"10000","nexset_name":"Sales Reports","document_id":"doc-q2-2025","source_url":null,"title":"Q2 2025 Revenue Summary","page_numbers":[3],"bounding_boxes":[],"relevance_score":0.94}]}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":1250,"output_tokens":340}}

event: message_stop
data: {"type":"message_stop"}
```