Available Models
The Available Models endpoint lists the LLM and embedding models that can be used with the Agentic RAG endpoint.
Endpoint: GET /list_models
Authentication
All requests require an Authorization header. See the GenAI RAG API overview for full details.
Request
```bash
curl https://api-genai.nexla.io/list_models \
  -H "Authorization: Basic YOUR_API_KEY"
```
Response
Content-Type: application/json
Status: 200 OK
Returns an object mapping each provider to its available models. The exact models available depend on your organization's configuration and credential setup.
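The exact response shape may vary by deployment; a minimal sketch, assuming a flat mapping of provider names to model lists, might look like this:
```json
{
  "openai": ["gpt-5.2", "gpt-5-nano"],
  "anthropic": ["claude-sonnet-4-6"],
  "google": ["gemini-3-flash-preview"]
}
```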
LLM Models by Provider
OpenAI
| Model | Max Tokens |
|---|---|
| gpt-5.4 | 400,000 |
| gpt-5.2 | 400,000 |
| gpt-5.2-codex | 400,000 |
| gpt-5.1 | 400,000 |
| gpt-5.4-mini | 400,000 |
| gpt-5-nano | 400,000 |
| gpt-5.4-nano | 400,000 |
| o3-deep-research | 200,000 |
| o4-mini-deep-research | 128,000 |
Anthropic
| Model | Max Tokens |
|---|---|
| claude-sonnet-4-6 | 200,000 |
| claude-opus-4-6 | 200,000 |
| claude-haiku-4-5-20251001 | 200,000 |
Google
| Model | Max Tokens |
|---|---|
| gemini-3.1-pro-preview | 1,048,576 |
| gemini-3-pro-preview | 1,048,576 |
| gemini-3-flash-preview | 1,048,576 |
| gemini-2.5-flash-lite | 1,048,576 |
Azure OpenAI
| Model | Max Tokens |
|---|---|
| gpt-5.2 | 400,000 |
| gpt-5.2-codex | 400,000 |
| o3-deep-research | 200,000 |
| o4-mini-deep-research | 128,000 |
Azure AI
| Model | Max Tokens |
|---|---|
| Llama-4-Maverick-17B-128E-Instruct | 135,232 |
| Mistral-Large-3 | 256,000 |
Mistral
| Model | Max Tokens |
|---|---|
| devstral-2512 | 256,000 |
| magistral-medium-1.2 | 128,000 |
| magistral-small-1.2 | 128,000 |
| mistral-large-2512 | 256,000 |
| mistral-small-latest | 128,000 |
| ministral-8b-2512 | 128,000 |
Nvidia
| Model | Max Tokens |
|---|---|
| nvidia/llama-3.1-nemotron-ultra-253b-v1 | 128,000 |
| nvidia/llama-3.3-nemotron-super-49b-v1.5 | 128,000 |
| nvidia/llama-3.1-nemotron-70b-instruct | 128,000 |
Together AI (Open Source)
| Model | Max Tokens |
|---|---|
| Qwen/Qwen3.5-397B-A17B | 256,000 |
| Qwen/Qwen3-VL-32B-Thinking | 262,144 |
| openai/gpt-oss-120b | 128,000 |
| openai/gpt-oss-20b | 128,000 |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | 262,144 |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | 262,144 |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 135,232 |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | 135,232 |
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 64,000 |
| meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 64,000 |
| deepseek-ai/DeepSeek-R1 | 163,840 |
| deepseek-ai/DeepSeek-V3.2 | 131,072 |
Embedding Models
When embedding_config is omitted from the Agentic RAG request, the embedding model is inferred per-nexset based on the vector store configuration.
OpenAI & Azure
| Model | Dimensions |
|---|---|
| text-embedding-3-small | 1,536 |
| text-embedding-3-large | 3,072 |
| text-embedding-ada-002 | 1,536 |
Other Providers
| Provider | Supported Models |
|---|---|
| Nvidia | Nvidia NIM embedding endpoints |
| Voyage | Voyage AI embedding endpoints |
To use a specific embedding model, provide embedding_config:
```json
{
  "embedding_config": {
    "credential_id": "cred-789",
    "model": "text-embedding-3-small"
  }
}
```
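The same structure applies to the other embedding providers. For example, assuming a Voyage credential, a request might look like the following (the credential ID and model name are illustrative; use a model your credential actually exposes):
```json
{
  "embedding_config": {
    "credential_id": "cred-voyage-001",
    "model": "voyage-3"
  }
}
```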
Model Inference
When llm_config.model and llm_config.provider are omitted from the Agentic RAG request, the model is inferred from the credential_id. Each credential is associated with a specific provider and model at creation time.
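For example, a minimal llm_config that relies entirely on credential-based inference needs only the credential ID (value illustrative):
```json
{
  "llm_config": {
    "credential_id": "cred-456"
  }
}
```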
To override the inferred model, explicitly set llm_config.model and/or llm_config.provider:
```json
{
  "llm_config": {
    "credential_id": "cred-456",
    "model": "gpt-5.2",
    "provider": "openai"
  }
}
```
Reasoning Configuration
Some models support reasoning or thinking parameters that control how deeply the model reasons before responding:
| Provider | Models | Parameter | Values |
|---|---|---|---|
| OpenAI | o3, o4-mini, gpt-5 series | reasoning_effort | minimal, low, medium, high |
| Anthropic | claude-sonnet-4-6, claude-opus-4-6 | Extended thinking | Budget tokens (default: 4096) |
| Google | gemini-3.x | thinking_level | low, medium, high |
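As a rough sketch, a reasoning parameter could be supplied alongside the model selection as shown below. Note that model_params is a hypothetical field name used only for illustration; consult the Agentic RAG request reference for the exact parameter your deployment expects.
```json
{
  "llm_config": {
    "credential_id": "cred-456",
    "model": "gpt-5.2",
    "provider": "openai",
    "model_params": {
      "reasoning_effort": "medium"
    }
  }
}
```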
Choosing a Model
Consider these trade-offs when selecting a model:
| Factor | Guidance |
|---|---|
| Accuracy | Larger models (e.g., gpt-5.4, claude-opus-4-6, gemini-3.1-pro-preview) produce more accurate answers and better handle complex multi-nexset queries |
| Cost | Smaller models (e.g., gpt-5-nano, claude-haiku-4-5-20251001, gemini-2.5-flash-lite) are significantly cheaper per token — suitable for high-volume or lower-stakes queries |
| Latency | Smaller models respond faster, especially for streaming use cases |
| Extended thinking | Anthropic Claude and Gemini 3.x models emit thinking_delta SSE events during streaming, providing visibility into the agent's reasoning |
| Tool calls | All supported models can perform tool calls, but larger models tend to make better routing decisions when querying multiple nexsets |
| Context window | Google Gemini 3.x models offer up to 1M token context windows — useful for nexsets with very large schemas or many columns |