Available Models

The Available Models endpoint lists the LLM and embedding models available for use with the Agentic RAG endpoint.

Endpoint: GET /list_models

Authentication

All requests require an Authorization header. See the GenAI RAG API overview for full details.

Request

curl https://api-genai.nexla.io/list_models \
  -H "Authorization: Basic YOUR_API_KEY"
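The same request can be sketched in Python with the standard library. This is a minimal illustration, not an official client: the helper name build_list_models_request is hypothetical, and the host and header format are copied from the curl example above.

```python
import urllib.request

# Host taken from the curl example; adjust if your deployment differs.
BASE_URL = "https://api-genai.nexla.io"


def build_list_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET /list_models request.

    Hypothetical helper for illustration only.
    """
    return urllib.request.Request(
        f"{BASE_URL}/list_models",
        headers={"Authorization": f"Basic {api_key}"},
        method="GET",
    )


request = build_list_models_request("YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

To send it, pass the request to urllib.request.urlopen and decode the JSON body; that step is left out here so the sketch runs without network access.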

Response

Content-Type: application/json

Status: 200 OK

Returns an object mapping each provider to its available models. The exact models available depend on your organization's configuration and credential setup.
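As a rough sketch of consuming this response, the snippet below parses a provider-to-models object and looks up which providers offer a given model. The exact JSON shape is an assumption (a list of model names per provider key); the sample body, the "anthropic" key, and both helper names are hypothetical, so check them against a real response before relying on them.

```python
import json

# Hypothetical response body. The real shape may differ: this only follows
# the description "an object mapping each provider to its available models".
SAMPLE_BODY = """
{
  "openai": ["gpt-5.5", "gpt-5.4-nano"],
  "anthropic": ["claude-opus-4-7"]
}
"""


def models_by_provider(body: str) -> dict[str, list[str]]:
    """Parse the response body into {provider: [model, ...]}."""
    parsed = json.loads(body)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object keyed by provider")
    return {provider: list(models) for provider, models in parsed.items()}


def providers_offering(catalog: dict[str, list[str]], model: str) -> list[str]:
    """Return the providers whose model list contains `model`, sorted."""
    return sorted(p for p, models in catalog.items() if model in models)


catalog = models_by_provider(SAMPLE_BODY)
print(providers_offering(catalog, "gpt-5.5"))
```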

LLM Models by Provider

OpenAI

Model | Max Tokens
gpt-5.5 | 1,050,000
gpt-5.4 | 1,050,000
gpt-5.2 | 400,000
gpt-5.2-codex | 400,000
gpt-5.4-mini | 400,000
gpt-5-mini | 400,000
gpt-5.4-nano | 400,000

Anthropic

Model | Max Tokens
claude-opus-4-7 | 1,000,000
claude-sonnet-4-6 | 1,000,000
claude-sonnet-4-5-20250514 | 1,000,000
claude-haiku-4-5-20251001 | 200,000

Google

Model | Max Tokens
gemini-3.1-pro-preview | 1,048,576
gemini-3-flash-preview | 1,048,576
gemini-3.1-flash-lite-preview | 1,048,576

Azure OpenAI

Model | Max Tokens
gpt-5.2 | 400,000
gpt-5.2-codex | 400,000

Azure AI

Model | Max Tokens
Llama-4-Maverick-17B-128E-Instruct | 135,232

Nvidia

Model | Max Tokens
nvidia/llama-3.3-nemotron-super-49b-v1.5 | 128,000
nvidia/nemotron-3-nano-omni-30b-a3b-reasoning | 128,000

Together AI (Open Source)

Model | Max Tokens
Qwen/Qwen3.6-Plus | 1,048,576
Qwen/Qwen3.5-397B-A17B | 256,000
Qwen/Qwen3-VL-32B-Thinking | 262,144
openai/gpt-oss-120b | 128,000
openai/gpt-oss-20b | 128,000
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 135,232
meta-llama/Llama-4-Scout-17B-16E-Instruct | 135,232
deepseek-ai/DeepSeek-V4-Pro | 512,000

Embedding Models

When embedding_config is omitted from the Agentic RAG request, the embedding model is inferred for each nexset from its vector store configuration.

OpenAI & Azure

Model | Dimensions
text-embedding-3-small | 1,536
text-embedding-3-large | 3,072
text-embedding-ada-002 | 1,536

Other Providers

Provider | Supported Models
Nvidia | Nvidia NIM embedding endpoints
Voyage | Voyage AI embedding endpoints

To use a specific embedding model, provide embedding_config:

{
  "embedding_config": {
    "credential_id": "cred-789",
    "model": "text-embedding-3-small"
  }
}

Model Inference

When llm_config.model and llm_config.provider are omitted from the Agentic RAG request, the model is inferred from the credential_id. Each credential is associated with a specific provider and model at creation time.

To override the inferred model, explicitly set llm_config.model and/or llm_config.provider:

{
  "llm_config": {
    "credential_id": "cred-456",
    "model": "gpt-5.5",
    "provider": "openai"
  }
}

Reasoning Configuration

Some models support reasoning or thinking parameters that control how deeply the model reasons before responding:

Provider | Models | Parameter | Values
OpenAI | gpt-5 series | reasoning_effort | none, minimal (gpt-5 only), low, medium, high, xhigh
Anthropic | claude-sonnet-4-6, claude-opus-4-7 | Extended thinking | Budget tokens (default: 4096)
Google | gemini-3.x | thinking_level | low, medium, high

For OpenAI gpt-5* models, set reasoning_effort and reasoning_summary per request via the llm_config object on the Agentic RAG endpoint. See LLMConfig for the field definitions. reasoning_summary controls the volume of thinking_delta SSE events streamed during generation.
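A client-side check of reasoning_effort before sending a request might look like the sketch below. It rests on two assumptions that this page does not confirm: that "gpt-5 series" means model names starting with "gpt-5", and that "minimal (gpt-5 only)" refers to a model named exactly "gpt-5". The function name is hypothetical, and the API itself remains the authority on which values are accepted.

```python
# Values copied from the reasoning configuration table for OpenAI.
ALLOWED_EFFORTS = {"none", "minimal", "low", "medium", "high", "xhigh"}


def check_reasoning_effort(model: str, effort: str) -> str:
    """Return `effort` if it looks valid for `model`, else raise ValueError."""
    # Assumption: the gpt-5 series is identified by the "gpt-5" name prefix.
    if not model.startswith("gpt-5"):
        raise ValueError(f"{model} is not a gpt-5 series model")
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unknown reasoning_effort: {effort}")
    # Assumption: "gpt-5 only" means the model named exactly "gpt-5".
    if effort == "minimal" and model != "gpt-5":
        raise ValueError("reasoning_effort 'minimal' is listed for gpt-5 only")
    return effort


llm_config = {
    "credential_id": "cred-456",
    "model": "gpt-5.5",
    "reasoning_effort": check_reasoning_effort("gpt-5.5", "high"),
}
print(llm_config)
```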

Choosing a Model

Consider these trade-offs when selecting a model:

Factor | Guidance
Accuracy | Larger models (e.g., gpt-5.5, claude-opus-4-7, gemini-3.1-pro-preview) produce more accurate answers and better handle complex multi-nexset queries
Cost | Smaller models (e.g., gpt-5.4-nano, claude-haiku-4-5-20251001, gemini-3.1-flash-lite-preview) are significantly cheaper per token and are suitable for high-volume or lower-stakes queries
Latency | Smaller models respond faster, especially for streaming use cases
Extended thinking | Anthropic Claude and Gemini 3.x models emit thinking_delta SSE events during streaming, providing visibility into the agent's reasoning
Tool calls | All supported models can perform tool calls, but larger models tend to make better routing decisions when querying multiple nexsets
Context window | OpenAI gpt-5.5 and gpt-5.4, Anthropic Claude 4.x (except Haiku), and Google Gemini 3.x models offer ~1M-token context windows, which is useful for nexsets with very large schemas or many columns