Models
AIVAX provides models from different providers to make development faster: you do not need to set up an account with each provider to access its latest models.
The list below shows the available models and their pricing. Billing counts all input and output tokens, whether cached or not.
All prices are in United States dollars.
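As a rough illustration of how per-million-token prices turn into a request cost, here is a minimal sketch. It rests on assumptions not stated on this page: that cached input tokens are billed at the "Input (cached)" rate in place of the regular input rate, and that models without a cached rate bill all input at the regular rate. The `Pricing` class, `PRICES` table, and `estimate_cost` function are hypothetical helpers, not part of any AIVAX SDK; the two price entries are copied from the tables on this page.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_PRICE_UNIT = 1_000_000  # prices are quoted per 1M tokens


@dataclass(frozen=True)
class Pricing:
    """USD prices per 1M tokens (hypothetical helper type)."""
    input_usd: float
    output_usd: float
    cached_input_usd: Optional[float] = None  # None: no cached rate listed


# Two entries copied from the pricing tables on this page.
PRICES = {
    "@anthropic/claude-4.5-sonnet": Pricing(3.00, 15.00, 0.30),
    "@amazon/nova-micro": Pricing(0.04, 0.14),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: `cached_input_tokens` is the part of `input_tokens` that was
    served from cache and is billed at the cached rate when one is listed.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    cached_rate = (price.cached_input_usd
                   if price.cached_input_usd is not None
                   else price.input_usd)
    uncached_tokens = input_tokens - cached_input_tokens
    total = (uncached_tokens * price.input_usd
             + cached_input_tokens * cached_rate
             + output_tokens * price.output_usd)
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    cost = estimate_cost("@anthropic/claude-4.5-sonnet",
                         input_tokens=200_000, output_tokens=10_000,
                         cached_input_tokens=150_000)
    print(f"${cost:.4f}")  # prints $0.3450
```

Under these assumptions, the same request without any cached tokens would cost $0.75, so the cached rate matters most for workloads that resend long prompts.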
amazon
| Model name | Pricing | Description |
|---|---|---|
| @amazon/nova-pro | Input: $0.80 / 1M tokens<br>Output: $3.20 / 1M tokens | A highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks.<br>Input: accepts images, videos<br>Function calls<br>Reasoning |
| @amazon/nova-lite | Input: $0.06 / 1M tokens<br>Output: $0.24 / 1M tokens | A very low cost multimodal model that is lightning fast for processing image, video, and text inputs.<br>Input: accepts images, videos<br>Function calls<br>Reasoning |
| @amazon/nova-micro | Input: $0.04 / 1M tokens<br>Output: $0.14 / 1M tokens | A text-only model that delivers the lowest latency responses at very low cost.<br>Function calls |
anthropic
| Model name | Pricing | Description |
|---|---|---|
| @anthropic/claude-4.1-opus | Input: $15.00 / 1M tokens<br>Input (cached): $1.50 / 1M tokens<br>Output: $75.00 / 1M tokens | Claude Opus 4.1 is Anthropic's flagship model, offering improved performance in coding, reasoning, and agentic tasks.<br>Input: accepts images<br>Function calls<br>Reasoning |
| @anthropic/claude-4.5-opus | Input: $5.00 / 1M tokens<br>Input (cached): $0.50 / 1M tokens<br>Output: $25.00 / 1M tokens | Claude Opus 4.5 is Anthropic's latest reasoning model, developed for advanced software engineering, complex agent workflows, and extended computer tasks.<br>Input: accepts images<br>Function calls<br>Reasoning |
| @anthropic/claude-4.5-sonnet | Input: $3.00 / 1M tokens<br>Input (cached): $0.30 / 1M tokens<br>Output: $15.00 / 1M tokens | Claude Sonnet 4.5 is the newest model in the Sonnet series, offering improvements and updates over Sonnet 4.<br>Input: accepts images<br>Function calls<br>Reasoning |
| @anthropic/claude-4-sonnet | Input: $3.00 / 1M tokens<br>Input (cached): $0.30 / 1M tokens<br>Output: $15.00 / 1M tokens | Anthropic's mid-size model with superior intelligence for high-volume uses in coding, in-depth research, agents, and more.<br>Input: accepts images<br>Function calls<br>Reasoning |
| @anthropic/claude-4.5-haiku | Input: $1.00 / 1M tokens<br>Input (cached): $0.10 / 1M tokens<br>Output: $5.00 / 1M tokens | Claude Haiku 4.5 is Anthropic's fastest and most efficient model, offering near-frontier intelligence with much lower cost and latency than larger Claude models.<br>Input: accepts images<br>Function calls |
| @anthropic/claude-3.5-haiku | Input: $0.80 / 1M tokens<br>Input (cached): $0.08 / 1M tokens<br>Output: $4.00 / 1M tokens | Claude 3.5 Haiku is the next generation of Anthropic's fastest model. At a similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses Claude 3 Opus, the largest model in Anthropic's previous generation, on many intelligence benchmarks.<br>Input: accepts images<br>Function calls |
| @anthropic/claude-3-haiku | Input: $0.25 / 1M tokens<br>Input (cached): $0.03 / 1M tokens<br>Output: $1.25 / 1M tokens | Claude 3 Haiku is Anthropic's fastest model yet, designed for enterprise workloads that often involve longer prompts.<br>Input: accepts images<br>Function calls |
cohere
| Model name | Pricing | Description |
|---|---|---|
| @cohere/command-a | Input: $2.50 / 1M tokens<br>Output: $10.00 / 1M tokens | Command A is Cohere's most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases. Command A has a context length of 256K, only requires two GPUs to run, and has 150% higher throughput compared to Command R+ 08-2024.<br>Input: accepts images<br>Function calls |
deepseekai
| Model name | Pricing | Description |
|---|---|---|
| @deepseekai/r1 | Input: $0.50 / 1M tokens<br>Input (cached): $0.40 / 1M tokens<br>Output: $2.15 / 1M tokens | The DeepSeek R1 model has undergone a minor version upgrade; the current version is DeepSeek-R1-0528.<br>Function calls<br>Reasoning |
| @deepseekai/v3.1-terminus | Input: $0.27 / 1M tokens<br>Input (cached): $0.22 / 1M tokens<br>Output: $1.00 / 1M tokens | DeepSeek-V3.1 is post-trained on top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report.<br>Function calls<br>Reasoning |
| @deepseekai/v3.2-speciale | Input: $0.28 / 1M tokens<br>Input (cached): $0.03 / 1M tokens<br>Output: $0.42 / 1M tokens | DeepSeek-V3.2-Speciale is a high-compute version of DeepSeek-V3.2, designed for maximum reasoning and agentic performance.<br>Reasoning |
| @deepseekai/v3.2 | Input: $0.28 / 1M tokens<br>Input (cached): $0.03 / 1M tokens<br>Output: $0.42 / 1M tokens | DeepSeek-V3.2 is a large language model optimized for high computational efficiency and strong tool-use reasoning.<br>Function calls<br>Reasoning |
google
| Model name | Pricing | Description |
|---|---|---|
| @google/gemini-3-pro | Input: $2.00 / 1M tokens<br>Output: $12.00 / 1M tokens | Gemini 3 Pro Preview is Google's most advanced AI model, setting new records on leading benchmarks like LMArena (1501 Elo), GPQA Diamond (91.9%), and MathArena Apex (23.4%).<br>Input: accepts images, videos, audio<br>Function calls<br>Reasoning |
| @google/gemini-2.5-pro | Input: $1.25 / 1M tokens<br>Input (cached): $0.31 / 1M tokens<br>Output: $10.00 / 1M tokens | One of the most powerful models today.<br>Input: accepts images, videos, audio<br>Function calls<br>Reasoning |
| @google/gemini-3-flash | Input: $0.50 / 1M tokens<br>Input (cached): $0.05 / 1M tokens<br>Output: $3.00 / 1M tokens | Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance.<br>Input: accepts images, videos, audio<br>Function calls<br>Reasoning |
| @google/gemini-2.5-flash | Input: $0.30 / 1M tokens<br>Input (cached): $0.08 / 1M tokens<br>Output: $2.50 / 1M tokens | Google's best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large-scale processing, low-latency, high-volume tasks that require thinking, and agentic use cases.<br>Input: accepts images, videos, audio<br>Function calls<br>Reasoning |
| @google/gemini-2.5-flash-lite | Input: $0.10 / 1M tokens<br>Input (cached): $0.03 / 1M tokens<br>Output: $0.40 / 1M tokens | A Gemini 2.5 Flash model optimized for cost efficiency and low latency.<br>Input: accepts images, videos, audio<br>Function calls<br>Reasoning |
| @google/gemini-2.0-flash | Input: $0.10 / 1M tokens<br>Input (cached): $0.03 / 1M tokens<br>Output: $0.40 / 1M tokens | Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window.<br>Input: accepts images, videos, audio<br>Function calls |
| @google/gemini-2.0-flash-lite | Input: $0.08 / 1M tokens<br>Output: $0.30 / 1M tokens | A smart, fast general-purpose model with image recognition. Great for economical chat.<br>Input: accepts images, videos, audio<br>Function calls |
inception
| Model name | Pricing | Description |
|---|---|---|
| @inception/mercury | Input: $0.25 / 1M tokens<br>Output: $1.00 / 1M tokens | An extremely fast model based on generative diffusion.<br>Function calls |
metaai
| Model name | Pricing | Description |
|---|---|---|
| @metaai/llama-3.3-70b | Input: $0.59 / 1M tokens<br>Output: $0.79 / 1M tokens | Previous-generation model with many parameters and surprisingly high speed.<br>Function calls |
| @metaai/llama-4-maverick-17b-128e | Input: $0.20 / 1M tokens<br>Output: $0.60 / 1M tokens | Fast model, with 17 billion activated parameters and 128 experts.<br>Input: accepts images<br>Function calls |
| @metaai/llama-4-scout-17b-16e | Input: $ 0 | |