Models

AIVAX provides models from different providers through a single account, so you don't need to set up an account with each provider to access their latest models.

The list below shows the available models and their pricing. Prices are quoted per 1 million tokens, with separate rates for input and output tokens; where a model supports caching, cached input has its own rate.

All prices are in United States dollars.
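As a rough illustration of how the per-million-token rates translate into a request cost, here is a minimal Python sketch. The prices are copied from the tables on this page; the `PRICES` dictionary, the `estimate_cost` function, and its parameters are hypothetical names for illustration and not part of any AIVAX API. It assumes cached input tokens are billed at the cached rate instead of the regular input rate, and that models without a listed cached rate bill all input at the regular rate. Actual billing may differ.

```python
from decimal import Decimal

# USD per 1 million tokens, copied from the pricing tables on this page.
# The structure and key names are illustrative only (not an AIVAX API).
PRICES = {
    "@anthropic/claude-4.5-sonnet": {
        "input": Decimal("3.00"),
        "cached": Decimal("0.30"),
        "output": Decimal("15.00"),
    },
    "@amazon/nova-micro": {
        "input": Decimal("0.04"),
        "output": Decimal("0.14"),
    },
}

PER_MILLION = Decimal(1_000_000)


def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of a request.

    input_tokens  -- uncached input tokens
    cached_tokens -- input tokens served from cache (assumed to be billed
                     at the cached rate when the model lists one)
    """
    price = PRICES[model]
    # Assumption: fall back to the regular input rate if no cached rate is listed.
    cached_rate = price.get("cached", price["input"])
    total = (
        input_tokens * price["input"]
        + cached_tokens * cached_rate
        + output_tokens * price["output"]
    )
    return total / PER_MILLION


# 200k uncached input + 800k cached input + 50k output tokens:
# (200000 * 3.00 + 800000 * 0.30 + 50000 * 15.00) / 1,000,000 = 1.59
cost = estimate_cost("@anthropic/claude-4.5-sonnet", 200_000, 50_000, cached_tokens=800_000)
print(f"Estimated cost: $ {cost:.2f}")
```

`Decimal` is used instead of `float` so that small per-token amounts add up without binary rounding noise.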

amazon

Model name	Pricing	Description
`@amazon/nova-pro`	Input: $ 0.80 /1m tokens Output: $ 3.20 /1m tokens	A highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Input: accepts images, videos Function calls Reasoning
`@amazon/nova-lite`	Input: $ 0.06 /1m tokens Output: $ 0.24 /1m tokens	A very low cost multimodal model that is lightning fast for processing image, video, and text inputs. Input: accepts images, videos Function calls Reasoning
`@amazon/nova-micro`	Input: $ 0.04 /1m tokens Output: $ 0.14 /1m tokens	A text-only model that delivers the lowest latency responses at very low cost. Function calls

anthropic

Model name	Pricing	Description
`@anthropic/claude-4.1-opus`	Input: $ 15.00 /1m tokens Input (cached): $ 1.50 /1m tokens Output: $ 75.00 /1m tokens	Claude Opus 4.1 is Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. Input: accepts images Function calls Reasoning
`@anthropic/claude-4.5-opus`	Input: $ 5.00 /1m tokens Input (cached): $ 0.50 /1m tokens Output: $ 25.00 /1m tokens	Claude Opus 4.5 is Anthropic’s latest reasoning model, developed for advanced software engineering, complex agent workflows, and extended computer tasks. Input: accepts images Function calls Reasoning
`@anthropic/claude-4.5-sonnet`	Input: $ 3.00 /1m tokens Input (cached): $ 0.30 /1m tokens Output: $ 15.00 /1m tokens	Claude Sonnet 4.5 is the newest model in the Sonnet series, offering improvements and updates over Sonnet 4. Input: accepts images Function calls Reasoning
`@anthropic/claude-4-sonnet`	Input: $ 3.00 /1m tokens Input (cached): $ 0.30 /1m tokens Output: $ 15.00 /1m tokens	Anthropic's mid-size model with superior intelligence for high-volume uses in coding, in-depth research, agents, & more. Input: accepts images Function calls Reasoning
`@anthropic/claude-4.5-haiku`	Input: $ 1.00 /1m tokens Input (cached): $ 0.10 /1m tokens Output: $ 5.00 /1m tokens	Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, offering near‑frontier intelligence with much lower cost and latency than larger Claude models. Input: accepts images Function calls
`@anthropic/claude-3.5-haiku`	Input: $ 0.80 /1m tokens Input (cached): $ 0.08 /1m tokens Output: $ 4.00 /1m tokens	Claude 3.5 Haiku is the next generation of Anthropic's fastest model. At a similar speed to Claude 3 Haiku, it improves across every skill set and surpasses Claude 3 Opus, the largest model of the previous generation, on many intelligence benchmarks. Input: accepts images Function calls
`@anthropic/claude-3-haiku`	Input: $ 0.25 /1m tokens Input (cached): $ 0.03 /1m tokens Output: $ 1.25 /1m tokens	Claude 3 Haiku is Anthropic's fastest model yet, designed for enterprise workloads which often involve longer prompts. Input: accepts images Function calls

cohere

Model name	Pricing	Description
`@cohere/command-a`	Input: $ 2.50 /1m tokens Output: $ 10.00 /1m tokens	Command A is Cohere's most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases. Command A has a context length of 256K, only requires two GPUs to run, and has 150% higher throughput compared to Command R+ 08-2024. Input: accepts images Function calls

deepseekai

Model name	Pricing	Description
`@deepseekai/r1`	Input: $ 0.50 /1m tokens Input (cached): $ 0.40 /1m tokens Output: $ 2.15 /1m tokens	The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek‑R1‑0528. Function calls Reasoning
`@deepseekai/v3.1-terminus`	Input: $ 0.27 /1m tokens Input (cached): $ 0.22 /1m tokens Output: $ 1.00 /1m tokens	DeepSeek‑V3.1 is post‑trained on top of DeepSeek‑V3.1‑Base, which is built upon the original V3 base checkpoint through a two‑phase long‑context extension approach, following the methodology outlined in the original DeepSeek‑V3 report. Function calls Reasoning
`@deepseekai/v3.2-speciale`	Input: $ 0.28 /1m tokens Input (cached): $ 0.03 /1m tokens Output: $ 0.42 /1m tokens	DeepSeek‑V3.2‑Speciale is a high‑compute version of DeepSeek‑V3.2, designed for maximum reasoning and agentic performance. Reasoning
`@deepseekai/v3.2`	Input: $ 0.28 /1m tokens Input (cached): $ 0.03 /1m tokens Output: $ 0.42 /1m tokens	DeepSeek‑V3.2 is a large language model optimized for high computational efficiency and strong tool‑use reasoning. Function calls Reasoning

google

Model name	Pricing	Description
`@google/gemini-3-pro`	Input: $ 2.00 /1m tokens Output: $ 12.00 /1m tokens	Gemini 3 Pro Preview is Google’s most advanced AI model, setting new records on leading benchmarks like LMArena (1501 Elo), GPQA Diamond (91.9%), and MathArena Apex (23.4%). Input: accepts images, videos, audio Function calls Reasoning
`@google/gemini-2.5-pro`	Input: $ 1.25 /1m tokens Input (cached): $ 0.31 /1m tokens Output: $ 10.00 /1m tokens	One of the most powerful models today. Input: accepts images, videos, audio Function calls Reasoning
`@google/gemini-3-flash`	Input: $ 0.50 /1m tokens Input (cached): $ 0.05 /1m tokens Output: $ 3.00 /1m tokens	Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and coding assistance. Input: accepts images, videos, audio Function calls Reasoning
`@google/gemini-2.5-flash`	Input: $ 0.30 /1m tokens Input (cached): $ 0.08 /1m tokens Output: $ 2.50 /1m tokens	Google's best model in terms of price‑performance, offering well‑rounded capabilities. 2.5 Flash is best for large scale processing, low‑latency, high volume tasks that require thinking, and agentic use cases. Input: accepts images, videos, audio Function calls Reasoning
`@google/gemini-2.5-flash-lite`	Input: $ 0.10 /1m tokens Input (cached): $ 0.03 /1m tokens Output: $ 0.40 /1m tokens	A Gemini 2.5 Flash model optimized for cost efficiency and low latency. Input: accepts images, videos, audio Function calls Reasoning
`@google/gemini-2.0-flash`	Input: $ 0.10 /1m tokens Input (cached): $ 0.03 /1m tokens Output: $ 0.40 /1m tokens	Gemini 2.0 Flash delivers next‑gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Input: accepts images, videos, audio Function calls
`@google/gemini-2.0-flash-lite`	Input: $ 0.08 /1m tokens Output: $ 0.30 /1m tokens	Smart, fast general‑purpose model with image recognition. Great for economical chat. Input: accepts images, videos, audio Function calls

inception

Model name	Pricing	Description
`@inception/mercury`	Input: $ 0.25 /1m tokens Output: $ 1.00 /1m tokens	Extremely fast model based on generative diffusion. Function calls

metaai

Model name	Pricing	Description
`@metaai/llama-3.3-70b`	Input: $ 0.59 /1m tokens Output: $ 0.79 /1m tokens	Previous‑generation model with 70 billion parameters and surprisingly high speed. Function calls
`@metaai/llama-4-maverick-17b-128e`	Input: $ 0.20 /1m tokens Output: $ 0.60 /1m tokens	Fast model, with 17 billion activated parameters and 128 experts. Input: accepts images Function calls
`@metaai/llama-4-scout-17b-16e`	Input: $ 0
