Models

AIVAX provides models from different providers to make development even faster, eliminating the need to set up an account for each provider to access their latest models.

The list below shows the available models and their pricing. Pricing takes into account the total number of input and output tokens, with or without caching.

All prices are in United States dollars.
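
As a rough illustration of how a per-1M-token price translates into the cost of a single request, the sketch below multiplies token counts by the listed rates. The function name `estimate_cost` and the token counts are hypothetical, and the rates are the @amazon/nova-pro figures from the list below. It assumes billing is linear in the number of tokens and ignores cached-input rates.

```python
from decimal import Decimal

# Prices on this page are quoted per 1 million tokens.
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Return an estimated request cost in USD (hypothetical helper).

    Prices are passed as strings so Decimal keeps them exact.
    """
    input_cost = Decimal(input_tokens) / TOKENS_PER_UNIT * Decimal(input_price)
    output_cost = Decimal(output_tokens) / TOKENS_PER_UNIT * Decimal(output_price)
    return input_cost + output_cost


# Example: 12,000 input tokens and 1,500 output tokens on @amazon/nova-pro
# ($0.80 input, $3.20 output per 1M tokens).
cost = estimate_cost(12_000, 1_500, "0.80", "3.20")
print(f"Estimated cost: ${cost:.4f}")
```

`Decimal` is used instead of floating point so that very small per-request amounts do not pick up rounding noise when they are summed over many requests.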

amazon

| Model name | Pricing (USD per 1M tokens) | Description | Capabilities |
| --- | --- | --- | --- |
| @amazon/nova-pro | Input: $0.80; Output: $3.20 | A highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. | Input: accepts images, videos; Function calls; Reasoning |
| @amazon/nova-lite | Input: $0.06; Output: $0.24 | A very low cost multimodal model that is lightning fast for processing image, video, and text inputs. | Input: accepts images, videos; Function calls; Reasoning |
| @amazon/nova-micro | Input: $0.04; Output: $0.14 | A text-only model that delivers the lowest latency responses at very low cost. | Function calls |

anthropic

| Model name | Pricing (USD per 1M tokens) | Description | Capabilities |
| --- | --- | --- | --- |
| @anthropic/claude-4.1-opus | Input: $15.00; Input (cached): $1.50; Output: $75.00 | Claude Opus 4.1 is Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. | Input: accepts images; Function calls; Reasoning |
| @anthropic/claude-4.5-opus | Input: $5.00; Input (cached): $0.50; Output: $25.00 | Claude Opus 4.5 is Anthropic’s latest reasoning model, developed for advanced software engineering, complex agent workflows, and extended computer tasks. | Input: accepts images; Function calls; Reasoning |
| @anthropic/claude-4.5-sonnet | Input: $3.00; Input (cached): $0.30; Output: $15.00 | Claude Sonnet 4.5 is the newest model in the Sonnet series, offering improvements and updates over Sonnet 4. | Input: accepts images; Function calls; Reasoning |
| @anthropic/claude-4-sonnet | Input: $3.00; Input (cached): $0.30; Output: $15.00 | Anthropic's mid-size model with superior intelligence for high-volume uses in coding, in-depth research, agents, and more. | Input: accepts images; Function calls; Reasoning |
| @anthropic/claude-4.5-haiku | Input: $1.00; Input (cached): $0.10; Output: $5.00 | Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, offering near‑frontier intelligence with much lower cost and latency than larger Claude models. | Input: accepts images; Function calls |
| @anthropic/claude-3.5-haiku | Input: $0.80; Input (cached): $0.08; Output: $4.00 | Claude 3.5 Haiku is the next generation of Anthropic's fastest model. At a similar speed to Claude 3 Haiku, it improves across every skill set and surpasses Claude 3 Opus, the largest model of the previous generation, on many intelligence benchmarks. | Input: accepts images; Function calls |
| @anthropic/claude-3-haiku | Input: $0.25; Input (cached): $0.03; Output: $1.25 | Claude 3 Haiku is Anthropic's fastest model yet, designed for enterprise workloads, which often involve longer prompts. | Input: accepts images; Function calls |
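
Several models above list a lower "Input (cached)" rate. This page does not specify how cached tokens are metered, so the following is only a sketch under the assumption that cached input tokens are billed at the cached rate and all other input tokens at the regular input rate. The helper `estimate_cost_with_cache` and the token counts are hypothetical; the rates are the @anthropic/claude-4.5-sonnet figures listed above.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Rates for @anthropic/claude-4.5-sonnet, in USD per 1M tokens.
SONNET_4_5 = {
    "input": Decimal("3.00"),
    "input_cached": Decimal("0.30"),
    "output": Decimal("15.00"),
}


def estimate_cost_with_cache(fresh_input, cached_input, output, rates):
    """Estimate USD cost for one request (hypothetical helper).

    Assumption: cached input tokens are billed at the cached rate,
    and the remaining input tokens at the regular input rate.
    """
    total = (
        Decimal(fresh_input) * rates["input"]
        + Decimal(cached_input) * rates["input_cached"]
        + Decimal(output) * rates["output"]
    )
    return total / MILLION


# A long shared prompt (50,000 tokens) plus 2,000 new input tokens
# and 1,000 output tokens, with and without a cache hit.
with_cache = estimate_cost_with_cache(2_000, 50_000, 1_000, SONNET_4_5)
without_cache = estimate_cost_with_cache(52_000, 0, 1_000, SONNET_4_5)
print(f"with cache: ${with_cache:.3f}, without cache: ${without_cache:.3f}")
```

Under these assumptions the request costs about $0.036 with the prompt cached and about $0.171 without, which is why long, repeated prompts benefit most from models that list a cached rate.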

cohere

| Model name | Pricing (USD per 1M tokens) | Description | Capabilities |
| --- | --- | --- | --- |
| @cohere/command-a | Input: $2.50; Output: $10.00 | Command A is Cohere's most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases. Command A has a context length of 256K, only requires two GPUs to run, and has 150% higher throughput compared to Command R+ 08-2024. | Input: accepts images; Function calls |

deepseekai

| Model name | Pricing (USD per 1M tokens) | Description | Capabilities |
| --- | --- | --- | --- |
| @deepseekai/r1 | Input: $0.50; Input (cached): $0.40; Output: $2.15 | The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek‑R1‑0528. | Function calls; Reasoning |
| @deepseekai/v3.1-terminus | Input: $0.27; Input (cached): $0.22; Output: $1.00 | DeepSeek‑V3.1 is post‑trained on top of DeepSeek‑V3.1‑Base, which is built upon the original V3 base checkpoint through a two‑phase long context extension approach, following the methodology outlined in the original DeepSeek‑V3 report. | Function calls; Reasoning |
| @deepseekai/v3.2-speciale | Input: $0.28; Input (cached): $0.03; Output: $0.42 | DeepSeek‑V3.2‑Speciale is a high‑compute version of DeepSeek‑V3.2, designed for maximum reasoning and agentic performance. | Reasoning |
| @deepseekai/v3.2 | Input: $0.28; Input (cached): $0.03; Output: $0.42 | DeepSeek‑V3.2 is a large language model optimized for high computational efficiency and strong tool‑use reasoning. | Function calls; Reasoning |

google

| Model name | Pricing (USD per 1M tokens) | Description | Capabilities |
| --- | --- | --- | --- |
| @google/gemini-3-pro | Input: $2.00; Output: $12.00 | Gemini 3 Pro Preview is Google’s most advanced AI model, setting new records on leading benchmarks like LMArena (1501 Elo), GPQA Diamond (91.9%), and MathArena Apex (23.4%). | Input: accepts images, videos, audio; Function calls; Reasoning |
| @google/gemini-2.5-pro | Input: $1.25; Input (cached): $0.31; Output: $10.00 | One of the most powerful models today. | Input: accepts images, videos, audio; Function calls; Reasoning |
| @google/gemini-3-flash | Input: $0.50; Input (cached): $0.05; Output: $3.00 | Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. | Input: accepts images, videos, audio; Function calls; Reasoning |
| @google/gemini-2.5-flash | Input: $0.30; Input (cached): $0.08; Output: $2.50 | Google's best model in terms of price‑performance, offering well‑rounded capabilities. 2.5 Flash is best for large-scale processing, low‑latency, high-volume tasks that require thinking, and agentic use cases. | Input: accepts images, videos, audio; Function calls; Reasoning |
| @google/gemini-2.5-flash-lite | Input: $0.10; Input (cached): $0.03; Output: $0.40 | A Gemini 2.5 Flash model optimized for cost efficiency and low latency. | Input: accepts images, videos, audio; Function calls; Reasoning |
| @google/gemini-2.0-flash | Input: $0.10; Input (cached): $0.03; Output: $0.40 | Gemini 2.0 Flash delivers next‑gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. | Input: accepts images, videos, audio; Function calls |
| @google/gemini-2.0-flash-lite | Input: $0.08; Output: $0.30 | A smart, fast general‑purpose model with image recognition. A good fit for economical chat. | Input: accepts images, videos, audio; Function calls |

inception

| Model name | Pricing (USD per 1M tokens) | Description | Capabilities |
| --- | --- | --- | --- |
| @inception/mercury | Input: $0.25; Output: $1.00 | Extremely fast model based on generative diffusion. | Function calls |

metaai

Model name Pricing Description
@metaai/llama-3.3-70b
Input:
$ 0.59 /1m tokens
Output:
$ 0.79 /1m tokens
Previous-generation model with a large parameter count and surprisingly high speed.
Function calls
@metaai/llama-4-maverick-17b-128e
Input:
$ 0.20 /1m tokens
Output:
$ 0.60 /1m tokens
Fast model, with 17 billion activated parameters and 128 experts.
Input: accepts images
Function calls
@metaai/llama-4-scout-17b-16e
Input:
$ 0