Plans and Limits

AIVAX has three account plans: Free, Pro, and Max. The current plan is stored on the account and controls model access, commissions, rate limits, RAG quotas, tool limits, storage quota, conversation retention, and subscription-model reserve windows.

For commercial subscription prices and plan packaging, use the AIVAX pricing page. This page documents the technical limits of the API.

How limits are enforced

Limits are enforced at different layers:

Authentication rejects missing, expired, or unknown API keys.
Public API keys are restricted to public routes and have key-level and per-IP request and token limits.
Balance middleware rejects billable requests when the account balance is below the required minimum.
Storage middleware rejects requests when account storage exceeds the plan quota.
Inference checks model access, request rate, input-token rate, BYOK rate, and Free-plan context size.
RAG checks collection count, search rate, insertion rate, and JSONL import size.
Built-in tools check daily service limits.
Batch processing checks how many workflow items can be processed per day.

Reference:

Plan summary

Feature	Free	Pro	Max
Model access	Low-price/basic models	Advanced models	All models
Inference commission multiplier	1.25x	1.05x	1.00x
BYOK requests	30/min	200/min	Unlimited
Maximum context	65,536 input tokens enforced at runtime	Model or gateway limit	Model or gateway limit
Subscription model reserve	None	250 units/6h and 3,000 units/week	1,000 units/6h and 15,000 units/week
Storage quota	30 MB	2 GB	20 GB
Conversation retention	2 hours	2 days	30 days
Support level	Email	Priority	Dedicated

Integrated inference

Integrated model requests are rate-limited by request count and input tokens.

Plan	Request limit	Input-token limit
Free	20/min and 500/day	1,000,000/min
Pro	200/min	20,000,000/min
Max	Unlimited	Unlimited

Model rate-limit groups adjust request-count thresholds:

Rate-limit group	Threshold multiplier
Common	1.0x
Discounted	0.5x
Low	0.3x
Free	0.1x

For example, a Pro account normally has 200 integrated-model requests per minute. With a Discounted model group, the adjusted threshold is 100 requests per minute.

BYOK inference

BYOK means the gateway uses a provider key configured on the gateway instead of an integrated AIVAX model. BYOK still passes through AIVAX infrastructure and is rate-limited by plan:

Plan	BYOK request limit
Free	30/min
Pro	200/min
Max	Unlimited

Public API keys

Public keys have additional limits independent of the account plan.

Scope	Request limits
Per remote address	3/5s, 20/min, 300/hour, 1,000/day
Global per key	10/5s, 60/min, 1,500/hour, 10,000/day

Scope	Token limits
Per remote address	100,000/5min, 500,000/30min, 2,000,000/6h, 5,000,000/day
Global per key	500,000/5min, 2,000,000/30min, 10,000,000/6h, 25,000,000/day

Public keys can be used for RAG semantic search, RAG answer generation, standalone reranking, speech generation, media descriptions, image generation, and chat completions. For chat completions, public keys also require a full AI Gateway UUID, restrict request parameters, and omit server-side tool surfaces. See Authentication.

RAG and collection limits

Feature	Free	Pro	Max
Collections	5	Unlimited	Unlimited
Semantic searches	20/min	500/min	3,000/min
Reranking searches	30/min	1,000/min	Unlimited
Document insertions	500/day	10,000/day	Unlimited
JSONL documents per import request	1,000	10,000	1,000,000
Compound file processing	Not available	3 files/day	10 files/day

The JSONL import endpoint rejects a request when it reaches the plan's per-request document limit.

The reranking limit is shared by the standalone /api/v1/rank endpoint and every Semantic Search request that uses rrf, lexical, or smart. Searches with reranker: "none" do not consume this quota.

Built-in tool limits

Tool category	Free	Pro	Max
Web search	15/day	1,000/day	10,000/day
X/Twitter search	Not available	1,000/day	10,000/day
Advanced web search	Not available	100/day	1,000/day
Document and web page generation	5/day	1,000/day	50,000/day
Image generation and editing	5/day	500/day	5,000/day
General service actions	30/day	5,000/day	100,000/day
Bash commands	30/hour	1,500/hour	10,000/hour

General service actions share the service-action quota shown above.

Speech generation

Text-to-speech requests use a dedicated plan limit:

Plan	Text-to-speech requests
Free	3/min and 15/hour
Pro	30/min
Max	300/min

Documentation MCP

The authenticated Documentation MCP applies per-account limits to its tools:

Tool	Limit
Search documentation	10/min and 400/4h
List models	30/min and 1,000/day

Batch processing

Batch imports have request-size guards, and workflow processing has a daily plan limit.

Feature	Free	Pro	Max
Workflow items processed	500/day	100,000/day	Unlimited
Files per import request	1,000	1,000	1,000
Total import size	100 MB/request	100 MB/request	100 MB/request
Single imported file size	10 MB	10 MB	10 MB

Batch is asynchronous. If processing is paused or fails because of quota, retry after the quota window resets or upgrade the account.

Table of Contents