How much does your AI cost?

Prompt Caching: cache stable prompt prefixes (system instructions, few-shot examples). Cached input costs 90% less. Best with large, rarely changing prefixes.

Batch API: send requests in bulk and get results within 24 hours, at a 50% discount on all tokens. Ideal for analytics, content generation, and batch processing.

Model Routing: route simple queries to a cheap model and complex ones to a premium model. A small classifier decides per request. Can save 50–80% with minimal quality loss.

Context Trimming: reduce input size by summarizing chat history, trimming irrelevant context, and using RAG chunks instead of full documents. Cuts token count by roughly 30%.

Frequently Asked Questions

How is the cost per request calculated?

Cost = (input tokens × input price + output tokens × output price) / 1,000,000. Prices are per million tokens (MTok). For example, Claude Sonnet 4.6 at 1,000 input + 500 output tokens costs $0.003 + $0.0075 = $0.0105 per request.
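The formula above can be sketched as a small function. The $3/$15 per-MTok rates below are the ones implied by the Claude Sonnet example in this FAQ, not a guaranteed current price:

```python
# Per-request cost, with prices given in dollars per million tokens (MTok).
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# FAQ example: 1,000 input + 500 output at $3/$15 per MTok.
cost = request_cost(1_000, 500, 3.0, 15.0)  # 0.003 + 0.0075 = 0.0105
```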

What are tokens and how do they relate to words?

Tokens are the units LLMs process. On average, 1 English word ≈ 1.3 tokens, and 1 character ≈ 0.25 tokens. A 1,000-word document is roughly 1,300 tokens. Code tends to use more tokens per character than natural language.
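The rules of thumb above translate directly into rough estimators (these are averages for English text; actual tokenizer output varies by model):

```python
# Rough token estimates: ~1.3 tokens per English word, ~0.25 tokens per character.
def estimate_tokens_from_words(words: int) -> int:
    return round(words * 1.3)

def estimate_tokens_from_chars(chars: int) -> int:
    return round(chars * 0.25)

estimate_tokens_from_words(1_000)  # ≈ 1,300 tokens, as in the FAQ
```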

Which model is cheapest for coding tasks?

DeepSeek V3.2 ($0.28/$0.42 per MTok) and Mistral Small 3.2 ($0.06/$0.18) offer the best price-to-quality ratio for code. For complex architecture, Claude Opus 4.6 or GPT-5.4 provide better results at higher cost. Use the Code Assistant preset to compare.

What is prompt caching and how much does it save?

Prompt caching stores stable prompt prefixes (system instructions, few-shot examples) so repeated requests are billed at a discounted cached rate instead of full price. Cached input costs roughly 90% less with Anthropic, Google, and OpenAI. If 60% of your input is cacheable, you save ~54% on input costs (60% × 90%). Use the Optimizer tab to calculate exact savings.
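The 60% → 54% example follows from a simple multiplier: the cacheable fraction pays the discounted rate, the rest pays full price. A minimal sketch, assuming a flat 90% cache discount:

```python
# Effective input-cost multiplier with prompt caching: the cacheable
# fraction pays (1 - discount) of full price, the remainder pays full price.
def cached_input_multiplier(cacheable_frac: float, discount: float = 0.9) -> float:
    return cacheable_frac * (1 - discount) + (1 - cacheable_frac)

# FAQ example: 60% cacheable at a 90% discount leaves 46% of the
# original input cost, i.e. ~54% saved.
savings = 1 - cached_input_multiplier(0.6)
```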

What is the Batch API and when should I use it?

The Batch API lets you send requests in bulk and receive results within 24 hours, at a 50% discount on all tokens. It is supported by Anthropic, OpenAI, Google, and Amazon. Ideal for analytics, content generation, and data processing: any workload that doesn't need real-time responses.
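Since the batch discount applies to all tokens, the effect on a monthly bill is a flat multiplier on whatever share of traffic can wait 24 hours:

```python
# Batch API pricing: a flat 50% discount on all tokens (per the FAQ).
def batch_cost(realtime_cost: float, discount: float = 0.5) -> float:
    return realtime_cost * (1 - discount)

batch_cost(100.0)  # a $100/mo real-time workload costs $50/mo via batch
```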

How accurate are these prices?

Prices are sourced directly from official provider pricing pages (Anthropic, OpenAI, Google, DeepSeek, Mistral, Meta, xAI, Amazon) and updated regularly. Last update: March 2026. All calculations run locally in your browser — no data is sent to any server.

What is model routing and how does it reduce costs?

Model routing sends simple queries to a cheap model (e.g. GPT-4.1 Nano at $0.10/MTok) and complex ones to a premium model (e.g. Claude Opus at $5/MTok). A small classifier decides per-request. This can save 50–80% with minimal quality loss on mixed workloads.
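The blended price under routing is just a weighted average of the two models' rates. A sketch using the prices from the example above; the 80/20 traffic split is an assumed workload mix, not a figure from the source:

```python
# Blended price per MTok when a router splits traffic between a cheap
# and a premium model.
def blended_price(simple_frac: float, cheap_price: float,
                  premium_price: float) -> float:
    return simple_frac * cheap_price + (1 - simple_frac) * premium_price

# Assumed 80% simple queries: $0.10/MTok cheap vs $5/MTok premium.
price = blended_price(0.8, 0.10, 5.0)  # $1.08/MTok, ~78% below all-premium
```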

How do I choose between frontier, optimal, and budget models?

Frontier models (Claude Opus 4.6, GPT-5.4, Gemini 2.5 Pro, Grok 4) deliver the best quality for complex tasks. Optimal models (Claude Sonnet 4.6, GPT-5, Mistral Large 3) balance cost and quality. Budget models (Haiku, GPT-5 Mini, Flash, DeepSeek) are best for high-volume, simpler tasks where speed matters more than nuance.

Prices updated: . Sources: official provider pricing pages.