Which AI API is cheapest in 2026?

GPT-5.4 Nano at $0.20/$1.25 has the lowest listed rates of any mainstream model from a Tier-1 provider. Gemini 3.1 Flash-Lite at $0.25/$1.50 is comparable. For flagship-tier models, Gemini 3.1 Pro at $2/$12 is significantly cheaper than Claude Opus 4.7 at $5/$25, while GPT-5.4 sits in the middle at $2.50/$15.

What is a token in AI pricing?

A token is roughly 4 characters of English text, or about 0.75 words. A 1,000-word document is approximately 1,300 tokens. Prices are billed per million tokens, with input tokens (what you send) and output tokens (what the model returns) billed at different rates. Output tokens typically cost 4-6x more than input tokens.

How do I reduce my AI API costs?

Three biggest levers: (1) Prompt caching saves up to 90% on repeated context. (2) Batch processing saves a flat 50% in exchange for 24-hour delivery. (3) Tiered model routing - sending easy queries to cheaper models and reserving flagships for hard problems - typically cuts costs by 60-80%. Stacking all three can reduce production costs by 90-95%.

Is the Gemini API really free?

Google AI Studio offers a free tier for Gemini Flash and Flash-Lite models with rate limits of 5-15 requests per minute and up to 1,500 daily requests. As of April 1, 2026, Gemini Pro models are paid-only. Both Anthropic and OpenAI charge from the first request with no permanent free tier for direct API access.

Why are output tokens more expensive than input tokens?

Generating output requires the model to run inference one token at a time, which is computationally more expensive than reading input. The 4-6x ratio typical across providers reflects this compute cost difference. Optimizing for shorter outputs is one of the easiest ways to reduce your bill.

Free Tool · Updated April 2026

Free AI Cost Calculator: Claude vs ChatGPT vs Gemini token costs.

Compare what your workload actually costs across every major AI model. 13 models from Anthropic, OpenAI, and Google. No signup. Pricing verified from official docs.

AI API costs are now one of the fastest-growing line items in engineering budgets, and the pricing across the three major providers is no longer remotely consistent. Anthropic sits at the premium end. Google undercuts almost everyone on raw token price. OpenAI sits in the middle but offers the cheapest budget-tier model. Picking the wrong model can multiply your monthly bill by 10-25x for the same workload.

The calculator below uses current pricing from each provider's official documentation, verified April 2026. Plug in your actual workload (input tokens per request, output tokens per request, daily request volume) and see what each model would cost monthly. The results are sorted from cheapest to most expensive, with bar widths showing relative cost.

Free AI Cost Calculator

Compare what your workload costs across Claude, ChatGPT, and Gemini. All prices verified April 2026 from official documentation.

Input tokens per request: what you send to the model (prompt + context)

Output tokens per request: what the model writes back

Requests per day: how many times per day you call the API

Quick presets: auto-fill typical volume for each stage

Result: monthly cost (30 days)

Standard pricing only. Does not include caching, batch discounts, or long-context surcharges. Verify current pricing at Anthropic, OpenAI, and Google.
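
The arithmetic behind a calculator like this can be sketched in a few lines. The block below is a minimal illustration, not the page's actual implementation: the `Rate` class, the `RATES` table, and the function names are assumptions, and only three of the models are included, using the standard rates quoted in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    input_per_mtok: float   # USD per 1M input tokens
    output_per_mtok: float  # USD per 1M output tokens


# Standard rates as quoted in this article (April 2026). Assumption: verify
# against each provider's pricing page before relying on them.
RATES = {
    "Claude Sonnet 4.6": Rate(3.00, 15.00),
    "GPT-5.4 Mini": Rate(0.75, 4.50),
    "Gemini 3 Flash": Rate(0.50, 3.00),
}


def monthly_cost(rate, input_tokens, output_tokens, requests_per_day, days=30):
    """Monthly cost in USD at standard rates (no caching, batch, or surcharges)."""
    requests = requests_per_day * days
    input_cost = requests * input_tokens / 1_000_000 * rate.input_per_mtok
    output_cost = requests * output_tokens / 1_000_000 * rate.output_per_mtok
    return input_cost + output_cost


def rank_models(input_tokens, output_tokens, requests_per_day):
    """Return (model, monthly cost) pairs sorted from cheapest to most expensive."""
    costs = {
        name: monthly_cost(rate, input_tokens, output_tokens, requests_per_day)
        for name, rate in RATES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# 1,500 input tokens, 400 output tokens, 10,000 requests per day
for name, cost in rank_models(1_500, 400, 10_000):
    print(f"{name}: ${cost:,.2f}/month")
```

With those inputs the sketch ranks Gemini 3 Flash first at $585.00, then GPT-5.4 Mini at $877.50, then Claude Sonnet 4.6 at $3,150.00.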

Current pricing for every major model.

Three vendors, thirteen production-grade models, three tiers each. The pricing below is verified from official documentation as of April 21, 2026. All prices are USD per 1 million tokens, separated into input (what you send) and output (what the model writes back). Output tokens cost 4-6x more than input tokens across every provider, so the easiest way to lower a bill is to constrain output length.

Anthropic Claude pricing.

Claude Opus 4.7, released April 16, 2026, is Anthropic's current flagship and matches Opus 4.6 at $5 per million input tokens and $25 per million output tokens. Both Opus models support the full 1M-token context window at standard pricing with no surcharge above 200K, which is unique among the three providers.

Claude Sonnet 4.6 at $3/$15 is the recommended default for most production work. The Sonnet tier has held this price across four generations, making it one of the most predictable price points in the industry. Sonnet 4.6 also includes the 1M context window at standard rates.

Claude Haiku 4.5 at $1/$5 is the fastest and cheapest current Claude model. It has a 200K context window (smaller than Opus and Sonnet) but delivers near-Sonnet performance on many tasks at one-third the cost. Anthropic's official pricing page has the canonical rate card.

OpenAI ChatGPT API pricing.

GPT-5.4, released March 5, 2026, is OpenAI's flagship at $2.50 per million input tokens and $15 per million output tokens for prompts under 270K. Above 270K, input pricing doubles to $5/MTok. The 1.05M context window is available but the long-context surcharge applies to the entire request once you cross the threshold.

GPT-5.4 Mini at $0.75/$4.50 is one of the most aggressively priced mid-tier models on the market. It's roughly 4x cheaper than Claude Sonnet 4.6 on input. For high-volume routine work, this is hard to beat.

GPT-5.4 Nano at $0.20/$1.25 is OpenAI's budget option, competing directly with Gemini Flash-Lite for the cheapest Tier-1 model crown. GPT-5.4 Pro at $30/$180 is the premium reasoning tier, priced 12x higher than standard GPT-5.4 because it uses significantly more compute for chain-of-thought reasoning. Most teams should never use Pro for routine work. The official rates live at OpenAI's pricing page.

Google Gemini API pricing.

Gemini 3.1 Pro costs $2 per million input tokens and $12 per million output tokens for prompts under 200K. Above 200K, input doubles to $4 and output rises to $18 for the entire request. Gemini supports a 2M-token context window, the largest in production, but the long-context surcharge bites hard.

Gemini 3 Flash at $0.50/$3 is the new default Flash model. It retains a free tier with reduced quotas through Google AI Studio, which makes it the only flagship-adjacent model with any free production access in 2026. Gemini 3.1 Flash-Lite at $0.25/$1.50 is Google's cheapest mainstream model, priced just above GPT-5.4 Nano's $0.20/$1.25.

One important April 2026 change: Gemini Pro models are now paid-only. Flash and Flash-Lite retain free tiers but with reduced daily quotas. Google's official pricing documents the full structure including audio and image rates.

Lever one: prompt caching.

Every major provider now offers prompt caching, where repeated context (system prompts, document context, few-shot examples) gets stored and reused at roughly 10% of the standard input rate. The first request pays a small write premium; every subsequent request within the cache TTL pays 90% less for the cached portion.

For a chatbot with a 2,000-token system prompt handling 1,000 requests per hour, the system prompt alone is 2M input tokens per hour, or $6.00 at Sonnet 4.6's $3/MTok. At the cached rate that falls to about $0.60, a saving of roughly $5.40 per hour. For a customer support deployment running 24/7, that's about $3,900 per month in savings on a single model.

The catch: cache TTL is short (5 minutes on most providers, with extensions for paid tiers). Caching only works if your usage pattern actually repeats context within that window. Bursty traffic benefits enormously; sparse traffic gets nothing.
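
A rough way to see why traffic shape matters is to model the hourly cost of the shared prefix with and without a warm cache. The sketch below rests on several assumptions that are not from this article: requests are evenly spaced, each cache hit refreshes the TTL, and cache writes cost 1.25x the standard input rate. The function name and the example workload (a 4,000-token prefix at $3/MTok) are hypothetical.

```python
def prefix_cost_per_hour(prefix_tokens, requests_per_hour, input_rate_per_mtok,
                         ttl_minutes=5.0, read_multiplier=0.10,
                         write_multiplier=1.25):
    """Hourly cost of the repeated prefix with caching enabled.

    Assumes evenly spaced requests and a TTL that refreshes on every hit.
    The 1.25x write premium is an assumption; check your provider's rates.
    """
    per_request = prefix_tokens / 1_000_000 * input_rate_per_mtok
    gap_minutes = 60.0 / requests_per_hour
    if gap_minutes > ttl_minutes:
        # The cache expires between requests, so every request is a cache write.
        return requests_per_hour * per_request * write_multiplier
    # One write warms the cache; the remaining requests are cheap reads.
    reads = requests_per_hour - 1
    return per_request * write_multiplier + reads * per_request * read_multiplier


def uncached_cost_per_hour(prefix_tokens, requests_per_hour, input_rate_per_mtok):
    """Hourly cost of the same prefix at the standard input rate."""
    return requests_per_hour * prefix_tokens / 1_000_000 * input_rate_per_mtok


for label, rph in (("busy (600 req/h)", 600), ("sparse (6 req/h)", 6)):
    cached = prefix_cost_per_hour(4_000, rph, 3.00)
    plain = uncached_cost_per_hour(4_000, rph, 3.00)
    print(f"{label}: ${plain:.4f} uncached vs ${cached:.4f} with caching")
```

Under these assumptions the busy workload pays roughly a tenth of the uncached cost for its prefix, while the sparse workload pays slightly more than it would without caching, because every request triggers a fresh write.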

Lever two: batch processing.

OpenAI's Batch API, Gemini's Batch API, and Anthropic's Message Batches API all offer the same deal: 50% off all token costs in exchange for asynchronous delivery within 24 hours. For any workload that doesn't need real-time response (data classification, document processing, evaluation runs, content generation pipelines), this is one of the biggest cost levers available.

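Batch pricing stacks with prompt caching. A workload that uses both pays roughly 5-10% of the standard rate on its cached input tokens (the 50% batch discount on top of the roughly 10% cached rate); uncached input and output tokens still get the 50% batch discount. With batch processing alone, a content agency running SEO content generation on Haiku 4.5 sees monthly costs drop from $70 to $35 for 30M tokens of throughput.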

The constraint is latency tolerance. If your application needs a response in under 30 seconds, batch isn't an option. If you can wait an hour or a day, it always is.
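
How the two discounts combine is easier to see with numbers. The following is a sketch under stated assumptions: the batch discount is modeled as a flat 0.5 multiplier on the whole bill, cached reads as 0.1x the input rate, and cache-write premiums are ignored. The 20M input / 10M output split is hypothetical, chosen because it reproduces a $70 Haiku 4.5 bill for 30M tokens.

```python
def effective_cost(input_tokens, output_tokens, input_rate, output_rate,
                   cached_fraction=0.0, batch=False,
                   cache_read_multiplier=0.10, batch_multiplier=0.50):
    """Cost in USD after optional caching and batch discounts.

    Assumes the two discounts multiply; confirm with your provider whether
    cached reads inside batch jobs are actually billed this way.
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    input_cost = (fresh + cached * cache_read_multiplier) / 1_000_000 * input_rate
    output_cost = output_tokens / 1_000_000 * output_rate
    total = input_cost + output_cost
    return total * batch_multiplier if batch else total


# Haiku 4.5 at $1/$5 with a hypothetical 20M input / 10M output split
standard = effective_cost(20_000_000, 10_000_000, 1.00, 5.00)
batched = effective_cost(20_000_000, 10_000_000, 1.00, 5.00, batch=True)
both = effective_cost(20_000_000, 10_000_000, 1.00, 5.00,
                      cached_fraction=0.8, batch=True)
print(f"standard ${standard:.2f}, batch ${batched:.2f}, batch + caching ${both:.2f}")
```

Because output tokens dominate this particular bill, adding caching on top of batch moves the total only from $35.00 to about $27.80. The deep discount applies to cached input, not to the whole invoice.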

Lever three: tiered model routing.

The biggest cost lever is also the most underused: most production queries do not need flagship-tier intelligence. A 70/20/10 split (70% routed to budget models like Haiku 4.5 or GPT-5.4 Mini, 20% to balanced models like Sonnet 4.6 or GPT-5.4, 10% to flagships like Opus 4.7 only when reasoning genuinely requires it) typically cuts total API costs by 60-80% versus running everything through a flagship.
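
The 70/20/10 claim can be checked with a blended-rate calculation. This sketch uses the Claude rates quoted in this article and assumes, for simplicity, that queries carry the same token counts regardless of which tier handles them; the names `RATES`, `SPLIT`, and `blended_rate` are illustrative.

```python
# (input, output) USD per 1M tokens, as quoted in this article
RATES = {
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.7": (5.00, 25.00),
}

# Share of traffic routed to each tier
SPLIT = {"Haiku 4.5": 0.70, "Sonnet 4.6": 0.20, "Opus 4.7": 0.10}


def blended_rate(split, rates):
    """Traffic-weighted average (input, output) rate across tiers."""
    input_rate = sum(share * rates[model][0] for model, share in split.items())
    output_rate = sum(share * rates[model][1] for model, share in split.items())
    return input_rate, output_rate


blended_in, blended_out = blended_rate(SPLIT, RATES)
flagship_in, flagship_out = RATES["Opus 4.7"]
reduction = 1 - blended_in / flagship_in
print(f"blended ${blended_in:.2f}/${blended_out:.2f} per MTok, "
      f"{reduction:.0%} below all-Opus")
```

With these rates the blend works out to about $1.80/$9.00 per million tokens, roughly 64% below sending everything to Opus 4.7, which sits inside the 60-80% range.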

Building this routing logic requires either a custom classifier (a cheap LLM that decides which model gets the query) or a rules-based router (regex or keyword matching for query type). Both work. The custom classifier approach is more accurate but adds latency. The rules-based approach is simpler but misses edge cases.
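
A rules-based router can be very small. The sketch below is one hypothetical shape for it: the keyword patterns, the length threshold, and the tier names are all made-up placeholders that a real deployment would tune against its own traffic.

```python
import re

BUDGET, BALANCED, FLAGSHIP = "budget", "balanced", "flagship"

# Hypothetical patterns, for illustration only.
FLAGSHIP_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\bprove\b", r"\brefactor\b", r"\barchitecture\b", r"\broot cause\b")
]
BALANCED_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\bsummar(y|ize|ise)\b", r"\bcompare\b", r"\bdraft\b")
]


def route(query: str, long_query_chars: int = 2000) -> str:
    """Pick a model tier from keyword matches and query length."""
    if any(pattern.search(query) for pattern in FLAGSHIP_PATTERNS):
        return FLAGSHIP
    if len(query) > long_query_chars or any(
        pattern.search(query) for pattern in BALANCED_PATTERNS
    ):
        return BALANCED
    # Anything unmatched falls through to the cheapest tier.
    return BUDGET


print(route("What are your opening hours?"))
print(route("Summarize this contract for me"))
print(route("Find the root cause of this deadlock"))
```

The fall-through default is where the edge cases show up: a hard question phrased without any trigger word lands on the budget tier, which is the weakness a classifier-based router is meant to address.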

Stack all three levers and the math gets dramatic. A workload that costs $60/month at standard Sonnet 4.6 rates can drop to under $10/month with caching, batch processing, and intelligent routing combined. For startups, this is the difference between AI being a line item and AI being a budget problem.

Real workloads, real numbers.

Five concrete examples to ground the calculator. These use standard rates without caching or batch discounts. Apply those levers and divide accordingly.

Customer support chatbot (10,000 conversations per day).

Average conversation: 1,500 input tokens (system prompt + user message + brief context) and 400 output tokens. Monthly volume: 450M input tokens, 120M output tokens.

On Claude Sonnet 4.6: $1,350 input + $1,800 output = $3,150/month. On GPT-5.4 Mini: $337.50 input + $540 output = $877.50/month. On Gemini 3 Flash: $225 input + $360 output = $585/month. Choosing Gemini 3 Flash over Claude Sonnet 4.6 saves $2,565 per month or $30,780 per year for the same workload, assuming both meet your quality bar. Sonnet usually wins on nuanced support, while Flash matches it on routine FAQ.

Code review agent (200 PRs per day).

Average review: 8,000 input tokens (code diff + repo context + style guide) and 2,000 output tokens (review comments). Monthly: 48M input, 12M output.

On Claude Opus 4.7: $240 input + $300 output = $540/month. On GPT-5.4: $120 input + $180 output = $300/month. On Gemini 3.1 Pro: $96 input + $144 output = $240/month. Code review is one of the few cases where flagship models earn their premium — bug detection accuracy on complex code matters more than $300/month in savings. Most engineering teams pick Opus 4.7 here despite the cost.

Document processing pipeline (50,000 docs per day).

Classifying and extracting data: 1,000 input tokens, 200 output tokens per doc. Monthly: 1.5B input, 300M output.

On Claude Haiku 4.5: $1,500 input + $1,500 output = $3,000/month. With Batch API (50% off): $1,500/month. On Gemini Flash-Lite: $375 input + $450 output = $825/month. With Batch: $412.50/month. For pure data extraction, Flash-Lite is dramatically cheaper than Haiku and quality differences are usually negligible. This is the workload type where budget-tier pricing matters most; at its listed rates, GPT-5.4 Nano ($300 input + $375 output = $675/month) comes in lower still.

Sales prospecting agent (1,000 prospects per day).

Per prospect: 3,000 input tokens (company research + signal data + persona profile) and 800 output tokens (personalized email + reasoning trace). Monthly: 90M input, 24M output.

On Claude Sonnet 4.6: $270 + $360 = $630/month. On GPT-5.4 Mini: $67.50 + $108 = $175.50/month. On Gemini 3 Flash: $45 + $72 = $117/month. Sales output quality matters here, but Flash performance is genuinely competitive for templated outreach. The 5x cost gap from Sonnet to Flash adds up to $6,156 saved per year.

Long-document analysis (500 docs per day, 150K tokens each).

Average input: 150K tokens (full document) and 3K output tokens (summary + analysis). Monthly: 2.25B input, 45M output.

On Claude Sonnet 4.6 (1M context standard pricing): $6,750 + $675 = $7,425/month. On GPT-5.4 (under 270K threshold): $5,625 + $675 = $6,300/month. On Gemini 3.1 Pro (under 200K threshold): $4,500 + $540 = $5,040/month. This is where Anthropic's flat 1M-context pricing matters: if your docs spike above 200K (Gemini) or 270K (GPT-5.4), the surcharge applies to the entire request and flips the math. For long-document work, run the calculator carefully.
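
To see the flip concretely, the sketch below prices the same pipeline at two document sizes using the Gemini 3.1 Pro and Sonnet 4.6 rates quoted in this article. The 250K-token case is a hypothetical added for comparison, and treating the threshold as "strictly more than 200,000 input tokens" is an assumption about how the boundary is applied.

```python
def gemini_31_pro_request_cost(input_tokens, output_tokens, threshold=200_000):
    """Per-request cost in USD; the long-context rate applies to the whole request."""
    if input_tokens > threshold:
        input_rate, output_rate = 4.00, 18.00
    else:
        input_rate, output_rate = 2.00, 12.00
    return input_tokens / 1_000_000 * input_rate + output_tokens / 1_000_000 * output_rate


def sonnet_46_request_cost(input_tokens, output_tokens):
    """Per-request cost in USD at flat rates, with no long-context surcharge."""
    return input_tokens / 1_000_000 * 3.00 + output_tokens / 1_000_000 * 15.00


def monthly(cost_fn, input_tokens, output_tokens, docs_per_day=500, days=30):
    """Monthly cost for a pipeline processing docs_per_day documents."""
    return cost_fn(input_tokens, output_tokens) * docs_per_day * days


for doc_tokens in (150_000, 250_000):
    gemini = monthly(gemini_31_pro_request_cost, doc_tokens, 3_000)
    sonnet = monthly(sonnet_46_request_cost, doc_tokens, 3_000)
    print(f"{doc_tokens:,} tokens/doc: Gemini ${gemini:,.0f} vs Sonnet ${sonnet:,.0f}")
```

At 150K tokens per document Gemini comes out at $5,040 against Sonnet's $7,425. At 250K the order reverses, with Gemini at about $15,810 and Sonnet at about $11,925.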

Five questions before you commit.

Cheapest model isn't always the right model. Before locking in a vendor for a production workload, answer these:

How latency-sensitive is the workload? Real-time chat needs sub-second response, which rules out batch processing. Background data pipelines don't, so you can take the 50% batch discount.

How much repeated context do you have? If your system prompt and reference docs are the same across thousands of requests, prompt caching cuts costs by up to 90%. If every request is unique, caching doesn't help.

Where does your context size land? Prompts under 200K stay on standard pricing for all three providers. Above that, Anthropic's Opus 4.6+ and Sonnet 4.6 hold flat rates up to 1M tokens, while Gemini surcharges above 200K and GPT-5.4 above 270K. Haiku 4.5 tops out at a 200K context window.

What's your team's existing tooling? If you're already deep in MCP integrations, Anthropic's ecosystem fits cleanly. If your stack runs on Google Cloud, Gemini through Vertex AI is the path of least friction. If you've built around OpenAI's Functions API, switching costs are real.

How much does quality variance cost you? A flagship model that produces consistently usable output may be cheaper net than a budget model that requires manual review or frequent retries. Calculate fully-loaded cost, not just API cost.
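
The fully-loaded cost in the last question can be approximated with a simple expected-value model. Everything numeric in this sketch is hypothetical: the per-request costs, the usable-output rates, and the $0.50 of review time per failed attempt are placeholders, and the model assumes you retry until an output is usable, with each attempt succeeding independently.

```python
def cost_per_usable_output(api_cost_per_request, usable_rate,
                           review_cost_per_failure=0.0):
    """Expected spend to obtain one usable output, retrying until success.

    With independent attempts, the expected number of attempts is
    1 / usable_rate, so the expected number of failures is that minus one.
    """
    if not 0 < usable_rate <= 1:
        raise ValueError("usable_rate must be in (0, 1]")
    expected_attempts = 1 / usable_rate
    expected_failures = expected_attempts - 1
    return (expected_attempts * api_cost_per_request
            + expected_failures * review_cost_per_failure)


# Hypothetical figures, for illustration only
flagship = cost_per_usable_output(0.09, usable_rate=0.98, review_cost_per_failure=0.50)
budget = cost_per_usable_output(0.02, usable_rate=0.80, review_cost_per_failure=0.50)
print(f"flagship ${flagship:.3f} vs budget ${budget:.3f} per usable output")
```

With these made-up inputs the budget model's lower API price is outweighed by review time on its failures. Different inputs can easily point the other way, which is why the calculation is worth running on your own numbers.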

When to default to each provider.

Default to Anthropic Claude when: coding-heavy work where Opus's SWE-bench leadership matters; long-document workloads where flat 1M-context pricing wins; tasks requiring nuanced writing or instruction-following; teams already on Claude Code or Cowork.

Default to OpenAI ChatGPT when: multimodal work involving DALL-E, voice, or computer use; building public-facing GPTs for the GPT Store; teams that need the broadest plugin and Action ecosystem; budget-tier workloads where GPT-5.4 Nano's $0.20 input is hard to beat.

Default to Google Gemini when: the team lives in Google Workspace; budget is the primary constraint and Flash-Lite meets quality bar; prompts stay under 200K consistently; you need real-time grounding via native search; you want a free tier for prototyping.

Questions people ask.

How much does the Claude API cost in 2026?

Claude Opus 4.7 and Opus 4.6 both cost $5 per million input tokens and $25 per million output tokens. Sonnet 4.6 is $3/$15. Haiku 4.5 is $1/$5. Opus and Sonnet include the full 1M-token context window at standard pricing with no long-context surcharge, which is unique among the three providers; Haiku 4.5 has a 200K window.

How much does the ChatGPT API cost in 2026?

GPT-5.4 costs $2.50 per million input tokens and $15 per million output tokens for prompts under 270K. Above 270K, input doubles to $5/MTok. GPT-5.4 Mini is $0.75/$4.50. GPT-5.4 Nano is $0.20/$1.25. GPT-5.4 Pro is $30/$180 for premium reasoning workloads.

How much does the Gemini API cost in 2026?

Gemini 3.1 Pro is $2/$12 for prompts under 200K, rising to $4/$18 above. Gemini 3 Flash is $0.50/$3. Gemini 3.1 Flash-Lite is $0.25/$1.50, Google's cheapest mainstream model. Gemini 2.5 Pro (legacy) is $1.25/$10. As of April 1, 2026, Pro models are paid-only; Flash and Flash-Lite retain free tier access through AI Studio.

What's the cheapest AI API right now?

For routine production work, GPT-5.4 Nano at $0.20 per million input tokens and $1.25 per million output tokens has the lowest listed rates of any mainstream model from a Tier-1 provider. Gemini 3.1 Flash-Lite at $0.25/$1.50 is comparable and adds a free tier. For flagship-tier reasoning, Gemini 3.1 Pro at $2/$12 is dramatically cheaper than Claude Opus 4.7 at $5/$25.

How are tokens calculated?

A token is roughly 4 characters of English text, or about 0.75 words. A 1,000-word blog post is approximately 1,300 tokens. Each provider uses a slightly different tokenizer; OpenAI offers a free tokenizer tool for exact counts. Note that Claude Opus 4.7 uses a new tokenizer that may produce between 1.0x and 1.35x as many tokens as older Claude models for the same text.
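
For budgeting before you have a real tokenizer in hand, the rule of thumb can be turned into a quick estimate. This sketch only applies the 0.75 words-per-token heuristic and the 1.0x to 1.35x multiplier range mentioned above; the function names are illustrative, and a provider's own tokenizer remains the source of truth.

```python
def estimate_tokens_from_words(word_count, words_per_token=0.75):
    """Rough token estimate for English text from a word count."""
    return round(word_count / words_per_token)


def estimate_token_range(word_count, low_multiplier=1.0, high_multiplier=1.35):
    """Low and high token estimates when the tokenizer may inflate counts."""
    base = estimate_tokens_from_words(word_count)
    return round(base * low_multiplier), round(base * high_multiplier)


low, high = estimate_token_range(1_000)
print(f"1,000 words is roughly {low:,} to {high:,} tokens")
```

For a 1,000-word post this gives about 1,333 tokens at the baseline and about 1,800 at the top of the multiplier range, a spread worth carrying through any cost estimate.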

What is prompt caching and how much can I save?

Prompt caching stores repeated context (system prompts, reference documents, few-shot examples) and reuses it at 10% of the standard input rate. For workloads with long system prompts and repeated context, savings reach 90% on input tokens. All three providers (Anthropic, OpenAI, Google) now offer some form of prompt caching, though the implementation details differ.

What is batch processing?

Batch APIs let you submit requests asynchronously with results delivered within 24 hours, in exchange for a flat 50% discount on all token costs. Available on Anthropic, OpenAI, and Gemini. Best for offline workloads like classification, content generation, evaluations, and ETL pipelines that don't need real-time responses.

Should I use one provider or multiple?

Most production teams in 2026 use multiple providers for different tasks. Common pattern: Claude for coding and long documents, Gemini for high-volume budget workloads and Workspace integration, OpenAI for multimodal work and the GPT Store ecosystem. The 5-15% overhead of managing multiple providers usually pays for itself in cost optimization.

Are these prices going up or down?

Down. Claude Opus dropped from $15/$75 (Opus 4) to $5/$25 (Opus 4.6/4.7), a 67% reduction. GPT-5.4 launched at the same $2.50 input price as GPT-4o despite better capabilities. Budget tiers now start at $0.20-$0.25 per million input tokens (GPT-5.4 Nano and Gemini 3.1 Flash-Lite). The expectation through 2026 is continued downward pricing pressure as compute infrastructure improves.

Pricing sources verified April 21, 2026
