Free AI Token Counter: count tokens for Claude, ChatGPT & Gemini.
Paste your prompt, see the token count across 12 models in real time. Runs in your browser. Your data never leaves your device.
Paste your prompt.
Live token counts across 12 models. Works offline. Never leaves your browser.
Need to calculate the full cost at scale?
The counter above shows per-prompt tokens. For volume estimates at Hobby, Startup, Growth, or Enterprise scale with input and output costs side by side, use the Free AI Cost Calculator.
Open the cost calculator →Every API call to an LLM is billed by token count. Not by character, not by word, by token. The exact number determines what you pay, whether the prompt fits the context window, and how fast the response comes back. If you are building anything on top of Claude, ChatGPT, or Gemini in 2026, understanding your token count is the difference between a cost-effective product and a runaway bill.
The tool above gives you that count for twelve models across three tokenizer families. It runs entirely in your browser using calibrated heuristics per model. The official OpenAI tokenizer (tiktoken), Anthropic's tokenizer, and Google's SentencePiece each produce slightly different counts for the same text. This tool approximates all three.
The rest of this page explains how tokenization actually works, where the major tokenizers differ, and how to reduce token usage without losing prompt quality. Skip to the sections you need.
What a token actually is.
A token is the smallest unit of text a language model processes. It is not a character and not a word. It sits somewhere in between. For English text, one token is roughly four characters or about three-quarters of a word. For code, numbers, and non-Latin scripts, the ratios shift significantly.
The reason tokens exist at all is computational. Models cannot handle raw characters efficiently because the English vocabulary is too long for practical processing. Models cannot handle full words either because the vocabulary would need to contain millions of entries to cover every conjugation, plural, and compound. Tokens are the compromise. A fixed vocabulary of typically 50,000 to 200,000 subword units covers all real-world text with minimal storage.
The process of turning text into tokens is called tokenization. It happens before the model sees your input and after the model generates its output. When the model decides "dog" is the next token, it emits a number that the tokenizer converts back into the string "dog" before you see it. Every step of generation is a token decision.
How tokenizers split text.
All three major tokenizers use some form of Byte Pair Encoding (BPE). The algorithm starts with individual characters and iteratively merges the most common pairs. After millions of merges on training data, common sequences like "the", "ing", and "tion" become single tokens. Rare sequences remain as multiple tokens.
This creates some intuitive behavior and some counterintuitive behavior. Common words get one token each. Rare words split into multiple tokens. Numbers are almost always expensive because long digit sequences are rare in training data. Non-English scripts can be extremely expensive because the tokenizers were trained primarily on English.
For an illustrative example: the English sentence "The tokenizer splits this sentence" is eight tokens with most tokenizers, even though it has six words. The word "tokenizer" splits into "token" plus "izer" because "tokenizer" itself was not common enough in training data to warrant a single token. "Sentence" splits into "sen" plus "tence" for the same reason.
Claude tokenization.
Claude uses a proprietary BPE-based tokenizer. Anthropic has not published the full vocabulary but has released the official count_tokens API endpoint for exact counts. For estimation, the rule of thumb is 3.5 characters per token for English prose, 3 for code, and 2 for numeric content.
Claude tokenization tends to be slightly more granular than GPT tokenization, which means Claude will generally report more tokens for the same input. The practical impact is small for short prompts but can matter at scale. A million characters of English text produces roughly 285,000 Claude tokens versus 250,000 GPT tokens.
For exact counts in production code, use the Anthropic SDK's messages.countTokens method. It is free and does not consume your API rate limit. Call it before calling the actual generation endpoint if you need precise cost forecasting or strict context window enforcement.
GPT tokenization.
OpenAI uses tiktoken, an open-source BPE tokenizer with several encoding variants. GPT-3.5 and GPT-4 use cl100k_base with a 100,000 token vocabulary. GPT-4o and newer models use o200k_base with 200,000 tokens, handling non-Latin scripts and emoji far more efficiently than the older encoding.
The rule of thumb for GPT is 4 characters per token for English prose. The OpenAI tokenizer demo page shows the exact breakdown for any input text. For JavaScript and TypeScript code, js-tiktoken provides the official tokenizer in a format that runs in browsers.
Chat completions add a small per-message overhead. Each message in the messages array consumes an extra 3 to 4 tokens for role metadata, message boundaries, and formatting. Over long conversations this adds up. A 50-turn conversation might have 150 to 200 tokens of pure structural overhead on top of the content tokens.
Gemini tokenization.
Google uses SentencePiece, a different subword tokenization algorithm that handles multilingual text more efficiently than BPE. For English, the rule of thumb is 4.2 characters per token, slightly better than GPT and noticeably better than Claude.
Gemini exposes a dedicated countTokens endpoint both in the Gemini API and Vertex AI. Like Anthropic's endpoint, it is free and does not count against quotas. For long document analysis, call countTokens before the generation call to confirm the prompt fits within the context window, especially for multi-megabyte inputs.
Where Gemini tokenization really shines is multilingual content. CJK languages (Chinese, Japanese, Korean), Arabic, and Cyrillic scripts tokenize far more efficiently with SentencePiece than with tiktoken's default encodings. A document in Japanese might use 40 percent fewer tokens with Gemini than with GPT or Claude.
Tokens are the cost. Tools are the capability.
Once you understand token costs, the next lever is what the model can actually do. MCP servers let AI models interact with your stack. Our full 50-server guide ranks the best MCP options for B2B teams.
Read the MCP servers guide →The three levers for cutting tokens.
Reducing token usage is the single highest-ROI optimization for any production AI application. Three strategies stack for combined savings of up to 95 percent on production workloads, and they work across all three major providers.
Lever 1: Prompt caching.
Anthropic prompt caching reduces cached input tokens to 10 percent of the standard rate. OpenAI offers similar discounts on cached inputs. Google's context caching works the same way. The rule: if the same prefix appears in multiple API calls (a long system prompt, a large document, a set of tool definitions), mark it as cacheable and pay once.
The best candidates for caching are long system prompts over 1,000 tokens, fixed document context that multiple queries reference, tool schemas that do not change between calls, and few-shot example sets. Typical savings: 70 to 90 percent on applications with structured prompts.
Lever 2: Batch processing.
If your workload tolerates up to 24 hours of latency, batch processing provides a flat 50 percent discount on both input and output tokens. Anthropic and Google offer similar batch discounts. The use case is offline jobs: classification, summarization of archives, data enrichment, ETL-style operations, and evaluation runs.
The decision is operational: can the result wait until tomorrow? If yes, batch. If the user is waiting on a screen, do not batch. For asynchronous workloads, batch cuts the bill in half with essentially zero code changes.
Lever 3: Tiered routing.
Not every task needs a flagship model. Simple classification, extraction, and formatting tasks run fine on Haiku, GPT-Nano, or Flash-Lite at one tenth the cost. Hard reasoning tasks need Opus, GPT Pro, or Gemini Pro. The pattern is called tiered routing: an orchestrator decides which model to call based on task complexity.
A typical split is 70 percent of calls to the cheapest model, 20 percent to the mid-tier, and 10 percent to the flagship. Done well, this cuts 60 to 80 percent of cost with minimal quality loss. The tradeoff is engineering effort to build the router, which matters less as traffic scales.
Stacked together, the three levers can take a $60 workload down to $8.50, an 86 percent reduction. The math is multiplicative: half-off from batch, times tenth-off from caching, times tiered routing on top.
Questions people ask.
What is a token in AI models?
A token is the fundamental unit AI language models use to process text. Tokens are roughly 4 characters of English text or about 0.75 words. Common words are usually one token; rare words split into multiple tokens. Models charge by token, so token count directly determines API cost.
How accurate is this token counter?
The counter uses the characters-per-token heuristic calibrated per model family: 3.5 for Claude, 4.0 for GPT and Llama, 4.2 for Gemini. Accuracy for English text is typically within 5 to 10 percent of the official tokenizer count. For exact counts in production, use each vendor's official SDK: tiktoken for OpenAI, count_tokens for Anthropic, or countTokens for Gemini.
Why do different models report different token counts for the same text?
Each model family uses a different tokenizer trained on different data. OpenAI uses tiktoken with cl100k_base or o200k_base encoding. Anthropic uses a proprietary BPE variant. Google uses SentencePiece. The same 140-character sentence might be 40 tokens in Claude, 35 in GPT, and 33 in Gemini.
Does this counter send my data to a server?
No. All tokenization happens in your browser using JavaScript. Your prompts never leave your device. This matters for confidential prompts containing customer data, trade secrets, or any sensitive information.
Why are output tokens more expensive than input tokens?
Output tokens require sequential generation, meaning the model computes each token one at a time with full attention over all previous tokens. This is far more compute-intensive than processing input tokens in parallel. Most providers price output at 4 to 6 times the input rate.
How do I count tokens for a long document?
Paste the full document into the counter. For PDFs, extract text first using pypdf or pdfplumber, then paste. For images in vision prompts, the count is fixed per image size regardless of content: 85 tokens for a low-detail 512x512 image, 170 tokens per tile for high-detail mode.
What is the cheapest way to reduce token usage?
Three levers stack for up to 95 percent savings. First, use prompt caching to reuse repeated context (up to 90 percent off on cached tokens). Second, use batch processing for async workloads (flat 50 percent off). Third, route simple tasks to cheap models like Haiku, GPT-Nano, or Flash-Lite.
Can I count tokens for chat messages including system prompts?
Yes. Paste the full message content including system instructions, tool definitions, and conversation history. Remember that chat message overhead adds roughly 3 to 4 tokens per message turn for role metadata, which matters over long conversations.
What is the context window?
The context window is the maximum number of tokens a model can process in a single request, combining both input and output. If your input plus expected output exceeds the context window, the API will return an error or truncate. Claude Opus supports 1 million tokens, GPT-5.4 supports 270K (with 1.05M on Pro), Gemini 3.1 Pro supports 2 million.
Official tokenizer documentation
- OpenAI tiktoken (GitHub)
- OpenAI tokenizer demo page
- Anthropic: Token counting for Claude
- Google: Gemini API tokens documentation
- Google SentencePiece (GitHub)
- js-tiktoken for browser use
- Byte Pair Encoding (Wikipedia)
- OpenAI API pricing
- Anthropic Claude pricing
- Google Gemini API pricing
- Anthropic prompt caching
- OpenAI batch API
- Gemini context caching
Counting tokens is the start. Agents are the leverage.
Every token you spend on a human-written prompt is a token you could have saved with a pre-built agent that knows your workflow. The Vault is 50 of those, tuned for B2B sales.
Get the Vault $99.99
แสดงความคิดเห็น: