LLM Token Counter & Cost Estimator
Count tokens and estimate cost across GPT, Claude, Gemini, and Llama
Start typing to see token counts. Your text is not logged, not stored, and not used for training.
Frequently Asked Questions
What is the LLM Token Counter?
A token counter breaks your prompt into the same tokens the model will actually see, so you can predict how much of the context window you are using and how much the call will cost before sending it. This tool supports GPT-4.1, Claude Opus 4.7, Gemini 2.5, and Llama 3.3 side by side, each counted with that vendor's own tokenizer where one is available.
Why do GPT, Claude, and Gemini report different token counts for the same text?
Each provider ships a different tokenizer: OpenAI uses `o200k_base` (via `tiktoken`), Anthropic uses a proprietary BPE tokenizer, Gemini uses SentencePiece, and Llama 3 uses a `tiktoken`-based BPE tokenizer with a vocabulary of roughly 128K tokens. The token count for the same sentence can differ by 10–30% between models, especially for Chinese, Japanese, and code.
How accurate are the counts?
OpenAI, Claude, and Llama counts are exact because we run the official tokenizers in WebAssembly in your browser. Gemini is estimated from a published character-to-token ratio because Google has not open-sourced the exact tokenizer. For final billing, always trust the `usage` field the API returns.
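As a rough illustration of how a ratio-based estimate like the Gemini one can work, here is a minimal Python sketch. The function name `estimate_tokens` and the value of `ASSUMED_CHARS_PER_TOKEN` are hypothetical placeholders for illustration, not the ratio this tool or Google actually uses.

```python
import math

# Assumed average number of characters per token. This is a placeholder
# chosen for illustration, not a figure published by any vendor.
ASSUMED_CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = ASSUMED_CHARS_PER_TOKEN) -> int:
    """Estimate a token count from character length alone."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    if not text:
        return 0
    # Round up so the estimate errs toward over-counting rather than under-counting.
    return math.ceil(len(text) / chars_per_token)
```

A single ratio will drift for Chinese, Japanese, and code, which is one reason the `usage` field remains the reference for billing.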
Does it count output tokens too?
No. Output tokens exist only after the model responds, so any "output estimator" would be guessing. The tool counts input tokens and prices them at each vendor's published rate per 1M input tokens. For a rough total, you can supply an expected output length.
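The arithmetic behind a rough total can be sketched as follows. This is an illustrative Python sketch, not the tool's actual code: the function name and parameters are hypothetical, and it assumes the output side is priced at a separate per-1M-token rate that you supply.

```python
def estimate_cost_usd(
    input_tokens: int,
    expected_output_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Rough cost of one call: counted input plus a user-supplied output guess."""
    if min(input_tokens, expected_output_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if min(input_price_per_1m, output_price_per_1m) < 0:
        raise ValueError("prices must be non-negative")
    # Prices are quoted per 1M tokens, so scale the counts down first.
    input_cost = input_tokens / 1_000_000 * input_price_per_1m
    output_cost = expected_output_tokens / 1_000_000 * output_price_per_1m
    return input_cost + output_cost


# Hypothetical prices, chosen only to show the arithmetic.
example_total = estimate_cost_usd(12_000, 800, input_price_per_1m=2.00, output_price_per_1m=8.00)
```

With these made-up prices, 12,000 input tokens cost $0.024 and 800 expected output tokens cost $0.0064, for a rough total of about $0.0304.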
Is my prompt sent to any server?
No. All tokenizers run locally via WASM — nothing leaves your browser. That means you can safely paste proprietary prompts, customer data, or unreleased product copy.
How should I use this when planning prompt caching?
Prompt caching on OpenAI and Anthropic bills tokens served from cache at a lower rate, but only for the part of the prompt prefix that is reused exactly. Use the counter to measure your stable system prompt against the per-request user section. If the stable prefix is under ~1K tokens, caching is usually not worth the complexity, and the prefix may fall below the provider's minimum cacheable length (about 1,024 tokens or more, depending on the provider and model).
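To see how the split between stable prefix and per-request text affects input cost, a minimal Python sketch follows. The function names and prices are hypothetical, and the model is deliberately simplified: it assumes the whole prefix is a cache hit and ignores any cache-write surcharge or minimum cacheable length.

```python
def cached_request_cost_usd(
    prefix_tokens: int,
    request_tokens: int,
    input_price_per_1m: float,
    cached_price_per_1m: float,
) -> float:
    """Input cost of one request whose stable prefix is served from cache."""
    return (
        prefix_tokens * cached_price_per_1m + request_tokens * input_price_per_1m
    ) / 1_000_000


def caching_savings_fraction(
    prefix_tokens: int,
    request_tokens: int,
    input_price_per_1m: float,
    cached_price_per_1m: float,
) -> float:
    """Fraction of input cost saved on a cache hit versus no caching at all."""
    uncached = (prefix_tokens + request_tokens) * input_price_per_1m / 1_000_000
    if uncached == 0:
        return 0.0
    cached = cached_request_cost_usd(
        prefix_tokens, request_tokens, input_price_per_1m, cached_price_per_1m
    )
    return 1 - cached / uncached


# Hypothetical numbers: a 4,000-token system prompt and a 1,000-token user section.
example_savings = caching_savings_fraction(4_000, 1_000, 2.00, 0.50)
```

With these made-up prices the cache hit saves about 60% of the input cost. Shrinking the prefix relative to the per-request section shrinks that figure quickly.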