
LLM API Cost Calculator


Request

  • Input Tokens: tokens sent to the model per call (prompt + context).
  • Output Tokens: tokens the model returns per call (completion).
  • Calls / Day: how many requests you make each day; used for the daily, monthly, and yearly projections.

Model & Pricing Mode

Cost for Selected Model

Side-by-side Model Comparison

Notes

Pricing is based on published list prices per 1M tokens and may differ from your enterprise rate. Batch pricing applies the provider's standard 50% discount (OpenAI, Anthropic, Google). Meta / Llama APIs are priced from common hosted providers and typically do not offer a batch tier.

Guide

LLM API Cost Calculator

Estimate what an LLM API call will actually cost before you deploy it. Enter the input tokens, output tokens, and your daily call volume, pick a model, and this tool shows you the per-call, per-day, per-month, and per-year spend across OpenAI, Anthropic, Google, and Meta models — using the current published list prices per 1M tokens. Great for sanity-checking a quote, comparing providers, or planning a launch budget.
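Under the hood, the per-call math is a linear rate times volume. Here is a minimal sketch of that formula in TypeScript (chosen because the calculator runs client-side); the names ModelPrice and perCallCost are illustrative, not the tool's actual source:

```typescript
// Minimal sketch of the per-call cost formula. Prices are USD per 1M tokens.
// ModelPrice and perCallCost are hypothetical names, not the calculator's code.
interface ModelPrice {
  inputPerMillion: number;  // list price for input tokens
  outputPerMillion: number; // list price for output tokens
}

function perCallCost(
  inputTokens: number,
  outputTokens: number,
  price: ModelPrice
): number {
  return (
    (inputTokens / 1_000_000) * price.inputPerMillion +
    (outputTokens / 1_000_000) * price.outputPerMillion
  );
}

// With illustrative prices of $2.50 / $10.00 per 1M tokens, a call with
// 1,500 input and 500 output tokens costs 0.00375 + 0.005 = $0.00875.
```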

How to Use

  1. Enter the average Input Tokens per request (your prompt plus any context you pass).
  2. Enter the average Output Tokens you expect the model to return.
  3. Enter the number of Calls / Day you expect to make in production.
  4. Pick a Model from the dropdown (OpenAI, Anthropic, Google, or Meta / Llama).
  5. Switch between Real-time pricing and Batch pricing to see the 50% batch discount where providers support it.
  6. Read the per-call, per-day, per-month, and per-year cost summary, then scroll to the comparison table to see what the same workload would cost on every other model.
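If you want to sanity-check the summary rows by hand, the projections in steps 5 and 6 are straight multiplication. A minimal sketch, assuming the 30.44-day monthly average and 50% batch discount described on this page (the helper name projections is hypothetical):

```typescript
// Hypothetical projection helper mirroring the calculator's summary rows.
const DAYS_PER_MONTH = 30.44; // average month: 365.25 days / 12

function projections(costPerCall: number, callsPerDay: number, batchTier: boolean) {
  // The batch tier halves the rate where the provider offers one.
  const rate = batchTier ? costPerCall * 0.5 : costPerCall;
  const daily = rate * callsPerDay;
  return {
    perCall: rate,
    daily,
    monthly: daily * DAYS_PER_MONTH,
    // 365.25 days assumed here to stay consistent with the 30.44-day month.
    yearly: daily * 365.25,
  };
}

// Example: $0.00875 per call at 10,000 calls/day, real-time pricing:
// daily = $87.50, monthly ≈ $2,663.50, yearly ≈ $31,959.38.
```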

Features

  • Multi-provider pricing — OpenAI, Anthropic, Google, and Meta / Llama models in one table.
  • Real-time vs batch toggle — see the 50% batch discount for OpenAI, Anthropic, and Google, and a clear “n/a” where a provider has no batch tier.
  • Per-call, daily, monthly, and yearly projections — projections use the 30.44-day monthly average for a realistic run-rate.
  • Side-by-side model comparison table — see what the same workload costs on every supported model, with your selected model highlighted.
  • Separate input and output pricing — because output tokens are typically 2x to 5x more expensive than input tokens.
  • Zero-server, zero-tracking — all pricing math runs client-side. Your token counts and volumes never leave your browser.

FAQ

  1. What is a token and why do LLMs charge per token?

    A token is a chunk of text the model reads and writes — roughly a word, a sub-word, or a single punctuation mark. English prose averages about four characters per token. LLMs bill per token because compute cost scales with the number of tokens processed: every input token has to be attended to, and every output token is generated one step at a time. Per-token pricing gives a linear, predictable cost model that maps directly onto the work the GPU actually does.
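As a quick sanity check before filling in the token fields, the four-characters-per-token average gives a rough estimate. This is a heuristic only; real tokenizers vary by model and language, so use the provider's own tokenizer for exact counts:

```typescript
// Rough heuristic for English prose: ~4 characters per token.
// For exact counts, use the provider's own tokenizer instead.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// estimateTokens("The quick brown fox jumps over the lazy dog.") === 11
// (a real tokenizer may report slightly fewer or more)
```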

  2. Why are output tokens usually more expensive than input tokens?

    Input tokens are processed in a single parallel forward pass: the model reads the whole prompt in one shot. Output tokens, on the other hand, are generated autoregressively — each new token requires another forward pass over the growing context. That step-by-step generation is more expensive per token, which is why providers typically price output tokens 2x to 5x higher than input tokens.
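    For example, at illustrative list prices of $3 per 1M input tokens and $15 per 1M output tokens (a 5x ratio), a call with 2,000 input and 500 output tokens costs $0.006 on the input side but $0.0075 on the output side: the 500 generated tokens cost more than the 2,000 prompt tokens combined.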

  3. What is batch pricing and when does it make sense?

    Batch pricing lets you submit many requests together and receive the results within a provider-specified window — typically 24 hours at OpenAI, Anthropic, and Google. Because these jobs can be scheduled on off-peak capacity, providers offer a 50% discount on both input and output tokens. Batch is ideal for offline workloads like document enrichment, evaluation runs, embedding backfills, and nightly reports. It is not suitable for anything a user is waiting on, like chat or interactive search.
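    As a rough illustration: an offline enrichment job that would cost $40 a day at real-time rates drops to $20 a day on the batch tier, since the 50% discount applies to both input and output tokens.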

  4. Why does the same token count cost more on bigger models?

    Larger models have more parameters, which means every forward pass requires more compute and more memory bandwidth. A 405-billion-parameter model simply does more arithmetic per token than an 8-billion-parameter one. Providers pass that cost through as a higher per-token price. That is also why a smaller, faster model is often the right answer for simple classification or extraction tasks — you pay less and get a response sooner.

  5. Do list prices reflect what I will actually pay?

    Not always. Published list prices are the starting point, but most providers offer committed-use discounts, enterprise contracts, prepaid credits, and volume tiers that reduce the effective per-token rate. In addition, cached prompts, prompt-compression features, and provider-specific context-caching can lower input costs substantially for repetitive workloads. Treat list-price calculators as an upper bound for planning, then layer your contractual discounts on top.
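    As an illustrative example: a 20% committed-use discount on a $10-per-1M-token output rate yields an effective $8 per 1M, and prompt caching can bill repeated input tokens at a reduced rate on top of that.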
