
LLM API Cost Calculator


Request

  • Input Tokens: tokens sent to the model per call (prompt + context).
  • Output Tokens: tokens the model returns per call (completion).
  • Calls / Day: how many requests you make each day; used for the daily, monthly, and yearly projections.

Model & Pricing Mode

Cost for Selected Model

Side-by-side Model Comparison

Notes

Pricing is based on published list prices per 1M tokens and may differ from your enterprise rate. Batch pricing applies the provider's standard 50% discount (OpenAI, Anthropic, Google). Meta / Llama APIs are priced from common hosted providers and typically do not offer a batch tier.

Guide

LLM API Cost Calculator

Estimate what an LLM API call will actually cost before you deploy it. Enter the input tokens, output tokens, and your daily call volume, pick a model, and this tool shows you the per-call, per-day, per-month, and per-year spend across OpenAI, Anthropic, Google, and Meta models — using the current published list prices per 1M tokens. Great for sanity-checking a quote, comparing providers, or planning a launch budget.
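Under the hood, the per-call math is a linear rate times volume. Here is a minimal sketch of that formula in TypeScript (chosen because the calculator runs client-side); the names ModelPrice and perCallCost are illustrative, not the tool's actual source:

```typescript
// Minimal sketch of the per-call cost formula. Prices are USD per 1M tokens.
// ModelPrice and perCallCost are hypothetical names, not the calculator's code.
interface ModelPrice {
  inputPerMillion: number;  // list price for input tokens
  outputPerMillion: number; // list price for output tokens
}

function perCallCost(
  inputTokens: number,
  outputTokens: number,
  price: ModelPrice
): number {
  return (
    (inputTokens / 1_000_000) * price.inputPerMillion +
    (outputTokens / 1_000_000) * price.outputPerMillion
  );
}

// With illustrative prices of $2.50 / $10.00 per 1M tokens, a call with
// 1,500 input and 500 output tokens costs 0.00375 + 0.005 = $0.00875.
```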

How to Use

  1. Enter the average Input Tokens per request (your prompt plus any context you pass).
  2. Enter the average Output Tokens you expect the model to return.
  3. Enter the number of Calls / Day you expect to make in production.
  4. Pick a Model from the dropdown (OpenAI, Anthropic, Google, or Meta / Llama).
  5. Switch between Real-time pricing and Batch pricing to see the 50% batch discount where providers support it.
  6. Read the per-call, per-day, per-month, and per-year cost summary, then scroll to the comparison table to see what the same workload would cost on every other model.
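If you want to sanity-check the summary rows by hand, the projections in steps 5 and 6 are straight multiplication. A minimal sketch, assuming the 30.44-day monthly average and 50% batch discount described on this page (the helper name projections is hypothetical):

```typescript
// Hypothetical projection helper mirroring the calculator's summary rows.
const DAYS_PER_MONTH = 30.44; // average month: 365.25 days / 12

function projections(costPerCall: number, callsPerDay: number, batchTier: boolean) {
  // The batch tier halves the rate where the provider offers one.
  const rate = batchTier ? costPerCall * 0.5 : costPerCall;
  const daily = rate * callsPerDay;
  return {
    perCall: rate,
    daily,
    monthly: daily * DAYS_PER_MONTH,
    // 365.25 days assumed here to stay consistent with the 30.44-day month.
    yearly: daily * 365.25,
  };
}

// Example: $0.00875 per call at 10,000 calls/day, real-time pricing:
// daily = $87.50, monthly ≈ $2,663.50, yearly ≈ $31,959.38.
```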

Features

  • Multi-provider pricing — OpenAI, Anthropic, Google, and Meta / Llama models in one table.
  • Real-time vs batch toggle — see the 50% batch discount for OpenAI, Anthropic, and Google, and a clear “n/a” where a provider has no batch tier.
  • Per-call, daily, monthly, and yearly projections — projections use the 30.44-day monthly average for a realistic run-rate.
  • Side-by-side model comparison table — see what the same workload costs on every supported model, with your selected model highlighted.
  • Separate input and output pricing — because output tokens are typically 2x to 5x more expensive than input tokens.
  • Zero-server, zero-tracking — all pricing math runs client-side. Your token counts and volumes never leave your browser.

FAQ

  1. What is a token and why do LLMs charge per token?

    A token is a chunk of text the model reads and writes — roughly a word, a sub-word, or a single punctuation mark. English prose averages about four characters per token. LLMs bill per token because compute cost scales with the number of tokens processed: every input token has to be attended to, and every output token is generated one step at a time. Per-token pricing gives a linear, predictable cost model that maps directly onto the work the GPU actually does.
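As a quick sanity check before filling in the token fields, the four-characters-per-token average gives a rough estimate. This is a heuristic only; real tokenizers vary by model and language, so use the provider's own tokenizer for exact counts:

```typescript
// Rough heuristic for English prose: ~4 characters per token.
// For exact counts, use the provider's own tokenizer instead.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// estimateTokens("The quick brown fox jumps over the lazy dog.") === 11
// (a real tokenizer may report slightly fewer or more)
```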

  2. Why are output tokens usually more expensive than input tokens?

    Input tokens are processed in a single parallel forward pass: the model reads the whole prompt in one shot. Output tokens, on the other hand, are generated autoregressively — each new token requires another forward pass over the growing context. That step-by-step generation is more expensive per token, which is why providers typically price output tokens 2x to 5x higher than input tokens.
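    For example, at illustrative list prices of $3 per 1M input tokens and $15 per 1M output tokens (a 5x ratio), a call with 2,000 input and 500 output tokens costs $0.006 on the input side but $0.0075 on the output side: the 500 generated tokens cost more than the 2,000 prompt tokens combined.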

  3. What is batch pricing and when does it make sense?

    Batch pricing lets you submit many requests together and receive the results within a provider-specified window — typically 24 hours at OpenAI, Anthropic, and Google. Because these jobs can be scheduled on off-peak capacity, providers offer a 50% discount on both input and output tokens. Batch is ideal for offline workloads like document enrichment, evaluation runs, embedding backfills, and nightly reports. It is not suitable for anything a user is waiting on, like chat or interactive search.
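    As a rough illustration: an offline enrichment job that would cost $40 a day at real-time rates drops to $20 a day on the batch tier, since the 50% discount applies to both input and output tokens.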

  4. Why does the same token count cost more on bigger models?

    Larger models have more parameters, which means every forward pass requires more compute and more memory bandwidth. A 405-billion-parameter model simply does more arithmetic per token than an 8-billion-parameter one. Providers pass that cost through as a higher per-token price. That is also why a smaller, faster model is often the right answer for simple classification or extraction tasks — you pay less and get a response sooner.

  5. Do list prices reflect what I will actually pay?

    Not always. Published list prices are the starting point, but most providers offer committed-use discounts, enterprise contracts, prepaid credits, and volume tiers that reduce the effective per-token rate. In addition, cached prompts, prompt-compression features, and provider-specific context-caching can lower input costs substantially for repetitive workloads. Treat list-price calculators as an upper bound for planning, then layer your contractual discounts on top.
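    As an illustrative example: a 20% committed-use discount on a $10-per-1M-token output rate yields an effective $8 per 1M, and prompt caching can bill repeated input tokens at a reduced rate on top of that.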
