AI Tools

OpenAI API Pricing 2026: GPT-5.5, GPT-4o, o3, o4-mini Token Rates and Full Breakdown

Complete OpenAI API pricing guide for 2026 -- exact token rates for GPT-5.5, GPT-4o, GPT-4o-mini, o3, o4-mini, embeddings, Whisper, and DALL-E, plus batch discounts, prompt caching savings, and how OpenAI compares to Anthropic and Mistral.

Victor OgonyoVictor Ogonyo
·2026-05-25·15 min read

OpenAI API (platform.openai.com) pricing in 2026 spans the widest model range of any AI provider -- from GPT-4o Mini at $0.15 per million input tokens to GPT-5.5 at $5.00, reasoning models, audio transcription, embeddings, and image generation. This guide covers every model, every rate, every discount, and what you actually pay to build applications on OpenAI.


OpenAI API Pricing at a Glance

Language Models -- Per Million Tokens (Input / Output)

ModelInputOutputContext
GPT-4o Mini$0.15$0.60128K
GPT-5.4 Nano$0.10$0.50128K
GPT-5.4 Mini$0.30$1.50128K
GPT-4o$2.50$10.00128K
GPT-5.4$2.50$15.00128K
GPT-5.5$5.00$30.001M
GPT-5.5 Pro$30.00$180.001M

Reasoning Models -- Per Million Tokens

ModelInputOutputNotes
o4-mini$0.55$2.20Best-value reasoning
o3-mini$1.10$4.40Entry reasoning
o3$2.00$8.00Flagship reasoning

Embeddings -- Per Million Input Tokens

ModelStandardBatch
text-embedding-3-small$0.02$0.01
text-embedding-3-large$0.13$0.065

Audio

ServicePrice
Whisper (transcription)$0.006/minute
GPT-4o Transcribe$0.006/minute
GPT-4o Mini Transcribe$0.003/minute

Image Generation

SizeStandardHD
1024×1024$0.04/image$0.08/image
1024×1792$0.08/image$0.12/image

OpenAI Language Models: Full Breakdown

GPT-4o Mini -- $0.15/$0.60

GPT-4o Mini is the workhorse model for high-volume, lower-complexity tasks. At $0.15 per million input tokens, it is one of the cheapest capable models from any major provider -- cheaper than Claude Haiku 4.5 ($1.00/M) and competitive with Mistral Small ($0.10/M).

What GPT-4o Mini handles well: Customer support routing, simple Q&A, classification, content moderation, structured data extraction, summarisation, and any application where throughput and cost matter more than frontier reasoning.

Cached input rate: $0.075/M (50% off with prompt caching).

Best for: High-volume consumer applications, classification pipelines, chatbots, and any workload running millions of requests per month.

GPT-5.4 Series -- $0.10 to $2.50 Input

The GPT-5.4 family covers three capability tiers at very different price points:

GPT-5.4 Nano -- $0.10/$0.50: Ultra-cheap, fast, suitable for the simplest tasks. Competes directly with Mistral Nemo at $0.02/$0.04 -- Mistral is cheaper but GPT-5.4 Nano benefits from OpenAI's broader ecosystem and tooling.

GPT-5.4 Mini -- $0.30/$1.50: Mid-tier between Nano and the full GPT-5.4. Useful for tasks that need more capability than Nano but don't justify the full $2.50 input rate.

GPT-5.4 -- $2.50/$15.00: The primary production model for applications needing strong reasoning, complex writing, and reliable structured outputs. Competing directly with Claude Sonnet 4.6 ($3.00/$15.00) at the mid-tier.

GPT-5.5 -- $5.00/$30.00

GPT-5.5 is OpenAI's current flagship model with a 1M token context window.

What GPT-5.5 offers:

  • 1M token context window (2x GPT-5.4's 128K and equal to Anthropic's extended context)
  • OpenAI's best current reasoning and instruction-following capability
  • Extended thinking capability (GPT-5.5 Pro)
  • 2x input pricing for prompts exceeding 272K tokens

Cost at scale: GPT-5.5 at $5.00/$30.00 is among the most expensive models per token. At 1M output tokens, the cost is $30 -- compared to $6 for Mistral Large 2 and $25 for Claude Opus 4.7. For applications where GPT-5.5's quality advantage justifies the cost, prompt caching (reducing input to $0.50/M for cached content) is essential.

GPT-5.5 Pro -- $10.00/$60.00: The extended reasoning variant of GPT-5.5. At $60/M output tokens, this is the most expensive model available from any major provider. Reserved for genuinely hard problems where reasoning chain depth matters.

GPT-4o -- $2.50/$10.00

GPT-4o remains available and is a strong choice for applications that were built around it and don't yet need GPT-5.x capabilities. Its $10.00/M output rate is significantly cheaper than GPT-5.5 ($30.00/M) for output-heavy workloads.


Reasoning Models: o3, o3-mini, o4-mini

OpenAI's o-series models use chain-of-thought reasoning internally before generating a response. They produce better results on multi-step logic, mathematics, and complex code -- but at higher cost because internal reasoning tokens are billed as output tokens.

The Reasoning Token Billing Trap

A prompt that generates a 500-token visible response might consume 8,000 internal reasoning tokens before producing it. All 8,500 tokens are billed as output. This means a single o3 request generating a "short" answer can cost significantly more than expected.

Always estimate reasoning token overhead before deploying o-series models at scale. For tasks that don't require multi-step reasoning, GPT-5.4 or GPT-4o will be cheaper with equal or better results.

o4-mini -- $0.55/$2.20

o4-mini is the best-value reasoning model. At $0.55/$2.20 per million tokens it delivers chain-of-thought reasoning capability at a fraction of o3's cost. For most reasoning use cases -- coding problems, structured analysis, step-by-step problem solving -- o4-mini matches or approaches o3 quality at 4x lower cost.

Best for: Mathematical reasoning, complex code debugging, multi-step analysis, and applications where chain-of-thought reasoning improves output quality but cost is still a constraint.

o3-mini -- $1.10/$4.40

o3-mini sits between o4-mini and full o3. It offers stronger reasoning than o4-mini for genuinely complex problems while remaining cheaper than o3.

o3 -- $2.00/$8.00

o3 is OpenAI's flagship reasoning model. At $2.00/$8.00 per million tokens it is the most capable reasoning model available, outperforming o3-mini and o4-mini on the hardest problems. The high output rate compounds with reasoning token overhead -- reserve o3 for tasks where the quality difference is measurable.

Best for: Advanced scientific reasoning, frontier code generation, complex multi-constraint problems where quality improvement justifies the cost.


Discounts: Batch API and Prompt Caching

Batch API -- 50% Off

The OpenAI Batch API processes requests asynchronously and returns results within 24 hours. In exchange, you pay 50% of the standard rate on both input and output.

Batch API rates (per million tokens):

ModelBatch InputBatch Output
GPT-4o Mini$0.075$0.30
GPT-4o$1.25$5.00
GPT-5.4$1.25$7.50
GPT-5.5$2.50$15.00
text-embedding-3-small$0.01

Best use cases: Nightly data processing, document analysis pipelines, evaluation datasets, content generation at scale, any workload that doesn't need real-time results.

Prompt Caching -- 90% Off Cached Input

For applications with a repeated system prompt or knowledge base, prompt caching reduces the input cost of cached tokens to 10% of the standard rate.

Cached input rates:

ModelStandard InputCached InputSaving
GPT-4o Mini$0.15$0.07550%
GPT-4o$2.50$0.2590%
GPT-5.4$2.50$0.2590%
GPT-5.5$5.00$0.5090%

Example: A customer support application using GPT-5.4 with a 10,000-token system prompt running 5,000 requests/day:

  • Without caching: 10,000 × 5,000 × $2.50/M = $125/day
  • With caching: First request + 4,999 × $0.25/M × 10,000 = $12.50/day
  • Saving: 90%, or ~$3,375/month

Stacking Both Discounts

Batch API and prompt caching can be combined for up to 75% total cost reduction:

  • GPT-5.4 standard input: $2.50/M
  • GPT-5.4 batch: $1.25/M (50% off)
  • GPT-5.4 batch + cached: $0.625/M (75% off total)

Embeddings: text-embedding-3

OpenAI's embedding models convert text into vector representations for semantic search, retrieval-augmented generation (RAG), and similarity comparison.

text-embedding-3-small -- $0.02/M

The default choice for most embedding applications. At $0.02 per million tokens, embedding a 1,000-word document costs less than $0.00003. Produces 1,536-dimensional vectors with strong performance on standard retrieval benchmarks.

text-embedding-3-large -- $0.13/M

6.5x more expensive than small. Produces 3,072-dimensional vectors with higher semantic precision. Recommended only for applications where retrieval quality is the primary bottleneck and the additional cost is justified by measurable quality improvement.

For most RAG applications: text-embedding-3-small is the right default. Benchmark your specific retrieval task before paying the 6.5x premium for large.


Audio and Transcription

Whisper -- $0.006/Minute

Whisper is OpenAI's audio transcription model. At $0.006 per minute, transcribing one hour of audio costs $0.36 -- a fraction of most commercial transcription services.

GPT-4o Transcribe offers the same $0.006/minute rate with potentially higher accuracy on accents, technical terminology, and poor audio quality.

GPT-4o Mini Transcribe at $0.003/minute is half the cost for simpler transcription needs.

Use cases: Meeting transcription, podcast processing, voice-to-text applications, audio search indexing.


Free Tier and Credits

OpenAI's free API access situation is uncertain in 2026:

  • New accounts may receive $5 in free credits with a 3-month expiry (no card required)
  • Some reports indicate free trial credits were discontinued mid-2025
  • A highly rate-limited free tier (3 requests/minute) may still exist for GPT-4o Mini

Developer credit programmes:

  • OpenAI for Startups: $2,500 credits (via participating VC partner firms)
  • OpenAI Grove programme: $50,000 credits (5-week San Francisco cohort)
  • Codex Open Source Fund: $25,000 credits (open source maintainers)
  • Founder Stack via Ramp: $5,000 credits (corporate card holders)

OpenAI API vs Anthropic Claude (anthropic.com) vs Mistral (mistral.ai): Full Comparison

Budget Models

ModelInput (per 1M)Output (per 1M)
Mistral Nemo$0.02$0.04
GPT-5.4 Nano$0.10$0.50
Mistral Small 3.1$0.10$0.30
GPT-4o Mini$0.15$0.60
Claude Haiku 4.5$1.00$5.00

Mistral wins the cheapest tier. GPT-4o Mini is the second cheapest and benefits from OpenAI's mature tooling and ecosystem. Claude Haiku is the most expensive budget model.

Frontier Models

ModelInput (per 1M)Output (per 1M)
Mistral Large 2$2.00$6.00
GPT-4o$2.50$10.00
Claude Sonnet 4.6$3.00$15.00
Claude Opus 4.7$5.00$25.00
GPT-5.5$5.00$30.00

Mistral Large 2 is the cheapest frontier model. GPT-5.5 and Claude Opus 4.7 are the most expensive -- reserved for applications where their quality advantage is measurable.


Which OpenAI Model Should You Use?

Choose GPT-4o Mini ($0.15/$0.60) if:

  • You are running millions of requests per month and cost is the primary constraint
  • Tasks are relatively simple: classification, summarisation, customer support routing
  • You want the cheapest capable OpenAI model

Choose GPT-5.4 ($2.50/$15.00) if:

  • You need strong reasoning and writing quality for production applications
  • You want the primary OpenAI production model at a balanced price point
  • Your application involves complex writing, code generation, or structured analysis

Choose GPT-5.5 ($5.00/$30.00) if:

  • You need the 1M token context window for very long documents
  • Maximum instruction-following quality is required
  • Cost is secondary to capability

Choose o4-mini ($0.55/$2.20) if:

  • Your application benefits from chain-of-thought reasoning
  • You need multi-step problem solving, math, or complex code at reasonable cost
  • You want reasoning capability without o3's pricing

Choose o3 ($2.00/$8.00) if:

  • The task genuinely requires frontier reasoning -- advanced math, complex logic chains, difficult code
  • Quality improvement over o4-mini is measurable in your specific use case

Frequently Asked Questions

How much does the OpenAI API cost in 2026? GPT-4o Mini costs $0.15/$0.60 per million tokens. GPT-4o costs $2.50/$10.00. GPT-5.4 costs $2.50/$15.00. GPT-5.5 costs $5.00/$30.00. Reasoning models: o4-mini $0.55/$2.20, o3 $2.00/$8.00.

Is the OpenAI API free? OpenAI may offer $5 in new-user credits (reports vary). A heavily rate-limited free tier exists but is not suitable for production. Most developers need a paid account with a credit card attached.

What is prompt caching on the OpenAI API? Prompt caching stores repeated context (system prompts, documents) so subsequent requests pay 10% of the standard input rate for the cached portion -- a 90% saving on those tokens.

What is the OpenAI Batch API? The Batch API processes requests asynchronously (results within 24 hours) at 50% off standard rates on both input and output.

How do o-series reasoning models billing work? Reasoning models (o3, o3-mini, o4-mini) think through problems internally before responding. All internal reasoning tokens are billed as output tokens -- even if the visible response is short. Budget for 5--20x the visible output token count when estimating o-series costs.

Is GPT-4o still available in 2026? Yes -- GPT-4o remains available at $2.50/$10.00 per million tokens. It is a stable, well-tested model suitable for applications that don't yet require GPT-5.x capabilities.

How does OpenAI API pricing compare to Claude API? GPT-4o Mini ($0.15/$0.60) is significantly cheaper than Claude Haiku 4.5 ($1.00/$5.00). GPT-5.4 ($2.50/$15.00) is slightly cheaper than Claude Sonnet 4.6 ($3.00/$15.00) on input. Claude Opus 4.7 ($5.00/$25.00) has lower output cost than GPT-5.5 ($5.00/$30.00).


Building an AI startup? List it on Startup Launch Page and reach investors and early adopters actively looking for what you're building.

Building something great?

List your startup on Startup Launch Page -- reach real investors, founders, and early adopters.

Launch your startup →
← Back to Blog