GPT-5.4: Analysis of 1M Token Context and API Performance

Marcus Webb

Senior Backend Analyst

The Pitch

GPT-5.4 introduces a 1M token context window and native computer-use capabilities through its unified ‘Codex’ intelligence. OpenAI is positioning this as the primary engine for complex agentic workflows, though early developer feedback suggests the performance does not always match the price point.

Under the Hood

GPT-5.4 supports 1M tokens of input with a 128k output ceiling (source: llm-stats.com). Standard API costs are set at $2.50/M input and $15.00/M output (source: OpenAI Pricing via HN/OpenRouter). The model’s native computer-use success rate is 75.0% on the OSWorld-Verified benchmark, which sits above the human baseline (source: Techzine Europe).

However, the architecture suffers from significant operational overhead. Developers report 'Fast mode' latency reaching up to 2 minutes for reasoning-heavy tasks (source: HN Thread). Furthermore, a stealth pricing mechanism applies a 2x input multiplier for any prompt exceeding 272k tokens in specific configurations (source: Cursor/Substack analysis).

The model’s reliability in long-context sessions is currently under scrutiny. Users note a tendency for the model to 'forget' recent instructions or hallucinate that a task is finished when it is not (source: HN / Every.to). This represents a regression in instruction following when compared to Gemini 3.

There are still several gaps in the performance data. We do not know yet how GPT-5.4 compares to Claude 4.6 Opus on SWE-bench Pro, as those benchmarks are expected late March 2026. Additionally, OpenAI has not clarified why its 'Ask ChatGPT' integration is currently unable to summarize its own launch URL.

1M token input/128k output (Source: llm-stats.com)
75.0% success rate on OSWorld (Source: Techzine Europe)
Integrated tool-search reduces token use by 47% (Source: Tom's Guide)
Knowledge cutoff: August 31, 2025 (Source: OpenAI)
Standard pricing: $2.50/M input, $15.00/M output (Source: HN)

Marcus's Take

GPT-5.4 is technically proficient but operationally fragile for production use. The 2-minute latency spikes and stealth pricing for long-context windows make it a liability for real-time agentic systems. While the 1M context window is a significant utility on paper, the state management failures and instruction 'forgetting' mean it cannot be trusted for mission-critical deployments. Stick to Claude 4 Sonnet for reliability or Gemini 2.5 for speed; skip GPT-5.4 until OpenAI solves the latency and pricing transparency issues.

Ship clean code,
Marcus.

Marcus Webb

Marcus Webb - Senior Backend Analyst at UsedBy.ai

Trend Analysis·3 min read

Audiomass: Multitrack Audio Editing via 100kb of Vanilla JavaScript

Audiomass is a browser-based, multitrack audio editor that operates entirely client-side with a remarkably small 100kb footprint (audiomass.co). It provides a workflow reminiscent of classic editors l

Trend Analysis·3 min read

Magnifica Humanitas: The Vatican’s Framework for the GPT-5 Era

The document, signed May 15 and officially released today, was presented at the Vatican alongside Christopher Olah, co-founder of Anthropic and lead of its interpretability team (ncronline.org, Forbes

Trend Analysis·3 min read

The Zero-Click Economy: Kagi Search vs. Google AI Mode

Google has effectively pivoted to an "answer engine" where Gemini 3.5 Flash provides conversational summaries, while Kagi remains the primary refuge for users seeking a human-centric, ad-free index. W

Stay Ahead of AI Adoption Trends

Get our latest reports and insights delivered to your inbox. No spam, just data.

The Pitch

Under the Hood

Marcus's Take

Related Articles

Audiomass: Multitrack Audio Editing via 100kb of Vanilla JavaScript

Magnifica Humanitas: The Vatican’s Framework for the GPT-5 Era

The Zero-Click Economy: Kagi Search vs. Google AI Mode

Stay Ahead of AI Adoption Trends