Skip to main content
UsedBy.ai
All articles
Trend Analysis3 min read
Published: May 20, 2026

Gemini 3.5 Flash Performance and Pricing Analysis

The model architecture is optimised for the new TPU 8i inference chips, delivering output speeds between 280 and 455 tokens per second (Source: Towards AI). This makes it significantly faster than Ope

Marcus Webb
Marcus Webb
Senior Backend Analyst

The Pitch

Google released Gemini 3.5 Flash yesterday at I/O 2026, positioning it as an agent-first model that prioritises speed over pure parameter density. It is currently being integrated by 156 organisations, including Samsung, Snap, and Discord, to handle high-throughput inference workloads (UsedBy Dossier). See Gemini profile

Under the Hood

The model architecture is optimised for the new TPU 8i inference chips, delivering output speeds between 280 and 455 tokens per second (Source: Towards AI). This makes it significantly faster than OpenAI’s GPT-5, while maintaining a massive 1M token context window and a 65,536 token output limit (Source: Google Dev Documentation).

Google claims the model outperforms Claude 4.5 Opus in agentic benchmarks, scoring 83.6% on the MCP Atlas (Source: Google Blog). However, the "Flash" branding is now misleading regarding operational costs. Pricing has spiked to $1.50 per 1M input tokens and $9.00 per 1M output tokens (Source: OpenRouter).

This represents a 3x to 6x increase over the previous Gemini 3.1 Flash tier (Source: Reddit r/singularity). Reliability remains a significant concern for production deployments. The model currently exhibits a 61% hallucination rate on the AA-Omniscience benchmark (Source: Artificial Analysis).

Technical inconsistencies are already surfacing in the wild. Users report visual logic failures during SVG generation and a new safety layer that refuses legitimate security code reviews (Source: Reddit r/Bard). Most embarrassing is the "Listen to Article" feature on the official launch post, which contains hallucinated Russian dialogue (Source: HN).

We still do not know the total parameter count for the model, as Google maintains its policy of architectural confidentiality (Source: HN). Furthermore, while a Pro tier was announced for next month, specific pricing for Gemini 3.5 Pro remains unconfirmed (Source: Simon Willison).

Marcus's Take

Skip Gemini 3.5 Flash for customer-facing UI until the hallucination rate drops below 40%. While the 455 tps speed is useful for internal data pipelines, the price hike effectively kills the "cheap and fast" niche that the Flash line previously occupied. If you are paying $9.00 per megatoken for output, you shouldn't have to worry about your model suddenly speaking Russian in your official blog audio.


Ship clean code,
Marcus.

Marcus Webb
Marcus Webb

Marcus Webb - Senior Backend Analyst at UsedBy.ai

Related Articles

Stay Ahead of AI Adoption Trends

Get our latest reports and insights delivered to your inbox. No spam, just data.