Gemini 3.5 Flash Performance and Pricing Analysis

Marcus Webb

Senior Backend Analyst

The Pitch

Google released Gemini 3.5 Flash yesterday at I/O 2026, positioning it as an agent-first model that prioritises speed over pure parameter density. It is currently being integrated by 156 organisations, including Samsung, Snap, and Discord, to handle high-throughput inference workloads (UsedBy Dossier).

Under the Hood

The model is optimised for Google's new TPU 8i inference chips, delivering output speeds of 280 to 455 tokens per second (Source: Towards AI). That makes it significantly faster than OpenAI's GPT-5, while it retains a 1M-token context window and a 65,536-token output limit (Source: Google Dev Documentation).
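For capacity planning, these figures translate into a pre-flight check and a decode-time estimate. The Python below is a minimal sketch that hard-codes the reported numbers as constants. The function names are hypothetical, it treats the 1M window as a prompt-side limit (an assumption, since the article does not say whether output tokens count against it), and it ignores time-to-first-token and network overhead.

```python
# Reported figures from the article; treat them as assumptions, not guarantees.
CONTEXT_WINDOW = 1_000_000    # "1M" context window, taken literally as 1,000,000 tokens
MAX_OUTPUT_TOKENS = 65_536    # per-response output limit
TPS_LOW, TPS_HIGH = 280, 455  # reported output speed range, tokens per second


def check_request(prompt_tokens: int, max_output_tokens: int) -> None:
    """Hypothetical pre-flight check: raise ValueError if a request breaks a reported limit.

    Assumes the 1M window applies to the prompt; whether output also counts is not stated.
    """
    if prompt_tokens > CONTEXT_WINDOW:
        raise ValueError(f"prompt of {prompt_tokens} tokens exceeds the {CONTEXT_WINDOW}-token window")
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError(f"{max_output_tokens} output tokens exceeds the {MAX_OUTPUT_TOKENS}-token cap")


def decode_seconds(output_tokens: int) -> tuple[float, float]:
    """Return (fastest, slowest) pure generation time, ignoring time-to-first-token."""
    return output_tokens / TPS_HIGH, output_tokens / TPS_LOW


if __name__ == "__main__":
    check_request(prompt_tokens=200_000, max_output_tokens=MAX_OUTPUT_TOKENS)
    fastest, slowest = decode_seconds(MAX_OUTPUT_TOKENS)
    print(f"full-length response: {fastest:.0f}s to {slowest:.0f}s")  # 144s to 234s
```

Even at the top of the reported range, a maximum-length 65,536-token response takes roughly 144 seconds of pure decoding (about 234 seconds at the low end), so the output cap matters for latency budgets as much as the context window does.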

Google claims the model outperforms Claude 4.5 Opus in agentic benchmarks, scoring 83.6% on the MCP Atlas (Source: Google Blog). However, the "Flash" branding is now misleading as far as operating costs go. Pricing has spiked to $1.50 per 1M input tokens and $9.00 per 1M output tokens (Source: OpenRouter), a 3x to 6x increase over the previous Gemini 3.1 Flash tier (Source: Reddit r/singularity).

Reliability remains a significant concern for production deployments. The model currently exhibits a 61% hallucination rate on the AA-Omniscience benchmark (Source: Artificial Analysis).
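To put the OpenRouter rates quoted above into per-request terms, here is a minimal cost sketch. The rates are the reported $1.50 and $9.00 per 1M tokens; the function names and the example workload (10,000 input and 2,000 output tokens per request) are hypothetical, and the sketch ignores any discounts or surcharges a given provider may apply.

```python
# Per-1M-token rates as reported via OpenRouter; check current pricing before relying on them.
INPUT_USD_PER_M = 1.50
OUTPUT_USD_PER_M = 9.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in US dollars at the reported list rates."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


def workload_cost_usd(requests: int, avg_input: int, avg_output: int) -> float:
    """Cost of a workload, assuming (for simplicity) every request has the same size."""
    return requests * request_cost_usd(avg_input, avg_output)


if __name__ == "__main__":
    # Hypothetical workload: 10k prompt tokens and 2k completion tokens per request.
    per_request = request_cost_usd(10_000, 2_000)
    per_million = workload_cost_usd(1_000_000, 10_000, 2_000)
    print(f"${per_request:.3f} per request, ${per_million:,.0f} per million requests")
```

In this example the 2,000 output tokens cost more than the 10,000 input tokens ($0.018 vs $0.015), which is why output-heavy workloads are hit hardest by the $9.00 rate.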

Technical inconsistencies are already surfacing in the wild. Users report visual logic failures during SVG generation and a new safety layer that refuses legitimate security code reviews (Source: Reddit r/Bard). Most embarrassing of all, the "Listen to Article" audio on the official launch post contains hallucinated Russian dialogue (Source: HN).

We still do not know the total parameter count for the model, as Google maintains its policy of architectural confidentiality (Source: HN). Furthermore, while a Pro tier was announced for next month, specific pricing for Gemini 3.5 Pro remains unconfirmed (Source: Simon Willison).

Marcus's Take

Skip Gemini 3.5 Flash for customer-facing UI until its hallucination rate drops below 40%. The top-end 455 tps throughput is useful for internal data pipelines, but the price hike effectively kills the "cheap and fast" niche the Flash line used to occupy. If you are paying $9.00 per million output tokens, you shouldn't have to worry about your model suddenly speaking Russian in your official blog audio.
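One way to hold yourself to this rule is to encode it as an explicit deployment gate, so the decision changes only when the measured rate does. This is a hypothetical sketch: the surface names, the `ModelReport` type, and the model label are invented for illustration, and the 40% threshold is Marcus's recommendation rather than any published standard.

```python
from dataclasses import dataclass

# Marcus's threshold for customer-facing use; an editorial recommendation, not a standard.
HALLUCINATION_CEILING = 0.40


@dataclass(frozen=True)
class ModelReport:
    name: str                  # hypothetical label, e.g. "gemini-3.5-flash"
    hallucination_rate: float  # benchmark result as a fraction, e.g. from AA-Omniscience


def allowed_surfaces(report: ModelReport) -> set[str]:
    """Return the (hypothetical) deployment surfaces this model may be used on."""
    surfaces = {"internal_pipeline"}  # speed-sensitive work that no customer sees
    if report.hallucination_rate < HALLUCINATION_CEILING:
        surfaces.add("customer_ui")
    return surfaces


if __name__ == "__main__":
    flash = ModelReport(name="gemini-3.5-flash", hallucination_rate=0.61)
    print(sorted(allowed_surfaces(flash)))  # ['internal_pipeline']
```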

Ship clean code,
Marcus.

Marcus Webb - Senior Backend Analyst at UsedBy.ai