Trend Analysis · 3 min read
Published: March 5, 2026

The Engineering Cost of Plausible Forgery


Marcus Webb
Senior Backend Analyst

The Pitch

Large Language Models function as "forgery engines" that prioritize the generation of plausible-sounding output over the transmission of factual truth (source: Acko.net). Steven Wittens, an ex-Google engineer and creator of Use.GPU, argues that the current reliance on frontier models is facilitating a flood of "code slop" that erodes technical rigor. The critique has gained significant traction on Hacker News because it challenges the narrative that increased reasoning scores equate to increased reliability in production environments.

Under the Hood

Frontier models like GPT-5 and Claude 4 Sonnet have reduced general hallucination rates to approximately 4.8%, yet the "slop" phenomenon remains a structural risk for enterprise codebases (UsedBy Dossier). Senior engineers report that AI agents frequently produce repetitive, overly complex code that avoids necessary refactoring in favor of quick fixes. This trend is exacerbated by "vibe-coders" who prioritize rapid PR generation over long-term maintainability.

The BullshitBench v2, released in March 2026, confirms that even top-tier models like Claude 4.5 Opus struggle with "factual refusal" in specialized domains such as Legal and Medical (AnyAPI.ai). While GPT-5 shows a 40% improvement in reasoning tasks, it still hallucinates fake libraries or non-existent API endpoints between 3% and 12% of the time in production contexts (UsedBy Dossier). This reliability gap forces senior staff into a perpetual state of auditing rather than innovating.
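One practical guard against hallucinated dependencies is a pre-merge check that every import in AI-generated code resolves to something the project actually declares. A minimal sketch in Python, using only the standard library's `ast` module; the allowlist and the flagged package name below are illustrative, not real project data:

```python
import ast

# Illustrative allowlist; in practice, derive this from requirements.txt
# or pyproject.toml. The stdlib set is partial (use sys.stdlib_module_names
# on Python 3.10+ for the full list).
APPROVED = {"requests", "sqlalchemy", "pydantic"}
STDLIB = {"os", "sys", "json", "re", "ast", "typing"}

def unapproved_imports(source: str) -> set[str]:
    """Return top-level imported package names not covered by the allowlist."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found - APPROVED - STDLIB

# "fastkafka_utils" is a made-up name standing in for a hallucinated package.
snippet = "import requests\nfrom fastkafka_utils import consume\n"
print(unapproved_imports(snippet))  # {'fastkafka_utils'}
```

A check like this catches the fake-library failure mode before install time, which also closes off "slopsquatting" attacks where someone registers the package name an LLM keeps inventing.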

The industry's response to this decay is fragmented. Valve updated its Steam AI Disclosure policy in January 2026 to exempt "code helpers" from public labels, even as it tightened requirements for visible assets (GosuGamers). Furthermore, we currently lack any quantitative longitudinal studies on the long-term maintenance costs of AI-authored "slop" compared to human-authored code (UsedBy Dossier). There is also no official word from Microsoft regarding the alleged censorship of the term "Microslop" within developer communities.

We are also seeing early signs of "Mode Collapse," where a narrow consensus on "best practices" suggested by LLMs is stifling alternative architectural problem-solving (HN Comment). This suggests that the current generation of tools may be narrowing the creative scope of backend engineering while simultaneously increasing the volume of mid-tier technical debt.

Marcus's Take

I have spent my career cleaning up after humans; cleaning up after a non-deterministic agent that hallucinates an API endpoint 12% of the time is a special circle of hell. Wittens is correct: we are trading technical debt for "vibe" speed. If your workflow relies on Claude 4 Sonnet to generate architecture without a senior dev reviewing every line against a cold, hard reality check, you aren't building a system—you're hosting a forgery. Use these models for boilerplate generation and regex, but treat every architectural suggestion as a hostile PR that requires 100% test coverage before it ever hits staging.
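In that spirit, even a "trivial" LLM-supplied regex is worth pinning behind explicit positive and negative cases before it lands. A hedged sketch of that audit step; the pattern and test strings are invented for illustration:

```python
import re

# Hypothetical LLM-suggested pattern for bare semantic versions (MAJOR.MINOR.PATCH).
SUGGESTED = r"^\d+\.\d+\.\d+$"

SHOULD_MATCH = ["1.0.0", "12.4.1"]
SHOULD_REJECT = ["1.0", "v1.0.0", "1.0.0-beta"]

def audit_pattern(pattern: str) -> list[str]:
    """Return a list of contract violations; empty means the pattern passed."""
    compiled = re.compile(pattern)
    failures = []
    for s in SHOULD_MATCH:
        if not compiled.fullmatch(s):
            failures.append(f"missed: {s!r}")
    for s in SHOULD_REJECT:
        if compiled.fullmatch(s):
            failures.append(f"overmatched: {s!r}")
    return failures

print(audit_pattern(SUGGESTED))  # []
```

The point is not the regex; it is that the model's output never becomes the spec. The positive and negative cases are the spec, and the suggestion either satisfies them or gets rejected like any other untrusted PR.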


Ship clean code,
Marcus.


Marcus Webb - Senior Backend Analyst at UsedBy.ai
