Deterministic Scaffolding for VLM Image Generation
The Pitch
Frontier models like Gemini 3.0 Pro and GPT-5 still cannot natively handle complex spatial tasks such as numbering a 50-step spiral game board (source: samcollins.blog). The Underdrawing Method uses deterministic SVG or Python scripts to create a structural scaffold before any pixels are generated. By separating logic from aesthetics, developers can force 100% accuracy in text and numbering that native one-shot prompting still fails to deliver in May 2026.
Under the Hood
Gemini 3.0 Pro and ChatGPT Images 2 consistently fail to number 50 consecutive items in a spiral when prompted natively (source: samcollins.blog). Asking GPT-5 to number a spiral is currently the quickest way to turn a logic problem into a surrealist painting. The method sidesteps the hallucination with a two-phase workflow: Layer 1 is a deterministic outline generated with SVG or Python, and Layer 2 uses generative image-to-image models to apply textures on top of it (source: samcollins.blog).
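To make Layer 1 concrete, here is a minimal sketch of what a deterministic scaffold script could look like. It is not the code from the original post: the function names (spiral_points, build_scaffold_svg, scaffold_labels), the Archimedean spiral layout, and every size and radius value are assumptions chosen for illustration. The point it shows is that the numbers are placed by a loop, not by a model, so the sequence can be checked before any generative step runs.

```python
import math
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def spiral_points(n=50, turns=3.5, r_min=40.0, r_max=220.0, center=(250.0, 250.0)):
    """Return n (x, y) points on an inward Archimedean spiral (illustrative layout)."""
    cx, cy = center
    points = []
    for i in range(n):
        t = i / max(n - 1, 1)                    # 0.0 at the first step, 1.0 at the last
        radius = r_max - t * (r_max - r_min)     # shrink linearly toward the centre
        angle = t * turns * 2 * math.pi
        points.append((cx + radius * math.cos(angle), cy + radius * math.sin(angle)))
    return points


def build_scaffold_svg(n=50, size=500, cell_radius=11):
    """Build the Layer 1 outline: one circle and one number per step, in order."""
    parts = [
        f'<svg xmlns="{SVG_NS}" width="{size}" height="{size}" viewBox="0 0 {size} {size}">',
        f'<rect width="{size}" height="{size}" fill="white"/>',
    ]
    centre = (size / 2, size / 2)
    for number, (x, y) in enumerate(spiral_points(n, center=centre), start=1):
        parts.append(
            f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{cell_radius}" fill="none" stroke="black"/>'
        )
        parts.append(
            f'<text x="{x:.1f}" y="{y:.1f}" font-size="10" text-anchor="middle" '
            f'dominant-baseline="central">{number}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


def scaffold_labels(svg_text):
    """Parse the SVG back and return the step numbers in document order."""
    root = ET.fromstring(svg_text)
    return [int(el.text) for el in root.iter(f"{{{SVG_NS}}}text")]
```

Writing the result to disk is one line, for example pathlib.Path("scaffold.svg").write_text(build_scaffold_svg()). Layer 2 is deliberately left out of the sketch: the rasterized scaffold would be handed to whichever image-to-image model you use, and that call depends entirely on the provider.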
Research from WACV 2026 suggests that current AI editors correctly fulfill only about 33% of precise editing requests (source: WACV 2026 Paper #2231-2241). That points to a persistent gap in the 2026 stack, one that external geometric constraints are meant to close. The Hacker News community views the method as a sophisticated evolution of early Stable Diffusion img2img workflows, now adapted for VLM reasoning (source: HN comment by vunderba).
Current limitations and unknowns:
- High technical friction requiring knowledge of SVG, Python, or Mermaid.
- Potential "Prompt Neglect" where models ignore descriptive style adjectives (source: HN).
- Increased agentic latency due to the multi-step code-and-vision execution.
- No public library yet exists to automate Layer 1 for non-engineers.
- Performance deltas between Claude 4.5 Opus and Gemini 3.0 Pro are currently undocumented.
Marcus's Take
This is the only viable way to ship production assets involving data visualization or precise spatial layouts in May 2026. If your product relies on GPT-5's intuition to place 50 numbers correctly, you are shipping broken features. It is a cumbersome workflow that increases latency and friction, but until vision models can actually count, you must use it for any project where accuracy is non-negotiable.
Ship clean code,
Marcus.

Marcus Webb - Senior Backend Analyst at UsedBy.ai