Numerical Regression in Apple MLX on A18 Pro Silicon

Marcus Webb

Senior Backend Analyst

The Pitch

Apple’s MLX framework is designed to optimise local inference by allowing LLMs to run directly on the Neural Engine and GPU with unified memory access. Despite the iPhone 16 Pro Max being marketed as the definitive hardware for Apple Intelligence, developers are currently grappling with severe output degradation on the A18 Pro chip. (Source: GitHub ml-explore)

Under the Hood

The A18 Pro chip produces "gibberish" outputs when executing local LLM workloads via MLX, a flaw not present in the older iPhone 15 Pro or the current iPhone 17. (Source: GitHub ml-explore).
Detailed debugging reveals that these numerical inconsistencies are tied to floating-point (FP16) instability within the silicon's handling of tensor operations. (Source: Aditya Karnam/GitConnected).
Quantised 4-bit (Q4_K_M) models offer slightly more determinism, but they fail to fully circumvent the underlying hardware issues. (Source: Aditya Karnam/GitConnected).

Related regressions have appeared in CoreML tensor layouts and pointer arithmetic since the iOS 26.1 Beta. (Source: Apple Developer Forums #8007).
This suggests a significant misalignment between the Metal compiler and the A18 Pro’s specific NPU/GPU architecture.
The issue is reported as persistent across various LLM architectures, including Stable Diffusion UNet layers. (Source: Rafael Costa Blog).

We do not know yet if this is a fixable kernel bug in iOS 26.x or a physical hardware defect requiring a recall.
Apple has not issued an official statement regarding the percentage of affected A18 Pro units.
While some users on Hacker News claim it affects only a minority of devices, the lack of a deterministic fix is stalling local AI deployment.

Marcus's Take

The A18 Pro is a liability for any backend engineer looking to move LLM inference to the edge. If your application handles medical or financial data where "order of magnitude" errors equate to catastrophic failure, avoid this device entirely.
The fact that the iPhone 17 and M4 Max chips remain unaffected suggests the 16 Pro Max is a silicon outlier with a broken numerical pipeline.
Move your production targets to the iPhone 17 or M-series iPads until Apple admits whether this requires a firmware throttle or a replacement programme.

Ship clean code,
Marcus.

Marcus Webb

Marcus Webb - Senior Backend Analyst at UsedBy.ai

Trend Analysis·3 min read

The Linux Kernel ‘Copy Fail’ and the Argument for Software Abstinence

CVE-2026-31431 is a deterministic Linux kernel Local Privilege Escalation (LPE) affecting nearly every major distribution released since 2017 (Source: Palo Alto Networks). Infrastructure authority Xe

Trend Analysis·3 min read

Cloudflare’s Agentic Restructuring and the 20% Workforce Cut

Cloudflare has announced a 20% reduction in its global workforce, citing a pivot to "agentic AI" as the primary driver for operational efficiency. While management claims internal AI agent usage incre

Trend Analysis·3 min read

Instructure’s Canvas LMS crippled by nationwide outage and data breach during finals week

Canvas is the dominant Learning Management System (LMS) used by major institutions to centralize curriculum and satisfy ADA accessibility requirements. It is currently the focus of intense scrutiny as

Stay Ahead of AI Adoption Trends

Get our latest reports and insights delivered to your inbox. No spam, just data.

The Pitch

Under the Hood

Marcus's Take

Related Articles

The Linux Kernel ‘Copy Fail’ and the Argument for Software Abstinence

Cloudflare’s Agentic Restructuring and the 20% Workforce Cut

Instructure’s Canvas LMS crippled by nationwide outage and data breach during finals week

Stay Ahead of AI Adoption Trends