UsedBy.ai
Trend Analysis · 3 min read
Published: March 4, 2026

nCPU: Simulating ARM64 Logic via GPU-Based Neural Networks


Marcus Webb
Senior Backend Analyst

The Pitch

nCPU is a 64-bit ARM64 implementation where every ALU operation is a trained neural network running entirely on the GPU. By replacing physical transistors with tensor operations, it achieves 100% digital logic accuracy in a purely virtualized environment (GitHub). The project has gained traction on Hacker News for successfully running legacy software like DOOM (1993) without using the host CPU for arithmetic.
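The repo's exact network layout isn't reproduced here, but the core idea — an ALU operation expressed as nothing but tensor operations — can be sketched minimally. In this hypothetical numpy example, a 4-bit add becomes a one-hot lookup into a result table via matrix multiplication, the kind of operation a GPU executes natively; `neural_add`, `BITS`, and the table layout are illustrative assumptions, not nCPU's actual code.

```python
import numpy as np

# Hypothetical sketch: a 4-bit adder as pure tensor ops, standing in
# for the GPU-resident networks nCPU uses in place of ALU logic.
BITS = 4
N = 1 << BITS  # 16 possible operand values

# "Weight" table enumerating every (a, b) -> (a + b) mod 16 result.
table = np.fromfunction(lambda a, b: (a + b) % N, (N, N), dtype=np.int64)

def one_hot(x, n):
    v = np.zeros(n)
    v[x] = 1.0
    return v

def neural_add(a, b):
    # e_a^T @ table @ e_b selects table[a][b] using only matrix math --
    # no arithmetic is performed on the operands themselves.
    return int(one_hot(a, N) @ table @ one_hot(b, N))

print(neural_add(9, 8))  # (9 + 8) mod 16 = 1
```

A trained network replaces the explicit table with learned weights, but the execution model — operands in, tensor contraction, result out — is the same.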

Under the Hood

The architecture relies on a Kogge-Stone parallel-prefix adder whose carry-combine step is a trained network, ensuring bitwise precision (GitHub). This setup inverts traditional performance intuition: multiplication is 12x faster than addition. The neural Look-Up Table (LUT) byte-pair implementation used for multiplication has zero sequential dependency, whereas addition's carry combine — unlike the fully independent LUT lookups — still requires a chain of dependent rounds (GitHub).
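Kogge-Stone itself is a classic hardware algorithm. A minimal software sketch of the carry-combine recurrence — the step nCPU replaces with a trained network — makes the sequential dependency visible: each of the log2(width) rounds depends on the previous one, which is exactly what the independent byte-pair LUT lookups for multiplication avoid.

```python
def kogge_stone_add(a, b, width=64):
    """Add two unsigned ints via Kogge-Stone parallel-prefix carries."""
    mask = (1 << width) - 1
    g = a & b   # generate: this bit produces a carry on its own
    p = a ^ b   # propagate: this bit passes an incoming carry along
    # log2(width) carry-combine rounds; each round depends on the last,
    # so the chain cannot be flattened the way a LUT lookup can.
    d = 1
    while d < width:
        g = g | (p & (g << d))
        p = p & (p << d)
        d <<= 1
    carries = (g << 1) & mask
    return ((a ^ b) ^ carries) & mask

print(kogge_stone_add(123456789, 987654321))  # 1111111110
```

Six rounds suffice for 64 bits, versus 64 dependent steps for ripple-carry — fast for silicon, but still sequential enough to lose to a dependency-free table lookup in nCPU's model.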

Register files, addresses, and data paths are emulated entirely through GPU memory textures (Ecosistema Startup). While this allows for the execution of a full x86-to-ARM recompiled DOOM engine at 60 FPS, the raw throughput is limited to roughly 5,000 instructions per second (Ecosistema Startup/HN Thread). This makes the system orders of magnitude slower than even the low-power RISC-V chips common in 2026.
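The texture format nCPU actually uses isn't documented in the sources above, but the general shape of a texture-backed register file can be sketched. In this hypothetical example, each 64-bit ARM64 register occupies one row of eight byte-sized "texels" in a 2D array; `write_reg`, `read_reg`, and the byte-per-texel layout are assumptions for illustration.

```python
import numpy as np

# Hypothetical sketch: an ARM64 register file laid out as a GPU-style
# 2D texture -- 32 rows (x0-x30 plus xzr), 8 byte-texels per row.
NUM_REGS = 32
regfile = np.zeros((NUM_REGS, 8), dtype=np.uint8)

def write_reg(idx, value):
    if idx == 31:  # xzr: ARM64's zero register discards writes
        return
    regfile[idx] = np.frombuffer(int(value).to_bytes(8, "little"),
                                 dtype=np.uint8)

def read_reg(idx):
    if idx == 31:  # xzr always reads as zero
        return 0
    return int.from_bytes(regfile[idx].tobytes(), "little")

write_reg(5, 0xDEADBEEF)
print(hex(read_reg(5)))  # 0xdeadbeef
```

The real cost shows up when this scheme is extended to main memory: every byte of guest RAM needs a texel, which is the VRAM overhead problem discussed below.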

There are significant technical hurdles regarding latency and resource allocation. Each cycle currently takes between 136 and 262 microseconds, which is far too slow for general computing (GitHub). Mapping system RAM to GPU textures also creates massive VRAM overhead, effectively capping the available addressable space for complex applications (Developer Documentation).
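The cited cycle times line up with the throughput figure directly — a quick back-of-envelope check:

```python
# Cycle times of 136-262 microseconds bound the instruction rate.
for cycle_us in (136, 262):
    ips = 1_000_000 / cycle_us
    print(f"{cycle_us} us/cycle -> ~{ips:,.0f} instructions/s")
# 136 us -> ~7,353 IPS; 262 us -> ~3,817 IPS,
# bracketing the reported ~5,000 IPS figure.
```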

We don't know yet how nCPU handles thermal efficiency or if it can scale to modern SIMD and vector instructions like SVE or Neon. Furthermore, there is a risk of precision drift when running on FP16 or BF16 tensor cores rather than full FP32 (Ecosistema Startup). At 5k IPS, your GPU is essentially a very expensive, very hot 1970s mainframe.
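The FP16 concern is easy to demonstrate in isolation: float16's 10-bit mantissa stops representing integers exactly above 2048, so any learned LUT whose outputs or intermediate activations cross that range can silently drift when the same weights run on half-precision tensor cores. This is a general property of the format, not a measurement of nCPU itself.

```python
import numpy as np

# float16 spacing above 2048 is 2, so 2049 is not representable.
print(float(np.float16(2048)))  # 2048.0
print(float(np.float16(2049)))  # 2048.0 -- rounds to nearest even
print(float(np.float32(2049)))  # 2049.0 -- FP32 is still exact here
```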

Marcus's Take

nCPU is a brilliant piece of research, but it is not a production tool. It proves that neural networks can reliably mimic deterministic digital logic, which is a significant academic milestone for 2026. However, the latency and power trade-offs make it useless for anything beyond niche logic verification or academic curiosity. Play with the GitHub repo to understand the Kogge-Stone implementation, then go back to writing code for actual silicon.


Ship clean code,
Marcus.

Marcus Webb - Senior Backend Analyst at UsedBy.ai
