nCPU: Simulating ARM64 Logic via GPU-Based Neural Networks

The Pitch
nCPU is a 64-bit ARM64 implementation where every ALU operation is a trained neural network running entirely on the GPU. By replacing physical transistors with tensor operations, it achieves 100% digital logic accuracy in a purely virtualized environment (GitHub). The project has gained traction on Hacker News for successfully running legacy software like DOOM (1993) without using the host CPU for arithmetic.
Under the Hood
The architecture relies on a Kogge-Stone parallel-prefix adder with a trained carry-combine network to guarantee bitwise precision (GitHub). The result is a peculiar inversion of traditional performance intuition: multiplication runs roughly 12x faster than addition. Even a parallel-prefix adder needs log2(64) = 6 dependent combine stages, whereas the neural byte-pair Look-Up Table (LUT) multiplier has zero sequential dependency at all (GitHub).
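The inversion is easy to see in a plain-Python sketch (my own illustration, not nCPU's code): the Kogge-Stone adder below still chains log2(width) combine stages, each depending on the last, while the byte-pair multiply is nothing but mutually independent table lookups that a GPU can dispatch in one parallel step.

```python
def kogge_stone_add(a: int, b: int, width: int = 64) -> int:
    """Parallel-prefix addition: log2(width) combine stages,
    each one depending on the previous (the sequential part)."""
    mask = (1 << width) - 1
    p0 = a ^ b                       # per-bit propagate (saved for the sum)
    g, p = a & b, p0                 # per-bit generate / propagate
    shift = 1
    while shift < width:             # 6 dependent stages for width = 64
        g = (g | (p & (g << shift))) & mask
        p = (p & (p << shift)) & mask
        shift <<= 1
    return (p0 ^ (g << 1)) & mask    # sum = propagate XOR incoming carries

# 256x256 product table: a stand-in for the trained "neural LUT"
MUL_LUT = [[i * j for j in range(256)] for i in range(256)]

def lut_mul(a: int, b: int, width: int = 64) -> int:
    """Byte-pair multiply: every lookup is independent of every other,
    so all 64 byte pairs can be resolved in parallel on a GPU."""
    mask = (1 << width) - 1
    nbytes = width // 8
    acc = 0
    for i in range(nbytes):
        for j in range(nbytes):
            ai = (a >> (8 * i)) & 0xFF
            bj = (b >> (8 * j)) & 0xFF
            acc += MUL_LUT[ai][bj] << (8 * (i + j))
    return acc & mask
```

On real silicon the lookup tables would be absurdly expensive; on a GPU, where a texture fetch is cheap and carry chains are not, the trade flips.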
Register files, addresses, and data paths are emulated entirely through GPU memory textures (Ecosistema Startup). This is enough to run a DOOM engine recompiled from x86 to ARM at 60 FPS, but raw throughput tops out at roughly 5,000 instructions per second (Ecosistema Startup/HN Thread), orders of magnitude slower than even the low-power RISC-V chips common in 2026.
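nCPU's actual texture layout is not documented in detail, but the idea is straightforward. A minimal stdlib-only sketch, with a flat byte buffer standing in for a GPU texture and a hypothetical 31-row layout for the AArch64 general-purpose registers:

```python
# Hypothetical layout: 31 general-purpose 64-bit registers (X0-X30)
# packed row by row into a 31x8 byte grid, the way pixel rows would
# be packed into a texture. A bytearray stands in for GPU memory.
NUM_REGS, REG_BYTES = 31, 8
texture = bytearray(NUM_REGS * REG_BYTES)

def write_reg(idx: int, value: int) -> None:
    """Write one register = one 8-byte 'texture row', little-endian."""
    texture[idx * REG_BYTES:(idx + 1) * REG_BYTES] = value.to_bytes(REG_BYTES, "little")

def read_reg(idx: int) -> int:
    """Read one register back out of its texture row."""
    row = texture[idx * REG_BYTES:(idx + 1) * REG_BYTES]
    return int.from_bytes(row, "little")

write_reg(5, 0xDEADBEEFCAFEBABE)   # store into X5
```

The same row-per-word scheme applied to main memory is what drives the VRAM overhead discussed below: every emulated byte of RAM occupies texture storage on the card.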
There are significant technical hurdles regarding latency and resource allocation. Each cycle currently takes between 136 and 262 microseconds, which is far too slow for general computing (GitHub). Mapping system RAM to GPU textures also creates massive VRAM overhead, effectively capping the available addressable space for complex applications (Developer Documentation).
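Those two latency figures are consistent with the reported throughput; a back-of-envelope check (my arithmetic, not a project benchmark):

```python
# One instruction per cycle, cycle latency in microseconds.
for cycle_us in (136, 262):
    ips = 1_000_000 / cycle_us
    print(f"{cycle_us} us/cycle -> {ips:,.0f} instructions/s")
# 136 us gives ~7,353 IPS and 262 us gives ~3,817 IPS,
# neatly bracketing the reported ~5,000 IPS figure.
```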
It is still unclear how nCPU fares on power and thermals, or whether it can scale to modern SIMD and vector extensions like SVE or Neon. There is also a risk of precision drift when inference runs on FP16 or BF16 tensor cores rather than full FP32 (Ecosistema Startup). At 5k IPS, your GPU is essentially a very expensive, very hot 1970s mainframe.
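The precision concern is concrete. FP16 carries an 11-bit significand, so integers above 2048 stop being exactly representable (BF16, with only 8 significand bits, loses exactness above 256) — fatal for logic that must be bit-exact. A stdlib-only demonstration using the struct module's half-precision format code:

```python
import struct

def roundtrip_fp16(x: float) -> float:
    """Encode to IEEE 754 half precision ('e' format) and decode back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

assert roundtrip_fp16(2048.0) == 2048.0  # still exact
assert roundtrip_fp16(2049.0) == 2048.0  # silently rounded: drift
```

Any LUT entry or intermediate activation that drifts like this breaks the "100% digital logic accuracy" guarantee, which is presumably why the question keeps coming up in the HN thread.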
Marcus's Take
nCPU is a brilliant piece of research, but it is not a production tool. It proves that neural networks can reliably mimic deterministic digital logic, which is a significant academic milestone for 2026. However, the latency and power trade-offs make it useless for anything beyond niche logic verification or academic curiosity. Play with the GitHub repo to understand the Kogge-Stone implementation, then go back to writing code for actual silicon.
Ship clean code,
Marcus.

Marcus Webb - Senior Backend Analyst at UsedBy.ai