DAY 04 · THE PRETENDERS

Nine challengers that tried to beat CUDA. Here's where they actually stand.

AMD, Intel, Apple, Google, Amazon, Huawei, a cohort of Chinese national champions, a wafer-scale moonshot, and an inference-only speedster. Each one has billions of dollars and a credible team. None of them has unseated NVIDIA. Today we walk the lineup honestly — what's clever, what's broken, who's closest, and the one threat that genuinely keeps Jensen up at night.

The pattern to watch for

Almost every challenger has shipped respectable silicon, and the pattern is always the same: hardware lands at ~50–80% of NVIDIA's performance, and then the software story ranges from "limping" to "promising." Every chip below has the same structural problem — the libraries either don't exist, don't run third-party code, or run it 2–5× slower than CUDA.
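That gap compounds. A chip at 80% of NVIDIA's raw specs whose kernels run 2× slower delivers well under half the effective throughput. A toy calculation with illustrative numbers (not benchmarks):

```python
def effective_throughput(spec_ratio: float, software_slowdown: float) -> float:
    """Effective throughput relative to NVIDIA, combining hardware and software.

    spec_ratio:        raw hardware perf relative to NVIDIA (0.8 = 80% of specs)
    software_slowdown: how much slower key kernels run vs. CUDA (2.0 = 2x slower)
    """
    return spec_ratio / software_slowdown

# A challenger at 80% of NVIDIA's specs whose kernels run 2x slower than CUDA:
print(effective_throughput(0.8, 2.0))   # -> 0.4, i.e. 40% of NVIDIA's effective perf

# Same hardware at the bottom of the 2-5x software range:
print(effective_throughput(0.8, 5.0))   # roughly 0.16 -- under a sixth
```

Multiplying the two ratios is the whole point: competitive silicon with uncompetitive kernels loses more ground in software than it ever gained in hardware.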

A useful rubric

For each contender ask three questions: (1) Does it run modern PyTorch out of the box? (2) Can the open-source AI community ship to it without a separate fork? (3) Is the chip available in real volume? Almost everyone answers "no" to at least one.
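The rubric reduces to a three-flag checklist. A minimal sketch (the scoring logic mirrors the three questions above; the example inputs are illustrative, not measured data):

```python
def rubric_verdict(pytorch_oob: bool, no_fork_needed: bool, real_volume: bool) -> str:
    """Score a contender on the three rubric questions.

    pytorch_oob:    runs modern PyTorch out of the box
    no_fork_needed: the open-source AI community can ship to it without a fork
    real_volume:    the chip is available in real volume
    """
    misses = sum(not flag for flag in (pytorch_oob, no_fork_needed, real_volume))
    if misses == 0:
        return "credible CUDA challenger"
    return f"{misses} structural gap(s): not a drop-in alternative"

# Illustrative scoring, following the article's framing:
print(rubric_verdict(True, True, True))    # the position only NVIDIA holds
print(rubric_verdict(True, False, True))   # e.g. works, but needs a separate fork
```

A single "no" is enough to demote a contender, which is why the lineup below looks the way it does.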

The lineup

AMD · Instinct MI300X / MI325X · ROCm
THREAT · MEDIUM
Hardware
Genuinely competitive on FLOPs and memory (192 GB HBM3 on MI300X). Often beats H100 on raw specs.
Software
ROCm — AMD's CUDA clone. Functional, ~80% feature parity, half the performance on key kernels.
Ecosystem
PyTorch supports it. vLLM has a fork. Most papers don't test it. Open-source tools either lag or break on every release.
Reality
Meta and Microsoft have ordered MI300X in volume — partly to negotiate NVIDIA price, partly because supply is tight.
"The closest competitor by a mile, and still not close enough. Catching up requires a five-year, billion-dollar software-only sprint AMD has never quite committed to."
Intel · Gaudi 2 / Gaudi 3 · SynapseAI
THREAT · LOW
Hardware
Originally designed by Habana Labs (acquired 2019). Decent training silicon, very competitive perf-per-dollar on paper.
Software
SynapseAI stack. Functional, but small. PyTorch support exists; few teams use it outside benchmarks.
Ecosystem
Almost none. Most ML researchers have never run a script on a Gaudi.
Reality
Intel announced "Falcon Shores" as the successor — then de-scoped it, then renamed it. As of 2026, AI accelerators are not Intel's strong suit.
"Engineering credible. Strategy unclear. The least likely Big Tech competitor to actually break through."
Google · TPU v5p / v6 · XLA / JAX
THREAT · HIGH (but contained)
Hardware
The most successful non-NVIDIA AI chip ever made. Google has been training internal models on TPUs for a decade. Excellent at scale.
Software
XLA + JAX. World-class. Different programming model than PyTorch, but mature and beloved by parts of the research community.
Ecosystem
Google-only. You rent TPUs by the hour from GCP. Cannot buy one for your office. No DGX-equivalent product.
Reality
Gemini, Search, YouTube, and the whole Google AI stack run on TPUs. The world's second-biggest AI compute estate.
"The only credible alternative to CUDA at scale — and it's a closed walled garden. Useful if you live inside Google. Unavailable to everyone else."
AWS · Trainium 2 / Inferentia 2 · Neuron SDK
THREAT · MEDIUM
Hardware
Custom Annapurna-designed silicon. Trainium for training, Inferentia for serving. Decent perf, very aggressive price.
Software
Neuron SDK. Works with PyTorch and JAX. Real progress, still rough around the edges, very AWS-shaped.
Ecosystem
AWS-only. Available only as cloud instances on Amazon. Anthropic ran Claude training on Trainium 2 in 2024 — proof point.
Reality
If you're a big AWS-native shop, Trainium can shave 30–50% off your inference bill compared to H100s on AWS.
"A real cost-arbitrage play if you're already deeply on AWS. Not a CUDA replacement, but a wedge against pure NVIDIA pricing."
Apple · M3 Ultra / M4 Max · Metal · MLX
THREAT · LOW (different fight)
Hardware
Excellent unified memory architecture, world-class power efficiency. Covered in detail yesterday.
Software
Metal Performance Shaders (graphics-first compute) + MLX (Apple's NumPy/PyTorch-style array framework for ML). Both small but improving fast.
Ecosystem
Closed Apple-silicon-only. Strong inside the Apple developer world, irrelevant elsewhere.
Reality
Apple is not trying to compete with NVIDIA in the data center. They're building the best on-device AI experience for their products.
"A different game. Apple wins on consumer + on-device. They're not chasing the AI-builder market. Different segment, not a competing chip."
Huawei · Ascend 910C · MindSpore / CANN
THREAT · HIGH (geopolitical)
Hardware
Ascend 910C ships in 2024–25. Specs roughly comparable to H100 on paper, manufactured at SMIC.
Software
CANN (Compute Architecture for Neural Networks) + MindSpore. The Chinese answer to CUDA + PyTorch.
Ecosystem
Mandatory inside China for any company that wants to keep state contracts. Adoption is climbing fast.
Reality
If you're in China, the choice is "CUDA you can't reliably get due to export controls" vs "Ascend you actually can." Ascend wins by default.
"The first credible non-Western parallel-compute platform. Slower than CUDA — but unstoppable inside China, with US sanctions handing them market share for free."
Cambricon · Biren · Moore Threads · (China cohort)
THREAT · MEDIUM (regional)
Hardware
Three more Chinese chip startups, each with a partial CUDA-clone software stack. Limited fab access (no TSMC 5nm/3nm).
Software
Each has its own PyTorch-port project. None at parity with mainline CUDA.
Ecosystem
Domestic only. Subsidized heavily.
"The unsung consequence of US export controls. Beijing is bankrolling 5+ different CUDA replacements simultaneously. One of them will eventually matter."
Cerebras · Wafer-Scale Engine · Cerebras SDK
THREAT · LOW (niche)
Hardware
Genuinely radical: a single chip the size of a dinner plate, ~900,000 cores. Not a GPU at all.
Software
Bespoke. PyTorch shim exists. Not a CUDA replacement; a different paradigm.
Ecosystem
Some research labs and the US government love it for specific workloads.
"Beautiful engineering, narrow market. They'll always exist, never threaten the data-center incumbent."
Groq · LPU · Custom toolchain
THREAT · MEDIUM (inference niche)
Hardware
An ASIC purpose-built for LLM inference. Insane single-stream tokens/sec on supported models.
Software
You don't train on it. You can't run anything except a small list of pre-compiled models.
Ecosystem
API only. You buy tokens, not chips.
"The model for what an inference-only ASIC can do — fast, cheap, brittle, narrow. NVIDIA pays attention; CUDA itself isn't the target."

The synthesis

If you map all nine pretenders on two axes, the picture clarifies:

[Chart: a two-axis map, software maturity (x) by hardware availability (y), plotting NVIDIA, AMD, TPU (GCP only), Trainium, Apple, Intel, Huawei (China only), Cerebras, Groq, and the China cohort.]
Top-right is where NVIDIA lives — and lives alone. Everyone else is at least one full quadrant away on at least one axis.

The threat NVIDIA actually watches

None of the chip companies above are the real threat. The threat is vertical integration by NVIDIA's own customers.

OpenAI is reportedly designing chips with Broadcom. Anthropic uses massive amounts of Trainium and is reportedly exploring its own ASICs. Meta's MTIA is shipping. Google has used TPUs internally for a decade. The pattern is clear: any company that spends more than a few billion a year on NVIDIA inference has a financial incentive to design its own inference ASIC.

Why this matters: training will stay on NVIDIA for the foreseeable future — it needs the full library catalog and the constant churn of new techniques. Inference at huge scale will leak to custom ASICs, because once a model is fixed, you can hand-tune silicon for it.
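The economics behind that leak can be sketched as a simple break-even model. All numbers below are hypothetical placeholders, not figures from any company:

```python
def asic_breakeven_years(annual_gpu_bill: float,
                         asic_cost_ratio: float,
                         nre_cost: float) -> float:
    """Years until a custom inference ASIC pays back its development cost.

    annual_gpu_bill: yearly spend on GPU inference, in dollars
    asic_cost_ratio: ASIC serving cost as a fraction of the GPU cost (0.5 = half)
    nre_cost:        one-time design, tape-out, and software cost, in dollars
    """
    annual_savings = annual_gpu_bill * (1.0 - asic_cost_ratio)
    return nre_cost / annual_savings

# Hypothetical: a $3B/yr inference bill, an ASIC that halves serving cost,
# and $1B of one-time engineering. Payback lands in well under a year:
print(asic_breakeven_years(3e9, 0.5, 1e9))   # about 0.67 years
```

At hyperscaler spend levels, even a conservative cost ratio pays back the engineering quickly — which is exactly why a fixed, high-volume model is the first workload to leave.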

The real bet

If you're betting on a chip portfolio for the next 5 years, the right framing is: "NVIDIA owns training and the cutting edge. Custom ASICs eat inference at scale. Everyone else is a footnote." Your DGX Spark sits in the first half of that sentence — exactly where the moat is real.

What this means for your DGX Spark

For the next 5 years, your DGX Spark is a safe bet: it runs real CUDA, so every library, every paper, and every new training technique lands on it first.

The same can't be said for, say, a $10,000 Mac Studio bought as a "future-proof AI workstation." If MLX doesn't catch up — and it might not — that bet ages badly.

Today's reflection

Pick one competitor and form an opinion: in your judgment, is this company more likely or less likely to threaten NVIDIA in three years? Defend your answer in two sentences. The discipline of having a view is what separates an operator from a spectator.