DAY 04 · THE PRETENDERS

Nine challengers that tried to beat CUDA. Here's where they actually stand.

AMD, Intel, Apple, Google, Amazon, Huawei, a cohort of Chinese national champions, a wafer-scale moonshot, and an inference-only speedster. Each one has billions of dollars and a credible team. None of them has unseated NVIDIA. Today we walk the lineup honestly — what's clever, what's broken, who's closest, and the one threat that genuinely keeps Jensen up at night.

The pattern to watch for

Almost every challenger has shipped respectable silicon, and the pattern is always the same: hardware lands at ~50–80% of NVIDIA's performance, and then the software story ranges from "limping" to "promising." Every chip below has the same structural problem — the libraries either don't exist, don't run third-party code, or run it 2–5× slower than CUDA.
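That gap compounds. A chip at 80% of NVIDIA's raw specs whose kernels run 2× slower delivers well under half the effective throughput. A toy calculation with illustrative numbers (not benchmarks):

```python
def effective_throughput(spec_ratio: float, software_slowdown: float) -> float:
    """Effective throughput relative to NVIDIA, combining hardware and software.

    spec_ratio:        raw hardware perf relative to NVIDIA (0.8 = 80% of specs)
    software_slowdown: how much slower key kernels run vs. CUDA (2.0 = 2x slower)
    """
    return spec_ratio / software_slowdown

# A challenger at 80% of NVIDIA's specs whose kernels run 2x slower than CUDA:
print(effective_throughput(0.8, 2.0))   # -> 0.4, i.e. 40% of NVIDIA's effective perf

# Same hardware at the bottom of the 2-5x software range:
print(effective_throughput(0.8, 5.0))   # roughly 0.16 -- under a sixth
```

Multiplying the two ratios is the whole point: competitive silicon with uncompetitive kernels loses more ground in software than it ever gained in hardware.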

A useful rubric

For each contender ask three questions: (1) Does it run modern PyTorch out of the box? (2) Can the open-source AI community ship to it without a separate fork? (3) Is the chip available in real volume? Almost everyone answers "no" to at least one.
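The rubric reduces to a three-flag checklist. A minimal sketch (the scoring logic mirrors the three questions above; the example inputs are illustrative, not measured data):

```python
def rubric_verdict(pytorch_oob: bool, no_fork_needed: bool, real_volume: bool) -> str:
    """Score a contender on the three rubric questions.

    pytorch_oob:    runs modern PyTorch out of the box
    no_fork_needed: the open-source AI community can ship to it without a fork
    real_volume:    the chip is available in real volume
    """
    misses = sum(not flag for flag in (pytorch_oob, no_fork_needed, real_volume))
    if misses == 0:
        return "credible CUDA challenger"
    return f"{misses} structural gap(s): not a drop-in alternative"

# Illustrative scoring, following the article's framing:
print(rubric_verdict(True, True, True))    # the position only NVIDIA holds
print(rubric_verdict(True, False, True))   # e.g. works, but needs a separate fork
```

A single "no" is enough to demote a contender, which is why the lineup below looks the way it does.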

The lineup

AMD · Instinct MI300X / MI325X · ROCm
THREAT · MEDIUM
Hardware
Genuinely competitive on FLOPs and memory (192 GB HBM3 on MI300X). Often beats H100 on raw specs.
Software
ROCm — AMD's CUDA clone. Functional, ~80% feature parity, half the performance on key kernels.
Ecosystem
PyTorch supports it. vLLM has a fork. Most papers don't test it. Open-source tools either lag or break on every release.
Reality
Meta and Microsoft have ordered MI300X in volume — partly to negotiate NVIDIA price, partly because supply is tight.
"The closest competitor by a mile, and still not close enough. Catching up requires a five-year, billion-dollar software-only sprint AMD has never quite committed to."
Intel · Gaudi 2 / Gaudi 3 · SynapseAI
THREAT · LOW
Hardware
Originally designed by Habana Labs (acquired 2019). Decent training silicon, very competitive perf-per-dollar on paper.
Software
SynapseAI stack. Functional, but small. PyTorch support exists; few teams use it outside benchmarks.
Ecosystem
Almost none. Most ML researchers have never run a script on a Gaudi.
Reality
Intel announced "Falcon Shores" as the successor — then de-scoped it, then renamed it. As of 2026, AI accelerators are not Intel's strong suit.
"Engineering credible. Strategy unclear. The least likely Big Tech competitor to actually break through."
Google · TPU v5p / v6 · XLA / JAX
THREAT · HIGH (but contained)
Hardware
The most successful non-NVIDIA AI chip ever made. Google has been training internal models on TPUs for a decade. Excellent at scale.
Software
XLA + JAX. World-class. Different programming model than PyTorch, but mature and beloved by parts of the research community.
Ecosystem
Google-only. You rent TPUs by the hour from GCP. Cannot buy one for your office. No DGX-equivalent product.
Reality
Gemini, Search, YouTube, and the whole Google AI stack run on TPUs. The world's second-biggest AI compute estate.
"The only credible alternative to CUDA at scale — and it's a closed walled garden. Useful if you live inside Google. Unavailable to everyone else."
AWS · Trainium 2 / Inferentia 2 · Neuron SDK
THREAT · MEDIUM
Hardware
Custom Annapurna-designed silicon. Trainium for training, Inferentia for serving. Decent perf, very aggressive price.
Software
Neuron SDK. Works with PyTorch and JAX. Real progress, still rough around the edges, very AWS-shaped.
Ecosystem
AWS-only. Available only as cloud instances on Amazon. Anthropic ran Claude training on Trainium 2 in 2024 — proof point.
Reality
If you're a big AWS-native shop, Trainium can shave 30–50% off your inference bill compared to H100s on AWS.
"A real cost-arbitrage play if you're already deeply on AWS. Not a CUDA replacement, but a wedge against pure NVIDIA pricing."
Apple · M3 Ultra / M4 Max · Metal · MLX
THREAT · LOW (different fight)
Hardware
Excellent unified memory architecture, world-class power efficiency. Covered in detail yesterday.
Software
Metal Performance Shaders (graphics-first compute) + MLX (Apple's NumPy/PyTorch-style array framework for ML). Both small but improving fast.
Ecosystem
Closed Apple-silicon-only. Strong inside the Apple developer world, irrelevant elsewhere.
Reality
Apple is not trying to compete with NVIDIA in the data center. They're building the best on-device AI experience for their products.
"A different game. Apple wins on consumer + on-device. They're not chasing the AI-builder market. Different segment, not a competing chip."
Huawei · Ascend 910C · MindSpore / CANN
THREAT · HIGH (geopolitical)
Hardware
Ascend 910C ships in 2024–25. Specs roughly comparable to H100 on paper, manufactured at SMIC.
Software
CANN (Compute Architecture for Neural Networks) + MindSpore. The Chinese answer to CUDA + PyTorch.
Ecosystem
Mandatory inside China for any company that wants to keep state contracts. Adoption is climbing fast.
Reality
If you're in China, the choice is "CUDA you can't reliably get due to export controls" vs "Ascend you actually can." Ascend wins by default.
"The first credible non-Western parallel-compute platform. Slower than CUDA — but unstoppable inside China, with US sanctions handing them market share for free."
Cambricon · Biren · Moore Threads · (China cohort)
THREAT · MEDIUM (regional)
Hardware
Three more Chinese chip startups, each with a partial CUDA-clone software stack. Limited fab access (no TSMC 5nm/3nm).
Software
Each has its own PyTorch-port project. None at parity with mainline CUDA.
Ecosystem
Domestic only. Subsidized heavily.
"The unsung consequence of US export controls. Beijing is bankrolling 5+ different CUDA replacements simultaneously. One of them will eventually matter."
Cerebras · Wafer-Scale Engine · Cerebras SDK
THREAT · LOW (niche)
Hardware
Genuinely radical: a single chip the size of a dinner plate, ~900,000 cores. Not a GPU at all.
Software
Bespoke. PyTorch shim exists. Not a CUDA replacement; a different paradigm.
Ecosystem
Some research labs and the US government love it for specific workloads.
"Beautiful engineering, narrow market. They'll always exist, never threaten the data-center incumbent."
Groq · LPU · Custom toolchain
THREAT · MEDIUM (inference niche)
Hardware
An ASIC purpose-built for LLM inference. Insane single-stream tokens/sec on supported models.
Software
You don't train on it. You can't run anything except a small list of pre-compiled models.
Ecosystem
API only. You buy tokens, not chips.
"The model for what an inference-only ASIC can do — fast, cheap, brittle, narrow. NVIDIA pays attention; CUDA itself isn't the target."

The synthesis

If you map all nine pretenders on two axes, the picture clarifies:

[Chart: a two-axis map, software maturity (x) by hardware availability (y), plotting NVIDIA, AMD, TPU (GCP only), Trainium, Apple, Intel, Huawei (China only), Cerebras, Groq, and the China cohort.]
Top-right is where NVIDIA lives — and lives alone. Everyone else is at least one full quadrant away on at least one axis.

The threat NVIDIA actually watches

None of the chip companies above are the real threat. The threat is vertical integration by NVIDIA's own customers.

OpenAI is reportedly designing chips with Broadcom. Anthropic uses massive amounts of Trainium and is reportedly exploring its own ASICs. Meta's MTIA is shipping. Google has used TPUs internally for a decade. The pattern is clear: any company that spends more than a few billion a year on NVIDIA inference has a financial incentive to design its own inference ASIC.

Why this matters: training will stay on NVIDIA for the foreseeable future — it needs the full library catalog and the constant churn of new techniques. Inference at huge scale will leak to custom ASICs, because once a model is fixed, you can hand-tune silicon for it.
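The economics behind that leak can be sketched as a simple break-even model. All numbers below are hypothetical placeholders, not figures from any company:

```python
def asic_breakeven_years(annual_gpu_bill: float,
                         asic_cost_ratio: float,
                         nre_cost: float) -> float:
    """Years until a custom inference ASIC pays back its development cost.

    annual_gpu_bill: yearly spend on GPU inference, in dollars
    asic_cost_ratio: ASIC serving cost as a fraction of the GPU cost (0.5 = half)
    nre_cost:        one-time design, tape-out, and software cost, in dollars
    """
    annual_savings = annual_gpu_bill * (1.0 - asic_cost_ratio)
    return nre_cost / annual_savings

# Hypothetical: a $3B/yr inference bill, an ASIC that halves serving cost,
# and $1B of one-time engineering. Payback lands in well under a year:
print(asic_breakeven_years(3e9, 0.5, 1e9))   # about 0.67 years
```

At hyperscaler spend levels, even a conservative cost ratio pays back the engineering quickly — which is exactly why a fixed, high-volume model is the first workload to leave.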

The real bet

If you're betting on a chip portfolio for the next 5 years, the right framing is: "NVIDIA owns training and the cutting edge. Custom ASICs eat inference at scale. Everyone else is a footnote." Your DGX Spark sits in the first half of that sentence — exactly where the moat is real.

What this means for your DGX Spark

For the next 5 years, your DGX Spark is a safe bet: it runs real CUDA, so every library, every paper, and every new training technique lands on it first.

The same can't be said for, say, a $10,000 Mac Studio bought as a "future-proof AI workstation." If MLX doesn't catch up — and it might not — that bet ages badly.

Today's reflection

Pick one competitor and form an opinion: in your judgment, is this company more likely or less likely to threaten NVIDIA in three years? Defend your answer in two sentences. The discipline of having a view is what separates an operator from a spectator.