Part IX · Chapter 67

AMD's Awakening

AMD's Instinct line goes from afterthought to credible alternative: MI300X (Dec 2023), MI325X, MI355X, the MI400 roadmap. ROCm versus CUDA and the SemiAnalysis "CUDA Moat Still Alive" debate. The hyperscaler design wins (Microsoft Azure ND MI300X v5, Meta running Llama 3.1 405B on MI300X exclusively, Oracle BM.GPU.MI300X.8 superclusters). The Zyphra ZAYA models. The October 2025 OpenAI MI450 warrant deal. Lisa Su's long quiet rebuild. → The first crack in the assumption that "AI training" means "Nvidia."

AMD’s stock closed at $3.42 on October 8, 2014, the day the board named Lisa Su president and chief executive. The company had lost money for four straight years, sold its Austin headquarters to keep cash flowing, and watched Intel and Nvidia carve up the markets it had once helped invent. Analysts were openly modeling a path to bankruptcy. The board’s decision to elevate Su, the senior vice president running global business units, was less an act of confidence than an act of last resort. The two CEOs before her had each lasted three years; Hector Ruiz, before that, had ended his tenure under the cloud of an insider-trading investigation. AMD was a company that ate its leaders.

Su, then forty-five, had grown up around problems other people called intractable. She was born in Tainan in November 1969, the daughter of a statistician father and an accountant mother. The family moved to New York when she was three. Her parents pushed her toward the piano, then toward math, then toward MIT, where she enrolled at seventeen and stayed for a decade, taking a bachelor’s, a master’s, and finally a doctorate in electrical engineering. Her dissertation, finished in 1994 under Dimitri Antoniadis, was on extreme-submicrometer silicon-on-insulator transistors, a fringe technology almost nobody believed could be scaled. SOI later became the basis of every IBM Power chip and a generation of high-performance processors. Su was right early, which would become a pattern.

She joined Texas Instruments, then IBM, where she worked under Bernard Meyerson on the team that learned to wire chips with copper instead of aluminum, a process that produced devices roughly twenty percent faster. She moved to Freescale as chief technology officer, and then in 2012 to AMD, brought in by then-CEO Rory Read to run the company’s semi-custom chip business. The group built the silicon inside the PlayStation 4 and Xbox One, which together kept AMD solvent through the worst years. By 2014, Su was the obvious internal candidate for the top job, and the board, having run out of external savior options, finally agreed.

What she inherited was, by any honest accounting, a doomed enterprise. AMD’s market share in server CPUs would collapse from over twenty percent in 2006 to under two percent by 2017. Intel’s Xeon dominated every datacenter rack on earth. Nvidia had won the GPU war that AMD’s 2006 acquisition of ATI had been meant to contest. The 2009 spinoff of its fabs into GlobalFoundries had stripped AMD of manufacturing leverage without solving its product problems, and the morale at its Austin and Sunnyvale offices was, according to engineers who lived through it, somewhere between resignation and gallows humor.

Su’s plan was unsentimental. She told the board AMD would stop apologizing for being smaller than Intel and start designing as if size were an asset. She bet the company on a single new CPU architecture, codenamed Zen, that processor architect Jim Keller had returned to AMD in 2012 to design. Keller, who had previously delivered the K7 and K8 cores that defined AMD’s last competitive era, left in September 2015 for Tesla, before Zen shipped. The architecture work was finished by Mike Clark and a team reporting to Mark Papermaster, AMD’s chief technology officer. When Ryzen launched in March 2017 and the first Epyc server parts followed that summer, Intel was caught flat-footed. The chips were not just competitive. They were better in the metrics that mattered most for cloud workloads: cores per socket, memory bandwidth, performance per dollar.

What followed was one of the cleanest comebacks in modern technology. Epyc went from under two percent of server shipments in 2018 to roughly thirty percent by late 2025, on Mercury Research’s count. AMD’s market capitalization rose from around $3 billion in October 2014 to more than $200 billion by 2024. Su was named Time’s CEO of the Year in 2024. The company that had fired three CEOs in a decade had become, in CPU terms, the closest thing the industry had to an Intel killer.

But the comeback had a hole, and through 2022 and most of 2023, every observer of the industry knew where the hole was. The defining computing problem of the decade was no longer how to design a faster x86 core. It was how to train and serve transformer models at scale, and that problem was being won, almost without contest, by Nvidia’s CUDA stack and the H100 GPU that ran it. AMD had a datacenter accelerator line, branded Instinct, that almost nobody outside HPC took seriously. The MI100, launched in November 2020, was AMD’s first compute-focused GPU on the new CDNA architecture. It hit 11.5 teraflops of double-precision floating point and won a few national-lab design wins, but it was barely visible in commercial AI. The MI250X went into Frontier, the Oak Ridge supercomputer that became the world’s first exascale machine in May 2022. That was a vindication for HPC. It did almost nothing for the company’s standing in AI.

The AI workloads that mattered ran on Nvidia hardware because they ran on Nvidia software. CUDA, which Jensen Huang had bet his company on in 2006, had accumulated nearly two decades of compiler work, library breadth, and developer reflexes. PyTorch, which became the de facto frontier-training framework after 2018, had been written assuming Nvidia GPUs. Every meaningful kernel, every fused operator, every distributed training trick had an Nvidia-first implementation. AMD’s answer, the open-source ROCm stack, was technically capable and emotionally underwhelming. It compiled. It ran. It did not, in 2022, give an AI researcher the pleasure of typing import torch and watching a model train on the first try.
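
The irony is that AMD’s design goal was for that import to be the only thing a researcher ever typed. ROCm builds of PyTorch deliberately reuse the torch.cuda API surface, so correct user code is identical on both vendors. What follows is a minimal sketch of how a ROCm build presents itself, assuming a current PyTorch ROCm wheel and at least one Instinct GPU; the gap in 2022 was not in this code path but in the kernels underneath it.

```python
import torch

# On ROCm builds of PyTorch, the torch.cuda namespace is reused wholesale:
# is_available() returns True on an Instinct GPU, and the giveaway is
# torch.version.hip being set instead of torch.version.cuda.
def describe_backend() -> torch.device:
    if torch.cuda.is_available():
        stack = "ROCm/HIP" if torch.version.hip else "CUDA"
        print(f"{torch.cuda.get_device_name(0)} via {stack}")
        return torch.device("cuda")
    return torch.device("cpu")

dev = describe_backend()
x = torch.randn(4096, 4096, device=dev, dtype=torch.float16)
y = x @ x  # identical user code on either vendor; the kernels differ underneath
```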

Su understood this. The Instinct line had been on AMD’s product roadmap since the Vega architecture days, but the version that mattered, the one designed deliberately for the post-ChatGPT world, was the MI300. AMD had begun work on it well before the OpenAI launch. The chip’s centerpiece, MI300X, used a chiplet construction that AMD had refined in its CPUs and bolted onto a CDNA 3 compute fabric: 304 compute units, 153 billion transistors split across multiple TSMC nodes, and, critically, 192 gigabytes of HBM3 memory, 2.4 times the 80 gigabytes on Nvidia’s H100.

On December 6, 2023, Su walked onto the stage of the Advancing AI keynote in San Jose and made the strongest pitch any AMD CEO had ever made about anything. She held up an MI300X, gold-rimmed and slightly larger than her palm, and called it “the highest-performance accelerator in the world for generative AI.” The line was drafted to be technically defensible at peak BF16 throughput, where the chip rated 1.3 petaflops without sparsity against the H100’s 0.99. She announced that Microsoft, Meta, and Oracle were among the first hyperscalers committing to deploy the chip at scale. Then she made a forecast some of her own executives thought aggressive: AMD now expected the data-center AI accelerator market to reach $400 billion by 2027, up from her previous estimate of $150 billion. Wall Street did not laugh. Nvidia did not laugh either.

The hyperscaler wins arrived faster than the skeptics had predicted. Microsoft put MI300X into general availability on Azure in May 2024 as the ND MI300X v5 instance, eight GPUs per VM with 1.5 terabytes of high-bandwidth memory, and the Azure team disclosed it had already optimized GPT-4 Turbo to run on the new hardware. Oracle followed with the BM.GPU.MI300X.8 bare-metal supercluster, scaling out to 16,384 GPUs in a single fabric, a cluster size that until then had been an Nvidia-only conversation. The most surprising endorsement came from Meta. At the Llama 3.1 launch in July 2024, Meta disclosed that the entire 405-billion-parameter model could fit, in FP16, on a single eight-GPU MI300X server, a feat the H100 generation could not match because of its smaller HBM. Meta’s vice president of infrastructure supply chain, Kevin Salvatori, stated publicly that all live Llama 3.1 405B traffic at Meta was being served on MI300X exclusively. That was not a benchmark. That was production.
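
The memory claim reduces to back-of-envelope arithmetic. At two bytes per parameter in FP16, the 405-billion-parameter weights alone need roughly 810 gigabytes, which fits inside an eight-way MI300X node’s 1,536 gigabytes of HBM but not inside the 640 gigabytes on an eight-way H100 board. The sketch below ignores KV cache and activation overhead, which only widens the gap.

```python
# Back-of-envelope node-fit check for Llama 3.1 405B in FP16.
# Ignores KV cache and activations, which add to the requirement.
params = 405e9
weight_bytes = params * 2                      # FP16: 2 bytes per parameter

nodes = {"8x MI300X": 8 * 192e9, "8x H100": 8 * 80e9}
for name, hbm in nodes.items():
    fits = "fits" if weight_bytes <= hbm else "does not fit"
    print(f"{name}: {weight_bytes / 1e9:.0f} GB weights vs {hbm / 1e9:.0f} GB HBM -> {fits}")
# 8x MI300X: 810 GB weights vs 1536 GB HBM -> fits
# 8x H100:   810 GB weights vs  640 GB HBM -> does not fit
```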

The economic case behind these wins was simple, and Lisa Su’s team had drawn it out for every prospective customer. Nvidia’s H100, in late 2023, was selling for north of $30,000 a unit, with allocation rationed by Jensen Huang himself. The MI300X cost less, came with more memory, and crucially had supply. Every hyperscaler had spent 2023 fighting for H100 allocation; some had been told no. Buying MI300X was, at minimum, a hedge against Nvidia’s monopoly pricing. For inference workloads on very large models, where memory capacity dictated whether you could fit a model on a single node, MI300X’s 192-gigabyte HBM was not just a hedge. It was the right answer.

Even so, the gap that defined the AI hardware market was never compute. It was software. SemiAnalysis, the research firm run by Dylan Patel, published a comparative training benchmark on December 22, 2024, titled “MI300X vs H100 vs H200: CUDA Moat Still Alive.” The findings were brutal. On every model SemiAnalysis tested, the H100 and H200 outperformed MI300X out of the box. The reason was not that AMD’s silicon was slow; on paper, the MI300X had raw compute the H100 did not, and much of it sat idle. The reason was that ROCm’s training stack had bugs that would never have shipped on the Nvidia side. FlexAttention, which PyTorch users had relied on since August 2024, was not fully operational on AMD. Continuous integration had been almost nonexistent at AMD until Patel’s team pointed it out. A quarter of the models SemiAnalysis tried failed accuracy checks on AMD entirely.
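
FlexAttention is a concrete example of the surface AMD had to catch up on. It lets a researcher express a custom attention variant as a small Python score-modification function that torch.compile lowers into a fused kernel, and that lowering path is exactly where backend maturity shows up. Below is a minimal causal-masking sketch in the published API shape, assuming PyTorch 2.5 or later; the tensor sizes are illustrative.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# score_mod receives each raw attention score plus its batch, head, query,
# and key positions; masking is expressed by returning -inf.
def causal(score, b, h, q_idx, kv_idx):
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

B, H, S, D = 4, 16, 2048, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda", dtype=torch.bfloat16)
           for _ in range(3))  # on a ROCm build, "cuda" maps to the Instinct GPU

# torch.compile fuses the score_mod into the attention kernel; running that
# fusion well is a backend problem, which is why it became a benchmark line item.
compiled = torch.compile(flex_attention)
out = compiled(q, k, v, score_mod=causal)
```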

The report was the most public humiliation Lisa Su’s AI strategy had endured, and her response was characteristic. AMD did not litigate the benchmarks. The company committed, with the kind of urgency it had previously reserved for Zen, to fix the software. By 2025, ROCm had a real CI infrastructure, dedicated kernel teams embedded in PyTorch’s release process, and in-house performance engineers running daily comparisons against Nvidia builds. SemiAnalysis, which had been one of AMD’s harshest critics, eventually wrote that the company had moved into “wartime mode” on software and that the developer experience was “rapidly improving” through 2025, even if it remained behind CUDA’s. AMD had stopped pretending the problem was small.

The hardware roadmap, in parallel, kept shipping. The MI325X arrived on October 10, 2024, an interim refresh that paired the MI300 compute die with 256 gigabytes of HBM3E. Against Nvidia’s H200, AMD claimed 1.8 times the memory and 1.3 times the bandwidth. Eight months later, on June 12, 2025, Su returned to the Advancing AI stage to launch the MI350 series. The MI355X, a liquid-cooled, 1,400-watt monster on the new CDNA 4 architecture, packed 288 gigabytes of HBM3E, supported MXFP4 and MXFP6 datatypes, and beat Nvidia’s like-for-like inference benchmarks by roughly thirteen percent on AMD’s own slides. By the second half of 2025, MI355X production was ramping in volume; Instinct had cleared $5 billion in revenue in 2024, and AMD told investors adoption had broadened to eight of the top ten AI labs by Q4 2025.

The economic event that finally forced Wall Street to take AMD seriously as an AI company arrived on October 6, 2025. AMD and OpenAI announced a strategic partnership to deploy six gigawatts of AMD GPUs across multiple Instinct generations, beginning with the MI450 in the second half of 2026. The deal’s structure was unusual. AMD issued OpenAI a warrant for up to 160 million shares of AMD common stock, exercisable at a penny per share, vesting in tranches as deployment milestones and AMD share-price targets were hit. If OpenAI exercised in full, it would own roughly ten percent of AMD. The arrangement, which was both a customer commitment and an equity alignment, was the closest thing the AI hardware industry had ever produced to a marriage contract. AMD’s stock rose nearly forty percent on the announcement, and Lisa Su spent much of that day on a press call explaining that the company would not be diluting its other customers in the process.
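
The ten-percent figure follows from the share count. Assuming roughly 1.6 billion AMD shares outstanding at the time of the deal (an approximation; the exact basic count moves quarter to quarter), a fully exercised warrant works out to about nine percent of the enlarged company, which AMD and the coverage rounded to ten:

```python
# Dilution arithmetic for the OpenAI warrant.
# Assumption: ~1.6e9 AMD shares outstanding pre-exercise (approximate).
existing = 1.6e9
warrant = 160e6                          # shares, exercisable at $0.01 each

stake = warrant / (existing + warrant)   # ownership of the enlarged base
cost = warrant * 0.01
print(f"stake ~= {stake:.1%}, exercise cost ~= ${cost / 1e6:.1f}M")
# stake ~= 9.1%, exercise cost ~= $1.6M
```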

By that point, Su’s team had spent two years preparing for the moment when an outside lab, with no special inducement and no co-marketing budget, would train a frontier-class model on AMD silicon and report the results in public. That was the only proof point that would settle the underlying question: not whether AMD could sell chips, but whether a serious lab could build a serious model on them, end to end, without secretly relying on an Nvidia cluster for the hard parts.

The lab that delivered the proof was Zyphra, a San Francisco company founded in 2020 by Krithik Puthalath with three co-founders, Tomás Figliolia, Beren Millidge, and Danny Martinelli. Zyphra was small, well funded, and obsessed with parameter-efficient architectures, the school of model design that tried to extract the most reasoning per active parameter rather than the most parameters per training run. In 2024 it had released Zamba, a hybrid state-space model meant to run on edge devices. Zyphra was not a household name. It was, however, exactly the kind of customer AMD needed: a frontier lab with no legacy CUDA codebase, no political reason to favor Nvidia, and a research thesis that rewarded the kind of large-memory architecture MI300X uniquely provided.

On November 24, 2025, AMD and Zyphra announced ZAYA1, a Mixture-of-Experts foundation model with 8.3 billion total parameters and 760 million active parameters, trained from scratch on a custom cluster of 1,024 MI300X GPUs across 128 nodes, networked with AMD’s own Pensando Pollara 400 interconnect, hosted on IBM Cloud, running entirely on ROCm. The cluster delivered more than 750 petaflops of sustained throughput. The pretraining corpus ran to fourteen trillion tokens. ZAYA1-base outperformed Llama-3-8B and OLMoE on most reasoning benchmarks and rivaled Qwen3-4B and Gemma-3-12B. The accompanying paper detailed a new attention architecture, Compressed Convolutional Attention, that compressed the KV cache by a factor of eight, and a more expressive MoE router that allowed top-1 expert selection without the auxiliary residual experts the field had been using as crutches.
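
The ZAYA1 paper describes its router and Compressed Convolutional Attention in detail; as a baseline for what top-1 expert selection means in practice, the sketch below is a generic single-expert-per-token MoE layer in PyTorch, the vanilla technique that routers like ZAYA1’s improve on, not a reconstruction of Zyphra’s design. Each token goes to exactly one feed-forward expert, with the gate probability scaling the output so routing stays trainable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    """Generic top-1 Mixture-of-Experts layer: one expert per token."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Pick the single highest-probability expert.
        probs = F.softmax(self.gate(x), dim=-1)
        weight, idx = probs.max(dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # Scale by the gate probability so the router receives gradient.
                out[mask] = weight[mask, None] * expert(x[mask])
        return out

layer = Top1MoE(d_model=512, d_ff=2048, n_experts=8)
tokens = torch.randn(64, 512)
y = layer(tokens)  # (64, 512); each token touched exactly one expert
```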

The ZAYA1 result was the thing AMD had been waiting for. Not a benchmark. Not a press demo. A full pretraining run on AMD hardware, by an outside lab, with the paper to back it up. Six months later, in May 2026, Zyphra published ZAYA1-8B, a reasoning-optimized variant whose results on the Harvard-MIT Math Tournament’s HMMT’25 benchmark surpassed Anthropic’s Claude 4.5 Sonnet and OpenAI’s GPT-5-High, and approached DeepSeek-V3.2 on coding and mathematics evaluations, all at fewer than a billion active parameters. Zyphra’s thesis was that large memory per accelerator was not a nice-to-have but the enabling condition: it let an MoE model with this many experts train without expert sharding or tensor sharding, the two distributed-training techniques that had, until then, been the price of admission to the frontier.
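
That claim about memory is checkable with the same kind of arithmetic. Under a standard mixed-precision Adam layout (BF16 weights and gradients, an FP32 master copy, two FP32 Adam moments; this layout is an assumption, and Zyphra’s exact optimizer configuration may differ), the full training state of an 8.3-billion-parameter model comes to roughly 124 GiB, comfortably inside a single MI300X’s 192 gigabytes. Every GPU can then hold a complete replica, and plain data parallelism suffices:

```python
# Training-state budget for an 8.3B-parameter model under mixed-precision
# Adam. The layout is an assumption; activations and buffers come on top.
params = 8.3e9
bytes_per_param = {
    "bf16 weights": 2,
    "bf16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam moment 1": 4,
    "fp32 Adam moment 2": 4,
}
total_gib = params * sum(bytes_per_param.values()) / 2**30
print(f"{total_gib:.0f} GiB of state vs 192 GB HBM per MI300X")
# ~124 GiB -> a full replica fits on each GPU; no expert or tensor sharding
```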

In January 2026, Lisa Su took the stage at CES to unveil the MI400 series, headlined by the flagship MI455X, an accelerator built on twelve TSMC N2 compute chiplets with 432 gigabytes of HBM4 memory, 19.6 terabytes per second of bandwidth, and up to 40 petaflops of FP4 performance. Production was set for the second half of 2026, delivered in AMD’s new Helios rack platform, a double-wide design rated at three AI exaflops per rack. Oracle Cloud Infrastructure had committed to stand up the first publicly available MI450 supercluster, with 50,000 GPUs deploying in the third quarter of 2026. The OpenAI six-gigawatt deal would begin its first one-gigawatt deployment on the same generation.
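
The rack rating is consistent with the per-GPU numbers. Assuming the 72-GPU Helios configuration AMD had described for the MI400 generation, 72 accelerators at 40 petaflops of FP4 each works out to just under three exaflops:

```python
# Rack-level arithmetic for Helios, assuming the 72-GPU configuration
# AMD described for the MI400 generation.
gpus_per_rack = 72
fp4_pflops_per_gpu = 40
rack_exaflops = gpus_per_rack * fp4_pflops_per_gpu / 1000
print(f"{rack_exaflops:.2f} FP4 exaflops per rack")  # 2.88 -> "three AI exaflops"
```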

The financials caught up to the architecture. AMD’s data-center segment revenue had cleared $12.6 billion in 2024, an increase of 94 percent over 2023, driven roughly equally by Instinct and the Epyc server CPU line. Full-year 2025 data-center revenue reached $16.6 billion, with $5.4 billion in the fourth quarter alone, on a trajectory AMD had told investors would push annual data-center revenue past $25 billion in 2026 if the MI450 ramp tracked to plan. Instinct alone had cleared $5 billion in its first full year of volume; by 2025 the line was contributing in the high single-digit billions and was on track for the low-to-mid teens in 2026. None of these numbers rivaled Nvidia’s. All of them were what an AMD CFO of the prior decade would have considered impossible. The asterisk on Nvidia’s earnings had become a sentence with weight in it.

The first cousin once removed of the man running the world’s most valuable company had built, in slow motion across eleven years, the only credible alternative to him. Su and Jensen Huang had not grown up together. Her maternal grandfather and his mother were siblings, but the two had not met until well into their careers, at an industry event neither remembered with much sentiment. They were, by 2026, the most consequential pair of relatives in the global economy. The assumption that AI training meant Nvidia, load-bearing for two years of capital-allocation decisions across the entire technology industry, had finally cracked. It had not broken. CUDA was still ahead of ROCm in every metric a developer might privately care about. But the Zyphra paper, the OpenAI deal, and the eight-of-ten AI labs deploying MI355X in production had answered the question of whether frontier-scale training without Nvidia was possible. It was. The next question was whether it would be cheaper.