The ASIC Inflection
Custom silicon takes meaningful share of frontier workloads. Trainium2 ramp at AWS through 2024–2025; Trainium3 in 2025–2026; Project Rainier in New Carlisle, Indiana scaling toward a million Trainium2 chips. Google TPU v6 Trillium and the v7 Ironwood successor; Anthropic's million-Ironwood commitment. OpenAI's Stargate footprint and the Broadcom XPU co-design. Microsoft's CoreWeave dependence. Apple's first reported TPU training disclosure for Apple Intelligence. → Custom silicon stops being side-business and starts being load-bearing.
In the first week of December 2024, at AWS re:Invent in the Venetian’s Sands Expo Center, Matt Garman stepped onto a keynote stage in front of fifty thousand cloud engineers and delivered a sentence that, twelve months earlier, would have sounded like marketing fantasy. Anthropic, he told the audience, was building a single training cluster for its next generation of Claude models on Amazon’s custom Trainium2 silicon. The cluster, codenamed Project Rainier, was already under construction across multiple AWS data centers and would scale to hundreds of thousands of Trainium2 chips operating as a single coherent fabric. The headline on the slide behind him promised five times the compute Anthropic had used to train its current generation. The implication, which Garman did not have to spell out and which every analyst in the room caught, was that the largest frontier-AI training run currently in flight in the United States was running on a chip not made by Nvidia.
The room understood what it was watching. For two years, since the H100 had become the most-rationed product in the technology industry, the assumption underneath every AI company’s procurement model had been that Nvidia silicon was load-bearing. Hyperscaler custom chips were a hedge, a long-term margin play, an interesting research project for a small headcount in Tel Aviv or Mountain View. Trainium had launched at re:Invent 2020 to polite applause and by 2022 had landed almost no consequential workloads. Maia, when Microsoft unveiled it in November 2023, had been treated by analysts as a useful supplement. MTIA had appeared on Meta’s blog as a recommendation accelerator. None of these chips, on any honest reading, had displaced Nvidia from the workloads that mattered. By the time Garman finished his keynote, the framing had inverted. The custom silicon that had been side-business in 2022 was, in late 2024, becoming the substrate of the largest contracts being signed in the industry.
The shift had been visible, in retrospect, since the Anthropic deal. In September 2023, Amazon announced a four-billion-dollar investment in Anthropic, doubled to eight billion in November 2024, tied to a deeper commitment than any cap-table line implied. Anthropic would make AWS its primary cloud provider and, beginning in late 2024, its primary training partner. Translation: Anthropic would put real frontier models on Trainium2, in production training runs whose budgets ran into the hundreds of millions per model. It was the deal that gave the chip a customer no one could dismiss.
The deepening came in stages. Trainium2 reached general availability in December 2024 with eight NeuronCore-V3 cores per chip, ninety-six gigabytes of high-bandwidth memory, 1.3 dense petaflops of FP8 compute, and a pricing pitch of forty percent better dollars-per-token than equivalent Nvidia instances. The flagship campus for Project Rainier was an eleven-hundred-acre site near New Carlisle, Indiana, a town near Lake Michigan that had not previously appeared in the history of computing and now hosted what Amazon described as the largest AI training cluster in the world. By October 2025, the first phase was online: seven buildings of sixty-five megawatts each, 455 megawatts in total, with a second phase under construction that would push the campus past a gigawatt. Trainium2 deployments crossed half a million chips by late 2025 and a million by April 2026, when AWS and Anthropic announced a follow-on commitment of more than a hundred billion dollars over ten years, structured around five gigawatts of compute capacity stretching from Trainium2 through Trainium4 and beyond. Within Amazon’s own logic, Trainium had stopped being a hedge and become an existential bet. The company had decided, more or less, that its share of frontier AI workloads would either run on its own silicon or wouldn’t run on AWS at all.
The Trainium3 announcement at re:Invent 2025, almost exactly two years after Trainium2 was first announced, made the bet more visible. Built on TSMC’s three-nanometer node, packaged into UltraServers that scaled up to 144 chips and 362 FP8 petaflops, Trainium3 promised four-and-a-half times the compute and four times the memory bandwidth of its predecessor. AWS’s chip team, run out of Annapurna’s offices in Austin and Tel Aviv, had compressed the cycle: Trainium3 reached general availability inside the same year it was announced. Trainium4 was already in design. The sentence Andy Jassy kept repeating, in slide decks and earnings calls and analyst breakfasts, was an upgraded version of the one Garman had used: Anthropic was now using more than one million Trainium2 chips. A year earlier the same sentence would have been impossible.
Google was running the same play with deeper roots. The TPU program dated to 2013, Norm Jouppi’s small project to relieve search-era inference bottlenecks, and Google had been shipping units to external customers since 2018. Until 2024, however, the program had been simultaneously the most mature hyperscaler ASIC and the least visible. That ended at Google I/O on May 14, 2024, when Sundar Pichai introduced Trillium, the sixth-generation TPU. By the numbers Google was willing to publish, it was a different class of chip: a 4.7-times leap in peak compute over the v5e generation, double the high-bandwidth memory, double the inter-chip interconnect bandwidth, sixty-seven percent better energy efficiency, expanded matrix-multiply units, sparse-core blocks for embedding-heavy workloads. It was the chip, Pichai said, that would train the next generation of Gemini.
Less than a year after Trillium, on April 9, 2025, Pichai unveiled the seventh generation. Ironwood, fabricated on TSMC’s three-nanometer process, delivered ten times the peak performance of TPU v5p and more than four times the per-chip throughput of Trillium for both training and inference. Each pod connected 9,216 Ironwood chips through Google’s optical-circuit-switched interconnect, delivering more than forty exaflops of FP8 compute per pod. The architecture was, Pichai noted, designed deliberately for the inference age, with sparse-core hardware and matrix units tuned to the workload patterns that frontier models had begun to exhibit by 2025. By the time Ironwood reached general availability in November 2025, Anthropic had committed to access well over a million Ironwood chips through a multi-year cloud agreement with Google reportedly worth two hundred billion dollars, with the first phase covering four hundred thousand units that Broadcom would manufacture and ship directly to Anthropic for ten billion dollars in finished racks. The remaining six hundred thousand TPUs would flow through Google Cloud as rented capacity. By volume, the deal dwarfed Anthropic’s own AWS commitment. By any honest reading, it was the largest single-vendor compute commitment any AI lab had ever made.
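A quick back-of-the-envelope check, using only the pod figures quoted above, gives a floor on per-chip throughput. The pod total is treated here as a lower bound ("more than forty exaflops"), not a spec-sheet value, so the per-chip number is an inference rather than a published figure.

```python
# Rough per-chip arithmetic implied by the pod figures above.
# The pod total is taken as a lower bound; the per-chip result is an
# inference from those figures, not a vendor specification.
POD_FP8_EXAFLOPS = 40.0   # "more than forty exaflops" per pod (lower bound)
CHIPS_PER_POD = 9_216

per_chip_petaflops = POD_FP8_EXAFLOPS * 1_000 / CHIPS_PER_POD
print(f"at least {per_chip_petaflops:.1f} PFLOPS FP8 per chip")  # ~4.3 PFLOPS
```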
It was not the only revealing TPU disclosure of the year. In late July 2024, Apple released a technical report on its newly announced Apple Intelligence foundation models. Buried in the methodology section was a detail nobody outside Cupertino had expected. Apple’s largest server-side model had been pretrained on 8,192 TPU v4 chips. The on-device model had used 2,048 TPU v5p chips. Not Nvidia. Not Apple’s own silicon. Google’s. Apple’s engineers had built their training on AXLearn, an open-source framework on top of JAX and XLA, the same compiler stack Google itself used for Gemini. The chip war’s center of gravity had shifted enough that even Apple, after fifteen years of shedding dependencies on competitors’ hardware, had concluded it was cheaper and faster to rent TPUs than to wait for Nvidia or to scale Apple Silicon to data-center class. By the end of 2024 Apple had begun building its own data-center silicon under a program codenamed ACDC, with a Broadcom-partnered chip called Baltra targeted for mass production in late 2026 on TSMC’s N3P node. According to Ming-Chi Kuo’s early-2026 reporting, Apple intended to fill new owned data centers with Baltra in 2027. The industry’s last consumer-electronics holdout had concluded that the merchant-silicon answer was no longer enough.
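For readers unfamiliar with the stack Apple chose, the following is a minimal sketch of the JAX/XLA pattern AXLearn builds on: model code written as pure functions, autodiff via jax.grad, and a training step that XLA compiles for whatever backend is attached. This is not Apple’s code or AXLearn’s API; every function, shape, and hyperparameter here is illustrative.

```python
# Minimal illustration of the JAX/XLA pattern (not Apple's AXLearn code):
# a pure-function loss, gradients via jax.grad, and a jit-compiled step
# that XLA lowers for the attached backend -- TPU on a pod slice, CPU locally.
import jax
import jax.numpy as jnp

def init_params(key, d_in=512, d_out=512):
    # Hypothetical single linear layer, purely for illustration.
    w_key, _ = jax.random.split(key)
    return {"w": 0.02 * jax.random.normal(w_key, (d_in, d_out)),
            "b": jnp.zeros((d_out,))}

def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@jax.jit  # XLA traces and compiles this step once for the available device
def train_step(params, x, y, lr=1e-3):
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

key = jax.random.PRNGKey(0)
params = init_params(key)
x = jax.random.normal(key, (32, 512))
y = jax.random.normal(key, (32, 512))
params = train_step(params, x, y)
print(jax.devices())  # TPU devices on a TPU host, CPU devices locally
```

The point of the pattern is that nothing in the model code names the hardware; retargeting from GPUs to TPUs is largely a matter of which backend the XLA compiler is pointed at, which is part of why renting TPU capacity was a live option for Apple at all.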
Meta’s bet ran on a parallel cadence and a different strategic logic. The Meta Training and Inference Accelerator program had begun publicly in May 2023 with MTIA v1, a modest seven-nanometer TSMC part with sixty-four processing elements arranged on an eight-by-eight grid, an INT8 throughput of about 102 teraOps per second, and a thermal design power of just twenty-five watts. The chip had been aimed at the deep-learning recommendation models that fed Facebook and Instagram’s ranking and ad targeting, a workload Meta had been running on CPUs and GPUs at a cost that, at Meta’s volumes, rivaled anything else in the company’s data-center bill of materials. MTIA v2, announced in April 2024, more than doubled the dense compute of v1 and pushed off-chip LPDDR5 memory to 128 gigabytes. By April 2026, Meta and Broadcom had announced an extended collaboration to support multi-gigawatt MTIA deployments, including a two-nanometer generation that would be the first AI silicon on TSMC’s two-nanometer line. Broadcom was, by 2026, the silicon engineering partner for three of the five frontier labs, with each lab claiming the design IP and Broadcom doing the heavy back-end work that turned the design into wafers.
Microsoft’s Maia program took the longest to reach scale and came the closest to existential urgency. Athena, the original code name, had been quietly underway since 2019, initially targeted at running OpenAI’s models more cheaply on Azure than on the Nvidia GPUs Microsoft was buying in increasingly uncomfortable quantities. By 2022, the OpenAI partnership had grown into a multi-billion-dollar investment commitment with no obvious ceiling, and the unit cost of every prompt OpenAI’s models served had become a Microsoft P&L problem. Athena went public on November 15, 2023, at the Microsoft Ignite conference under the production name Azure Maia 100, a 105-billion-transistor TSMC five-nanometer chip with a die approaching 820 square millimeters and four HBM stacks on a CoWoS-S interposer, designed for OpenAI’s inference workloads at scale. The follow-on Maia generation slipped multiple times against an internal schedule that had assumed the OpenAI-Microsoft partnership would remain commercially exclusive. By the time the silicon began landing in production through 2025 and 2026, OpenAI was no longer Microsoft’s exclusive cloud tenant, and Maia’s role had narrowed from saving the OpenAI bill to absorbing Azure’s broader inference base.
The newest entry had a strategic logic that was, on the surface, the strangest of all. On October 13, 2025, OpenAI and Broadcom announced a custom AI accelerator that OpenAI itself would design and Broadcom would manufacture and deploy. The formal announcement put the scale at ten gigawatts. Broadcom, the picks-and-shovels giant whose ASIC services group had built every TPU since v1 with Google and the MTIA family with Meta, would handle the silicon engineering, the rack-scale networking, and the production rollout. Deployment would begin in the second half of 2026 and continue through the end of 2029. The Wall Street Journal subsequently reported that Broadcom had asked OpenAI to absorb roughly eighteen billion dollars of upfront chip-production financing, which OpenAI agreed to do only after Microsoft committed to take roughly forty percent of the resulting capacity. The financial structure was opaque enough that most analysts simply waved their hands at it and moved on. The strategic structure was clearer. OpenAI, the lab that had spent six years inside Microsoft’s house and another year escaping it, was now committing to design the chip on which its own next-generation models would run.
The picks-and-shovels firms behind the chip programs were the most visible part of the ASIC story. Broadcom, whose AI revenue had been a few hundred million dollars in 2022, posted $12.2 billion of AI semiconductor revenue in fiscal 2024 and guided to roughly $32 billion across fiscal 2025, almost all of it custom-ASIC and AI networking sales to a small handful of named hyperscaler customers. Fourth-quarter 2025 alone produced $5.2 billion of AI revenue, with backlog north of $73 billion. Hock Tan’s company, the result of Avago’s 2016 acquisition of the original Broadcom, had become the second-most-valuable beneficiary of the AI compute boom after Nvidia itself; its market capitalization crossed a trillion dollars in late 2024 and stayed there. Marvell, the smaller competitor that had built much of AWS’s silicon engineering and a portion of Microsoft’s, had become the second-largest beneficiary of the ASIC wave.
Beneath the named hyperscalers, a layer of inference specialists had grown through the same period without ever quite breaking into public consciousness the way the merchant GPU companies had. Cerebras, founded in 2016 by former SeaMicro engineers, had built a wafer-scale processor with 850,000 cores on a single 46,225-square-millimeter die, more than fifty times the area of the largest reticle-limited GPU. Its WSE-3 chip, launched in 2024, integrated 4 trillion transistors and was being deployed by 2025 in clusters that delivered inference latencies the GPU vendors could not match for certain workloads. The company raised $1.1 billion across 2024 and 2025 while waiting for an IPO that the Trump-era CFIUS review of its UAE investor had repeatedly delayed. Groq, founded in 2016 by ex-Google TPU engineer Jonathan Ross, had built a deterministic streaming “language processing unit” that was, on certain inference benchmarks, faster per token than any GPU could deliver. On Christmas Eve 2025, Nvidia announced a twenty-billion-dollar deal to license Groq’s technology and bring over most of its team, including Ross himself, an acquihire-shaped consolidation the trade press read as Nvidia’s explicit commitment to inference-specialist silicon as a forward-looking product line. SambaNova and Tenstorrent, the two other prominent challengers, occupied smaller niches: SambaNova in reconfigurable dataflow systems for enterprise inference; Tenstorrent, under veteran chip architect Jim Keller, formerly of AMD, Apple, Tesla, and Intel, in RISC-V-based chips that aspired to a more open-source posture against the closed-stack incumbents. None had yet displaced Nvidia from any frontier training workload. All had begun, by 2026, to take real revenue from inference workloads where the economics of dedicated silicon outweighed the comfort of running on a general-purpose GPU. Even Tesla, which had spent five years building its Dojo supercomputer on custom D1 silicon, abandoned the project in August 2025 when Elon Musk concluded that Dojo 2 was, in his phrase, an evolutionary dead end and that all paths led to Tesla’s AI6 chips manufactured at Samsung instead.
The economics were the same ones that had driven Annapurna’s original pitch to AWS in 2013 and the same ones that had produced Norm Jouppi’s first TPU paper in 2017. Nvidia’s data-center GPUs ran at gross margins north of seventy-five percent, which meant a hyperscaler paying Nvidia’s price was paying roughly four dollars for every dollar of silicon Nvidia had spent. At the capex levels of the mid-2020s, that ratio represented tens of billions of dollars per company per year that were, in principle, recoverable through vertical integration. The recovery was not free. A custom chip required design teams in the high hundreds, software stacks that took years to mature, foundry slots at TSMC negotiated against Apple’s and Nvidia’s standing claims, and tape-out costs in the high tens of millions per generation. But at hyperscaler volumes, the math worked. SemiAnalysis calculated total-cost-of-ownership advantages for hyperscaler ASICs of forty percent and more on the workloads they had been designed for. The number depended on assumptions and on the workload. The direction of the number did not.
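The margin arithmetic in that paragraph is easy to make concrete. The sketch below uses the seventy-five percent gross margin and forty percent TCO figures quoted above; the annual accelerator budget is a deliberately hypothetical input, not any company’s actual capex.

```python
# Back-of-the-envelope version of the vertical-integration math in the text.
# Gross margin and TCO advantage come from the figures quoted above; the
# annual accelerator budget is a hypothetical illustration only.
def merchant_markup(gross_margin: float) -> float:
    """Dollars paid per dollar of supplier cost at a given gross margin."""
    return 1.0 / (1.0 - gross_margin)

def recoverable_capex(annual_accelerator_spend: float, tco_advantage: float) -> float:
    """Spend that is, in principle, recoverable at a given TCO advantage."""
    return annual_accelerator_spend * tco_advantage

print(merchant_markup(0.75))          # 4.0 -> roughly $4 paid per $1 of silicon cost
print(recoverable_capex(30e9, 0.40))  # 1.2e10 -> $12B/yr on a hypothetical $30B budget
```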
What the inflection meant for the underlying market was clearer than what it meant for the labs. Custom hyperscaler ASICs, side-business curiosities through 2022, were taking real, measurable, often majority share of frontier-class workloads at the labs that had committed to them. Anthropic was training Claude across a million Trainium2 chips, scaling toward another million Ironwood TPUs, with Nvidia capacity as the third lane. Google’s Gemini family was training largely on Trillium and now Ironwood. Apple Intelligence had trained on TPU v4 and v5p. Even where Nvidia’s GPUs remained the dominant accelerator, and they did by a wide margin for most workloads, the share of frontier compute that did not run on Nvidia silicon had grown from a rounding error to roughly a third in eighteen months. Bloomberg Intelligence projected, by late 2025, that Nvidia’s share of accelerator revenue would slip from eighty-six to seventy-five percent by 2026. SemiAnalysis was blunter: hyperscaler ASICs had crossed the line from research projects into production-load-bearing infrastructure, and the question was no longer whether custom silicon would matter but how much of the next generation of frontier AI would run on anything else.
A decade earlier, the worry had been about a single chokepoint at TSMC’s most advanced node. Now there were many, layered: TSMC for wafers; SK Hynix and Samsung and Micron for HBM; ASML for the lithography behind both; Broadcom and Marvell for the ASIC design services Google and Amazon and Microsoft and Meta had outsourced; Nvidia for the system-level networking even custom-silicon shops had to license; Constellation, Vistra, and TVA for the megawatts. The hyperscalers had absorbed risk by going vertical into chip design. The frontier labs had absorbed risk by going wide across providers. Both groups had rebuilt their dependencies; neither had eliminated them.
The transformation, looked at from a distance, was a return rather than a revolution. The semiconductor industry Robert Noyce had founded at Fairchild had been vertically integrated by default; the fabless revolution of the 1980s and 1990s had broken that integration apart on the argument that no firm could afford to do everything anymore. The hyperscaler era had, on the design side, begun to put it back together. The manufacturing layer remained in Hsinchu and Tainan at the end of someone else’s contract, but the rest of the stack was once again being assembled inside single companies. The companies were no longer chip companies. They were cloud companies and lab companies that had concluded, after Jouppi’s 2017 paper landed in the literature and Annapurna’s engineers began shipping silicon out of Yokne’am, that being a cloud or a lab at scale meant being a chip company in everything but corporate description. By 2026, the companies that had not yet committed to that conclusion were the ones the rest of the industry was beginning to discount.