The Fab
Part IX · Chapter 58

HBM and the Memory Wall

High-bandwidth memory as AI-era kingmaker. SK Hynix's surprise leadership at HBM3 and HBM3E. Samsung's stumble through 2024, late-2025 Nvidia qualification, and 2025–26 share recovery. Micron's HBM3E leapfrog. The Q3 2025 ranking shift. Yield secrets, advanced packaging, the through-silicon-via process. → How Korea suddenly became as indispensable as Taiwan.

In the autumn of 2019, inside the Samsung Semiconductor campus at Hwaseong, a small group of engineers were told that the project they had been working on for the better part of a decade no longer had a future. The product was high-bandwidth memory, a vertically stacked DRAM module connected by tiny copper-filled holes drilled through silicon, and the verdict on it inside Samsung’s memory leadership was that the niche was too small to justify the headcount. Graphics cards used some. A handful of supercomputing accelerators used some more. Mainstream server DRAM was still the cash engine, and the cash engine paid the salaries. The HBM development team, by accounts that later filtered into the Korean technology press and into a 2025 book on SK Hynix’s HBM history called Super Momentum, was effectively disbanded that year. Some engineers were rotated to other groups. Some left the company. The decision was, by the standards of memory-cycle planning, defensible. Within five years it would look like one of the costliest misjudgments in the modern history of the chip industry.

Across Gyeonggi Province at SK Hynix’s Icheon and Cheongju campuses, the same product line was kept alive on a smaller team that nobody at the company was sure how to justify either. The first HBM module had shipped from a Hynix line in 2013, the result of a development partnership Hynix had begun with AMD as far back as 2010, with a young Korean design engineer named Park Myeong-jae as one of its early project leads. The first GPU that used it, AMD’s Fiji-based Radeon R9 Fury X, had launched in June 2015 to enthusiast reviews and modest sales. Across the entire memory industry the verdict on HBM through the second half of the 2010s was the same one Samsung had reached: a beautiful piece of engineering looking for an application large enough to absorb its cost. SK Hynix kept building it anyway. The reasons were partly path dependence. Park and a handful of process specialists who had co-developed the original product with AMD wanted to see it through. The reasons were partly strategic instinct. SK Hynix’s leadership, by then under chairman Chey Tae-won and the SK Group’s longer time horizon, had decided that the option value of an obscure stacking technology was worth the carrying cost even if the immediate market was not.

The bet would, within four years, become the difference between a memory company and an indispensable one.

To understand why, the rest of the industry had first to understand a problem the academic computer architects had been writing about since the mid-1990s. In 1995, William Wulf and Sally McKee at the University of Virginia had published a paper they titled “Hitting the Memory Wall,” arguing that the rate at which processors were getting faster was beginning to outrun the rate at which the memory subsystems feeding them could deliver data. Each new generation of CPU could perform more arithmetic per second; each new generation of DRAM could not deliver bits to that arithmetic proportionally faster. The gap, Wulf and McKee argued, would continue to widen until the dominant cost in any computation was not the math but the waiting. The paper had been read in the computer-architecture community for a generation. Its prediction had been held off, in practice, by a series of clever tricks: caches, prefetching, multi-level memory hierarchies, the steady two-decade migration of DDR DRAM through generations from DDR1 to DDR5. The tricks had been enough to keep the memory wall a theoretical concern through the entire era of CPU dominance.
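Wulf and McKee's argument reduces to one line of arithmetic: if a fraction p of accesses hit a fast cache and the rest go to DRAM, the average access time is p · t_cache + (1 − p) · t_mem, and as the miss penalty grows relative to the processor cycle, the misses swallow everything. A minimal sketch of that formula, with illustrative latencies rather than figures from the paper:

```python
# The Wulf-McKee average-access-time formula, with illustrative numbers.
# t_avg = p * t_cache + (1 - p) * t_mem, all measured in processor cycles.
def avg_access_time(hit_rate, t_cache, t_mem):
    return hit_rate * t_cache + (1 - hit_rate) * t_mem

# Hold the cache constant and let DRAM fall further behind the core,
# as each processor generation raises the miss penalty in cycles.
for t_mem in (50, 200, 800):
    t = avg_access_time(hit_rate=0.98, t_cache=1, t_mem=t_mem)
    print(f"miss penalty {t_mem:>3} cycles -> average access {t:5.1f} cycles")
```

Even at a ninety-eight percent hit rate, the average tracks the miss penalty almost linearly; the two percent of accesses that wait come to dominate the ninety-eight percent that do not.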

The tricks stopped being enough when the workload changed. A transformer model trained on tens of thousands of GPUs, by the second half of the 2010s, did not look like the workload that classical caches had been designed for. The arithmetic was easy. The arithmetic was matrix multiplications a child could write. What was hard was that every multiplication required reading a slab of model weights and a slab of activations from memory, doing one quick operation, and writing the result back. The compute hardware, by the time of Nvidia’s V100 in 2017 and A100 in 2020, was capable of running tens or hundreds of trillions of operations per second. The memory feeding it could deliver, in the best case, a few terabytes per second of bandwidth. In a 2024 paper that became one of the most-cited statements of the problem, Amir Gholami and his Berkeley collaborators noted that flagship language model sizes had grown roughly four hundred times every two years over the previous half-decade, while memory per accelerator had grown only two times every two years. The factor of two hundred between those rates was the new memory wall. It was the binding constraint of the AI era. And the only memory technology in production that could push back against it was the one Samsung had decided was a niche.
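The imbalance is easy to make concrete. A back-of-envelope sketch, using round illustrative numbers rather than any vendor's specifications (a hypothetical seventy-billion-parameter model in sixteen-bit weights, an accelerator with roughly a petaflop of matrix throughput and 3.35 terabytes per second of HBM bandwidth), shows where the time goes when generating a single token:

```python
# Back-of-envelope sketch of why transformer inference hits the memory wall.
# All numbers are illustrative assumptions, not vendor specifications.
params = 70e9            # a hypothetical 70B-parameter model
bytes_per_param = 2      # fp16 weights
peak_flops = 1.0e15      # ~1 PFLOP/s of matrix arithmetic
mem_bandwidth = 3.35e12  # ~3.35 TB/s of HBM bandwidth, in bytes/s

# Generating one token streams every weight through the cores once
# and performs roughly 2 FLOPs (multiply + add) per parameter.
t_memory = params * bytes_per_param / mem_bandwidth
t_compute = 2 * params / peak_flops

print(f"time to stream weights : {t_memory*1e3:6.2f} ms")
print(f"time to do the math    : {t_compute*1e3:6.2f} ms")
print(f"memory-bound by a factor of {t_memory / t_compute:.0f}x")
```

On these assumptions the cores finish their share of the work in about a seventh of a millisecond and then idle for roughly forty; an extra terabyte per second of bandwidth is worth far more than an extra teraflop of math.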

What HBM did differently was simple to describe and brutal to manufacture. A standard DRAM module sat alongside a processor on a printed circuit board and communicated with it through a few hundred pins running at a few gigabits per second each. An HBM stack, by contrast, was a tower of eight or twelve DRAM dies fused vertically into a single package, each die thinned to about thirty to fifty micrometers, with thousands of tiny copper-filled holes called through-silicon vias running through every die. A processor chip and one or more HBM stacks were then mounted side by side onto a sliver of silicon called an interposer, and the interposer carried tens of thousands of microscopic wires between them. Bandwidth, in chip terms, scaled with the number of wires you could run between memory and compute. A circuit-board trace was big and slow. A wire on a silicon interposer was tiny and fast and ran by the thousands. An HBM stack delivered, depending on the generation, between four hundred gigabytes and more than a terabyte per second of memory bandwidth in a footprint not much larger than a postage stamp.
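The wire-count logic is a single multiplication: interface width times per-pin data rate. A sketch using the headline JEDEC figures (shipping parts sometimes clock lower than the standard's peak):

```python
# Bandwidth falls straight out of the wire count: an HBM interface is
# 1024 bits wide, versus 64 bits for a conventional DDR DIMM.
def stack_bandwidth_gbs(bus_width_bits, pin_rate_gbps):
    return bus_width_bits * pin_rate_gbps / 8  # bytes/s, in GB/s

configs = {
    "DDR5 DIMM (64-bit bus, 6.4 Gb/s pins)": (64, 6.4),
    "HBM2 stack (1024-bit bus, 2.0 Gb/s pins)": (1024, 2.0),
    "HBM3 stack (1024-bit bus, 6.4 Gb/s pins)": (1024, 6.4),
}
for name, (width, rate) in configs.items():
    print(f"{name}: {stack_bandwidth_gbs(width, rate):7.1f} GB/s")
```

The per-pin speeds are unremarkable; the sixteen-fold wider bus, which only an interposer can carry, does all the work.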

The penalty was that the manufacturing was unforgiving in a way that conventional DRAM was not. Eight or twelve dies, each one a fully functional DRAM chip with its own yield distribution, had to be aligned to within a few microns and bonded so that every one of the thousands of TSV connections between them worked. A single bad via on a single die meant the whole stack was scrap. The TSVs themselves had to be etched, lined, and filled in a process step that conventional DRAM lines did not include and that added cost without adding capacity. The thinning process pushed wafers to thicknesses where they handled like potato chips. Yields on early HBM products had run, by industry estimates, well below fifty percent. The economics worked only for buyers who were willing to pay a multiple of standard DRAM prices because their compute was idle without it.
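The yield penalty compounds: a stack is scrap if any die or any bonding step fails, so per-step yields multiply. A toy calculation with optimistic illustrative yields, not industry data:

```python
# Why stacking punishes yield: every die must be good, and every
# bonding/TSV step must succeed, so the probabilities multiply.
# The per-step yields below are illustrative assumptions.
def stack_yield(die_yield, bond_yield, n_dies):
    return (die_yield ** n_dies) * (bond_yield ** n_dies)

for n in (4, 8, 12, 16):
    y = stack_yield(die_yield=0.98, bond_yield=0.99, n_dies=n)
    print(f"{n:>2}-high stack: {y:.0%}")
```

Even at ninety-eight percent per die and ninety-nine percent per bond, a twelve-high stack loses roughly a third of its output; with the rougher per-step yields of the early lines, the compounding pushed finished-stack yields below half.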

For most of the 2010s, the only buyers willing to pay that premium were a small set of Nvidia and AMD product lines and a handful of supercomputing customers. Samsung had read that ceiling correctly through the middle of the decade. SK Hynix had read it the same way, but had decided, partly on the long-game logic of a Korean conglomerate accustomed to outwaiting cycles and partly on the conviction of the engineers who had shepherded the original AMD partnership, to keep the line warm. The line was still warm at the moment of detonation.

That moment arrived in early 2022, when Nvidia, then preparing the launch of its H100 data-center GPU around a new architecture called Hopper, came shopping for a memory partner. The H100 was designed to use HBM3, the third major generation of high-bandwidth memory, which JEDEC had standardized that January at a peak data rate of 6.4 gigabits per pin. SK Hynix had announced HBM3 development the previous October, before the standard was finalized, and had the product ready for sampling. Samsung, by then trying to rebuild a presence it had let lapse, was behind. Micron, the third surviving DRAM producer and the only American one, had spent the earlier HBM generations sitting on the sidelines. The contract for the H100, in the spring of 2022, went almost entirely to SK Hynix. The first HBM3 part shipped in June. Within a year the H100 was the most sought-after chip in the world, its eighty gigabytes of HBM3 spread across five active stacks ringing the GPU die, and SK Hynix was the only volume producer of the memory that made it work.

The market share data captured the inversion with the clarity of a balance sheet. By the end of 2023, with HBM revenues running at multiple billions of dollars and growing on a steepening curve, SK Hynix held over half of the world’s HBM market. Samsung, the world’s largest memory producer in every other category, held under forty percent of HBM and had no design wins on the most consequential AI accelerator. Micron held under ten percent, with a position confined to legacy generations. The three-firm DRAM oligopoly that the consolidation of the 2010s had produced, the same three firms that between them controlled essentially all of the world’s memory production, had divided itself in a single product category into a leader, a struggling fast-follower, and a distant third.

What made the inversion structural rather than transitional was a packaging detail that ran through the heart of SK Hynix’s process and that the company’s competitors had not matched. From its HBM2 generation in 2019 forward, SK Hynix had used a stacking method called mass reflow with molded underfill, abbreviated MR-MUF, in which every die in the stack was assembled and then bonded to its neighbors in a single mass thermal step, with a liquid epoxy compound flowing into the bumps and gaps to lock the assembly into place. Samsung and Micron, in contrast, used a method called thermal compression with non-conductive film, TC-NCF, which placed a thin film between every pair of dies and bonded them sequentially under heat and pressure. The two approaches produced equivalent stacks at the second-generation level. They diverged at higher stack counts and higher current densities, where the heat that an HBM stack had to dissipate during operation became the binding constraint. SK Hynix’s MR-MUF, the company’s process engineers reported at industry conferences, dissipated heat noticeably better than the competing approach, in part because the molded compound created a more uniform thermal path through the stack, and in part because the bonding process itself produced fewer mechanical voids that trapped heat. The company was, on its own published claim, the only volume producer using MR-MUF, and as the stack counts climbed from eight to twelve to a planned sixteen, the thermal advantage compounded.

The cost of that advantage to Samsung became visible in the most public way possible across 2024. Nvidia, by then ramping the successor H200 around a memory upgrade to HBM3E, had qualified SK Hynix as the lead supplier and Micron as a second source. Samsung had been trying to qualify its own HBM3E into the same supply chain since late 2023. In May 2024, Reuters reported that Samsung’s eight-layer and twelve-layer HBM3E parts had failed Nvidia’s tests, and that the failures involved both heat and power consumption. Samsung disputed the framing. Industry analysts at TrendForce and SemiAnalysis confirmed the substance. Samsung redesigned the part, resubmitted, and was rejected again. The pattern continued through the autumn of 2024. By December, with calendar 2024 essentially closed and Samsung’s HBM3E still uncertified for Nvidia volume, the largest memory company on earth was effectively absent from the largest revenue category in its industry. The repercussions reached the executive suite. In May 2024, Samsung Electronics replaced Kyung Kye-hyun, the chief of its Device Solutions division that ran the chip business, with Jun Young-hyun, a former head of memory who had been brought back specifically to fix HBM. In November 2024 the company restructured again, naming Jun co-CEO and giving him direct control over the memory line. The Korean and international press, by the close of 2024, was running stories with headlines that Samsung’s executives had spent careers trying to avoid. The HBM crisis had become the Samsung crisis.

Samsung’s troubles reflected a wider corporate moment that the Lee family had been managing for the better part of a decade. Lee Jae-yong, the chairman who had inherited his father’s chairmanship after Lee Kun-hee’s 2014 stroke and 2020 death, had spent eighteen months in prison on a 2017 bribery conviction tied to South Korea’s presidential corruption scandal, and had been retried on accounting and stock-manipulation charges through 2023. A Seoul court acquitted him on the second case in February 2024, freeing him to focus full-time on the company for the first time in years. Through it all Samsung’s HBM line continued to fall behind. The chairman who had inherited a memory crown was, on the most consequential product cycle of his tenure, watching it slip to the smaller Korean rival next door.

Micron, the third producer, took a different route into the new market and arrived at the same destination by way of a strategic skip. Sanjay Mehrotra’s team in Boise, having watched HBM3 ship without them, decided to leapfrog the generation entirely and bring HBM3E to volume first. The decision was bold and required a yield curve they did not yet have. They executed it. On February 26, 2024, Micron announced volume production of an eight-high HBM3E part, twenty-four gigabytes per stack, qualified into Nvidia’s H200 GPU launching the same quarter. By the spring of 2024 Micron’s HBM3E was shipping into Nvidia in volume alongside SK Hynix’s. By the company’s earnings calls of mid-2024, Mehrotra was telling investors that Micron’s calendar-2024 HBM supply was sold out and that calendar 2025 was almost completely allocated. The company expanded its Taichung packaging and assembly facility, accelerated capacity at Boise, and began discussing a fab buildout in Idaho whose scale dwarfed every previous Micron expansion. Through 2025, with Samsung still chasing Nvidia qualification and SK Hynix unable to meet demand, Micron’s market share climbed from the high single digits past Samsung’s into second place. By the second quarter of 2025 the order had, on TrendForce’s count, become SK Hynix at sixty-two percent, Micron at twenty-one percent, Samsung at seventeen percent. Samsung was, in the cycle that mattered most, third in a market it had created.

The scramble spilled over into a packaging dependency that the AI industry was discovering only slowly. An HBM stack, no matter how well it was made, could not by itself enter a finished GPU. It had to be co-packaged with the logic die on a silicon interposer using a process TSMC called Chip on Wafer on Substrate, CoWoS, in which the GPU die and its surrounding HBM stacks were placed onto a thin silicon carrier and then bonded onto a larger substrate. CoWoS was, by 2023, the dominant advanced-packaging technology for AI accelerators, and TSMC was effectively the only volume producer. Capacity at the start of 2024 ran roughly thirty-five thousand wafers per month, almost all of it spoken for by Nvidia. TSMC had committed to raising that capacity to seventy-five thousand wafers per month by the end of 2025 and to roughly one hundred thirty thousand by the end of 2026. None of the increase was sufficient. C.C. Wei, the TSMC chief executive, told analysts on a 2024 earnings call that the company’s CoWoS capacity was sold out through 2025 and into 2026. SK Hynix’s chief financial officer, on the same circuit of investor calls, told the same audience that the company’s HBM capacity was sold out through 2026. Micron’s Mehrotra echoed the message. The bottleneck of the AI buildout, by the middle of the decade, was no longer wafers and no longer GPUs and no longer even data-center power; it was a few square kilometers of clean room outside Hsinchu and a few buildings outside Icheon, and the half-dozen factories that fed them.

What that geometry produced was a kind of concentration the industry had not contemplated. The most valuable company in the world, Nvidia, depended on three suppliers: TSMC for its GPU dies, ASML for the lithography that made those dies possible, and SK Hynix for the memory that made those dies useful. Two of those three sat within missile range of the Chinese mainland, one on an island, one on a peninsula. The third sat in the southern Netherlands. The implications were not lost on the principals. In November 2024, at the SK AI Summit in Seoul, Chey Tae-won took the stage and described, in front of an audience that included a video appearance from Jensen Huang himself, a request the Nvidia chief had made to him personally: that SK Hynix bring its HBM4 timeline forward by six months. Chey said he had deferred the answer to Kwak Noh-jung, the Hynix chief executive, who in turn had said he would try. Huang’s recorded comments, played for the audience, called the new schedule “super aggressive” and “super necessary.” The phrase, in the politics of the chip industry, had only one referent. The largest customer in the world was asking the second-largest memory producer in the world to compress a multibillion-dollar product schedule by half a year, and the chairman of one of Korea’s largest conglomerates was telling that customer he would do his best.

The geometry had already produced another public scene earlier in the year. In April 2024, SK Hynix announced that it would build its first U.S. advanced packaging facility, a roughly four-billion-dollar plant on ninety acres of Purdue Research Park in West Lafayette, Indiana. The plant, scheduled to come online in the second half of 2028, would mass-produce HBM and develop the next packaging generations on American soil for the first time. The Department of Commerce committed up to four hundred fifty-eight million dollars in CHIPS Act funding to the project. Indiana’s governor, Eric Holcomb, joined Kwak and the Purdue president on the announcement stage. The plant was modest in size relative to TSMC’s Arizona campus or Samsung’s Texas project. Its strategic significance was that it brought the world’s leading HBM producer onto American soil at all, and that the choice of an Indiana site rather than Texas or Arizona reflected a Korean-American academic relationship rather than a state-subsidy auction. Hynix engineers would train at Purdue’s labs. Advanced memory packaging would run inside the United States for the first time since the industry offshored assembly in the 1960s and 1970s.

Samsung clawed back partway in 2025. Jun Young-hyun, the chip chief who had been brought back to fix the HBM line, made a decision that earlier Samsung leadership had resisted: he ordered the company’s memory engineers to redesign the underlying DRAM core for HBM3E from scratch, abandoning the design lineage Samsung had been incrementally improving since 2022. The redesigned twelve-layer part, retested through the spring and summer of 2025, finally cleared Nvidia’s qualification gates that September, eighteen months after Samsung had first submitted to them. The Korean financial press treated the news as a recovery rather than a victory. By that point Samsung had already lost the HBM3 generation in its entirety and lost most of the HBM3E volume window. The market-share data through the autumn of 2025 caught the recovery in motion. By the third quarter, on TrendForce’s count, Samsung had jumped from seventeen percent to twenty-two percent of HBM revenue, displacing Micron from second place. SK Hynix had slipped slightly to fifty-seven percent. Micron, flat at twenty-one, had dropped to third. The HBM4 generation, scheduled for volume in the second half of 2026 with SK Hynix and Micron sampling at ten and eleven gigabits per pin respectively, would be the cycle in which the three-way contest reset. The pattern of Korean memory leadership the country had enjoyed since the 1990s had survived the AI cycle, in the sense that two of the three global producers were still Korean and one was the world’s largest. The pattern had also fragmented in a way that no Korean memory executive of the previous twenty years had imagined: the leader was no longer Samsung.

The arithmetic underneath the share moves was loud. The global HBM market had grown from roughly twenty billion dollars in 2024 to roughly thirty-eight billion in 2025, with industry projections out to one hundred billion dollars by 2028. Hyperscaler memory budgets, in their disclosures through 2025 and into 2026, had become a meaningful contributor to the rising capex numbers. Microsoft’s chief financial officer, Amy Hood, attributed twenty-five billion dollars of the company’s record 2026 capex guidance directly to higher memory and chip costs. SK Hynix’s HBM-related operating profit, on its own disclosures, was running on a trajectory to overtake the company’s general-purpose DRAM line within two years, an inversion no prior memory cycle had produced. By the end of 2025, in a result Samsung’s executives had spent a generation trying to avoid, SK Hynix had passed Samsung as the world’s largest memory chip supplier by revenue.

In late October 2025, on the eve of the APEC summit in Gyeongju, Jensen Huang sat down for fried chicken and beer at a Kkanbu Chicken franchise in central Seoul with Lee Jae-yong of Samsung and Chung Eui-sun of Hyundai Motor Group. Chey Tae-won, the SK Group chairman who controlled SK Hynix and who was therefore the ultimate boss of Nvidia’s largest memory supplier, was not at the table; the Korean press debated whether the absence was a snub or a scheduling accident. The dinner ran through cheese balls, soju, and Terra beer, and ended with Huang ringing the restaurant’s golden bell to pay every other patron’s tab. The next day at APEC, Huang announced that Nvidia would deliver more than two hundred sixty thousand GPUs to Korean buyers, a figure that implied billions of dollars in HBM orders flowing back through SK Hynix and Samsung in the months that followed.

The image was the right one for the moment. Forty years after Lee Byung-chul had picked up the telephone at the Hotel Okura and committed Samsung to memory, the Korean chip industry that he had founded was no longer one company’s industry, and was no longer secondary in the world’s chip economy to anyone. Two of the three companies that stood between the largest AI buildout in history and a hardware brick wall were Korean. The third was American. The factories that produced their stacks ran in Icheon, Cheongju, Hwaseong, Boise, Taichung, and, beginning in 2028, West Lafayette. The interposer that bonded those stacks to the Nvidia GPUs ran through TSMC’s CoWoS lines outside Hsinchu, on an island the People’s Liberation Army had spent the previous decade rehearsing the encirclement of. The geometry of indispensability that Taiwan had defined for the world’s logic chips, Korea, in a single product cycle no one outside the memory industry had seen coming, had defined for the memory chips that fed them.