DeepSeek-V4 and the Huawei Inflection
DeepSeek-V4 in April 2026: the lab discloses use of Huawei Ascend hardware (910C and CloudMatrix-class systems) for inference at scale and validates V4 on both Nvidia GPU and Ascend NPU platforms. Huawei’s CloudMatrix 384 and the Ascend 910C → 950 ramp through 2025–2026. The Eric Xu Huawei Connect 2025 roadmap (950, 960, 970). What this means for the export-control thesis: Beijing has demonstrably sustained frontier-class AI without access to leading-edge EUV lithography. What it does NOT mean: that Taiwan’s role is replaced or that the gap has closed. The book’s honest finale, set in May 2026: what’s settled, what isn’t, and the question the reader is now responsible for thinking about.
On the afternoon of Friday, April 24, 2026, a quiet repository on Hugging Face acquired a new commit. The user account was deepseek-ai. The model card read DeepSeek-V4-Pro. The technical report ran to several dozen pages, dense with equations and ablation tables, and somewhere on page nine, in a paragraph most readers would skim, sat the sentence that turned a model release into a strategic event: “We have verified this fine-grained expert parallel scheme on both the NVIDIA GPU and Huawei Ascend NPU platforms.” The two were listed side by side, in the same italics, with the same authority, as if they had been peers in the project all along.
The commit went live at the start of the Beijing weekend. By the time New York opened on Monday, Semiconductor Manufacturing International Corporation’s Hong Kong shares had jumped roughly ten percent. Nvidia’s traded down a point and change, an order of magnitude smaller than the six-hundred-billion-dollar single-day rout the company had absorbed fifteen months earlier on the morning of the R1 release. The smaller market reaction was, in some respects, the larger story. The first DeepSeek had come at the established order as a shock. The second arrived as confirmation. Wall Street had spent a year metabolizing the possibility, and on a Hangzhou hedge-fund subsidiary’s third disclosure of the cycle, a possibility had become a fact.
Liang Wenfeng was not at any podium when the commit dropped. He had not appeared in public for over a year, since the February 2025 symposium where Xi had hosted him in the front row. After that, nothing. No keynotes. No on-stage interviews. No Chinese New Year address to staff. The man at the center of the company was, throughout the V4 cycle, a name on a pre-print author list. The senior researcher Chen Deli posted on a personal X account in the days after the launch. Liang did not. His silence was, by then, part of the company’s brand. It was also, plausibly, an instruction.
The technical report itself was sober and almost defensively narrow. V4-Pro carried 1.6 trillion total parameters with 49 billion activated per token. Its sibling V4-Flash carried 284 billion total with 13 billion activated. Both supported a one-million-token context window. Both relied on a hybrid attention architecture the report called Compressed Sparse Attention layered with Heavily Compressed Attention, designs aimed at collapsing the memory cost of long contexts to a small fraction of what V3.2 had needed. The Pro model used roughly 27 percent of V3.2’s per-token inference compute at million-token contexts; the Flash model used closer to 10 percent. The benchmarks landed where everyone had quietly expected. V4-Pro scored 87.5 on MMLU-Pro, exactly matching OpenAI’s GPT-5.4 and trailing Anthropic’s Opus 4.7 and Google’s Gemini 3.1 Pro by a couple of points each. On SWE-Bench Pro it ran several points behind the American frontier. On the agentic-search benchmark BrowseComp it edged out Opus and pressed close to GPT-5.5. The shape of the comparison was the shape DeepSeek had been making for two years now. Three to six months behind the American frontier on the hardest benchmarks. Roughly one-seventh the cost. Open weights under the MIT License. A model anyone with a credit card could run, and any Chinese cloud could host without negotiating an export license.
The disclosure of the chips was the part the analysts pulled apart sentence by sentence. DeepSeek did not claim V4-Pro had been trained end-to-end on Huawei silicon. The company said its parallelism scheme had been verified on both Nvidia and Ascend, that V4 was the first frontier model natively adapted to the Ascend platform, and that it had used a mixture of H800 GPUs and Ascend 910C chips through the training run. The exact mix was not disclosed. A Tsinghua professor told the China Academy in the days after the launch that, on his reading of the architecture, the bulk of pre-training had probably still run on H800 silicon and the Ascend hardware had handled portions of the post-training pipeline and the reinforcement-learning rollouts that V4’s design particularly stressed. V4 had not fully cut ties with Nvidia. It had taken a step. It was an enormous step. It was not the whole journey.
The step was this: for V4, the inference path was now indigenous. DeepSeek shipped the model with native Ascend support and benchmarks for both Pro and Flash variants on Huawei’s Atlas-class hardware. V4-Pro on Ascend 950DT delivered 388 tokens per second per request at high concurrency. V4-Flash on the same hardware delivered 4,722. The numbers were in the same neighborhood as comparable workloads on Nvidia’s H20, and on certain workloads they were better. Huawei’s claim, validated against DeepSeek’s own numbers, was that the V4-on-Ascend stack ran at nearly three times the single-card inference performance of the H20. By the Sunday after the release, Alibaba Cloud’s Bailian platform was hosting both V4 variants at DeepSeek’s official prices. Tencent Cloud was running V4 on its TokenHub platform on domestic infrastructure with a Singapore gateway for international traffic. Within a week, Reuters was reporting that ByteDance, Tencent, and Alibaba had jointly placed orders for hundreds of thousands of Ascend 950 processors. Huawei was telling its supply chain it would ship roughly seven hundred and fifty thousand units of the 950PR in 2026.
The chip the V4 paper had pointed at was the latest in a line that had spent five years climbing out from under the August 2020 cutoff. The Ascend 910 had begun life as a HiSilicon design taped out at TSMC on a 7-nanometer process before the FDPR severed Huawei from the Taiwanese fab. After 2020 the line moved, awkwardly and incompletely, to SMIC. The 910B that emerged in 2023 was an SMIC N+2 part with limited volumes and yields that industry watchers placed somewhere in the teens and twenties, in percentage terms. The 910C, the part V4 named, was structurally a pair of 910B dies packaged together as a chiplet, doubling throughput per package without requiring SMIC’s process to advance another generation. By early 2026, the trade press was reporting that 910C yields had passed forty percent and the line had begun to turn a profit. Forty percent was modest by industry standards, where mature lines run in the seventies and eighties. The trajectory was not. SMIC was telling the Financial Times it would double its 7-nanometer capacity in 2026 and that work was beginning on a 5-nanometer line for a follow-on Ascend generation. Industry estimates put SMIC’s leading-edge capacity at roughly forty-five thousand wafers per month at the end of 2025, projected to reach sixty thousand by the end of 2026. Small numbers by TSMC’s standards. No longer small numbers in absolute terms.
What Huawei had built around the chip was more interesting than the chip. CloudMatrix 384, announced in 2024 and shipping into Chinese hyperscalers through 2025, was a system-level answer to a chip-level deficit. It took 384 Ascend 910C accelerators and laced them together with all-to-all optical interconnects through 6,912 800-gigabit linear-drive optical transceivers, packaged across sixteen racks. On Dylan Patel’s accounting at SemiAnalysis, the node delivered around three hundred BF16 petaflops of dense compute, close to double the throughput of Nvidia’s GB200 NVL72 rack with roughly 3.6 times the aggregate memory and 2.1 times the memory bandwidth. Each individual Ascend chip delivered perhaps a third of the per-die performance of a B200. The system simply contained five times more of them, wired together with a topology that was, by Patel’s reckoning, arguably a generation ahead of Nvidia’s and AMD’s contemporaneous rack-scale designs. The trade-off was power. The full CloudMatrix 384 pulled around 559 kilowatts at the wall, against the GB200 NVL72’s 145. The performance per watt was less than half. In an industry that had spent two decades grinding for percentages of efficiency, this was an aesthetic offense. In a country that had more than doubled its installed grid capacity in the previous decade and was adding solar and coal and nuclear at a tempo the West watched with a mix of admiration and alarm, it was a workable trade. Pull more electrons. Burn more coal at Ordos. Run more chips. Put the model out the door.
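The performance-per-watt comparison above is back-of-envelope arithmetic, and it can be checked as such. The sketch below uses the SemiAnalysis-attributed figures in the text (around 300 BF16 petaflops and 559 kilowatts for CloudMatrix 384, 145 kilowatts for the GB200 NVL72); the 180-petaflop dense-BF16 figure for the NVL72 is an assumption inferred from the text’s “close to double the throughput” claim, so treat the result as approximate.

```python
# Back-of-envelope check of the "performance per watt was less than half" claim.
# CloudMatrix 384 figures are the SemiAnalysis estimates cited in the text;
# the NVL72 throughput is an ASSUMED figure (~half of CloudMatrix's 300 PFLOPS).
cm384_pflops, cm384_kw = 300.0, 559.0   # CloudMatrix 384: dense BF16, wall power
nvl72_pflops, nvl72_kw = 180.0, 145.0   # GB200 NVL72: assumed dense BF16, wall power

cm384_eff = cm384_pflops / cm384_kw     # ~0.54 PFLOPS per kW
nvl72_eff = nvl72_pflops / nvl72_kw     # ~1.24 PFLOPS per kW

ratio = cm384_eff / nvl72_eff           # CloudMatrix efficiency relative to NVL72
print(f"{ratio:.2f}")                   # ~0.43, i.e. less than half
```

On these rough inputs the Chinese rack lands at a bit over forty percent of the Nvidia rack’s efficiency, consistent with the “less than half” characterization, though the exact figure moves with whichever throughput estimate one assumes.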
Huawei’s roadmap, laid out by Eric Xu at Huawei Connect 2025 in Shanghai, was a public declaration that the company intended to keep doing exactly that. Three new chip series on a strict annual cadence: the Ascend 950 in 2026, the 960 in 2027, the 970 in 2028. The Atlas 950 SuperPoD, scheduled for the fourth quarter of 2026, would scale to 8,192 Ascend 950DT chips per node, with the largest configuration reaching, on Huawei’s accounting, on the order of half a million accelerators. Xu’s keynote described a 56-times scale advantage over Nvidia’s NVL144 rack. The numbers needed adjusting, as Huawei numbers always did. What was new was that the products underneath the rhetoric were now shipping, and the customers were not making polite noises. They were placing orders.
It was at this point in the story that the original premise of the export-control regime had to be rewritten. The premise, since the October 7, 2022 BIS rule and the FDPR template built for Huawei in 2020, had been that frontier AI required leading-edge silicon, that leading-edge silicon required EUV lithography, that EUV ran through ASML, and that the United States, working through ASML, could meter China’s access to the frontier with a precision unavailable in any other industrial domain. The premise was technically clean. It was also, by the spring of 2026, partially obsolete. DeepSeek had produced a frontier-class model running in production on chips fabricated in Shanghai on a 7-nanometer DUV process whose yields were below industry norms and whose volumes were a fraction of TSMC’s. The path to the frontier had been narrowed and made more expensive. It had not been closed. Beijing had built itself an inferior, costly, and operational substitute path, and the substitute path was now visibly carrying frontier traffic.
What the Ascend ramp had not done was as important as what it had. SMIC was not running 5-nanometer logic in volume. ASML had not, since the 2019 license suspension and the subsequent 2023 expansion of the EUV ban, shipped an EUV scanner to a single mainland Chinese customer, and a domestic Chinese alternative remained a research-line aspiration whose first reported prototype had generated EUV light in early 2025 but had not yet produced a chip. SK Hynix, Micron, and Samsung still produced essentially all of the world’s high-bandwidth memory, and HBM3E and emerging HBM4 stacks were not domestically replicable in the PRC on any timeline shorter than the back half of the decade. Huawei’s CloudMatrix and Ascend products relied on HBM stacks whose generation lagged Nvidia’s by one or two steps, a gap that translated directly into the bandwidth disadvantages the brute-force scale-up was designed to paper over. The ASML high-NA EUV machines that would define the 2-nanometer and below era were arriving in 2025 and 2026 to TSMC, Intel, and Samsung in volumes SMIC could not access at any price. Above all, TSMC remained the producer of roughly 90 percent of the world’s leading-edge logic, and the fab in Tainan that ran 3-nanometer N3 in volume was still on an island the People’s Liberation Army had spent the previous two years rehearsing to encircle.
The rehearsals had not slowed. Joint Sword 2024A in May had become Joint Sword 2024B in October. Strait Thunder 2025A, in April, was the response to a Lai speech describing China as a “foreign hostile force,” and the simulation targeted Taiwanese energy infrastructure and offshore LNG terminals with a level of operational detail civilian analysts in Taipei found difficult to read as routine. Justice Mission 2025, on December 29–30, again rehearsed a blockade. By the spring of 2026, the median line of the strait was a piece of cartography no one in the PLA Air Force was paid to respect. The Eastern Theater Command had developed, on CSIS analyst accounting, a doctrine of “exercise-as-coercion” in which the boundary between training and operations had blurred to the point of operational significance. The 2027 date Phil Davidson had floated to the Senate Armed Services Committee in March 2021 remained the date the Pentagon’s contingency planners orbited around. Mark Cancian’s CSIS wargame, run twenty-four times in 2022 with the destruction of TSMC’s Tainan fabs as a side-effect of every meaningful exchange of fires, had been re-run multiple times by 2025, with the same conclusion. The fabs sat in the path of the artillery and the missiles. The ASML telemetry kept flowing. The wafers kept coming off the line.
What had changed was that, for the first time, a credible answer to the question of what would happen the morning after the line stopped had begun to take shape. It was not a complete answer. It was an inferior, more expensive, less efficient answer. It was, however, an answer. The Chinese AI industry, having been told for four years it could not have the frontier without American silicon, had spent those four years building a stack that produced a recognizable variant of the frontier without American silicon. The stack was open-source, which meant it propagated. It propagated to Russia, where the Sberbank-affiliated GigaChat researchers had begun fine-tuning on V3 and would be on V4 within weeks. To the Gulf, where the Mohamed bin Zayed University of Artificial Intelligence in Abu Dhabi had been training Falcon descendants on a mix of GPU and Ascend hardware brokered through Huawei’s Middle East offices. To South America, where Brazilian and Argentine research groups that could not reliably acquire H100s had been running quantized DeepSeek descendants on Atlas hardware sold through resellers in Hong Kong. The diffusion was not large. It was no longer notional.
The American policy response was running on parallel tracks that did not always agree. The Trump administration had treated the Biden-era Diffusion Rule and the broader export-control architecture as inheritances to be renegotiated. The May 2025 rescission of the Diffusion Rule had reset the global allocation framework into country-by-country negotiation. Lutnick at Commerce had spent the better part of a year working bilateral arrangements with Saudi Arabia, the UAE, India, and Japan over guaranteed allocations of Nvidia and AMD silicon. The CHIPS Act had survived the transition; Phoenix Fab 21 was running N4 in volume by early 2025 and approaching N3 by the end of 2026, with TSMC’s $65 billion Arizona commitment now expanding rather than contracting. Intel’s 18A had moved into limited production at Ohio and Arizona. Samsung’s Texas fab was scheduled to begin volume production in 2027. The CHIPS dollars had bought meaningful capacity. They had not bought independence. The leading-edge wafer the world bought was still, in the overwhelming majority of cases, made in Taiwan.
Inside the export-control bureaucracy, the question on the desk through the spring of 2026 was whether to tighten the rules around the Ascend supply chain itself. BIS had added Ascend-related parties to its lists through 2024 and 2025. The deeper question was whether to extend the FDPR perimeter to the broader pool of Chinese AI labs, including DeepSeek itself, now consuming Ascend chips at scale. Gregory Allen at CSIS, who had laid out the most rigorous public mapping of the Ascend supply chain in early 2025, argued that the leverage that had crushed Huawei in 2020 was, in attenuated form, still available against the labs themselves. Others were more cautious. The Huawei case had run because Huawei depended at the time on TSMC and on American EDA. The DeepSeek case did not have the same shape. DeepSeek was a software lab. Its chips, increasingly, were not American. The leverage that mattered against it ran through ASML, through Korea’s HBM, and through TSMC’s foundry. None of those leverage points were responsive to a rule cut directly against the lab.
Jensen Huang had been reading the room with a clarity his peers had not always matched. On the Dwarkesh Podcast in mid-April 2026, eight days before V4’s release, he had described the day DeepSeek shipped a model first on Huawei silicon as “a horrible outcome for our nation.” The phrasing was unusually undiplomatic for a CEO who had spent years modulating his China commentary to keep both sides of the Pacific reading him. He paired the warning with a reminder that China’s edge, even with inferior chips, came from abundant energy, a vast research talent pool, and software optimization that could close gaps the silicon could not. Eight days later, the V4 commit landed. The horrible outcome was now the actual one. Nvidia, principal beneficiary of a four-year compute boom in which every American policy lever had been pulled in its direction, was now the company most exposed if Beijing’s substitute stack kept compounding.
The substitute stack was, by the standards of the industry that had grown up around TSMC and ASML and Nvidia and Korea, ungainly. It pulled more power. It cost more per FLOP. It was constrained on memory bandwidth, on packaging yield, on the number of fabs that could turn out the dies. Its 2026 production would, on Huawei’s targeting, ship roughly seven hundred and fifty thousand 950PR units, against Nvidia data-center shipments an order of magnitude higher, and on aggressive counts two orders. Per unit of frontier inference, V4 on Ascend 950DT cost forty to seventy percent more than V4 on H100, and that was against Nvidia’s older generation, not the GB200 systems ramping in U.S. data centers through 2025 and 2026. By every relative measure that mattered to a hyperscaler procurement officer, the Chinese stack was worse.
The relative measures mattered. They were not the only measures. The other measure, the one Beijing’s planners and the Hangzhou researchers had begun working from, was absolute. Could a Chinese model exist that was trained predominantly on indigenous silicon, hosted inference on indigenous silicon, and served a billion-user market without depending on American chips, the Taiwanese foundry that fabricated them, or the Korean HBM on which both were built? In the spring of 2025, the answer was barely. In the spring of 2026, the answer was visibly. The Chinese stack had crossed a threshold that did not require the substitute to be efficient or cheap. It required only that it be operational. Operationally, V4 on Atlas was running in production at Tencent Cloud and Alibaba Bailian and a sequence of smaller domestic platforms. The wafers were coming off SMIC at a rate the company expected to double. The Atlas 950 SuperPoD was on the calendar for the fourth quarter. The Ascend 960 and 970 were on the slide deck for 2027 and 2028. The substitute path was visible far enough out that procurement teams could plan against it. Once a path could be planned against, it changed the calculus of every actor that depended on it.
This was the inflection the V4 release made undeniable. Not that Beijing had won. Not that Taiwan no longer mattered. Not that ASML’s machines could be replaced or Korea’s HBM substituted on any horizon shorter than a decade. The inflection was narrower and more important. The export-control architecture built between 2018 and 2024 had been premised on a single causal chain. Cutoff from leading-edge logic implied cutoff from frontier AI. The premise had carried a silent corollary, the one that gave the policy its strategic weight, namely that frontier AI was a precondition for technological sovereignty, military deterrence, and economic primacy in the next decade. The corollary survived V4. The premise did not. The cutoff from leading-edge Western logic had not produced a cutoff from frontier AI. It had produced an expensive, energy-intensive, partially indigenous variant of frontier AI that worked. The architecture would continue to bind on the most aggressive 2-nanometer and below frontiers, where ASML’s high-NA EUV monopoly was genuinely load-bearing and where, on the most rigorous 2026 modeling, China remained at least a generation behind on yield-economic terms. It would continue to bind on memory bandwidth, where the Korean HBM oligopoly remained intact. It would continue, above all, to bind on the strategic geometry of the strait, where the wafers that mattered most were still made on a single island within range of a single set of missiles. But on the operational question of whether a Chinese lab could ship a frontier model the world would notice, the architecture had run its course. The lab could. The model existed. The chips were in inventory. The customers had placed their orders.
The Taiwan question, against this background, had not become smaller. It had, if anything, become more volatile. The silicon-shield argument, in its 2022 form, had relied on the assumption that any disruption to TSMC’s leading-edge production would visit unbearable costs on every party including Beijing, and would therefore deter Beijing. The argument had always been more textured than its slogan. By 2026 the texture had thickened. The marginal Chinese consumer of leading-edge logic had a substitute now, however inferior, however expensive, however undignified. The marginal American consumer, by contrast, did not. A Taiwan contingency in 2026 produced, on every wargame run, the same destruction of Hsinchu and Tainan as in 2022. The party that emerged from that destruction with a working domestic industry, however small, however constrained, had become Beijing rather than Washington. The shield argument had assumed a symmetry of devastation. The post-V4 reality was asymmetric, less than fully so but no longer wholly. The asymmetry tilted, in the small but consequential way these tilts tilt, toward the actor with the substitute path.
What Morris Chang had said in April 2022, in his Brookings conversation with Robert Kagan, had been that any war over the strait would render the entire onshoring debate moot, because everyone would have a great deal more to worry about than chips. The remark had been intended as a graceful argument against the most expensive American hedges. By 2026, the remark had aged into something more uncomfortable. The Taiwan war would still render everything moot. What it would render moot, however, no longer included a fully functioning Chinese AI industry. That industry, in its constrained, costly, and operational form, had been moved off the strait.
Inside DeepSeek’s offices on Huanglong Road in early May 2026, the V4 release had transitioned from headline to operational reality. The Hangzhou hedge-fund subsidiary that had once been a quantitative-trading novelty was now a node in a national strategy that was no longer especially private about being a national strategy. Liang remained out of public view. Recruiters from the Big Fund’s third tranche, the 344-billion-yuan vehicle stood up in May 2024 and now in its second year of deployment, were quietly approaching every senior researcher at every Chinese AI lab with packages that exceeded what Western firms could match for engineers who could not, in any case, easily relocate. The state had read the moment and was pouring capital into it. On the state-news monitors that ran perpetually in airport lobbies and metro stations across the country, the V4 launch had been folded into the larger narrative of self-reliance that Xi had been building since 2018. Sputnik, in this narrative, had finally produced its satellite. The satellite was a one-million-token language model running on chips made in Shanghai.
In the Pentagon’s J-5 strategy directorate that same week, the briefings being prepared for principals had turned over a phrase the Biden-era export-control architecture had relied on, “small yard, high fence.” The phrase had been useful when the yard contained a single technology and the fence sealed it. The fence in May 2026 was still up. The yard had grown. It now contained much of frontier AI software, most of HBM, most of leading-edge logic, all of EUV, and an expanding list of materials, design tools, and manufacturing equipment the new administration was negotiating to keep behind the fence. Inside the yard, the Americans still controlled the assets. Outside the yard, the Chinese had begun to build assets the fence did not reach. The CSIS modeling the J-5 staffers were citing suggested the gap, on a crude index of frontier-class compute combining silicon, software, and energy, had not closed. It had also not, since the second DeepSeek release, widened.
The wafers kept coming off the line. In Hsinchu and Tainan, the EUV scanners still exposed wafer after wafer at the cadence the schedule demanded. Apple’s M-series and Nvidia’s Blackwell successors and AMD’s MI400 silicon flowed off TSMC’s N3 and N2 nodes through the spring of 2026 in the volumes the world’s hyperscalers had pre-paid for two years earlier. In Phoenix, the EUV machines craned into Fab 21 in 2022 and 2023 were now joined by high-NA tools whose installations were beginning to validate the Sonoran Desert ecosystem the original engineers had been imported from Hsinchu to build. In Eindhoven, ASML’s order book for 2026 and 2027 was full. In Icheon and Cheongju, SK Hynix’s HBM4 sample lots were beginning to ship to Nvidia and AMD. In Shanghai, SMIC’s 7-nanometer line ran twenty-four hours a day, at yields the company would not state and volumes its customers would not confirm, producing dies that would be packaged into Ascend 910C accelerators and into the racks of CloudMatrix systems that would, somewhere in a Tencent or Alibaba data center near Hohhot or Guiyang, host the next inference call placed against DeepSeek-V4-Pro by a researcher in Shenzhen or São Paulo or Riyadh.
A story that had begun in the postwar laboratories of New Jersey, that had run through Tokyo and Seoul and Hsinchu and Beijing and Hangzhou, that had turned a hand-soldered point-contact transistor in 1947 into the substrate on which the next century’s intelligence would now be built, arrived in May 2026 at a place its earliest chroniclers had not quite predicted. The chip war had not produced a winner. It had produced two technological civilizations running the same workloads on different stacks, with different costs, different chokepoints, different vulnerabilities, and different theories of how the next decade would be fought. Both stacks were operational. Both stacks were incomplete. Both depended, in the last analysis, on whether the island that had built them would still be there to build them in the next decade.
In Hsinchu that week, the cleanroom-suit lockers filled and emptied on the same twelve-hour cycle they had run on for thirty-five years. In Tainan, Fab 18 ran N3 wafers around the clock. In Shanghai, SMIC’s 7-nanometer line ran around the clock too, on tools whose American technology was not legally supposed to be there and whose American-trained engineers had mostly stopped pretending to be Singaporean. In Hangzhou, on Huanglong Road, the offices stayed lit past midnight. Liang Wenfeng, somewhere in the building, did not appear at any window. The strait between his country and the island whose silicon his model still partially required was dark and quiet and crossed by a few fishing boats and the routine track of one PLA Navy destroyer on a slow southwest transit. The water under the destroyer was the same water it had been seventy-nine years before, when none of this had yet been invented and none of it had yet been at stake. The decisions that would shape the next decade were not, this time, going to be made by the water. They were going to be made by the people in the offices on either side of it. Most of them were already in those offices. The chip war had a beginning. It did not, on the evidence of the spring of 2026, yet have an end.