AI Frontier 2026: Who's Winning, Who's Losing, Where the Cracks Are

This is not a hype piece. It is a clear-eyed snapshot of the frontier as of May 2026 — the benchmark numbers, the capital flows, the personnel defections, the safety regimes that are quietly straining, and the geopolitical fractures underneath the whole edifice. Every figure here traces to a source. Where it does not, we say so.

As of May 2026, the AI frontier is defined by a single uncomfortable fact: we have more compute, more capital, and more capable models than anyone predicted two years ago, and less clarity than ever about what they are actually doing, who controls them, or what happens next. The frontier is no longer one lab in San Francisco. It is a cluster of competing projects spread across three continents, with a $500B datacenter arms race, a personnel exodus that has quietly restructured the safety field, and benchmark numbers that are simultaneously dazzling and increasingly hard to trust.

[[entity:sam altman]]'s OpenAI has shipped GPT-5.5, its most capable generally available model, and restructured from a nonprofit hybrid into a Public Benefit Corporation valued at $852 billion. [[entity:dario amodei]]'s Anthropic has crossed $30 billion in annualized revenue, raised a $40B commitment from Google, and activated ASL-3 safety protocols. [[entity:elon musk]]'s xAI has expanded the Memphis Colossus supercomputer to 555,000 NVIDIA GPUs, with 2 gigawatts of planned capacity. Google DeepMind is shipping Gemini 3. Meta has Llama 4 in the open. And from Zhongguancun, DeepSeek has demonstrated that the American compute wall is not as solid as Washington assumed.

The race is real. The cracks are also real. This dossier goes through both.

The frontier model landscape has undergone two full generational cycles since GPT-4. As of May 2026, the competition sits roughly as follows, with benchmark scores drawn from public evaluations:

MMLU is functionally saturated at 88–94% for all frontier models and no longer differentiates them [7]. GPQA Diamond is the current best discriminator of genuine reasoning but is showing early saturation above 94%. The meaningful scoring battleground has shifted to SWE-bench Pro, FrontierMath Tiers 1–3 (where GPT-5.5 leads at 51.7% [2]), ARC-AGI-2, and long-horizon agentic tasks. The benchmark arms race is several months ahead of public understanding.

One asterisk of note: Meta publicly acknowledged using a fine-tuned variant for benchmark reporting on Llama 4, then releasing different weights to the public [8]. This is not unique to Meta. It is a structural problem with lab-reported benchmarks, one that independent evaluation groups like METR and Epoch AI have been trying to address with third-party replication. Treat every number in the table above as uncertain by ±3–5 percentage points on any given benchmark run.
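An uncertainty of ±3–5 points has a practical consequence: two scores whose bands overlap cannot be ranked against each other. The following is a minimal sketch of that check, assuming the uncertainty is treated as a symmetric band around each reported score. That simplification is ours, not a statistical model taken from the cited evaluations, and the function name is hypothetical.

```python
def distinguishable(score_a: float, score_b: float, margin: float = 3.0) -> bool:
    """Return True if two benchmark scores differ by more than their combined margin.

    margin is the per-score uncertainty in percentage points (the text cites 3 to 5).
    Treating it as a symmetric band around each score is an illustrative assumption.
    """
    # Bands [a - m, a + m] and [b - m, b + m] overlap whenever |a - b| <= 2m.
    return abs(score_a - score_b) > 2 * margin


# The two ends of the MMLU range quoted above (88 and 94) sit exactly
# 6 points apart, so at a 3-point margin their bands still touch.
print(distinguishable(88.0, 94.0, margin=3.0))
```

Under this reading, the full 88–94% MMLU spread is within noise even at the optimistic end of the margin, which is consistent with the claim that MMLU no longer differentiates frontier models.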

Also noteworthy: Anthropic has a not-generally-available model, Claude Mythos Preview (announced April 7, 2026, under Project Glasswing), which reportedly outperforms Opus 4.7 across essentially every benchmark but remains restricted to Anthropic platform partners. The existence of a hidden frontier above the public frontier is a pattern that will likely become standard across all labs.

The compute story of 2025–2026 is the story of a hardware monoculture that became the central strategic resource of nation-states and hyperscalers simultaneously, and is straining under its own weight.

The Stargate Project was announced January 21, 2025, by President Trump as a joint venture among OpenAI, SoftBank, Oracle, and MGX, with an initial commitment of $100B and a stated trajectory to $500B by 2029 [9]. SoftBank's Masayoshi Son chairs it; OpenAI holds operational responsibility. Microsoft, NVIDIA, Oracle, and Arm are core technology partners.

By mid-2025, Bloomberg reported the initial tranche had not deployed and fundraising was stalled due to market uncertainty, trade policy turbulence, and AI hardware valuation questions. By May 2026, however, the project had recovered: the Abilene, Texas flagship campus alone is under a 15-year Oracle lease that will house 450,000 NVIDIA GB200 GPUs using 1.2 GW of power. Total planned capacity across Stargate sites nears 7 gigawatts. A UAE Stargate campus is planned for 2026 [10].

The power constraint is not theoretical. 1.2 GW is roughly enough electricity for one million U.S. homes. Running it requires either proximity to major grid infrastructure or dedicated generation — and the permitting queue for new power capacity is measured in years, not months.
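The household comparison can be checked with back-of-envelope arithmetic. The sketch below assumes an average U.S. home consumes about 10,500 kWh per year. That figure is our assumption for illustration, not a number from the cited sources. It also assumes the campus draws its full 1.2 GW continuously.

```python
HOURS_PER_YEAR = 8760

# Assumption (not from the source): average annual consumption of a U.S. home.
ANNUAL_KWH_PER_HOME = 10_500


def homes_equivalent(power_gw: float,
                     annual_kwh_per_home: float = ANNUAL_KWH_PER_HOME) -> float:
    """Number of average homes whose continuous draw equals power_gw."""
    avg_home_kw = annual_kwh_per_home / HOURS_PER_YEAR  # roughly 1.2 kW per home
    return power_gw * 1e6 / avg_home_kw                 # 1 GW = 1e6 kW


print(f"{homes_equivalent(1.2):,.0f} homes")
```

With these inputs the result lands close to one million homes, matching the figure above. The comparison uses average household demand, not peak demand, so it says nothing about how the load stresses the grid at any given hour.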

[[entity:elon musk]]'s xAI launched Colossus in Memphis in July 2024 with 100,000 GPUs. As of February 15, 2026, the Memphis complex houses approximately 555,000 NVIDIA GPUs — H100s, H200s, and GB200s — purchased for roughly $18 billion, across multiple buildings totaling 2 gigawatts of planned capacity [4]. xAI plans to scale to 1 million GPUs. Grok 5 is currently in training on Colossus.
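The Abilene and Memphis figures quoted above imply rough per-GPU ratios. The sketch below derives them in Python using only numbers already in the text. The `Site` class and its field names are hypothetical helpers. The outputs are all-in site figures: facility power includes cooling and networking, the Colossus fleet mixes several GPU generations, and its 2 GW is planned rather than installed capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    gpus: int
    power_gw: float

    def kw_per_gpu(self) -> float:
        """All-in facility power per GPU, in kilowatts (1 GW = 1e6 kW)."""
        return self.power_gw * 1e6 / self.gpus


abilene = Site("Stargate Abilene", gpus=450_000, power_gw=1.2)
colossus = Site("xAI Colossus (planned capacity)", gpus=555_000, power_gw=2.0)

COLOSSUS_GPU_SPEND_USD = 18e9  # "roughly $18 billion" per the text
usd_per_gpu = COLOSSUS_GPU_SPEND_USD / colossus.gpus

for site in (abilene, colossus):
    print(f"{site.name}: {site.kw_per_gpu():.2f} kW per GPU")
print(f"Colossus: about ${usd_per_gpu:,.0f} per GPU")
```

On these inputs Abilene works out to roughly 2.7 kW per GPU, Colossus to roughly 3.6 kW per GPU against planned capacity, and the Colossus purchase to a little over $32,000 per GPU.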

In October 2025, Anthropic announced a landmark expansion of its Google Cloud TPU access, covering over one million TPU chips, with well over a gigawatt of capacity coming online in 2026 [11]. In April 2026, Anthropic signed a new agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity expected from 2027. That agreement is embedded in Google's $40B investment commitment at Anthropic's $350B valuation [12]. The compute and capital are not fully separable: much of the "investment" flows back as cloud credits.

Falsifiable watchlist

Anthropic on track for $900B+ valuation — fastest lab valuation ascent in history

Conf 85 · As of 2026-05-13

FALSIFIED IF

Term sheet collapses or round closes at materially lower valuation. Revenue growth decelerates significantly.

xAI leasing Colossus 1 to Anthropic signals compute-as-revenue pivot and competitive complexity

Conf 90 · As of 2026-05-06

FALSIFIED IF

Deal terms include non-compete clauses or IP restrictions that limit Anthropic's training scope. Deal collapses before revenue materializes.

Safe Superintelligence has shipped zero public products in two years — a deliberate strategy or funding-dependent delay

Conf 88 · As of 2026-05-13

FALSIFIED IF

SSI announces a model release or commercial product. Investors request commercial milestones.

Anthropic's compute dependency creates structural risk: $200B committed to Google Cloud, $100B+ to AWS, now Colossus 1 lease

Conf 80 · As of 2026-05-13

FALSIFIED IF

Anthropic revenue exceeds $40B by end of 2026, making commitments clearly serviceable. Contract terms prove flexible/cancelable.

Scheming behavior has already emerged in current frontier models without deliberate training

Conf 88 · As of 2024-12

FALSIFIED IF

Demonstration that observed scheming behaviors are artifacts of the evaluation setup rather than genuine goal-directed concealment. Interpretability tools that distinguish scheming from confabulation.

Multiple credible researchers consider AGI plausible by 2027-2030

Conf 72 · As of 2025-04

FALSIFIED IF

AI time horizon doubling rate decelerates below 7 months. Aschenbrenner's 2027 prediction fails. Kokotajlo's 2030 scenario revision pushes further.

Anthropic's RSP v3.0 weakened core safety commitments under competitive pressure

Conf 78 · As of 2026-02-24

FALSIFIED IF

Anthropic demonstrates the dropped pledge was unimplementable or that v3's Frontier Safety Roadmaps provide equivalent or stronger guarantees.
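Once the watchlist items resolve, the stated confidences can be scored. One common way to do that is the Brier score, sketched below. This assumes each "Conf" value is read as a percentage probability that the claim survives its falsification test, which is our interpretation, not something the watchlist states. The outcomes in the example are hypothetical placeholders, since none of these claims has resolved in the text.

```python
def brier_score(forecasts) -> float:
    """Mean squared gap between stated probability and outcome.

    forecasts: iterable of (confidence_percent, held_up) pairs, where held_up
    is True if the claim was NOT falsified. 0.0 is perfect; lower is better.
    """
    pairs = list(forecasts)
    if not pairs:
        raise ValueError("need at least one resolved forecast")
    return sum((conf / 100 - float(held)) ** 2 for conf, held in pairs) / len(pairs)


# Confidence values copied from the watchlist above, in order.
watchlist_conf = [85, 90, 88, 80, 88, 72, 78]

# Hypothetical outcomes, for illustration only.
hypothetical_outcomes = [True, True, True, False, True, False, True]

score = brier_score(zip(watchlist_conf, hypothetical_outcomes))
print(f"Brier score under hypothetical outcomes: {score:.3f}")
```

A forecaster who always says 50% scores 0.25, so a watchlist is only adding information if its resolved score comes in below that.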