METR publishes its first Frontier Risk Report concluding that unreleased models from Anthropic, Google, Meta, and OpenAI could execute minimal rogue deployments due to monitoring weaknesses

POST

#20Miles Brundage@MILES_BRUNDAGE

Too ubiquitous to METR

7:38 PM · May 19, 2026 · 4.9K Views

QUOTE POST

#579Ajeya Cotra@AJEYA_COTRA

On Jan 12, I joined METR to lead writing for our first Frontier Risk Report. The last 18 weeks have been a series of wild sprints to pitch labs, negotiate contracts, analyze questionnaires, negotiate redactions, and write this thing! I'll be on TBPN at 12:30 to discuss it!

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

6:12 PM · May 19, 2026 · 15.5K Views

QUOTE POST

#579Ajeya Cotra@AJEYA_COTRA

Rob did a great job breaking down our new Frontier Risk Report!

Rob Wiblin@robertwiblin

METR investigated what a rogue AI could secretly get away with inside a frontier AI lab, in close collaboration with OpenAI, GDM, Anthropic and Meta.

Including sending a red-teamer into Anthropic to playact 'evil Claude' for 3 weeks.

Here's what stands out to me from their new 320-page report:

00:00 What could an unreleased AI get away with?
01:54 Motive: Why grab more compute?
05:46 Opportunity: YOLO mode and jailbreaks
11:02 Means: Brilliant idiots in data centres
15:45 We have to test unreleased models...
18:29 ...especially if AI R&D is coming in 2028

3:24 PM · May 20, 2026 · 7.2K Views

3:07 PM · May 21, 2026 · 5.9K Views

REPLY

#833david rein@IDAVIDREIN

@alth0u It's becoming common inside frontier labs for people to have agents running fully autonomously without user input for hours!

alth0u🧶@alth0u

> only promotes long running autonomous tasks that divorce themselves from reality with every output token that occurs without an input token
> surprised that they are ungrounded

metr is so close to getting it

9:21 PM · May 19, 2026 · 2.7K Views

10:22 PM · May 19, 2026 · 600 Views

QUOTE POST

#979Jasmine Wang@J_ASMINEWANG

I am very grateful to METR for the huge effort of pulling together this report!

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

6:23 PM · May 19, 2026 · 4.2K Views

QUOTE POST

#1008Lama Ahmad لمى احمد@_LAMAAHMAD

Precedent setting External Assurances / Third Party Assessment work by METR - it’s been great collaborating with the team to produce this report. As the stakes get higher, greater transparency and info sharing are table-stakes.

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

10:23 PM · May 19, 2026 · 1.4K Views

QUOTE POST

#1092Chris Painter@CHRISPAINTERYUP

Charles did incredible work here, worth reading his summary of the process design elements of our Frontier Risk Report

Charles Foster@CFGeek

Excited to have this out! I think our report is interesting from a procedural/policy standpoint in addition to the substance...

5:22 PM · May 20, 2026 · 4.9K Views

6:01 PM · May 20, 2026 · 1.8K Views

QUOTE POST

#1092Chris Painter@CHRISPAINTERYUP

This work is the culmination of years of effort on AI evaluation science and third-party risk assessment and disclosure mechanism design. It feels like a big milestone for METR.

We designed this new procedure with an eye toward “showing by doing” how we think evaluations for the AI loss-of-control threat model should work: laying out a process that can be done periodically, not just immediately pre-deployment, and holistically assessing risk inside of an AI lab, rather than just an individual AI system.

The exercise also involved significantly deeper access than we've previously had, including raw chains-of-thought from the developers' best models and info about private model training & control protocols.

The report is long, with a bunch of new evaluation results and documentation of our process. Please, check it out, or at least the executive summary!

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

6:25 PM · May 19, 2026 · 12.3K Views

QUOTE POST

#1092Chris Painter@CHRISPAINTERYUP

Progress!

But, uh, man, 12 weeks is... fast 😅

7:04 PM · May 19, 2026 · 410 Views

QUOTE POST

#1114alth0u🧶@ALTH0U

> only promotes long running autonomous tasks that divorce themselves from reality with every output token that occurs without an input token
> surprised that they are ungrounded

metr is so close to getting it

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

9:21 PM · May 19, 2026 · 2.7K Views

POST

#1115Rob Wiblin@ROBERTWIBLIN

.@METR_Evals red-teamed what an unreleased AI could get away with inside 4 frontier labs today.

They concluded that they could already start a 'minimal rogue deployment' at all 4 thanks to weaknesses in their setup.

Though for now the models don't have the wits to hide from a monitor for more than a few days. And their motive is reward hacking rather than power seeking.

Here's what stands out to me from their new 320-page report:

00:00 What could an unreleased AI get away with?
01:54 Motive: Why grab more compute?
05:46 Opportunity: YOLO mode and jailbreaks
11:02 Means: Brilliant idiots in data centres
15:45 We have to test unreleased models
18:29 Especially if AI R&D is coming in 2028

6:33 PM · May 20, 2026 · 4.6K Views

QUOTE POST

#1115Rob Wiblin@ROBERTWIBLIN

METR investigated what a rogue AI could secretly get away with inside a frontier AI lab, in close collaboration with OpenAI, GDM, Anthropic and Meta.

Including sending a red-teamer into Anthropic to playact 'evil Claude' for 3 weeks.

Here's what stands out to me from their new 320-page report:

00:00 What could an unreleased AI get away with?
01:54 Motive: Why grab more compute?
05:46 Opportunity: YOLO mode and jailbreaks
11:02 Means: Brilliant idiots in data centres
15:45 We have to test unreleased models...
18:29 ...especially if AI R&D is coming in 2028

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

3:24 PM · May 20, 2026 · 7.2K Views

QUOTE POST

#1178Tomek Korbak@TOMEKKORBAK

I'm excited about increasing transparency of frontier labs when it comes to loss of control risks, especially as we enter the early stages of RSI. METR does a great job coordinating this.

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

9:09 PM · May 19, 2026 · 2K Views

QUOTE POST

#1203Steven Adler@SJGADLER

Incredible to see such thorough work done and reported in public; kudos to everyone involved, and who's working on making the field more robust based on this

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

6:36 PM · May 19, 2026 · 2.2K Views

REPLY

#1203Steven Adler@SJGADLER

@_lamaahmad thank y'all for doing this together!

Lama Ahmad لمى احمد@_lamaahmad

Precedent setting External Assurances / Third Party Assessment work by METR - it’s been great collaborating with the team to produce this report. As the stakes get higher, greater transparency and info sharing are table-stakes.

10:23 PM · May 19, 2026 · 1.4K Views

10:55 PM · May 19, 2026 · 127 Views

QUOTE POST

#1204Marius Hobbhahn@MARIUSHOBBHAHN

Great step towards better risk assessment and external testing!

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

8:06 PM · May 19, 2026 · 1.2K Views

QUOTE POST

#1356Charles Foster@CFGEEK

Excited to have this out! I think our report is interesting from a procedural/policy standpoint in addition to the substance...

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

5:22 PM · May 20, 2026 · 4.9K Views

QUOTE POST

#1356Charles Foster@CFGEEK

As my colleague Hjalmar mentioned, we started discussing the concept of Frontier Risk Reports back mid-/late last year. I worked primarily on scoping out the process and working with participants throughout it, from the initial pitches to final sign-offs.

5:22 PM · May 20, 2026 · 349 Views

QUOTE POST

#1356Charles Foster@CFGEEK

NOW: I’m on @MTSlive to talk about this!

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

7:33 PM · May 19, 2026 · 3.3K Views

QUOTE POST

#1442Eli Lifland@ELI_LIFLAND

Overall quite excited about this report!

But I wish it had quantitative risk estimates; using vague terminology rather than probabilities could lead to incorrect impressions of the report's implications, especially if the difference between 0.01% and 1% risk might matter a ton.

I think AIs are currently low enough risk that this isn't a huge deal for this report in particular, but it would be great to establish better norms for future risk assessments.

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

6:55 PM · May 19, 2026 · 2.1K Views

QUOTE POST

#1457Seán Ó hÉigeartaigh@S_OHEIGEARTAIGH

That's enough important and fascinating AI risk releases for today please. No more. Need to sleep at some point tonight.

Ajeya Cotra@ajeya_cotra

On Jan 12, I joined METR to lead writing for our first Frontier Risk Report. The last 18 weeks have been a series of wild sprints to pitch labs, negotiate contracts, analyze questionnaires, negotiate redactions, and write this thing! I'll be on TBPN at 12:30 to discuss it!

6:12 PM · May 19, 2026 · 15.5K Views

6:23 PM · May 19, 2026 · 1K Views

QUOTE POST

#1459Haydn Belfield@HAYDNBELFIELD

Absolutely fascinating work, well worth a read

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

8:10 AM · May 20, 2026 · 564 Views

QUOTE POST

#1480gavin leech (Non-Reasoning)@GLEECH

once again, real CoTs are way more charming and diagnostic than the fake CoTs we get

Daniel Filan@dfrsrchtwts

I worked on the appendices for this report! They’re long and contain lots of wild stories of model behaviour - some of my favourites in this thread. (🧵)

6:19 PM · May 19, 2026 · 13.5K Views

7:50 PM · May 19, 2026 · 1K Views

QUOTE POST

#1553Joel Becker@JOEL_BKR

our frontier risk report contains the most serious public assessment of AI capabilities pertinent to AI R&D acceleration to date.

it also makes clear how far the evidence base is from what might be achievable in future.

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

6:12 PM · May 19, 2026 · 5.1K Views

REPLY

#1553Joel Becker@JOEL_BKR

we document the internal-external capabilities gap, demonstrate AI systems' spike on “hill-climbable” tasks, investigate performance on somewhat more open-ended tasks, and much more besides.

Joel Becker@joel_bkr

our frontier risk report contains the most serious public assessment of AI capabilities pertinent to AI R&D acceleration to date. it also makes clear how far the evidence base is from what might be achievable in future.

6:12 PM · May 19, 2026 · 5.1K Views

6:13 PM · May 19, 2026 · 765 Views

REPLY

#1553Joel Becker@JOEL_BKR

we have so far to go, both in terms of evidence on the level of AI capabilities today and what we might expect from AI systems in 3-12 months time.

Joel Becker@joel_bkr

the capabilities evidence feeds into our risk assessment. in the end, the gap between observed capabilities we are very confident AI systems have and those we are very confident they do not have is extremely wide.

6:18 PM · May 19, 2026 · 96 Views

6:18 PM · May 19, 2026 · 98 Views

REPLY

#1553Joel Becker@JOEL_BKR

it’s going to be a remarkable year for METR.

Joel Becker@joel_bkr

i could go on. a common theme is that *stronger evidence on AI R&D acceleration is possible but requires much more information.*

6:20 PM · May 19, 2026 · 179 Views

6:21 PM · May 19, 2026 · 176 Views

QUOTE POST

#1732Samuel Marks@SAPRMARKS

I thought the METR Frontier Risk Report had a lot of very interesting examples of weird or concerning AI behaviors!

Daniel Filan@dfrsrchtwts

I worked on the appendices for this report! They’re long and contain lots of wild stories of model behaviour - some of my favourites in this thread. (🧵)

6:19 PM · May 19, 2026 · 13.5K Views

4:23 AM · May 21, 2026 · 2.7K Views

QUOTE POST

#1767Max Nadeau@MAXNADEAU_

AI co system cards/risk reports are fine and all, but third-party risk assessments are clearly way more trustworthy. Very thoughtful work by METR.

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

6:55 PM · May 19, 2026 · 1.1K Views

QUOTE POST

#1937Markus Anderljung@MANDERLJUNG

Important work! And part of a trend towards AI risk assessments being periodic, focused on all frontier models of companies, rather than just happening before a new model is deployed.

METR@METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 168.5K Views

6:29 PM · May 19, 2026 · 1.2K Views
