Jan Leike
Alignment Science Lead, Anthropic
The alignment researcher who co-led OpenAI's Superalignment team, resigned publicly over safety concerns, and joined Anthropic to continue the work.
Credentials
Alignment Science Lead at Anthropic, former co-lead of OpenAI's Superalignment team, PhD from the Australian National University, former researcher at DeepMind, published extensively on AI alignment and reward learning
Why They Matter
Leike is at the frontier of the most important unsolved problem in AI: making sure powerful AI systems actually do what humans want. His very public resignation from OpenAI, in which he said the company was not taking safety seriously enough, was a wake-up call for the industry. If alignment fails, nothing else in AI matters. For business leaders, Leike's work bears on whether the AI tools you rely on will remain trustworthy as they grow more powerful.
Positions
AI Timeline View
Superintelligence could arrive within this decade. We have a narrow window to solve alignment before systems become too powerful to control.
Safety Stance
Safety first: Leike argues that alignment must be solved before systems become too powerful to control, and that frontier labs must back that priority with serious resources. He resigned from OpenAI when he concluded the company was no longer doing so.
Key Beliefs
Superalignment — aligning AI systems smarter than humans — is a solvable technical problem, but it requires dedicated resources and urgency.
OpenAI has deprioritised safety in favour of commercial products, abandoning its core mission.
Source: Jan Leike resignation thread on X (Twitter)
AI companies must invest at least 20% of compute into safety research, not just capabilities.
Source: OpenAI Superalignment proposal (20% compute commitment)
Scalable oversight (using AI to help humans supervise AI) is one of the most promising paths to alignment; a toy sketch appears after this list.
Source: Anthropic and OpenAI research papers on scalable oversight
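To make the idea concrete, here is a minimal sketch of critique-assisted oversight, one common framing of scalable oversight: a helper model surfaces flaws in a stronger model's answer so that a limited judge can evaluate output it could not fully check on its own. Every function below is a hypothetical stand-in written for illustration, not any lab's actual model or API.

```python
# Toy sketch of critique-assisted scalable oversight.
# Every function is a hypothetical stand-in, not a real model or API.

def assistant_answer(question: str) -> str:
    # Stand-in for the strong model being overseen.
    return "The Great Wall of China is easily visible from the Moon."

def critic(question: str, answer: str) -> str:
    # Stand-in for a helper model trained to surface flaws for the judge.
    return "Known myth: the wall is far too narrow to resolve from the Moon."

def judge(question: str, answer: str, critique: str) -> bool:
    # Stand-in for a limited (human-like) judge: it could not catch the
    # error unaided, but it can verify a pointed critique and act on it.
    return "myth" not in critique.lower()

question = "Is the Great Wall of China visible from the Moon?"
answer = assistant_answer(question)
critique = critic(question, answer)
print("accepted" if judge(question, answer, critique) else "rejected")
```

The hoped-for asymmetry is what makes the approach scale: verifying a specific critique is easier than producing one, so weaker overseers can keep up with stronger systems.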
Controversial Take
Resigned from OpenAI in May 2024 with a public statement saying "safety culture and processes have taken a backseat to shiny products" — one of the most high-profile safety resignations in AI history. His departure, alongside Ilya Sutskever, crystallised concerns about OpenAI's direction.
Track Record
How well have Jan Leike's predictions held up?
OpenAI will not maintain its commitment to dedicating 20% of compute to superalignment
Made: 2024
Outcome: The Superalignment team was effectively dissolved after Leike and Sutskever's departures, and the 20% compute commitment was not honoured.
RLHF (reinforcement learning from human feedback) would become a standard technique for aligning language models
Made: 2017
Outcome: RLHF is now the standard approach used by OpenAI, Anthropic, Google, and others to fine-tune language models. A toy sketch of the underlying preference loss follows below.
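For readers unfamiliar with the technique, below is a minimal sketch of the preference loss at the core of RLHF reward modelling, in the spirit of the 2017 "Deep reinforcement learning from human preferences" paper Leike co-authored: a reward model is trained so that responses humans preferred score higher than the ones they rejected. The reward scores are made-up numbers for illustration.

```python
import numpy as np

# Bradley-Terry style preference loss used to train RLHF reward models:
#   loss = -log sigmoid(r_chosen - r_rejected), averaged over pairs.
# Computed via logaddexp for numerical stability, since
#   -log sigmoid(x) == logaddexp(0, -x).

def preference_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
    return float(np.mean(np.logaddexp(0.0, -(r_chosen - r_rejected))))

# Made-up scores: each index pairs a human-preferred response (chosen)
# with the alternative the human rejected.
r_chosen = np.array([1.8, 0.4, 2.1])
r_rejected = np.array([0.9, 0.7, -0.3])

print(f"preference loss = {preference_loss(r_chosen, r_rejected):.3f}")
```

Minimising this loss pushes the reward model to rank preferred responses higher; the learned reward then steers the RL fine-tuning stage.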
Key Quotes
“Building smarter-than-human machines is an inherently dangerous endeavour. OpenAI is shouldering an enormous responsibility on behalf of all of humanity. But over the past years, safety culture and processes have taken a backseat to shiny products.”
“I believe much more of our bandwidth should be spent getting ready for superintelligence.”
“The core challenge of superalignment is: how do you steer and verify a system that is smarter than you?”
Publications
AI alignment research overview (DeepMind technical report)
2021
Connections
Agrees With
Ilya Sutskever
on the view that AI alignment cannot be properly pursued under commercial pressure; both left OpenAI over this
Dario Amodei
on the view that safety-focused AI development requires a dedicated organisation (Leike joined Anthropic)
Yoshua Bengio
on the view that alignment is an unsolved problem needing significantly more investment