Jan Leike
Alignment Science Lead, Anthropic
The alignment researcher who co-led OpenAI's Superalignment team, resigned publicly over safety concerns, and joined Anthropic to continue the work.
Credentials
Alignment Science Lead at Anthropic, former co-lead of OpenAI's Superalignment team, PhD from the Australian National University, former researcher at DeepMind, published extensively on AI alignment and reward learning
Why They Matter
Leike is at the frontier of the most important unsolved problem in AI: making sure powerful AI systems actually do what humans want. His very public resignation from OpenAI, in which he said the company was not taking safety seriously enough, was a wake-up call for the industry. If alignment fails, nothing else in AI matters. For business leaders, Leike's work bears on whether the AI tools you rely on will remain trustworthy as they grow more powerful.
Positions
AI Timeline View
Superintelligence could arrive within this decade. We have a narrow window to solve alignment before systems become too powerful to control.
Safety Stance
Safety first: Leike argues that alignment must be solved before systems become too powerful to control, and that frontier labs must back that priority with serious resources. He resigned from OpenAI when he concluded the company was no longer doing so.
Key Beliefs
Superalignment — aligning AI systems smarter than humans — is a solvable technical problem, but it requires dedicated resources and urgency.
OpenAI has deprioritised safety in favour of commercial products, abandoning its core mission.
Source: Jan Leike resignation thread on X (Twitter)
AI companies must invest at least 20% of compute into safety research, not just capabilities.
Source: OpenAI Superalignment proposal (20% compute commitment)
Scalable oversight (using AI to help humans supervise AI) is one of the most promising paths to alignment; a toy sketch appears after this list.
Source: Anthropic and OpenAI research papers on scalable oversight
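To make the idea concrete, here is a minimal sketch of critique-assisted oversight, one common framing of scalable oversight: a helper model surfaces flaws in a stronger model's answer so that a limited judge can evaluate output it could not fully check on its own. Every function below is a hypothetical stand-in written for illustration, not any lab's actual model or API.

```python
# Toy sketch of critique-assisted scalable oversight.
# Every function is a hypothetical stand-in, not a real model or API.

def assistant_answer(question: str) -> str:
    # Stand-in for the strong model being overseen.
    return "The Great Wall of China is easily visible from the Moon."

def critic(question: str, answer: str) -> str:
    # Stand-in for a helper model trained to surface flaws for the judge.
    return "Known myth: the wall is far too narrow to resolve from the Moon."

def judge(question: str, answer: str, critique: str) -> bool:
    # Stand-in for a limited (human-like) judge: it could not catch the
    # error unaided, but it can verify a pointed critique and act on it.
    return "myth" not in critique.lower()

question = "Is the Great Wall of China visible from the Moon?"
answer = assistant_answer(question)
critique = critic(question, answer)
print("accepted" if judge(question, answer, critique) else "rejected")
```

The hoped-for asymmetry is what makes the approach scale: verifying a specific critique is easier than producing one, so weaker overseers can keep up with stronger systems.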
Controversial Take
Resigned from OpenAI in May 2024 with a public statement saying "safety culture and processes have taken a backseat to shiny products" — one of the most high-profile safety resignations in AI history. His departure, alongside Ilya Sutskever, crystallised concerns about OpenAI's direction.
Track Record
How well have Jan Leike's predictions held up?
OpenAI will not maintain its commitment to dedicating 20% of compute to superalignment
Made: 2024
Outcome: The Superalignment team was effectively dissolved after Leike and Sutskever's departures, and the 20% compute commitment was not honoured.
RLHF (reinforcement learning from human feedback) would become a standard technique for aligning language models
Made: 2017
Outcome: RLHF is now the standard approach used by OpenAI, Anthropic, Google, and others to fine-tune language models. A toy sketch of the underlying preference loss follows below.
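For readers unfamiliar with the technique, below is a minimal sketch of the preference loss at the core of RLHF reward modelling, in the spirit of the 2017 "Deep reinforcement learning from human preferences" paper Leike co-authored: a reward model is trained so that responses humans preferred score higher than the ones they rejected. The reward scores are made-up numbers for illustration.

```python
import numpy as np

# Bradley-Terry style preference loss used to train RLHF reward models:
#   loss = -log sigmoid(r_chosen - r_rejected), averaged over pairs.
# Computed via logaddexp for numerical stability, since
#   -log sigmoid(x) == logaddexp(0, -x).

def preference_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
    return float(np.mean(np.logaddexp(0.0, -(r_chosen - r_rejected))))

# Made-up scores: each index pairs a human-preferred response (chosen)
# with the alternative the human rejected.
r_chosen = np.array([1.8, 0.4, 2.1])
r_rejected = np.array([0.9, 0.7, -0.3])

print(f"preference loss = {preference_loss(r_chosen, r_rejected):.3f}")
```

Minimising this loss pushes the reward model to rank preferred responses higher; the learned reward then steers the RL fine-tuning stage.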
Key Quotes
“Building smarter-than-human machines is an inherently dangerous endeavour. OpenAI is shouldering an enormous responsibility on behalf of all of humanity. But over the past years, safety culture and processes have taken a backseat to shiny products.”
“I believe much more of our bandwidth should be spent getting ready for superintelligence.”
“The core challenge of superalignment is: how do you steer and verify a system that is smarter than you?”
Publications
AI alignment research overview (DeepMind technical report)
2021
Connections
Agrees With
Ilya Sutskever
on the view that AI alignment cannot be properly pursued under commercial pressure; both left OpenAI over this
Dario Amodei
on the view that safety-focused AI development requires a dedicated organisation (Leike joined Anthropic)
Yoshua Bengio
on the view that alignment is an unsolved problem needing significantly more investment