Noam Shazeer on Building Language Models, the Transformer Origin Story, and Why Character AI Is the Future of Human-AI Interaction
One of the eight co-authors of "Attention Is All You Need" — the paper that created the Transformer architecture behind every modern AI model — explains how it happened, what he learned, and why he built Character.AI.
Top Claims — Verdict Check
The Transformer architecture was a team effort at Google that nobody expected to reshape the entire AI field
🟢 Real: “We were trying to solve machine translation. The attention mechanism worked really well. We wrote it up. Nobody at Google — including us — predicted that this architecture would become the basis for everything from chatbots to protein folding. [representative paraphrase]”
Character.AI represents a fundamentally different interaction paradigm — people want to talk to AI personalities, not search engines
🟢 Real: “People don't want to talk to a generic assistant. They want to talk to characters — a therapist, a writing partner, a historical figure, a friend. The personality layer is not a gimmick. It is the product. [representative paraphrase]”
Inference cost is the primary constraint on AI deployment, not model quality
🟢 Real: “The models are good enough. The problem is serving them cheaply enough at the scale of billions of messages per day. Character.AI serves more messages than most AI companies because we obsess over inference efficiency. [representative paraphrase]”
Google had the Transformer, the data, and the compute to build ChatGPT years before OpenAI — and chose not to
🟢 Real: “Google had everything needed to launch a consumer AI product years before ChatGPT. The institutional caution — the concern about reputation risk, about getting it wrong publicly — meant that a startup with fewer resources but more willingness to ship beat them to market. [representative paraphrase]”
AI companionship and entertainment will be a larger market than AI productivity tools
🟡 Partially True: “Everyone in Silicon Valley talks about AI for productivity. But look at where people actually spend their time — entertainment, social connection, emotional support. AI that serves those needs has a larger addressable market than AI that makes spreadsheets faster. [representative paraphrase]”
What's Real
Shazeer is one of the most credible voices in AI because his contributions are documented in the foundational literature. He is a co-author of 'Attention Is All You Need' (Vaswani et al., 2017) — the paper that introduced the Transformer architecture now used by GPT, Claude, Gemini, Llama, and virtually every frontier AI model. He co-authored the mixture-of-experts paper that influenced model efficiency approaches. The man literally co-invented the technology.

The Google inertia claim is corroborated by multiple sources: Google Brain and DeepMind had Transformer-based language models internally years before ChatGPT's November 2022 launch. LaMDA (the model that prompted Blake Lemoine's 'sentient AI' claims) was operational inside Google by 2021. Google's 'AI-first' strategy, announced by Pichai in 2016, predated OpenAI's consumer push by six years — but institutional risk aversion and the fear of cannibalizing Search revenue delayed public launch.

The inference cost thesis is validated by Character.AI's actual metrics: the platform reportedly serves over 20,000 queries per second at peak, processing more messages daily than many larger AI companies. Achieving this required building custom inference infrastructure optimized for their specific model architecture.
What's Hype
The 'AI companionship is a larger market than productivity' thesis is Shazeer's business bet, not a demonstrated fact. Character.AI has massive engagement metrics (users average 2 hours per session, the app had 20M+ monthly active users by mid-2024), but converting engagement to revenue has been the company's central challenge.

Despite impressive usage, Character.AI reportedly struggled to grow paid subscriptions beyond 100,000-200,000 users, leading to a complex deal in August 2024 where Google effectively re-acquired Shazeer and key team members through a licensing arrangement rather than an acquisition (to avoid regulatory scrutiny). This outcome — the co-inventor of the Transformer returning to Google because the companion AI business model couldn't sustain independence — is itself evidence that engagement doesn't automatically equal a viable business.

The personality layer being 'the product' is compelling but creates safety and liability challenges that Shazeer downplays: Character.AI's legal issues with teenage users forming unhealthy emotional attachments, the difficulty of moderating millions of AI characters created by users, and the brand risk of AI personalities generating harmful content all suggest the product thesis has structural costs that scale with success.
What They Missed
The ethical implications of AI companionship at scale get minimal attention from Shazeer, which is notable given that his platform became the poster child for AI companion risks in 2024. The distinction between 'people want to talk to AI personalities' and 'people, including vulnerable teenagers, form unhealthy emotional dependencies on AI characters' is not a minor nuance — it's the central product design question that Character.AI failed to address proactively.

The enterprise AI companion opportunity is underexplored: corporate training, customer service personas, and branded AI characters for businesses represent a potentially more sustainable and less ethically fraught market than consumer companion AI.

The inference cost discussion also doesn't address the energy and environmental implications — serving billions of AI messages per day at scale has a carbon footprint that grows with adoption, and the efficiency gains are offset by volume growth.

Finally, the open-source inference efficiency community (vLLM, TensorRT-LLM, GGML/llama.cpp) gets no mention despite driving much of the inference cost reduction that benefits the entire ecosystem, including Character.AI's competitors.
The One Thing
The co-inventor of the Transformer chose to build AI companions over productivity tools — that bet on human connection over efficiency tells you something profound about where AI value will accrue.
So What?
- Inference cost, not model quality, is the constraint that determines whether your AI feature is viable at scale — before building any AI product, calculate your cost per interaction at projected volume and verify it works with your unit economics.
- The AI personality layer is a genuine product differentiator — if your product has a conversational AI interface, the character, tone, and personality of that AI matters as much as its accuracy. Invest in prompt engineering for persona, not just capability.
- Google's Transformer inertia lesson applies to every large company: having the best technology means nothing if institutional caution prevents you from shipping. Speed of deployment is a competitive advantage independent of technical capability.
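The cost-per-interaction check in the first bullet can be sketched as a quick back-of-envelope calculation. The spend and volume figures below are invented placeholders; the $0.01 threshold is the rule of thumb suggested in the Action Items:

```python
# Hypothetical numbers -- substitute your own billing and analytics data.
monthly_api_spend_usd = 4_200.00   # total AI API bill for the month
monthly_interactions = 1_150_000   # AI-powered interactions served that month

cost_per_interaction = monthly_api_spend_usd / monthly_interactions
print(f"Cost per interaction: ${cost_per_interaction:.4f}")

# Viability check against the suggested $0.01 threshold.
THRESHOLD_USD = 0.01
if cost_per_interaction > THRESHOLD_USD:
    print("Above threshold: investigate batching, caching, or smaller models.")
else:
    print("Within threshold at current volume.")
```

Run this with projected (not just current) volume: a feature that is viable at 10,000 interactions a month may not be at 10 million.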
Action Items
1. Calculate your AI cost per interaction: take your monthly AI API spend and divide by total AI-powered interactions. If you're above $0.01 per interaction for high-volume features, investigate inference optimization (batching, caching, smaller models for simple tasks, model distillation). Character.AI's entire competitive advantage is built on this metric.
2. Read the 'Attention Is All You Need' paper abstract and Section 3 (the architecture description) — it's a 20-minute investment in understanding the foundational technology behind every AI model you use. When vendors claim a 'novel architecture,' you'll be able to assess what's actually new versus what's a Transformer variant.
3. If your product has any AI chat interface, invest one day in persona design: write a character brief for your AI (personality, tone, knowledge boundaries, escalation triggers). Test 50 conversations against the persona brief. A well-defined AI character dramatically improves user satisfaction and reduces harmful edge cases.
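A character brief with escalation triggers can be made concrete as a small guardrail check. Everything below — the field names, the trigger list, the `needs_escalation` helper — is an illustrative sketch, not any platform's actual schema:

```python
# A minimal character brief, sketched as plain data. All fields and
# trigger phrases here are illustrative assumptions for this example.
persona_brief = {
    "name": "Support Guide",
    "tone": "warm, concise, never sarcastic",
    "knowledge_boundaries": ["no medical advice", "no legal advice"],
    "escalation_triggers": ["self-harm", "suicide", "medical emergency"],
}

def needs_escalation(message: str, brief: dict) -> bool:
    """Flag messages that should leave the AI persona and reach a human."""
    lowered = message.lower()
    return any(trigger in lowered for trigger in brief["escalation_triggers"])

print(needs_escalation("I want to talk about self-harm", persona_brief))   # True
print(needs_escalation("Tell me a story about dragons", persona_brief))    # False
```

A real system would pair this keyword check with a classifier, but even the data-only brief gives your team a testable artifact: run your 50 test conversations against it and log every boundary or trigger violation.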
Tools Mentioned
Character.AI
AI companion platform co-founded by Shazeer — massive engagement, challenging business model, cautionary tale for AI companion products
Transformer architecture
The foundational neural network architecture co-invented by Shazeer — powers GPT, Claude, Gemini, Llama, and virtually all modern AI
vLLM
Open-source inference optimization library — key tool for reducing the serving cost that Shazeer identifies as the primary constraint
Workflow Idea
Build an 'inference economics' dashboard for your AI product. Track four metrics daily: (1) total AI API calls, (2) total cost, (3) cost per interaction, (4) P95 latency. Set alerts when cost per interaction rises above your viability threshold. Once per month, evaluate whether any high-volume interactions can be served by smaller, cheaper models (e.g., use GPT-4o-mini or Claude Haiku for classification and routing, reserve Opus/GPT-4 for complex generation). Shazeer's core insight is that inference economics determine product viability — this dashboard makes that visible before your bill does.
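The dashboard's daily viability check can be sketched in a few lines. The traffic numbers are invented, and the $0.01 threshold is the placeholder from the Action Items; swap in your own:

```python
from dataclasses import dataclass

@dataclass
class DailyInferenceStats:
    """One day's snapshot of the four dashboard metrics (fields are illustrative)."""
    api_calls: int          # (1) total AI API calls
    total_cost_usd: float   # (2) total cost
    p95_latency_ms: float   # (4) P95 latency

    @property
    def cost_per_interaction(self) -> float:  # (3) derived metric
        return self.total_cost_usd / self.api_calls if self.api_calls else 0.0

def check_viability(stats: DailyInferenceStats, threshold_usd: float) -> list[str]:
    """Return alert messages when cost per interaction exceeds the threshold."""
    alerts = []
    if stats.cost_per_interaction > threshold_usd:
        alerts.append(
            f"cost/interaction ${stats.cost_per_interaction:.4f} "
            f"exceeds ${threshold_usd:.4f}"
        )
    return alerts

today = DailyInferenceStats(api_calls=820_000, total_cost_usd=9_300.0,
                            p95_latency_ms=410.0)
print(check_viability(today, threshold_usd=0.01))
```

Wire `check_viability` to whatever alerting you already have (a Slack webhook, a pager); the point is that the threshold breach fires before the monthly bill arrives.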
Context & Connections
Agrees With
- Satya Nadella
- Sundar Pichai
- Sam Altman
Contradicts
- Gary Marcus
- Tristan Harris
Further Reading
- Attention Is All You Need — Vaswani et al., 2017 (arXiv:1706.03762) — the Transformer paper that started it all
- Character.AI's journey — The Information and Bloomberg reporting on the Google licensing deal and business model challenges
- vLLM project documentation — the open-source inference optimization library that democratizes the efficiency Shazeer describes