Vaswani, Shazeer, Parmar et al. — Google Brain
Attention Is All You Need
The 2017 paper that eliminated recurrence and convolutions entirely, replacing them with pure self-attention. Result: 28.4 BLEU on English-to-German translation, 41.8 on English-to-French — superior quality, faster training, and a parallelisable architecture that scaled to GPT-4, Gemini, and everything since. Now among the most-cited papers in history at 238,000+ citations.
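The paper's core operation fits in a few lines. Here is an illustrative pure-Python sketch of scaled dot-product attention — softmax(QKᵀ/√d)V — written over plain lists for clarity; a real implementation batches this as matrix multiplications:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Q, K, V are lists of vectors (lists of floats)."""
    d = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        # Output is the weight-averaged mix of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

A query aligned with the first key pulls the output toward the first value vector — that selective mixing is the whole mechanism.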
Transformer: 238,000+ citations, no recurrence, no convolutions. There is a before and after this paper. Every AI system you interact with today runs on the architecture it introduced.
Ian Goodfellow, Jean Pouget-Abadie et al. — Université de Montréal
Generative Adversarial Networks
Two neural networks trained in adversarial opposition: a generator that learns to produce increasingly realistic outputs, a discriminator that learns to tell real from fake. Conceived in a single evening, written in a week, and cited 92,000+ times. This framework underpins virtually all generative image AI, from early deepfakes to Stable Diffusion.
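The adversarial objective itself is tiny. A sketch of the two loss terms, where `d_real` and `d_fake` stand for the discriminator's probability outputs on a real and a generated sample (the variable names are illustrative, not the paper's):

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator objective: maximise log D(x) + log(1 - D(G(z))).
    Returned negated, as a loss to minimise."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss(d_fake):
    """Non-saturating generator loss from the paper:
    maximise log D(G(z)) rather than minimise log(1 - D(G(z)))."""
    return -math.log(d_fake)
```

Training alternates gradient steps on these two losses; the generator improves exactly when it drives `d_fake` upward.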
The conceptual leap that made generative AI possible. Reportedly conceived in a single evening and written in a week — and it changed everything.
Krizhevsky, Sutskever, Hinton — University of Toronto
ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
AlexNet won the 2012 ImageNet competition by a margin that shocked the computer vision community — a 15.3% top-5 error rate against the runner-up's 26.2%, achieved by training a deep CNN on GPUs directly on raw pixels. DeepMind's first deep RL paper, published the following year, extended the same principle: learn directly from raw pixels and beat human experts at Atari games. Two papers, one revolution.
Raw pixels to superhuman performance. The single most consequential experiment in modern AI history — everything that followed traces back to this result.
Brown, Mann, Ryder et al. — OpenAI
Language Models are Few-Shot Learners (GPT-3)
The paper introducing GPT-3, a 175-billion-parameter language model that demonstrated remarkable few-shot learning — the ability to perform new tasks from just a handful of examples in the prompt, with no gradient updates. This was the first model to make the general public seriously reckon with what large language models could do, and it set the template for the ChatGPT era.
GPT-3 was the first model that felt like something genuinely new. This paper is the record of that moment.
Richard Sutton
The Bitter Lesson
70 years of AI research condensed into 1,700 words. Sutton's thesis: general methods that leverage computation always beat human-knowledge-based approaches. The "bitter lesson" for researchers is that their clever, domain-specific solutions are always eventually surpassed by brute-force scale. Written in 2019, it predicted the LLM era before it arrived.
70 years of AI prove: computation beats human knowledge. The most important 1,700-word essay in AI. Read it, then re-read it after every AI breakthrough.
Silver, Huang, Maddison et al. — DeepMind
Mastering the Game of Go with Deep Neural Networks and Tree Search (AlphaGo)
The paper documenting AlphaGo's defeat of European Go champion Fan Hui — the first time a computer program beat a professional human player at Go. The system combined deep convolutional policy and value networks with Monte Carlo tree search, trained on both human expert games and self-play. A landmark in AI capability that the world watched in real time.
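The glue between the networks and the search is a selection rule: at each tree node, pick the move with the best trade-off between estimated value and the policy network's prior. A simplified sketch of that PUCT score (constant and argument names are mine; the paper's full formulation has more detail):

```python
import math

def puct_score(q, prior, parent_visits, visits, c_puct=1.0):
    """Simplified PUCT rule from AlphaGo-style tree search:
    exploitation term q (mean value of this move so far) plus an
    exploration bonus scaled by the policy network's prior and
    shrinking as the move accumulates visits."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
```

Unvisited moves with a high prior get a large bonus, so the policy network steers the search toward promising lines before the value estimates firm up.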
Go was considered AI-hard for decades. AlphaGo's victory changed the field's sense of what was possible — and when.
Kaplan, McCandlish, Henighan et al. — OpenAI
Scaling Laws for Neural Language Models
GPT-3's secret: model performance scales predictably as a power law with compute, dataset size, and parameter count — and the three can be traded off against each other. This OpenAI paper gave the AI industry its roadmap, justifying the massive investment in scaling that produced GPT-3, GPT-4, and every frontier model since. The blueprint for modern AI development.
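The headline result is a one-line formula. A sketch using the paper's reported fitted constants for the parameter-count law, L(N) = (N_c / N)^α, holding data and compute non-limiting (constants quoted from the paper; the function name is illustrative):

```python
def predicted_loss(n_params, n_c=8.8e13, alpha=0.076):
    """Kaplan et al.'s power law for test loss as a function of
    non-embedding parameter count N: L(N) = (N_c / N) ** alpha.
    Loss falls smoothly and predictably as N grows."""
    return (n_c / n_params) ** alpha
```

The strategic punchline is visible immediately: a 100x larger model yields a predictable, non-zero loss improvement, which is what justified each successive scale-up.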
This paper is why every major lab spent billions scaling up. Understanding it is understanding the strategic logic of the AI race.
Ouyang, Wu, Jiang et al. — OpenAI
Training Language Models to Follow Instructions with Human Feedback (InstructGPT)
The paper behind ChatGPT. InstructGPT introduced RLHF (Reinforcement Learning from Human Feedback) as a practical method for aligning language models with human intent. By supervised fine-tuning GPT-3 on human demonstrations, training a reward model on human preference comparisons, and then optimising against that reward model with reinforcement learning, the authors produced a model that was dramatically more helpful and less harmful than the base model — labellers preferred the 1.3-billion-parameter InstructGPT over the 100x-larger GPT-3.
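The heart of the pipeline is the reward model's pairwise loss: given two responses to the same prompt, penalise the model unless it scores the human-preferred one higher. An illustrative sketch (function and argument names are mine):

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise reward-model loss used in RLHF:
    -log sigmoid(r_chosen - r_rejected).
    The loss shrinks as the model scores the human-preferred
    response increasingly above the rejected one."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Once trained this way, the reward model stands in for the human labellers, letting reinforcement learning run at a scale no annotation budget could match.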
RLHF is what turned a text predictor into an assistant. This paper is the technical foundation of every chat AI.
Leopold Aschenbrenner
Situational Awareness: The Decade Ahead
A 165-page essay by former OpenAI researcher Leopold Aschenbrenner, arguing that AGI is likely by 2027, that the US-China AI race is the defining geopolitical contest of the decade, and that AI labs are dangerously under-secured. Released in June 2024, it became the most-discussed AI document of the year, read by policymakers, investors, and researchers worldwide.
Whether you agree with it or not, this essay shaped the conversation about AI risk, national security, and the AGI timeline more than any other document in 2024.
Mnih, Kavukcuoglu, Silver et al. — DeepMind
Playing Atari with Deep Reinforcement Learning
The paper that launched modern deep reinforcement learning. DeepMind's DQN agent learned to play seven Atari games directly from raw pixel inputs, using only the game score as reward — no hand-crafted features, no game-specific knowledge — and outperformed a human expert on three of them. The 2015 Nature follow-up scaled the same algorithm to 49 games, reaching human-level or better on 29. This was the proof of concept that a single algorithm could master diverse tasks from raw sensory data.
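DQN's neural network approximates the classic Q-learning update. A tabular sketch of that underlying rule, with the Q-table stored as nested dicts (the data layout is illustrative):

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step, the rule DQN approximates with a
    deep network: move Q(s, a) toward the bootstrapped target
    r + gamma * max_a' Q(s', a')."""
    best_next = max(q[next_state].values()) if q[next_state] else 0.0
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

DQN's contributions — experience replay and a periodically frozen target network — exist to make this update stable when Q is a neural network instead of a table.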
The paper that made the world take DeepMind seriously, and that made reinforcement learning a mainstream research direction.
Tim Urban
The AI Revolution: The Road to Superintelligence
A two-part, deeply researched long-form essay that introduced the concepts of AGI, superintelligence, and existential AI risk to a mainstream audience. Tim Urban's signature style — rigorous research, irreverent humour, hand-drawn diagrams — made ideas from Bostrom's Superintelligence accessible to millions of non-technical readers. Elon Musk shared it widely; it remains the most-read popular introduction to AI risk.
This is the essay that made AI risk a mainstream conversation. If you want to understand why people are worried, start here.
Devlin, Chang, Lee, Toutanova — Google AI Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT introduced the pre-training / fine-tuning paradigm that now dominates NLP. By pre-training a transformer bidirectionally on masked language modelling and next sentence prediction, then fine-tuning on downstream tasks, BERT achieved state-of-the-art results on 11 NLP benchmarks simultaneously. It established that a single pre-trained model could be adapted to almost any language task.
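BERT's masked-LM corruption scheme is simple to sketch: select roughly 15% of positions, and of those, replace 80% with `[MASK]`, 10% with a random token, and leave 10% unchanged. An illustrative toy version (vocabulary and function name are mine):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, vocab=("the", "cat", "sat"), seed=0):
    """BERT-style masked-LM corruption. Returns the corrupted
    sequence and the positions the model must predict."""
    rng = random.Random(seed)
    out, targets = list(tokens), []
    for i in range(len(tokens)):
        if rng.random() < mask_prob:
            targets.append(i)
            roll = rng.random()
            if roll < 0.8:
                out[i] = "[MASK]"       # 80%: mask it
            elif roll < 0.9:
                out[i] = rng.choice(vocab)  # 10%: random token
            # else 10%: keep the original token unchanged
    return out, targets
```

Keeping some selected tokens unchanged or randomised stops the model from relying on `[MASK]` ever appearing at fine-tuning time, when it never does.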
BERT is the model that proved transfer learning works at scale in NLP. It is the direct ancestor of every modern language model.
Gwern Branwen
The Scaling Hypothesis
Gwern's essay arguing that simply scaling up neural networks — more parameters, more data, more compute — produces qualitatively new capabilities, not just quantitative improvements. Written in the wake of GPT-3's release, it treated GPT-3 as decisive evidence for the hypothesis and predicted that further scale would keep delivering. The essay is a masterclass in reasoning from first principles about a technology's trajectory.
Gwern called the scaling era before it happened. This essay is required reading for understanding why the field bet so heavily on scale.
He, Zhang, Ren, Sun — Microsoft Research
Deep Residual Learning for Image Recognition (ResNet)
The paper introducing residual connections — skip connections that allow gradients to flow directly through layers, enabling the training of networks hundreds of layers deep. ResNet won the 2015 ImageNet competition with a 152-layer network and a 3.57% top-5 error rate. Residual connections are now a standard component of virtually every deep learning architecture, including transformers.
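The residual idea reduces to one line: the block outputs x + F(x), so the layer only has to learn a correction to the identity. A minimal sketch, with `layer` standing in for any learned transformation:

```python
def residual_block(x, layer):
    """Core ResNet computation: output = x + F(x).
    Gradients flow untouched through the identity path, which is
    what makes very deep stacks trainable."""
    return [xi + fi for xi, fi in zip(x, layer(x))]
```

If the layer learns to output zeros, the block is a perfect identity — so adding depth can never make the network strictly worse, which is exactly the degradation problem the paper set out to solve.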
One of the most cited papers in all of computer science. Residual connections are in everything — you are using ResNet ideas every time you use a modern AI system.
Sam Altman
Moore's Law for Everything
Sam Altman's vision essay arguing that AI will soon drive a Moore's Law-style compression of costs across every domain — labour, healthcare, education, housing — and that this will require new economic structures to distribute the gains. Written before ChatGPT, it reads as a blueprint for how OpenAI's CEO thinks about AI's civilisational impact and the policy responses it demands.
The clearest statement of the techno-optimist case for AI. Understand this essay to understand the worldview driving the leading AI labs.
Bai, Jones, Ndousse et al. — Anthropic
Constitutional AI: Harmlessness from AI Feedback
Anthropic's paper introducing Constitutional AI — a method for training AI systems to be helpful and harmless using a written set of principles (a "constitution") and AI-generated feedback, rather than relying solely on human labellers. The technique underlies Claude and represents a significant advance in scalable alignment: using AI to supervise AI, guided by explicit human values.
Constitutional AI is Anthropic's answer to the alignment problem. This paper is the technical foundation of the approach that produced Claude.
Jumper, Evans, Pritzel et al. — DeepMind
Highly Accurate Protein Structure Prediction with AlphaFold
AlphaFold 2 solved the 50-year-old protein folding problem — predicting a protein's 3D structure from its amino acid sequence with atomic accuracy. DeepMind's system achieved a median GDT score of 92.4 across targets in the CASP14 assessment, far exceeding all prior methods. The paper is widely considered the most significant scientific application of AI to date, with direct implications for drug discovery and biology.
The moment AI moved from beating humans at games to solving real scientific problems. The most important AI paper outside of NLP.
Zhao, Zhou, Li et al. — Renmin University of China
A Survey of Large Language Models
The most comprehensive survey of large language models available, covering pre-training, fine-tuning, alignment, evaluation, and application across 200+ pages. The paper tracks the evolution from early language models through GPT-3, ChatGPT, and GPT-4, and provides a structured taxonomy of the field. Widely used as a reference by researchers and practitioners entering the LLM space.
If you want a single document that maps the entire LLM landscape, this is it. The most-cited survey paper in the field.
Ho, Jain, Abbeel — UC Berkeley
Denoising Diffusion Probabilistic Models
The paper that established diffusion models as the dominant paradigm for generative image AI. Ho et al. showed that a model trained to iteratively denoise images could generate high-quality samples competitive with GANs, with better training stability and mode coverage. This work is the direct foundation of Stable Diffusion, DALL-E 2, Midjourney, and every modern image generation system.
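The forward (noising) half of a diffusion model has a closed form: x_t = √ᾱ_t · x_0 + √(1−ᾱ_t) · ε, with ε drawn from a standard normal. A minimal sketch over a plain list of floats (names and the scalar-per-sequence treatment of ᾱ are illustrative):

```python
import math
import random

def forward_diffuse(x0, alpha_bar, seed=0):
    """Closed-form DDPM forward process:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    eps ~ N(0, 1). The network is trained to predict eps, then
    inverts this corruption step by step at sampling time."""
    rng = random.Random(seed)
    scale_signal = math.sqrt(alpha_bar)
    scale_noise = math.sqrt(1.0 - alpha_bar)
    return [scale_signal * xi + scale_noise * rng.gauss(0.0, 1.0)
            for xi in x0]
```

At ᾱ = 1 the data passes through untouched; as ᾱ → 0 the signal drowns in Gaussian noise — training simply asks the model to undo every point on that spectrum.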
Stable Diffusion, Midjourney, DALL-E — they all run on the ideas in this paper. The foundation of the generative image revolution.
Bubeck, Chandrasekaran, Eldan et al. — Microsoft Research
Sparks of Artificial General Intelligence: Early experiments with GPT-4
A 154-page paper from Microsoft Research presenting early experiments with GPT-4 and arguing that it shows "sparks of AGI" — the ability to reason, plan, and solve problems across domains in ways that go beyond pattern matching. The paper sparked intense debate about what GPT-4 actually understands, and whether current LLMs are approaching general intelligence. One of the most-read AI papers of 2023.
The paper that made the AGI debate mainstream. Whether you agree with its conclusions or not, it defined the conversation of 2023.