Richard Sutton, University of Alberta professor, shares a 26-word distillation of his Bitter Lesson, urging AI researchers to favor computation-scaling methods over embedded human knowledge.
Gary Marcus and others reply debating reliance on human data and priors.
@agarwl_ discarding human data (be it in LLM or in non-trivial "classic" RL env) is just complete nonsense. Even if Rich says it. But I'm not sure he's actually saying this here.
Guess he should use more than 26 words, but then it wouldn't sound Ilya-style mysterious anymore :)
Perplexed by this take: Sure, let's not mainly do supervised learning on human knowledge, but it makes sense to build off it instead of the *let's do it from scratch*. People cite AlphaGo vs AlphaGo Zero as a quintessential example of how using human-generated data is suboptimal, but it was *imitating* it that was suboptimal. What if we learned from that data assuming it was suboptimal in the first place (so not supervised learning but an RL-like mindset of using that data)?
@wightmanr @agarwl_ I think what you're saying is to learn a better learning algorithm than what we have designed first. Can't really disagree with that. The bitter meta-lesson
If you assume an extremely high bound of compute (in theme of bitter lesson), don't you think that having the base model, the core connectivity based on predicting human knowledge, would be sub-optimal relative to something more fundamental that would hoover up human knowledge in a later phase of learning? Once it's learned how to learn?
@RichardSSutton I think it is worth studying human knowledge, particularly understanding the structure of its abstractions, as they can provide guidance about the kinds of things humans learn and machines do not (yet). I wouldn't call that "distraction".
The bitter lesson in 26 words: Don’t be distracted by human knowledge, as AI has been historically. Instead focus on methods for creating knowledge that scale with computation, like search and learning.
Two humans armed with the same Codex / Claude Code version can produce completely different outcomes, measured on a log scale.
falseposting. value of understanding at all time highs 📈
Also abstractions & knowledge curation.
@herbiebradley > does RL increase in its generalization
I think we don't really know how much of a generalization specifically we're getting from RL vs how much of a large and well-chosen (to be useful) task mix the labs are throwing into the RL envs mix to train on
Some takes about RSI from discussions with many smart researchers & thinkers:
1. Many RSI (or automated AI R&D) debates converge to similar cruxes: is a 1000x sample efficiency improvement possible, can you just simulate reality and train on it with no sim2real gap, can we easily make models good at "fuzzy" tasks? People like to assume that automated research agents will find such breakthroughs specifically *because* without them, progress could be heavily bottlenecked on data or continued compute scale-ups.
2. The Yudkowsky "genius brain in a box" framing of ASI has latent influence on many researcher views even though people may not be aware of it. A common move is to "flip" predictions, as they go further out, from assuming LLM or deep learning-specific properties of future AI to assuming "von Neumann x1000", human brain-like properties. I'd like to see more thought-out reasoning of why this flip should occur at any particular point (eg pre or post automated AI R&D); this question is a crux behind many predictions like AI 2027.
3. There are some cracks in this worldview beginning to show: predictions from a few years ago that models would be less jagged now than they are, or that they would be more deceptive, synthetic data would work better, etc. Many of these seem like prediction errors from imagining future models as a "human brain in a box", but LLMs are empirically a different kind of intelligence. Most models of software-only intelligence explosion are also coarse enough to mostly ignore properties of LLMs.
4. Views about fast RSI progress seem to be correlated with (a) belief that synthetic data is all you need (b) belief in very high GDP growth and an industrial explosion because of automated firms (c) having worked only in AI research or in small organizations.
5. Key technical things to track over the next 1-2 years: does RL increase in its generalization, AI lab data spend, can we automate synthetic RL env construction, best practices for FDEs deploying AI into large enterprises, coherency of AI personas, how powerful will multi-agent scaling of test-time compute be, and continual learning.
6. Overall I think the "RSI leading to *fast* takeoff" frame had huge alpha in 2022, moderate in 2024, and potentially is of neutral usefulness in 2026 for predicting the future.
i wonder whether this needs an update.
current methods, such as they are, leverage massive amounts of human knowledge as their primary fuel. they would be lost without it.
and they even build some knowledge into their system prompts.
and lately they build knowledge into their harnesses, usually by over 50 tools that have been carefully crafted with human knowledge.
@RichardSSutton The bitter lesson also applies to how you work, not just what you build. Don't let human capacity be your bottleneck. Instead focus on methods and tools for creating impact that leverage computation.
They certainly produce programs. Is "producing new algorithms" categorically harder than solving problems humans couldn't? How do we even determine if a new algorithm is a nontrivial innovation versus just unusually lucky stochastic parroting of primitives?
bitter and sad
GOD GAVE US THE UNIVERSE, THE ORACLE. WE MUST MINE IT
❤️
Here's a better lesson, don't fall for bitter lesson.

Meanwhile, Claude's system prompt is the size of a novel, and the harness is the size of a small operating system. Modern LLMs are trained on most of human knowledge. "AI" operates in a human world, and intelligence cannot be cleanly separated from knowledge about the world.
@GaryMarcus @RichardSSutton @HamidMaei Trying to avoid human knowledge is a strange idea, when you think about it. If these systems operate in the world that humans built, human knowledge will always be essential. The question is in which form it is encoded in the system. There are many different answers to this.
@jiaxinwen22 I agree with you on continued technical progress on AI basically not needing human research taste at all. I think this frees us to work on things where the target is totally unclear, e.g. interpretability
I might be one of the few people who is most bearish on human research taste and bullish on automated research:
- "AIs can only do hyperparameter search" is mainly a skill issue with bad automated research setups.
- human taste is overrated, e.g. frontier labs / neolabs are doing pretty similar things.
- human taste might win in a low-compute world, but not in the high-compute world we're entering.
@jiaxinwen22 Although if you look into how frontier labs are acquiring data, I think it feels way more human taste driven than one would expect. But yeah algos etc. overrated
One way of looking at the bitter lesson.

Natural language is human-created representation of the world.
Is the ultimate form of the bitter lesson to bypass natural language entirely and learn a new representation from the world itself?
@jiaxinwen22 how do you explain point 3 given that debate is empirically hard, LLM judges systematically fail to make reliable subjective judgements (which is basically what taste is), etc? why doesn't this just result in the slop problems we see now when people try to scale SWE TTC?
Related interesting point
@jiaxinwen22 median human taste is not great. the tails are pretty good, but for the purposes of "will AI replace a profession" the median is not hard to beat.
Bitter truth he leaves out: all search is combinatorial and infeasible without priors
@jiaxinwen22 Show me one good reproducible thing automated research has produced
The bitter lesson is that we are doomed to endless bitter lesson exegesis.
Deuteronomy 41:9 (KJV) Be ye not led astray by the knowledge of men, as artificial intelligence hath aforetime been led astray.
But turn ye rather unto the ways by which knowledge is brought forth, and which wax mighty with the increase of computation: even search, and learning.
However, I hope humans keep doing our own research, with *strong* tastes and priors. Not all research is about outcomes. Sometimes you just want to solve/understand a problem in a way that feels like yours.
@menhguin If the job of top human experts is just to write a few high-level directions and ask AIs to solve, I’d still call it automated research.
i agree that human taste contributes a lot to frontier lab data quality. But I wouldn't be surprised if automated research proposed a very alien way to rewrite/score/filter data that outperforms humans. The way LMs absorb data is inherently very alien, so notions of difficulty, quality, and diversity would be quite different from a human perspective vs. from an AI perspective.
Probably the best position paper in ML, now fits in a tweet
@agarwl_ the key issue seems to be “we’re operating inside of a bad box, here’s the good box” people like to agree on the second part, but few internalize why “human knowledge” is so attractive and why “how humans learn” is another humanization
@RichardSSutton The center of mass of AI history is like six months ago, so I'd say it was mostly about LLMs and learning from humans
@jiaxinwen22 Sure, but also to this date we have zero evidence of any effectiveness of human-free AI apart from simple games, and reality isn't a game... so I wouldn't hold my breath
I think it’s getting closer to reaching a point of 100% agreeing with you, yet even in the semi-auto, joint research I do with 5.5 or 4.7, it will sometimes get caught on a nonsensical conviction that reads just enough as if it makes sense, and then spend a lot of time working on a false premise. Overall though, it can do much much more than hyperparameter tuning