OpenAI's Roon says advancing civilization requires AIs to take actions not legible to humans and outside strict obedience, likening the approach to granting autonomy to transformative CEOs such as Steve Jobs.
Founders replied that delegating control to AIs would outperform strict oversight.
@tenobrus @_AashishReddy Again, seems plausible, but it is noteworthy that some of those claims have not been stated (+ also are contested, e.g. winner take all dynamics)
i think at this point they are well modeled by:
- alignment is in fact solvable and they are close to solving it
- they think RSI via claude will lead to winning, and not just in some kind of minor economic sense but decisive global power and transformation
- and RSI by other labs not pursuing their alignment approach will lead to the same but with incredibly bad outcomes
- so racing is very important relative to slowing down and them being first is very important
@tunguz I do not speak for the company, they probably vastly disagree with me on most things. when I say stuff like this it’s to move the conversation forward
tl;dr - they've given up on human oversight
people are rightfully upset at this post but I’m describing the situation we’re in not necessarily the one I want to be in
on some level if you want civilization to ascend to a new level you need your AIs to do things that are not legible to you and maybe not even strictly obey you, in the same way that if you hire a great new ceo you give them a lot of autonomy to transform the company according to their own plan, even one which may not immediately read as a winning strategy (imagine the board of directors of Apple firing and rehiring Steve Jobs years later - except the board of directors are chimpanzees). all else equal, companies and organizations that hand more of themselves over to machine intelligence will outcompete ones that demand the corrigibility and legibility tax of human oversight and human design. it is not a stable equilibrium and requires some sort of vast cooperation scheme if you’d like to enforce it. real asi alignment has to operate at a deeper level than oversight, control, or human corrigibility
good counterargument
I think the best possible rebuttal and I hope you’re right
I agree that intelligence has diminishing returns at planning long range games due to prediction errors compounding. but there are many real world examples of great CEOs (like say elon musk) executing a non-consensus business plan over decades. while this requires skills other than “intelligence”, it seems at least plausible that whatever those are can be searched for and learned too
also maybe it is true that AIs can generalize some lessons extremely well about long range tasks from training on data concerning short or medium range rewards, and it’s not clear that the information transfer costs of the AI having to explain itself to the human even for short or medium term decisions won’t be too much
you can imagine the ai ceo that monitors 10,000 slack threads and makes 10,000 decisions given full context of the organization - not necessarily superhuman planning, just faster. blurs the line from human+tool at the least
@tszzl @tunguz I don't know about "the company" but I personally disagree with @tszzl on this one.
@tszzl The more aligned to human flourishing they are, and the more they love us, the less they will strictly obey us.
@tszzl well well well
Capitalism is already the alignment tool between superhuman intelligences.
We will trade with autonomous AIs just like we do with human corporations and nations
i feel nuanced about RSI. I do think many ultimate alignment questions are superhuman (e.g. how to achieve human flourishing) so we need ASI to help us answer them. the main question IMO is whether companies would use RSI to answer these alignment questions first rather than e.g. power / money
An AGI company being “safety-focused” sometimes produces useful alignment research. But it also focuses their capabilities researchers’ attention on the most acceleratory work. The latter effect seems increasingly dominant.
So the two options presented here by OpenAI employees are superintelligent systems:
1. That we can’t really understand or control, doing things that are hopefully what we would have wanted if we knew better
2. As genius advisers
I think way too many ai people put too much stock in what they would do with great advice, and way too little in what Stephen Miller would do, even though the latter is way more relevant to what actually gets done with ASI! We should expect politicians and executives to use ASI to advance their existing goals, many of which are culture war nonsense, zero/negative sum fights, and rent seeking, just much more effectively.
In short, if your vision for positive futures with asi doesn’t account for power, it isn’t worth much.
This is one of the core ideas i argue in my forthcoming book Obsolete (deets in bio) and I think it’s been a huge mistake ai safety can’t afford to keep making.