Nando de Freitas posts Interventional SFT method that prevents delusions in LLM agents through a one-line code change during supervised fine-tuning
Tests on over 30 prompts showed gains in factual accuracy.
One line of code is all it takes to prevent LLM agent delusions, instead of post-training patches like RL. https://love4all.ai/blog/why-it-is-important-to-understand-causality-and-agency/ ❤️ 4 ∀ https://github.com/nandodef/love4all-ai/tree/main/docs/files
@AdaptiveAgents @ShaneLegg @scott_e_reed Typo: universal imitation, not international imitation 😅 🌌🌍
The road here wasn’t easy. It started with our work on delusions with @AdaptiveAgents @ShaneLegg @scott_e_reed and many other bright scientists: https://arxiv.org/pdf/2110.10819
But instead of counterfactual learning, the theory of international imitation as a route to agency provided the foundation: https://adaptiveagents.org/universal_ai_as_imitation
The research was accelerated by @OpenAI GPT5.5 and Codex. When I ran out of Pro credits 😅 I switched to @AnthropicAI Claude. I wish there were special LLM licenses for academic work @gdb @sama @DarioAmodei 🙏
The bottleneck for research these days is computational resources/energy. I’m glad that startups like @cusp_ai are addressing the energy challenges.
This research was possible thanks to my @CIFAR_News fellowship - the 🇨🇦 gift that keeps on giving - and my adjunct/associated professorships @UBC_CS and @CompSciOxford
Very excited about this! Just fine-tune on the observation tokens and ignore the action ones to treat the agent's output as a causal intervention.
This is one of those moments when I'm surprised the maths works in practice 😅.
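For readers who want to see what "fine-tune on the observation tokens and ignore the action ones" could look like, here is a rough sketch of a masked next-token loss. It is not the author's code (that lives in the linked repository). The function name `masked_nll`, the `is_action` flags, and the toy logits are all hypothetical, and it assumes each target token is already labelled as either an agent action or an environment observation.

```python
import math

def masked_nll(logits, targets, is_action):
    """Mean next-token negative log-likelihood over observation tokens only.

    logits:    per-position lists of unnormalised scores over the vocabulary
    targets:   per-position target token ids
    is_action: per-position flags, True where the target token was emitted
               by the agent (hypothetical labelling, assumed to be given)
    """
    total, count = 0.0, 0
    for scores, target, action in zip(logits, targets, is_action):
        if action:
            continue  # assumed "one line": action tokens add nothing to the loss
        peak = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_z - scores[target]
        count += 1
    return total / count if count else 0.0

# Toy trajectory over a 3-token vocabulary: observation, action, observation.
toy_logits = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_targets = [0, 2, 1]
toy_is_action = [False, True, False]

interventional = masked_nll(toy_logits, toy_targets, toy_is_action)
standard = masked_nll(toy_logits, toy_targets, [False] * 3)
print(interventional, standard)
```

In this toy case the badly predicted action token raises the standard loss but leaves the masked loss untouched, so no gradient would push the model to explain its own actions as evidence about the world. In a PyTorch-style pipeline the same effect is commonly obtained by setting the action positions' labels to the loss function's ignore index, though whether that matches the author's one-line change is an assumption here.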