Gavin Leech, UK-based AI researcher and co-founder of the consultancy Arb, notes that vision models are roughly 1000 times smaller than text models, owing to language's data compression through compositional semantics and abstractions.
Replies linked the gap to potential image-based chain-of-thought reasoning.
——0——
@1a3orn CoT in pictures but not words would be quite neat
4:15 PM · May 17, 2026 · 257 Views
QUOTE POST
#1380 1a3orn @1A3ORN
this is a relevant consideration for projecting how Lindy intelligible CoT is likely to be
one of the major failures of my life was being so surprised to find out that vision models were ~1000x smaller than text models. Just total failure to understand language's god-tier data compression
9:00 AM · May 17, 2026 · 189.2K Views
2:31 PM · May 17, 2026 · 4.3K Views