
ETH Zürich's Tiberiu Mușat proves that the minimum weight norm of a fixed-precision neural network matches Kolmogorov complexity up to a logarithmic factor

The equivalence is offered as an explanation of why deep networks generalize rather than memorize.

Tiberiu Mușat @Tiberiu_Musat_

Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions?

In my latest preprint, I give a proof that the minimum neural weight norm matches the minimum program length (aka Kolmogorov Complexity), up to a logarithmic factor. In other words, the neural network with the smallest possible weight norm (that fits the data) must encode the shortest program (that fits the data).

The result only holds for fixed-precision neural nets: infinite precision nets can store infinite information with finite (small) weights.

https://arxiv.org/abs/2605.10878

2:07 AM · May 27, 2026 · 105.6K Views
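The post's contrast between fixed-precision and infinite-precision nets can be made concrete with a short counting sketch. It is not the preprint's proof. It assumes that weights lie on a fixed grid of step 2**-p and that "norm" means the L1 norm (the post does not say which norm the paper uses). All function names are hypothetical. On a fixed grid, only finitely many weight vectors have norm at most B, so any one of them can be indexed with a bounded number of bits. With exact (unbounded-precision) arithmetic, a single weight smaller than 1 can hide an arbitrarily long bit string.

```python
from fractions import Fraction
from math import comb

# Part 1: fixed precision. Weights live on a grid of step delta = 2**-p, so a
# weight vector is delta * z for an integer vector z. Assumption: "norm" means
# the L1 norm, so ||w||_1 <= B  <=>  sum(|z_i|) <= k with k = floor(B / delta).

def count_grid_weights(n: int, k: int) -> int:
    """Hypothetical helper: number of integer vectors z in Z^n with sum(|z_i|) <= k.

    Standard lattice-point count for the L1 ball: sum over j of
    2**j * C(n, j) * C(k, j), where j is the number of nonzero entries
    (choose their positions, their signs, and positive magnitudes summing to <= k).
    """
    return sum(2**j * comb(n, j) * comb(k, j) for j in range(min(n, k) + 1))

def bits_needed(n: int, k: int) -> int:
    """Bits to index every grid weight vector within the norm budget: ceil(log2(count))."""
    return (count_grid_weights(n, k) - 1).bit_length()

# Part 2: unbounded precision. One exact rational "weight" in [0, 1) can carry
# any number of bits in its binary expansion, however small its magnitude.

def encode_bits(bits: list[int]) -> Fraction:
    """Hypothetical helper: w = sum of bits[i] * 2**-(i + 1), computed exactly."""
    w = Fraction(0)
    for i, b in enumerate(bits):
        w += Fraction(b, 2 ** (i + 1))
    return w

def decode_bits(w: Fraction, n_bits: int) -> list[int]:
    """Read back the first n_bits of w's binary expansion by repeated doubling."""
    out = []
    for _ in range(n_bits):
        w *= 2
        bit = 1 if w >= 1 else 0
        out.append(bit)
        w -= bit
    return out

def quantize(w: Fraction, p: int) -> Fraction:
    """Truncate a nonnegative w to a p-bit fixed-point grid (step 2**-p)."""
    return Fraction(int(w * 2**p), 2**p)

if __name__ == "__main__":
    # The bit count grows with the norm budget k (here for n = 1000 weights).
    for k in (1, 10, 100):
        print(f"k={k}: {bits_needed(1000, k)} bits")

    bits = [1, 0, 1, 1, 0, 1, 0, 0] * 8      # 64 bits of "information"
    w = encode_bits(bits)
    print(float(w))                           # a single weight below 1
    print(decode_bits(w, 64) == bits)         # True: exact arithmetic keeps all 64 bits
    print(decode_bits(quantize(w, 8), 64) == bits)  # False: an 8-bit weight keeps only 8
```

The first half illustrates one direction of the intuition: at fixed precision, a small weight norm caps how much information a network can hold. The second half shows why the post restricts the result to fixed-precision nets. The converse direction, that a short program can be realized by a small-norm network, is not sketched here.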
Posts from X
Taco Cohen @TacoCohen

As prophesied by the venerable I. Sutskever

7d · 68.5K views · 432 likes · 367 bookmarks