/AI · 8d ago

ETH Zürich's Tiberiu Mușat proves that the minimum weight norm of a fixed-precision neural network matches Kolmogorov complexity, up to a logarithmic factor

The result offers an explanation for why deep networks generalize rather than memorize.

Original post

Tiberiu Mușat @Tiberiu_Musat_

Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions?

In my latest preprint, I give a proof that the minimum neural weight norm matches the minimum program length (aka Kolmogorov Complexity), up to a logarithmic factor. In other words, the neural network with the smallest possible weight norm (that fits the data) must encode the shortest program (that fits the data).

The result only holds for fixed-precision neural nets: infinite precision nets can store infinite information with finite (small) weights.

https://arxiv.org/abs/2605.10878

2:07 AM · May 27, 2026 · 105.6K Views
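The fixed-precision caveat in the post can be illustrated with a toy example. The sketch below is not taken from the preprint; the helper names `encode_bits` and `decode_bits` are hypothetical. It packs a 128-bit message into a single number smaller than 1, first exactly (standing in for an infinite-precision weight) and then rounded to a 64-bit float (standing in for a fixed-precision weight).

```python
from fractions import Fraction


def encode_bits(bits: str) -> Fraction:
    """Pack a bit string into one exact rational number in [0, 1).

    The bits become the binary digits after the point: "101" -> 0.101 (base 2).
    """
    return Fraction(int(bits, 2), 2 ** len(bits))


def decode_bits(weight, n_bits: int) -> str:
    """Read back the first n_bits binary digits of a number in [0, 1)."""
    out = []
    for _ in range(n_bits):
        weight *= 2
        bit = int(weight >= 1)
        out.append(str(bit))
        weight -= bit
    return "".join(out)


# Hypothetical 128-bit message, chosen only for illustration.
message = "1011001110001111" * 8

exact_weight = encode_bits(message)    # arbitrary precision: keeps every bit
rounded_weight = float(exact_weight)   # float64: about 53 significant bits

exact_ok = decode_bits(exact_weight, len(message)) == message
rounded_ok = decode_bits(rounded_weight, len(message)) == message

print("weight magnitude below 1:", exact_weight < 1)
print("exact weight recovers message:", exact_ok)
print("float64 weight recovers message:", rounded_ok)
```

The exact weight stays below 1 no matter how long the message is, so its size says nothing about how much information it carries. Once the weight is rounded to a fixed number of bits, the tail of the message is lost, which is the regime where a link between weight norm and description length can make sense.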

Sentiment

Users praise the preprint linking minimum neural weight norm to Kolmogorov complexity for offering insightful connections to regularization and proofs about non-zero parameters.

Positive: 100.0% · Negative: 0.0% (7 comments with sentiment)

Posts from X

Most Activity

Views: 68.5K · Bookmarks: 367 · Likes: 432 · Retweets: 39 · Replies: 4

Taco Cohen @TacoCohen

As prophesied by the venerable I. Sutskever

Tiberiu Mușat @Tiberiu_Musat_

Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions?

The result only holds for fixed-precision neural nets: infinite precision nets can store infinite information with finite (small) weights.

https://arxiv.org/abs/2605.10878

7d ago · 68.5K views · 432 likes · 367 bookmarks