[R] Causal self-attention as a probabilistic model over embeddings

Via r/MachineLearning

Tuesday, Mar 24, 2026 · 4:37AM

Summary

We’ve been working on a probabilistic interpretation of causal self-attention where token embeddings are treated as latent variables. In that view, the attention map induces a change-of-variables term, which leads to a barrier / degeneracy boundary in embedding space. The resulting picture is: a sta