Started a new AI Session: The AI Paper Talk Show π§ π€π₯
In this episode we went through AnthropicAI's recent interpretability paper "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet", in which they applied sparse dictionary learning to a larger model (Claude 3 Sonnet), matching patterns of neuron activations (called "features") to human-interpretable meanings.
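The core idea can be sketched as a one-layer sparse autoencoder: dense model activations are decomposed into a larger set of sparsely active feature directions, which researchers then try to interpret. This is a minimal toy sketch with random (not learned) weights and illustrative sizes, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features, n_samples = 16, 64, 256  # illustrative sizes, not the paper's

# Toy stand-in for a model's residual-stream activations.
acts = rng.normal(size=(n_samples, d_model))

# Encoder/decoder weights; in practice these are trained with a
# reconstruction loss plus an L1 sparsity penalty on the feature activations.
W_enc = rng.normal(scale=0.1, size=(d_model, n_features))
W_dec = rng.normal(scale=0.1, size=(n_features, d_model))
b_enc = np.zeros(n_features)

def encode(x):
    # ReLU keeps only positive feature activations, yielding a sparse code.
    return np.maximum(0.0, x @ W_enc + b_enc)

def decode(f):
    # Reconstruct the dense activation as a weighted sum of feature directions.
    return f @ W_dec

features = encode(acts)
recon = decode(features)

# Interpretability work then inspects the inputs on which a given
# feature fires most strongly to assign it a human-readable meaning.
top_feature = int(features.sum(axis=0).argmax())
```

The dictionary is overcomplete (64 features for a 16-dimensional activation), so each activation is explained by a small subset of many candidate directions, which is what makes individual features more likely to be monosemantic.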
Check full video here: https://youtu.be/uNz-Ww3_LrU?si=HUm2TWV-rSJ3X4UX
Read More:
https://transformer-circuits.pub/2024/scaling-monosemanticity/
You can also find me:
Twitter: https://x.com/jaykef_
Github: https://github.com/Jaykef