Jaward
posted an update May 30
Started a new AI Session: The AI Paper Talk Show 🧠🤖💥

In this episode we went through Anthropic's recent interpretability paper, "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet", in which they apply sparse dictionary learning to a larger model (Claude 3 Sonnet). They extract patterns of neuron activations (called features) and match them to human-interpretable meanings.
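For anyone who wants a feel for the idea before watching, here is a rough sketch of a sparse autoencoder, one common way to do sparse dictionary learning. It is not the paper's code: the function names, the tiny dimensions, the random weights, and the plain L1 penalty are all assumptions made for illustration. An activation vector is encoded into many non-negative feature activations, decoded back, and scored by reconstruction error plus a sparsity penalty.

```python
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def encode(x, W_enc, b_enc):
    """Feature activations: ReLU(W_enc @ x + b_enc), so every value is >= 0."""
    pre = matvec(W_enc, x)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]

def decode(f, W_dec, b_dec):
    """Reconstruction: b_dec + sum_i f[i] * W_dec[i].
    Each row of W_dec is one feature's direction in activation space."""
    out = list(b_dec)
    for f_i, direction in zip(f, W_dec):
        for j, d_j in enumerate(direction):
            out[j] += f_i * d_j
    return out

def sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coeff=0.1):
    """Squared reconstruction error plus an L1 sparsity penalty on the features.
    The plain L1 term and the coefficient are illustrative assumptions."""
    f = encode(x, W_enc, b_enc)
    x_hat = decode(f, W_dec, b_dec)
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(f_i) for f_i in f)
    return recon + l1_coeff * sparsity, f

# Hypothetical toy setup: a 4-dimensional activation and 16 dictionary features.
rng = random.Random(0)
d_model, n_features = 4, 16
W_enc = [[rng.gauss(0, 0.5) for _ in range(d_model)] for _ in range(n_features)]
W_dec = [[rng.gauss(0, 0.5) for _ in range(d_model)] for _ in range(n_features)]
b_enc = [0.0] * n_features
b_dec = [0.0] * d_model
x = [rng.gauss(0, 1) for _ in range(d_model)]

loss, features = sae_loss(x, W_enc, b_enc, W_dec, b_dec)
active = [i for i, f_i in enumerate(features) if f_i > 0]
print(f"loss={loss:.3f}, active features: {active}")
```

Training would adjust the weights to lower this loss over many activations, and the interpretability work then consists of looking at which inputs make each feature fire.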

Check full video here: https://youtu.be/uNz-Ww3_LrU?si=HUm2TWV-rSJ3X4UX

Read More:
https://transformer-circuits.pub/2024/scaling-monosemanticity/

You can also find me:
Twitter: https://x.com/jaykef_
GitHub: https://github.com/Jaykef

Amazing! Would love to feature you on the HF Discord and get some more visibility for your talks. I think our reading group might interest you!
https://discord.gg/hugging-face-879548962464493619


Sure, thanks. Will join in.