As the title states, deepseek-ai/DeepSeek-V2-Lite-Chat adopts MLA (Multi-head Latent Attention, a novel attention mechanism), while deepseek-ai/deepseek-moe-16b-chat still uses standard MHA (multi-head attention).
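A rough sketch of why this difference matters at inference time. As described in the DeepSeek-V2 paper, MHA caches a full key and value vector for every head in every layer, while MLA caches one compressed latent vector plus a small shared RoPE key per layer. The helper names and all dimensions below are hypothetical and only illustrate the formulas. They are not read from either model's config.json, so check the real configs before drawing conclusions about these two checkpoints.

```python
def mha_kv_elems_per_token(num_layers: int, num_heads: int, head_dim: int) -> int:
    """Cached elements per token for standard multi-head attention.

    Each layer stores one key and one value vector per head.
    """
    return num_layers * 2 * num_heads * head_dim


def mla_kv_elems_per_token(num_layers: int, kv_lora_rank: int, rope_head_dim: int) -> int:
    """Cached elements per token for Multi-head Latent Attention (MLA).

    Each layer stores one compressed KV latent (size kv_lora_rank) plus one
    decoupled RoPE key (size rope_head_dim) shared across heads, so the cost
    does not grow with the number of heads.
    """
    return num_layers * (kv_lora_rank + rope_head_dim)


if __name__ == "__main__":
    # Hypothetical, illustrative dimensions (same layer count for both, so only
    # the attention mechanism differs).
    layers = 27
    mha = mha_kv_elems_per_token(layers, num_heads=16, head_dim=128)
    mla = mla_kv_elems_per_token(layers, kv_lora_rank=512, rope_head_dim=64)
    print(f"MHA: {mha} elements/token, MLA: {mla} elements/token, ratio {mha / mla:.2f}x")
```

With these assumed numbers MHA caches 110592 elements per token versus 15552 for MLA, roughly 7x less KV cache for MLA.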