What's the difference from deepseek-ai/deepseek-moe-16b-chat?

#3
by JohnSaxon - opened

As the title states

deepseek-ai/DeepSeek-V2-Lite-Chat adopts MLA (Multi-head Latent Attention, a novel attention mechanism), while deepseek-ai/deepseek-moe-16b-chat still uses standard MHA (multi-head attention).
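
To make the practical effect a bit more concrete, here is a rough sketch of how many values each scheme keeps in the KV cache per token per layer. MHA stores full keys and values for every head, while MLA stores one compressed KV latent plus a small shared RoPE key. The function names are hypothetical and the sizes below are illustrative assumptions, so check each model's config.json for the real numbers.

```python
def mha_cache_per_token(n_heads: int, head_dim: int) -> int:
    """Values cached per token per layer with MHA: keys and values for every head."""
    return 2 * n_heads * head_dim


def mla_cache_per_token(kv_latent_dim: int, rope_dim: int) -> int:
    """Values cached per token per layer with MLA: a compressed KV latent plus a shared RoPE key."""
    return kv_latent_dim + rope_dim


# Illustrative sizes only (assumptions, not read from either model's config).
mha = mha_cache_per_token(n_heads=16, head_dim=128)
mla = mla_cache_per_token(kv_latent_dim=512, rope_dim=64)

print(f"MHA: {mha} values per token per layer")
print(f"MLA: {mla} values per token per layer")
print(f"MHA / MLA ratio: {mha / mla:.1f}x")
```

Under these assumed sizes the MLA cache is several times smaller, which is the main reason to prefer it for long contexts.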

