How is this model different from Llama 2-7B?
#8 · opened by dheerajpai
As title^
It's better :)
Also, it uses GQA (grouped-query attention), which other Llama-7B-scale models don't have in their architecture. The architecture is very similar but not identical, and it's pretrained on different data. You can see the GQA difference directly in the model configs (quick check below).
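A quick way to verify this from the configs, assuming you have `transformers` installed and access to both repos on the Hub (the Llama 2 repo is gated):

```python
from transformers import AutoConfig

mistral = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
llama = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")  # gated repo

# Mistral-7B: 32 query heads share 8 KV heads (GQA),
# so the KV cache is 4x smaller per token.
print(mistral.num_attention_heads, mistral.num_key_value_heads)  # 32 8

# Llama 2-7B: 32 query heads, 32 KV heads (standard multi-head attention).
print(llama.num_attention_heads, llama.num_key_value_heads)  # 32 32

print(mistral.sliding_window)  # 4096-token attention window
```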
GQA and Sliding Window Attention are the visible differences, and they should help increase inference throughput and extend the usable context length (minimal sketch of both below).
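To make that concrete, here's a minimal, self-contained sketch of what the two changes mean mechanically. This is illustrative PyTorch with toy sizes, not Mistral's actual implementation:

```python
import torch
import torch.nn.functional as F

# Toy sizes for illustration only.
n_heads, n_kv_heads, head_dim, seq, window = 8, 2, 16, 12, 4

q = torch.randn(1, n_heads, seq, head_dim)
k = torch.randn(1, n_kv_heads, seq, head_dim)
v = torch.randn(1, n_kv_heads, seq, head_dim)

# GQA: each group of n_heads // n_kv_heads query heads shares one K/V head,
# so only n_kv_heads K/V pairs are cached per token instead of n_heads.
k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)
v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)

# Sliding-window causal mask: position i may attend to j only if 0 <= i - j < window.
i = torch.arange(seq).unsqueeze(1)
j = torch.arange(seq).unsqueeze(0)
mask = (i - j >= 0) & (i - j < window)

scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
scores = scores.masked_fill(~mask, float("-inf"))
out = F.softmax(scores, dim=-1) @ v
print(out.shape)  # (1, n_heads, seq, head_dim)
```

The upshot: GQA shrinks the KV cache (fewer K/V heads stored per token), while the sliding-window mask bounds each token's attention span, so attention memory grows with the window size rather than the full sequence length.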
timlacroix changed discussion status to closed
Is this model pre-trained from scratch? Just curious.
dheerajpai changed discussion status to open
Yes, it is pre-trained from scratch.
For me, it responds well in Chinese. With Llama-7B, whenever I ask in Chinese, it somewhat understands my question but responds in English.
It's way better.