How is this model different from Llama 2-7B?
#8 · opened by dheerajpai
As title^
It's better :)
Also, it uses GQA (grouped-query attention), which other Llama-7B-scale models don't have in their architecture. The architecture is very similar but not identical, and it's pretrained on different data. You can see the GQA difference directly in the model configs (quick check below).
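A quick way to verify this from the configs, assuming you have `transformers` installed and access to both repos on the Hub (the Llama 2 repo is gated):

```python
from transformers import AutoConfig

mistral = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
llama = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")  # gated repo

# Mistral-7B: 32 query heads share 8 KV heads (GQA),
# so the KV cache is 4x smaller per token.
print(mistral.num_attention_heads, mistral.num_key_value_heads)  # 32 8

# Llama 2-7B: 32 query heads, 32 KV heads (standard multi-head attention).
print(llama.num_attention_heads, llama.num_key_value_heads)  # 32 32

print(mistral.sliding_window)  # 4096-token attention window
```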
GQA and Sliding Window Attention are the visible differences, and they should help increase inference throughput and extend the usable context length (minimal sketch of both below).
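To make that concrete, here's a minimal, self-contained sketch of what the two changes mean mechanically. This is illustrative PyTorch with toy sizes, not Mistral's actual implementation:

```python
import torch
import torch.nn.functional as F

# Toy sizes for illustration only.
n_heads, n_kv_heads, head_dim, seq, window = 8, 2, 16, 12, 4

q = torch.randn(1, n_heads, seq, head_dim)
k = torch.randn(1, n_kv_heads, seq, head_dim)
v = torch.randn(1, n_kv_heads, seq, head_dim)

# GQA: each group of n_heads // n_kv_heads query heads shares one K/V head,
# so only n_kv_heads K/V pairs are cached per token instead of n_heads.
k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)
v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)

# Sliding-window causal mask: position i may attend to j only if 0 <= i - j < window.
i = torch.arange(seq).unsqueeze(1)
j = torch.arange(seq).unsqueeze(0)
mask = (i - j >= 0) & (i - j < window)

scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
scores = scores.masked_fill(~mask, float("-inf"))
out = F.softmax(scores, dim=-1) @ v
print(out.shape)  # (1, n_heads, seq, head_dim)
```

The upshot: GQA shrinks the KV cache (fewer K/V heads stored per token), while the sliding-window mask bounds each token's attention span, so attention memory grows with the window size rather than the full sequence length.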
timlacroix changed discussion status to closed
Is this model pre-trained from scratch? Just curious.
dheerajpai changed discussion status to open
Yes, it is pre-trained from scratch.
For me, it responds well in Chinese. With Llama-7B, whenever I ask in Chinese, it somewhat understands my question but responds in English.
It's way better.