yinsong1986 committed
Commit 1cd6a22 • 1 Parent(s): 2ae0e2b
Update README.md

README.md CHANGED
@@ -15,7 +15,11 @@ MistralLite evolves from [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mis
 
 ## Motivation of Developing MistralLite
 
-Since the release of [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), the model became increasingly popular because its strong performance
+Since the release of [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), the model has become increasingly popular because of its strong performance
+on a wide range of benchmarks. However, most of those benchmarks are evaluated on `short context`, and little has been investigated about its performance on long-context tasks.
+We then evaluated `Mistral-7B-Instruct-v0.1` against benchmarks specifically designed to assess the capabilities of LLMs in handling longer context.
+Although the model's performance on contexts shorter than 4096 tokens was fairly competitive,
+there were some limitations in its performance on longer contexts. Motivated by improving its performance on longer context, we fine-tuned the Mistral 7B model and obtained `MistralLite`. The model managed to `significantly boost the performance of long context handling` over Mistral-7B-Instruct-v0.1. The detailed `long context evaluation results` are as below:
 
 ### [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/) ###
 |Model Name|Input length| Input length | Input length| Input length| Input length|