yinsong1986 committed on
Commit
1cd6a22
1 Parent(s): 2ae0e2b

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -15,7 +15,11 @@ MistralLite evolves from [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mis
 
 ## Motivation of Developing MistralLite
 
-Since the release of [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), the model became increasingly popular because its strong performance on a wide range of benchmarks. But most of the benchmarks are evaluated on `short context`, and not much has been investigated on its performance on long context tasks. Then We evaluated `Mistral-7B-Instruct-v0.1` against benchmarks that are specifically designed to assess the capabilities of LLMs in handling longer context. Although the performance of the models on long context was fairly competitive on long context less than 4096 tokens, there were some discrepencies on its performance on longer context. Motivated by improving its performance on longer context, we finetuned the Mistral 7B model, and got `Mistrallite`. The model managed to `signifantly boost the performance of long context handling` over Mistral-7B-Instruct-v0.1. The detailed `long context evalutaion results` are as below:
+Since the release of [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), the model has become increasingly popular because of its strong performance
+on a wide range of benchmarks. However, most of these benchmarks are evaluated on `short context`, and little has been investigated into its performance on long context tasks.
+We then evaluated `Mistral-7B-Instruct-v0.1` against benchmarks specifically designed to assess the capabilities of LLMs in handling longer context.
+Although the model's performance was fairly competitive on context shorter than 4096 tokens,
+there were some limitations in its performance on longer context. Motivated to improve its performance on longer context, we fine-tuned the Mistral 7B model and obtained `MistralLite`. The model managed to `significantly boost the performance of long context handling` over Mistral-7B-Instruct-v0.1. The detailed `long context evaluation results` are below:
 
 ### [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/) ###
 |Model Name|Input length| Input length | Input length| Input length| Input length|