yinsong1986 committed
Commit c2d54e2
1 Parent(s): ba4d664

Update README.md

Files changed (1):
  1. README.md +5 -5
README.md CHANGED
@@ -21,14 +21,14 @@ Then We evaluated `Mistral-7B-Instruct-v0.1` against benchmarks that are specifi
 Although the performance of the model was fairly competitive on long contexts of fewer than 4096 tokens,
 there were some limitations in its performance on longer contexts. Motivated by improving its long-context performance, we finetuned the Mistral 7B model and produced `MistralLite`. The model significantly boosts long-context handling over Mistral-7B-Instruct-v0.1. The detailed long-context evaluation results are below:
 
-### [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/) ###
+1. [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/)
 |Model Name|Input length|Input length|Input length|Input length|Input length|
 |----------|-----------:|-----------:|-----------:|-----------:|-----------:|
 | | 2851 | 5568 | 8313 | 11044 | 13780 |
-| Mistral-7B-Instruct-v0.1 | 90% | 0% | 0% | 0% | 0% |
+| Mistral-7B-Instruct-v0.1 | 100% | 50% | 2% | 0% | 0% |
 | MistralLite | **100%** | **100%** | **100%** | **100%** | **98%** |
 
-### [Line Retrieval](https://lmsys.org/blog/2023-06-29-longchat/#longeval-results) ###
+2. [Line Retrieval](https://lmsys.org/blog/2023-06-29-longchat/#longeval-results)
 
 |Model Name|Input length|Input length|Input length|Input length|Input length|Input length|
 |----------|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|
@@ -36,7 +36,7 @@ there were some limitations on its performance on longer context. Motivated by i
 | Mistral-7B-Instruct-v0.1 | **98%** | 62% | 42% | 42% | 32% | 30% |
 | MistralLite | **98%** | **92%** | **88%** | **76%** | **70%** | **60%** |
 
-### [Pass key Retrieval](https://github.com/epfml/landmark-attention/blob/main/llama/run_test.py#L101) ###
+3. [Pass key Retrieval](https://github.com/epfml/landmark-attention/blob/main/llama/run_test.py#L101)
 
 |Model Name|Input length|Input length|Input length|Input length|
 |----------|-----------:|-----------:|-----------:|-----------:|
@@ -44,7 +44,7 @@ there were some limitations on its performance on longer context. Motivated by i
 | Mistral-7B-Instruct-v0.1 | **100%** | 50% | 20% | 30% |
 | MistralLite | **100%** | **100%** | **100%** | **100%** |
 
-### [Question Answering with Long Input Texts](https://nyu-mll.github.io/quality/) ###
+4. [Question Answering with Long Input Texts](https://nyu-mll.github.io/quality/)
 |Model Name|Test set Accuracy|Hard subset Accuracy|
 |----------|----------------:|-------------------:|
 | Mistral-7B-Instruct-v0.1 | 44.3% | 39.7% |
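For context on what the third benchmark above measures: pass-key retrieval buries a short random number in many lines of filler text and checks whether the model can repeat it back. Below is a minimal sketch of such a probe, in the spirit of the linked landmark-attention `run_test.py`; the model ID `amazon/MistralLite`, the filler sentence, and the prompt wording are illustrative assumptions rather than the exact evaluation harness, and a real run would also apply the model's own prompt template.

```python
# Sketch of a pass-key retrieval probe, in the spirit of
# https://github.com/epfml/landmark-attention/blob/main/llama/run_test.py
# Model ID, filler text, and prompt wording are illustrative assumptions,
# not the exact harness used to produce the table above.
import random

from transformers import AutoModelForCausalLM, AutoTokenizer


def make_passkey_prompt(n_garbage: int, seed: int = 0):
    """Bury a random 5-digit pass key inside n_garbage lines of filler."""
    rng = random.Random(seed)
    pass_key = str(rng.randint(10000, 99999))
    filler = ("The grass is green. The sky is blue. "
              "The sun is yellow. Here we go. There and back again.")
    lines = [filler] * n_garbage
    # Insert the needle at a random position in the haystack.
    needle = f"The pass key is {pass_key}. Remember it. {pass_key} is the pass key."
    lines.insert(rng.randint(0, n_garbage), needle)
    prompt = ("There is important info hidden inside a lot of irrelevant text. "
              "Find it and memorize it.\n\n"
              + "\n".join(lines)
              + "\n\nWhat is the pass key?")
    return prompt, pass_key


model_id = "amazon/MistralLite"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt, pass_key = make_passkey_prompt(n_garbage=400)  # more lines = longer input
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=16, do_sample=False)
answer = tokenizer.decode(output[0, inputs.input_ids.shape[1]:],
                          skip_special_tokens=True)
print(f"expected {pass_key!r}, got {answer!r}, correct={pass_key in answer}")
```

Sweeping `n_garbage` upward reproduces the input-length axis of the pass-key table; the score at each length is the fraction of random placements for which the generated text contains the pass key.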