namespace-Pt committed on
Commit 611e3ef
1 Parent(s): 798986a

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -34,12 +34,12 @@ We evaluate the model on [LongBench](https://arxiv.org/abs/2308.14508) using 32K
  ## InfiniteBench
  We evaluate the model on [InfiniteBench](https://arxiv.org/pdf/2402.13718.pdf) using 80K context length and the official prompt template. The results of GPT-4 is copied from the [paper](https://arxiv.org/pdf/2402.13718.pdf). For [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), we use 8K context length.
 
- |Model|LongBookQA Eng|LongBookSum Eng|
- |:-:|:-:|:-:|
- |GPT-4|22.22|14.73|
- |[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|7.00|**16.40**|
- |[gradientai/Llama-3-8B-Instruct-262k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|20.30|10.34|
- |[Llama-3-8B-Instruct-80K-QLoRA]()|**30.92**|14.73|
+ |Model|LongBookQA Eng|LongBookSum Eng|KV Retrieval|
+ |:-:|:-:|:-:|:-:|
+ |GPT-4|22.22|14.73|**89.00**|
+ |[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|7.00|**16.40**|5.60|
+ |[gradientai/Llama-3-8B-Instruct-262k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|20.30|10.34|6.40|
+ |[Llama-3-8B-Instruct-80K-QLoRA]()|**30.92**|14.73|51.20|
 
  ## Topic Retrieval
  We evaluate the model on [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/) task with `[5,10,15,20,25,30,40,50,60,70]` topics.