namespace-Pt committed
Commit 1e36ec7
Parent(s): cc68800
Upload folder using huggingface_hub
README.md CHANGED
@@ -6,9 +6,10 @@ pipeline_tag: text-generation
 <div align="center">
 <h1>Llama-3-8B-Instruct-80K-QLoRA</h1>
 
-
+<a href="https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon/new/docs/llama3-8b-instruct-qlora-80k.md">[Data&Code]</a>
 </div>
 
+We extend the context length of Llama-3-8B-Instruct to 80K using QLoRA and 3.5K long-context training examples synthesized by GPT-4. The entire training cycle is highly efficient, taking 8 hours on an 8xA800 (80G) machine. Yet the resulting model achieves remarkable performance on a series of downstream long-context evaluation benchmarks.
 
 
 # Evaluation
@@ -43,17 +44,17 @@ We evaluate the model on [InfiniteBench](https://arxiv.org/pdf/2402.13718.pdf) u
 ## Topic Retrieval
 We evaluate the model on the [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/) task with `[5,10,15,20,25,30,40,50,60,70]` topics.
 
-<img src="data/
+<img src="data/topic_retrieval.png"></img>
 
 
 ## MMLU
 We evaluate the model's zero-shot performance on the MMLU benchmark as a reflection of its short-context capability.
 
-|Model
-
-|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
-|[gradientai/Llama-3-8B-Instruct-262k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)
-|[Llama-3-8B-Instruct-80K-QLoRA]()
+|Model|STEM|Social Sciences|Humanities|Others|Avg|
+|:-:|:-:|:-:|:-:|:-:|:-:|
+|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|0.5387|0.7566|0.6944|0.6975|0.6591|
+|[gradientai/Llama-3-8B-Instruct-262k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|0.5210|0.7326|0.6715|0.6980|0.6434|
+|[Llama-3-8B-Instruct-80K-QLoRA]()|0.5310|0.7324|0.6732|0.6879|0.6444|
 
 # Environment
 ```bash
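
For readers who want to try the model this commit documents, here is a minimal usage sketch. It is not part of the diff: the adapter repo id, the 4-bit settings, the `rope_theta` value, and the idea that the 80K window comes from an enlarged RoPE base are assumptions for illustration, not statements from the README.

```python
# Hypothetical sketch: load the Llama-3 base model in 4-bit (as in QLoRA)
# and attach this repo's LoRA weights. Repo id, rope_theta, and the context
# size below are assumptions, not values confirmed by the commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA"  # assumed adapter location

# 4-bit NF4 quantization, mirroring a typical QLoRA setup.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=quant_config,
    device_map="auto",
    rope_theta=200e6,               # assumption: enlarged RoPE base for the 80K window
    max_position_embeddings=81920,  # assumption: declare the 80K context in the config
)
model = PeftModel.from_pretrained(model, adapter_id)  # apply the 80K LoRA adapter

prompt = "Summarize the following document:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```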
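The MMLU rows added in the second hunk report per-category accuracies plus an overall average. The commit does not say which evaluation code produced them; as one plausible way to reproduce a zero-shot MMLU score, the sketch below uses EleutherAI's lm-evaluation-harness (`pip install lm-eval`), with every flag an assumption rather than the authors' recipe.

```python
# Hypothetical reproduction sketch with EleutherAI's lm-evaluation-harness.
# The harness behind the table in the diff is not stated in the commit.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Meta-Llama-3-8B-Instruct,dtype=bfloat16",
    tasks=["mmlu"],   # grouped into STEM / social sciences / humanities / other
    num_fewshot=0,    # zero-shot, matching the README's setting
    batch_size=8,
)

# Print the per-group and overall accuracies that correspond to the table columns.
for name, metrics in sorted(results["results"].items()):
    if "acc,none" in metrics:
        print(f"{name}: {metrics['acc,none']:.4f}")
```

Note that the Avg column is not the simple mean of the four category scores (for the base model that mean would be about 0.672, versus the reported 0.6591), so the average is presumably weighted by the number of questions per category.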