namespace-Pt committed
Commit 0a717d4 · 1 parent: 0851965
Upload folder using huggingface_hub

Files changed:
- README.md +5 -4
- data/needle.png +2 -2
- data/topic.png +0 -0
README.md
CHANGED
@@ -6,7 +6,7 @@ pipeline_tag: text-generation
 <div align="center">
 <h1>Llama-3-8B-Instruct-80K-QLoRA</h1>
 
-<a href="https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/
+<a href="https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/">[Data&Code]</a>
 </div>
 
 We extend the context length of Llama-3-8B-Instruct to 80K using QLoRA and 3.5K long-context training samples synthesized with GPT-4. The entire training cycle is highly efficient, taking 8 hours on an 8xA800 (80G) machine. Yet, the resulting model achieves remarkable performance on a series of downstream long-context evaluation benchmarks.
@@ -27,9 +27,9 @@ We evaluate the model on [LongBench](https://arxiv.org/abs/2308.14508) using 32K
 
 |Model|Single-Doc QA|Multi-Doc QA|Summarization|Few-Shot Learning|Synthetic|Code|
 |:-:|:-:|:-:|:-:|:-:|:-:|:-:|
-|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|37.33|36.04|26.83
+|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|37.33|36.04|26.83|**69.56**|37.75|53.24|
 |[gradientai/Llama-3-8B-Instruct-262k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|37.29|31.20|26.18|67.25|44.25|**62.71**|
-|[Llama-3-8B-Instruct-80K-QLoRA]()|**43.57**|**43.07**|**28.93
+|[Llama-3-8B-Instruct-80K-QLoRA]()|**43.57**|**43.07**|**28.93**|69.15|**48.50**|51.95|
 
 ## InfiniteBench
 We evaluate the model on [InfiniteBench](https://arxiv.org/pdf/2402.13718.pdf) using 80K context length and the official prompt template. The results of GPT-4 are copied from the [paper](https://arxiv.org/pdf/2402.13718.pdf). For [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), we use 8K context length.
|
@@ -88,7 +88,6 @@ base_model = AutoModelForCausalLM.from_pretrained(
|
|
88 |
|
89 |
# NOTE: expand rope base
|
90 |
rope_theta=200e6,
|
91 |
-
max_position_embeddings=81920,
|
92 |
)
|
93 |
|
94 |
model = PeftModel.from_pretrained(
|
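A note on this hunk: the README builds the model in two steps, loading the Llama-3 base with an enlarged RoPE base (`rope_theta=200e6`) so that positions far beyond the original 8K window remain usable, then attaching this repo's QLoRA adapter via `PeftModel`. Since the diff only shows fragments, here is a minimal sketch of how the pieces plausibly fit together, assuming the standard transformers/peft APIs; the dtype, device_map, and merge step are illustrative choices, not copied from the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Repo ids: the base model is named in the README; the adapter id
# is assumed to be this model card's own repo.
base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
peft_id = "namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # illustrative dtype choice
    device_map="auto",
    # NOTE: expand rope base, as in the README's snippet
    rope_theta=200e6,
)

# Attach the QLoRA adapter; merging folds the LoRA weights into the
# base model so downstream code sees a plain causal LM.
model = PeftModel.from_pretrained(base_model, peft_id)
model = model.merge_and_unload().eval()
```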
@@ -119,3 +118,5 @@ with torch.no_grad():
 print(f"Answers: {example['answer']}")
 print(f"Prediction: {tokenizer.decode(outputs[0])}")
 ```
+You may observe messages like:
+`This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (8192). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.` or `Setting pad_token_id to eos_token_id:128001 for open-end generation`. They are harmless; just ignore them.
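Regarding the two messages the added README lines describe: the length "friendly reminder" plausibly appears because this commit also drops the `max_position_embeddings` override, so the config still advertises Llama-3's default 8192 even though the enlarged RoPE base handles far longer inputs; the `pad_token_id` notice is just transformers choosing a pad token for open-ended generation. If you want to silence the latter explicitly, a hedged sketch using standard `generate()` parameters (`inputs`, `model`, and `tokenizer` as in the README's example; the token budget is illustrative):

```python
import torch

# Passing pad_token_id explicitly suppresses the
# "Setting pad_token_id to eos_token_id" notice; the length reminder
# is purely informational and safe to ignore for this model.
with torch.no_grad():
    outputs = model.generate(
        **inputs,                      # tokenized long-context prompt
        max_new_tokens=50,             # illustrative budget
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```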
data/needle.png
CHANGED (Git LFS)

data/topic.png
ADDED