Update README.md
README.md CHANGED
@@ -31,15 +31,8 @@ This model employs [Partial NTK Rope Scaling](https://github.com/jquesnelle/scal
Each method will require replacing the `LlamaEmbedding` with `LlamaPartNTKScaledRotaryEmbedding`, with `max_position_embeddings=16384`. A monkeypatch can be found here.
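
As a rough illustration of what such a monkeypatch does (this is a sketch, not the linked patch itself; the import path and constructor arguments of `LlamaPartNTKScaledRotaryEmbedding` are assumptions, and the helper name is made up for the example):

```python
# Sketch only: swap the rotary embedding on every Llama attention module for a
# part-NTK scaled version built with max_position_embeddings=16384.
import torch
from transformers import LlamaForCausalLM
from transformers.models.llama.modeling_llama import LlamaAttention

# Assumed import; in practice copy the class (or the ready-made patch function)
# from the scaled-rope repo / monkeypatch linked above.
from scaled_rope.LlamaPartNTKScaledRotaryEmbedding import LlamaPartNTKScaledRotaryEmbedding


def apply_part_ntk_rope(model, max_position_embeddings=16384):
    """Illustrative helper: replace each attention layer's rotary embedding."""
    for module in model.modules():
        if isinstance(module, LlamaAttention):
            # The real class may take additional arguments (base, scaling factors, etc.).
            module.rotary_emb = LlamaPartNTKScaledRotaryEmbedding(
                module.head_dim,
                max_position_embeddings=max_position_embeddings,
                device=next(model.parameters()).device,
            )
    return model


model = LlamaForCausalLM.from_pretrained("<path-to-this-model>", torch_dtype=torch.float16)
model = apply_part_ntk_rope(model, max_position_embeddings=16384)
```

Whatever form the patch takes, apply it before running inference so that the scaled position embeddings are the ones actually used.
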

- The easiest way is to use [oobabooga text-generation-webui](https://github.com/oobabooga/text-generation-webui) with ExLlama. You'll need to set `max_seq_len` to 8192 and `compress_pos_emb` to 4.

- If you wish to use AutoGPTQ/GPTQ-for-Llama instead, you'll need to patch in the appropriate RoPE scaling module; see [replace_llama_rope_with_scaled_rope](https://github.com/bhenrym14/qlora-airoboros-longcontext/blob/main/scaledllama/llama_rope_scaled_monkey_patch.py).
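
As a hedged sketch of that flow (the import assumes you have copied the linked `llama_rope_scaled_monkey_patch.py` next to your script; the patch function's exact arguments are not shown in this README, so check the linked file):

```python
# Illustrative only: apply the RoPE scaling patch before the model is instantiated,
# then load the quantized weights with AutoGPTQ as usual.
from llama_rope_scaled_monkey_patch import replace_llama_rope_with_scaled_rope
from auto_gptq import AutoGPTQForCausalLM

# Patch transformers' Llama modules to use the scaled rotary embedding.
# The real function may take arguments (e.g. a scaling factor); see the linked file.
replace_llama_rope_with_scaled_rope()

model = AutoGPTQForCausalLM.from_quantized(
    "<path-to-quantized-model>",  # placeholder
    device="cuda:0",
    use_safetensors=True,
)
```
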
## Motivation

+ Methods of extending the useful context window of LLMs have gained significant traction. Several methods requiring little to no finetuning/retraining have emerged, among them linear position interpolation ([kaiokendev](https://kaiokendev.github.io/til#extending-context-to-8k), [Meta AI](https://arxiv.org/abs/2306.15595)) and NTK-aware scaling. My prior experiments demonstrate significant performance improvements both from finetuning with these scaling adjustments implemented **and** from finetuning with longer sequences.
## Relative Performance (perplexity)
| Model | Context (tokens) | Perplexity |