Update README.md
README.md
CHANGED
@@ -5,17 +5,7 @@ model_creator: astronomer-io
model_name: Meta-Llama-3-8B-Instruct
model_type: llama
pipeline_tag: text-generation
-prompt_template:
-{% set loop_messages = messages %}{% for message in loop_messages %}{% set
-content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>
-
-
-'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set
-content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if
-add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>
-
-
-' }}{% endif %}
+prompt_template: "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"
quantized_by: davidxmle
license: other
license_name: llama-3-community-license
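
Not part of the diff itself: as a rough illustration of what the consolidated single-line `prompt_template` above expands to, the sketch below renders it with Jinja2. The sample messages and the `<|begin_of_text|>` BOS value are assumptions, not something this commit specifies.

```python
# Illustrative sketch only: render the chat template above to see the prompt it builds.
from jinja2 import Template

chat_template = (
    "{% set loop_messages = messages %}{% for message in loop_messages %}"
    "{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'"
    "+ message['content'] | trim + '<|eot_id|>' %}"
    "{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}"
    "{{ content }}{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Apache Airflow?"},
]

prompt = Template(chat_template).render(
    messages=messages,
    bos_token="<|begin_of_text|>",  # assumed Llama 3 BOS token
    add_generation_prompt=True,     # appends the assistant header so the model starts replying
)
print(prompt)
```

If the same string is stored as the tokenizer's `chat_template`, `tokenizer.apply_chat_template(...)` produces the identical prompt.
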
@@ -46,14 +36,6 @@ datasets:
<div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">Astronomer is the de facto company for <a href="https://airflow.apache.org/">Apache Airflow</a>, the most trusted open-source framework for data orchestration and MLOps.</p></div>
<hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
<!-- header end -->
-
-# Important Note Regarding a Known Bug in Llama 3
-- Two files are modified to address a current issue where Llama 3 models keep generating additional tokens non-stop until hitting the max token limit.
-  - `generation_config.json`'s `eos_token_id` has been modified to add the other EOS token that Llama-3 uses.
-  - `tokenizer_config.json`'s `chat_template` has been modified to only add the start-of-generation token at the end of a prompt if `add_generation_prompt` is selected.
-- For loading this model onto vLLM, make sure all requests include `"stop_token_ids":[128001, 128009]` to temporarily address the non-stop generation issue.
-  - vLLM does not yet respect `generation_config.json`.
-  - The vLLM team is working on a fix: https://github.com/vllm-project/vllm/issues/4180

# Llama-3-8B-Instruct-GPTQ-8-Bit
- Original Model creator: [Meta Llama from Meta](https://huggingface.co/meta-llama)
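
Not part of the diff: the removed note above refers to the patched `generation_config.json`. A quick way to confirm the shipped config carries both Llama 3 end-of-sequence ids is sketched below; the repo id is an assumption.

```python
# Illustrative check only: the patched generation_config.json should list both
# Llama 3 stop ids, 128001 (<|end_of_text|>) and 128009 (<|eot_id|>).
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit")  # assumed repo id
print(gen_cfg.eos_token_id)  # expected to include [128001, 128009]
```
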
@@ -61,6 +43,15 @@ datasets:
- Built with Meta Llama 3
- Quantized by [Astronomer](https://astronomer.io)

+# Important Note About Serving with vLLM & oobabooga/text-generation-webui
+- For loading this model onto vLLM, make sure all requests include `"stop_token_ids":[128001, 128009]` to temporarily address the non-stop generation issue.
+  - vLLM does not yet respect `generation_config.json`.
+  - The vLLM team is working on a fix: https://github.com/vllm-project/vllm/issues/4180
+- For oobabooga/text-generation-webui:
+  - Load the model via AutoGPTQ with `no_inject_fused_attention` enabled; this works around a bug in the AutoGPTQ library.
+  - Under `Parameters` -> `Generation` -> `Skip special tokens`: turn this off (deselect).
+  - Under `Parameters` -> `Generation` -> `Custom stopping strings`: add `"<|end_of_text|>","<|eot_id|>"` to the field.
+
<!-- description start -->
## Description
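
Not part of the diff: a minimal offline-vLLM sketch of the `stop_token_ids` guidance added above. The repo id, prompt, and sampling settings are assumptions.

```python
# Illustrative sketch only: pass both Llama 3 stop ids explicitly, since vLLM
# currently ignores the eos_token_id list in generation_config.json.
from vllm import LLM, SamplingParams

llm = LLM(model="astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit", quantization="gptq")  # assumed repo id

prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "What is Apache Airflow?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

params = SamplingParams(
    temperature=0.7,
    max_tokens=256,
    stop_token_ids=[128001, 128009],  # <|end_of_text|> and <|eot_id|>
)

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

When serving through vLLM's OpenAI-compatible endpoint instead, the same two ids can be supplied per request via the `stop_token_ids` extra parameter, which is the "all requests" workaround the note describes.
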
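
Not part of the diff: for the oobabooga/AutoGPTQ notes, a minimal loading sketch with fused-attention injection disabled. The repo id and prompt are assumptions.

```python
# Illustrative sketch only: load via AutoGPTQ with fused attention injection disabled,
# mirroring the webui's `no_inject_fused_attention` workaround described above.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=True,
    inject_fused_attention=False,  # the flag behind the webui's no_inject_fused_attention option
)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is Apache Airflow?"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda:0")  # template already adds BOS
output = model.generate(**inputs, max_new_tokens=256, eos_token_id=[128001, 128009])
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
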