jklj077 committed
Commit d167268
1 Parent(s): 347606f

Update README.md

Files changed (1)
  1. README.md +20 -6
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  license: other
- license_name: tongyi-qianwen
+ license_name: qwen
  license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
  language:
  - en
@@ -61,7 +61,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_name)
 
  prompt = "Give me a short introduction to large language model."
  messages = [
-     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
      {"role": "user", "content": prompt}
  ]
  text = tokenizer.apply_chat_template(
@@ -84,11 +84,25 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
  ### Processing Long Texts
 
- To handle extensive inputs exceeding 32,768 tokens, we utilize [YARN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.
-
- For deployment, we recommend using vLLM. Please refer to our [Github](https://github.com/QwenLM/Qwen2.5) for usage if you are not familar with vLLM.
+ The current `config.json` is set for context length up to 32,768 tokens.
+ To handle extensive inputs exceeding 32,768 tokens, we utilize [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.
+
+ For supported frameworks, you could add the following to `config.json` to enable YaRN:
+ ```json
+ {
+   ...,
+   "rope_scaling": {
+     "factor": 4.0,
+     "original_max_position_embeddings": 32768,
+     "type": "yarn"
+   }
+ }
+ ```
 
- **Note**: Presently, vLLM only supports static YARN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the `rope_scaling` configuration only when processing long contexts is required.
+ For deployment, we recommend using vLLM.
+ Please refer to our [Documentation](https://qwen.readthedocs.io/en/latest/deployment/vllm.html) for usage if you are not familiar with vLLM.
+ Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**.
+ We advise adding the `rope_scaling` configuration only when processing long contexts is required.
 
  ## Evaluation & Performance
 
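For context, the changed system-prompt line above belongs to the README's standard `transformers` quickstart. The sketch below reconstructs the surrounding snippet; the `max_new_tokens` value and the dtype/device settings are illustrative assumptions, not taken from the diff.

```python
# Sketch of the quickstart the diff fragments come from, using the standard
# transformers generation pattern.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-72B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # assumption: let transformers pick the dtype
    device_map="auto",    # assumption: spread the model across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    # The commit replaces the generic system prompt with the Qwen-specific one.
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
# Render the chat template to a plain string, then tokenize it.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```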
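The `rope_scaling` entry added to the README can also be written into `config.json` from a script rather than by hand. A minimal sketch, assuming the checkpoint has already been downloaded to a local directory (the path below is a placeholder):

```python
# Sketch: add the YaRN rope_scaling block shown in the README to a local
# copy of the model's config.json. The directory path is a placeholder.
import json
from pathlib import Path

config_path = Path("/path/to/Qwen2.5-72B-Instruct/config.json")  # placeholder path

config = json.loads(config_path.read_text())
config["rope_scaling"] = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}
config_path.write_text(json.dumps(config, indent=2))
```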
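For the vLLM recommendation, a minimal offline-inference sketch with vLLM's `LLM` API is shown below. The `tensor_parallel_size` and `max_model_len` values are illustrative assumptions; with static YaRN the scaling is read from the modified `config.json` rather than passed as an argument here.

```python
# Sketch of offline inference with vLLM; parameter values are illustrative.
# If the rope_scaling entry has been added to config.json, vLLM applies static
# YaRN; otherwise the model serves its default 32,768-token context.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",
    tensor_parallel_size=4,   # assumption: a 72B model is sharded over several GPUs
    max_model_len=131072,     # assumption: 4.0 x 32768 with YaRN enabled
)

sampling_params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)

# Raw prompt for brevity; in practice, apply the chat template (as in the
# quickstart above) before calling generate on an instruct model.
outputs = llm.generate(["Give me a short introduction to large language model."], sampling_params)
print(outputs[0].outputs[0].text)
```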