wenbopan committed on
Commit 4868f40
1 Parent(s): 8601645

Complete how-to-use

Files changed (1): README.md (+4 -2)
README.md CHANGED
@@ -17,7 +17,8 @@ Faro-Yi-9B-200K is an improved [Yi-9B-200K](https://huggingface.co/01-ai/Yi-9B-2
 
 ## How to Use
 
-Faro-Yi-9B-200K uses chatml template. I recommend using vLLM for long inputs.
+Faro-Yi-9B-200K uses the chatml template and performs well in both short and long contexts. For longer inputs, I recommend using vLLM, which can fit a max prompt of 32K under 24GB of VRAM. Setting `kv_cache_dtype="fp8_e5m2"` allows for a 48K input length. 4-bit AWQ quantization on top of that can boost the input length to 160K, albeit with some performance impact. Adjust the `max_model_len` arg in vLLM or `config.json` to avoid OOM.
+
 
 ```python
 import io
@@ -25,7 +26,7 @@ import requests
 from PyPDF2 import PdfReader
 from vllm import LLM, SamplingParams
 
-llm = LLM(model="wenbopan/Faro-Yi-9B-200K")
+llm = LLM(model="wenbopan/Faro-Yi-9B-200K", kv_cache_dtype="fp8_e5m2", max_model_len=100000)
 
 pdf_data = io.BytesIO(requests.get("https://arxiv.org/pdf/2303.08774.pdf").content)
 document = "".join(page.extract_text() for page in PdfReader(pdf_data).pages) # 100 pages
@@ -39,6 +40,7 @@ print(output[0].outputs[0].text)
 # Faro-Yi-9B-200K: GPT-4 does not have a publicly disclosed parameter count due to the competitive landscape and safety implications of large-scale models like GPT-4. ...
 ```
 
+
 <details> <summary>Or With Transformers</summary>
 
 ```python
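For context on the "chatml template" the updated README mentions: the authoritative template ships in the model's tokenizer config, so `tokenizer.apply_chat_template` is the safe route, but a minimal sketch of the standard chatml layout (the `to_chatml` helper below is hypothetical, not part of this repo) looks like this:

```python
def to_chatml(messages):
    """Render a list of {"role", "content"} dicts in standard chatml form.

    Each turn is wrapped as <|im_start|>role\ncontent<|im_end|>, and a
    trailing <|im_start|>assistant\n asks the model to generate a reply.
    This is a sketch of the common chatml convention, not the tokenizer's
    own template.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)


prompt = to_chatml([{"role": "user", "content": "What is the parameter count of GPT-4?"}])
print(prompt)
```

In practice you would pass such a rendered prompt (or the tokenizer-templated equivalent) to `llm.generate` along with `SamplingParams`.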
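The added README text claims `kv_cache_dtype="fp8_e5m2"` stretches the usable context from 32K to 48K under the same VRAM budget. A back-of-the-envelope sketch of why an 8-bit KV cache helps, using illustrative GQA dimensions (48 layers, 4 KV heads, head dim 128 — assumed values, not read from the model's `config.json`):

```python
# Rough KV-cache sizing: K and V are each cached per layer, per KV head,
# per head dimension, for every token in the context. Halving the bytes
# per element (fp16 -> fp8) halves the cache, freeing VRAM for more tokens.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 48, 4, 128  # illustrative assumptions

def kv_cache_bytes(num_tokens, bytes_per_elem):
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem * num_tokens

fp16_32k = kv_cache_bytes(32_000, 2)  # fp16/bf16: 2 bytes per element
fp8_32k = kv_cache_bytes(32_000, 1)   # fp8_e5m2: 1 byte per element
print(f"fp16 KV cache @ 32K tokens: {fp16_32k / 2**30:.1f} GiB")
print(f"fp8  KV cache @ 32K tokens: {fp8_32k / 2**30:.1f} GiB")
```

Under these assumptions the fp8 cache is exactly half the fp16 one, which is the mechanism behind the 32K-to-48K jump; 4-bit AWQ then shrinks the weights as well, leaving still more room for cache.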