Complete how-to-use
README.md CHANGED
````diff
@@ -17,7 +17,8 @@ Faro-Yi-9B-200K is an improved [Yi-9B-200K](https://huggingface.co/01-ai/Yi-9B-2
 
 ## How to Use
 
-Faro-Yi-9B-200K uses chatml template. I recommend
+Faro-Yi-9B-200K uses the chatml template and performs well in both short and long contexts. For longer inputs, I recommend using vLLM, which handles a max prompt of 32K under 24GB of VRAM. Setting `kv_cache_dtype="fp8_e5m2"` allows for a 48K input length. Adding 4-bit AWQ quantization on top of that can boost the input length to 160K, albeit with some performance impact. Adjust the `max_model_len` arg in vLLM or `config.json` to avoid OOM.
+
 
 ```python
 import io
@@ -25,7 +26,7 @@ import requests
 from PyPDF2 import PdfReader
 from vllm import LLM, SamplingParams
 
-llm = LLM(model="wenbopan/Faro-Yi-9B-200K")
+llm = LLM(model="wenbopan/Faro-Yi-9B-200K", kv_cache_dtype="fp8_e5m2", max_model_len=100000)
 
 pdf_data = io.BytesIO(requests.get("https://arxiv.org/pdf/2303.08774.pdf").content)
 document = "".join(page.extract_text() for page in PdfReader(pdf_data).pages) # 100 pages
@@ -39,6 +40,7 @@ print(output[0].outputs[0].text)
 # Faro-Yi-9B-200K: GPT-4 does not have a publicly disclosed parameter count due to the competitive landscape and safety implications of large-scale models like GPT-4. ...
 ```
 
+
 <details> <summary>Or With Transformers</summary>
 
 ```python
````
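The chatml template referenced in the updated paragraph wraps each conversation turn in `<|im_start|>`/`<|im_end|>` markers. A minimal sketch of that prompt format (my own illustration for context, not code from this commit — in practice a tokenizer's built-in chat template would produce this string):

```python
def format_chatml(messages):
    """Render a list of {"role": ..., "content": ...} dicts as a chatml prompt."""
    # Each turn becomes: <|im_start|>{role}\n{content}<|im_end|>
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    # Trailing assistant header cues the model to generate its reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How many parameters does GPT-4 have?"},
])
print(prompt)
```

The string built this way is what the vLLM example above ultimately feeds the model after applying the chat template to the question about the PDF.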