Update how to load the model with model_basename
README.md CHANGED
@@ -20,13 +20,18 @@ pip install auto-gptq
 You can then download the model from the hub using the following code:
 
 ```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
 from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
-from transformers import AutoTokenizer
 
-
-
-
-
+model_name = "mlabonne/gpt2-GPTQ-4bit"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+quantize_config = BaseQuantizeConfig.from_pretrained(model_name)
+model = AutoGPTQForCausalLM.from_quantized(model_name,
+                                           model_basename="gptq_model-4bit-128g",
+                                           device="cuda:0",
+                                           use_triton=True,
+                                           use_safetensors=True,
+                                           quantize_config=quantize_config)
 ```
 
 This model works with the traditional [Text Generation pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TextGenerationPipeline).
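For context, `model_basename` gives AutoGPTQ the filename (without extension) of the quantized weight file, so with `use_safetensors=True` the call above looks for `gptq_model-4bit-128g.safetensors` in the repository. Below is a minimal sketch of running the loaded model through the Text Generation pipeline, assuming an auto-gptq build with Triton support and a CUDA device; the prompt and generation length are illustrative, not part of the original README.

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer, TextGenerationPipeline

model_name = "mlabonne/gpt2-GPTQ-4bit"

# Load the tokenizer and the quantized model as in the updated README.
tokenizer = AutoTokenizer.from_pretrained(model_name)
quantize_config = BaseQuantizeConfig.from_pretrained(model_name)
model = AutoGPTQForCausalLM.from_quantized(model_name,
                                           model_basename="gptq_model-4bit-128g",
                                           device="cuda:0",
                                           use_triton=True,
                                           use_safetensors=True,
                                           quantize_config=quantize_config)

# Wrap the quantized model in a standard text-generation pipeline.
# The prompt and max_new_tokens value are placeholders.
generator = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(generator("Deep learning is", max_new_tokens=32)[0]["generated_text"])
```

Note that recent auto-gptq releases can also infer the quantization settings from the repository's `quantize_config.json`, so passing `quantize_config` explicitly here is a belt-and-braces choice rather than a hard requirement.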