Update how to load the model with model_basename
README.md CHANGED
@@ -20,13 +20,18 @@ pip install auto-gptq
 You can then download the model from the hub using the following code:
 
 ```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
 from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
-from transformers import AutoTokenizer
 
-
-
-
-
+model_name = "mlabonne/gpt2-GPTQ-4bit"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+quantize_config = BaseQuantizeConfig.from_pretrained(model_name)
+model = AutoGPTQForCausalLM.from_quantized(model_name,
+                                           model_basename="gptq_model-4bit-128g",
+                                           device="cuda:0",
+                                           use_triton=True,
+                                           use_safetensors=True,
+                                           quantize_config=quantize_config)
 ```
 
 This model works with the traditional [Text Generation pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TextGenerationPipeline).
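For context, `model_basename` gives AutoGPTQ the filename (without extension) of the quantized weight file, so with `use_safetensors=True` the call above looks for `gptq_model-4bit-128g.safetensors` in the repository. Below is a minimal sketch of running the loaded model through the Text Generation pipeline, assuming an auto-gptq build with Triton support and a CUDA device; the prompt and generation length are illustrative, not part of the original README.

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer, TextGenerationPipeline

model_name = "mlabonne/gpt2-GPTQ-4bit"

# Load the tokenizer and the quantized model as in the updated README.
tokenizer = AutoTokenizer.from_pretrained(model_name)
quantize_config = BaseQuantizeConfig.from_pretrained(model_name)
model = AutoGPTQForCausalLM.from_quantized(model_name,
                                           model_basename="gptq_model-4bit-128g",
                                           device="cuda:0",
                                           use_triton=True,
                                           use_safetensors=True,
                                           quantize_config=quantize_config)

# Wrap the quantized model in a standard text-generation pipeline.
# The prompt and max_new_tokens value are placeholders.
generator = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(generator("Deep learning is", max_new_tokens=32)[0]["generated_text"])
```

Note that recent auto-gptq releases can also infer the quantization settings from the repository's `quantize_config.json`, so passing `quantize_config` explicitly here is a belt-and-braces choice rather than a hard requirement.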