hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision

Xenova (HF staff) committed on Jul 23

Commit c1ae4ba • 1 Parent(s): f71788f

Only print generated text

Files changed (1)

README.md +2 -2

@@ -69,7 +69,7 @@ inputs = tokenizer.apply_chat_template(
 ).to("cuda")
 outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
-print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
+print(tokenizer.batch_decode(outputs[:, inputs['input_ids'].shape[1]:], skip_special_tokens=True)[0])
 ```
 ### AutoAWQ
@@ -109,7 +109,7 @@ inputs = tokenizer.apply_chat_template(
 ).to("cuda")
 outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
-print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
+print(tokenizer.batch_decode(outputs[:, inputs['input_ids'].shape[1]:], skip_special_tokens=True)[0])
 ```
 The AutoAWQ script has been adapted from [`AutoAWQ/examples/generate.py`](https://github.com/casper-hansen/AutoAWQ/blob/main/examples/generate.py).
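
For a decoder-only model, `generate` returns each sequence as the prompt tokens followed by the newly generated tokens, which is why the new `print` line slices past the prompt length before decoding. A minimal sketch of that slicing, using plain Python lists in place of tensors; the token ids and the helper name `strip_prompt` are hypothetical and chosen only for illustration:

```python
def strip_prompt(outputs, prompt_len):
    """Drop the first `prompt_len` tokens from every sequence in the batch."""
    return [row[prompt_len:] for row in outputs]


# Made-up token ids: a 4-token prompt followed by 3 newly generated tokens.
prompt_ids = [101, 7592, 2088, 102]
new_ids = [2023, 2003, 999]

# Mimics the shape of a generate() result for a batch of one: prompt + new tokens.
outputs = [prompt_ids + new_ids]

generated = strip_prompt(outputs, prompt_len=len(prompt_ids))
print(generated[0])  # only the newly generated ids remain
```

The list comprehension plays the role of the tensor expression `outputs[:, inputs['input_ids'].shape[1]:]` in the README. The trailing `[0]` in the README line then picks the single decoded string out of the list that `batch_decode` returns, so a string is printed rather than a one-element list.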