shenzhi-wang committed
Commit 7af785d • 1 Parent(s): 9dfec53
Update README.md
README.md CHANGED
@@ -11,7 +11,6 @@ tags:
   - orpo
 ---
 
-❗️❗️❗️We are still uploading the GGUF file. Due to the network problem and the large size of this GGUF file, it may take some time. We are really sorry for that. If you want to use our q4 GGUF, you can use the [ollama q4 model](https://ollama.com/wangshenzhi/llama3-70b-chinese-chat-ollama-q4) first.
 
 🔥 This repo contains the official q4_0 GGUF files for [shenzhi-wang/Llama3-70B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-70B-Chinese-Chat).
 
@@ -82,32 +81,35 @@ C-Eval Hard is a distinct benchmark that comprises 8 difficult subjects in math,
 # 3. Usage
 
 ```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
+from llama_cpp import Llama
 
-model_id = "shenzhi-wang/Llama3-70B-Chinese-Chat"
-
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(
-    model_id, torch_dtype="auto", device_map="auto"
+model = Llama(
+    "/Your/Path/To/GGUF/File",
+    verbose=False,
+    n_gpu_layers=-1,
 )
 
+system_prompt = "You are a helpful assistant."
+
+def generate_reponse(_model, _messages, _max_tokens=8192):
+    _output = _model.create_chat_completion(
+        _messages,
+        stop=["<|eot_id|>", "<|end_of_text|>"],
+        max_tokens=_max_tokens,
+    )["choices"][0]["message"]["content"]
+    return _output
+
+# The following are some examples
+
 messages = [
+    {
+        "role": "system",
+        "content": system_prompt,
+    },
     {"role": "user", "content": "写一首诗吧"},
 ]
 
-input_ids = tokenizer.apply_chat_template(
-    messages, add_generation_prompt=True, return_tensors="pt"
-).to(model.device)
-
-outputs = model.generate(
-    input_ids,
-    max_new_tokens=8192,
-    do_sample=True,
-    temperature=0.6,
-    top_p=0.9,
-)
-response = outputs[0][input_ids.shape[-1]:]
-print(tokenizer.decode(response, skip_special_tokens=True))
+print(generate_reponse(model, messages))
 ```
 
 # 4. Examples
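For reference, the `generate_reponse` helper added in this commit returns a plain string, so it can also drive a multi-turn chat by appending each reply back onto `messages`. The sketch below is illustrative only: it assumes `model`, `system_prompt`, and `generate_reponse` are already defined exactly as in the new README snippet, and the follow-up prompt is made up for the example.

```python
# Multi-turn usage sketch built on the snippet added in this commit.
# Assumes `model`, `system_prompt`, and `generate_reponse` from that code.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "写一首诗吧"},  # same first prompt as the README ("write a poem")
]

# First reply from the q4_0 GGUF model.
first_reply = generate_reponse(model, messages)
print(first_reply)

# Feed the assistant's reply back in and ask a follow-up in the same conversation.
messages.append({"role": "assistant", "content": first_reply})
messages.append({"role": "user", "content": "Now explain the poem in English."})
print(generate_reponse(model, messages))
```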