rojasdiego committed
Commit 5595090
1 Parent(s): 1ab8d81

Update README.md

Files changed (1)
  1. README.md +19 -37
README.md CHANGED
@@ -33,45 +33,27 @@ pip install peft transformers jinja2==3.1.0
 Here’s a sample code snippet to load and interact with the model:
 
 ```python
+import transformers
 import torch
-from peft import PeftModel
-from transformers import AutoModelForCausalLM, AutoTokenizer
 
-# Load the base model and tokenizer
-model = AutoModelForCausalLM.from_pretrained(
-    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
+model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
+
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model_id,
+    model_kwargs={"torch_dtype": torch.bfloat16},
+    device_map="auto",
 )
-tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
-
-# Load the fine-tuned model using LORA
-model = PeftModel.from_pretrained(
-    model,
-    "rojas-diego/Meta-Llama-3.1-8B-Instruct-Apple-MLX",
-).to("cuda")
-
-# Define input using a chat template with a system prompt and user query
-ids = tokenizer.apply_chat_template(
-    [
-        {
-            "role": "system",
-            "content": "You are a helpful AI coding assistant with expert knowledge of Apple's latest machine learning framework: MLX. You can help answer questions about MLX, provide code snippets, and help debug code.",
-        },
-        {
-            "role": "user",
-            "content": "How do you transpose a matrix in MLX?",
-        },
-    ],
-    tokenize=True,
-    add_generation_prompt=True,
-    return_tensors="pt",
-).to("cuda")
-
-# Generate and print the response
-print(
-    tokenizer.decode(
-        model.generate(input_ids=ids, max_new_tokens=256, temperature=0.5).tolist()[0][
-            len(ids) :
-        ]
-    )
+
+messages = [
+    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+    {"role": "user", "content": "Who are you?"},
+]
+
+outputs = pipeline(
+    messages,
+    max_new_tokens=256,
 )
+print(outputs[0]["generated_text"][-1])
+
 ```
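
The updated snippet drives the base model through `transformers.pipeline`, while the removed code attached the fine-tuned LoRA adapter with PEFT. For readers who want both, here is a minimal sketch combining the two: load the adapter onto the base model first, then serve it through the pipeline API. The adapter repo ID comes from the removed snippet; pipeline support for PEFT models and for chat-style message lists depends on the installed `transformers` and `peft` versions, so treat this as an illustration, not part of the commit.

```python
import torch
import transformers
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "rojas-diego/Meta-Llama-3.1-8B-Instruct-Apple-MLX"  # from the removed snippet

# Load the base model, then attach the fine-tuned LoRA weights with PEFT.
base = transformers.AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
tokenizer = transformers.AutoTokenizer.from_pretrained(base_id)

# Hand the adapted model to the pipeline; recent transformers releases apply
# the chat template automatically when a list of messages is passed.
pipeline = transformers.pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": "How do you transpose a matrix in MLX?"},
]
outputs = pipeline(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])
```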