Llama-3.1 8B - OpenMathInstruct-2
This model is a fine-tuned version of Llama-3.1 8B designed specifically for solving mathematical problems. Trained on the OpenMathInstruct-2 dataset, it generates accurate solutions to math problems posed as instruction-style prompts.
Model Description
The Llama-3.1 8B model has been fine-tuned on the OpenMathInstruct-2 dataset, which improves its ability to interpret and solve mathematical problems. The model is particularly adept at following instructions and producing appropriate solutions.
Usage
Installation
To use this model, ensure you have the required libraries installed:
pip install torch transformers unsloth
Loading the Model
You can load the model as follows:
from unsloth import FastLanguageModel

model_name = "shivvamm/llama-3.18B-OpenMathInstruct-2"
# from_pretrained returns the model together with its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name,
    load_in_4bit=True,  # 4-bit quantization, as noted under Benefits
)
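Unsloth also exposes a one-line switch that enables its faster generation path; Unsloth's examples call it once after loading, before any call to generate:

# Enable Unsloth's optimized inference mode (call once before generating)
FastLanguageModel.for_inference(model)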
Inference
Normal Inference
For standard inference, you can use the following code snippet:
input_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Continue the Fibonacci sequence.
### Input:
1, 1, 2, 3, 5, 8
### Response:
"""
inputs = tokenizer(input_prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0])  # batch_decode returns a list of strings; take the first
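The decoded string contains the full prompt as well as the completion. Assuming the Alpaca-style template above, one simple way to isolate just the generated answer is to split on the response marker:

# Keep only the text generated after the "### Response:" marker
answer = response[0].split("### Response:")[-1].strip()
print(answer)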
Streaming Inference
For a more interactive experience, you can use streaming inference, which outputs tokens as they are generated:
from transformers import TextStreamer
input_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Continue the Fibonacci sequence.
### Input:
1, 1, 2, 3, 5, 8
### Response:
"""
inputs = tokenizer(input_prompt, return_tensors="pt").to("cuda")
text_streamer = TextStreamer(tokenizer)
model.generate(**inputs, streamer=text_streamer, max_new_tokens=1000)
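By default, TextStreamer re-prints the prompt before streaming newly generated tokens. It accepts a skip_prompt flag and forwards extra keyword arguments to the tokenizer's decode call, so streaming only the model's answer looks like this:

# Stream only new tokens, skipping the echoed prompt and special tokens
text_streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, streamer=text_streamer, max_new_tokens=1000)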
Benefits
- Fast Inference: Served through Unsloth's optimized runtime, the model generates responses quickly.
- High Accuracy: Fine-tuning on mathematical instruction data sharpens the model's problem-solving ability.
- Low Memory Usage: 4-bit quantization lets the model run on lower-end GPUs without exhausting memory (see the sketch below).
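To verify the memory footprint on your own hardware, you can read PyTorch's allocation counter after loading the model with load_in_4bit=True as shown above; this minimal sketch assumes a CUDA device:

import torch

# Rough measure of GPU memory held by the loaded model (assumes CUDA)
print(f"Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")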
License
This model is licensed under the MIT License. See the LICENSE file for more information.