Llama-3.1 8B - OpenMathInstruct-2
This model is a fine-tuned version of Llama-3.1 8B designed specifically for solving mathematical problems. Trained on the OpenMathInstruct-2 dataset, it generates accurate solutions to math problems posed as instruction-style prompts.
Model Description
The Llama-3.1 8B model has been fine-tuned on the OpenMathInstruct-2 dataset, which improves its ability to interpret and solve mathematical problems. The model is particularly adept at following instructions and producing appropriate solutions.
Usage
Installation
To use this model, ensure you have the required libraries installed:
pip install torch transformers unsloth
Loading the Model
You can load the model as follows:
from unsloth import FastLanguageModel

model_name = "shivvamm/llama-3.18B-OpenMathInstruct-2"
# from_pretrained returns the model together with its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name,
    load_in_4bit=True,  # 4-bit quantization, as noted under Benefits
)
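Unsloth also exposes a one-line switch that enables its faster generation path; Unsloth's examples call it once after loading, before any call to generate:

# Enable Unsloth's optimized inference mode (call once before generating)
FastLanguageModel.for_inference(model)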
Inference
Normal Inference
For standard inference, you can use the following code snippet:
input_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Continue the Fibonacci sequence.
### Input:
1, 1, 2, 3, 5, 8
### Response:
"""
inputs = tokenizer(input_prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0])  # batch_decode returns a list of strings; take the first
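The decoded string contains the full prompt as well as the completion. Assuming the Alpaca-style template above, one simple way to isolate just the generated answer is to split on the response marker:

# Keep only the text generated after the "### Response:" marker
answer = response[0].split("### Response:")[-1].strip()
print(answer)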
Streaming Inference
For a more interactive experience, you can use streaming inference, which outputs tokens as they are generated:
from transformers import TextStreamer
input_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Continue the Fibonacci sequence.
### Input:
1, 1, 2, 3, 5, 8
### Response:
"""
inputs = tokenizer(input_prompt, return_tensors="pt").to("cuda")
text_streamer = TextStreamer(tokenizer)
model.generate(**inputs, streamer=text_streamer, max_new_tokens=1000)
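By default, TextStreamer re-prints the prompt before streaming newly generated tokens. It accepts a skip_prompt flag and forwards extra keyword arguments to the tokenizer's decode call, so streaming only the model's answer looks like this:

# Stream only new tokens, skipping the echoed prompt and special tokens
text_streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, streamer=text_streamer, max_new_tokens=1000)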
Benefits
- Fast Inference: Served through Unsloth's optimized runtime, the model generates responses quickly.
- High Accuracy: Fine-tuning on mathematical instruction data sharpens the model's problem-solving ability.
- Low Memory Usage: 4-bit quantization lets the model run on lower-end GPUs without exhausting memory (see the sketch below).
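To verify the memory footprint on your own hardware, you can read PyTorch's allocation counter after loading the model with load_in_4bit=True as shown above; this minimal sketch assumes a CUDA device:

import torch

# Rough measure of GPU memory held by the loaded model (assumes CUDA)
print(f"Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")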
License
This model is licensed under the MIT License. See the LICENSE file for more information.