
Llama-3.1 8B - OpenMathInstruct-2

This model is a fine-tuned version of Llama-3.1 8B built specifically for solving mathematical problems. Fine-tuned on the OpenMathInstruct-2 dataset, it is designed to produce accurate solutions to math problems posed as instructional prompts.

Table of Contents

  • Model Description
  • Usage
  • Inference
  • Benefits
  • License

Model Description

The Llama-3.1 8B model has been fine-tuned on the OpenMathInstruct-2 dataset, which improves its ability to interpret and solve mathematical problems. It follows instruction-style prompts and returns the corresponding solutions.

Usage

Installation

To use this model, ensure you have the required libraries installed:

pip install torch transformers unsloth
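The inference examples below move tensors to CUDA, so it is worth confirming a GPU is visible before loading the model (a quick sanity check, not part of the original instructions):

import torch

# Should print True on a machine with a working CUDA setup
print(torch.cuda.is_available())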

Loading the Model

You can load the model as follows:

from unsloth import FastLanguageModel

# from_pretrained returns the model and tokenizer together
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="shivvamm/llama-3.18B-OpenMathInstruct-2",
    max_seq_length=2048,  # adjust to your expected prompt length
    load_in_4bit=True,    # 4-bit quantization (see Benefits below)
)
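Unsloth also provides a helper that switches the model into its faster inference mode; calling it once before generation is optional but recommended in Unsloth's own examples:

# Enable Unsloth's optimized inference path before calling generate()
FastLanguageModel.for_inference(model)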

Inference

Normal Inference

For standard inference, you can use the following code snippet:

input_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Continue the Fibonacci sequence.

### Input:
1, 1, 2, 3, 5, 8

### Response:
"""

inputs = tokenizer(input_prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
# batch_decode returns a list of strings, one per sequence in the batch
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0])
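Because generate() returns the prompt tokens followed by the completion, the decoded string echoes the entire template. If you only want the model's answer, you can split on the ### Response: marker from the prompt (a small post-processing sketch):

# Keep only the text generated after the response delimiter
answer = response[0].split("### Response:")[-1].strip()
print(answer)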

Streaming Inference

For a more interactive experience, you can use streaming inference, which outputs tokens as they are generated:

from transformers import TextStreamer

input_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Continue the Fibonacci sequence.

### Input:
1, 1, 2, 3, 5, 8

### Response:
"""

inputs = tokenizer(input_prompt, return_tensors="pt").to("cuda")
text_streamer = TextStreamer(tokenizer)
model.generate(**inputs, streamer=text_streamer, max_new_tokens=1000)
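TextStreamer prints straight to stdout. If you need the tokens programmatically, for example to forward them over a websocket, transformers also provides TextIteratorStreamer, which lets you run generation in a background thread and consume text chunks as they arrive (a sketch under those assumptions):

from threading import Thread
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# generate() blocks, so run it in a thread and iterate over the streamer
thread = Thread(target=model.generate, kwargs={**inputs, "streamer": streamer, "max_new_tokens": 1000})
thread.start()
for chunk in streamer:
    print(chunk, end="", flush=True)
thread.join()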

Benefits

  • Fast Inference: Unsloth's optimized runtime speeds up generation relative to a stock transformers setup.
  • High Accuracy: Fine-tuning on mathematical instruction data improves the model's problem-solving on math prompts.
  • Low Memory Usage: 4-bit quantization lets the model run on lower-end GPUs without exhausting memory (see the memory-check sketch after this list).
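To verify the memory claim on your own hardware, you can measure peak GPU allocation around a generation call using PyTorch's built-in memory statistics (reusing model and inputs from the examples above):

import torch

torch.cuda.reset_peak_memory_stats()
_ = model.generate(**inputs, max_new_tokens=64)
print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")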

License

This model is licensed under the MIT License. See the LICENSE file for more information.
