
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct

Meta-Llama-3.1-8B-Instruct Quantized Model

This repository contains a quantized version of the Meta-Llama-3.1-8B-Instruct model, optimized for efficient inference and deployment. The quantization was performed by the IPROPEL Team at VIT Chennai.

Model Overview

Meta-Llama-3.1-8B-Instruct is an instruction-tuned language model from Meta with 8 billion parameters. It generates human-like text, answers questions, and assists with a wide range of tasks.

Quantization Details

Quantization is a model compression technique that stores the model's weights at lower numerical precision, reducing its size without significantly sacrificing performance. The quantized version of the Meta-Llama-3.1-8B-Instruct model available here offers:

Reduced Memory Usage: Lower RAM and GPU memory consumption.
Faster Inference: Reduces inference time, enabling quicker responses in production environments.
Smaller Model Size: Easier to store and deploy on devices with limited storage.
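
To make the size and memory savings concrete, here is a minimal Python sketch of the idea, not the actual llama.cpp implementation. It assumes a round figure of 8 billion parameters and illustrative bit widths (16, 8, and 4 bits per weight); the quantization type used for this repository is not stated here, and real llama.cpp formats store extra per-block metadata, so actual file sizes will differ. The helper names are hypothetical.

```python
def estimate_weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GiB (ignores metadata)."""
    return n_params * bits_per_weight / 8 / 2**30


def quantize_block(values, bits=4):
    """Toy symmetric quantization: one float scale plus small signed integers."""
    qmax = 2 ** (bits - 1) - 1  # largest integer code, e.g. 7 for 4 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    # Round each value to the nearest integer code and clamp to the valid range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate float values from integer codes and the block scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    n_params = 8e9  # assumption: "8 billion parameters" taken as a round number
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        size = estimate_weight_size_gib(n_params, bits)
        print(f"{label}: ~{size:.1f} GiB for weights")

    block = [0.12, -0.50, 0.33, 0.07]  # made-up example weights
    codes, scale = quantize_block(block, bits=4)
    restored = dequantize_block(codes, scale)
    print("codes:", codes)
    print("restored:", [round(x, 3) for x in restored])
```

The restored values are close to, but not identical to, the originals. That rounding error is the accuracy cost traded for the smaller footprint.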

Key Features

Model Name: Meta-Llama-3.1-8B-Instruct (Quantized)
Tool Used: llama.cpp
Maintained by: IPROPEL Team, VIT Chennai