---
license: llama3.1
language:
- en
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
---
# Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit
- Model creator: [Meta-Llama](https://huggingface.co/meta-llama)
- Original model: [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
## Overview
This repository contains a 4-bit AWQ quantization of **Meta-Llama-3.1-8B-Instruct**, prepared for the LMDeploy TurboMind engine.
Quantizing the weights to 4 bits reduces the model's memory footprint and improves inference throughput while preserving most of the original model's accuracy.
## Model Details
- **Model Name**: Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit
- **Base Model**: meta-llama/Meta-Llama-3.1-8B-Instruct
- **Quantization**: 4-bit AWQ
- **Engine**: LMDeploy TurboMindEngine
## Quantization
The model was quantized with LMDeploy's `lite auto_awq` command:
```bash
lmdeploy lite auto_awq \
$HF_MODEL \
--calib-dataset 'ptb' \
--calib-samples 128 \
--calib-seqlen 2048 \
--w-bits 4 \
--w-group-size 128 \
--batch-size 10 \
--search-scale True \
--work-dir $WORK_DIR
```
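As a usage sketch, the quantized weights can be served through LMDeploy's OpenAI-compatible API server (this assumes `lmdeploy` is installed with TurboMind support and a GPU is available; the repo id below refers to this repository):

```shell
# Launch an OpenAI-compatible API server on the quantized model.
# --model-format awq tells TurboMind the weights are AWQ 4-bit.
lmdeploy serve api_server \
    Aaron2599/Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit \
    --backend turbomind \
    --model-format awq
```

Once the server is up, any OpenAI-compatible client can send chat completions to it.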