---
license: llama3.1
language:
- en
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
---

# Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit

- Model creator: [Meta-Llama](https://huggingface.co/meta-llama)
- Original model: [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)

## Overview

This repository contains a 4-bit AWQ-quantized version of **Meta-Llama-3.1-8B-Instruct**, optimized for the LMDeploy TurboMind engine.

The 4-bit quantization substantially reduces memory footprint and increases inference throughput while preserving most of the base model's accuracy.

## Model Details

- **Model Name**: Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit
- **Base Model**: meta-llama/Meta-Llama-3.1-8B-Instruct
- **Quantization**: 4-bit AWQ
- **Engine**: LMDeploy TurboMind

## Quantization

The model was quantized with LMDeploy's `auto_awq` command, where `$HF_MODEL` points to the original model and `$WORK_DIR` is the output directory for the quantized weights:

```bash
lmdeploy lite auto_awq \
  $HF_MODEL \
  --calib-dataset 'ptb' \
  --calib-samples 128 \
  --calib-seqlen 2048 \
  --w-bits 4 \
  --w-group-size 128 \
  --batch-size 10 \
  --search-scale True \
  --work-dir $WORK_DIR
```
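
## Usage

A minimal sketch of running inference on the quantized weights with LMDeploy's Python pipeline, assuming LMDeploy is installed locally and a GPU is available. The model path below is an assumption: point it at a local copy of this repository or at the `$WORK_DIR` produced by the quantization command above.

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Tell the TurboMind backend the weights are stored in AWQ format.
engine_config = TurbomindEngineConfig(model_format='awq')

# Hypothetical local path; replace with wherever the quantized
# weights actually live on disk.
pipe = pipeline('Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit',
                backend_config=engine_config)

# Run a single prompt through the pipeline and print the response.
print(pipe(['Explain AWQ quantization in one sentence.']))
```

The same weights can also be served over an OpenAI-compatible HTTP API with `lmdeploy serve api_server`.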