metadata
license: llama3.1
language:
  - en
base_model:
  - meta-llama/Meta-Llama-3.1-8B-Instruct

Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit

Overview

This repository contains a 4-bit AWQ (Activation-aware Weight Quantization) version of Meta-Llama-3.1-8B-Instruct, prepared for LMDeploy's TurboMind inference engine. Storing the weights in 4 bits instead of 16 reduces memory use at inference time, at the cost of a small quantization error that AWQ is designed to limit.

Model Details

  • Model Name: Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit
  • Base Model: meta-llama/Meta-Llama-3.1-8B-Instruct
  • Quantization: 4-bit AWQ
  • Engine: LMDeploy TurboMind
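
Quantization

The quantized weights were produced with the following LMDeploy command, where $HF_MODEL is the path or Hub ID of the base model and $WORK_DIR is the output directory:

lmdeploy lite auto_awq \
  $HF_MODEL \
  --calib-dataset 'ptb' \
  --calib-samples 128 \
  --calib-seqlen 2048 \
  --w-bits 4 \
  --w-group-size 128 \
  --batch-size 10 \
  --search-scale True \
  --work-dir $WORK_DIR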
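
To give a rough sense of what `--w-bits 4` and `--w-group-size 128` mean for storage, the sketch below estimates the average number of bits stored per weight. It assumes each group of 128 weights carries one 16-bit scale and one 4-bit zero point, which is a common AWQ layout but is not confirmed for this repository; the function names and the parameter count are hypothetical and purely illustrative.

```python
def effective_bits_per_weight(w_bits: int, group_size: int,
                              scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Average bits stored per weight, including per-group metadata.

    Assumption: one scale (scale_bits) and one zero point (zero_bits)
    are stored for every `group_size` weights.
    """
    return w_bits + (scale_bits + zero_bits) / group_size


def weight_size_gib(num_params: float, bits_per_weight: float) -> float:
    """Storage needed for `num_params` weights, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Values taken from the quantization command above.
bits = effective_bits_per_weight(w_bits=4, group_size=128)

# Illustrative round number for the quantized linear-layer weights,
# not a measured count for this model.
ILLUSTRATIVE_PARAMS = 7e9

quantized_gib = weight_size_gib(ILLUSTRATIVE_PARAMS, bits)
fp16_gib = weight_size_gib(ILLUSTRATIVE_PARAMS, 16)

print(f"effective bits per weight: {bits}")
print(f"quantized: {quantized_gib:.2f} GiB vs fp16: {fp16_gib:.2f} GiB")
```

Under these assumptions the per-group metadata adds only a fraction of a bit per weight, so the quantized linear layers take roughly a quarter of their 16-bit size. Embeddings and any layers left unquantized are not covered by this estimate.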