Edit model card

SparseLlama-2-7b-ultrachat_200k-pruned_70

Model Overview

  • Model Architecture: Llama-2
    • Input: Text
    • Output: Text
  • Model Optimizations:
    • Pruned: 70%
  • Release Date: 6/28/2024
  • Version: 1.0
  • Model Developers: Neural Magic

Compressed version of Llama-2-7b specialized for text-generation. This model was obtained by fine-tuning the Sparse Foundational model Sparse-Llama-2-7b-pruned_70 on the ultrachat_200k dataset. It achieves a win rate of 59.8% on the AlpacaEval benchmark (version 1.0) when using Llama-2-70b-chat as evaluator, whereas the dense Llama-2-7b-ultrachat200k model achieves 57.6% win rate.

This model was produced as part if Neural Magic's Sparse Foundational Models initiative, and demostrates the capability of Sparse Foundational Models to transfer to the text-generation domain.

Note: This model uses the chat template from zephyr-7b-beta.

Model Optimizations

This model is derived from the Sparse Foundational model Sparse-Llama-2-7b-pruned_70, which was obtained by applying the SparseGPT algorithm to prune Llama-2-7b to 70% sparsity. This optimization reduces the number of parameters by 70%, reducing the disk size and FLOPs by the same level.

Evaluation

This model was evaluated in the AlpacaEval benchmark using Llama-2-70b-chat as evaluator.

Accuracy

Model Win rate Recovery
Llama-2-7b 3.7% --
Llama-2-7b-ultrachat200k 57.6% --
SparseLlama-2-7b-ultrachat_200k-pruned_70 59.8% 104%
Downloads last month
12
Safetensors
Model size
6.74B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train nm-testing/SparseLLama-2-7b-ultrachat_200k-pruned_70