SparseLlama-2-7b-ultrachat_200k-pruned_70
Model Overview
- Model Architecture: Llama-2
- Input: Text
- Output: Text
- Model Optimizations:
- Pruned: 70%
- Release Date: 6/28/2024
- Version: 1.0
- Model Developers: Neural Magic
Compressed version of Llama-2-7b specialized for text-generation. This model was obtained by fine-tuning the Sparse Foundational model Sparse-Llama-2-7b-pruned_70 on the ultrachat_200k dataset. It achieves a win rate of 59.8% on the AlpacaEval benchmark (version 1.0) when using Llama-2-70b-chat as evaluator, whereas the dense Llama-2-7b-ultrachat200k model achieves 57.6% win rate.
This model was produced as part if Neural Magic's Sparse Foundational Models initiative, and demostrates the capability of Sparse Foundational Models to transfer to the text-generation domain.
Note: This model uses the chat template from zephyr-7b-beta.
Model Optimizations
This model is derived from the Sparse Foundational model Sparse-Llama-2-7b-pruned_70, which was obtained by applying the SparseGPT algorithm to prune Llama-2-7b to 70% sparsity. This optimization reduces the number of parameters by 70%, reducing the disk size and FLOPs by the same level.
Evaluation
This model was evaluated in the AlpacaEval benchmark using Llama-2-70b-chat as evaluator.
Accuracy
Model | Win rate | Recovery |
---|---|---|
Llama-2-7b | 3.7% | -- |
Llama-2-7b-ultrachat200k | 57.6% | -- |
SparseLlama-2-7b-ultrachat_200k-pruned_70 | 59.8% | 104% |
- Downloads last month
- 12