
Meta's Llama 3 70B pruned to 42B parameters using the methodology described in The Unreasonable Ineffectiveness of the Deeper Layers. After pruning, the model was trained with QLoRA for ~100M tokens from JeanKaddour/minipile.

Layers to prune were selected using PruneMe.
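For a rough idea of how the layer selection works: the paper scores each block of n consecutive layers by the angular distance between the hidden state entering the block and the hidden state leaving it, then drops the block where that distance is smallest. Below is a minimal pure-Python sketch of that scoring step. It is not PruneMe's actual code; the function names and the `hidden_states` layout are hypothetical, chosen only for illustration.

```python
import math

def angular_distance(a, b):
    """Angular distance in [0, 1] between two vectors (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to guard against floating-point drift outside acos's domain.
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos) / math.pi

def most_redundant_block(hidden_states, n):
    """Find the start of the n-layer block that changes the representation least.

    Assumed (hypothetical) layout: hidden_states[l][s] is the last-token
    vector of sample s entering layer l, with one extra final entry for the
    output of the last layer, so a model with L layers provides L + 1 entries.
    """
    num_layers = len(hidden_states) - 1
    best_start, best_dist = None, float("inf")
    for start in range(num_layers - n + 1):
        pairs = zip(hidden_states[start], hidden_states[start + n])
        dists = [angular_distance(a, b) for a, b in pairs]
        mean = sum(dists) / len(dists)
        if mean < best_dist:
            best_start, best_dist = start, mean
    return best_start, best_dist

# Toy example: 4 "layers", 1 sample, 2-D hidden states.
toy_states = [
    [[1.0, 0.0]],
    [[0.0, 1.0]],
    [[0.0, 1.0]],
    [[0.0, 1.0]],
    [[-1.0, 0.0]],
]
start, dist = most_redundant_block(toy_states, n=2)
```

In the toy example, layers 1 and 2 leave the representation unchanged, so that block is the one selected. In a real run the hidden states would come from forward passes over a calibration dataset, and the selected block would be removed before the QLoRA healing step.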

Still evaluating, don't get too excited! Might be incredibly dumb. Check out these zero-shot MMLU numbers though:

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | N/A | none | 0 | acc | 0.7319 | ± 0.0034 |
| - humanities | N/A | none | 0 | acc | 0.6582 | ± 0.0063 |
| - other | N/A | none | 0 | acc | 0.7927 | ± 0.0069 |
| - social_sciences | N/A | none | 0 | acc | 0.8466 | ± 0.0064 |
| - stem | N/A | none | 0 | acc | 0.6702 | ± 0.0079 |

5-shot:

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | N/A | none | 0 | acc | 0.7669 | ± 0.0034 |
| - humanities | N/A | none | 5 | acc | 0.7296 | ± 0.0062 |
| - other | N/A | none | 5 | acc | 0.8101 | ± 0.0067 |
| - social_sciences | N/A | none | 5 | acc | 0.8668 | ± 0.0060 |
| - stem | N/A | none | 5 | acc | 0.6825 | ± 0.0079 |

Built with Axolotl

