
A series of models to test the benefits of CoreML joint compression on iOS 18/macOS 15.

## mlp-*.mlpackage

A simple Up/Gate/SiLU/Down MLP repeated four times, using the Llama 2 7B dimensions.
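The block above can be sketched in numpy. This is an illustration only, not the exported model: Llama 2 7B uses hidden size 4096 and intermediate size 11008, but smaller dimensions are substituted here so the sketch runs instantly, and the weights are random placeholders.

```python
import numpy as np

# Llama 2 7B dimensions are hidden=4096, intermediate=11008; scaled down here
# so the sketch runs quickly. Weights are random placeholders.
HIDDEN, INTERMEDIATE, LAYERS = 64, 172, 4

def silu(x):
    # SiLU(x) = x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
layers = [
    {
        "up":   rng.standard_normal((HIDDEN, INTERMEDIATE)) * 0.02,
        "gate": rng.standard_normal((HIDDEN, INTERMEDIATE)) * 0.02,
        "down": rng.standard_normal((INTERMEDIATE, HIDDEN)) * 0.02,
    }
    for _ in range(LAYERS)
]

def mlp_stack(x):
    # Up/Gate/SiLU/Down MLP applied LAYERS times:
    # down( silu(x @ gate) * (x @ up) )
    for w in layers:
        x = (silu(x @ w["gate"]) * (x @ w["up"])) @ w["down"]
    return x

x = rng.standard_normal((1, HIDDEN))
y = mlp_stack(x)
print(y.shape)  # (1, 64)
```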

All models use the 'CPU and Neural Engine' compute unit; latencies were measured in Xcode.

| Device | Model | Precision | Minimum (ms) | Median (ms) |
|--------|-------|-----------|--------------|-------------|
| M1 Max | mlp-float16 | float16 | 19.30 | 19.42 |
| M1 Max | mlp-4bit | 4-bit LUT | 5.93 | 5.98 |
| M1 Max | mlp-2bit | 2-bit LUT | 5.92 | 6.11 |
| M1 Max | mlp-4bit-int8 | 4-bit int8 LUT + A8 | 6.02 | 6.31 |
| M1 Max | mlp-2bit-int8 | 2-bit int8 LUT + A8 | 6.00 | 6.18 |
| M1 Max | mlp-int8-int8 | W8A8 | 9.78 | 9.94 |
| M4 | mlp-4bit | 4-bit LUT | - | 4.19 |
| M4 | mlp-2bit | 2-bit LUT | - | 3.83 |
| M4 | mlp-4bit-int8 | 4-bit int8 LUT + A8 | - | 4.14 |
| M4 | mlp-2bit-int8 | 2-bit int8 LUT + A8 | - | 3.83 |
| M4 | mlp-int8-int8 | W8A8 | - | 8.18 |
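To unpack the precision labels: "4-bit LUT" means each weight is replaced by a 4-bit index into a 16-entry lookup table, and "4-bit int8 LUT" additionally stores the LUT entries themselves as int8 with a scale (the joint part of joint compression). The sketch below illustrates the storage math only, with uniform centroids standing in for the learned (e.g. k-means) LUT a real pipeline would use; it is not the Core ML implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # placeholder weights

bits = 4
# Uniform centroids keep the sketch short; a real pipeline learns the LUT
# (e.g. with k-means).
lut = np.linspace(w.min(), w.max(), 2 ** bits).astype(np.float32)

# "4-bit LUT": store only a 4-bit index per weight.
indices = np.abs(w[:, None] - lut[None, :]).argmin(axis=1).astype(np.uint8)

# "int8 LUT": quantize the 16 LUT entries to int8 plus a single float scale.
scale = np.abs(lut).max() / 127.0
lut_int8 = np.clip(np.round(lut / scale), -128, 127).astype(np.int8)

w_4bit = lut[indices]                    # 4-bit LUT reconstruction
w_4bit_int8 = lut_int8[indices] * scale  # 4-bit int8 LUT reconstruction

print(int(indices.max()) < 2 ** bits)  # True: indices fit in 4 bits
```

With nearest-centroid assignment the reconstruction error is bounded by half the spacing between LUT entries, which is why 4-bit LUT stays close to float16 quality while shrinking weight storage roughly 4x.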

## Download

```shell
huggingface-cli download \
  --local-dir . \
  --local-dir-use-symlinks False \
  smpanaro/coreml-joint-compression-test \
  --include "*.mlpackage/*"
```