63.17 MMLU-Pro Computer Science with `Q8_0`

#2
by ubergarm

llama.cpp

```shell
$ ./llama-server \
      --model "../models/bartowski/SuperNova-Medius-GGUF/SuperNova-Medius-Q8_0.gguf" \
      --n-gpu-layers 49 \
      --ctx-size 40960 \
      --parallel 10 \
      --cache-type-k f16 \
      --cache-type-v f16 \
      --threads 16 \
      --flash-attn \
      --mlock \
      --n-predict -1 \
      --host 127.0.0.1 \
      --port 8080
```
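With `--parallel 10`, llama-server splits the 40960-token context across ten concurrent slots and serves an OpenAI-compatible `/v1/chat/completions` endpoint, which is what the benchmark client talks to. As a rough sketch (the model name and `max_tokens` value here are assumptions chosen to match the run above, not values confirmed by the post), a single request body looks like this:

```python
import json

# Endpoint matching the --host/--port flags above.
ENDPOINT = "http://127.0.0.1:8080/v1/chat/completions"

def build_request(question: str, model: str = "SuperNova-Medius-Q8_0") -> str:
    """Serialize one OpenAI-style chat completion request body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.0,   # benchmarks typically use greedy decoding
        "max_tokens": 2048,   # matches the max completion length reported below
    }
    return json.dumps(payload)

body = build_request("Which data structure gives O(1) average-case lookup?")
```

Each of the ten parallel benchmark workers would POST a body like this to `ENDPOINT`.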

Ollama-MMLU-Pro

Default .toml configs modified for the local URL, model name, and parallel inference. Run on a single RTX 3090 Ti FE with 24 GB VRAM.
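The modifications amount to pointing the client at the local llama-server and matching its parallelism. A minimal sketch of what that config might look like (section and key names are assumptions based on the project's default config and may differ between versions):

```toml
[server]
# Point the benchmark client at the local llama-server, not Ollama's default port
url = "http://127.0.0.1:8080/v1"
model = "SuperNova-Medius-Q8_0"

[test]
categories = ["computer science"]
parallel = 10   # matches --parallel 10 on the server side
```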

```
Finished testing computer science in 0 hours, 19 minutes, 48 seconds.
Total, 259/410, 63.17%
Random Guess Attempts, 2/410, 0.49%
Correct Random Guesses, 0/2, 0.00%
Adjusted Score Without Random Guesses, 259/408, 63.48%
Finished the benchmark in 0 hours, 19 minutes, 50 seconds.
Total, 259/410, 63.17%
Token Usage:
Prompt tokens: min 1448, average 1601, max 2897, total 656306, tk/s 551.25
Completion tokens: min 59, average 273, max 2048, total 112019, tk/s 94.09
```
Markdown Table:
| overall | computer science |
| ------- | ---------------- |
| 63.17 | 63.17 |
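The reported figures are internally consistent: the adjusted score drops the two random-guess questions from both numerator and denominator, and the token throughput lines up with the ~19 minute 50 second wall-clock time. A quick check (the exact tk/s the harness reports uses its own timing, so small deviations are expected):

```python
# Adjusted score: remove random-guess attempts (and any lucky hits) entirely.
total_questions = 410
correct = 259
random_guesses = 2
correct_random = 0
adjusted = (correct - correct_random) / (total_questions - random_guesses) * 100
# 259/408 -> 63.48%

# Throughput against the reported wall-clock time of 19 min 50 s.
wall_clock_s = 19 * 60 + 50          # 1190 s
prompt_tps = 656306 / wall_clock_s       # ~551 tk/s, reported 551.25
completion_tps = 112019 / wall_clock_s   # ~94 tk/s, reported 94.09
```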
