noneUsername/TouchNight-Ministral-8B-Instruct-2410-HF-W8A8-Dynamic-Per-Token

Compared with the prince-canuma version, this version is smaller after quantization, and its accuracy is about one percentage point higher.

In my own ERP testing, this model also performed better.
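For readers unfamiliar with the `W8A8-Dynamic-Per-Token` suffix in the model name: W8A8 means 8-bit weights and 8-bit activations, and "dynamic per-token" means each token's activation vector gets its own scale, computed at inference time. The sketch below illustrates only the activation side. It assumes a symmetric INT8 format (the name does not say whether INT8 or FP8 was used), and the function names are hypothetical, not taken from any quantization library.

```python
def quantize_per_token(activations):
    """Quantize each row (one token's activation vector) to int8 with its own scale."""
    quantized, scales = [], []
    for row in activations:
        max_abs = max(abs(v) for v in row)
        # One scale per token, derived from that token's largest magnitude.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        quantized.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return quantized, scales


def dequantize_per_token(quantized, scales):
    """Map int8 codes back to floats using the per-token scales."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


# Two tokens with very different magnitudes; each keeps its own resolution.
tokens = [[0.02, -0.5, 0.25], [12.0, -3.0, 6.0]]
codes, scales = quantize_per_token(tokens)
restored = dequantize_per_token(codes, scales)
```

Because the scale is chosen per token, a token with small activations is not forced onto the coarse grid that a large-magnitude token would need.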

vllm (pretrained=/root/autodl-tmp/Ministral-8B-Instruct-2410-HF,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=float16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.820	±	0.0243
		strict-match	5	exact_match	↑	0.816	±	0.0246
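
The header line of each run records the lm-evaluation-harness settings. As a rough sketch of how a comparable run might be launched, the helper below assembles an `lm_eval` command line from those settings. The function name `build_lm_eval_command` is hypothetical, and the task name `gsm8k` is taken from the results table rather than from the header line.

```python
import shlex


def build_lm_eval_command(model_path: str, dtype: str) -> list[str]:
    """Assemble an lm_eval invocation that mirrors the settings in the run header."""
    model_args = ",".join([
        f"pretrained={model_path}",
        "add_bos_token=true",
        "tensor_parallel_size=2",
        "max_model_len=2048",
        f"dtype={dtype}",
    ])
    return [
        "lm_eval",
        "--model", "vllm",
        "--model_args", model_args,
        "--tasks", "gsm8k",
        "--num_fewshot", "5",
        "--limit", "250",
        "--batch_size", "auto",
    ]


if __name__ == "__main__":
    cmd = build_lm_eval_command(
        "/root/autodl-tmp/Ministral-8B-Instruct-2410-HF", "float16"
    )
    # Print a shell-ready command instead of running it, since it needs GPUs.
    print(shlex.join(cmd))
```

Changing only `dtype` or `model_path` between calls keeps the other settings fixed, which is what makes the runs listed here comparable.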

vllm (pretrained=/root/autodl-tmp/Ministral-8B-Instruct-2410-HF,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.804	±	0.0252
		strict-match	5	exact_match	↑	0.804	±	0.0252

vllm (pretrained=/root/autodl-tmp/Ministral-8B-Instruct-2410-HF,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=float32), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.820	±	0.0243
		strict-match	5	exact_match	↑	0.816	±	0.0246

vllm (pretrained=/root/autodl-tmp/output,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=float16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.816	±	0.0246
		strict-match	5	exact_match	↑	0.812	±	0.0248

vllm (pretrained=/root/autodl-tmp/output,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.796	±	0.0255
		strict-match	5	exact_match	↑	0.792	±	0.0257
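
To put the gaps between runs in context, the reported Stderr values can be reproduced from the accuracy and the 250-sample limit alone. The sketch below does that and expresses a difference between two runs in units of combined standard error. It assumes the `/root/autodl-tmp/output` checkpoint is the quantized model, and it treats the two runs as independent, which is only an approximation because both use the same 250 questions.

```python
import math

N = 250  # from "limit: 250.0" in the run headers


def sample_stderr(accuracy: float, n: int = N) -> float:
    """Standard error of the mean of n 0/1 scores, using the sample (n - 1) variance."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


def z_score(acc_a: float, acc_b: float, n: int = N) -> float:
    """Difference between two accuracies in units of their combined standard error."""
    combined = math.sqrt(sample_stderr(acc_a, n) ** 2 + sample_stderr(acc_b, n) ** 2)
    return (acc_a - acc_b) / combined


# flexible-extract scores copied from the tables above
hf_float16 = 0.820       # Ministral-8B-Instruct-2410-HF, dtype=float16
output_float16 = 0.816   # /root/autodl-tmp/output (assumed quantized), dtype=float16

print(round(sample_stderr(hf_float16), 4))  # 0.0243, matching the table
print(z_score(hf_float16, output_float16))
```

With a standard error of roughly 0.025 per run, a gap of 0.004 to 0.008 between the original and `output` checkpoints is well below one standard error, so this sample does not distinguish them on gsm8k.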
