---
base_model: microsoft/Phi-3-mini-4k-instruct
inference: false
license: mit
license_link: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/resolve/main/LICENSE
language:
  - en
pipeline_tag: text-generation
tags:
  - nlp
  - code
model_creator: microsoft
model_name: Phi-3-mini-4k-instruct
model_type: phi3
quantized_by: brittlewis12
---

# Phi 3 Mini 4K Instruct GGUF

**Original model**: [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)

**Model creator**: [Microsoft](https://huggingface.co/microsoft)

This repo contains GGUF format model files for Microsoft’s Phi 3 Mini 4K Instruct.

Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with an emphasis on high-quality, reasoning-dense properties.

Learn more on Microsoft’s [model page](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct).

## What is GGUF?

GGUF is a file format for representing AI models. It is the third version of the format, introduced by the llama.cpp team on August 21st, 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.

Converted with llama.cpp build 2721 (revision 28103f4), using autogguf.
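You can also fetch a GGUF file from this repo programmatically. Below is a minimal sketch using `huggingface_hub`; the `repo_id` and `filename` are illustrative assumptions, so substitute the actual quant file you want from the repo’s file listing.

```python
# Minimal sketch: download a GGUF file from the Hugging Face Hub.
# repo_id and filename are illustrative assumptions; check the repo's
# file listing for the actual quant filenames.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="brittlewis12/Phi-3-mini-4k-instruct-GGUF",  # assumed repo id
    filename="phi-3-mini-4k-instruct.Q4_K_M.gguf",       # assumed quant file
)
print(model_path)  # local path to the cached GGUF file
```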

## Prompt template

```
<|system|>
{{system_prompt}}<|end|>
<|user|>
{{prompt}}<|end|>
<|assistant|>
```
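Any GGUF-compatible runtime can apply this template. As one example, here is a hedged sketch using `llama-cpp-python` (one of several llama.cpp bindings); the `model_path` is an assumption, so point it at whichever quant file you downloaded.

```python
# Sketch: run the model with llama-cpp-python, filling in the prompt
# template above. model_path is an assumed local filename.
from llama_cpp import Llama

llm = Llama(model_path="phi-3-mini-4k-instruct.Q4_K_M.gguf", n_ctx=4096)

system_prompt = "You are a helpful assistant."
user_prompt = "Explain what a GGUF file is in one sentence."

# Build the Phi-3 prompt exactly as the template specifies.
full_prompt = (
    f"<|system|>\n{system_prompt}<|end|>\n"
    f"<|user|>\n{user_prompt}<|end|>\n"
    f"<|assistant|>\n"
)

# <|end|> terminates each turn, so use it as a stop sequence.
out = llm(full_prompt, max_tokens=256, stop=["<|end|>"])
print(out["choices"][0]["text"].strip())
```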

## Download & run with cnvrs on iPhone, iPad, and Mac!

[cnvrs.ai](https://cnvrs.ai)

cnvrs is the best app for private, local AI on your device:

- create & save Characters with custom system prompts & temperature settings
- download and experiment with any GGUF model you can find on HuggingFace!
- make it your own with custom Theme colors
- powered by Metal ⚡️ & Llama.cpp, with haptics during response streaming!
- try it out yourself today, on TestFlight!
- follow cnvrs on Twitter to stay up to date

## Original Model Evaluation

As is now standard, we use few-shot prompts to evaluate the models, at temperature 0. The prompts and number of shots are part of a Microsoft internal tool for evaluating language models; in particular, we did no optimization of the pipeline for Phi-3. More specifically, we do not change prompts, pick different few-shot examples, change the prompt format, or do any other form of optimization for the model.

The number of k-shot examples is listed per benchmark. (A minimal sketch of this few-shot, temperature-0 setup appears after the table.)

| Benchmark | Phi-3-Mini-4K-In (3.8b) | Phi-2 (2.7b) | Mistral (7b) | Gemma (7b) | Llama-3-In (8b) | Mixtral (8x7b) | GPT-3.5 (version 1106) |
|---|---|---|---|---|---|---|---|
| MMLU (5-Shot) | 68.8 | 56.3 | 61.7 | 63.6 | 66.5 | 68.4 | 71.4 |
| HellaSwag (5-Shot) | 76.7 | 53.6 | 58.5 | 49.8 | 71.1 | 70.4 | 78.8 |
| ANLI (7-Shot) | 52.8 | 42.5 | 47.1 | 48.7 | 57.3 | 55.2 | 58.1 |
| GSM-8K (0-Shot; CoT) | 82.5 | 61.1 | 46.4 | 59.8 | 77.4 | 64.7 | 78.1 |
| MedQA (2-Shot) | 53.8 | 40.9 | 49.6 | 50.0 | 60.5 | 62.2 | 63.4 |
| AGIEval (0-Shot) | 37.5 | 29.8 | 35.1 | 42.1 | 42.0 | 45.2 | 48.4 |
| TriviaQA (5-Shot) | 64.0 | 45.2 | 72.3 | 75.2 | 67.7 | 82.2 | 85.8 |
| Arc-C (10-Shot) | 84.9 | 75.9 | 78.6 | 78.3 | 82.8 | 87.3 | 87.4 |
| Arc-E (10-Shot) | 94.6 | 88.5 | 90.6 | 91.4 | 93.4 | 95.6 | 96.3 |
| PIQA (5-Shot) | 84.2 | 60.2 | 77.7 | 78.1 | 75.7 | 86.0 | 86.6 |
| SociQA (5-Shot) | 76.6 | 68.3 | 74.6 | 65.5 | 73.9 | 75.9 | 68.3 |
| BigBench-Hard (0-Shot) | 71.7 | 59.4 | 57.3 | 59.6 | 51.5 | 69.7 | 68.32 |
| WinoGrande (5-Shot) | 70.8 | 54.7 | 54.2 | 55.6 | 65.0 | 62.0 | 68.8 |
| OpenBookQA (10-Shot) | 83.2 | 73.6 | 79.8 | 78.6 | 82.6 | 85.8 | 86.0 |
| BoolQ (0-Shot) | 77.6 | -- | 72.2 | 66.0 | 80.9 | 77.6 | 79.1 |
| CommonSenseQA (10-Shot) | 80.2 | 69.3 | 72.6 | 76.2 | 79.0 | 78.1 | 79.6 |
| TruthfulQA (10-Shot) | 65.0 | -- | 52.1 | 53.0 | 63.2 | 60.1 | 85.8 |
| HumanEval (0-Shot) | 59.1 | 47.0 | 28.0 | 34.1 | 60.4 | 37.8 | 62.2 |
| MBPP (3-Shot) | 53.8 | 60.6 | 50.8 | 51.5 | 67.7 | 60.2 | 77.8 |
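As an illustration of the few-shot, temperature-0 setup described above, here is a hedged sketch in Python. This is not Microsoft’s internal evaluation tool (which is not public); the model path and exemplars are hypothetical, and a real harness would draw the k exemplars and score the answers per benchmark.

```python
# Illustrative sketch of a k-shot, temperature-0 evaluation prompt.
# Not Microsoft's internal pipeline; exemplars and model path are
# hypothetical placeholders.
from llama_cpp import Llama

llm = Llama(model_path="phi-3-mini-4k-instruct.Q4_K_M.gguf", n_ctx=4096)

# Hypothetical few-shot exemplars; a real harness draws k of these
# from the benchmark's own data.
shots = [
    ("What is 2 + 2?", "4"),
    ("What is 3 * 3?", "9"),
]
question = "What is 7 + 5?"

# Render each exemplar as a full user/assistant turn, then leave the
# final assistant turn open for the model to complete.
parts = [
    f"<|user|>\n{q}<|end|>\n<|assistant|>\n{a}<|end|>\n" for q, a in shots
]
parts.append(f"<|user|>\n{question}<|end|>\n<|assistant|>\n")
prompt = "".join(parts)

# temperature=0 makes decoding greedy, matching the stated setting.
out = llm(prompt, max_tokens=32, temperature=0.0, stop=["<|end|>"])
print(out["choices"][0]["text"].strip())
```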