Laser-Dolphin-Mixtral-2x7b-dpo
New Version out now!
Credit to Fernando Fernandes and Eric Hartford for their project laserRMT
Overview
This model is a medium-sized MoE implementation based on cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser
- The new version shows ~1 point increase in evaluation performance on average.
Process
The process is outlined in this notebook
The mergekit_config is in the files.
The models used in the configuration are not lasered, but the final product is. This is an update from the last version.
This process is experimental. Your mileage may vary.
Future Goals
- Function Calling
- v2 with new base model to improve performance
Quantizations
ExLlamav2
These are the recommended quantizations for users that are running the model on GPU
Thanks to user bartowski we now have exllamav2 quantizations in 3.5 through 8 bpw. They are available here:
Branch | Bits | lm_head bits | VRAM (4k) | VRAM (16k) | VRAM (32k) | Description |
---|---|---|---|---|---|---|
8_0 | 8.0 | 8.0 | 13.7 GB | 15.1 GB | 17.2 GB | Maximum quality that ExLlamaV2 can produce, near unquantized performance. |
6_5 | 6.5 | 8.0 | 11.5 GB | 12.9 GB | 15.0 GB | Near unquantized performance at vastly reduced size, recommended. |
5_0 | 5.0 | 6.0 | 9.3 GB | 10.7 GB | 12.8 GB | Slightly lower quality vs 6.5, great for 12gb cards with 16k context. |
4_25 | 4.25 | 6.0 | 8.2 GB | 9.6 GB | 11.7 GB | GPTQ equivalent bits per weight. |
3_5 | 3.5 | 6.0 | 7.0 GB | 8.4 GB | 10.5 GB | Lower quality, not recommended. |
His quantizations represent the first ~13B model with GQA support. Check out his repo for more information!
GGUF
Current GGUF Quantizations
AWQ
*Current AWQ Quantizations
TheBloke
These Quants will result in unpredicted behavior. New quants are available as I have updated the model
Quatizations provided by TheBloke
HF Spaces
Ollama
ollama run macadeliccc/laser-dolphin-mixtral-2x7b-dpo
Code Example
Switch the commented model definition to use in 4-bit. Should work with 9GB and still exceed the single 7B model by 5-6 points roughly
from transformers import AutoModelForCausalLM, AutoTokenizer
def generate_response(prompt):
"""
Generate a response from the model based on the input prompt.
Args:
prompt (str): Prompt for the model.
Returns:
str: The generated response from the model.
"""
# Tokenize the input prompt
inputs = tokenizer(prompt, return_tensors="pt")
# Generate output tokens
outputs = model.generate(**inputs, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)
# Decode the generated tokens to a string
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
return response
# Load the model and tokenizer
model_id = "macadeliccc/laser-dolphin-mixtral-2x7b-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
prompt = "Write a quicksort algorithm in python"
# Generate and print responses for each language
print("Response:")
print(generate_response(prompt), "\n")
colab with usage example
Eval
EQ Bench
----Benchmark Complete---- 2024-01-31 16:55:37 Time taken: 31.1 mins Prompt Format: ChatML Model: macadeliccc/laser-dolphin-mixtral-2x7b-dpo-GGUF Score (v2): 72.76 Parseable: 171.0 --------------- Batch completed Time taken: 31.2 mins ---------------
evaluation colab
Summary of previous evaluation
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
laser-dolphin-mixtral-2x7b-dpo | 41.31 | 73.67 | 61.69 | 42.79 | 54.87 |
Detailed current evaluation
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
laser-dolphin-mixtral-2x7b-dpo | 42.25 | 73.45 | 63.44 | 43.96 | 55.77 |
AGIEval
Task | Version | Metric | Value | Stderr | |
---|---|---|---|---|---|
agieval_aqua_rat | 0 | acc | 21.26 | Β± | 2.57 |
acc_norm | 21.65 | Β± | 2.59 | ||
agieval_logiqa_en | 0 | acc | 34.72 | Β± | 1.87 |
acc_norm | 35.64 | Β± | 1.88 | ||
agieval_lsat_ar | 0 | acc | 26.96 | Β± | 2.93 |
acc_norm | 26.96 | Β± | 2.93 | ||
agieval_lsat_lr | 0 | acc | 45.88 | Β± | 2.21 |
acc_norm | 46.08 | Β± | 2.21 | ||
agieval_lsat_rc | 0 | acc | 59.48 | Β± | 3.00 |
acc_norm | 59.48 | Β± | 3.00 | ||
agieval_sat_en | 0 | acc | 73.79 | Β± | 3.07 |
acc_norm | 73.79 | Β± | 3.07 | ||
agieval_sat_en_without_passage | 0 | acc | 42.23 | Β± | 3.45 |
acc_norm | 41.26 | Β± | 3.44 | ||
agieval_sat_math | 0 | acc | 37.27 | Β± | 3.27 |
acc_norm | 33.18 | Β± | 3.18 |
Average: 42.25%
GPT4All
Task | Version | Metric | Value | Stderr | |
---|---|---|---|---|---|
arc_challenge | 0 | acc | 58.36 | Β± | 1.44 |
acc_norm | 58.02 | Β± | 1.44 | ||
arc_easy | 0 | acc | 82.20 | Β± | 0.78 |
acc_norm | 77.40 | Β± | 0.86 | ||
boolq | 1 | acc | 87.52 | Β± | 0.58 |
hellaswag | 0 | acc | 67.50 | Β± | 0.47 |
acc_norm | 84.43 | Β± | 0.36 | ||
openbookqa | 0 | acc | 34.40 | Β± | 2.13 |
acc_norm | 47.00 | Β± | 2.23 | ||
piqa | 0 | acc | 81.61 | Β± | 0.90 |
acc_norm | 82.59 | Β± | 0.88 | ||
winogrande | 0 | acc | 77.19 | Β± | 1.18 |
Average: 73.45%
GSM8K
Task | Version | Metric | Value | Stderr | |
---|---|---|---|---|---|
gsm8k | 2 | exact_match,get-answer | 0.75 | ||
exact_match_stderr,get-answer | 0.01 | ||||
alias | gsm8k |
TruthfulQA
Task | Version | Metric | Value | Stderr | |
---|---|---|---|---|---|
truthfulqa_mc | 1 | mc1 | 45.90 | Β± | 1.74 |
mc2 | 63.44 | Β± | 1.56 |
Average: 63.44%
Bigbench
Task | Version | Metric | Value | Stderr | |
---|---|---|---|---|---|
bigbench_causal_judgement | 0 | multiple_choice_grade | 58.42 | Β± | 3.59 |
bigbench_date_understanding | 0 | multiple_choice_grade | 60.70 | Β± | 2.55 |
bigbench_disambiguation_qa | 0 | multiple_choice_grade | 38.37 | Β± | 3.03 |
bigbench_geometric_shapes | 0 | multiple_choice_grade | 21.73 | Β± | 2.18 |
exact_str_match | 0.00 | Β± | 0.00 | ||
bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 35.00 | Β± | 2.14 |
bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 23.57 | Β± | 1.61 |
bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 50.33 | Β± | 2.89 |
bigbench_movie_recommendation | 0 | multiple_choice_grade | 45.00 | Β± | 2.23 |
bigbench_navigate | 0 | multiple_choice_grade | 50.00 | Β± | 1.58 |
bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 60.35 | Β± | 1.09 |
bigbench_ruin_names | 0 | multiple_choice_grade | 51.12 | Β± | 2.36 |
bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 32.26 | Β± | 1.48 |
bigbench_snarks | 0 | multiple_choice_grade | 67.96 | Β± | 3.48 |
bigbench_sports_understanding | 0 | multiple_choice_grade | 70.59 | Β± | 1.45 |
bigbench_temporal_sequences | 0 | multiple_choice_grade | 35.80 | Β± | 1.52 |
bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 22.56 | Β± | 1.18 |
bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.20 | Β± | 0.90 |
bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 50.33 | Β± | 2.89 |
Average: 43.96%
Average score: 55.77%
Elapsed time: 02:43:45
Citations
Fernando Fernandes Neto and Eric Hartford. "Optimizing Large Language Models Using Layer-Selective Rank Reduction and Random Matrix Theory." 2024.
@article{sharma2023truth,
title={The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction},
author={Sharma, Pratyusha and Ash, Jordan T and Misra, Dipendra},
journal={arXiv preprint arXiv:2312.13558},
year={2023} }
@article{gao2021framework,
title={A framework for few-shot language model evaluation},
author={Gao, Leo and Tow, Jonathan and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and McDonell, Kyle and Muennighoff, Niklas and others},
journal={Version v0. 0.1. Sept},
year={2021}
}
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 67.16 |
AI2 Reasoning Challenge (25-Shot) | 65.96 |
HellaSwag (10-Shot) | 85.80 |
MMLU (5-Shot) | 63.17 |
TruthfulQA (0-shot) | 60.76 |
Winogrande (5-shot) | 79.01 |
GSM8k (5-shot) | 48.29 |
- Downloads last month
- 1,349
Model tree for macadeliccc/laser-dolphin-mixtral-2x7b-dpo
Spaces using macadeliccc/laser-dolphin-mixtral-2x7b-dpo 13
Collection including macadeliccc/laser-dolphin-mixtral-2x7b-dpo
Evaluation results
- normalized accuracy on AI2 Reasoning Challenge (25-Shot)test set Open LLM Leaderboard65.960
- normalized accuracy on HellaSwag (10-Shot)validation set Open LLM Leaderboard85.800
- accuracy on MMLU (5-Shot)test set Open LLM Leaderboard63.170
- mc2 on TruthfulQA (0-shot)validation set Open LLM Leaderboard60.760
- accuracy on Winogrande (5-shot)validation set Open LLM Leaderboard79.010
- accuracy on GSM8k (5-shot)test set Open LLM Leaderboard48.290