GGUF format
Thanks! I haven't done any quantization myself yet but I'll have a look into it.
Thank you very much - I am actually working on another, solely German quantization approach that boosts the model's German capabilities and replies. It works really well so far, I think, and has lots of potential, but it's WIP and will likely be updated next week with some more additions.
https://huggingface.co/aari1995/germeo-7b-awq
Also, at the moment I'm unfortunately having trouble evaluating the model on the German benchmarks, as it does not really support AWQ. If you have an idea, let me know.
Open for feedback!
What exactly is the problem? The latest transformers version does support AWQ, right? Feel free to reach out to me. I am happy to help.
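For reference, a minimal sketch of how loading an AWQ checkpoint with a recent transformers release (>= 4.35, which added native AWQ support) might look. This assumes `autoawq` is installed and a CUDA GPU is available at runtime; the repo id is the one from this thread, and the helper function name is just for illustration:

```python
# Minimal sketch (assumptions: transformers >= 4.35, `pip install autoawq`,
# and a CUDA GPU available when the model is actually loaded).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "aari1995/germeo-7b-awq"  # repo id from this thread

def load_awq_model(model_id: str = MODEL_ID):
    """Load an AWQ-quantized causal LM. The quantization config is read
    from the checkpoint itself, so no explicit AwqConfig is needed here."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model
```

Once loaded this way, the model should behave like any other transformers causal LM for generation and evaluation.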
Yes, I also figured that out and it works now, thank you very much!
At the moment I need to find time to do the MMLU eval, as it takes 26 hours on my 3090 Ti.
So far the benchmarks look good. The scores are slightly worse, but the model's output is guaranteed to be German:
ARC-DE: 0.514
Hellaswag-DE: 0.651
TruthfulQA-DE: 0.508
I'll keep you updated.
https://huggingface.co/aari1995/germeo-7b-awq
Evaluation done: MMLU 0.522 (an improvement), resulting in a DE average of 0.563. I think it is a good use case of knowledge transfer from English to German while "keeping the model German": it replies solely in German. @floleuerer created a benchmark for German response rates; I am in contact with him to see whether there is an improvement there as well.
Malte, would you be up for further experiments on knowledge transfer, or a call? I am also experimenting with LASER and want to see whether a non-bilingual model can achieve improvements through quantization / pruning methods.