GGUF format
Thanks! I haven't done any quantization myself yet but I'll have a look into it.
Thank you very much - I am actually working on another, solely German quantization approach that boosts the model's German capabilities and replies. It works really well so far, I think, and has lots of potential, but it's WIP and will likely be updated next week with some more additions.
https://huggingface.co/aari1995/germeo-7b-awq
Also, at the moment I'm unfortunately having trouble evaluating the model on the German benchmarks, as it does not really support AWQ. If you have an idea, let me know.
Open for feedback!
What exactly is the problem? The latest transformers version does support AWQ, right? Feel free to reach out to me. I am happy to help.
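For reference, a minimal sketch of how loading an AWQ checkpoint with a recent transformers release (>= 4.35, which added native AWQ support) might look. This assumes `autoawq` is installed and a CUDA GPU is available at runtime; the repo id is the one from this thread, and the helper function name is just for illustration:

```python
# Minimal sketch (assumptions: transformers >= 4.35, `pip install autoawq`,
# and a CUDA GPU available when the model is actually loaded).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "aari1995/germeo-7b-awq"  # repo id from this thread

def load_awq_model(model_id: str = MODEL_ID):
    """Load an AWQ-quantized causal LM. The quantization config is read
    from the checkpoint itself, so no explicit AwqConfig is needed here."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model
```

Once loaded this way, the model should behave like any other transformers causal LM for generation and evaluation.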
Yes, I also figured that out and it works now, thank you very much!
At the moment I need to find time to do the MMLU eval, as it takes 26 hours on my 3090 Ti.
So far the benchmarks look good. The scores are slightly worse, but the model's output is guaranteed to be German:
ARC-DE: 0.514
Hellaswag-DE: 0.651
TruthfulQA-DE: 0.508
I'll keep you updated.
https://huggingface.co/aari1995/germeo-7b-awq
Evaluation done: MMLU 0.522 (an improvement), resulting in a DE average of 0.563. I think it is a good use case of knowledge transfer from English to German while "keeping the model German": it replies solely in German. @floleuerer created a benchmark for German response rates; I am in contact with him to see whether there is an improvement there as well.
Malte, would you be up for further experiments on knowledge transfer, or a call? I am also experimenting with LASER and want to see whether a non-bilingual model can achieve improvements through quantization / pruning methods.