---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- dpo
- Dutch
---

# Reynaerde 7B Chat

A conversational model for Dutch, based on Mistral 7B v0.3 Instruct.

This model is a fine-tuned version of [ReBatch/Reynaerde-7B-Instruct](https://huggingface.co/ReBatch/Reynaerde-7B-Instruct) on [ReBatch/ultrafeedback_nl](https://huggingface.co/datasets/ReBatch/ultrafeedback_nl). That dataset combines a translation of [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) with the high-quality samples from [BramVanroy's translation](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned).

## Model description

This model is a Dutch chat model, originally developed from Mistral 7B v0.3 Instruct and further fine-tuned with QLoRA. It was first fine-tuned with SFT on a chat dataset and then with DPO on a feedback chat dataset.

## Intended uses & limitations

This model can still generate wrong, misleading, and potentially even offensive content. Use at your own risk. Use it with Mistral's chat template (which can be found in the tokenizer).

## Training procedure

This model was trained with QLoRA in bfloat16 with Flash Attention 2 on one A100 PCIe, using the DPO script from the [alignment handbook](https://github.com/huggingface/alignment-handbook/) on [RunPod](https://www.runpod.io/).

## Evaluation results

The model was evaluated using [ScandEval](https://scandeval.com/dutch-nlg/). It improves on 4 out of 7 benchmarks compared to the Mistral-7B-v0.3-Instruct model on which it is based.
| Model | conll_nl | dutch_social | scala_nl | squad_nl | wiki_lingua_nl | mmlu_nl | hellaswag_nl |
|:-------------------------|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|
| Reynaerde-7B-Chat        | 56.40 / 38.13 | 10.83 / 27.67 | 20.02 / 55.40 | 53.56 / 65.29 | 68.13 / 20.85 | 32.50 / 49.10 | 31.36 / 47.79 |
| Mistral-7B-v0.3          | 57.08 / 42.65 | 14.05 / 39.13 | 8.08 / 43.07  | 45.57 / 55.20 | 62.28 / 16.46 | 20.39 / 40.03 | 13.28 / 34.13 |
| Mistral-7B-v0.3-Instruct | 60.76 / 45.39 | 13.20 / 34.26 | 23.23 / 59.26 | 48.94 / 60.13 | 66.09 / 18.02 | 24.95 / 43.67 | 24.86 / 43.57 |

## Naming

This model is named after the Middle Dutch epic poem 'Van den vos Reynaerde'. Dating from around 1260, this epic by the Flemish author Willem die Madocke maecte is often called 'the pinnacle of Gothic literature in the Netherlands'. The poem tells a version of the Reynard the Fox story, which was popular in Western Europe during the late Middle Ages.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 3
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 6
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- PyTorch 2.2.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1

### Model Developer

The Mistral-7B-v0.3-Instruct model, on which this model is based, was created by [Mistral AI](https://huggingface.co/mistralai). The fine-tuning was done by [Julien Van den Avenne](https://huggingface.co/vandeju).
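The listed hyperparameters map roughly onto TRL's DPO configuration as used by the alignment handbook. The sketch below is a reconstruction under assumptions: the output directory, DPO beta, and LoRA rank/alpha/dropout are illustrative guesses, not values from the actual training run; only the kwargs mirroring the list above come from this card.

```python
# Hedged reconstruction of the DPO training configuration from the
# hyperparameters listed above. Values marked "assumed" are not from the card.
from peft import LoraConfig
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="reynaerde-7b-chat",  # assumed
    learning_rate=5e-6,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size: 3 * 2 = 6
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",             # Adam, betas=(0.9, 0.999), eps=1e-08
    bf16=True,
    seed=42,
)

# QLoRA adapter settings; r, alpha, and dropout are assumed values.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

In the handbook's DPO script, `training_args` and `peft_config` are passed to `DPOTrainer` together with the model and the binarized preference dataset.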