Tags: Text Generation · Transformers · Safetensors · mistral · alignment-handbook · Generated from Trainer · text-generation-inference · Inference Endpoints
fblgit committed
Commit 3f46fb4 (parent 8754056)

Update README.md

Files changed (1):
  1. README.md +4 -1
README.md CHANGED
@@ -14,6 +14,9 @@ license: artistic-2.0
 # juanako-7b-v1 (UNA: Uniform Neural Alignment)
 
 This model uses uniform neural alignment (UNA) for the DPO training phases and is a fine-tuned version of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
+
+**It is recommended to use the latest [Juanako version](https://huggingface.co/fblgit/juanako-7b-UNA), which significantly outperforms v1.**
+
 It achieves the following results on the evaluation set:
 - Loss: 0.4594
 - Rewards/chosen: -1.1095
@@ -27,7 +30,7 @@ It achieves the following results on the evaluation set:
 
 Followed the [alignment-handbook](https://github.com/huggingface/alignment-handbook) to perform DPO (Phase 2) over the Zephyr-SFT model.
 
-**Please feel free to run more tests and commit the results. Also, if you are interested in participating in [UNA's paper research or GPU sponsorship](mailto:info@fblnet.net) to support UNA research, feel free to get in touch.**
+**Please feel free to run more tests and commit the results. Also, if you are interested in participating in [UNA's paper research or GPU sponsorship](mailto:xavi@juanako.ai) to support UNA research, feel free to get in touch.**
 
 Special thanks to [TheBloke](https://huggingface.co/TheBloke) for converting the model into multiple formats and for his enormous overall contribution to the community.
 Here are the models:
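For context on the metrics quoted in the card: Loss and Rewards/chosen are the quantities trl logs during DPO. The card does not restate the objective, so what follows is the textbook DPO loss (Rafailov et al., 2023), with β the DPO temperature, π_θ the policy being trained, and π_ref the frozen reference (here the Zephyr-SFT model):

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Rewards/chosen is the evaluation-set mean of the first β-scaled log-ratio, β·log(π_θ(y_w|x)/π_ref(y_w|x)); a negative value such as -1.1095 is normal, since only its margin over Rewards/rejected matters.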
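The DPO (Phase 2) step the card describes can be sketched with trl's `DPOTrainer`. This is a minimal illustration under assumptions, not the author's recipe: the hyperparameters (beta, batch size, learning rate) are placeholders, it uses trl's 2023-era API (newer releases move beta/max_length into a `DPOConfig`), and the real alignment-handbook run applies the Zephyr chat template during preprocessing rather than the bare-text mapping used here.

```python
# Minimal DPO (Phase 2) sketch; hyperparameters are illustrative placeholders,
# not the values used to train juanako-7b-v1.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "fblgit/zephyr-lora-dpo-b1"  # the SFT starting point named in the card
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy

# "train_prefs" is the preference split of the binarized UltraFeedback dataset.
prefs = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

def to_text(example):
    # chosen/rejected arrive as chat-message lists; keep the final assistant
    # turn as plain text. (The alignment-handbook recipe instead applies the
    # Zephyr chat template here.)
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

prefs = prefs.map(to_text, remove_columns=prefs.column_names)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    beta=0.1,  # assumed DPO temperature; not stated in the card
    train_dataset=prefs,
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
    args=TrainingArguments(
        output_dir="juanako-dpo",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=5e-7,
        num_train_epochs=1,
        bf16=True,
        remove_unused_columns=False,  # trl's DPO collator needs the raw columns
    ),
)
trainer.train()
```

Keeping a frozen copy of the SFT model as `ref_model` pins the anchor that the β-scaled log-ratios in the loss above are measured against.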