---
library_name: peft
---
|
|
|
First iteration of the default generator LoRA for [MiniHF](https://github.com/JD-P/minihf).
|
This model still functions as a base model while writing more coherent text. |
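
Something like the following should load and sample from the adapter (a minimal sketch: the base checkpoint id and the adapter path are assumptions, not taken from this card):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed base checkpoint; substitute the checkpoint this adapter targets.
BASE = "mistralai/Mistral-7B-v0.1"
# Placeholder for this repository's Hub id or a local path.
ADAPTER = "path/to/this/adapter"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```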
|
|
|
## Training procedure |
|
|
|
This model was trained starting from the [MiniHF Mistral SFT evaluator](https://huggingface.co/jdpressman/minihf_evaluator_mistral_7b_v0.1/blob/main/README.md). |
|
It was created using the MiniHF Reinforcement Learning From AI Feedback (RLAIF) pipeline:
|
|
|
`accelerate launch rlaif_generator.py --resume minihf_evaluator_mistral_7b_v0.1 --output-path mistral_h_eval --kl-weight 1.0 --constitution hermes/hermes_constitution.txt --prompts hermes/hermes_prompts.txt --length 256 --batch-size 4 --grad-accum-steps 8` |
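
The `--kl-weight` flag scales a KL penalty that keeps the tuned policy close to the base model. A minimal sketch of what such a token-level penalty typically looks like (illustrative only; not the exact loss in `rlaif_generator.py`):

```python
import torch
import torch.nn.functional as F

def kl_penalty(policy_logits: torch.Tensor,
               base_logits: torch.Tensor,
               kl_weight: float = 1.0) -> torch.Tensor:
    """Mean token-level KL(policy || base) over a batch of logits."""
    policy_logp = F.log_softmax(policy_logits, dim=-1)
    base_logp = F.log_softmax(base_logits, dim=-1)
    # Sum over the vocabulary dimension, then average over tokens.
    kl = (policy_logp.exp() * (policy_logp - base_logp)).sum(dim=-1)
    return kl_weight * kl.mean()
```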
|
|
|
The tuning script was modified to use the AdamW optimizer with weight decay: |
|
|
|
`opt = optim.AdamW(model.parameters(), lr=1e-5, weight_decay=1e-2, betas=(0.9, 0.98))` |
|
|
|
This weight decay is based on the observation that [RL tuning mode collapse](https://www.greaterwrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse) can be undone by interpolating the weights of the base model with those of the RL-tuned model. Here the specific recipe was to start from the MiniHF SFT evaluator, then apply weight decay and the KL penalty towards the base model weights to inject entropy back into the policy.
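
As an illustration of that interpolation (not part of this training run; the mixing coefficient `alpha` is hypothetical):

```python
import torch

def interpolate_weights(base_state: dict, tuned_state: dict,
                        alpha: float = 0.5) -> dict:
    """Linearly interpolate base and RL-tuned state dicts.

    alpha=0.0 recovers the base model, alpha=1.0 the RL-tuned one;
    intermediate values trade the tuned behavior against entropy.
    """
    return {
        name: torch.lerp(base_state[name], tuned_state[name], alpha)
        for name in base_state
    }
```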
|
|
|
### Prompt Bank and Constitution |
|
|
|
The prompt bank used during tuning is in the `hermes_prompts.txt` file found in this repo, and the constitution is in `hermes_constitution.txt`.
|
|
|
### Configuration |
|
|
|
The following `bitsandbytes` quantization config was used during training: |
|
- quant_method: bitsandbytes |
|
- load_in_8bit: False |
|
- load_in_4bit: True |
|
- llm_int8_threshold: 6.0 |
|
- llm_int8_skip_modules: None |
|
- llm_int8_enable_fp32_cpu_offload: False |
|
- llm_int8_has_fp16_weight: False |
|
- bnb_4bit_quant_type: nf4 |
|
- bnb_4bit_use_double_quant: True |
|
- bnb_4bit_compute_dtype: float16 |
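
In current `transformers` versions the same settings would be expressed as a `BitsAndBytesConfig` (a sketch; the base checkpoint id is an assumption):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # assumed base checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```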
|
|
|
### Framework versions |
|
|
|
- PEFT 0.5.0
|
|