
First iteration of the default generator LoRA for MiniHF. This model still behaves like a base model, but writes more coherent text.

Training procedure

This model was trained starting from the MiniHF Mistral SFT evaluator. It was created using the MiniHF Reinforcement Learning From AI Feedback pipeline:

accelerate launch rlaif_generator.py --resume minihf_evaluator_mistral_7b_v0.1 --output-path mistral_h_eval --kl-weight 1.0 --constitution hermes/hermes_constitution.txt --prompts hermes/hermes_prompts.txt --length 256 --batch-size 4 --grad-accum-steps 8

The tuning script was modified to use the AdamW optimizer with weight decay:

opt = optim.AdamW(model.parameters(), lr=1e-5, weight_decay=1e-2, betas=(0.9, 0.98))

This weight decay is based on the observation that mode collapse from RL tuning can be undone by interpolating the weights of the base model with those of the RL-tuned model. Here the specific recipe was to start from the MiniHF SFT evaluator, then apply weight decay and the KL penalty towards the base model weights to inject entropy back into the policy.
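The interpolation idea mentioned above can be illustrated with a minimal sketch. This is not code from the MiniHF pipeline: the function name `interpolate_weights`, the `alpha` parameter, and the toy parameter name `layer.weight` are all hypothetical, and plain Python lists stand in for real tensors.

```python
from typing import Dict, List

# Toy stand-in for a state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def interpolate_weights(base: Weights, tuned: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * base + alpha * tuned for every parameter.

    alpha = 0.0 gives back the base model, alpha = 1.0 the RL-tuned model.
    """
    if base.keys() != tuned.keys():
        raise ValueError("base and tuned must have the same parameter names")
    mixed: Weights = {}
    for name, base_vals in base.items():
        tuned_vals = tuned[name]
        if len(base_vals) != len(tuned_vals):
            raise ValueError(f"size mismatch for parameter {name!r}")
        mixed[name] = [
            (1.0 - alpha) * b + alpha * t for b, t in zip(base_vals, tuned_vals)
        ]
    return mixed


# Hypothetical toy weights, chosen only to make the arithmetic easy to follow.
base = {"layer.weight": [0.0, 2.0, -4.0]}
tuned = {"layer.weight": [1.0, 4.0, 0.0]}
halfway = interpolate_weights(base, tuned, alpha=0.5)
print(halfway)  # {'layer.weight': [0.5, 3.0, -2.0]}
```

Moving `alpha` towards 0.0 pulls the result back towards the base model, which is the direction the weight decay and KL penalty are meant to push during tuning.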

Prompt Bank and Constitution

The prompt bank used during tuning is in the hermes_prompts.txt file found in this repo, and the constitution is in hermes_constitution.txt.

Configuration

The following bitsandbytes quantization config was used during training:

  • quant_method: bitsandbytes
  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: float16
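For readers who want to load the base model with the same quantization settings, the list above maps onto a `BitsAndBytesConfig` from the `transformers` library roughly as follows. This is a sketch, not taken from the training script. It assumes `torch`, `transformers`, and `bitsandbytes` are installed, and the variable name `bnb_config` is arbitrary.

```python
import torch
from transformers import BitsAndBytesConfig

# Mirrors the quantization settings listed above, value for value.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
```

The resulting object can be passed as the `quantization_config` argument of `AutoModelForCausalLM.from_pretrained` when loading the base model.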

Framework versions

  • PEFT 0.5.0

