Philipp-Sc/mistral-7b-reverse-instruct

@@ -2,6 +2,9 @@
 license: apache-2.0
 datasets:
 - pankajmathur/WizardLM_Orca
 language:
 - en
 pipeline_tag: text-generation
@@ -9,11 +12,10 @@ pipeline_tag: text-generation
 ## Mistral 7b Reverse Instruct
-This model is LoRA fine tuned to reverse engineer the original prompt of a given LLM output/response.
 Use Case: The generation of synthetic instruct datasets for developing chatbots and domain-specific fine-tuning (e.g. "Summarization" & "Roleplay").
 - base_model: mistralai/Mistral-7B-v0.1 (=checkpoint-v1)
 - base_model: mistralai/Mistral-7B-v0.2 (>=checkpoint-v2)
@@ -36,21 +38,36 @@ For convenience the latest model export is provided under [/latest_model_export]
 About 21k items from the following datasets were used (mostly coding-like tasks were removed).
-```bash
-wget https://raw.githubusercontent.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/main/data/alpaca_gpt4_data.json
-wget https://raw.githubusercontent.com/teknium1/GPTeacher/main/Roleplay%20Supplemental/roleplay-instruct-v2.1.json
-wget https://huggingface.co/datasets/pankajmathur/WizardLM_Orca/resolve/main/wizardlm_orca.json
-```
 ## Training Procedure
 ```bash
-CUDA_VISIBLE_DEVICES=0 WANDB_DISABLED=True python LLaMA-Factory/src/train_bash.py \
     --stage sft \
-    --model_name_or_path model_name_or_path \
-    --checkpoint_dir checkpoint_dir \
     --flash_attn \
-    --shift_attn \
     --neftune_noise_alpha 5 \
     --do_train \
     --dataset default \
@@ -63,13 +80,14 @@ CUDA_VISIBLE_DEVICES=0 WANDB_DISABLED=True python LLaMA-Factory/src/train_bash.p
     --gradient_accumulation_steps 1 \
     --lr_scheduler_type cosine \
     --logging_steps 10 \
-    --save_steps 100 \
     --learning_rate 5e-5 \
-    --num_train_epochs 3.0 \
     --plot_loss \
     --fp16 \
     --overwrite_output_dir \
-    --cutoff_len 2048 \
     --quantization_bit 4
 ```
@@ -77,6 +95,7 @@ CUDA_VISIBLE_DEVICES=0 WANDB_DISABLED=True python LLaMA-Factory/src/train_bash.p
 - v1: ~12h on Kaggle's P100 GPU
 - v2: >30h on Kaggle's T4 x2
 ### Framework versions

 license: apache-2.0
 datasets:
 - pankajmathur/WizardLM_Orca
+- teknium/trismegistus-project
+- unalignment/toxic-dpo-v0.1
+- Intel/orca_dpo_pairs
 language:
 - en
 pipeline_tag: text-generation
 ## Mistral 7b Reverse Instruct
+This model is SFT (LoRA) fine-tuned to reverse-engineer the original prompt from a given LLM output/response.
 Use Case: The generation of synthetic instruct datasets for developing chatbots and domain-specific fine-tuning (e.g. "Summarization" & "Roleplay").
 - base_model: mistralai/Mistral-7B-v0.1 (=checkpoint-v1)
 - base_model: mistralai/Mistral-7B-v0.2 (>=checkpoint-v2)
 About 21k items from the following datasets were used (mostly coding-like tasks were removed).
+- v1 & v2: [reverse-instruct_v1.json](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/reverse-instruct_v1.json)
+- v3: [reverse-instruct_v2.json](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/reverse-instruct_v2.json)
+The reverse instruct dataset has been compiled with entries from the following datasets:
+- [alpaca_gpt4_data](https://raw.githubusercontent.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/main/data/alpaca_gpt4_data.json)
+- [roleplay-instruct-v2.1](https://raw.githubusercontent.com/teknium1/GPTeacher/main/Roleplay%20Supplemental/roleplay-instruct-v2.1.json)
+- [wizardlm_orca](https://huggingface.co/datasets/pankajmathur/WizardLM_Orca/resolve/main/wizardlm_orca.json)
+- [toxic-dpo-v0.1](https://huggingface.co/datasets/unalignment/toxic-dpo-v0.1/resolve/main/toxic-dpo.parquet)
+- [orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs/resolve/main/orca_rlhf.jsonl)
+- [occultexpert](https://huggingface.co/datasets/teknium/trismegistus-project/resolve/main/occultexpert.json)
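
The diff does not show how the reverse-instruct entries were compiled. As a minimal sketch, assuming Alpaca-style source records with `instruction`, `input`, and `output` fields (the layout used by alpaca_gpt4_data.json), one plausible approach is to swap the roles: the response becomes the model input and the original prompt becomes the training target. The helper name `to_reverse_instruct` and the output layout are hypothetical and are not taken from reverse-instruct_v1.json or reverse-instruct_v2.json.

```python
import json


def to_reverse_instruct(record):
    """Swap prompt and response of one Alpaca-style record (hypothetical helper)."""
    # Rebuild the original prompt; Alpaca-style data keeps optional context in "input".
    prompt = record["instruction"]
    if record.get("input"):
        prompt = f"{prompt}\n\n{record['input']}"
    # Assumption: the LLM response is fed in as the instruction,
    # and the original prompt is what the model learns to produce.
    return {"instruction": record["output"], "input": "", "output": prompt}


example = {
    "instruction": "Summarize the text.",
    "input": "LoRA adds small trainable matrices to a frozen model.",
    "output": "LoRA fine-tunes a frozen model through small added matrices.",
}
reversed_example = to_reverse_instruct(example)
print(json.dumps(reversed_example, indent=2))
```

Folding the optional `input` into the target keeps the whole original prompt recoverable. Whether the actual dataset does this is not stated in the source.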
 ## Training Procedure
 ```bash
+cd LLaMA-Factory && WANDB_DISABLED=True PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256 accelerate launch \
+    --multi_gpu \
+    --mixed_precision fp16 \
+    --num_processes 2 \
+    --num_machines 1 \
+    --rdzv_backend static \
+    --same_network \
+    --gpu_ids all \
+    --machine_rank 0 \
+    --main_training_function main \
+    -- src/train_bash.py \
     --stage sft \
+    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
+    --adapter_name_or_path path_to_checkpoint \
     --flash_attn \
     --neftune_noise_alpha 5 \
     --do_train \
     --dataset default \
     --gradient_accumulation_steps 1 \
     --lr_scheduler_type cosine \
     --logging_steps 10 \
+    --save_steps 10 \
+    --save_total_limit 3 \
     --learning_rate 5e-5 \
+    --num_train_epochs 9.0 \
     --plot_loss \
     --fp16 \
     --overwrite_output_dir \
+    --cutoff_len 4096 \
     --quantization_bit 4
 ```
 - v1: ~12h on Kaggle's P100 GPU
 - v2: >30h on Kaggle's T4 x2
+- v3: coming soon
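
The updated command saves far more often than the old one (`--save_steps 10` instead of `100`) and caps disk use with `--save_total_limit 3`. The sketch below illustrates which checkpoint directories would remain after a run. It assumes the Hugging Face Trainer behavior that LLaMA-Factory builds on: checkpoints are named `checkpoint-<step>` and the oldest ones are deleted once the limit is exceeded. The function name `retained_checkpoints` is hypothetical, and the final model export is ignored.

```python
def retained_checkpoints(total_steps, save_steps=10, save_total_limit=3):
    """List the checkpoint directories left on disk after total_steps steps."""
    # A checkpoint is written every save_steps optimizer steps.
    saved = [f"checkpoint-{step}" for step in range(save_steps, total_steps + 1, save_steps)]
    # Only the newest save_total_limit checkpoints are kept (assumed Trainer behavior).
    return saved[-save_total_limit:]


print(retained_checkpoints(57))
# ['checkpoint-30', 'checkpoint-40', 'checkpoint-50']
```

On a time-limited Kaggle session, frequent saves with a small limit mean an interrupted run loses at most ten steps, and a later session can resume via `--adapter_name_or_path`.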
 ### Framework versions