---
license: apache-2.0
datasets:
- pankajmathur/WizardLM_Orca
- teknium/trismegistus-project
- unalignment/toxic-dpo-v0.1
- Intel/orca_dpo_pairs
language:
- en
pipeline_tag: text-generation
---

## Mistral 7b Reverse Instruct

This model is fine-tuned with SFT (LoRA) to reverse engineer the original prompt from a given LLM output/response.

Use case: generating synthetic instruct datasets for developing chatbots and for domain-specific fine-tuning (e.g. "Summarization" & "Roleplay").

- base_model: mistralai/Mistral-7B-v0.1 (=checkpoint-v1)
- base_model: mistralai/Mistral-7B-v0.2 (>=checkpoint-v2)

For convenience, the latest model export is provided under [/latest_model_export](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/tree/main/latest_model_export), and GGUF-quantized versions are provided under [/latest_ggml_models](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/tree/main/latest_ggml_models).

## Response Format

"[INST]\n### System:\n{system}\n### Instruction:\n{instruction}\n[/INST]\n"

- Grammar File: [inst_format.gbnf](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/inst_format.gbnf)

## Prompt Template

"\n### System:\nYou craft instructions for generating the given output through reverse engineering.\n### Instruction:\nDecipher the steps used to produce the given output and articulate a refined set of instructions (System & Instruction).\n### OUTPUT:\n {output}"

(use the template without the surrounding quotation marks)

## Training Dataset

About 21k items of the following datasets were used.
(mostly coding-like tasks were removed)

- v1 & v2: [reverse-instruct_v1.json](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/reverse-instruct_v1.json)
- v3: [reverse-instruct_v2.json](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/reverse-instruct_v2.json)

The reverse instruct dataset has been compiled with entries from the following datasets:

- [alpaca_gpt4_data](https://raw.githubusercontent.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/main/data/alpaca_gpt4_data.json)
- [roleplay-instruct-v2.1](https://raw.githubusercontent.com/teknium1/GPTeacher/main/Roleplay%20Supplemental/roleplay-instruct-v2.1.json)
- [wizardlm_orca](https://huggingface.co/datasets/pankajmathur/WizardLM_Orca/resolve/main/wizardlm_orca.json)
- [toxic-dpo-v0.1](https://huggingface.co/datasets/unalignment/toxic-dpo-v0.1/resolve/main/toxic-dpo.parquet)
- [orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs/resolve/main/orca_rlhf.jsonl)
- [occultexpert](https://huggingface.co/datasets/teknium/trismegistus-project/resolve/main/occultexpert.json)

## Training Procedure

```bash
!cd LLaMA-Factory && WANDB_DISABLED=True PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256 accelerate launch \
    --multi_gpu \
    --mixed_precision fp16 \
    --num_processes 2 \
    --num_machines 1 \
    --rdzv_backend static \
    --same_network \
    --gpu_ids all \
    --machine_rank 0 \
    --main_training_function main \
    -- src/train_bash.py \
    --stage sft \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
    --adapter_name_or_path path_to_checkpoint \
    --flash_attn \
    --neftune_noise_alpha 5 \
    --do_train \
    --dataset default \
    --template vanilla \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir path_to_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 10 \
    --save_total_limit 3 \
    --learning_rate 5e-5 \
    --num_train_epochs 9.0 \
    --plot_loss \
    --fp16 \
    --overwrite_output_dir \
    --cutoff_len 4096 \
    --quantization_bit 4
```

## Training Time

- v1: ~12h on Kaggle's P100 GPU
- v2: >30h on Kaggle's T4 x2
- v3: coming soon

### Framework versions

- LLaMA-Factory
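
As an illustration of the Prompt Template and Response Format sections, here is a minimal Python sketch that fills the template with an LLM output and parses a model response back into its System and Instruction parts. The helper names `build_prompt` and `parse_response` and the regular expression are assumptions made for this example and are not part of the repository. The sketch only handles strings; running the model itself (e.g. with the GGUF export and the grammar file) is left out.

```python
import re

# Prompt template copied from the "Prompt Template" section (without the quotes).
PROMPT_TEMPLATE = (
    "\n### System:\nYou craft instructions for generating the given output "
    "through reverse engineering.\n### Instruction:\nDecipher the steps used "
    "to produce the given output and articulate a refined set of instructions "
    "(System & Instruction).\n### OUTPUT:\n {output}"
)

# Hypothetical regex mirroring the documented response format:
# "[INST]\n### System:\n{system}\n### Instruction:\n{instruction}\n[/INST]\n"
RESPONSE_RE = re.compile(
    r"\[INST\]\n### System:\n(?P<system>.*?)"
    r"\n### Instruction:\n(?P<instruction>.*?)\n\[/INST\]",
    re.DOTALL,
)


def build_prompt(output: str) -> str:
    """Insert an LLM output into the reverse-instruct prompt template."""
    # str.replace is used instead of str.format so that curly braces
    # inside the output text do not raise formatting errors.
    return PROMPT_TEMPLATE.replace("{output}", output)


def parse_response(text: str):
    """Return {'system': ..., 'instruction': ...}, or None if no match."""
    match = RESPONSE_RE.search(text)
    if match is None:
        return None
    return {
        "system": match.group("system").strip(),
        "instruction": match.group("instruction").strip(),
    }


if __name__ == "__main__":
    prompt = build_prompt("The capital of France is Paris.")
    print(prompt)

    # Hand-written sample in the documented format, not real model output.
    sample = (
        "[INST]\n### System:\nYou are a helpful assistant.\n"
        "### Instruction:\nWhat is the capital of France?\n[/INST]\n"
    )
    print(parse_response(sample))
```

Pairs of (parsed instruction, original output) collected this way could then be assembled into a synthetic instruct dataset, which is the use case described at the top of this card.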