Philipp-Sc committed 532ff8b (parent: e0c18c2): Update README.md

Files changed (1): README.md (+33, -14)

README.md:
---
license: apache-2.0
datasets:
- pankajmathur/WizardLM_Orca
- teknium/trismegistus-project
- unalignment/toxic-dpo-v0.1
- Intel/orca_dpo_pairs
language:
- en
pipeline_tag: text-generation
---
 
## Mistral 7b Reverse Instruct

This model is SFT (LoRA) fine-tuned to reverse engineer the original prompt of a given LLM output/response.
Use case: generating synthetic instruct datasets for developing chatbots and for domain-specific fine-tuning (e.g. "Summarization" & "Roleplay").
 
- base_model: mistralai/Mistral-7B-v0.1 (=checkpoint-v1)
- base_model: mistralai/Mistral-7B-v0.2 (>=checkpoint-v2)
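
This excerpt does not document the inference prompt template, so the template in the sketch below is illustrative only; the actual format can be inspected in the reverse-instruct dataset files linked further down. A minimal loading sketch with `transformers` and `peft`, assuming the LoRA adapter can be loaded straight from this repository:

```python
# Minimal inference sketch. Assumptions: the adapter loads directly from this
# repo, and the "### Response / ### Instruction" template is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "mistralai/Mistral-7B-Instruct-v0.2"         # base model for >=checkpoint-v2
ADAPTER = "Philipp-Sc/mistral-7b-reverse-instruct"  # this repository (assumed adapter path)

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER)

# Feed the model an LLM response and ask it to recover the prompt behind it.
llm_output = "1. Eat a balanced diet. 2. Exercise regularly. 3. Sleep 7-9 hours."
prompt = f"### Response:\n{llm_output}\n\n### Instruction:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```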
 
 
For convenience, the latest model export is provided under [/latest_model_export].
About 21k items of the following datasets were used (mostly coding-like tasks were removed):
 
- v1 & v2: [reverse-instruct_v1.json](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/reverse-instruct_v1.json)
- v3: [reverse-instruct_v2.json](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/reverse-instruct_v2.json)

The reverse instruct dataset was compiled from entries of the following datasets (a sketch of the conversion is shown after the list):

- [alpaca_gpt4_data](https://raw.githubusercontent.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/main/data/alpaca_gpt4_data.json)
- [roleplay-instruct-v2.1](https://raw.githubusercontent.com/teknium1/GPTeacher/main/Roleplay%20Supplemental/roleplay-instruct-v2.1.json)
- [wizardlm_orca](https://huggingface.co/datasets/pankajmathur/WizardLM_Orca/resolve/main/wizardlm_orca.json)
- [toxic-dpo-v0.1](https://huggingface.co/datasets/unalignment/toxic-dpo-v0.1/resolve/main/toxic-dpo.parquet)
- [orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs/resolve/main/orca_rlhf.jsonl)
- [occultexpert](https://huggingface.co/datasets/teknium/trismegistus-project/resolve/main/occultexpert.json)
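
The compilation script itself is not part of this card, so the following is only a minimal sketch of the presumable transformation: alpaca-style entries (`instruction`/`input`/`output`) are assumed, with a naive keyword filter standing in for the removal of coding-like tasks.

```python
# Hypothetical compilation sketch: build reverse-instruct pairs by swapping
# each entry's prompt and response. Field names assume alpaca-style JSON.
import json

CODE_HINTS = ("def ", "import ", "SELECT ", "#include", "public class")  # naive filter

def to_reverse_instruct(path: str) -> list[dict]:
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    pairs = []
    for e in entries:
        prompt = " ".join(p for p in (e.get("instruction", ""), e.get("input", "")) if p)
        response = e.get("output", "")
        # Drop coding-like tasks, mirroring the note above.
        if any(h in prompt or h in response for h in CODE_HINTS):
            continue
        # Reverse the direction: the response becomes the model input,
        # the original prompt becomes the training target.
        pairs.append({"instruction": response, "output": prompt})
    return pairs

pairs = to_reverse_instruct("alpaca_gpt4_data.json")
print(f"{len(pairs)} reverse-instruct pairs")
```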
 
## Training Procedure

```bash
cd LLaMA-Factory && WANDB_DISABLED=True PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256 accelerate launch \
    --multi_gpu \
    --mixed_precision fp16 \
    --num_processes 2 \
    --num_machines 1 \
    --rdzv_backend static \
    --same_network \
    --gpu_ids all \
    --machine_rank 0 \
    --main_training_function main \
    -- src/train_bash.py \
    --stage sft \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
    --adapter_name_or_path path_to_checkpoint \
    --flash_attn \
    --neftune_noise_alpha 5 \
    --do_train \
    --dataset default \
    ... \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 10 \
    --save_total_limit 3 \
    --learning_rate 5e-5 \
    --num_train_epochs 9.0 \
    --plot_loss \
    --fp16 \
    --overwrite_output_dir \
    --cutoff_len 4096 \
    --quantization_bit 4
```
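
How the export under [/latest_model_export] is produced is not shown in this excerpt; a plausible route is merging the trained LoRA adapter into the base weights with `peft`. A minimal sketch, reusing the `path_to_checkpoint` placeholder from the command above:

```python
# Hypothetical export sketch: merge the LoRA adapter into the base model and
# save a standalone checkpoint. "path_to_checkpoint" is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "mistralai/Mistral-7B-Instruct-v0.2"
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "path_to_checkpoint").merge_and_unload()

merged.save_pretrained("latest_model_export")
AutoTokenizer.from_pretrained(BASE).save_pretrained("latest_model_export")
```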

Training time:

- v1: ~12h on Kaggle's P100 GPU
- v2: >30h on Kaggle's T4 x2
- v3: coming soon

### Framework versions