Philipp-Sc committed
Commit • 532ff8b
1 Parent(s): e0c18c2
Update README.md
README.md CHANGED
@@ -2,6 +2,9 @@
 license: apache-2.0
 datasets:
 - pankajmathur/WizardLM_Orca
+- teknium/trismegistus-project
+- unalignment/toxic-dpo-v0.1
+- Intel/orca_dpo_pairs
 language:
 - en
 pipeline_tag: text-generation
@@ -9,11 +12,10 @@ pipeline_tag: text-generation

 ## Mistral 7b Reverse Instruct

-This model is LoRA fine tuned to reverse engineer the original prompt of a given LLM output/response.
+This model is sft (LoRA) fine tuned to reverse engineer the original prompt of a given LLM output/response.
 Use Case: The generation of synthetic instruct datasets for developing chatbots and domain specific fine tuning (e.g. "Summarization" & "Roleplay").


-
 - base_model: mistralai/Mistral-7B-v0.1 (=checkpoint-v1)
 - base_model: mistralai/Mistral-7B-v0.2 (>=checkpoint-v2)

@@ -36,21 +38,36 @@ For convinience the latest model export is provided under [/latest_model_export]

 About 21k items of the following datasets were used. (mostly coding-like tasks were removed)

-
-
-
-
-
+- v1 & v2: [reverse-instruct_v1.json](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/reverse-instruct_v1.json)
+- v3: [reverse-instruct_v2.json](https://huggingface.co/Philipp-Sc/mistral-7b-reverse-instruct/blob/main/reverse-instruct_v2.json)
+
+The reverse instruct dataset has been compiled with entries from the following datasets:
+
+- [alpaca_gpt4_data](https://raw.githubusercontent.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/main/data/alpaca_gpt4_data.json)
+- [roleplay-instruct-v2.1](https://raw.githubusercontent.com/teknium1/GPTeacher/main/Roleplay%20Supplemental/roleplay-instruct-v2.1.json)
+- [wizardlm_orca](https://huggingface.co/datasets/pankajmathur/WizardLM_Orca/resolve/main/wizardlm_orca.json)
+- [toxic-dpo-v0.1](https://huggingface.co/datasets/unalignment/toxic-dpo-v0.1/resolve/main/toxic-dpo.parquet)
+- [orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs/resolve/main/orca_rlhf.jsonl)
+- [occultexpert](https://huggingface.co/datasets/teknium/trismegistus-project/resolve/main/occultexpert.json)

 ## Training Procedure

 ```bash
-
+!cd LLaMA-Factory && WANDB_DISABLED=True PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256 accelerate launch \
+--multi_gpu \
+--mixed_precision fp16 \
+--num_processes 2 \
+--num_machines 1 \
+--rdzv_backend static \
+--same_network \
+--gpu_ids all \
+--machine_rank 0 \
+--main_training_function main \
+-- src/train_bash.py \
 --stage sft \
---model_name_or_path
---
+--model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
+--adapter_name_or_path path_to_checkpoint \
 --flash_attn \
---shift_attn \
 --neftune_noise_alpha 5 \
 --do_train \
 --dataset default \
@@ -63,13 +80,14 @@ CUDA_VISIBLE_DEVICES=0 WANDB_DISABLED=True python LLaMA-Factory/src/train_bash.p
 --gradient_accumulation_steps 1 \
 --lr_scheduler_type cosine \
 --logging_steps 10 \
---save_steps
+--save_steps 10 \
+--save_total_limit 3 \
 --learning_rate 5e-5 \
---num_train_epochs
+--num_train_epochs 9.0 \
 --plot_loss \
 --fp16 \
 --overwrite_output_dir \
---cutoff_len
+--cutoff_len 4096 \
 --quantization_bit 4
 ```

@@ -77,6 +95,7 @@ CUDA_VISIBLE_DEVICES=0 WANDB_DISABLED=True python LLaMA-Factory/src/train_bash.p

 - v1: ~12h on Kaggle's P100 GPU
 - v2: >30h on Kaggle's T4 x2
+- v3: coming soon

 ### Framework versions

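
The README text in this diff describes a model trained to recover the original prompt from a given response. One way such training pairs could be built is to swap the prompt and response fields of Alpaca-style records (`instruction`, `input`, `output`, the layout used by alpaca_gpt4_data). This is a minimal sketch under that assumption, not the author's actual preprocessing: the helper names `reverse_record` and `build_reverse_dataset` are hypothetical, and the schema of reverse-instruct_v1.json is not shown in the diff, so it may differ.

```python
from typing import Dict, List


def reverse_record(record: Dict[str, str]) -> Dict[str, str]:
    """Swap roles: the original response becomes the model input,
    and the original prompt becomes the training target."""
    prompt = record["instruction"]
    # Alpaca-style records may carry an optional "input" field with extra context;
    # fold it into the prompt so the target contains the full original request.
    if record.get("input"):
        prompt = f"{prompt}\n\n{record['input']}"
    return {
        "instruction": record["output"],  # what the model sees
        "input": "",
        "output": prompt,  # what the model should reconstruct
    }


def build_reverse_dataset(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # Skip entries with an empty response; there is nothing to reverse.
    return [reverse_record(r) for r in records if r.get("output")]


# Tiny hand-written sample (hypothetical data, for illustration only).
sample = [
    {"instruction": "Summarize the text.", "input": "Cats sleep a lot.", "output": "Cats are sleepy animals."},
    {"instruction": "Say hello.", "input": "", "output": "Hello!"},
    {"instruction": "Broken entry", "input": "", "output": ""},
]
reversed_sample = build_reverse_dataset(sample)
```

Keeping the reversed records in the same three-field layout means they could be fed to an Alpaca-style SFT pipeline without a custom loader, though whether the author's `default` dataset entry uses this layout is not stated in the diff.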