trollek committed
Commit 8aa2e56
1 Parent(s): 4e15cd4

Update README.md

Files changed (1)
  1. README.md +93 -3
README.md CHANGED
@@ -1,3 +1,93 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - glaiveai/reflection-v1
+ - SkunkworksAI/reasoning-0.01
+ - trollek/ThoughtfulAssistant-v02
+ - trollek/ThoughtfulAssistant-v01
+ language:
+ - en
+ base_model:
+ - h2oai/h2o-danube3-4b-base
+ tags:
+ - reflection-tuning
+ ---
+ # ThoughtStream-4B-v0.3
+
+ Third time's the charm: this one actually generates the thought tokens by itself. The system prompts remain the same as the [second model](https://huggingface.co/trollek/ThoughtStream-4B-v0.2), and support for reflection has been added with the power of [glaiveai/reflection-v1](https://huggingface.co/datasets/glaiveai/reflection-v1).
+
+ ### Reflection system prompt
+
+ ```
+ You are a world-class AI system capable of complex reasoning and reflection. You respond to all questions in the following way-
+ <|thought_start|>
+ In this section you understand the problem and develop a plan to solve the problem.
+
+ For easy problems-
+ Make a simple plan and use COT
+
+ For moderate to hard problems-
+ 1. Devise a step-by-step plan to solve the problem. (don't actually start solving yet, just make a plan)
+ 2. Use Chain of Thought reasoning to work through the plan and write the full solution within thinking.
+
+ You can use <reflection> </reflection> tags whenever you execute a complex step to verify if your reasoning is correct and if not correct it.
+
+
+ <|thought_end|>
+ ```
+
+ I have not added `<reflection>` or `</reflection>` to the tokeniser.
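
For reference, here is a minimal inference sketch with Hugging Face Transformers using the reflection system prompt above. It is not part of the model card: the repo id, the example question, and the generation settings are assumptions; the model is ChatML-tuned, so `apply_chat_template` is used to build the prompt.

```python
# Minimal sketch (assumed: repo id, example question, generation settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trollek/ThoughtStream-4B-v0.3"  # assumed repo id for this model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Paste the full reflection system prompt from the section above here.
reflection_prompt = "You are a world-class AI system capable of complex reasoning and reflection. ..."

messages = [
    {"role": "system", "content": reflection_prompt},
    {"role": "user", "content": "What is the sum of the first 20 odd numbers?"},  # example question
]

# Build the ChatML prompt and keep special tokens in the output so the
# <|thought_start|> ... <|thought_end|> block stays visible.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=False))
```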
+
+ ### LLaMA-Factory config
+
+ The eval loss started to increase at step 14000, the first eval after the 1st epoch, so I stopped training early and merged the checkpoint from step 13000, which had an eval loss of 0.4815.
+
+ ```yaml
+ ### model
+ model_name_or_path: danube3/thinking-base-chatml
+
+ ### method
+ stage: sft
+ do_train: true
+ finetuning_type: lora
+ lora_target: all
+ loraplus_lr_ratio: 16.0
+ lora_rank: 32
+ lora_alpha: 32
+ enable_liger_kernel: true
+ quantization_bit: 4
+ upcast_layernorm: true
+ seed: 31415
+ optim: lion_8bit
+
+ ### dataset
+ dataset: reflection_v1,thoughtful_assistant_2,thoughtful_assistant,reasoning_assistant
+ template: ninja_chatml
+ cutoff_len: 8192
+ overwrite_cache: false
+ preprocessing_num_workers: 12
+
+ ### output
+ output_dir: thinking-base-chatml/loras/thoughtful-reflection
+ logging_steps: 1
+ save_steps: 1000
+ save_strategy: steps
+ plot_loss: true
+ overwrite_output_dir: false
+
+ ### train
+ per_device_train_batch_size: 4
+ gradient_accumulation_steps: 2
+ learning_rate: 0.0000025
+ num_train_epochs: 2
+ lr_scheduler_type: cosine
+ warmup_ratio: 0.01
+ bf16: true
+ flash_attn: fa2
+
+ ### eval
+ val_size: 0.01
+ per_device_eval_batch_size: 1
+ eval_strategy: steps
+ eval_steps: 1000
+ ```
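
The checkpoint merge mentioned above can also be reproduced outside LLaMA-Factory. Below is a minimal sketch that folds a LoRA checkpoint into the base model with `peft`; the checkpoint and output paths are hypothetical placeholders, not the ones actually used.

```python
# Minimal sketch: merge the step-13000 LoRA adapter into the base model with peft.
# Paths marked "assumed" are hypothetical placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "danube3/thinking-base-chatml"  # base model path from the config above
adapter_path = "thinking-base-chatml/loras/thoughtful-reflection/checkpoint-13000"  # assumed checkpoint dir
out_path = "ThoughtStream-4B-v0.3"  # assumed output dir

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_path)

# Fold the LoRA weights into the base weights and save a standalone model.
merged = model.merge_and_unload()
merged.save_pretrained(out_path)
AutoTokenizer.from_pretrained(base_path).save_pretrained(out_path)
```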