---
license: apache-2.0
datasets:
- glaiveai/reflection-v1
- SkunkworksAI/reasoning-0.01
- trollek/ThoughtfulAssistant-v02
- trollek/ThoughtfulAssistant-v01
language:
- en
base_model:
- h2oai/h2o-danube3-4b-base
tags:
- reflection-tuning
---
# ThoughtStream-4B-v0.3

Third time's the charm: this one actually generates the thought tokens by itself. The system prompts remain the same as the [second model](https://huggingface.co/trollek/ThoughtStream-4B-v0.2), and support for reflection has been added with the power of [glaiveai/reflection-v1](https://huggingface.co/datasets/glaiveai/reflection-v1).

### Reflection system prompt

```
You are a world-class AI system capable of complex reasoning and reflection. You respond to all questions in the following way-
<|thought_start|>
In this section you understand the problem and develop a plan to solve the problem.

For easy problems-
Make a simple plan and use COT

For moderate to hard problems-
1. Devise a step-by-step plan to solve the problem. (don't actually start solving yet, just make a plan)
2. Use Chain of Thought reasoning to work through the plan and write the full solution within thinking.

You can use <reflection> </reflection> tags whenever you execute a complex step to verify if your reasoning is correct and if not correct it.


<|thought_end|>
```
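
At inference time this prompt goes into the ChatML `system` turn. Below is a minimal sketch with 🤗 Transformers; it assumes the repo id follows the v0.2 naming (`trollek/ThoughtStream-4B-v0.3`) and that the ChatML chat template ships with the tokenizer, so adjust both if they differ.

```python
# Minimal inference sketch. Assumptions: repo id follows the v0.2 naming and the
# ChatML chat template is embedded in the tokenizer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "trollek/ThoughtStream-4B-v0.3"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

# The reflection system prompt from the section above goes into the system turn verbatim.
system_prompt = "You are a world-class AI system capable of complex reasoning and reflection. ..."  # paste the full prompt here

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "A bat and a ball cost 1.10 in total. The bat costs 1.00 more than the ball. What does the ball cost?"},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# skip_special_tokens=False keeps <|thought_start|> / <|thought_end|> visible in the output.
output = model.generate(input_ids, max_new_tokens=1024, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=False))
```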

I have not added `<reflection>` nor `</reflection>` to the tokeniser.
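
Because of that, the reflection tags are encoded as plain text rather than as single special tokens. A quick sanity check, as a sketch with the same assumed repo id as above:

```python
# Sketch: inspect how the tags are tokenised (assumed repo id).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("trollek/ThoughtStream-4B-v0.3")

print(tokenizer.tokenize("<reflection>"))       # expected: several ordinary subword tokens
print(tokenizer.tokenize("<|thought_start|>"))  # expected: a single token, assuming it was added as a special token
```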

### LLaMA-Factory config

The eval loss started to increase at step 14000 (the first eval after the 1st epoch), so I stopped training early and merged the checkpoint from step 13000, which had an eval loss of 0.4815.

```yaml
### model
model_name_or_path: danube3/thinking-base-chatml

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
loraplus_lr_ratio: 16.0
lora_rank: 32
lora_alpha: 32
enable_liger_kernel: true
quantization_bit: 4
upcast_layernorm: true
seed: 31415
optim: lion_8bit

### dataset
dataset: reflection_v1,thoughtful_assistant_2,thoughtful_assistant,reasoning_assistant
template: ninja_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: thinking-base-chatml/loras/thoughtful-reflection
logging_steps: 1
save_steps: 1000
save_strategy: steps
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 0.0000025
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
flash_attn: fa2

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```
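
A checkpoint merge like the one described above would typically be done with LLaMA-Factory's export step; for reference, the sketch below shows the equivalent operation with `peft` directly. The checkpoint directory name is an assumption based on the `output_dir` and `save_steps` above, not a published artifact.

```python
# Sketch (assumption): merging the step-13000 LoRA checkpoint into the base model with peft.
# Paths are taken from the config above; the checkpoint directory name is assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "danube3/thinking-base-chatml"
adapter_path = "thinking-base-chatml/loras/thoughtful-reflection/checkpoint-13000"

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_path)

merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("ThoughtStream-4B-v0.3")
AutoTokenizer.from_pretrained(base_path).save_pretrained("ThoughtStream-4B-v0.3")
```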