---
license: apache-2.0
datasets:
- glaiveai/reflection-v1
- SkunkworksAI/reasoning-0.01
- trollek/ThoughtfulAssistant-v02
- trollek/ThoughtfulAssistant-v01
language:
- en
base_model:
- h2oai/h2o-danube3-4b-base
tags:
- reflection-tuning
---

# ThoughtStream-4B-v0.3

Third time around. This one actually generates the thought tokens by itself. The system prompts remain the same as for the [second model](https://huggingface.co/trollek/ThoughtStream-4B-v0.2), and support for reflection has been added with the power of [glaiveai/reflection-v1](https://huggingface.co/datasets/glaiveai/reflection-v1).

### Reflection system prompt

```
You are a world-class AI system capable of complex reasoning and reflection. You respond to all questions in the following way-
<|thought_start|>
In this section you understand the problem and develop a plan to solve the problem.

For easy problems-
Make a simple plan and use COT

For moderate to hard problems-
1. Devise a step-by-step plan to solve the problem. (don't actually start solving yet, just make a plan)
2. Use Chain of Thought reasoning to work through the plan and write the full solution within thinking.

You can use <reflection> </reflection> tags whenever you execute a complex step to verify if your reasoning is correct and if not correct it.
<|thought_end|>
```

I have not added `<reflection>` nor `</reflection>` to the tokeniser.

### Quants

* [trollek/ThoughtStream-4B-v0.3-GGUF](https://huggingface.co/trollek/ThoughtStream-4B-v0.3-GGUF)

### LLaMA-Factory config

The eval loss started to increase at step 14000, the first eval after the 1st epoch, so I stopped training early and merged the checkpoint from step 13000, which had an eval loss of 0.4815.

```yaml
### model
model_name_or_path: danube3/thinking-base-chatml

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
loraplus_lr_ratio: 16.0
lora_rank: 32
lora_alpha: 32
enable_liger_kernel: true
quantization_bit: 4
upcast_layernorm: true
seed: 31415
optim: lion_8bit

### dataset
dataset: reflection_v1,thoughtful_assistant_2,thoughtful_assistant,reasoning_assistant
template: ninja_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: thinking-base-chatml/loras/thoughtful-reflection
logging_steps: 1
save_steps: 1000
save_strategy: steps
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 0.0000025
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
flash_attn: fa2

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```
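
### Inference sketch

If you want to see the thought tokens in action, the sketch below shows one way to run the model with 🤗 Transformers. It assumes this repo's id (`trollek/ThoughtStream-4B-v0.3`), that the bundled chat template is the ChatML-style one used in training, and illustrative generation settings; treat it as a minimal sketch rather than a canonical recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trollek/ThoughtStream-4B-v0.3"  # assumed: this repo's id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Paste the full reflection system prompt from the section above here;
# it is shortened in this sketch to keep the example readable.
system_prompt = (
    "You are a world-class AI system capable of complex reasoning and reflection. "
    "You respond to all questions in the following way- ..."
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Why does ice float on water?"},
]

# The tokenizer's chat template turns the conversation into input ids.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024, do_sample=True, temperature=0.7)

# Keep special tokens so <|thought_start|> / <|thought_end|> stay visible in the output.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=False))
```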
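Since the thought is plain text in the completion, splitting the decoded output on `<|thought_end|>` is an easy way to separate the reasoning from the visible answer; the GGUF quants linked above should behave the same way under llama.cpp as long as the same chat template and system prompt are used. On the training side, LLaMA-Factory configs like the one above are normally launched with `llamafactory-cli train <config>.yaml`, and merging a specific checkpoint (here step 13000) is typically done afterwards with `llamafactory-cli export` and a small LoRA-merge config; treat both commands as general LLaMA-Factory usage rather than the exact invocation used for this model.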