	---
	license: apache-2.0
	datasets:
	- glaiveai/reflection-v1
	- SkunkworksAI/reasoning-0.01
	- trollek/ThoughtfulAssistant-v02
	- trollek/ThoughtfulAssistant-v01
	language:
	- en
	base_model:
	- h2oai/h2o-danube3-4b-base
	tags:
	- reflection-tuning
	---
	# ThoughtStream-4B-v0.3

	Third attempt. This one actually generates the thought tokens by itself. The system prompts remain the same as in the [second model](https://huggingface.co/trollek/ThoughtStream-4B-v0.2), and support for reflection has been added by training on [glaiveai/reflection-v1](https://huggingface.co/datasets/glaiveai/reflection-v1).

	### Reflection system prompt

	```
	You are a world-class AI system capable of complex reasoning and reflection. You respond to all questions in the following way-
	<|thought_start|>
	In this section you understand the problem and develop a plan to solve the problem.

	For easy problems-
	Make a simple plan and use COT

	For moderate to hard problems-
	1. Devise a step-by-step plan to solve the problem. (don't actually start solving yet, just make a plan)
	2. Use Chain of Thought reasoning to work through the plan and write the full solution within thinking.

	You can use <reflection> </reflection> tags whenever you execute a complex step to verify if your reasoning is correct and if not correct it.


	<|thought_end|>
	```

	I have added neither `<reflection>` nor `</reflection>` to the tokeniser, so both tags are tokenised as ordinary text rather than as single special tokens.
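
Because the reflection tags are plain text, they show up verbatim in the decoded output and can be pulled out with a regular expression. The following is a minimal sketch, not part of the model's tooling: `split_response` is a hypothetical helper, and it assumes the output was decoded with the thought markers kept (for example, without skipping special tokens) and that the markers appear exactly as written in the system prompt.

```python
import re

# ASSUMPTION: marker and tag spellings are taken from the system prompt above.
THOUGHT_RE = re.compile(r"<\|thought_start\|>(.*?)<\|thought_end\|>", re.DOTALL)
REFLECTION_RE = re.compile(r"<reflection>(.*?)</reflection>", re.DOTALL)


def split_response(text: str) -> dict:
    """Split decoded model output into thought, reflections, and answer."""
    match = THOUGHT_RE.search(text)
    if match is None:
        # No thought section found: treat the whole text as the answer.
        return {"thought": "", "reflections": [], "answer": text.strip()}
    thought = match.group(1).strip()
    # Reflections are looked up inside the thought section only.
    reflections = [r.strip() for r in REFLECTION_RE.findall(thought)]
    # Everything after the closing thought marker is the visible answer.
    answer = text[match.end():].strip()
    return {"thought": thought, "reflections": reflections, "answer": answer}


# Hand-written example string, not real model output.
example = (
    "<|thought_start|>\n"
    "Plan: add the numbers.\n"
    "<reflection>2 + 2 is 4, so the step holds.</reflection>\n"
    "<|thought_end|>\n"
    "The answer is 4."
)
parsed = split_response(example)
print(parsed["answer"])
```

If the thought markers are stripped during decoding, the helper falls back to returning the full text as the answer.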

	### Quants

	* [trollek/ThoughtStream-4B-v0.3-GGUF](https://huggingface.co/trollek/ThoughtStream-4B-v0.3-GGUF)

	### LLaMA-Factory config

	The eval loss started to increase at step 14000, the first evaluation after the end of the 1st epoch. I stopped training early at that point and merged the checkpoint from step 13000, which had an eval loss of 0.4815.
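
The selection rule described here (keep the last checkpoint before the eval loss starts rising) can be sketched in a few lines. This is only an illustration: `pick_checkpoint` is a hypothetical helper, and apart from the step 13000 loss of 0.4815 the numbers below are made-up placeholders, not logged values.

```python
from typing import Sequence, Tuple


def pick_checkpoint(evals: Sequence[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the last (step, loss) pair before the eval loss first rises.

    `evals` must be non-empty and ordered by step. If the loss never
    rises, the final pair is returned.
    """
    for previous, current in zip(evals, evals[1:]):
        if current[1] > previous[1]:
            return previous
    return evals[-1]


# Only the 13000 entry (0.4815) comes from the model card. The other two
# losses are PLACEHOLDERS that merely reproduce the shape of the curve.
evals = [(12000, 0.4900), (13000, 0.4815), (14000, 0.4830)]
step, loss = pick_checkpoint(evals)
print(step, loss)
```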

	```yaml
	### model
	model_name_or_path: danube3/thinking-base-chatml

	### method
	stage: sft
	do_train: true
	finetuning_type: lora
	lora_target: all
	loraplus_lr_ratio: 16.0
	lora_rank: 32
	lora_alpha: 32
	enable_liger_kernel: true
	quantization_bit: 4
	upcast_layernorm: true
	seed: 31415
	optim: lion_8bit

	### dataset
	dataset: reflection_v1,thoughtful_assistant_2,thoughtful_assistant,reasoning_assistant
	template: ninja_chatml
	cutoff_len: 8192
	overwrite_cache: false
	preprocessing_num_workers: 12

	### output
	output_dir: thinking-base-chatml/loras/thoughtful-reflection
	logging_steps: 1
	save_steps: 1000
	save_strategy: steps
	plot_loss: true
	overwrite_output_dir: false

	### train
	per_device_train_batch_size: 4
	gradient_accumulation_steps: 2
	learning_rate: 0.0000025
	num_train_epochs: 2
	lr_scheduler_type: cosine
	warmup_ratio: 0.01
	bf16: true
	flash_attn: fa2

	### eval
	val_size: 0.01
	per_device_eval_batch_size: 1
	eval_strategy: steps
	eval_steps: 1000
	```
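
For a rough sense of scale, the batch settings in the config can be turned into an effective batch size and a count of training examples consumed by the merged checkpoint. This is back-of-the-envelope arithmetic only, and it assumes a single training device (`num_devices = 1`), which the config does not state.

```python
# Values copied from the config above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 2
eval_steps = 1000
merged_step = 13000  # checkpoint that was merged

# ASSUMPTION: one GPU. Multiply by the real device count if it differs.
num_devices = 1

# Examples consumed per optimizer step.
effective_batch = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Examples seen by the merged checkpoint and between two evaluations.
examples_at_merge = merged_step * effective_batch
examples_per_eval_interval = eval_steps * effective_batch

print(effective_batch, examples_at_merge, examples_per_eval_interval)
```

Under that single-device assumption, the merged checkpoint had seen 104,000 training examples, with an evaluation every 8,000 examples.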