---
library_name: peft
tags:
- trl
- sft
- unsloth
- generated_from_trainer
datasets:
- generator
base_model: unsloth/tinyllama-chat-bnb-4bit
model-index:
- name: outputs
results: []
---
# outputs
This model is a PEFT adapter fine-tuned from [unsloth/tinyllama-chat-bnb-4bit](https://huggingface.co/unsloth/tinyllama-chat-bnb-4bit) using TRL's SFT trainer with Unsloth, on a dataset recorded here only as `generator`.
It achieves the following results on the evaluation set:
- Loss: 0.4551
## Model description
This repository contains PEFT adapter weights (not full model weights) for the bitsandbytes 4-bit TinyLlama chat model, trained with supervised fine-tuning via Unsloth and TRL. The repository name (`tinyllama-addition-refuse`) suggests the adapter teaches the model to refuse addition requests, but the author has not documented this here.
## Intended uses & limitations
More information needed
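Absent documentation, the sketch below shows one plausible way to run the adapter, assuming it loads through PEFT's `AutoPeftModelForCausalLM`. The repo id and the use of the base model's chat template are assumptions, not confirmed by the author.

```python
# A minimal inference sketch; repo id assumed from this repository's name.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo_id = "lukaspetersson/tinyllama-addition-refuse"  # assumption

# The base model is bnb-4bit quantized, so this needs a CUDA GPU with
# bitsandbytes installed.
model = AutoPeftModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# Assumes the tokenizer files were pushed alongside the adapter; otherwise
# load the tokenizer from the base model instead.
tokenizer = AutoTokenizer.from_pretrained(repo_id)

messages = [{"role": "user", "content": "What is 123 + 456?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```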
## Training and evaluation data
More information needed. The auto-generated card records the training data only under the placeholder name `generator`, which the Trainer typically reports when the dataset was built from a Python generator (e.g. via `Dataset.from_generator`); the underlying data is not identified here.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a sketch of the corresponding setup follows the list):
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 3407
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 100
- mixed_precision_training: Native AMP
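A minimal sketch of what this configuration likely looked like in code, based on the card's tags (peft, trl, sft, unsloth) and the hyperparameters above. The LoRA rank/alpha/target modules, sequence length, and data files are assumptions; the card does not record them.

```python
# Sketch of the implied training setup: Unsloth loads the 4-bit base,
# PEFT attaches a LoRA adapter, and TRL's SFTTrainer runs the fine-tune.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base model through Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/tinyllama-chat-bnb-4bit",
    max_seq_length=2048,  # assumption; not recorded in the card
    load_in_4bit=True,
)

# Attach a LoRA adapter. All values here are assumptions.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    random_state=3407,
)

# Hypothetical data files; the card records the dataset only as "generator".
data = load_dataset("json", data_files={"train": "train.jsonl",
                                        "eval": "eval.jsonl"})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=data["train"],
    eval_dataset=data["eval"],
    dataset_text_field="text",  # assumption
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size 2 * 4 = 8
        learning_rate=2e-5,
        lr_scheduler_type="linear",
        warmup_ratio=0.1,
        max_steps=100,
        seed=3407,
        fp16=True,                    # "Native AMP" in the list above
        logging_steps=1,              # training loss logged every step
        evaluation_strategy="steps",
        eval_steps=1,                 # validation loss logged every step
    ),
)
trainer.train()
```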
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.9638 | 0.0 | 1 | 1.9504 |
| 1.9792 | 0.0 | 2 | 1.9467 |
| 1.9213 | 0.0 | 3 | 1.9308 |
| 2.0225 | 0.0 | 4 | 1.8936 |
| 1.8364 | 0.0 | 5 | 1.8278 |
| 1.7729 | 0.0 | 6 | 1.7366 |
| 1.9591 | 0.01 | 7 | 1.6324 |
| 1.6693 | 0.01 | 8 | 1.5278 |
| 1.6387 | 0.01 | 9 | 1.4367 |
| 1.5681 | 0.01 | 10 | 1.3741 |
| 1.3459 | 0.01 | 11 | 1.3300 |
| 1.311 | 0.01 | 12 | 1.2931 |
| 1.2721 | 0.01 | 13 | 1.2534 |
| 1.353 | 0.01 | 14 | 1.2140 |
| 1.1664 | 0.01 | 15 | 1.1727 |
| 1.27 | 0.01 | 16 | 1.1344 |
| 1.1007 | 0.01 | 17 | 1.0966 |
| 1.1035 | 0.01 | 18 | 1.0608 |
| 1.0744 | 0.01 | 19 | 1.0278 |
| 1.0491 | 0.02 | 20 | 0.9973 |
| 1.0057 | 0.02 | 21 | 0.9688 |
| 0.9435 | 0.02 | 22 | 0.9423 |
| 0.9612 | 0.02 | 23 | 0.9169 |
| 0.9811 | 0.02 | 24 | 0.8932 |
| 0.9263 | 0.02 | 25 | 0.8700 |
| 0.8581 | 0.02 | 26 | 0.8468 |
| 0.8351 | 0.02 | 27 | 0.8237 |
| 0.8019 | 0.02 | 28 | 0.8008 |
| 0.8526 | 0.02 | 29 | 0.7786 |
| 0.773 | 0.02 | 30 | 0.7571 |
| 0.7436 | 0.02 | 31 | 0.7365 |
| 0.7455 | 0.03 | 32 | 0.7172 |
| 0.747 | 0.03 | 33 | 0.6995 |
| 0.727 | 0.03 | 34 | 0.6834 |
| 0.6859 | 0.03 | 35 | 0.6687 |
| 0.6642 | 0.03 | 36 | 0.6552 |
| 0.6715 | 0.03 | 37 | 0.6428 |
| 0.6538 | 0.03 | 38 | 0.6311 |
| 0.5947 | 0.03 | 39 | 0.6202 |
| 0.6537 | 0.03 | 40 | 0.6102 |
| 0.601 | 0.03 | 41 | 0.6008 |
| 0.5956 | 0.03 | 42 | 0.5921 |
| 0.5875 | 0.03 | 43 | 0.5842 |
| 0.5737 | 0.03 | 44 | 0.5769 |
| 0.5618 | 0.04 | 45 | 0.5701 |
| 0.546 | 0.04 | 46 | 0.5638 |
| 0.5908 | 0.04 | 47 | 0.5578 |
| 0.6172 | 0.04 | 48 | 0.5520 |
| 0.5652 | 0.04 | 49 | 0.5467 |
| 0.5357 | 0.04 | 50 | 0.5417 |
| 0.5524 | 0.04 | 51 | 0.5370 |
| 0.5352 | 0.04 | 52 | 0.5326 |
| 0.5356 | 0.04 | 53 | 0.5283 |
| 0.518 | 0.04 | 54 | 0.5242 |
| 0.5273 | 0.04 | 55 | 0.5201 |
| 0.5099 | 0.04 | 56 | 0.5161 |
| 0.5158 | 0.04 | 57 | 0.5123 |
| 0.521 | 0.05 | 58 | 0.5084 |
| 0.5177 | 0.05 | 59 | 0.5047 |
| 0.4964 | 0.05 | 60 | 0.5010 |
| 0.502 | 0.05 | 61 | 0.4974 |
| 0.5078 | 0.05 | 62 | 0.4942 |
| 0.4814 | 0.05 | 63 | 0.4913 |
| 0.4863 | 0.05 | 64 | 0.4887 |
| 0.4998 | 0.05 | 65 | 0.4864 |
| 0.5106 | 0.05 | 66 | 0.4842 |
| 0.5273 | 0.05 | 67 | 0.4822 |
| 0.4874 | 0.05 | 68 | 0.4803 |
| 0.4697 | 0.05 | 69 | 0.4785 |
| 0.4796 | 0.05 | 70 | 0.4768 |
| 0.4767 | 0.06 | 71 | 0.4753 |
| 0.4582 | 0.06 | 72 | 0.4739 |
| 0.5084 | 0.06 | 73 | 0.4725 |
| 0.4566 | 0.06 | 74 | 0.4712 |
| 0.4583 | 0.06 | 75 | 0.4700 |
| 0.4753 | 0.06 | 76 | 0.4689 |
| 0.4528 | 0.06 | 77 | 0.4678 |
| 0.4617 | 0.06 | 78 | 0.4667 |
| 0.499 | 0.06 | 79 | 0.4656 |
| 0.4368 | 0.06 | 80 | 0.4646 |
| 0.4939 | 0.06 | 81 | 0.4637 |
| 0.4446 | 0.06 | 82 | 0.4627 |
| 0.4428 | 0.07 | 83 | 0.4618 |
| 0.4737 | 0.07 | 84 | 0.4611 |
| 0.4391 | 0.07 | 85 | 0.4603 |
| 0.4985 | 0.07 | 86 | 0.4597 |
| 0.45 | 0.07 | 87 | 0.4590 |
| 0.4642 | 0.07 | 88 | 0.4585 |
| 0.4633 | 0.07 | 89 | 0.4579 |
| 0.4233 | 0.07 | 90 | 0.4574 |
| 0.4478 | 0.07 | 91 | 0.4570 |
| 0.4768 | 0.07 | 92 | 0.4565 |
| 0.4665 | 0.07 | 93 | 0.4562 |
| 0.4504 | 0.07 | 94 | 0.4560 |
| 0.4692 | 0.07 | 95 | 0.4557 |
| 0.4326 | 0.08 | 96 | 0.4555 |
| 0.4727 | 0.08 | 97 | 0.4554 |
| 0.4658 | 0.08 | 98 | 0.4553 |
| 0.4905 | 0.08 | 99 | 0.4552 |
| 0.4501 | 0.08 | 100 | 0.4551 |
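Over the 100 recorded steps (roughly 0.08 epochs), training and validation loss fall together from about 1.96 to 0.45, with validation loss tracking training loss closely throughout.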
### Framework versions
- PEFT 0.8.2
- Transformers 4.37.2
- Pytorch 2.1.0+cu121
- Datasets 2.17.1
- Tokenizers 0.15.2