sanchit-gandhi HF staff committed
Commit
0a3a72c
β€’
1 Parent(s): ef66a63

Model save

README.md ADDED
@@ -0,0 +1,68 @@
+ ---
+ base_model: sanchit-gandhi/Mistral-7B-v0.1-6-layer
+ tags:
+ - trl
+ - sft
+ - generated_from_trainer
+ datasets:
+ - generator
+ model-index:
+ - name: sanchit-gandhi/Mistral-7B-v0.1-6-layer
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # sanchit-gandhi/Mistral-7B-v0.1-6-layer
+
+ This model is a fine-tuned version of [sanchit-gandhi/Mistral-7B-v0.1-6-layer](https://huggingface.co/sanchit-gandhi/Mistral-7B-v0.1-6-layer) on the generator dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.0042
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - total_train_batch_size: 256
+ - total_eval_batch_size: 256
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 500
+ - training_steps: 20000
+
+ ### Training results
+
+ | Training Loss | Epoch  | Step  | Validation Loss |
+ |:-------------:|:------:|:-----:|:---------------:|
+ | 1.135         | 1.2361 | 5000  | 1.0484          |
+ | 0.9717        | 2.4722 | 10000 | 1.0058          |
+ | 0.8643        | 3.7083 | 15000 | 0.9966          |
+ | 0.8191        | 4.9444 | 20000 | 1.0042          |
+
+
+ ### Framework versions
+
+ - Transformers 4.40.1
+ - Pytorch 2.2.2+cu121
+ - Datasets 2.19.0
+ - Tokenizers 0.19.1
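The hyperparameters above map directly onto a TRL SFT run. Below is a minimal sketch of how such a run could be reconstructed, assuming TRL's `SFTTrainer` with sequence packing (the `generator` dataset name in the card is what the trainer reports for a packed dataset); the dataset, `max_seq_length`, and packing flag are illustrative assumptions, not values recorded in this commit.

```python
# Minimal sketch of an SFT run matching the hyperparameters in the card.
# Assumptions: the dataset, max_seq_length, and packing are NOT recorded in
# the card; they are illustrative placeholders only.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_id = "sanchit-gandhi/Mistral-7B-v0.1-6-layer"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder corpus with a plain "text" column; the real training data is
# only identified as "generator" in the card.
dataset = load_dataset("stas/openwebtext-10k", split="train")

args = TrainingArguments(
    output_dir="./",
    learning_rate=1e-4,              # learning_rate: 0.0001
    per_device_train_batch_size=32,  # x 8 GPUs = 256 total
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,                # lr_scheduler_warmup_steps: 500
    max_steps=20_000,                # training_steps: 20000
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",
    packing=True,          # assumed; packing yields the "generator" dataset name
    max_seq_length=2048,   # assumed; not recorded in the card
)
trainer.train()
```

Launched across 8 GPUs (e.g. `torchrun --nproc_per_node 8 train.py`), the per-device batch size of 32 yields the total train/eval batch sizes of 256 listed above.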
all_results.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "epoch": 4.944375772558715,
+ "eval_loss": 1.0042184591293335,
+ "eval_runtime": 1.489,
+ "eval_samples": 1000,
+ "eval_samples_per_second": 429.142,
+ "eval_steps_per_second": 2.015,
+ "total_flos": 9.058112140235663e+19,
+ "train_loss": 1.0958750234603882,
+ "train_runtime": 36068.0493,
+ "train_samples": 1467352,
+ "train_samples_per_second": 141.954,
+ "train_steps_per_second": 0.555
+ }
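The aggregate numbers are internally consistent. Taking the 20,000 training steps and total batch size of 256 from the model card, a quick check reproduces the reported throughputs:

```python
# Reproduce the throughput figures in all_results.json from the step count
# and total batch size recorded in the model card above.
train_runtime = 36068.0493  # seconds
steps, total_batch_size = 20_000, 256

print(round(steps / train_runtime, 3))                     # 0.555 -> train_steps_per_second
print(round(steps * total_batch_size / train_runtime, 3))  # 141.954 -> train_samples_per_second
```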
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 4.944375772558715,
+ "eval_loss": 1.0042184591293335,
+ "eval_runtime": 1.489,
+ "eval_samples": 1000,
+ "eval_samples_per_second": 429.142,
+ "eval_steps_per_second": 2.015
+ }
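Note that `eval_samples` (1,000) is the raw example count; the throughput figures follow from the 639 examples the trainer actually iterates over (see `Num examples = 639` in the output log further down, presumably after packing), processed in 3 batches at the total eval batch size of 256:

```python
# The eval throughputs follow from the 639 evaluation examples reported in
# the trainer log below, not from the 1,000 raw eval_samples.
import math

eval_runtime = 1.489  # seconds
num_examples = 639    # "Num examples = 639" in output.log
eval_steps = math.ceil(num_examples / 256)  # 3 batches

print(round(num_examples / eval_runtime, 3))  # ~429.147, matching the reported
                                              # 429.142 up to runtime rounding
print(round(eval_steps / eval_runtime, 3))    # 2.015 -> eval_steps_per_second
```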
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "transformers_version": "4.40.1"
+ }
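`generation_config.json` pins the special-token ids used at generation time (BOS = 1, EOS = 2, the Mistral tokenizer defaults) and is loaded automatically with the checkpoint. A minimal usage sketch; the repo id is a placeholder, since the final repository name is not shown in this commit view:

```python
# Sketch: generation_config.json ships with the checkpoint and is picked up
# by from_pretrained/generate automatically. The repo id is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

repo_id = "sanchit-gandhi/Mistral-7B-v0.1-6-layer"  # placeholder repo id
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

gen_config = GenerationConfig.from_pretrained(repo_id)
print(gen_config.bos_token_id, gen_config.eos_token_id)  # 1 2

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```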
runs/Apr25_13-44-28_ip-26-0-167-177/events.out.tfevents.1714088841.ip-26-0-167-177.156194.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4c675d67dcfdb8758ae560ca6a1d15dc4082b465472c0a6f3c3df3c5dc937f9d
+ size 364
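Large binaries such as this TensorBoard event file are stored with Git LFS, so the Git tree only carries this three-line pointer: the LFS spec version, the SHA-256 of the blob, and its size in bytes (364 here). Cloning without LFS yields the pointer text; to fetch the resolved file, `git lfs pull` or `huggingface_hub` works, for example:

```python
# Fetch the resolved file (not the LFS pointer) from the Hub; the repo id is
# a placeholder for wherever this commit lives.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="sanchit-gandhi/Mistral-7B-v0.1-6-layer",  # placeholder repo id
    filename="runs/Apr25_13-44-28_ip-26-0-167-177/events.out.tfevents.1714088841.ip-26-0-167-177.156194.1",
)
print(path)  # local cache path to the 364-byte tfevents file
```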
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 4.944375772558715,
+ "total_flos": 9.058112140235663e+19,
+ "train_loss": 1.0958750234603882,
+ "train_runtime": 36068.0493,
+ "train_samples": 1467352,
+ "train_samples_per_second": 141.954,
+ "train_steps_per_second": 0.555
+ }
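One detail worth noting: 20,000 steps over 4.944 epochs implies roughly 4,045 optimizer steps per epoch, i.e. about 1.04M sequences of batch 256 per pass, noticeably fewer than the 1,467,352 raw `train_samples`. This is consistent with (though not proof of) sequence packing, where several raw samples are concatenated into each fixed-length training sequence:

```python
# Rough consistency check: packed sequences per epoch vs. raw train samples.
epoch, steps, total_batch_size = 4.944375772558715, 20_000, 256

steps_per_epoch = steps / epoch
print(round(steps_per_epoch))                     # ~4045 steps per epoch
print(round(steps_per_epoch * total_batch_size))  # ~1,035,520 sequences per epoch
print(1_467_352)                                  # raw train_samples, ~1.42x larger
```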
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
wandb/debug-internal.log CHANGED
@@ -37464,3 +37464,19 @@
 2024-04-25 23:47:11,667 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
 2024-04-25 23:47:16,668 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
 2024-04-25 23:47:17,300 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+ 2024-04-25 23:47:21,588 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: partial_history
+ 2024-04-25 23:47:21,590 DEBUG SenderThread:156911 [sender.py:send():379] send: history
+ 2024-04-25 23:47:21,591 DEBUG SenderThread:156911 [sender.py:send_request():406] send_request: summary_record
+ 2024-04-25 23:47:21,593 INFO SenderThread:156911 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
+ 2024-04-25 23:47:21,676 DEBUG SenderThread:156911 [sender.py:send():379] send: stats
+ 2024-04-25 23:47:21,677 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
+ 2024-04-25 23:47:22,252 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:22,252 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/wandb-summary.json
+ 2024-04-25 23:47:22,329 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+ 2024-04-25 23:47:24,254 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:26,479 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: internal_messages
+ 2024-04-25 23:47:27,309 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
+ 2024-04-25 23:47:28,259 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:32,219 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+ 2024-04-25 23:47:32,264 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:33,304 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
wandb/run-20240425_134518-etajcxpg/files/output.log CHANGED
@@ -18823,3 +18823,38 @@
 Training completed. Do not forget to share your model on huggingface.co/models =)
 100%|██████████| 20000/20000 [10:01:02<00:00, 1.80s/it]
 [INFO|trainer.py:4035] 2024-04-25 23:46:24,979 >> Waiting for the current checkpoint push to be finished, this might take a couple of minutes.
+ {'train_runtime': 36068.0493, 'train_samples_per_second': 141.954, 'train_steps_per_second': 0.555, 'train_loss': 1.0958750234603882, 'epoch': 4.94}
+ ***** train metrics *****
+ epoch = 4.9444
+ total_flos = 84360243196GF
+ train_loss = 1.0959
+ train_runtime = 10:01:08.04
+ train_samples = 1467352
+ train_samples_per_second = 141.954
+ train_steps_per_second = 0.555
+ 2024-04-25 23:47:20 - INFO - __main__ - *** Evaluate ***
+ [INFO|trainer.py:3614] 2024-04-25 23:47:20,079 >> ***** Running Evaluation *****
+ [INFO|trainer.py:3616] 2024-04-25 23:47:20,079 >> Num examples = 639
+ [INFO|trainer.py:3619] 2024-04-25 23:47:20,079 >> Batch size = 32
+ 100%|██████████| 3/3 [00:00<00:00, 3.04it/s]
+ ***** eval metrics *****
+ epoch = 4.9444
+ eval_loss = 1.0042
+ eval_runtime = 0:00:01.48
+ eval_samples = 1000
+ eval_samples_per_second = 429.142
+ eval_steps_per_second = 2.015
+ 100%|██████████| 3/3 [00:01<00:00, 2.15it/s]
+ [INFO|trainer.py:3305] 2024-04-25 23:47:21,593 >> Saving model checkpoint to ./
+ [INFO|configuration_utils.py:471] 2024-04-25 23:47:21,595 >> Configuration saved in ./config.json
+ [INFO|configuration_utils.py:697] 2024-04-25 23:47:21,597 >> Configuration saved in ./generation_config.json
+ [INFO|modeling_utils.py:2590] 2024-04-25 23:47:26,277 >> Model weights saved in ./model.safetensors
+ [INFO|tokenization_utils_base.py:2488] 2024-04-25 23:47:26,279 >> tokenizer config file saved in ./tokenizer_config.json
+ [INFO|tokenization_utils_base.py:2497] 2024-04-25 23:47:26,281 >> Special tokens file saved in ./special_tokens_map.json
+ [INFO|trainer.py:3305] 2024-04-25 23:47:26,305 >> Saving model checkpoint to ./
+ [INFO|configuration_utils.py:471] 2024-04-25 23:47:26,306 >> Configuration saved in ./config.json
+ [INFO|configuration_utils.py:697] 2024-04-25 23:47:26,308 >> Configuration saved in ./generation_config.json
+ [INFO|modeling_utils.py:2590] 2024-04-25 23:47:31,217 >> Model weights saved in ./model.safetensors
+ [INFO|tokenization_utils_base.py:2488] 2024-04-25 23:47:31,220 >> tokenizer config file saved in ./tokenizer_config.json
+ [INFO|tokenization_utils_base.py:2497] 2024-04-25 23:47:31,221 >> Special tokens file saved in ./special_tokens_map.json
+ [INFO|modelcard.py:450] 2024-04-25 23:47:31,303 >> Dropping the following result as it does not have all the necessary fields:
wandb/run-20240425_134518-etajcxpg/files/wandb-summary.json CHANGED
@@ -1 +1 @@
- {"train/loss": 0.8191, "train/grad_norm": 0.6796875, "train/learning_rate": 0.0, "train/epoch": 4.944375772558715, "train/global_step": 20000, "_timestamp": 1714088784.9765599, "_runtime": 36066.73675084114, "_step": 805, "eval/loss": 1.0042184591293335, "eval/runtime": 1.5308, "eval/samples_per_second": 417.441, "eval/steps_per_second": 1.96, "train_runtime": 36068.0493, "train_samples_per_second": 141.954, "train_steps_per_second": 0.555, "total_flos": 9.058112140235663e+19, "train_loss": 1.0958750234603882}
+ {"train/loss": 0.8191, "train/grad_norm": 0.6796875, "train/learning_rate": 0.0, "train/epoch": 4.944375772558715, "train/global_step": 20000, "_timestamp": 1714088841.5880787, "_runtime": 36123.348269701004, "_step": 806, "eval/loss": 1.0042184591293335, "eval/runtime": 1.489, "eval/samples_per_second": 429.142, "eval/steps_per_second": 2.015, "train_runtime": 36068.0493, "train_samples_per_second": 141.954, "train_steps_per_second": 0.555, "total_flos": 9.058112140235663e+19, "train_loss": 1.0958750234603882}
wandb/run-20240425_134518-etajcxpg/logs/debug-internal.log CHANGED
@@ -37464,3 +37464,19 @@
 2024-04-25 23:47:11,667 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
 2024-04-25 23:47:16,668 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
 2024-04-25 23:47:17,300 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+ 2024-04-25 23:47:21,588 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: partial_history
+ 2024-04-25 23:47:21,590 DEBUG SenderThread:156911 [sender.py:send():379] send: history
+ 2024-04-25 23:47:21,591 DEBUG SenderThread:156911 [sender.py:send_request():406] send_request: summary_record
+ 2024-04-25 23:47:21,593 INFO SenderThread:156911 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
+ 2024-04-25 23:47:21,676 DEBUG SenderThread:156911 [sender.py:send():379] send: stats
+ 2024-04-25 23:47:21,677 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
+ 2024-04-25 23:47:22,252 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:22,252 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/wandb-summary.json
+ 2024-04-25 23:47:22,329 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+ 2024-04-25 23:47:24,254 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:26,479 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: internal_messages
+ 2024-04-25 23:47:27,309 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
+ 2024-04-25 23:47:28,259 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:32,219 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+ 2024-04-25 23:47:32,264 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:33,304 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
wandb/run-20240425_134518-etajcxpg/run-etajcxpg.wandb CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:5c6bcbd1336c1cd3ce20a2143fd84ab170d80ecdfd34b10c8b8012857a9b75b9
- size 9669284
+ oid sha256:8a76851c2a0cca9f9fa32a212b4f4df28d1dc0eaa2ac3e93dec53f87fc69e3d2
+ size 9699339