sanchit-gandhi HF staff committed
Commit
0a3a72c
β€’
1 Parent(s): ef66a63

Model save

README.md ADDED
@@ -0,0 +1,68 @@
+ ---
+ base_model: sanchit-gandhi/Mistral-7B-v0.1-6-layer
+ tags:
+ - trl
+ - sft
+ - generated_from_trainer
+ datasets:
+ - generator
+ model-index:
+ - name: sanchit-gandhi/Mistral-7B-v0.1-6-layer
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # sanchit-gandhi/Mistral-7B-v0.1-6-layer
+
+ This model is a fine-tuned version of [sanchit-gandhi/Mistral-7B-v0.1-6-layer](https://huggingface.co/sanchit-gandhi/Mistral-7B-v0.1-6-layer) on the generator dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.0042
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - total_train_batch_size: 256
+ - total_eval_batch_size: 256
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 500
+ - training_steps: 20000
+
+ ### Training results
+
+ | Training Loss | Epoch  | Step  | Validation Loss |
+ |:-------------:|:------:|:-----:|:---------------:|
+ | 1.135         | 1.2361 | 5000  | 1.0484          |
+ | 0.9717        | 2.4722 | 10000 | 1.0058          |
+ | 0.8643        | 3.7083 | 15000 | 0.9966          |
+ | 0.8191        | 4.9444 | 20000 | 1.0042          |
+
+
+ ### Framework versions
+
+ - Transformers 4.40.1
+ - Pytorch 2.2.2+cu121
+ - Datasets 2.19.0
+ - Tokenizers 0.19.1
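The hyperparameters above map directly onto a TRL SFT run. Below is a minimal sketch of how such a run could be reconstructed, assuming TRL's `SFTTrainer` with sequence packing (the `generator` dataset name in the card is what the trainer reports for a packed dataset); the dataset, `max_seq_length`, and packing flag are illustrative assumptions, not values recorded in this commit.

```python
# Minimal sketch of an SFT run matching the hyperparameters in the card.
# Assumptions: the dataset, max_seq_length, and packing are NOT recorded in
# the card; they are illustrative placeholders only.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_id = "sanchit-gandhi/Mistral-7B-v0.1-6-layer"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder corpus with a plain "text" column; the real training data is
# only identified as "generator" in the card.
dataset = load_dataset("stas/openwebtext-10k", split="train")

args = TrainingArguments(
    output_dir="./",
    learning_rate=1e-4,              # learning_rate: 0.0001
    per_device_train_batch_size=32,  # x 8 GPUs = 256 total
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,                # lr_scheduler_warmup_steps: 500
    max_steps=20_000,                # training_steps: 20000
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",
    packing=True,          # assumed; packing yields the "generator" dataset name
    max_seq_length=2048,   # assumed; not recorded in the card
)
trainer.train()
```

Launched across 8 GPUs (e.g. `torchrun --nproc_per_node 8 train.py`), the per-device batch size of 32 yields the total train/eval batch sizes of 256 listed above.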
all_results.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "epoch": 4.944375772558715,
+ "eval_loss": 1.0042184591293335,
+ "eval_runtime": 1.489,
+ "eval_samples": 1000,
+ "eval_samples_per_second": 429.142,
+ "eval_steps_per_second": 2.015,
+ "total_flos": 9.058112140235663e+19,
+ "train_loss": 1.0958750234603882,
+ "train_runtime": 36068.0493,
+ "train_samples": 1467352,
+ "train_samples_per_second": 141.954,
+ "train_steps_per_second": 0.555
+ }
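The aggregate numbers are internally consistent. Taking the 20,000 training steps and total batch size of 256 from the model card, a quick check reproduces the reported throughputs:

```python
# Reproduce the throughput figures in all_results.json from the step count
# and total batch size recorded in the model card above.
train_runtime = 36068.0493  # seconds
steps, total_batch_size = 20_000, 256

print(round(steps / train_runtime, 3))                     # 0.555 -> train_steps_per_second
print(round(steps * total_batch_size / train_runtime, 3))  # 141.954 -> train_samples_per_second
```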
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 4.944375772558715,
+ "eval_loss": 1.0042184591293335,
+ "eval_runtime": 1.489,
+ "eval_samples": 1000,
+ "eval_samples_per_second": 429.142,
+ "eval_steps_per_second": 2.015
+ }
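Note that `eval_samples` (1,000) is the raw example count; the throughput figures follow from the 639 examples the trainer actually iterates over (see `Num examples = 639` in the output log further down, presumably after packing), processed in 3 batches at the total eval batch size of 256:

```python
# The eval throughputs follow from the 639 evaluation examples reported in
# the trainer log below, not from the 1,000 raw eval_samples.
import math

eval_runtime = 1.489  # seconds
num_examples = 639    # "Num examples = 639" in output.log
eval_steps = math.ceil(num_examples / 256)  # 3 batches

print(round(num_examples / eval_runtime, 3))  # ~429.147, matching the reported
                                              # 429.142 up to runtime rounding
print(round(eval_steps / eval_runtime, 3))    # 2.015 -> eval_steps_per_second
```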
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "transformers_version": "4.40.1"
+ }
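`generation_config.json` pins the special-token ids used at generation time (BOS = 1, EOS = 2, the Mistral tokenizer defaults) and is loaded automatically with the checkpoint. A minimal usage sketch; the repo id is a placeholder, since the final repository name is not shown in this commit view:

```python
# Sketch: generation_config.json ships with the checkpoint and is picked up
# by from_pretrained/generate automatically. The repo id is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

repo_id = "sanchit-gandhi/Mistral-7B-v0.1-6-layer"  # placeholder repo id
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

gen_config = GenerationConfig.from_pretrained(repo_id)
print(gen_config.bos_token_id, gen_config.eos_token_id)  # 1 2

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```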
runs/Apr25_13-44-28_ip-26-0-167-177/events.out.tfevents.1714088841.ip-26-0-167-177.156194.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4c675d67dcfdb8758ae560ca6a1d15dc4082b465472c0a6f3c3df3c5dc937f9d
+ size 364
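Large binaries such as this TensorBoard event file are stored with Git LFS, so the Git tree only carries this three-line pointer: the LFS spec version, the SHA-256 of the blob, and its size in bytes (364 here). Cloning without LFS yields the pointer text; to fetch the resolved file, `git lfs pull` or `huggingface_hub` works, for example:

```python
# Fetch the resolved file (not the LFS pointer) from the Hub; the repo id is
# a placeholder for wherever this commit lives.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="sanchit-gandhi/Mistral-7B-v0.1-6-layer",  # placeholder repo id
    filename="runs/Apr25_13-44-28_ip-26-0-167-177/events.out.tfevents.1714088841.ip-26-0-167-177.156194.1",
)
print(path)  # local cache path to the 364-byte tfevents file
```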
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 4.944375772558715,
+ "total_flos": 9.058112140235663e+19,
+ "train_loss": 1.0958750234603882,
+ "train_runtime": 36068.0493,
+ "train_samples": 1467352,
+ "train_samples_per_second": 141.954,
+ "train_steps_per_second": 0.555
+ }
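One detail worth noting: 20,000 steps over 4.944 epochs implies roughly 4,045 optimizer steps per epoch, i.e. about 1.04M sequences of batch 256 per pass, noticeably fewer than the 1,467,352 raw `train_samples`. This is consistent with (though not proof of) sequence packing, where several raw samples are concatenated into each fixed-length training sequence:

```python
# Rough consistency check: packed sequences per epoch vs. raw train samples.
epoch, steps, total_batch_size = 4.944375772558715, 20_000, 256

steps_per_epoch = steps / epoch
print(round(steps_per_epoch))                     # ~4045 steps per epoch
print(round(steps_per_epoch * total_batch_size))  # ~1,035,520 sequences per epoch
print(1_467_352)                                  # raw train_samples, ~1.42x larger
```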
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
wandb/debug-internal.log CHANGED
@@ -37464,3 +37464,19 @@
 2024-04-25 23:47:11,667 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
 2024-04-25 23:47:16,668 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
 2024-04-25 23:47:17,300 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+ 2024-04-25 23:47:21,588 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: partial_history
+ 2024-04-25 23:47:21,590 DEBUG SenderThread:156911 [sender.py:send():379] send: history
+ 2024-04-25 23:47:21,591 DEBUG SenderThread:156911 [sender.py:send_request():406] send_request: summary_record
+ 2024-04-25 23:47:21,593 INFO SenderThread:156911 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
+ 2024-04-25 23:47:21,676 DEBUG SenderThread:156911 [sender.py:send():379] send: stats
+ 2024-04-25 23:47:21,677 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
+ 2024-04-25 23:47:22,252 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:22,252 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/wandb-summary.json
+ 2024-04-25 23:47:22,329 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+ 2024-04-25 23:47:24,254 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:26,479 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: internal_messages
+ 2024-04-25 23:47:27,309 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
+ 2024-04-25 23:47:28,259 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:32,219 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+ 2024-04-25 23:47:32,264 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:33,304 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
wandb/run-20240425_134518-etajcxpg/files/output.log CHANGED
@@ -18823,3 +18823,38 @@
 Training completed. Do not forget to share your model on huggingface.co/models =)
 100%|██████████| 20000/20000 [10:01:02<00:00, 1.80s/it]
 [INFO|trainer.py:4035] 2024-04-25 23:46:24,979 >> Waiting for the current checkpoint push to be finished, this might take a couple of minutes.
+ {'train_runtime': 36068.0493, 'train_samples_per_second': 141.954, 'train_steps_per_second': 0.555, 'train_loss': 1.0958750234603882, 'epoch': 4.94}
+ ***** train metrics *****
+ epoch = 4.9444
+ total_flos = 84360243196GF
+ train_loss = 1.0959
+ train_runtime = 10:01:08.04
+ train_samples = 1467352
+ train_samples_per_second = 141.954
+ train_steps_per_second = 0.555
+ 2024-04-25 23:47:20 - INFO - __main__ - *** Evaluate ***
+ [INFO|trainer.py:3614] 2024-04-25 23:47:20,079 >> ***** Running Evaluation *****
+ [INFO|trainer.py:3616] 2024-04-25 23:47:20,079 >> Num examples = 639
+ [INFO|trainer.py:3619] 2024-04-25 23:47:20,079 >> Batch size = 32
+ 100%|██████████| 3/3 [00:00<00:00, 3.04it/s]
+ ***** eval metrics *****
+ epoch = 4.9444
+ eval_loss = 1.0042
+ eval_runtime = 0:00:01.48
+ eval_samples = 1000
+ eval_samples_per_second = 429.142
+ eval_steps_per_second = 2.015
+ 100%|██████████| 3/3 [00:01<00:00, 2.15it/s]
+ [INFO|trainer.py:3305] 2024-04-25 23:47:21,593 >> Saving model checkpoint to ./
+ [INFO|configuration_utils.py:471] 2024-04-25 23:47:21,595 >> Configuration saved in ./config.json
+ [INFO|configuration_utils.py:697] 2024-04-25 23:47:21,597 >> Configuration saved in ./generation_config.json
+ [INFO|modeling_utils.py:2590] 2024-04-25 23:47:26,277 >> Model weights saved in ./model.safetensors
+ [INFO|tokenization_utils_base.py:2488] 2024-04-25 23:47:26,279 >> tokenizer config file saved in ./tokenizer_config.json
+ [INFO|tokenization_utils_base.py:2497] 2024-04-25 23:47:26,281 >> Special tokens file saved in ./special_tokens_map.json
+ [INFO|trainer.py:3305] 2024-04-25 23:47:26,305 >> Saving model checkpoint to ./
+ [INFO|configuration_utils.py:471] 2024-04-25 23:47:26,306 >> Configuration saved in ./config.json
+ [INFO|configuration_utils.py:697] 2024-04-25 23:47:26,308 >> Configuration saved in ./generation_config.json
+ [INFO|modeling_utils.py:2590] 2024-04-25 23:47:31,217 >> Model weights saved in ./model.safetensors
+ [INFO|tokenization_utils_base.py:2488] 2024-04-25 23:47:31,220 >> tokenizer config file saved in ./tokenizer_config.json
+ [INFO|tokenization_utils_base.py:2497] 2024-04-25 23:47:31,221 >> Special tokens file saved in ./special_tokens_map.json
+ [INFO|modelcard.py:450] 2024-04-25 23:47:31,303 >> Dropping the following result as it does not have all the necessary fields:
wandb/run-20240425_134518-etajcxpg/files/wandb-summary.json CHANGED
@@ -1 +1 @@
- {"train/loss": 0.8191, "train/grad_norm": 0.6796875, "train/learning_rate": 0.0, "train/epoch": 4.944375772558715, "train/global_step": 20000, "_timestamp": 1714088784.9765599, "_runtime": 36066.73675084114, "_step": 805, "eval/loss": 1.0042184591293335, "eval/runtime": 1.5308, "eval/samples_per_second": 417.441, "eval/steps_per_second": 1.96, "train_runtime": 36068.0493, "train_samples_per_second": 141.954, "train_steps_per_second": 0.555, "total_flos": 9.058112140235663e+19, "train_loss": 1.0958750234603882}
+ {"train/loss": 0.8191, "train/grad_norm": 0.6796875, "train/learning_rate": 0.0, "train/epoch": 4.944375772558715, "train/global_step": 20000, "_timestamp": 1714088841.5880787, "_runtime": 36123.348269701004, "_step": 806, "eval/loss": 1.0042184591293335, "eval/runtime": 1.489, "eval/samples_per_second": 429.142, "eval/steps_per_second": 2.015, "train_runtime": 36068.0493, "train_samples_per_second": 141.954, "train_steps_per_second": 0.555, "total_flos": 9.058112140235663e+19, "train_loss": 1.0958750234603882}
wandb/run-20240425_134518-etajcxpg/logs/debug-internal.log CHANGED
@@ -37464,3 +37464,19 @@
 2024-04-25 23:47:11,667 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
 2024-04-25 23:47:16,668 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
 2024-04-25 23:47:17,300 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+ 2024-04-25 23:47:21,588 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: partial_history
+ 2024-04-25 23:47:21,590 DEBUG SenderThread:156911 [sender.py:send():379] send: history
+ 2024-04-25 23:47:21,591 DEBUG SenderThread:156911 [sender.py:send_request():406] send_request: summary_record
+ 2024-04-25 23:47:21,593 INFO SenderThread:156911 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
+ 2024-04-25 23:47:21,676 DEBUG SenderThread:156911 [sender.py:send():379] send: stats
+ 2024-04-25 23:47:21,677 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
+ 2024-04-25 23:47:22,252 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:22,252 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/wandb-summary.json
+ 2024-04-25 23:47:22,329 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+ 2024-04-25 23:47:24,254 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:26,479 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: internal_messages
+ 2024-04-25 23:47:27,309 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
+ 2024-04-25 23:47:28,259 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:32,219 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+ 2024-04-25 23:47:32,264 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+ 2024-04-25 23:47:33,304 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
wandb/run-20240425_134518-etajcxpg/run-etajcxpg.wandb CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:5c6bcbd1336c1cd3ce20a2143fd84ab170d80ecdfd34b10c8b8012857a9b75b9
- size 9669284
+ oid sha256:8a76851c2a0cca9f9fa32a212b4f4df28d1dc0eaa2ac3e93dec53f87fc69e3d2
+ size 9699339