Commit • 0a3a72c
1 Parent(s): ef66a63
Model save

- README.md +68 -0
- all_results.json +14 -0
- eval_results.json +8 -0
- generation_config.json +6 -0
- runs/Apr25_13-44-28_ip-26-0-167-177/events.out.tfevents.1714088841.ip-26-0-167-177.156194.1 +3 -0
- train_results.json +9 -0
- trainer_state.json +0 -0
- wandb/debug-internal.log +16 -0
- wandb/run-20240425_134518-etajcxpg/files/output.log +35 -0
- wandb/run-20240425_134518-etajcxpg/files/wandb-summary.json +1 -1
- wandb/run-20240425_134518-etajcxpg/logs/debug-internal.log +16 -0
- wandb/run-20240425_134518-etajcxpg/run-etajcxpg.wandb +2 -2
README.md ADDED
@@ -0,0 +1,68 @@
+---
+base_model: sanchit-gandhi/Mistral-7B-v0.1-6-layer
+tags:
+- trl
+- sft
+- generated_from_trainer
+datasets:
+- generator
+model-index:
+- name: sanchit-gandhi/Mistral-7B-v0.1-6-layer
+  results: []
+---
+
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+
+# sanchit-gandhi/Mistral-7B-v0.1-6-layer
+
+This model is a fine-tuned version of [sanchit-gandhi/Mistral-7B-v0.1-6-layer](https://huggingface.co/sanchit-gandhi/Mistral-7B-v0.1-6-layer) on the generator dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.0042
+
+## Model description
+
+More information needed
+
+## Intended uses & limitations
+
+More information needed
+
+## Training and evaluation data
+
+More information needed
+
+## Training procedure
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 32
+- eval_batch_size: 32
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- total_train_batch_size: 256
+- total_eval_batch_size: 256
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 500
+- training_steps: 20000
+
+### Training results
+
+| Training Loss | Epoch  | Step  | Validation Loss |
+|:-------------:|:------:|:-----:|:---------------:|
+| 1.135         | 1.2361 | 5000  | 1.0484          |
+| 0.9717        | 2.4722 | 10000 | 1.0058          |
+| 0.8643        | 3.7083 | 15000 | 0.9966          |
+| 0.8191        | 4.9444 | 20000 | 1.0042          |
+
+
+### Framework versions
+
+- Transformers 4.40.1
+- Pytorch 2.2.2+cu121
+- Datasets 2.19.0
+- Tokenizers 0.19.1
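The hyperparameter block in the card above maps directly onto `transformers.TrainingArguments`. Below is a minimal sketch of that mapping; the actual training script is not part of this commit, so any argument not listed in the card (such as `output_dir`) is an assumption.

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the hyperparameters listed in the model card above.
# Arguments not in the card are assumptions, marked as such.
training_args = TrainingArguments(
    output_dir="./",                 # assumption: logs below show checkpoints saved to the repo root
    learning_rate=1e-4,              # learning_rate: 0.0001
    per_device_train_batch_size=32,  # train_batch_size: 32 (x 8 devices = 256 total)
    per_device_eval_batch_size=32,   # eval_batch_size: 32 (x 8 devices = 256 total)
    seed=42,
    adam_beta1=0.9,                  # optimizer: Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # ... and epsilon=1e-08
    lr_scheduler_type="linear",
    warmup_steps=500,                # lr_scheduler_warmup_steps: 500
    max_steps=20_000,                # training_steps: 20000
)
```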
all_results.json ADDED
@@ -0,0 +1,14 @@
+{
+    "epoch": 4.944375772558715,
+    "eval_loss": 1.0042184591293335,
+    "eval_runtime": 1.489,
+    "eval_samples": 1000,
+    "eval_samples_per_second": 429.142,
+    "eval_steps_per_second": 2.015,
+    "total_flos": 9.058112140235663e+19,
+    "train_loss": 1.0958750234603882,
+    "train_runtime": 36068.0493,
+    "train_samples": 1467352,
+    "train_samples_per_second": 141.954,
+    "train_steps_per_second": 0.555
+}
eval_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 4.944375772558715,
+    "eval_loss": 1.0042184591293335,
+    "eval_runtime": 1.489,
+    "eval_samples": 1000,
+    "eval_samples_per_second": 429.142,
+    "eval_steps_per_second": 2.015
+}
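As a quick consistency check, the evaluation rates follow from the evaluation log further down: 639 examples processed in 3 batches over the 1.489 s runtime. Note that `eval_samples` (1000) appears to count raw samples before dataset packing, so it does not reproduce the rate directly.

```python
eval_runtime = 1.489  # seconds, from eval_results.json
num_examples = 639    # "Num examples = 639" in output.log below
eval_steps = 3        # the "3/3" evaluation progress bar in output.log below

print(num_examples / eval_runtime)  # ~429.1, matching eval_samples_per_second = 429.142
print(eval_steps / eval_runtime)    # ~2.015, matching eval_steps_per_second = 2.015
```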
generation_config.json ADDED
@@ -0,0 +1,6 @@
+{
+    "_from_model_config": true,
+    "bos_token_id": 1,
+    "eos_token_id": 2,
+    "transformers_version": "4.40.1"
+}
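When the model is loaded with `transformers`, this `generation_config.json` is picked up automatically and its fields become the defaults for `generate()`. A minimal usage sketch; the repo id here is illustrative, so substitute the repository this commit belongs to.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative repo id (the base model named in the card); not necessarily this commit's repo.
repo_id = "sanchit-gandhi/Mistral-7B-v0.1-6-layer"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# generation_config.json is loaded alongside the weights, so bos_token_id=1 and
# eos_token_id=2 above act as the defaults for model.generate().
inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```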
runs/Apr25_13-44-28_ip-26-0-167-177/events.out.tfevents.1714088841.ip-26-0-167-177.156194.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4c675d67dcfdb8758ae560ca6a1d15dc4082b465472c0a6f3c3df3c5dc937f9d
+size 364
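The TensorBoard event file is tracked with Git LFS, so the commit stores only this three-line pointer (spec version, the SHA-256 of the blob, and its size in bytes) rather than the binary itself. A small parsing sketch, for illustration only:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a git-lfs pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:4c675d67dcfdb8758ae560ca6a1d15dc4082b465472c0a6f3c3df3c5dc937f9d
size 364"""
print(parse_lfs_pointer(pointer)["size"])  # '364'
```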
train_results.json ADDED
@@ -0,0 +1,9 @@
+{
+    "epoch": 4.944375772558715,
+    "total_flos": 9.058112140235663e+19,
+    "train_loss": 1.0958750234603882,
+    "train_runtime": 36068.0493,
+    "train_samples": 1467352,
+    "train_samples_per_second": 141.954,
+    "train_steps_per_second": 0.555
+}
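These figures are internally consistent with the README hyperparameters: 20,000 optimizer steps at a total batch size of 256 means 5,120,000 samples were consumed, and dividing by the runtime reproduces both reported rates.

```python
train_runtime = 36068.0493  # seconds, from train_results.json
training_steps = 20_000     # from the README
total_batch_size = 256      # 32 per device x 8 devices

print(training_steps / train_runtime)                    # ~0.555, matching train_steps_per_second
print(training_steps * total_batch_size / train_runtime) # ~141.95, matching train_samples_per_second
```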
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff.
wandb/debug-internal.log CHANGED
@@ -37464,3 +37464,19 @@
 2024-04-25 23:47:11,667 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
 2024-04-25 23:47:16,668 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
 2024-04-25 23:47:17,300 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+2024-04-25 23:47:21,588 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: partial_history
+2024-04-25 23:47:21,590 DEBUG SenderThread:156911 [sender.py:send():379] send: history
+2024-04-25 23:47:21,591 DEBUG SenderThread:156911 [sender.py:send_request():406] send_request: summary_record
+2024-04-25 23:47:21,593 INFO SenderThread:156911 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
+2024-04-25 23:47:21,676 DEBUG SenderThread:156911 [sender.py:send():379] send: stats
+2024-04-25 23:47:21,677 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
+2024-04-25 23:47:22,252 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+2024-04-25 23:47:22,252 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/wandb-summary.json
+2024-04-25 23:47:22,329 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+2024-04-25 23:47:24,254 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+2024-04-25 23:47:26,479 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: internal_messages
+2024-04-25 23:47:27,309 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
+2024-04-25 23:47:28,259 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+2024-04-25 23:47:32,219 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+2024-04-25 23:47:32,264 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+2024-04-25 23:47:33,304 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
wandb/run-20240425_134518-etajcxpg/files/output.log CHANGED
@@ -18823,3 +18823,38 @@
 Training completed. Do not forget to share your model on huggingface.co/models =)
 100%|██████████| 20000/20000 [10:01:02<00:00, 1.80s/it]
 [INFO|trainer.py:4035] 2024-04-25 23:46:24,979 >> Waiting for the current checkpoint push to be finished, this might take a couple of minutes.
+{'train_runtime': 36068.0493, 'train_samples_per_second': 141.954, 'train_steps_per_second': 0.555, 'train_loss': 1.0958750234603882, 'epoch': 4.94}
+***** train metrics *****
+  epoch                    = 4.9444
+  total_flos               = 84360243196GF
+  train_loss               = 1.0959
+  train_runtime            = 10:01:08.04
+  train_samples            = 1467352
+  train_samples_per_second = 141.954
+  train_steps_per_second   = 0.555
+2024-04-25 23:47:20 - INFO - __main__ - *** Evaluate ***
+[INFO|trainer.py:3614] 2024-04-25 23:47:20,079 >> ***** Running Evaluation *****
+[INFO|trainer.py:3616] 2024-04-25 23:47:20,079 >>   Num examples = 639
+[INFO|trainer.py:3619] 2024-04-25 23:47:20,079 >>   Batch size = 32
+100%|██████████| 3/3 [00:00<00:00, 3.04it/s]
+***** eval metrics *****
+  epoch                   = 4.9444
+  eval_loss               = 1.0042
+  eval_runtime            = 0:00:01.48
+  eval_samples            = 1000
+  eval_samples_per_second = 429.142
+  eval_steps_per_second   = 2.015
+100%|██████████| 3/3 [00:01<00:00, 2.15it/s]
+[INFO|trainer.py:3305] 2024-04-25 23:47:21,593 >> Saving model checkpoint to ./
+[INFO|configuration_utils.py:471] 2024-04-25 23:47:21,595 >> Configuration saved in ./config.json
+[INFO|configuration_utils.py:697] 2024-04-25 23:47:21,597 >> Configuration saved in ./generation_config.json
+[INFO|modeling_utils.py:2590] 2024-04-25 23:47:26,277 >> Model weights saved in ./model.safetensors
+[INFO|tokenization_utils_base.py:2488] 2024-04-25 23:47:26,279 >> tokenizer config file saved in ./tokenizer_config.json
+[INFO|tokenization_utils_base.py:2497] 2024-04-25 23:47:26,281 >> Special tokens file saved in ./special_tokens_map.json
+[INFO|trainer.py:3305] 2024-04-25 23:47:26,305 >> Saving model checkpoint to ./
+[INFO|configuration_utils.py:471] 2024-04-25 23:47:26,306 >> Configuration saved in ./config.json
+[INFO|configuration_utils.py:697] 2024-04-25 23:47:26,308 >> Configuration saved in ./generation_config.json
+[INFO|modeling_utils.py:2590] 2024-04-25 23:47:31,217 >> Model weights saved in ./model.safetensors
+[INFO|tokenization_utils_base.py:2488] 2024-04-25 23:47:31,220 >> tokenizer config file saved in ./tokenizer_config.json
+[INFO|tokenization_utils_base.py:2497] 2024-04-25 23:47:31,221 >> Special tokens file saved in ./special_tokens_map.json
+[INFO|modelcard.py:450] 2024-04-25 23:47:31,303 >> Dropping the following result as it does not have all the necessary fields:
wandb/run-20240425_134518-etajcxpg/files/wandb-summary.json CHANGED
@@ -1 +1 @@
-{"train/loss": 0.8191, "train/grad_norm": 0.6796875, "train/learning_rate": 0.0, "train/epoch": 4.944375772558715, "train/global_step": 20000, "_timestamp":
+{"train/loss": 0.8191, "train/grad_norm": 0.6796875, "train/learning_rate": 0.0, "train/epoch": 4.944375772558715, "train/global_step": 20000, "_timestamp": 1714088841.5880787, "_runtime": 36123.348269701004, "_step": 806, "eval/loss": 1.0042184591293335, "eval/runtime": 1.489, "eval/samples_per_second": 429.142, "eval/steps_per_second": 2.015, "train_runtime": 36068.0493, "train_samples_per_second": 141.954, "train_steps_per_second": 0.555, "total_flos": 9.058112140235663e+19, "train_loss": 1.0958750234603882}
wandb/run-20240425_134518-etajcxpg/logs/debug-internal.log CHANGED
@@ -37464,3 +37464,19 @@
 2024-04-25 23:47:11,667 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
 2024-04-25 23:47:16,668 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
 2024-04-25 23:47:17,300 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+2024-04-25 23:47:21,588 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: partial_history
+2024-04-25 23:47:21,590 DEBUG SenderThread:156911 [sender.py:send():379] send: history
+2024-04-25 23:47:21,591 DEBUG SenderThread:156911 [sender.py:send_request():406] send_request: summary_record
+2024-04-25 23:47:21,593 INFO SenderThread:156911 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
+2024-04-25 23:47:21,676 DEBUG SenderThread:156911 [sender.py:send():379] send: stats
+2024-04-25 23:47:21,677 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
+2024-04-25 23:47:22,252 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+2024-04-25 23:47:22,252 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/wandb-summary.json
+2024-04-25 23:47:22,329 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+2024-04-25 23:47:24,254 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+2024-04-25 23:47:26,479 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: internal_messages
+2024-04-25 23:47:27,309 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
+2024-04-25 23:47:28,259 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+2024-04-25 23:47:32,219 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: keepalive
+2024-04-25 23:47:32,264 INFO Thread-12 :156911 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_134518-etajcxpg/files/output.log
+2024-04-25 23:47:33,304 DEBUG HandlerThread:156911 [handler.py:handle_request():146] handle_request: status_report
wandb/run-20240425_134518-etajcxpg/run-etajcxpg.wandb CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:8a76851c2a0cca9f9fa32a212b4f4df28d1dc0eaa2ac3e93dec53f87fc69e3d2
+size 9699339