xezpeleta committed
Commit dedcdce
1 Parent(s): 866bdd5

End of training

Files changed (42)
  1. .gitattributes +1 -0
  2. README.md +64 -0
  3. all_results.json +8 -0
  4. eval_results.json +8 -0
  5. generation_config.json +265 -0
  6. model-00001-of-00002.safetensors +1 -1
  7. model-00002-of-00002.safetensors +1 -1
  8. run.sh +0 -1
  9. runs/Oct07_10-30-48_tknika/events.out.tfevents.1728297734.tknika.20799.0 +3 -0
  10. training_args.bin +1 -1
  11. wandb/debug-internal.log +10 -10
  12. wandb/debug.log +27 -27
  13. wandb/run-20241005_141414-821qpm7o/files/config.yaml +563 -0
  14. wandb/run-20241005_141414-821qpm7o/files/output.log +0 -0
  15. wandb/run-20241005_141414-821qpm7o/files/wandb-summary.json +1 -0
  16. wandb/run-20241005_141414-821qpm7o/logs/debug-core.log +7 -0
  17. wandb/run-20241005_141414-821qpm7o/logs/debug-internal.log +9 -0
  18. wandb/run-20241005_141414-821qpm7o/logs/debug.log +1 -0
  19. wandb/run-20241005_141414-821qpm7o/run-821qpm7o.wandb +2 -2
  20. wandb/run-20241007_102112-r5qja96d/files/config.yaml +508 -0
  21. wandb/run-20241007_102112-r5qja96d/files/output.log +66 -0
  22. wandb/run-20241007_102112-r5qja96d/files/wandb-metadata.json +87 -0
  23. wandb/run-20241007_102112-r5qja96d/files/wandb-summary.json +1 -0
  24. wandb/run-20241007_102112-r5qja96d/logs/debug-core.log +14 -0
  25. wandb/run-20241007_102112-r5qja96d/logs/debug-internal.log +18 -0
  26. wandb/run-20241007_102112-r5qja96d/logs/debug.log +29 -0
  27. wandb/run-20241007_102112-r5qja96d/run-r5qja96d.wandb +0 -0
  28. wandb/run-20241007_102233-fvsz65yu/files/config.yaml +515 -0
  29. wandb/run-20241007_102233-fvsz65yu/files/output.log +68 -0
  30. wandb/run-20241007_102233-fvsz65yu/files/wandb-metadata.json +87 -0
  31. wandb/run-20241007_102233-fvsz65yu/files/wandb-summary.json +1 -0
  32. wandb/run-20241007_102233-fvsz65yu/logs/debug-core.log +14 -0
  33. wandb/run-20241007_102233-fvsz65yu/logs/debug-internal.log +18 -0
  34. wandb/run-20241007_102233-fvsz65yu/logs/debug.log +29 -0
  35. wandb/run-20241007_102233-fvsz65yu/run-fvsz65yu.wandb +0 -0
  36. wandb/run-20241007_125615-a3z1jk8c/files/output.log +32 -0
  37. wandb/run-20241007_125615-a3z1jk8c/files/requirements.txt +94 -0
  38. wandb/run-20241007_125615-a3z1jk8c/files/wandb-metadata.json +85 -0
  39. wandb/run-20241007_125615-a3z1jk8c/logs/debug-core.log +7 -0
  40. wandb/run-20241007_125615-a3z1jk8c/logs/debug-internal.log +10 -0
  41. wandb/run-20241007_125615-a3z1jk8c/logs/debug.log +28 -0
  42. wandb/run-20241007_125615-a3z1jk8c/run-a3z1jk8c.wandb +0 -0
.gitattributes CHANGED
@@ -33,4 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ wandb/run-20241005_141414-821qpm7o/files/output.log filter=lfs diff=lfs merge=lfs -text
  wandb/run-20241005_141414-821qpm7o/run-821qpm7o.wandb filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,64 @@
+ ---
+ library_name: transformers
+ language:
+ - eu
+ license: apache-2.0
+ base_model: openai/whisper-large-v3
+ tags:
+ - whisper-event
+ - generated_from_trainer
+ datasets:
+ - mozilla-foundation/common_voice_17_0
+ model-index:
+ - name: Whisper Large Basque
+   results: []
+ ---
+ 
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+ 
+ # Whisper Large Basque
+ 
+ This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the mozilla-foundation/common_voice_17_0 eu dataset.
+ It achieves the following results on the evaluation set:
+ - eval_loss: 0.9278
+ - eval_model_preparation_time: 0.0102
+ - eval_wer: 44.2953
+ - eval_runtime: 4165.1595
+ - eval_samples_per_second: 3.272
+ - eval_steps_per_second: 0.409
+ - step: 0
+ 
+ ## Model description
+ 
+ More information needed
+ 
+ ## Intended uses & limitations
+ 
+ More information needed
+ 
+ ## Training and evaluation data
+ 
+ More information needed
+ 
+ ## Training procedure
+ 
+ ### Training hyperparameters
+ 
+ The following hyperparameters were used during training:
+ - learning_rate: 4.375e-06
+ - train_batch_size: 16
+ - eval_batch_size: 8
+ - seed: 42
+ - optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 500
+ - training_steps: 10000
+ - mixed_precision_training: Native AMP
+ 
+ ### Framework versions
+ 
+ - Transformers 4.46.0.dev0
+ - Pytorch 2.4.1+cu121
+ - Datasets 3.0.2.dev0
+ - Tokenizers 0.20.0
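
Since the auto-generated card has no usage section, here is a minimal inference sketch with the Transformers pipeline. Hedged assumptions: the repo id below is hypothetical, inferred from the run name whisper-large-eu in this commit, and the audio path is a placeholder you supply.

```python
# Minimal usage sketch (assumptions: the checkpoint is published as
# "xezpeleta/whisper-large-eu" -- a hypothetical Hub id inferred from the run
# name -- and "audio_eu.wav" is a local Basque audio file you provide).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="xezpeleta/whisper-large-eu",  # hypothetical repo id; substitute the real one
    generate_kwargs={"language": "basque", "task": "transcribe"},
)

print(asr("audio_eu.wav")["text"])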
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "eval_loss": 0.9277587532997131,
+     "eval_model_preparation_time": 0.0102,
+     "eval_runtime": 4165.1595,
+     "eval_samples_per_second": 3.272,
+     "eval_steps_per_second": 0.409,
+     "eval_wer": 44.29532045879292
+ }
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "eval_loss": 0.9277587532997131,
+     "eval_model_preparation_time": 0.0102,
+     "eval_runtime": 4165.1595,
+     "eval_samples_per_second": 3.272,
+     "eval_steps_per_second": 0.409,
+     "eval_wer": 44.29532045879292
+ }
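
The eval_wer field is word error rate expressed as a percentage. A sketch of how such a figure is typically computed with Hugging Face's `evaluate` library (an assumption: the exact metric wiring of run_speech_recognition_seq2seq_streaming.py is not part of this diff, and the sentences below are hypothetical):

```python
# Hedged sketch: WER as usually computed in HF ASR examples via `evaluate`.
import evaluate

wer = evaluate.load("wer")

predictions = ["kaixo mundua", "zer moduz"]        # hypothetical model outputs
references = ["kaixo mundua", "zer moduz zaude"]   # hypothetical ground truth

# WER = (substitutions + deletions + insertions) / words in the references;
# multiplying by 100 gives the percentage form used in eval_wer above.
print(100 * wer.compute(predictions=predictions, references=references))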
generation_config.json ADDED
@@ -0,0 +1,265 @@
+ {
+     "alignment_heads": [[7, 0], [10, 17], [12, 18], [13, 12], [16, 1], [17, 14], [19, 11], [21, 4], [24, 1], [25, 6]],
+     "begin_suppress_tokens": [220, 50257],
+     "bos_token_id": 50257,
+     "decoder_start_token_id": 50258,
+     "eos_token_id": 50257,
+     "forced_decoder_ids": [[1, null], [2, 50360]],
+     "is_multilingual": true,
+     "lang_to_id": {
+         "<|af|>": 50327, "<|am|>": 50334, "<|ar|>": 50272, "<|as|>": 50350, "<|az|>": 50304, "<|ba|>": 50355,
+         "<|be|>": 50330, "<|bg|>": 50292, "<|bn|>": 50302, "<|bo|>": 50347, "<|br|>": 50309, "<|bs|>": 50315,
+         "<|ca|>": 50270, "<|cs|>": 50283, "<|cy|>": 50297, "<|da|>": 50285, "<|de|>": 50261, "<|el|>": 50281,
+         "<|en|>": 50259, "<|es|>": 50262, "<|et|>": 50307, "<|eu|>": 50310, "<|fa|>": 50300, "<|fi|>": 50277,
+         "<|fo|>": 50338, "<|fr|>": 50265, "<|gl|>": 50319, "<|gu|>": 50333, "<|haw|>": 50352, "<|ha|>": 50354,
+         "<|he|>": 50279, "<|hi|>": 50276, "<|hr|>": 50291, "<|ht|>": 50339, "<|hu|>": 50286, "<|hy|>": 50312,
+         "<|id|>": 50275, "<|is|>": 50311, "<|it|>": 50274, "<|ja|>": 50266, "<|jw|>": 50356, "<|ka|>": 50329,
+         "<|kk|>": 50316, "<|km|>": 50323, "<|kn|>": 50306, "<|ko|>": 50264, "<|la|>": 50294, "<|lb|>": 50345,
+         "<|ln|>": 50353, "<|lo|>": 50336, "<|lt|>": 50293, "<|lv|>": 50301, "<|mg|>": 50349, "<|mi|>": 50295,
+         "<|mk|>": 50308, "<|ml|>": 50296, "<|mn|>": 50314, "<|mr|>": 50320, "<|ms|>": 50282, "<|mt|>": 50343,
+         "<|my|>": 50346, "<|ne|>": 50313, "<|nl|>": 50271, "<|nn|>": 50342, "<|no|>": 50288, "<|oc|>": 50328,
+         "<|pa|>": 50321, "<|pl|>": 50269, "<|ps|>": 50340, "<|pt|>": 50267, "<|ro|>": 50284, "<|ru|>": 50263,
+         "<|sa|>": 50344, "<|sd|>": 50332, "<|si|>": 50322, "<|sk|>": 50298, "<|sl|>": 50305, "<|sn|>": 50324,
+         "<|so|>": 50326, "<|sq|>": 50317, "<|sr|>": 50303, "<|su|>": 50357, "<|sv|>": 50273, "<|sw|>": 50318,
+         "<|ta|>": 50287, "<|te|>": 50299, "<|tg|>": 50331, "<|th|>": 50289, "<|tk|>": 50341, "<|tl|>": 50348,
+         "<|tr|>": 50268, "<|tt|>": 50351, "<|uk|>": 50280, "<|ur|>": 50290, "<|uz|>": 50337, "<|vi|>": 50278,
+         "<|yi|>": 50335, "<|yo|>": 50325, "<|yue|>": 50358, "<|zh|>": 50260
+     },
+     "max_initial_timestamp_index": 50,
+     "max_length": 448,
+     "no_timestamps_token_id": 50364,
+     "pad_token_id": 50257,
+     "prev_sot_token_id": 50362,
+     "return_timestamps": false,
+     "suppress_tokens": [
+         1, 2, 7, 8, 9, 10, 14, 25, 26, 27, 28, 29, 31, 58, 59, 60, 61, 62, 63, 90, 91, 92, 93, 359, 503, 522,
+         542, 873, 893, 902, 918, 922, 931, 1350, 1853, 1982, 2460, 2627, 3246, 3253, 3268, 3536, 3846, 3961,
+         4183, 4667, 6585, 6647, 7273, 9061, 9383, 10428, 10929, 11938, 12033, 12331, 12562, 13793, 14157,
+         14635, 15265, 15618, 16553, 16604, 18362, 18956, 20075, 21675, 22520, 26130, 26161, 26435, 28279,
+         29464, 31650, 32302, 32470, 36865, 42863, 47425, 49870, 50254, 50258, 50359, 50360, 50361, 50362, 50363
+     ],
+     "task_to_id": {
+         "transcribe": 50360,
+         "translate": 50359
+     },
+     "transformers_version": "4.46.0.dev0"
+ }
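
How this config steers decoding: forced_decoder_ids pins decoder position 1 to a language token (null means it is filled at call time; "<|eu|>" is 50310 per lang_to_id) and position 2 to the task token (50360 is "<|transcribe|>" per task_to_id). A small sketch of how these prompt ids are usually obtained, assuming the standard WhisperProcessor API in transformers:

```python
# Sketch: reproducing the forced decoder prompt from the mappings above.
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")

# Returns (position, token_id) pairs matching lang_to_id / task_to_id above:
# (1, 50310) for "<|eu|>" and (2, 50360) for "<|transcribe|>"; with the default
# no_timestamps=True a third pair (3, 50364) is appended, matching
# no_timestamps_token_id in the config.
print(processor.get_decoder_prompt_ids(language="basque", task="transcribe"))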
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3df395a63ce8603b7cf792ab314a2f482a6c90cc42e7802fabed7dd8cb1b078d
+ oid sha256:08e0005225b3dbaf55dd13ac62926cc7e02c1025d66fa375e6fb305ff79cd4f9
  size 4993448880
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:457499abac91f8ed2c91ba131d18af738ae39226827d0396a01f4b1f86a8db58
+ oid sha256:630ca774672856d2e0e39a702e590f635a1cfc5726a64b6578ab46dd367369a9
  size 1180663192
run.sh CHANGED
@@ -31,7 +31,6 @@ WANDB_PROJECT=whisper-medium-eu \
  --gradient_checkpointing \
  --fp16 \
  --overwrite_output_dir \
- --do_train \
  --do_eval \
  --predict_with_generate \
  --do_normalize_eval \
runs/Oct07_10-30-48_tknika/events.out.tfevents.1728297734.tknika.20799.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f38f4db3f506f80567533906f5fc02740168f2b1e4dd86fd95027d67e4023c3c
+ size 360
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:757451636d3aa41f66b5568ae6294cf0ed27d536ee8afce9c0186f687073cc5a
+ oid sha256:4b703135451bdb3fdf1b0263595ef460845b9b248a97e458a32874babc3e4138
  size 5368
wandb/debug-internal.log CHANGED
@@ -1,10 +1,10 @@
- {"time":"2024-10-05T14:14:14.99736495Z","level":"INFO","msg":"using version","core version":"0.18.3"}
- {"time":"2024-10-05T14:14:14.99738358Z","level":"INFO","msg":"created symlink","path":"/home/tknika/whisper-large-eu/wandb/run-20241005_141414-821qpm7o/logs/debug-core.log"}
- {"time":"2024-10-05T14:14:14.999080266Z","level":"ERROR","msg":"dialing: google: could not find default credentials. See https://cloud.google.com/docs/authentication/external/set-up-adc for more information"}
- {"time":"2024-10-05T14:14:15.006876033Z","level":"INFO","msg":"created new stream","id":"821qpm7o"}
- {"time":"2024-10-05T14:14:15.006930263Z","level":"INFO","msg":"stream: started","id":"821qpm7o"}
- {"time":"2024-10-05T14:14:15.006981772Z","level":"INFO","msg":"sender: started","stream_id":{"value":"821qpm7o"}}
- {"time":"2024-10-05T14:14:15.006988882Z","level":"INFO","msg":"handler: started","stream_id":{"value":"821qpm7o"}}
- {"time":"2024-10-05T14:14:15.006956622Z","level":"INFO","msg":"writer: Do: started","stream_id":{"value":"821qpm7o"}}
- {"time":"2024-10-05T14:14:15.412186114Z","level":"INFO","msg":"wandb-core","!BADKEY":null}
- {"time":"2024-10-05T14:14:15.414550494Z","level":"INFO","msg":"Starting system monitor"}
+ {"time":"2024-10-07T12:56:15.257353437Z","level":"INFO","msg":"using version","core version":"0.18.3"}
+ {"time":"2024-10-07T12:56:15.257380326Z","level":"INFO","msg":"created symlink","path":"/home/tknika/whisper-large-eu/wandb/run-20241007_125615-a3z1jk8c/logs/debug-core.log"}
+ {"time":"2024-10-07T12:56:15.259721418Z","level":"ERROR","msg":"dialing: google: could not find default credentials. See https://cloud.google.com/docs/authentication/external/set-up-adc for more information"}
+ {"time":"2024-10-07T12:56:15.26442537Z","level":"INFO","msg":"created new stream","id":"a3z1jk8c"}
+ {"time":"2024-10-07T12:56:15.264442509Z","level":"INFO","msg":"stream: started","id":"a3z1jk8c"}
+ {"time":"2024-10-07T12:56:15.264458959Z","level":"INFO","msg":"handler: started","stream_id":{"value":"a3z1jk8c"}}
+ {"time":"2024-10-07T12:56:15.264475109Z","level":"INFO","msg":"sender: started","stream_id":{"value":"a3z1jk8c"}}
+ {"time":"2024-10-07T12:56:15.264497739Z","level":"INFO","msg":"writer: Do: started","stream_id":{"value":"a3z1jk8c"}}
+ {"time":"2024-10-07T12:56:15.681557119Z","level":"INFO","msg":"wandb-core","!BADKEY":null}
+ {"time":"2024-10-07T12:56:15.68260129Z","level":"INFO","msg":"Starting system monitor"}
wandb/debug.log CHANGED
@@ -1,28 +1,28 @@
- 2024-10-05 14:14:14,992 INFO MainThread:13682 [wandb_setup.py:_flush():79] Current SDK version is 0.18.3
- 2024-10-05 14:14:14,993 INFO MainThread:13682 [wandb_setup.py:_flush():79] Configure stats pid to 13682
- 2024-10-05 14:14:14,993 INFO MainThread:13682 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/.config/wandb/settings
- 2024-10-05 14:14:14,993 INFO MainThread:13682 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/whisper-large-eu/wandb/settings
- 2024-10-05 14:14:14,993 INFO MainThread:13682 [wandb_setup.py:_flush():79] Loading settings from environment variables: {'project': 'whisper-medium-eu'}
- 2024-10-05 14:14:14,993 INFO MainThread:13682 [wandb_setup.py:_flush():79] Applying setup settings: {'mode': None, '_disable_service': None}
- 2024-10-05 14:14:14,993 INFO MainThread:13682 [wandb_setup.py:_flush():79] Inferring run settings from compute environment: {'program_relpath': 'run_speech_recognition_seq2seq_streaming.py', 'program_abspath': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py', 'program': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py'}
- 2024-10-05 14:14:14,993 INFO MainThread:13682 [wandb_setup.py:_flush():79] Applying login settings: {}
- 2024-10-05 14:14:14,993 INFO MainThread:13682 [wandb_init.py:_log_setup():532] Logging user logs to /home/tknika/whisper-large-eu/wandb/run-20241005_141414-821qpm7o/logs/debug.log
- 2024-10-05 14:14:14,993 INFO MainThread:13682 [wandb_init.py:_log_setup():533] Logging internal logs to /home/tknika/whisper-large-eu/wandb/run-20241005_141414-821qpm7o/logs/debug-internal.log
- 2024-10-05 14:14:14,993 INFO MainThread:13682 [wandb_init.py:init():617] calling init triggers
- 2024-10-05 14:14:14,993 INFO MainThread:13682 [wandb_init.py:init():624] wandb.init called with sweep_config: {}
  config: {}
- 2024-10-05 14:14:14,993 INFO MainThread:13682 [wandb_init.py:init():667] starting backend
- 2024-10-05 14:14:14,993 INFO MainThread:13682 [wandb_init.py:init():671] sending inform_init request
- 2024-10-05 14:14:14,995 INFO MainThread:13682 [backend.py:_multiprocessing_setup():104] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
- 2024-10-05 14:14:14,995 INFO MainThread:13682 [wandb_init.py:init():684] backend started and connected
- 2024-10-05 14:14:14,999 INFO MainThread:13682 [wandb_init.py:init():779] updated telemetry
- 2024-10-05 14:14:15,005 INFO MainThread:13682 [wandb_init.py:init():812] communicating run to backend with 90.0 second timeout
- 2024-10-05 14:14:15,407 INFO MainThread:13682 [wandb_init.py:init():863] starting run threads in backend
- 2024-10-05 14:14:15,503 INFO MainThread:13682 [wandb_run.py:_console_start():2465] atexit reg
- 2024-10-05 14:14:15,503 INFO MainThread:13682 [wandb_run.py:_redirect():2313] redirect: wrap_raw
- 2024-10-05 14:14:15,503 INFO MainThread:13682 [wandb_run.py:_redirect():2378] Wrapping output streams.
- 2024-10-05 14:14:15,503 INFO MainThread:13682 [wandb_run.py:_redirect():2403] Redirects installed.
- 2024-10-05 14:14:15,504 INFO MainThread:13682 [wandb_init.py:init():907] run started, returning control to user process
- 2024-10-05 14:14:15,506 INFO MainThread:13682 [wandb_run.py:_config_callback():1394] config_cb None None {'vocab_size': 51866, 'num_mel_bins': 128, 'd_model': 1280, 'encoder_layers': 32, 'encoder_attention_heads': 20, 'decoder_layers': 32, 'decoder_attention_heads': 20, 'decoder_ffn_dim': 5120, 'encoder_ffn_dim': 5120, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'activation_function': 'gelu', 'init_std': 0.02, 'encoder_layerdrop': 0.0, 'decoder_layerdrop': 0.0, 'use_cache': False, 'num_hidden_layers': 32, 'scale_embedding': False, 'max_source_positions': 1500, 'max_target_positions': 448, 'classifier_proj_size': 256, 'use_weighted_layer_sum': False, 'apply_spec_augment': False, 'mask_time_prob': 0.05, 'mask_time_length': 10, 'mask_time_min_masks': 2, 'mask_feature_prob': 0.0, 'mask_feature_length': 10, 'mask_feature_min_masks': 0, 'median_filter_width': 7, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': True, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 448, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': [220, 50257], 'architectures': ['WhisperForConditionalGeneration'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 50257, 'pad_token_id': 50256, 'eos_token_id': 50257, 'sep_token_id': None, 'decoder_start_token_id': 50258, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'openai/whisper-large-v3', 'transformers_version': '4.46.0.dev0', 'model_type': 'whisper', 'forced_decoder_ids': None, 'output_dir': './', 'overwrite_output_dir': True, 'do_train': True, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 16, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 4.375e-06, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': 10000, 'lr_scheduler_type': 'linear', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 500, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': './runs/Oct05_14-14-00_tknika', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 25, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 
'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 500, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': 'whisper-large-eu', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': True, 'metric_for_best_model': 'wer', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'input_length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': True, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'steps', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'eval_use_gather_object': False, 'sortish_sampler': False, 'predict_with_generate': True, 'generation_max_length': 228, 'generation_num_beams': None, 'generation_config': None}
- 2024-10-05 14:14:15,510 INFO MainThread:13682 [wandb_config.py:__setitem__():154] config set model/num_parameters = 1543490560 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x747f37484590>>
- 2024-10-05 14:14:15,510 INFO MainThread:13682 [wandb_run.py:_config_callback():1394] config_cb model/num_parameters 1543490560 None
+ 2024-10-07 12:56:15,251 INFO MainThread:20958 [wandb_setup.py:_flush():79] Current SDK version is 0.18.3
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Configure stats pid to 20958
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/.config/wandb/settings
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/whisper-large-eu/wandb/settings
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Loading settings from environment variables: {'project': 'whisper-medium-eu'}
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Applying setup settings: {'mode': None, '_disable_service': None}
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Inferring run settings from compute environment: {'program_relpath': 'run_speech_recognition_seq2seq_streaming.py', 'program_abspath': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py', 'program': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py'}
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Applying login settings: {}
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:_log_setup():532] Logging user logs to /home/tknika/whisper-large-eu/wandb/run-20241007_125615-a3z1jk8c/logs/debug.log
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:_log_setup():533] Logging internal logs to /home/tknika/whisper-large-eu/wandb/run-20241007_125615-a3z1jk8c/logs/debug-internal.log
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:init():617] calling init triggers
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:init():624] wandb.init called with sweep_config: {}
  config: {}
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:init():667] starting backend
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:init():671] sending inform_init request
+ 2024-10-07 12:56:15,254 INFO MainThread:20958 [backend.py:_multiprocessing_setup():104] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
+ 2024-10-07 12:56:15,254 INFO MainThread:20958 [wandb_init.py:init():684] backend started and connected
+ 2024-10-07 12:56:15,258 INFO MainThread:20958 [wandb_init.py:init():779] updated telemetry
+ 2024-10-07 12:56:15,265 INFO MainThread:20958 [wandb_init.py:init():812] communicating run to backend with 90.0 second timeout
+ 2024-10-07 12:56:15,676 INFO MainThread:20958 [wandb_init.py:init():863] starting run threads in backend
+ 2024-10-07 12:56:15,774 INFO MainThread:20958 [wandb_run.py:_console_start():2465] atexit reg
+ 2024-10-07 12:56:15,774 INFO MainThread:20958 [wandb_run.py:_redirect():2313] redirect: wrap_raw
+ 2024-10-07 12:56:15,774 INFO MainThread:20958 [wandb_run.py:_redirect():2378] Wrapping output streams.
+ 2024-10-07 12:56:15,774 INFO MainThread:20958 [wandb_run.py:_redirect():2403] Redirects installed.
+ 2024-10-07 12:56:15,775 INFO MainThread:20958 [wandb_init.py:init():907] run started, returning control to user process
+ 2024-10-07 12:56:15,777 INFO MainThread:20958 [wandb_run.py:_config_callback():1394] config_cb None None {'vocab_size': 51866, 'num_mel_bins': 128, 'd_model': 1280, 'encoder_layers': 32, 'encoder_attention_heads': 20, 'decoder_layers': 32, 'decoder_attention_heads': 20, 'decoder_ffn_dim': 5120, 'encoder_ffn_dim': 5120, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'activation_function': 'gelu', 'init_std': 0.02, 'encoder_layerdrop': 0.0, 'decoder_layerdrop': 0.0, 'use_cache': False, 'num_hidden_layers': 32, 'scale_embedding': False, 'max_source_positions': 1500, 'max_target_positions': 448, 'classifier_proj_size': 256, 'use_weighted_layer_sum': False, 'apply_spec_augment': False, 'mask_time_prob': 0.05, 'mask_time_length': 10, 'mask_time_min_masks': 2, 'mask_feature_prob': 0.0, 'mask_feature_length': 10, 'mask_feature_min_masks': 0, 'median_filter_width': 7, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': True, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 448, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': [220, 50257], 'architectures': ['WhisperForConditionalGeneration'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 50257, 'pad_token_id': 50256, 'eos_token_id': 50257, 'sep_token_id': None, 'decoder_start_token_id': 50258, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'openai/whisper-large-v3', 'transformers_version': '4.46.0.dev0', 'model_type': 'whisper', 'forced_decoder_ids': None, 'output_dir': './', 'overwrite_output_dir': True, 'do_train': False, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 16, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 4.375e-06, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': 10000, 'lr_scheduler_type': 'linear', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 500, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': './runs/Oct07_11-46-39_tknika', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 25, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 
'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 500, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': 'whisper-large-eu', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': True, 'metric_for_best_model': 'wer', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'input_length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': True, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'steps', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'eval_use_gather_object': False, 'sortish_sampler': False, 'predict_with_generate': True, 'generation_max_length': 228, 'generation_num_beams': None, 'generation_config': None}
+ 2024-10-07 12:56:15,780 INFO MainThread:20958 [wandb_config.py:__setitem__():154] config set model/num_parameters = 1543490560 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x748ced2ceae0>>
+ 2024-10-07 12:56:15,780 INFO MainThread:20958 [wandb_run.py:_config_callback():1394] config_cb model/num_parameters 1543490560 None
wandb/run-20241005_141414-821qpm7o/files/config.yaml ADDED
@@ -0,0 +1,563 @@
+ _name_or_path: {value: openai/whisper-large-v3}
+ _wandb:
+   value:
+     cli_version: 0.18.3
+     m:
+     - {"1": train/grad_norm, "5": 2, "6": [1, 3], "7": []}
+     - {"1": train/global_step, "6": [3], "7": []}
+     - {"1": eval/loss, "5": 2, "6": [1, 3], "7": []}
+     - {"1": eval/samples_per_second, "5": 2, "6": [1, 3], "7": []}
+     - {"1": train/epoch, "5": 2, "6": [1, 3], "7": []}
+     - {"1": train/loss, "5": 2, "6": [1, 3], "7": []}
+     - {"1": eval/runtime, "5": 2, "6": [1, 3], "7": []}
+     - {"1": eval/steps_per_second, "5": 2, "6": [1, 3], "7": []}
+     - {"1": eval/wer, "5": 2, "6": [1, 3], "7": []}
+     - {"1": train/learning_rate, "5": 2, "6": [1, 3], "7": []}
+     python_version: 3.12.3
+     t:
+       "1": [1, 5, 11, 49, 51, 53, 55, 71, 100]
+       "2": [1, 5, 11, 49, 51, 53, 55, 71, 100]
+       "3": [7, 13, 19, 23, 55, 62, 66]
+       "4": 3.12.3
+       "5": 0.18.3
+       "6": 4.46.0.dev0
+       "8": [5]
+       "9": {"1": transformers_trainer}
+       "12": 0.18.3
+       "13": linux-x86_64
+ accelerator_config: {value: {dispatch_batches: null, even_batches: true, gradient_accumulation_kwargs: null, non_blocking: false, split_batches: false, use_seedable_sampler: true}}
+ activation_dropout: {value: 0}
+ activation_function: {value: gelu}
+ adafactor: {value: false}
+ adam_beta1: {value: 0.9}
+ adam_beta2: {value: 0.999}
+ adam_epsilon: {value: 1e-08}
+ add_cross_attention: {value: false}
+ apply_spec_augment: {value: false}
+ architectures: {value: [WhisperForConditionalGeneration]}
+ attention_dropout: {value: 0}
+ auto_find_batch_size: {value: false}
+ bad_words_ids: {value: null}
+ batch_eval_metrics: {value: false}
+ begin_suppress_tokens: {value: [220, 50257]}
+ bf16: {value: false}
+ bf16_full_eval: {value: false}
+ bos_token_id: {value: 50257}
+ chunk_size_feed_forward: {value: 0}
+ classifier_proj_size: {value: 256}
+ cross_attention_hidden_size: {value: null}
+ d_model: {value: 1280}
+ data_seed: {value: null}
+ dataloader_drop_last: {value: false}
+ dataloader_num_workers: {value: 0}
+ dataloader_persistent_workers: {value: false}
+ dataloader_pin_memory: {value: true}
+ dataloader_prefetch_factor: {value: null}
+ ddp_backend: {value: null}
+ ddp_broadcast_buffers: {value: null}
+ ddp_bucket_cap_mb: {value: null}
+ ddp_find_unused_parameters: {value: null}
+ ddp_timeout: {value: 1800}
+ debug: {value: []}
+ decoder_attention_heads: {value: 20}
+ decoder_ffn_dim: {value: 5120}
+ decoder_layerdrop: {value: 0}
+ decoder_layers: {value: 32}
+ decoder_start_token_id: {value: 50258}
+ deepspeed: {value: null}
+ disable_tqdm: {value: false}
+ dispatch_batches: {value: null}
+ diversity_penalty: {value: 0}
+ do_eval: {value: true}
+ do_predict: {value: false}
+ do_sample: {value: false}
+ do_train: {value: true}
+ dropout: {value: 0}
+ early_stopping: {value: false}
+ encoder_attention_heads: {value: 20}
+ encoder_ffn_dim: {value: 5120}
+ encoder_layerdrop: {value: 0}
+ encoder_layers: {value: 32}
+ encoder_no_repeat_ngram_size: {value: 0}
+ eos_token_id: {value: 50257}
+ eval_accumulation_steps: {value: null}
+ eval_delay: {value: 0}
+ eval_do_concat_batches: {value: true}
+ eval_on_start: {value: false}
+ eval_steps: {value: 500}
+ eval_strategy: {value: steps}
+ eval_use_gather_object: {value: false}
+ evaluation_strategy: {value: steps}
+ exponential_decay_length_penalty: {value: null}
+ finetuning_task: {value: null}
+ forced_bos_token_id: {value: null}
+ forced_decoder_ids: {value: null}
+ forced_eos_token_id: {value: null}
+ fp16: {value: true}
+ fp16_backend: {value: auto}
+ fp16_full_eval: {value: false}
+ fp16_opt_level: {value: O1}
+ fsdp: {value: []}
+ fsdp_config: {value: {min_num_params: 0, xla: false, xla_fsdp_grad_ckpt: false, xla_fsdp_v2: false}}
+ fsdp_min_num_params: {value: 0}
+ fsdp_transformer_layer_cls_to_wrap: {value: null}
+ full_determinism: {value: false}
+ generation_config: {value: null}
+ generation_max_length: {value: 228}
+ generation_num_beams: {value: null}
+ gradient_accumulation_steps: {value: 1}
+ gradient_checkpointing: {value: true}
+ gradient_checkpointing_kwargs: {value: null}
+ greater_is_better: {value: false}
+ group_by_length: {value: false}
+ half_precision_backend: {value: auto}
+ hub_always_push: {value: false}
+ hub_model_id: {value: null}
+ hub_private_repo: {value: false}
+ hub_strategy: {value: every_save}
+ hub_token: {value: <HUB_TOKEN>}
+ id2label: {value: {"0": LABEL_0, "1": LABEL_1}}
+ ignore_data_skip: {value: false}
+ include_for_metrics: {value: []}
+ include_inputs_for_metrics: {value: false}
+ include_num_input_tokens_seen: {value: false}
+ include_tokens_per_second: {value: false}
+ init_std: {value: 0.02}
+ is_decoder: {value: false}
+ is_encoder_decoder: {value: true}
+ jit_mode_eval: {value: false}
+ label_names: {value: null}
+ label_smoothing_factor: {value: 0}
+ label2id: {value: {LABEL_0: 0, LABEL_1: 1}}
+ learning_rate: {value: 4.375e-06}
+ length_column_name: {value: input_length}
+ length_penalty: {value: 1}
+ load_best_model_at_end: {value: true}
+ local_rank: {value: 0}
+ log_level: {value: passive}
+ log_level_replica: {value: warning}
+ log_on_each_node: {value: true}
+ logging_dir: {value: ./runs/Oct05_14-14-00_tknika}
+ logging_first_step: {value: false}
+ logging_nan_inf_filter: {value: true}
+ logging_steps: {value: 25}
+ logging_strategy: {value: steps}
+ lr_scheduler_type: {value: linear}
+ mask_feature_length: {value: 10}
+ mask_feature_min_masks: {value: 0}
+ mask_feature_prob: {value: 0}
+ mask_time_length: {value: 10}
+ mask_time_min_masks: {value: 2}
+ mask_time_prob: {value: 0.05}
+ max_grad_norm: {value: 1}
+ max_length: {value: 448}
+ max_source_positions: {value: 1500}
+ max_steps: {value: 10000}
+ max_target_positions: {value: 448}
+ median_filter_width: {value: 7}
+ metric_for_best_model: {value: wer}
+ min_length: {value: 0}
+ model/num_parameters: {value: 1543490560}
+ model_type: {value: whisper}
+ mp_parameters: {value: ""}
+ neftune_noise_alpha: {value: null}
+ no_cuda: {value: false}
+ no_repeat_ngram_size: {value: 0}
+ num_beam_groups: {value: 1}
+ num_beams: {value: 1}
+ num_hidden_layers: {value: 32}
+ num_mel_bins: {value: 128}
+ num_return_sequences: {value: 1}
+ num_train_epochs: {value: 3}
+ optim: {value: adamw_torch}
+ optim_args: {value: null}
+ optim_target_modules: {value: null}
+ output_attentions: {value: false}
+ output_dir: {value: ./}
+ output_hidden_states: {value: false}
+ output_scores: {value: false}
+ overwrite_output_dir: {value: true}
+ pad_token_id: {value: 50256}
+ past_index: {value: -1}
+ per_device_eval_batch_size: {value: 8}
+ per_device_train_batch_size: {value: 16}
+ per_gpu_eval_batch_size: {value: null}
+ per_gpu_train_batch_size: {value: null}
+ predict_with_generate: {value: true}
+ prediction_loss_only: {value: false}
+ prefix: {value: null}
+ problem_type: {value: null}
+ push_to_hub: {value: true}
+ push_to_hub_model_id: {value: null}
+ push_to_hub_organization: {value: null}
+ push_to_hub_token: {value: <PUSH_TO_HUB_TOKEN>}
+ ray_scope: {value: last}
+ remove_invalid_values: {value: false}
+ remove_unused_columns: {value: true}
+ repetition_penalty: {value: 1}
+ report_to: {value: [wandb]}
+ restore_callback_states_from_checkpoint: {value: false}
+ resume_from_checkpoint: {value: null}
+ return_dict: {value: true}
+ return_dict_in_generate: {value: false}
+ run_name: {value: whisper-large-eu}
+ save_on_each_node: {value: false}
+ save_only_model: {value: false}
+ save_safetensors: {value: true}
+ save_steps: {value: 1000}
+ save_strategy: {value: steps}
+ save_total_limit: {value: null}
+ scale_embedding: {value: false}
+ seed: {value: 42}
+ sep_token_id: {value: null}
+ skip_memory_metrics: {value: true}
+ sortish_sampler: {value: false}
+ split_batches: {value: null}
+ suppress_tokens: {value: null}
+ task_specific_params: {value: null}
+ temperature: {value: 1}
+ tf_legacy_loss: {value: false}
+ tf32: {value: null}
+ tie_encoder_decoder: {value: false}
+ tie_word_embeddings: {value: true}
+ tokenizer_class: {value: null}
+ top_k: {value: 50}
+ top_p: {value: 1}
+ torch_compile: {value: false}
+ torch_compile_backend: {value: null}
+ torch_compile_mode: {value: null}
+ torch_dtype: {value: float16}
+ torch_empty_cache_steps: {value: null}
+ torchdynamo: {value: null}
+ torchscript: {value: false}
+ tpu_metrics_debug: {value: false}
+ tpu_num_cores: {value: null}
+ transformers_version: {value: 4.46.0.dev0}
+ typical_p: {value: 1}
+ use_bfloat16: {value: false}
+ use_cache: {value: false}
+ use_cpu: {value: false}
+ use_ipex: {value: false}
+ use_legacy_prediction_loop: {value: false}
+ use_liger_kernel: {value: false}
+ use_mps_device: {value: false}
+ use_weighted_layer_sum: {value: false}
+ vocab_size: {value: 51866}
+ warmup_ratio: {value: 0}
+ warmup_steps: {value: 500}
+ weight_decay: {value: 0}
wandb/run-20241005_141414-821qpm7o/files/output.log CHANGED
The diff for this file is too large to render.
wandb/run-20241005_141414-821qpm7o/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
+ {"eval/runtime":4130.3962,"train_samples_per_second":1.426,"total_flos":5.435317790834688e+20,"_step":420,"eval/steps_per_second":0.413,"train/epoch":5.148,"train_runtime":112211.4993,"train_loss":0.08380846776664257,"eval/wer":7.07244677342519,"_wandb":{"runtime":112523},"_runtime":112210.651756658,"train/grad_norm":1.3872036933898926,"eval/samples_per_second":3.3,"eval/loss":0.12359699606895447,"train/learning_rate":1.381578947368421e-09,"_timestamp":1.728249865647773e+09,"train_steps_per_second":0.089,"train/loss":0.0504,"train/global_step":10000}
wandb/run-20241005_141414-821qpm7o/logs/debug-core.log CHANGED
@@ -5,3 +5,10 @@
  {"time":"2024-10-05T14:14:14.397628756Z","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"127.0.0.1:50392"}
  {"time":"2024-10-05T14:14:14.997161032Z","level":"INFO","msg":"handleInformInit: received","streamId":"821qpm7o","id":"127.0.0.1:50392"}
  {"time":"2024-10-05T14:14:15.006939443Z","level":"INFO","msg":"handleInformInit: stream started","streamId":"821qpm7o","id":"127.0.0.1:50392"}
+ {"time":"2024-10-06T21:29:38.663713629Z","level":"INFO","msg":"handleInformTeardown: server teardown initiated","id":"127.0.0.1:50392"}
+ {"time":"2024-10-06T21:29:38.663840158Z","level":"INFO","msg":"connection: Close: initiating connection closure","id":"127.0.0.1:50392"}
+ {"time":"2024-10-06T21:29:38.663872928Z","level":"INFO","msg":"server is shutting down"}
+ {"time":"2024-10-06T21:29:38.663969078Z","level":"INFO","msg":"connection: Close: connection successfully closed","id":"127.0.0.1:50392"}
+ {"time":"2024-10-06T21:29:42.340150906Z","level":"INFO","msg":"handleInformTeardown: server shutdown complete","id":"127.0.0.1:50392"}
+ {"time":"2024-10-06T21:29:42.340203005Z","level":"INFO","msg":"connection: ManageConnectionData: connection closed","id":"127.0.0.1:50392"}
+ {"time":"2024-10-06T21:29:42.340238265Z","level":"INFO","msg":"server is closed"}
wandb/run-20241005_141414-821qpm7o/logs/debug-internal.log CHANGED
@@ -8,3 +8,12 @@
  {"time":"2024-10-05T14:14:15.006956622Z","level":"INFO","msg":"writer: Do: started","stream_id":{"value":"821qpm7o"}}
  {"time":"2024-10-05T14:14:15.412186114Z","level":"INFO","msg":"wandb-core","!BADKEY":null}
  {"time":"2024-10-05T14:14:15.414550494Z","level":"INFO","msg":"Starting system monitor"}
+ {"time":"2024-10-06T08:59:01.966379884Z","level":"INFO","msg":"api: retrying HTTP error","status":502,"url":"https://api.wandb.ai/files/itzune/whisper-medium-eu/821qpm7o/file_stream"}
+ {"time":"2024-10-06T21:29:38.663823648Z","level":"INFO","msg":"stream: closing","id":"821qpm7o"}
+ {"time":"2024-10-06T21:29:38.663868898Z","level":"INFO","msg":"Stopping system monitor"}
+ {"time":"2024-10-06T21:29:38.671834293Z","level":"INFO","msg":"Stopped system monitor"}
+ {"time":"2024-10-06T21:29:41.998044798Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
+ {"time":"2024-10-06T21:29:42.339781659Z","level":"INFO","msg":"handler: closed","stream_id":{"value":"821qpm7o"}}
+ {"time":"2024-10-06T21:29:42.339864838Z","level":"INFO","msg":"writer: Close: closed","stream_id":{"value":"821qpm7o"}}
+ {"time":"2024-10-06T21:29:42.339872028Z","level":"INFO","msg":"sender: closed","stream_id":{"value":"821qpm7o"}}
+ {"time":"2024-10-06T21:29:42.339997497Z","level":"INFO","msg":"stream: closed","id":"821qpm7o"}
wandb/run-20241005_141414-821qpm7o/logs/debug.log CHANGED
@@ -26,3 +26,4 @@ config: {}
  2024-10-05 14:14:15,506 INFO MainThread:13682 [wandb_run.py:_config_callback():1394] config_cb None None {'vocab_size': 51866, 'num_mel_bins': 128, 'd_model': 1280, 'encoder_layers': 32, 'encoder_attention_heads': 20, 'decoder_layers': 32, 'decoder_attention_heads': 20, 'decoder_ffn_dim': 5120, 'encoder_ffn_dim': 5120, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'activation_function': 'gelu', 'init_std': 0.02, 'encoder_layerdrop': 0.0, 'decoder_layerdrop': 0.0, 'use_cache': False, 'num_hidden_layers': 32, 'scale_embedding': False, 'max_source_positions': 1500, 'max_target_positions': 448, 'classifier_proj_size': 256, 'use_weighted_layer_sum': False, 'apply_spec_augment': False, 'mask_time_prob': 0.05, 'mask_time_length': 10, 'mask_time_min_masks': 2, 'mask_feature_prob': 0.0, 'mask_feature_length': 10, 'mask_feature_min_masks': 0, 'median_filter_width': 7, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': True, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 448, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': [220, 50257], 'architectures': ['WhisperForConditionalGeneration'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 50257, 'pad_token_id': 50256, 'eos_token_id': 50257, 'sep_token_id': None, 'decoder_start_token_id': 50258, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'openai/whisper-large-v3', 'transformers_version': '4.46.0.dev0', 'model_type': 'whisper', 'forced_decoder_ids': None, 'output_dir': './', 'overwrite_output_dir': True, 'do_train': True, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 16, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 4.375e-06, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': 10000, 'lr_scheduler_type': 'linear', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 500, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': './runs/Oct05_14-14-00_tknika', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 25, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 
'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 500, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': 'whisper-large-eu', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': True, 'metric_for_best_model': 'wer', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'input_length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': True, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'steps', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'eval_use_gather_object': False, 'sortish_sampler': False, 'predict_with_generate': True, 'generation_max_length': 228, 'generation_num_beams': None, 'generation_config': None}
  2024-10-05 14:14:15,510 INFO MainThread:13682 [wandb_config.py:__setitem__():154] config set model/num_parameters = 1543490560 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x747f37484590>>
  2024-10-05 14:14:15,510 INFO MainThread:13682 [wandb_run.py:_config_callback():1394] config_cb model/num_parameters 1543490560 None
+ 2024-10-06 21:29:38,664 WARNING MsgRouterThr:13682 [router.py:message_loop():77] message_loop has been closed
wandb/run-20241005_141414-821qpm7o/run-821qpm7o.wandb CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:57538d6fe4cf3de3c7e22f43b680f5f476627d92828d497ca6d2355ed94147ad
- size 20316160
+ oid sha256:bd99a95650ad5aa391338434550f1b7dad9b0cd6e88bc592ac045c56f68b301b
+ size 51687272
wandb/run-20241007_102112-r5qja96d/files/config.yaml ADDED
@@ -0,0 +1,508 @@
+ _name_or_path:
+ value: openai/whisper-large-v3
+ _wandb:
+ value:
+ cli_version: 0.18.3
+ m:
+ - "1": train/global_step
+ "6":
+ - 3
+ "7": []
+ python_version: 3.12.3
+ t:
+ "1":
+ - 1
+ - 5
+ - 11
+ - 49
+ - 51
+ - 53
+ - 55
+ - 71
+ - 100
+ "2":
+ - 1
+ - 5
+ - 11
+ - 49
+ - 51
+ - 53
+ - 55
+ - 71
+ - 100
+ "3":
+ - 7
+ - 13
+ - 19
+ - 23
+ - 55
+ - 66
+ "4": 3.12.3
+ "5": 0.18.3
+ "6": 4.46.0.dev0
+ "8":
+ - 5
+ "9":
+ "1": transformers_trainer
+ "12": 0.18.3
+ "13": linux-x86_64
+ accelerator_config:
+ value:
+ dispatch_batches: null
+ even_batches: true
+ gradient_accumulation_kwargs: null
+ non_blocking: false
+ split_batches: false
+ use_seedable_sampler: true
+ activation_dropout:
+ value: 0
+ activation_function:
+ value: gelu
+ adafactor:
+ value: false
+ adam_beta1:
+ value: 0.9
+ adam_beta2:
+ value: 0.999
+ adam_epsilon:
+ value: 1e-08
+ add_cross_attention:
+ value: false
+ apply_spec_augment:
+ value: false
+ architectures:
+ value:
+ - WhisperForConditionalGeneration
+ attention_dropout:
+ value: 0
+ auto_find_batch_size:
+ value: false
+ bad_words_ids:
+ value: null
+ batch_eval_metrics:
+ value: false
+ begin_suppress_tokens:
+ value:
+ - 220
+ - 50257
+ bf16:
+ value: false
+ bf16_full_eval:
+ value: false
+ bos_token_id:
+ value: 50257
+ chunk_size_feed_forward:
+ value: 0
+ classifier_proj_size:
+ value: 256
+ cross_attention_hidden_size:
+ value: null
+ d_model:
+ value: 1280
+ data_seed:
+ value: null
+ dataloader_drop_last:
+ value: false
+ dataloader_num_workers:
+ value: 0
+ dataloader_persistent_workers:
+ value: false
+ dataloader_pin_memory:
+ value: true
+ dataloader_prefetch_factor:
+ value: null
+ ddp_backend:
+ value: null
+ ddp_broadcast_buffers:
+ value: null
+ ddp_bucket_cap_mb:
+ value: null
+ ddp_find_unused_parameters:
+ value: null
+ ddp_timeout:
+ value: 1800
+ debug:
+ value: []
+ decoder_attention_heads:
+ value: 20
+ decoder_ffn_dim:
+ value: 5120
+ decoder_layerdrop:
+ value: 0
+ decoder_layers:
+ value: 32
+ decoder_start_token_id:
+ value: 50258
+ deepspeed:
+ value: null
+ disable_tqdm:
+ value: false
+ dispatch_batches:
+ value: null
+ diversity_penalty:
+ value: 0
+ do_eval:
+ value: true
+ do_predict:
+ value: false
+ do_sample:
+ value: false
+ do_train:
+ value: true
+ dropout:
+ value: 0
+ early_stopping:
+ value: false
+ encoder_attention_heads:
+ value: 20
+ encoder_ffn_dim:
+ value: 5120
+ encoder_layerdrop:
+ value: 0
+ encoder_layers:
+ value: 32
+ encoder_no_repeat_ngram_size:
+ value: 0
+ eos_token_id:
+ value: 50257
+ eval_accumulation_steps:
+ value: null
+ eval_delay:
+ value: 0
+ eval_do_concat_batches:
+ value: true
+ eval_on_start:
+ value: false
+ eval_steps:
+ value: 500
+ eval_strategy:
+ value: steps
+ eval_use_gather_object:
+ value: false
+ evaluation_strategy:
+ value: steps
+ exponential_decay_length_penalty:
+ value: null
+ finetuning_task:
+ value: null
+ forced_bos_token_id:
+ value: null
+ forced_decoder_ids:
+ value: null
+ forced_eos_token_id:
+ value: null
+ fp16:
+ value: true
+ fp16_backend:
+ value: auto
+ fp16_full_eval:
+ value: false
+ fp16_opt_level:
+ value: O1
+ fsdp:
+ value: []
+ fsdp_config:
+ value:
+ min_num_params: 0
+ xla: false
+ xla_fsdp_grad_ckpt: false
+ xla_fsdp_v2: false
+ fsdp_min_num_params:
+ value: 0
+ fsdp_transformer_layer_cls_to_wrap:
+ value: null
+ full_determinism:
+ value: false
+ generation_config:
+ value: null
+ generation_max_length:
+ value: 228
+ generation_num_beams:
+ value: null
+ gradient_accumulation_steps:
+ value: 1
+ gradient_checkpointing:
+ value: true
+ gradient_checkpointing_kwargs:
+ value: null
+ greater_is_better:
+ value: false
+ group_by_length:
+ value: false
+ half_precision_backend:
+ value: auto
+ hub_always_push:
+ value: false
+ hub_model_id:
+ value: null
+ hub_private_repo:
+ value: false
+ hub_strategy:
+ value: every_save
+ hub_token:
+ value: <HUB_TOKEN>
+ id2label:
+ value:
+ "0": LABEL_0
+ "1": LABEL_1
+ ignore_data_skip:
+ value: false
+ include_for_metrics:
+ value: []
+ include_inputs_for_metrics:
+ value: false
+ include_num_input_tokens_seen:
+ value: false
+ include_tokens_per_second:
+ value: false
+ init_std:
+ value: 0.02
+ is_decoder:
+ value: false
+ is_encoder_decoder:
+ value: true
+ jit_mode_eval:
+ value: false
+ label_names:
+ value: null
+ label_smoothing_factor:
+ value: 0
+ label2id:
+ value:
+ LABEL_0: 0
+ LABEL_1: 1
+ learning_rate:
+ value: 4.375e-06
+ length_column_name:
+ value: input_length
+ length_penalty:
+ value: 1
+ load_best_model_at_end:
+ value: true
+ local_rank:
+ value: 0
+ log_level:
+ value: passive
+ log_level_replica:
+ value: warning
+ log_on_each_node:
+ value: true
+ logging_dir:
+ value: ./runs/Oct07_10-20-37_tknika
+ logging_first_step:
+ value: false
+ logging_nan_inf_filter:
+ value: true
+ logging_steps:
+ value: 25
+ logging_strategy:
+ value: steps
+ lr_scheduler_type:
+ value: linear
+ mask_feature_length:
+ value: 10
+ mask_feature_min_masks:
+ value: 0
+ mask_feature_prob:
+ value: 0
+ mask_time_length:
+ value: 10
+ mask_time_min_masks:
+ value: 2
+ mask_time_prob:
+ value: 0.05
+ max_grad_norm:
+ value: 1
+ max_length:
+ value: 448
+ max_source_positions:
+ value: 1500
+ max_steps:
+ value: 10000
+ max_target_positions:
+ value: 448
+ median_filter_width:
+ value: 7
+ metric_for_best_model:
+ value: wer
+ min_length:
+ value: 0
+ model/num_parameters:
+ value: 1543490560
+ model_type:
+ value: whisper
+ mp_parameters:
+ value: ""
+ neftune_noise_alpha:
+ value: null
+ no_cuda:
+ value: false
+ no_repeat_ngram_size:
+ value: 0
+ num_beam_groups:
+ value: 1
+ num_beams:
+ value: 1
+ num_hidden_layers:
+ value: 32
+ num_mel_bins:
+ value: 128
+ num_return_sequences:
+ value: 1
+ num_train_epochs:
+ value: 3
+ optim:
+ value: adamw_torch
+ optim_args:
+ value: null
+ optim_target_modules:
+ value: null
+ output_attentions:
+ value: false
+ output_dir:
+ value: ./
+ output_hidden_states:
+ value: false
+ output_scores:
+ value: false
+ overwrite_output_dir:
+ value: true
+ pad_token_id:
+ value: 50256
+ past_index:
+ value: -1
+ per_device_eval_batch_size:
+ value: 8
+ per_device_train_batch_size:
+ value: 16
+ per_gpu_eval_batch_size:
+ value: null
+ per_gpu_train_batch_size:
+ value: null
+ predict_with_generate:
+ value: true
+ prediction_loss_only:
+ value: false
+ prefix:
+ value: null
+ problem_type:
+ value: null
+ push_to_hub:
+ value: true
+ push_to_hub_model_id:
+ value: null
+ push_to_hub_organization:
+ value: null
+ push_to_hub_token:
+ value: <PUSH_TO_HUB_TOKEN>
+ ray_scope:
+ value: last
+ remove_invalid_values:
+ value: false
+ remove_unused_columns:
+ value: true
+ repetition_penalty:
+ value: 1
+ report_to:
+ value:
+ - wandb
+ restore_callback_states_from_checkpoint:
+ value: false
+ resume_from_checkpoint:
+ value: ./checkpoint-9000/
+ return_dict:
+ value: true
+ return_dict_in_generate:
+ value: false
+ run_name:
+ value: whisper-large-eu
+ save_on_each_node:
+ value: false
+ save_only_model:
+ value: false
+ save_safetensors:
+ value: true
+ save_steps:
+ value: 1000
+ save_strategy:
+ value: steps
+ save_total_limit:
+ value: null
+ scale_embedding:
+ value: false
+ seed:
+ value: 42
+ sep_token_id:
+ value: null
+ skip_memory_metrics:
+ value: true
+ sortish_sampler:
+ value: false
+ split_batches:
+ value: null
+ suppress_tokens:
+ value: null
+ task_specific_params:
+ value: null
+ temperature:
+ value: 1
+ tf_legacy_loss:
+ value: false
+ tf32:
+ value: null
+ tie_encoder_decoder:
+ value: false
+ tie_word_embeddings:
+ value: true
+ tokenizer_class:
+ value: null
+ top_k:
+ value: 50
+ top_p:
+ value: 1
+ torch_compile:
+ value: false
+ torch_compile_backend:
+ value: null
+ torch_compile_mode:
+ value: null
+ torch_dtype:
+ value: float16
+ torch_empty_cache_steps:
+ value: null
+ torchdynamo:
+ value: null
+ torchscript:
+ value: false
+ tpu_metrics_debug:
+ value: false
+ tpu_num_cores:
+ value: null
+ transformers_version:
+ value: 4.46.0.dev0
+ typical_p:
+ value: 1
+ use_bfloat16:
+ value: false
+ use_cache:
+ value: false
+ use_cpu:
+ value: false
+ use_ipex:
+ value: false
+ use_legacy_prediction_loop:
+ value: false
+ use_liger_kernel:
+ value: false
+ use_mps_device:
+ value: false
+ use_weighted_layer_sum:
+ value: false
+ vocab_size:
+ value: 51866
+ warmup_ratio:
+ value: 0
+ warmup_steps:
+ value: 500
+ weight_decay:
+ value: 0
wandb/run-20241007_102112-r5qja96d/files/output.log ADDED
@@ -0,0 +1,66 @@
+ Reading metadata...: 75336it [00:04, 15522.70it/s] | 0/10000 [00:00<?, ?it/s]
+ Reading metadata...: 13630it [00:00, 20518.62it/s]
+ [INFO|trainer_utils.py:830] 2024-10-07 10:21:26,106 >> The following columns in the training set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: input_length. If input_length are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message.
+ Traceback (most recent call last):
+ File "/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py", line 630, in <module>
+ main()
+ File "/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py", line 579, in main
+ train_result = trainer.train(resume_from_checkpoint=checkpoint)
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/transformers/trainer.py", line 2070, in train
+ return inner_training_loop(
+ ^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/transformers/trainer.py", line 2372, in _inner_training_loop
+ for step, inputs in enumerate(epoch_iterator):
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/accelerate/data_loader.py", line 831, in __iter__
+ next_batch, next_batch_info = self._fetch_batches(main_iterator)
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/accelerate/data_loader.py", line 752, in _fetch_batches
+ batches.append(next(iterator))
+ ^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
+ data = self._next_data()
+ ^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 673, in _next_data
+ data = self._dataset_fetcher.fetch(index) # may raise StopIteration
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 33, in fetch
+ data.append(next(self.dataset_iter))
+ ^^^^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/datasets/iterable_dataset.py", line 2012, in __iter__
+ for key, example in ex_iterable:
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/datasets/iterable_dataset.py", line 1203, in __iter__
+ yield from self._iter()
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/datasets/iterable_dataset.py", line 1259, in _iter
+ for key, example in iterator:
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/datasets/iterable_dataset.py", line 1393, in __iter__
+ for x in self.ex_iterable:
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/datasets/iterable_dataset.py", line 947, in __iter__
+ yield from self._iter()
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/datasets/iterable_dataset.py", line 1027, in _iter
+ for key, example in iterator:
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/datasets/iterable_dataset.py", line 1613, in __iter__
+ _apply_feature_types_on_example(example, self.features, token_per_repo_id=self.token_per_repo_id),
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/datasets/iterable_dataset.py", line 1566, in _apply_feature_types_on_example
+ decoded_example = features.decode_example(encoded_example, token_per_repo_id=token_per_repo_id)
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/datasets/features/features.py", line 2042, in decode_example
+ column_name: decode_nested_example(feature, value, token_per_repo_id=token_per_repo_id)
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/datasets/features/features.py", line 1403, in decode_nested_example
+ return schema.decode_example(obj, token_per_repo_id=token_per_repo_id)
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/datasets/features/audio.py", line 193, in decode_example
+ array = librosa.resample(array, orig_sr=sampling_rate, target_sr=self.sampling_rate)
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/librosa/core/audio.py", line 669, in resample
+ y_hat = np.apply_along_axis(
+ ^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/numpy/lib/_shape_base_impl.py", line 384, in apply_along_axis
+ res = asanyarray(func1d(inarr_view[ind0], *args, **kwargs))
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/soxr/__init__.py", line 206, in resample
+ y = divide_proc(in_rate, out_rate, x[:, np.newaxis], q)
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ KeyboardInterrupt
wandb/run-20241007_102112-r5qja96d/files/wandb-metadata.json ADDED
@@ -0,0 +1,87 @@
+ {
+ "os": "Linux-6.8.0-45-generic-x86_64-with-glibc2.39",
+ "python": "3.12.3",
+ "startedAt": "2024-10-07T10:21:12.262430Z",
+ "args": [
+ "--model_name_or_path=openai/whisper-large-v3",
+ "--dataset_name=mozilla-foundation/common_voice_17_0",
+ "--dataset_config_name=eu",
+ "--language=basque",
+ "--train_split_name=train+validation",
+ "--eval_split_name=test",
+ "--model_index_name=Whisper Large Basque",
+ "--max_steps=10000",
+ "--output_dir=./",
+ "--per_device_train_batch_size=16",
+ "--per_device_eval_batch_size=8",
+ "--gradient_accumulation_steps=1",
+ "--logging_steps=25",
+ "--learning_rate=4.375e-6",
+ "--warmup_steps=500",
+ "--evaluation_strategy=steps",
+ "--eval_steps=500",
+ "--save_strategy=steps",
+ "--save_steps=1000",
+ "--generation_max_length=228",
+ "--length_column_name=input_length",
+ "--max_duration_in_seconds=30",
+ "--text_column_name=sentence",
+ "--freeze_feature_encoder=False",
+ "--report_to=tensorboard",
+ "--metric_for_best_model=wer",
+ "--greater_is_better=False",
+ "--load_best_model_at_end",
+ "--gradient_checkpointing",
+ "--fp16",
+ "--overwrite_output_dir",
+ "--do_train",
+ "--do_eval",
+ "--predict_with_generate",
+ "--do_normalize_eval",
+ "--streaming",
+ "--push_to_hub",
+ "--resume_from_checkpoint=./checkpoint-9000/",
+ "--report_to",
+ "wandb",
+ "--run_name",
+ "whisper-large-eu"
+ ],
+ "program": "/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py",
+ "codePath": "run_speech_recognition_seq2seq_streaming.py",
+ "git": {
+ "remote": "https://huggingface.co/xezpeleta/whisper-large-eu",
+ "commit": "45227421df6af8836af459c374361e7303a68aea"
+ },
+ "email": "[email protected]",
+ "root": "/home/tknika/whisper-large-eu",
+ "host": "tknika",
+ "username": "tknika",
+ "executable": "/home/tknika/whisper-large-eu/.venv/bin/python",
+ "codePathLocal": "run_speech_recognition_seq2seq_streaming.py",
+ "cpu_count": 8,
+ "cpu_count_logical": 8,
+ "gpu": "[NVIDIA L40-48Q]",
+ "gpu_count": 1,
+ "disk": {
+ "/": {
+ "total": "314615791616",
+ "used": "265683288064"
+ }
+ },
+ "memory": {
+ "total": "33654026240"
+ },
+ "cpu": {
+ "count": 8,
+ "countLogical": 8
+ },
+ "gpu_nvidia": [
+ {
+ "name": "NVIDIA L40-48Q",
+ "memoryTotal": "51539607552",
+ "cudaCores": 18176,
+ "architecture": "Ada"
+ }
+ ],
+ "cudaVersion": "12.4"
+ }
wandb/run-20241007_102112-r5qja96d/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
+ {"_wandb":{"runtime":25}}
wandb/run-20241007_102112-r5qja96d/logs/debug-core.log ADDED
@@ -0,0 +1,14 @@
+ {"time":"2024-10-07T10:21:11.14807853Z","level":"INFO","msg":"started logging, with flags","port-filename":"/tmp/tmp7x1p2y77/port-20491.txt","pid":20491,"debug":false,"disable-analytics":false}
+ {"time":"2024-10-07T10:21:11.148124439Z","level":"INFO","msg":"FeatureState","shutdownOnParentExitEnabled":false}
+ {"time":"2024-10-07T10:21:11.347070999Z","level":"INFO","msg":"Will exit if parent process dies.","ppid":20491}
+ {"time":"2024-10-07T10:21:11.347039129Z","level":"INFO","msg":"server is running","addr":{"IP":"127.0.0.1","Port":33051,"Zone":""}}
+ {"time":"2024-10-07T10:21:11.538033075Z","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"127.0.0.1:44772"}
+ {"time":"2024-10-07T10:21:12.262950492Z","level":"INFO","msg":"handleInformInit: received","streamId":"r5qja96d","id":"127.0.0.1:44772"}
+ {"time":"2024-10-07T10:21:12.310433881Z","level":"INFO","msg":"handleInformInit: stream started","streamId":"r5qja96d","id":"127.0.0.1:44772"}
+ {"time":"2024-10-07T10:21:37.757844263Z","level":"INFO","msg":"handleInformTeardown: server teardown initiated","id":"127.0.0.1:44772"}
+ {"time":"2024-10-07T10:21:37.757906193Z","level":"INFO","msg":"connection: Close: initiating connection closure","id":"127.0.0.1:44772"}
+ {"time":"2024-10-07T10:21:37.757940542Z","level":"INFO","msg":"server is shutting down"}
+ {"time":"2024-10-07T10:21:37.75825213Z","level":"INFO","msg":"connection: Close: connection successfully closed","id":"127.0.0.1:44772"}
+ {"time":"2024-10-07T10:21:41.160473282Z","level":"INFO","msg":"handleInformTeardown: server shutdown complete","id":"127.0.0.1:44772"}
+ {"time":"2024-10-07T10:21:41.160514042Z","level":"INFO","msg":"connection: ManageConnectionData: connection closed","id":"127.0.0.1:44772"}
+ {"time":"2024-10-07T10:21:41.160573842Z","level":"INFO","msg":"server is closed"}
wandb/run-20241007_102112-r5qja96d/logs/debug-internal.log ADDED
@@ -0,0 +1,18 @@
+ {"time":"2024-10-07T10:21:12.26314249Z","level":"INFO","msg":"using version","core version":"0.18.3"}
+ {"time":"2024-10-07T10:21:12.26315631Z","level":"INFO","msg":"created symlink","path":"/home/tknika/whisper-large-eu/wandb/run-20241007_102112-r5qja96d/logs/debug-core.log"}
+ {"time":"2024-10-07T10:21:12.304369743Z","level":"ERROR","msg":"dialing: google: could not find default credentials. See https://cloud.google.com/docs/authentication/external/set-up-adc for more information"}
+ {"time":"2024-10-07T10:21:12.310379031Z","level":"INFO","msg":"created new stream","id":"r5qja96d"}
+ {"time":"2024-10-07T10:21:12.310426121Z","level":"INFO","msg":"stream: started","id":"r5qja96d"}
+ {"time":"2024-10-07T10:21:12.31046721Z","level":"INFO","msg":"sender: started","stream_id":{"value":"r5qja96d"}}
+ {"time":"2024-10-07T10:21:12.31047811Z","level":"INFO","msg":"handler: started","stream_id":{"value":"r5qja96d"}}
+ {"time":"2024-10-07T10:21:12.310456601Z","level":"INFO","msg":"writer: Do: started","stream_id":{"value":"r5qja96d"}}
+ {"time":"2024-10-07T10:21:12.736877759Z","level":"INFO","msg":"wandb-core","!BADKEY":null}
+ {"time":"2024-10-07T10:21:12.73793955Z","level":"INFO","msg":"Starting system monitor"}
+ {"time":"2024-10-07T10:21:37.757898803Z","level":"INFO","msg":"stream: closing","id":"r5qja96d"}
+ {"time":"2024-10-07T10:21:37.757922673Z","level":"INFO","msg":"Stopping system monitor"}
+ {"time":"2024-10-07T10:21:37.761948169Z","level":"INFO","msg":"Stopped system monitor"}
+ {"time":"2024-10-07T10:21:40.805260539Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
+ {"time":"2024-10-07T10:21:41.160038266Z","level":"INFO","msg":"handler: closed","stream_id":{"value":"r5qja96d"}}
+ {"time":"2024-10-07T10:21:41.160110426Z","level":"INFO","msg":"sender: closed","stream_id":{"value":"r5qja96d"}}
+ {"time":"2024-10-07T10:21:41.160107876Z","level":"INFO","msg":"writer: Close: closed","stream_id":{"value":"r5qja96d"}}
+ {"time":"2024-10-07T10:21:41.160340304Z","level":"INFO","msg":"stream: closed","id":"r5qja96d"}
wandb/run-20241007_102112-r5qja96d/logs/debug.log ADDED
@@ -0,0 +1,29 @@
+ 2024-10-07 10:21:12,259 INFO MainThread:20491 [wandb_setup.py:_flush():79] Current SDK version is 0.18.3
+ 2024-10-07 10:21:12,259 INFO MainThread:20491 [wandb_setup.py:_flush():79] Configure stats pid to 20491
+ 2024-10-07 10:21:12,259 INFO MainThread:20491 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/.config/wandb/settings
+ 2024-10-07 10:21:12,259 INFO MainThread:20491 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/whisper-large-eu/wandb/settings
+ 2024-10-07 10:21:12,259 INFO MainThread:20491 [wandb_setup.py:_flush():79] Loading settings from environment variables: {'project': 'whisper-medium-eu'}
+ 2024-10-07 10:21:12,259 INFO MainThread:20491 [wandb_setup.py:_flush():79] Applying setup settings: {'mode': None, '_disable_service': None}
+ 2024-10-07 10:21:12,260 INFO MainThread:20491 [wandb_setup.py:_flush():79] Inferring run settings from compute environment: {'program_relpath': 'run_speech_recognition_seq2seq_streaming.py', 'program_abspath': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py', 'program': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py'}
+ 2024-10-07 10:21:12,260 INFO MainThread:20491 [wandb_setup.py:_flush():79] Applying login settings: {}
+ 2024-10-07 10:21:12,260 INFO MainThread:20491 [wandb_init.py:_log_setup():532] Logging user logs to /home/tknika/whisper-large-eu/wandb/run-20241007_102112-r5qja96d/logs/debug.log
+ 2024-10-07 10:21:12,260 INFO MainThread:20491 [wandb_init.py:_log_setup():533] Logging internal logs to /home/tknika/whisper-large-eu/wandb/run-20241007_102112-r5qja96d/logs/debug-internal.log
+ 2024-10-07 10:21:12,260 INFO MainThread:20491 [wandb_init.py:init():617] calling init triggers
+ 2024-10-07 10:21:12,260 INFO MainThread:20491 [wandb_init.py:init():624] wandb.init called with sweep_config: {}
+ config: {}
+ 2024-10-07 10:21:12,260 INFO MainThread:20491 [wandb_init.py:init():667] starting backend
+ 2024-10-07 10:21:12,260 INFO MainThread:20491 [wandb_init.py:init():671] sending inform_init request
+ 2024-10-07 10:21:12,261 INFO MainThread:20491 [backend.py:_multiprocessing_setup():104] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
+ 2024-10-07 10:21:12,262 INFO MainThread:20491 [wandb_init.py:init():684] backend started and connected
+ 2024-10-07 10:21:12,265 INFO MainThread:20491 [wandb_init.py:init():779] updated telemetry
+ 2024-10-07 10:21:12,271 INFO MainThread:20491 [wandb_init.py:init():812] communicating run to backend with 90.0 second timeout
+ 2024-10-07 10:21:12,731 INFO MainThread:20491 [wandb_init.py:init():863] starting run threads in backend
+ 2024-10-07 10:21:12,833 INFO MainThread:20491 [wandb_run.py:_console_start():2465] atexit reg
+ 2024-10-07 10:21:12,833 INFO MainThread:20491 [wandb_run.py:_redirect():2313] redirect: wrap_raw
+ 2024-10-07 10:21:12,833 INFO MainThread:20491 [wandb_run.py:_redirect():2378] Wrapping output streams.
+ 2024-10-07 10:21:12,833 INFO MainThread:20491 [wandb_run.py:_redirect():2403] Redirects installed.
+ 2024-10-07 10:21:12,837 INFO MainThread:20491 [wandb_init.py:init():907] run started, returning control to user process
+ 2024-10-07 10:21:12,838 INFO MainThread:20491 [wandb_run.py:_config_callback():1394] config_cb None None {'vocab_size': 51866, 'num_mel_bins': 128, 'd_model': 1280, 'encoder_layers': 32, 'encoder_attention_heads': 20, 'decoder_layers': 32, 'decoder_attention_heads': 20, 'decoder_ffn_dim': 5120, 'encoder_ffn_dim': 5120, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'activation_function': 'gelu', 'init_std': 0.02, 'encoder_layerdrop': 0.0, 'decoder_layerdrop': 0.0, 'use_cache': False, 'num_hidden_layers': 32, 'scale_embedding': False, 'max_source_positions': 1500, 'max_target_positions': 448, 'classifier_proj_size': 256, 'use_weighted_layer_sum': False, 'apply_spec_augment': False, 'mask_time_prob': 0.05, 'mask_time_length': 10, 'mask_time_min_masks': 2, 'mask_feature_prob': 0.0, 'mask_feature_length': 10, 'mask_feature_min_masks': 0, 'median_filter_width': 7, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': True, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 448, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': [220, 50257], 'architectures': ['WhisperForConditionalGeneration'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 50257, 'pad_token_id': 50256, 'eos_token_id': 50257, 'sep_token_id': None, 'decoder_start_token_id': 50258, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'openai/whisper-large-v3', 'transformers_version': '4.46.0.dev0', 'model_type': 'whisper', 'forced_decoder_ids': None, 'output_dir': './', 'overwrite_output_dir': True, 'do_train': True, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 16, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 4.375e-06, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': 10000, 'lr_scheduler_type': 'linear', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 500, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': './runs/Oct07_10-20-37_tknika', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 25, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 
'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 500, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': 'whisper-large-eu', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': True, 'metric_for_best_model': 'wer', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'input_length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': './checkpoint-9000/', 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': True, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'steps', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'eval_use_gather_object': False, 'sortish_sampler': False, 'predict_with_generate': True, 'generation_max_length': 228, 'generation_num_beams': None, 'generation_config': None}
+ 2024-10-07 10:21:12,842 INFO MainThread:20491 [wandb_config.py:__setitem__():154] config set model/num_parameters = 1543490560 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x78f7bc2a1760>>
+ 2024-10-07 10:21:12,842 INFO MainThread:20491 [wandb_run.py:_config_callback():1394] config_cb model/num_parameters 1543490560 None
+ 2024-10-07 10:21:37,758 WARNING MsgRouterThr:20491 [router.py:message_loop():77] message_loop has been closed
wandb/run-20241007_102112-r5qja96d/run-r5qja96d.wandb ADDED
Binary file (22.3 kB).
wandb/run-20241007_102233-fvsz65yu/files/config.yaml ADDED
@@ -0,0 +1,515 @@
+ _name_or_path:
+ value: openai/whisper-large-v3
+ _wandb:
+ value:
+ cli_version: 0.18.3
+ m:
+ - "1": train/global_step
+ "6":
+ - 3
+ "7": []
+ - "1": train/epoch
+ "5": 1
+ "6":
+ - 1
+ - 3
+ "7": []
+ python_version: 3.12.3
+ t:
+ "1":
+ - 1
+ - 5
+ - 11
+ - 49
+ - 51
+ - 53
+ - 55
+ - 71
+ - 100
+ "2":
+ - 1
+ - 5
+ - 11
+ - 49
+ - 51
+ - 53
+ - 55
+ - 71
+ - 100
+ "3":
+ - 7
+ - 13
+ - 19
+ - 23
+ - 55
+ - 62
+ - 66
+ "4": 3.12.3
+ "5": 0.18.3
+ "6": 4.46.0.dev0
+ "8":
+ - 5
+ "9":
+ "1": transformers_trainer
+ "12": 0.18.3
+ "13": linux-x86_64
+ accelerator_config:
+ value:
+ dispatch_batches: null
+ even_batches: true
+ gradient_accumulation_kwargs: null
+ non_blocking: false
+ split_batches: false
+ use_seedable_sampler: true
+ activation_dropout:
+ value: 0
+ activation_function:
+ value: gelu
+ adafactor:
+ value: false
+ adam_beta1:
+ value: 0.9
+ adam_beta2:
+ value: 0.999
+ adam_epsilon:
+ value: 1e-08
+ add_cross_attention:
+ value: false
+ apply_spec_augment:
+ value: false
+ architectures:
+ value:
+ - WhisperForConditionalGeneration
+ attention_dropout:
+ value: 0
+ auto_find_batch_size:
+ value: false
+ bad_words_ids:
+ value: null
+ batch_eval_metrics:
+ value: false
+ begin_suppress_tokens:
+ value:
+ - 220
+ - 50257
+ bf16:
+ value: false
+ bf16_full_eval:
+ value: false
+ bos_token_id:
+ value: 50257
+ chunk_size_feed_forward:
+ value: 0
+ classifier_proj_size:
+ value: 256
+ cross_attention_hidden_size:
+ value: null
+ d_model:
+ value: 1280
+ data_seed:
+ value: null
+ dataloader_drop_last:
+ value: false
+ dataloader_num_workers:
+ value: 0
+ dataloader_persistent_workers:
+ value: false
+ dataloader_pin_memory:
+ value: true
+ dataloader_prefetch_factor:
+ value: null
+ ddp_backend:
+ value: null
+ ddp_broadcast_buffers:
+ value: null
+ ddp_bucket_cap_mb:
+ value: null
+ ddp_find_unused_parameters:
+ value: null
+ ddp_timeout:
+ value: 1800
+ debug:
+ value: []
+ decoder_attention_heads:
+ value: 20
+ decoder_ffn_dim:
+ value: 5120
+ decoder_layerdrop:
+ value: 0
+ decoder_layers:
+ value: 32
+ decoder_start_token_id:
+ value: 50258
+ deepspeed:
+ value: null
+ disable_tqdm:
+ value: false
+ dispatch_batches:
+ value: null
+ diversity_penalty:
+ value: 0
+ do_eval:
+ value: true
+ do_predict:
+ value: false
+ do_sample:
+ value: false
+ do_train:
+ value: true
+ dropout:
+ value: 0
+ early_stopping:
+ value: false
+ encoder_attention_heads:
+ value: 20
+ encoder_ffn_dim:
+ value: 5120
+ encoder_layerdrop:
+ value: 0
+ encoder_layers:
+ value: 32
+ encoder_no_repeat_ngram_size:
+ value: 0
+ eos_token_id:
+ value: 50257
+ eval_accumulation_steps:
+ value: null
+ eval_delay:
+ value: 0
+ eval_do_concat_batches:
+ value: true
+ eval_on_start:
+ value: false
+ eval_steps:
+ value: 500
+ eval_strategy:
+ value: steps
+ eval_use_gather_object:
+ value: false
+ evaluation_strategy:
+ value: steps
+ exponential_decay_length_penalty:
+ value: null
+ finetuning_task:
+ value: null
+ forced_bos_token_id:
+ value: null
+ forced_decoder_ids:
+ value: null
+ forced_eos_token_id:
+ value: null
+ fp16:
+ value: true
+ fp16_backend:
+ value: auto
+ fp16_full_eval:
+ value: false
+ fp16_opt_level:
+ value: O1
+ fsdp:
+ value: []
+ fsdp_config:
+ value:
+ min_num_params: 0
+ xla: false
+ xla_fsdp_grad_ckpt: false
+ xla_fsdp_v2: false
+ fsdp_min_num_params:
+ value: 0
+ fsdp_transformer_layer_cls_to_wrap:
+ value: null
+ full_determinism:
+ value: false
+ generation_config:
+ value: null
+ generation_max_length:
+ value: 228
+ generation_num_beams:
+ value: null
+ gradient_accumulation_steps:
+ value: 1
+ gradient_checkpointing:
+ value: true
+ gradient_checkpointing_kwargs:
+ value: null
+ greater_is_better:
+ value: false
+ group_by_length:
+ value: false
+ half_precision_backend:
+ value: auto
+ hub_always_push:
+ value: false
+ hub_model_id:
+ value: null
+ hub_private_repo:
+ value: false
+ hub_strategy:
+ value: every_save
+ hub_token:
+ value: <HUB_TOKEN>
+ id2label:
+ value:
+ "0": LABEL_0
+ "1": LABEL_1
+ ignore_data_skip:
+ value: false
+ include_for_metrics:
+ value: []
+ include_inputs_for_metrics:
+ value: false
+ include_num_input_tokens_seen:
+ value: false
+ include_tokens_per_second:
+ value: false
+ init_std:
+ value: 0.02
+ is_decoder:
+ value: false
+ is_encoder_decoder:
+ value: true
+ jit_mode_eval:
+ value: false
+ label_names:
+ value: null
+ label_smoothing_factor:
+ value: 0
+ label2id:
+ value:
+ LABEL_0: 0
+ LABEL_1: 1
+ learning_rate:
+ value: 4.375e-06
+ length_column_name:
+ value: input_length
+ length_penalty:
+ value: 1
+ load_best_model_at_end:
+ value: true
+ local_rank:
+ value: 0
+ log_level:
+ value: passive
+ log_level_replica:
+ value: warning
+ log_on_each_node:
+ value: true
+ logging_dir:
+ value: ./runs/Oct07_10-22-04_tknika
+ logging_first_step:
+ value: false
+ logging_nan_inf_filter:
+ value: true
+ logging_steps:
+ value: 25
+ logging_strategy:
+ value: steps
+ lr_scheduler_type:
+ value: linear
+ mask_feature_length:
+ value: 10
+ mask_feature_min_masks:
+ value: 0
+ mask_feature_prob:
+ value: 0
+ mask_time_length:
+ value: 10
+ mask_time_min_masks:
+ value: 2
+ mask_time_prob:
+ value: 0.05
+ max_grad_norm:
+ value: 1
+ max_length:
+ value: 448
+ max_source_positions:
+ value: 1500
+ max_steps:
+ value: 1000
+ max_target_positions:
+ value: 448
+ median_filter_width:
+ value: 7
+ metric_for_best_model:
+ value: wer
+ min_length:
+ value: 0
+ model/num_parameters:
+ value: 1543490560
+ model_type:
+ value: whisper
+ mp_parameters:
+ value: ""
+ neftune_noise_alpha:
+ value: null
+ no_cuda:
+ value: false
+ no_repeat_ngram_size:
+ value: 0
+ num_beam_groups:
+ value: 1
+ num_beams:
+ value: 1
+ num_hidden_layers:
+ value: 32
+ num_mel_bins:
+ value: 128
+ num_return_sequences:
+ value: 1
+ num_train_epochs:
+ value: 3
+ optim:
+ value: adamw_torch
+ optim_args:
+ value: null
+ optim_target_modules:
+ value: null
+ output_attentions:
+ value: false
+ output_dir:
+ value: ./
+ output_hidden_states:
+ value: false
+ output_scores:
+ value: false
+ overwrite_output_dir:
+ value: true
+ pad_token_id:
+ value: 50256
+ past_index:
+ value: -1
+ per_device_eval_batch_size:
+ value: 8
+ per_device_train_batch_size:
+ value: 16
+ per_gpu_eval_batch_size:
+ value: null
+ per_gpu_train_batch_size:
+ value: null
+ predict_with_generate:
+ value: true
+ prediction_loss_only:
+ value: false
+ prefix:
+ value: null
+ problem_type:
+ value: null
+ push_to_hub:
+ value: true
+ push_to_hub_model_id:
+ value: null
+ push_to_hub_organization:
+ value: null
+ push_to_hub_token:
+ value: <PUSH_TO_HUB_TOKEN>
+ ray_scope:
+ value: last
+ remove_invalid_values:
+ value: false
+ remove_unused_columns:
+ value: true
+ repetition_penalty:
+ value: 1
+ report_to:
+ value:
+ - wandb
+ restore_callback_states_from_checkpoint:
+ value: false
+ resume_from_checkpoint:
+ value: ./checkpoint-9000/
+ return_dict:
+ value: true
+ return_dict_in_generate:
+ value: false
+ run_name:
+ value: whisper-large-eu
+ save_on_each_node:
+ value: false
+ save_only_model:
+ value: false
+ save_safetensors:
+ value: true
+ save_steps:
+ value: 1000
+ save_strategy:
+ value: steps
+ save_total_limit:
+ value: null
+ scale_embedding:
+ value: false
+ seed:
+ value: 42
+ sep_token_id:
+ value: null
+ skip_memory_metrics:
+ value: true
+ sortish_sampler:
+ value: false
+ split_batches:
+ value: null
+ suppress_tokens:
+ value: null
+ task_specific_params:
+ value: null
+ temperature:
+ value: 1
+ tf_legacy_loss:
+ value: false
+ tf32:
+ value: null
+ tie_encoder_decoder:
+ value: false
+ tie_word_embeddings:
+ value: true
+ tokenizer_class:
+ value: null
+ top_k:
+ value: 50
+ top_p:
+ value: 1
+ torch_compile:
+ value: false
+ torch_compile_backend:
+ value: null
+ torch_compile_mode:
+ value: null
+ torch_dtype:
+ value: float16
+ torch_empty_cache_steps:
+ value: null
+ torchdynamo:
+ value: null
+ torchscript:
+ value: false
+ tpu_metrics_debug:
+ value: false
+ tpu_num_cores:
+ value: null
+ transformers_version:
+ value: 4.46.0.dev0
+ typical_p:
+ value: 1
+ use_bfloat16:
+ value: false
+ use_cache:
+ value: false
+ use_cpu:
+ value: false
+ use_ipex:
+ value: false
+ use_legacy_prediction_loop:
+ value: false
+ use_liger_kernel:
+ value: false
+ use_mps_device:
+ value: false
+ use_weighted_layer_sum:
+ value: false
+ vocab_size:
+ value: 51866
+ warmup_ratio:
+ value: 0
+ warmup_steps:
+ value: 500
+ weight_decay:
+ value: 0
wandb/run-20241007_102233-fvsz65yu/files/output.log ADDED
@@ -0,0 +1,68 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 0%| | 0/1000 [00:00<?, ?it/s]/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/transformers/trainer.py:2974: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
2
+ checkpoint_rng_state = torch.load(rng_file)
3
+ Reading metadata...: 75336it [00:02, 29074.31it/s]
4
+ Reading metadata...: 13630it [00:00, 14208.60it/s]
5
+ [INFO|trainer_utils.py:830] 2024-10-07 10:22:46,488 >> The following columns in the training set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: input_length. If input_length are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message.
6
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
7
+ with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]
8
+ 9001it [00:15, 576.66it/s] [INFO|trainer.py:3738] 2024-10-07 10:22:49,598 >> Saving model checkpoint to ./checkpoint-9001
9
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py:2774: UserWarning: Moving the following attributes in the config to the generation config: {'max_length': 448, 'begin_suppress_tokens': [220, 50257]}. You are seeing this warning because you've set generation parameters in the model config, as opposed to in the generation config.
10
+ warnings.warn(
11
+ [INFO|configuration_utils.py:410] 2024-10-07 10:22:49,601 >> Configuration saved in ./checkpoint-9001/config.json
12
+ [INFO|configuration_utils.py:868] 2024-10-07 10:22:49,602 >> Configuration saved in ./checkpoint-9001/generation_config.json
13
+ [INFO|modeling_utils.py:3000] 2024-10-07 10:22:55,796 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./checkpoint-9001/model.safetensors.index.json.
14
+ [INFO|feature_extraction_utils.py:435] 2024-10-07 10:22:55,797 >> Feature extractor saved in ./checkpoint-9001/preprocessor_config.json
15
+ 9001it [00:25, 576.66it/s][INFO|feature_extraction_utils.py:435] 2024-10-07 10:23:19,205 >> Feature extractor saved in ./preprocessor_config.json
+ [INFO|trainer.py:2532] 2024-10-07 10:23:19,229 >>
+
+ Training completed. Do not forget to share your model on huggingface.co/models =)
+
+
+ [INFO|trainer.py:2770] 2024-10-07 10:23:19,230 >> Loading best model from ./checkpoint-9000 (score: 7.215361500971087).
+ [WARNING|trainer.py:2892] 2024-10-07 10:23:25,170 >> There were missing keys in the checkpoint model loaded: ['proj_out.weight'].
+ 9001it [00:51, 175.86it/s]
+ {'train_runtime': 52.4848, 'train_samples_per_second': 304.85, 'train_steps_per_second': 19.053, 'train_loss': 7.723795109133682e-07, 'epoch': 9.0}
+ [INFO|trainer.py:4519] 2024-10-07 10:23:25,172 >> Waiting for the current checkpoint push to be finished, this might take a couple of minutes.
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/huggingface_hub/hf_api.py:3889: UserWarning: It seems that you are about to commit a data file (.venv/lib/python3.12/site-packages/pyarrow/tests/data/parquet/v0.7.1.all-named-index.parquet) to a model repository. You are sure this is intended? If you are trying to upload a dataset, please set `repo_type='dataset'` or `--repo-type=dataset` in a CLI.
+ warnings.warn(
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/huggingface_hub/hf_api.py:3889: UserWarning: It seems that you are about to commit a data file (.venv/lib/python3.12/site-packages/pyarrow/tests/data/parquet/v0.7.1.column-metadata-handling.parquet) to a model repository. You are sure this is intended? If you are trying to upload a dataset, please set `repo_type='dataset'` or `--repo-type=dataset` in a CLI.
+ warnings.warn(
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/huggingface_hub/hf_api.py:3889: UserWarning: It seems that you are about to commit a data file (.venv/lib/python3.12/site-packages/pyarrow/tests/data/parquet/v0.7.1.parquet) to a model repository. You are sure this is intended? If you are trying to upload a dataset, please set `repo_type='dataset'` or `--repo-type=dataset` in a CLI.
+ warnings.warn(
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/huggingface_hub/hf_api.py:3889: UserWarning: It seems that you are about to commit a data file (.venv/lib/python3.12/site-packages/pyarrow/tests/data/parquet/v0.7.1.some-named-index.parquet) to a model repository. You are sure this is intended? If you are trying to upload a dataset, please set `repo_type='dataset'` or `--repo-type=dataset` in a CLI.
+ warnings.warn(
+ Traceback (most recent call last):
+ File "/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py", line 630, in <module>
+ main()
+ File "/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py", line 579, in main
+ train_result = trainer.train(resume_from_checkpoint=checkpoint)
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/transformers/trainer.py", line 2070, in train
+ return inner_training_loop(
+ ^^^^^^^^^^^^^^^^^^^^
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/transformers/trainer.py", line 2579, in _inner_training_loop
+ self._finish_current_push()
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/transformers/trainer.py", line 4520, in _finish_current_push
+ self.push_in_progress.wait_until_done()
+ File "/home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/transformers/utils/hub.py", line 1305, in wait_until_done
+ futures.wait(self.jobs)
+ File "/usr/lib/python3.12/concurrent/futures/_base.py", line 305, in wait
+ waiter.event.wait(timeout)
+ File "/usr/lib/python3.12/threading.py", line 655, in wait
+ signaled = self._cond.wait(timeout)
+ ^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/usr/lib/python3.12/threading.py", line 355, in wait
+ waiter.acquire()
+ KeyboardInterrupt
+ Exception ignored in: <module 'threading' from '/usr/lib/python3.12/threading.py'>
+ Traceback (most recent call last):
+ File "/usr/lib/python3.12/threading.py", line 1592, in _shutdown
+ atexit_call()
+ File "/usr/lib/python3.12/concurrent/futures/thread.py", line 31, in _python_exit
+ t.join()
+ File "/usr/lib/python3.12/threading.py", line 1147, in join
+ self._wait_for_tstate_lock()
+ File "/usr/lib/python3.12/threading.py", line 1167, in _wait_for_tstate_lock
+ if lock.acquire(block, timeout):
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ KeyboardInterrupt:
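
The two FutureWarnings in the log above are API migrations rather than errors. A minimal sketch of the recommended replacements, assuming a placeholder rng_file path rather than the trainer's real internals:

import torch

# Sketch only: load a checkpoint artifact with the safer default the first
# warning recommends. rng_file is a placeholder, not a path from this run;
# non-tensor state may additionally need torch.serialization.add_safe_globals.
rng_file = "checkpoint-9000/rng_state.pth"
checkpoint_rng_state = torch.load(rng_file, weights_only=True)

# Sketch only: the replacement the second warning names for the deprecated
# torch.cpu.amp.autocast(...) call seen inside gradient checkpointing.
with torch.amp.autocast("cpu"):
    pass  # CPU ops that should run under autocast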
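
The repeated hf_api warnings fire because pyarrow test parquet files under .venv were swept into the model-repo push. If uploading data were actually intended, the warning points at repo_type="dataset"; a hedged sketch with placeholder names:

from huggingface_hub import HfApi

# Sketch only: upload a data file to a dataset repo, as the warning suggests.
# The repo id and paths are placeholders, not repositories from this log.
api = HfApi()
api.upload_file(
    path_or_fileobj="v0.7.1.parquet",
    path_in_repo="data/v0.7.1.parquet",
    repo_id="username/example-dataset",
    repo_type="dataset",
)

For this repository the likelier fix is keeping .venv out of the tracked tree so the Trainer's push never picks those files up.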
wandb/run-20241007_102233-fvsz65yu/files/wandb-metadata.json ADDED
@@ -0,0 +1,87 @@
+ {
+ "os": "Linux-6.8.0-45-generic-x86_64-with-glibc2.39",
+ "python": "3.12.3",
+ "startedAt": "2024-10-07T10:22:33.418638Z",
+ "args": [
+ "--model_name_or_path=openai/whisper-large-v3",
+ "--dataset_name=mozilla-foundation/common_voice_17_0",
+ "--dataset_config_name=eu",
+ "--language=basque",
+ "--train_split_name=train+validation",
+ "--eval_split_name=test",
+ "--model_index_name=Whisper Large Basque",
+ "--max_steps=1000",
+ "--output_dir=./",
+ "--per_device_train_batch_size=16",
+ "--per_device_eval_batch_size=8",
+ "--gradient_accumulation_steps=1",
+ "--logging_steps=25",
+ "--learning_rate=4.375e-6",
+ "--warmup_steps=500",
+ "--evaluation_strategy=steps",
+ "--eval_steps=500",
+ "--save_strategy=steps",
+ "--save_steps=1000",
+ "--generation_max_length=228",
+ "--length_column_name=input_length",
+ "--max_duration_in_seconds=30",
+ "--text_column_name=sentence",
+ "--freeze_feature_encoder=False",
+ "--report_to=tensorboard",
+ "--metric_for_best_model=wer",
+ "--greater_is_better=False",
+ "--load_best_model_at_end",
+ "--gradient_checkpointing",
+ "--fp16",
+ "--overwrite_output_dir",
+ "--do_train",
+ "--do_eval",
+ "--predict_with_generate",
+ "--do_normalize_eval",
+ "--streaming",
+ "--push_to_hub",
+ "--resume_from_checkpoint=./checkpoint-9000/",
+ "--report_to",
+ "wandb",
+ "--run_name",
+ "whisper-large-eu"
+ ],
+ "program": "/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py",
+ "codePath": "run_speech_recognition_seq2seq_streaming.py",
+ "git": {
+ "remote": "https://huggingface.co/xezpeleta/whisper-large-eu",
+ "commit": "45227421df6af8836af459c374361e7303a68aea"
+ },
+ "email": "[email protected]",
+ "root": "/home/tknika/whisper-large-eu",
+ "host": "tknika",
+ "username": "tknika",
+ "executable": "/home/tknika/whisper-large-eu/.venv/bin/python",
+ "codePathLocal": "run_speech_recognition_seq2seq_streaming.py",
+ "cpu_count": 8,
+ "cpu_count_logical": 8,
+ "gpu": "[NVIDIA L40-48Q]",
+ "gpu_count": 1,
+ "disk": {
+ "/": {
+ "total": "314615791616",
+ "used": "265683410944"
+ }
+ },
+ "memory": {
+ "total": "33654026240"
+ },
+ "cpu": {
+ "count": 8,
+ "countLogical": 8
+ },
+ "gpu_nvidia": [
+ {
+ "name": "NVIDIA L40-48Q",
+ "memoryTotal": "51539607552",
+ "cudaCores": 18176,
+ "architecture": "Ada"
+ }
+ ],
+ "cudaVersion": "12.4"
+ }
wandb/run-20241007_102233-fvsz65yu/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
+ {"_step":0,"train_runtime":52.4848,"_timestamp":1.7282966051712844e+09,"total_flos":4.8922616615141376e+20,"_runtime":51.793285945,"train/epoch":9.001,"_wandb":{"runtime":276},"train/global_step":9001,"train_steps_per_second":19.053,"train_loss":7.723795109133682e-07,"train_samples_per_second":304.85}
wandb/run-20241007_102233-fvsz65yu/logs/debug-core.log ADDED
@@ -0,0 +1,14 @@
+ {"time":"2024-10-07T10:22:32.752507789Z","level":"INFO","msg":"started logging, with flags","port-filename":"/tmp/tmpvaw3g7ob/port-20571.txt","pid":20571,"debug":false,"disable-analytics":false}
+ {"time":"2024-10-07T10:22:32.752679238Z","level":"INFO","msg":"FeatureState","shutdownOnParentExitEnabled":false}
+ {"time":"2024-10-07T10:22:32.766495142Z","level":"INFO","msg":"Will exit if parent process dies.","ppid":20571}
+ {"time":"2024-10-07T10:22:32.766421472Z","level":"INFO","msg":"server is running","addr":{"IP":"127.0.0.1","Port":42167,"Zone":""}}
+ {"time":"2024-10-07T10:22:32.942067452Z","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"127.0.0.1:56158"}
+ {"time":"2024-10-07T10:22:33.420767Z","level":"INFO","msg":"handleInformInit: received","streamId":"fvsz65yu","id":"127.0.0.1:56158"}
+ {"time":"2024-10-07T10:22:33.432081096Z","level":"INFO","msg":"handleInformInit: stream started","streamId":"fvsz65yu","id":"127.0.0.1:56158"}
+ {"time":"2024-10-07T10:27:09.487145656Z","level":"INFO","msg":"handleInformTeardown: server teardown initiated","id":"127.0.0.1:56158"}
+ {"time":"2024-10-07T10:27:09.487320605Z","level":"INFO","msg":"server is shutting down"}
+ {"time":"2024-10-07T10:27:09.487319855Z","level":"INFO","msg":"connection: Close: initiating connection closure","id":"127.0.0.1:56158"}
+ {"time":"2024-10-07T10:27:09.487457674Z","level":"INFO","msg":"connection: Close: connection successfully closed","id":"127.0.0.1:56158"}
+ {"time":"2024-10-07T10:27:12.377332382Z","level":"INFO","msg":"handleInformTeardown: server shutdown complete","id":"127.0.0.1:56158"}
+ {"time":"2024-10-07T10:27:12.377385972Z","level":"INFO","msg":"connection: ManageConnectionData: connection closed","id":"127.0.0.1:56158"}
+ {"time":"2024-10-07T10:27:12.377422151Z","level":"INFO","msg":"server is closed"}
wandb/run-20241007_102233-fvsz65yu/logs/debug-internal.log ADDED
@@ -0,0 +1,18 @@
+ {"time":"2024-10-07T10:22:33.421062458Z","level":"INFO","msg":"using version","core version":"0.18.3"}
+ {"time":"2024-10-07T10:22:33.421087507Z","level":"INFO","msg":"created symlink","path":"/home/tknika/whisper-large-eu/wandb/run-20241007_102233-fvsz65yu/logs/debug-core.log"}
+ {"time":"2024-10-07T10:22:33.423573147Z","level":"ERROR","msg":"dialing: google: could not find default credentials. See https://cloud.google.com/docs/authentication/external/set-up-adc for more information"}
+ {"time":"2024-10-07T10:22:33.432020856Z","level":"INFO","msg":"created new stream","id":"fvsz65yu"}
+ {"time":"2024-10-07T10:22:33.432071516Z","level":"INFO","msg":"stream: started","id":"fvsz65yu"}
+ {"time":"2024-10-07T10:22:33.432108776Z","level":"INFO","msg":"writer: Do: started","stream_id":{"value":"fvsz65yu"}}
+ {"time":"2024-10-07T10:22:33.432120206Z","level":"INFO","msg":"sender: started","stream_id":{"value":"fvsz65yu"}}
+ {"time":"2024-10-07T10:22:33.432199375Z","level":"INFO","msg":"handler: started","stream_id":{"value":"fvsz65yu"}}
+ {"time":"2024-10-07T10:22:33.862313722Z","level":"INFO","msg":"wandb-core","!BADKEY":null}
+ {"time":"2024-10-07T10:22:33.862904357Z","level":"INFO","msg":"Starting system monitor"}
+ {"time":"2024-10-07T10:27:09.487255295Z","level":"INFO","msg":"stream: closing","id":"fvsz65yu"}
+ {"time":"2024-10-07T10:27:09.487299825Z","level":"INFO","msg":"Stopping system monitor"}
+ {"time":"2024-10-07T10:27:09.495096519Z","level":"INFO","msg":"Stopped system monitor"}
+ {"time":"2024-10-07T10:27:12.041470803Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
+ {"time":"2024-10-07T10:27:12.376948825Z","level":"INFO","msg":"handler: closed","stream_id":{"value":"fvsz65yu"}}
+ {"time":"2024-10-07T10:27:12.377003355Z","level":"INFO","msg":"writer: Close: closed","stream_id":{"value":"fvsz65yu"}}
+ {"time":"2024-10-07T10:27:12.377060484Z","level":"INFO","msg":"sender: closed","stream_id":{"value":"fvsz65yu"}}
+ {"time":"2024-10-07T10:27:12.377206333Z","level":"INFO","msg":"stream: closed","id":"fvsz65yu"}
wandb/run-20241007_102233-fvsz65yu/logs/debug.log ADDED
@@ -0,0 +1,29 @@
+ 2024-10-07 10:22:33,415 INFO MainThread:20571 [wandb_setup.py:_flush():79] Current SDK version is 0.18.3
+ 2024-10-07 10:22:33,415 INFO MainThread:20571 [wandb_setup.py:_flush():79] Configure stats pid to 20571
+ 2024-10-07 10:22:33,415 INFO MainThread:20571 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/.config/wandb/settings
+ 2024-10-07 10:22:33,415 INFO MainThread:20571 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/whisper-large-eu/wandb/settings
+ 2024-10-07 10:22:33,415 INFO MainThread:20571 [wandb_setup.py:_flush():79] Loading settings from environment variables: {'project': 'whisper-medium-eu'}
+ 2024-10-07 10:22:33,415 INFO MainThread:20571 [wandb_setup.py:_flush():79] Applying setup settings: {'mode': None, '_disable_service': None}
+ 2024-10-07 10:22:33,415 INFO MainThread:20571 [wandb_setup.py:_flush():79] Inferring run settings from compute environment: {'program_relpath': 'run_speech_recognition_seq2seq_streaming.py', 'program_abspath': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py', 'program': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py'}
+ 2024-10-07 10:22:33,415 INFO MainThread:20571 [wandb_setup.py:_flush():79] Applying login settings: {}
+ 2024-10-07 10:22:33,415 INFO MainThread:20571 [wandb_init.py:_log_setup():532] Logging user logs to /home/tknika/whisper-large-eu/wandb/run-20241007_102233-fvsz65yu/logs/debug.log
+ 2024-10-07 10:22:33,415 INFO MainThread:20571 [wandb_init.py:_log_setup():533] Logging internal logs to /home/tknika/whisper-large-eu/wandb/run-20241007_102233-fvsz65yu/logs/debug-internal.log
+ 2024-10-07 10:22:33,416 INFO MainThread:20571 [wandb_init.py:init():617] calling init triggers
+ 2024-10-07 10:22:33,416 INFO MainThread:20571 [wandb_init.py:init():624] wandb.init called with sweep_config: {}
+ config: {}
+ 2024-10-07 10:22:33,416 INFO MainThread:20571 [wandb_init.py:init():667] starting backend
+ 2024-10-07 10:22:33,416 INFO MainThread:20571 [wandb_init.py:init():671] sending inform_init request
+ 2024-10-07 10:22:33,417 INFO MainThread:20571 [backend.py:_multiprocessing_setup():104] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
+ 2024-10-07 10:22:33,418 INFO MainThread:20571 [wandb_init.py:init():684] backend started and connected
+ 2024-10-07 10:22:33,422 INFO MainThread:20571 [wandb_init.py:init():779] updated telemetry
+ 2024-10-07 10:22:33,430 INFO MainThread:20571 [wandb_init.py:init():812] communicating run to backend with 90.0 second timeout
+ 2024-10-07 10:22:33,857 INFO MainThread:20571 [wandb_init.py:init():863] starting run threads in backend
+ 2024-10-07 10:22:33,981 INFO MainThread:20571 [wandb_run.py:_console_start():2465] atexit reg
+ 2024-10-07 10:22:33,981 INFO MainThread:20571 [wandb_run.py:_redirect():2313] redirect: wrap_raw
+ 2024-10-07 10:22:33,981 INFO MainThread:20571 [wandb_run.py:_redirect():2378] Wrapping output streams.
+ 2024-10-07 10:22:33,981 INFO MainThread:20571 [wandb_run.py:_redirect():2403] Redirects installed.
+ 2024-10-07 10:22:33,983 INFO MainThread:20571 [wandb_init.py:init():907] run started, returning control to user process
+ 2024-10-07 10:22:33,984 INFO MainThread:20571 [wandb_run.py:_config_callback():1394] config_cb None None {'vocab_size': 51866, 'num_mel_bins': 128, 'd_model': 1280, 'encoder_layers': 32, 'encoder_attention_heads': 20, 'decoder_layers': 32, 'decoder_attention_heads': 20, 'decoder_ffn_dim': 5120, 'encoder_ffn_dim': 5120, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'activation_function': 'gelu', 'init_std': 0.02, 'encoder_layerdrop': 0.0, 'decoder_layerdrop': 0.0, 'use_cache': False, 'num_hidden_layers': 32, 'scale_embedding': False, 'max_source_positions': 1500, 'max_target_positions': 448, 'classifier_proj_size': 256, 'use_weighted_layer_sum': False, 'apply_spec_augment': False, 'mask_time_prob': 0.05, 'mask_time_length': 10, 'mask_time_min_masks': 2, 'mask_feature_prob': 0.0, 'mask_feature_length': 10, 'mask_feature_min_masks': 0, 'median_filter_width': 7, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': True, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 448, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': [220, 50257], 'architectures': ['WhisperForConditionalGeneration'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 50257, 'pad_token_id': 50256, 'eos_token_id': 50257, 'sep_token_id': None, 'decoder_start_token_id': 50258, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'openai/whisper-large-v3', 'transformers_version': '4.46.0.dev0', 'model_type': 'whisper', 'forced_decoder_ids': None, 'output_dir': './', 'overwrite_output_dir': True, 'do_train': True, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 16, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 4.375e-06, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': 1000, 'lr_scheduler_type': 'linear', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 500, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': './runs/Oct07_10-22-04_tknika', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 25, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 
'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 500, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': 'whisper-large-eu', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': True, 'metric_for_best_model': 'wer', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'input_length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': './checkpoint-9000/', 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': True, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'steps', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'eval_use_gather_object': False, 'sortish_sampler': False, 'predict_with_generate': True, 'generation_max_length': 228, 'generation_num_beams': None, 'generation_config': None}
+ 2024-10-07 10:22:33,988 INFO MainThread:20571 [wandb_config.py:__setitem__():154] config set model/num_parameters = 1543490560 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7f01bf62b7a0>>
+ 2024-10-07 10:22:33,988 INFO MainThread:20571 [wandb_run.py:_config_callback():1394] config_cb model/num_parameters 1543490560 None
+ 2024-10-07 10:27:09,487 WARNING MsgRouterThr:20571 [router.py:message_loop():77] message_loop has been closed
wandb/run-20241007_102233-fvsz65yu/run-fvsz65yu.wandb ADDED
Binary file (56.8 kB)
wandb/run-20241007_125615-a3z1jk8c/files/output.log ADDED
@@ -0,0 +1,32 @@
+ ***** eval metrics *****
+ eval_loss = 0.9278
+ eval_model_preparation_time = 0.0102
+ eval_runtime = 1:09:25.15
+ eval_samples_per_second = 3.272
+ eval_steps_per_second = 0.409
+ eval_wer = 44.2953
+ [INFO|trainer.py:3738] 2024-10-07 12:56:15,790 >> Saving model checkpoint to ./
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py:2774: UserWarning: Moving the following attributes in the config to the generation config: {'max_length': 448, 'begin_suppress_tokens': [220, 50257]}. You are seeing this warning because you've set generation parameters in the model config, as opposed to in the generation config.
+ warnings.warn(
+ [INFO|configuration_utils.py:410] 2024-10-07 12:56:15,792 >> Configuration saved in ./config.json
+ [INFO|configuration_utils.py:868] 2024-10-07 12:56:15,793 >> Configuration saved in ./generation_config.json
+ [INFO|modeling_utils.py:3000] 2024-10-07 12:56:27,544 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./model.safetensors.index.json.
+ [INFO|feature_extraction_utils.py:435] 2024-10-07 12:56:27,545 >> Feature extractor saved in ./preprocessor_config.json
+ [INFO|modelcard.py:449] 2024-10-07 12:56:27,732 >> Dropping the following result as it does not have all the necessary fields:
+ {'task': {'name': 'Automatic Speech Recognition', 'type': 'automatic-speech-recognition'}, 'dataset': {'name': 'mozilla-foundation/common_voice_17_0 eu', 'type': 'mozilla-foundation/common_voice_17_0', 'config': 'eu', 'split': 'test', 'args': 'eu'}}
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/huggingface_hub/hf_api.py:3889: UserWarning: It seems that you are about to commit a data file (.venv/lib/python3.12/site-packages/pyarrow/tests/data/parquet/v0.7.1.all-named-index.parquet) to a model repository. You are sure this is intended? If you are trying to upload a dataset, please set `repo_type='dataset'` or `--repo-type=dataset` in a CLI.
+ warnings.warn(
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/huggingface_hub/hf_api.py:3889: UserWarning: It seems that you are about to commit a data file (.venv/lib/python3.12/site-packages/pyarrow/tests/data/parquet/v0.7.1.column-metadata-handling.parquet) to a model repository. You are sure this is intended? If you are trying to upload a dataset, please set `repo_type='dataset'` or `--repo-type=dataset` in a CLI.
+ warnings.warn(
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/huggingface_hub/hf_api.py:3889: UserWarning: It seems that you are about to commit a data file (.venv/lib/python3.12/site-packages/pyarrow/tests/data/parquet/v0.7.1.parquet) to a model repository. You are sure this is intended? If you are trying to upload a dataset, please set `repo_type='dataset'` or `--repo-type=dataset` in a CLI.
+ warnings.warn(
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/huggingface_hub/hf_api.py:3889: UserWarning: It seems that you are about to commit a data file (.venv/lib/python3.12/site-packages/pyarrow/tests/data/parquet/v0.7.1.some-named-index.parquet) to a model repository. You are sure this is intended? If you are trying to upload a dataset, please set `repo_type='dataset'` or `--repo-type=dataset` in a CLI.
+ warnings.warn(
+ events.out.tfevents.1728297734.tknika.20799.0: 100%|████████████████████████████████████████████████████████████████████████████████████████| 360/360 [00:00<00:00, 1.10kB/s]
+ training_args.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5.37k/5.37k [00:00<00:00, 9.18kB/s]
+ model-00002-of-00002.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 1.18G/1.18G [00:47<00:00, 25.0MB/s]
+ model-00001-of-00002.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4.99G/4.99G [03:00<00:00, 27.6MB/s]
+ Upload 4 LFS files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [03:01<00:00, 45.33s/it]
+
+
+ Upload 4 LFS files: 25%|█████████████████████████████▎ | 1/4 [03:01<09:04, 181.33s/it]
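
The eval block at the top of this log reports eval_wer = 44.2953. A minimal sketch of how such a score is computed with the evaluate/jiwer pair pinned in requirements.txt below; the strings are invented examples, not data from this run:

import evaluate

# Sketch only: word error rate as a percentage, matching the eval_wer
# convention above (lower is better).
wer_metric = evaluate.load("wer")
wer = 100 * wer_metric.compute(
    predictions=["kaixo mundua"],
    references=["kaixo mundu"],
)
print(f"WER: {wer:.4f}")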
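
The modelcard INFO above drops the eval result because the dict is missing a required field, presumably the metrics list. A sketch of a complete entry, with the structure assumed from the model-index format and the WER value taken from the eval block:

# Sketch only: the dropped result dict, extended with a metrics entry.
result = {
    "task": {
        "name": "Automatic Speech Recognition",
        "type": "automatic-speech-recognition",
    },
    "dataset": {
        "name": "mozilla-foundation/common_voice_17_0 eu",
        "type": "mozilla-foundation/common_voice_17_0",
        "config": "eu",
        "split": "test",
        "args": "eu",
    },
    "metrics": [{"name": "Wer", "type": "wer", "value": 44.2953}],
}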
wandb/run-20241007_125615-a3z1jk8c/files/requirements.txt ADDED
@@ -0,0 +1,94 @@
+ Markdown==3.7
+ requests==2.32.3
+ RapidFuzz==3.10.0
+ yarl==1.13.1
+ pyarrow==17.0.0
+ docker-pycreds==0.4.0
+ nvidia-cufft-cu12==11.0.2.54
+ PyYAML==6.0.2
+ packaging==24.1
+ librosa==0.10.2.post1
+ soxr==0.5.0.post1
+ multiprocess==0.70.16
+ nvidia-nvjitlink-cu12==12.6.77
+ safetensors==0.4.5
+ joblib==1.4.2
+ pip==24.0
+ wandb==0.18.3
+ networkx==3.3
+ numba==0.60.0
+ scipy==1.14.1
+ MarkupSafe==2.1.5
+ GitPython==3.1.43
+ aiohttp==3.10.9
+ msgpack==1.1.0
+ mpmath==1.3.0
+ tzdata==2024.2
+ nvidia-cudnn-cu12==9.1.0.70
+ scikit-learn==1.5.2
+ pytz==2024.2
+ dill==0.3.8
+ nvidia-cusparse-cu12==12.1.0.106
+ soundfile==0.12.1
+ aiosignal==1.3.1
+ gitdb==4.0.11
+ Jinja2==3.1.4
+ jiwer==3.0.4
+ decorator==5.1.1
+ nvidia-cusolver-cu12==11.4.5.107
+ protobuf==5.28.2
+ idna==3.10
+ tqdm==4.66.5
+ pandas==2.2.3
+ python-dateutil==2.9.0.post0
+ Werkzeug==3.0.4
+ click==8.1.7
+ regex==2024.9.11
+ typing_extensions==4.12.2
+ nvidia-cublas-cu12==12.1.3.1
+ transformers==4.46.0.dev0
+ nvidia-nccl-cu12==2.20.5
+ nvidia-cuda-cupti-cu12==12.1.105
+ triton==3.0.0
+ pooch==1.8.2
+ smmap==5.0.1
+ grpcio==1.66.2
+ setuptools==75.1.0
+ setproctitle==1.3.3
+ accelerate==0.34.2
+ nvidia-cuda-nvrtc-cu12==12.1.105
+ tensorboard==2.18.0
+ absl-py==2.1.0
+ nvidia-nvtx-cu12==12.1.105
+ fsspec==2024.6.1
+ pycparser==2.22
+ lazy_loader==0.4
+ tensorboard-data-server==0.7.2
+ urllib3==2.2.3
+ threadpoolctl==3.5.0
+ llvmlite==0.43.0
+ sympy==1.13.3
+ audioread==3.0.1
+ tokenizers==0.20.0
+ more-itertools==10.5.0
+ cffi==1.17.1
+ evaluate==0.4.3
+ nvidia-curand-cu12==10.3.2.106
+ psutil==6.0.0
+ filelock==3.16.1
+ attrs==24.2.0
+ six==1.16.0
+ frozenlist==1.4.1
+ sentry-sdk==2.15.0
+ nvidia-cuda-runtime-cu12==12.1.105
+ xxhash==3.5.0
+ platformdirs==4.3.6
+ multidict==6.1.0
+ aiohappyeyeballs==2.4.3
+ torch==2.4.1
+ huggingface-hub==0.25.1
+ numpy==2.0.2
+ datasets==3.0.2.dev0
+ torchaudio==2.4.1
+ charset-normalizer==3.3.2
+ certifi==2024.8.30
wandb/run-20241007_125615-a3z1jk8c/files/wandb-metadata.json ADDED
@@ -0,0 +1,85 @@
+ {
+ "os": "Linux-6.8.0-45-generic-x86_64-with-glibc2.39",
+ "python": "3.12.3",
+ "startedAt": "2024-10-07T12:56:15.255202Z",
+ "args": [
+ "--model_name_or_path=openai/whisper-large-v3",
+ "--dataset_name=mozilla-foundation/common_voice_17_0",
+ "--dataset_config_name=eu",
+ "--language=basque",
+ "--train_split_name=train+validation",
+ "--eval_split_name=test",
+ "--model_index_name=Whisper Large Basque",
+ "--max_steps=10000",
+ "--output_dir=./",
+ "--per_device_train_batch_size=16",
+ "--per_device_eval_batch_size=8",
+ "--gradient_accumulation_steps=1",
+ "--logging_steps=25",
+ "--learning_rate=4.375e-6",
+ "--warmup_steps=500",
+ "--evaluation_strategy=steps",
+ "--eval_steps=500",
+ "--save_strategy=steps",
+ "--save_steps=1000",
+ "--generation_max_length=228",
+ "--length_column_name=input_length",
+ "--max_duration_in_seconds=30",
+ "--text_column_name=sentence",
+ "--freeze_feature_encoder=False",
+ "--report_to=tensorboard",
+ "--metric_for_best_model=wer",
+ "--greater_is_better=False",
+ "--load_best_model_at_end",
+ "--gradient_checkpointing",
+ "--fp16",
+ "--overwrite_output_dir",
+ "--do_eval",
+ "--predict_with_generate",
+ "--do_normalize_eval",
+ "--streaming",
+ "--push_to_hub",
+ "--report_to",
+ "wandb",
+ "--run_name",
+ "whisper-large-eu"
+ ],
+ "program": "/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py",
+ "codePath": "run_speech_recognition_seq2seq_streaming.py",
+ "git": {
+ "remote": "https://huggingface.co/xezpeleta/whisper-large-eu",
+ "commit": "45227421df6af8836af459c374361e7303a68aea"
+ },
+ "email": "[email protected]",
+ "root": "/home/tknika/whisper-large-eu",
+ "host": "tknika",
+ "username": "tknika",
+ "executable": "/home/tknika/whisper-large-eu/.venv/bin/python",
+ "codePathLocal": "run_speech_recognition_seq2seq_streaming.py",
+ "cpu_count": 8,
+ "cpu_count_logical": 8,
+ "gpu": "[NVIDIA L40-48Q]",
+ "gpu_count": 1,
+ "disk": {
+ "/": {
+ "total": "314615791616",
+ "used": "265684000768"
+ }
+ },
+ "memory": {
+ "total": "33654026240"
+ },
+ "cpu": {
+ "count": 8,
+ "countLogical": 8
+ },
+ "gpu_nvidia": [
+ {
+ "name": "NVIDIA L40-48Q",
+ "memoryTotal": "51539607552",
+ "cudaCores": 18176,
+ "architecture": "Ada"
+ }
+ ],
+ "cudaVersion": "12.4"
+ }
wandb/run-20241007_125615-a3z1jk8c/logs/debug-core.log ADDED
@@ -0,0 +1,7 @@
+ {"time":"2024-10-07T12:56:14.506819991Z","level":"INFO","msg":"started logging, with flags","port-filename":"/tmp/tmp2qen0_gz/port-20958.txt","pid":20958,"debug":false,"disable-analytics":false}
+ {"time":"2024-10-07T12:56:14.506845531Z","level":"INFO","msg":"FeatureState","shutdownOnParentExitEnabled":false}
+ {"time":"2024-10-07T12:56:14.510463142Z","level":"INFO","msg":"server is running","addr":{"IP":"127.0.0.1","Port":45237,"Zone":""}}
+ {"time":"2024-10-07T12:56:14.510485482Z","level":"INFO","msg":"Will exit if parent process dies.","ppid":20958}
+ {"time":"2024-10-07T12:56:14.698116167Z","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"127.0.0.1:49446"}
+ {"time":"2024-10-07T12:56:15.257025559Z","level":"INFO","msg":"handleInformInit: received","streamId":"a3z1jk8c","id":"127.0.0.1:49446"}
+ {"time":"2024-10-07T12:56:15.264445669Z","level":"INFO","msg":"handleInformInit: stream started","streamId":"a3z1jk8c","id":"127.0.0.1:49446"}
wandb/run-20241007_125615-a3z1jk8c/logs/debug-internal.log ADDED
@@ -0,0 +1,10 @@
+ {"time":"2024-10-07T12:56:15.257353437Z","level":"INFO","msg":"using version","core version":"0.18.3"}
+ {"time":"2024-10-07T12:56:15.257380326Z","level":"INFO","msg":"created symlink","path":"/home/tknika/whisper-large-eu/wandb/run-20241007_125615-a3z1jk8c/logs/debug-core.log"}
+ {"time":"2024-10-07T12:56:15.259721418Z","level":"ERROR","msg":"dialing: google: could not find default credentials. See https://cloud.google.com/docs/authentication/external/set-up-adc for more information"}
+ {"time":"2024-10-07T12:56:15.26442537Z","level":"INFO","msg":"created new stream","id":"a3z1jk8c"}
+ {"time":"2024-10-07T12:56:15.264442509Z","level":"INFO","msg":"stream: started","id":"a3z1jk8c"}
+ {"time":"2024-10-07T12:56:15.264458959Z","level":"INFO","msg":"handler: started","stream_id":{"value":"a3z1jk8c"}}
+ {"time":"2024-10-07T12:56:15.264475109Z","level":"INFO","msg":"sender: started","stream_id":{"value":"a3z1jk8c"}}
+ {"time":"2024-10-07T12:56:15.264497739Z","level":"INFO","msg":"writer: Do: started","stream_id":{"value":"a3z1jk8c"}}
+ {"time":"2024-10-07T12:56:15.681557119Z","level":"INFO","msg":"wandb-core","!BADKEY":null}
+ {"time":"2024-10-07T12:56:15.68260129Z","level":"INFO","msg":"Starting system monitor"}
wandb/run-20241007_125615-a3z1jk8c/logs/debug.log ADDED
@@ -0,0 +1,28 @@
+ 2024-10-07 12:56:15,251 INFO MainThread:20958 [wandb_setup.py:_flush():79] Current SDK version is 0.18.3
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Configure stats pid to 20958
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/.config/wandb/settings
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/whisper-large-eu/wandb/settings
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Loading settings from environment variables: {'project': 'whisper-medium-eu'}
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Applying setup settings: {'mode': None, '_disable_service': None}
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Inferring run settings from compute environment: {'program_relpath': 'run_speech_recognition_seq2seq_streaming.py', 'program_abspath': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py', 'program': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py'}
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Applying login settings: {}
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:_log_setup():532] Logging user logs to /home/tknika/whisper-large-eu/wandb/run-20241007_125615-a3z1jk8c/logs/debug.log
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:_log_setup():533] Logging internal logs to /home/tknika/whisper-large-eu/wandb/run-20241007_125615-a3z1jk8c/logs/debug-internal.log
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:init():617] calling init triggers
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:init():624] wandb.init called with sweep_config: {}
+ config: {}
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:init():667] starting backend
+ 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:init():671] sending inform_init request
+ 2024-10-07 12:56:15,254 INFO MainThread:20958 [backend.py:_multiprocessing_setup():104] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
+ 2024-10-07 12:56:15,254 INFO MainThread:20958 [wandb_init.py:init():684] backend started and connected
+ 2024-10-07 12:56:15,258 INFO MainThread:20958 [wandb_init.py:init():779] updated telemetry
+ 2024-10-07 12:56:15,265 INFO MainThread:20958 [wandb_init.py:init():812] communicating run to backend with 90.0 second timeout
+ 2024-10-07 12:56:15,676 INFO MainThread:20958 [wandb_init.py:init():863] starting run threads in backend
+ 2024-10-07 12:56:15,774 INFO MainThread:20958 [wandb_run.py:_console_start():2465] atexit reg
+ 2024-10-07 12:56:15,774 INFO MainThread:20958 [wandb_run.py:_redirect():2313] redirect: wrap_raw
+ 2024-10-07 12:56:15,774 INFO MainThread:20958 [wandb_run.py:_redirect():2378] Wrapping output streams.
+ 2024-10-07 12:56:15,774 INFO MainThread:20958 [wandb_run.py:_redirect():2403] Redirects installed.
+ 2024-10-07 12:56:15,775 INFO MainThread:20958 [wandb_init.py:init():907] run started, returning control to user process
+ 2024-10-07 12:56:15,777 INFO MainThread:20958 [wandb_run.py:_config_callback():1394] config_cb None None {'vocab_size': 51866, 'num_mel_bins': 128, 'd_model': 1280, 'encoder_layers': 32, 'encoder_attention_heads': 20, 'decoder_layers': 32, 'decoder_attention_heads': 20, 'decoder_ffn_dim': 5120, 'encoder_ffn_dim': 5120, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'activation_function': 'gelu', 'init_std': 0.02, 'encoder_layerdrop': 0.0, 'decoder_layerdrop': 0.0, 'use_cache': False, 'num_hidden_layers': 32, 'scale_embedding': False, 'max_source_positions': 1500, 'max_target_positions': 448, 'classifier_proj_size': 256, 'use_weighted_layer_sum': False, 'apply_spec_augment': False, 'mask_time_prob': 0.05, 'mask_time_length': 10, 'mask_time_min_masks': 2, 'mask_feature_prob': 0.0, 'mask_feature_length': 10, 'mask_feature_min_masks': 0, 'median_filter_width': 7, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': True, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 448, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': [220, 50257], 'architectures': ['WhisperForConditionalGeneration'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 50257, 'pad_token_id': 50256, 'eos_token_id': 50257, 'sep_token_id': None, 'decoder_start_token_id': 50258, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'openai/whisper-large-v3', 'transformers_version': '4.46.0.dev0', 'model_type': 'whisper', 'forced_decoder_ids': None, 'output_dir': './', 'overwrite_output_dir': True, 'do_train': False, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 16, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 4.375e-06, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': 10000, 'lr_scheduler_type': 'linear', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 500, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': './runs/Oct07_11-46-39_tknika', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 25, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 
'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 500, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': 'whisper-large-eu', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': True, 'metric_for_best_model': 'wer', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'input_length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': True, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'steps', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'eval_use_gather_object': False, 'sortish_sampler': False, 'predict_with_generate': True, 'generation_max_length': 228, 'generation_num_beams': None, 'generation_config': None}
+ 2024-10-07 12:56:15,780 INFO MainThread:20958 [wandb_config.py:__setitem__():154] config set model/num_parameters = 1543490560 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x748ced2ceae0>>
+ 2024-10-07 12:56:15,780 INFO MainThread:20958 [wandb_run.py:_config_callback():1394] config_cb model/num_parameters 1543490560 None
wandb/run-20241007_125615-a3z1jk8c/run-a3z1jk8c.wandb ADDED
Binary file (393 kB)