xezpeleta committed on
Commit
ea283e7
1 Parent(s): dedcdce

Model save

README.md CHANGED
@@ -1,33 +1,40 @@
---
library_name: transformers
- language:
- - eu
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- - whisper-event
- generated_from_trainer
datasets:
- - mozilla-foundation/common_voice_17_0
+ - common_voice_17_0
+ metrics:
+ - wer
model-index:
- - name: Whisper Large Basque
-   results: []
+ - name: openai/whisper-large-v3
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: common_voice_17_0
+       type: common_voice_17_0
+       config: eu
+       split: test
+       args: eu
+     metrics:
+     - name: Wer
+       type: wer
+       value: 7.215361500971087
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

- # Whisper Large Basque
+ # openai/whisper-large-v3

- This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the mozilla-foundation/common_voice_17_0 eu dataset.
+ This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the common_voice_17_0 dataset.
It achieves the following results on the evaluation set:
- - eval_loss: 0.9278
- - eval_model_preparation_time: 0.0102
- - eval_wer: 44.2953
- - eval_runtime: 4165.1595
- - eval_samples_per_second: 3.272
- - eval_steps_per_second: 0.409
- - step: 0
+ - Loss: 0.1259
+ - Wer: 7.2154

## Model description

@@ -56,6 +63,30 @@ The following hyperparameters were used during training:
- training_steps: 10000
- mixed_precision_training: Native AMP

+ ### Training results
+
+ | Training Loss | Epoch  | Step | Validation Loss | Wer     |
+ |:-------------:|:------:|:----:|:---------------:|:-------:|
+ | 0.2208        | 0.05   | 500  | 0.2592          | 20.6915 |
+ | 0.1489        | 0.1    | 1000 | 0.1971          | 14.6827 |
+ | 0.1973        | 0.15   | 1500 | 0.1747          | 12.3777 |
+ | 0.1353        | 1.0296 | 2000 | 0.1527          | 10.7195 |
+ | 0.1065        | 1.0796 | 2500 | 0.1456          | 9.8694  |
+ | 0.106         | 1.1296 | 3000 | 0.1362          | 9.0925  |
+ | 0.0718        | 2.0092 | 3500 | 0.1326          | 8.5428  |
+ | 0.0683        | 2.0592 | 4000 | 0.1343          | 8.4851  |
+ | 0.0482        | 2.1092 | 4500 | 0.1336          | 8.1049  |
+ | 0.0548        | 2.1592 | 5000 | 0.1316          | 7.9244  |
+ | 0.0282        | 3.0388 | 5500 | 0.1391          | 7.8182  |
+ | 0.025         | 3.0888 | 6000 | 0.1425          | 7.9409  |
+ | 0.0274        | 3.1388 | 6500 | 0.1391          | 7.7311  |
+ | 0.0155        | 4.0184 | 7000 | 0.1492          | 7.6972  |
+ | 0.0189        | 4.0684 | 7500 | 0.1517          | 7.6569  |
+ | 0.0139        | 4.1184 | 8000 | 0.1539          | 7.6267  |
+ | 0.0141        | 4.1684 | 8500 | 0.1550          | 7.5424  |
+ | 0.0368        | 5.048  | 9000 | 0.1259          | 7.2154  |
+
+
### Framework versions

- Transformers 4.46.0.dev0
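The updated card reports a word error rate of 7.22 on the Common Voice 17.0 Basque test split. A minimal usage sketch for a checkpoint like this one, assuming a local clone of the repository (the audio path and the reference sentence are purely illustrative):

```python
# Sketch: transcribe Basque audio with the fine-tuned checkpoint and score WER.
# "./" stands for a local clone of this repository; swap in the real path or hub id.
from transformers import pipeline
import evaluate

asr = pipeline("automatic-speech-recognition", model="./")

def transcribe(path: str) -> str:
    # Pin the language/task so Whisper does not fall back to language detection.
    return asr(path, generate_kwargs={"language": "basque", "task": "transcribe"})["text"]

wer = evaluate.load("wer")
# The card reports WER in percent, hence the factor of 100:
# 100 * wer.compute(predictions=[transcribe("clip.wav")], references=["egun on guztioi"])
```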
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:08e0005225b3dbaf55dd13ac62926cc7e02c1025d66fa375e6fb305ff79cd4f9
+ oid sha256:70e95881b347aab16bfadb8b06b24d389bceb51ce8b9894db264ed67fc19f29e
size 4993448880
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:630ca774672856d2e0e39a702e590f635a1cfc5726a64b6578ab46dd367369a9
+ oid sha256:52bad486fd09fef8dae53b1e1e2b13ea27a7a645786a7a9920fe7344b96bf162
size 1180663192
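Both shard updates are Git LFS pointer files: only the SHA-256 digest and byte size of each safetensors shard are stored in Git. A quick way to check a downloaded shard against its pointer (file name as in this repo, digest taken from the new pointer above):

```python
# Sketch: verify a safetensors shard against the oid recorded in its LFS pointer.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "70e95881b347aab16bfadb8b06b24d389bceb51ce8b9894db264ed67fc19f29e"
assert sha256_of("model-00001-of-00002.safetensors") == expected
```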
run.sh CHANGED
@@ -31,6 +31,8 @@ WANDB_PROJECT=whisper-medium-eu \
--gradient_checkpointing \
--fp16 \
--overwrite_output_dir \
+ --resume_from_checkpoint="checkpoint-9000" \
+ --do_train \
--do_eval \
--predict_with_generate \
--do_normalize_eval \
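The two added flags restart training from the saved step-9000 state instead of starting a fresh run: `--do_train` re-enables the training loop (the previous invocation was evaluation-only) and `--resume_from_checkpoint` points the Trainer at `checkpoint-9000`. A rough equivalent at the generic `Trainer` API level, independent of the project's `run_speech_recognition_seq2seq_streaming.py` script:

```python
# Sketch: what resuming from checkpoint-9000 looks like with the generic Trainer API.
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer

args = Seq2SeqTrainingArguments(
    output_dir="./",
    do_train=True,                    # --do_train
    max_steps=10000,                  # same step budget as the original run
    per_device_train_batch_size=16,
    fp16=True,
    predict_with_generate=True,
)

# trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=..., eval_dataset=...)
# Passing the checkpoint restores model weights, optimizer, scheduler and the global
# step counter, so training continues from step 9000 rather than step 0:
# trainer.train(resume_from_checkpoint="checkpoint-9000")
```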
training_args.bin CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:4b703135451bdb3fdf1b0263595ef460845b9b248a97e458a32874babc3e4138
+ oid sha256:bf7a986abd3107068a3a0d5d4f994d9792d101aaf76cd5472057cde7be944e18
size 5368
wandb/debug-internal.log CHANGED
@@ -1,10 +1,10 @@
1
- {"time":"2024-10-07T12:56:15.257353437Z","level":"INFO","msg":"using version","core version":"0.18.3"}
2
- {"time":"2024-10-07T12:56:15.257380326Z","level":"INFO","msg":"created symlink","path":"/home/tknika/whisper-large-eu/wandb/run-20241007_125615-a3z1jk8c/logs/debug-core.log"}
3
- {"time":"2024-10-07T12:56:15.259721418Z","level":"ERROR","msg":"dialing: google: could not find default credentials. See https://cloud.google.com/docs/authentication/external/set-up-adc for more information"}
4
- {"time":"2024-10-07T12:56:15.26442537Z","level":"INFO","msg":"created new stream","id":"a3z1jk8c"}
5
- {"time":"2024-10-07T12:56:15.264442509Z","level":"INFO","msg":"stream: started","id":"a3z1jk8c"}
6
- {"time":"2024-10-07T12:56:15.264458959Z","level":"INFO","msg":"handler: started","stream_id":{"value":"a3z1jk8c"}}
7
- {"time":"2024-10-07T12:56:15.264475109Z","level":"INFO","msg":"sender: started","stream_id":{"value":"a3z1jk8c"}}
8
- {"time":"2024-10-07T12:56:15.264497739Z","level":"INFO","msg":"writer: Do: started","stream_id":{"value":"a3z1jk8c"}}
9
- {"time":"2024-10-07T12:56:15.681557119Z","level":"INFO","msg":"wandb-core","!BADKEY":null}
10
- {"time":"2024-10-07T12:56:15.68260129Z","level":"INFO","msg":"Starting system monitor"}
 
1
+ {"time":"2024-10-07T13:18:49.427383059Z","level":"INFO","msg":"using version","core version":"0.18.3"}
2
+ {"time":"2024-10-07T13:18:49.427406249Z","level":"INFO","msg":"created symlink","path":"/home/tknika/whisper-large-eu/wandb/run-20241007_131849-0rbzerob/logs/debug-core.log"}
3
+ {"time":"2024-10-07T13:18:49.429699121Z","level":"ERROR","msg":"dialing: google: could not find default credentials. See https://cloud.google.com/docs/authentication/external/set-up-adc for more information"}
4
+ {"time":"2024-10-07T13:18:49.436074781Z","level":"INFO","msg":"created new stream","id":"0rbzerob"}
5
+ {"time":"2024-10-07T13:18:49.436114921Z","level":"INFO","msg":"stream: started","id":"0rbzerob"}
6
+ {"time":"2024-10-07T13:18:49.43615474Z","level":"INFO","msg":"writer: Do: started","stream_id":{"value":"0rbzerob"}}
7
+ {"time":"2024-10-07T13:18:49.43615079Z","level":"INFO","msg":"sender: started","stream_id":{"value":"0rbzerob"}}
8
+ {"time":"2024-10-07T13:18:49.43618894Z","level":"INFO","msg":"handler: started","stream_id":{"value":"0rbzerob"}}
9
+ {"time":"2024-10-07T13:18:49.857246919Z","level":"INFO","msg":"wandb-core","!BADKEY":null}
10
+ {"time":"2024-10-07T13:18:49.859623571Z","level":"INFO","msg":"Starting system monitor"}
wandb/debug.log CHANGED
@@ -1,28 +1,28 @@
1
- 2024-10-07 12:56:15,251 INFO MainThread:20958 [wandb_setup.py:_flush():79] Current SDK version is 0.18.3
2
- 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Configure stats pid to 20958
3
- 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/.config/wandb/settings
4
- 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/whisper-large-eu/wandb/settings
5
- 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Loading settings from environment variables: {'project': 'whisper-medium-eu'}
6
- 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Applying setup settings: {'mode': None, '_disable_service': None}
7
- 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Inferring run settings from compute environment: {'program_relpath': 'run_speech_recognition_seq2seq_streaming.py', 'program_abspath': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py', 'program': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py'}
8
- 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_setup.py:_flush():79] Applying login settings: {}
9
- 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:_log_setup():532] Logging user logs to /home/tknika/whisper-large-eu/wandb/run-20241007_125615-a3z1jk8c/logs/debug.log
10
- 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:_log_setup():533] Logging internal logs to /home/tknika/whisper-large-eu/wandb/run-20241007_125615-a3z1jk8c/logs/debug-internal.log
11
- 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:init():617] calling init triggers
12
- 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:init():624] wandb.init called with sweep_config: {}
13
  config: {}
14
- 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:init():667] starting backend
15
- 2024-10-07 12:56:15,252 INFO MainThread:20958 [wandb_init.py:init():671] sending inform_init request
16
- 2024-10-07 12:56:15,254 INFO MainThread:20958 [backend.py:_multiprocessing_setup():104] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
17
- 2024-10-07 12:56:15,254 INFO MainThread:20958 [wandb_init.py:init():684] backend started and connected
18
- 2024-10-07 12:56:15,258 INFO MainThread:20958 [wandb_init.py:init():779] updated telemetry
19
- 2024-10-07 12:56:15,265 INFO MainThread:20958 [wandb_init.py:init():812] communicating run to backend with 90.0 second timeout
20
- 2024-10-07 12:56:15,676 INFO MainThread:20958 [wandb_init.py:init():863] starting run threads in backend
21
- 2024-10-07 12:56:15,774 INFO MainThread:20958 [wandb_run.py:_console_start():2465] atexit reg
22
- 2024-10-07 12:56:15,774 INFO MainThread:20958 [wandb_run.py:_redirect():2313] redirect: wrap_raw
23
- 2024-10-07 12:56:15,774 INFO MainThread:20958 [wandb_run.py:_redirect():2378] Wrapping output streams.
24
- 2024-10-07 12:56:15,774 INFO MainThread:20958 [wandb_run.py:_redirect():2403] Redirects installed.
25
- 2024-10-07 12:56:15,775 INFO MainThread:20958 [wandb_init.py:init():907] run started, returning control to user process
26
- 2024-10-07 12:56:15,777 INFO MainThread:20958 [wandb_run.py:_config_callback():1394] config_cb None None {'vocab_size': 51866, 'num_mel_bins': 128, 'd_model': 1280, 'encoder_layers': 32, 'encoder_attention_heads': 20, 'decoder_layers': 32, 'decoder_attention_heads': 20, 'decoder_ffn_dim': 5120, 'encoder_ffn_dim': 5120, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'activation_function': 'gelu', 'init_std': 0.02, 'encoder_layerdrop': 0.0, 'decoder_layerdrop': 0.0, 'use_cache': False, 'num_hidden_layers': 32, 'scale_embedding': False, 'max_source_positions': 1500, 'max_target_positions': 448, 'classifier_proj_size': 256, 'use_weighted_layer_sum': False, 'apply_spec_augment': False, 'mask_time_prob': 0.05, 'mask_time_length': 10, 'mask_time_min_masks': 2, 'mask_feature_prob': 0.0, 'mask_feature_length': 10, 'mask_feature_min_masks': 0, 'median_filter_width': 7, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': True, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 448, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': [220, 50257], 'architectures': ['WhisperForConditionalGeneration'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 50257, 'pad_token_id': 50256, 'eos_token_id': 50257, 'sep_token_id': None, 'decoder_start_token_id': 50258, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'openai/whisper-large-v3', 'transformers_version': '4.46.0.dev0', 'model_type': 'whisper', 'forced_decoder_ids': None, 'output_dir': './', 'overwrite_output_dir': True, 'do_train': False, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 16, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 4.375e-06, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': 10000, 'lr_scheduler_type': 'linear', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 500, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': './runs/Oct07_11-46-39_tknika', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 25, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 
'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 500, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': 'whisper-large-eu', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': True, 'metric_for_best_model': 'wer', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'input_length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': True, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'steps', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'eval_use_gather_object': False, 'sortish_sampler': False, 'predict_with_generate': True, 'generation_max_length': 228, 'generation_num_beams': None, 'generation_config': None}
27
- 2024-10-07 12:56:15,780 INFO MainThread:20958 [wandb_config.py:__setitem__():154] config set model/num_parameters = 1543490560 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x748ced2ceae0>>
28
- 2024-10-07 12:56:15,780 INFO MainThread:20958 [wandb_run.py:_config_callback():1394] config_cb model/num_parameters 1543490560 None
 
1
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Current SDK version is 0.18.3
2
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Configure stats pid to 21209
3
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/.config/wandb/settings
4
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/whisper-large-eu/wandb/settings
5
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Loading settings from environment variables: {'project': 'whisper-medium-eu'}
6
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Applying setup settings: {'mode': None, '_disable_service': None}
7
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Inferring run settings from compute environment: {'program_relpath': 'run_speech_recognition_seq2seq_streaming.py', 'program_abspath': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py', 'program': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py'}
8
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Applying login settings: {}
9
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_init.py:_log_setup():532] Logging user logs to /home/tknika/whisper-large-eu/wandb/run-20241007_131849-0rbzerob/logs/debug.log
10
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_init.py:_log_setup():533] Logging internal logs to /home/tknika/whisper-large-eu/wandb/run-20241007_131849-0rbzerob/logs/debug-internal.log
11
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_init.py:init():617] calling init triggers
12
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_init.py:init():624] wandb.init called with sweep_config: {}
13
  config: {}
14
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_init.py:init():667] starting backend
15
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_init.py:init():671] sending inform_init request
16
+ 2024-10-07 13:18:49,424 INFO MainThread:21209 [backend.py:_multiprocessing_setup():104] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
17
+ 2024-10-07 13:18:49,425 INFO MainThread:21209 [wandb_init.py:init():684] backend started and connected
18
+ 2024-10-07 13:18:49,428 INFO MainThread:21209 [wandb_init.py:init():779] updated telemetry
19
+ 2024-10-07 13:18:49,436 INFO MainThread:21209 [wandb_init.py:init():812] communicating run to backend with 90.0 second timeout
20
+ 2024-10-07 13:18:49,852 INFO MainThread:21209 [wandb_init.py:init():863] starting run threads in backend
21
+ 2024-10-07 13:18:49,971 INFO MainThread:21209 [wandb_run.py:_console_start():2465] atexit reg
22
+ 2024-10-07 13:18:49,972 INFO MainThread:21209 [wandb_run.py:_redirect():2313] redirect: wrap_raw
23
+ 2024-10-07 13:18:49,972 INFO MainThread:21209 [wandb_run.py:_redirect():2378] Wrapping output streams.
24
+ 2024-10-07 13:18:49,972 INFO MainThread:21209 [wandb_run.py:_redirect():2403] Redirects installed.
25
+ 2024-10-07 13:18:49,973 INFO MainThread:21209 [wandb_init.py:init():907] run started, returning control to user process
26
+ 2024-10-07 13:18:49,974 INFO MainThread:21209 [wandb_run.py:_config_callback():1394] config_cb None None {'vocab_size': 51866, 'num_mel_bins': 128, 'd_model': 1280, 'encoder_layers': 32, 'encoder_attention_heads': 20, 'decoder_layers': 32, 'decoder_attention_heads': 20, 'decoder_ffn_dim': 5120, 'encoder_ffn_dim': 5120, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'activation_function': 'gelu', 'init_std': 0.02, 'encoder_layerdrop': 0.0, 'decoder_layerdrop': 0.0, 'use_cache': False, 'num_hidden_layers': 32, 'scale_embedding': False, 'max_source_positions': 1500, 'max_target_positions': 448, 'classifier_proj_size': 256, 'use_weighted_layer_sum': False, 'apply_spec_augment': False, 'mask_time_prob': 0.05, 'mask_time_length': 10, 'mask_time_min_masks': 2, 'mask_feature_prob': 0.0, 'mask_feature_length': 10, 'mask_feature_min_masks': 0, 'median_filter_width': 7, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': True, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 448, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': [220, 50257], 'architectures': ['WhisperForConditionalGeneration'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 50257, 'pad_token_id': 50256, 'eos_token_id': 50257, 'sep_token_id': None, 'decoder_start_token_id': 50258, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'openai/whisper-large-v3', 'transformers_version': '4.46.0.dev0', 'model_type': 'whisper', 'forced_decoder_ids': None, 'output_dir': './', 'overwrite_output_dir': True, 'do_train': True, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 16, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 4.375e-06, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': 10000, 'lr_scheduler_type': 'linear', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 500, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': './runs/Oct07_13-18-21_tknika', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 25, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 
'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 500, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': 'whisper-large-eu', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': True, 'metric_for_best_model': 'wer', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'input_length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': 'checkpoint-9000', 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': True, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'steps', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'eval_use_gather_object': False, 'sortish_sampler': False, 'predict_with_generate': True, 'generation_max_length': 228, 'generation_num_beams': None, 'generation_config': None}
27
+ 2024-10-07 13:18:49,978 INFO MainThread:21209 [wandb_config.py:__setitem__():154] config set model/num_parameters = 1543490560 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7a0b4e8a1df0>>
28
+ 2024-10-07 13:18:49,978 INFO MainThread:21209 [wandb_run.py:_config_callback():1394] config_cb model/num_parameters 1543490560 None
wandb/run-20241007_125615-a3z1jk8c/files/config.yaml ADDED
@@ -0,0 +1,544 @@
1
+ _name_or_path:
2
+ value: openai/whisper-large-v3
3
+ _wandb:
4
+ value:
5
+ cli_version: 0.18.3
6
+ m:
7
+ - "1": train/global_step
8
+ "6":
9
+ - 3
10
+ "7": []
11
+ - "1": eval/model_preparation_time
12
+ "5": 1
13
+ "6":
14
+ - 1
15
+ - 3
16
+ "7": []
17
+ - "1": eval/runtime
18
+ "5": 1
19
+ "6":
20
+ - 1
21
+ - 3
22
+ "7": []
23
+ - "1": eval/samples_per_second
24
+ "5": 1
25
+ "6":
26
+ - 1
27
+ - 3
28
+ "7": []
29
+ - "1": eval/steps_per_second
30
+ "5": 1
31
+ "6":
32
+ - 1
33
+ - 3
34
+ "7": []
35
+ - "1": eval/loss
36
+ "5": 1
37
+ "6":
38
+ - 1
39
+ - 3
40
+ "7": []
41
+ - "1": eval/wer
42
+ "5": 1
43
+ "6":
44
+ - 1
45
+ - 3
46
+ "7": []
47
+ python_version: 3.12.3
48
+ t:
49
+ "1":
50
+ - 1
51
+ - 5
52
+ - 11
53
+ - 49
54
+ - 51
55
+ - 53
56
+ - 55
57
+ - 71
58
+ - 100
59
+ "2":
60
+ - 1
61
+ - 5
62
+ - 11
63
+ - 49
64
+ - 51
65
+ - 53
66
+ - 55
67
+ - 71
68
+ - 100
69
+ "3":
70
+ - 7
71
+ - 13
72
+ - 19
73
+ - 23
74
+ - 55
75
+ - 66
76
+ "4": 3.12.3
77
+ "5": 0.18.3
78
+ "6": 4.46.0.dev0
79
+ "8":
80
+ - 5
81
+ "9":
82
+ "1": transformers_trainer
83
+ "12": 0.18.3
84
+ "13": linux-x86_64
85
+ accelerator_config:
86
+ value:
87
+ dispatch_batches: null
88
+ even_batches: true
89
+ gradient_accumulation_kwargs: null
90
+ non_blocking: false
91
+ split_batches: false
92
+ use_seedable_sampler: true
93
+ activation_dropout:
94
+ value: 0
95
+ activation_function:
96
+ value: gelu
97
+ adafactor:
98
+ value: false
99
+ adam_beta1:
100
+ value: 0.9
101
+ adam_beta2:
102
+ value: 0.999
103
+ adam_epsilon:
104
+ value: 1e-08
105
+ add_cross_attention:
106
+ value: false
107
+ apply_spec_augment:
108
+ value: false
109
+ architectures:
110
+ value:
111
+ - WhisperForConditionalGeneration
112
+ attention_dropout:
113
+ value: 0
114
+ auto_find_batch_size:
115
+ value: false
116
+ bad_words_ids:
117
+ value: null
118
+ batch_eval_metrics:
119
+ value: false
120
+ begin_suppress_tokens:
121
+ value:
122
+ - 220
123
+ - 50257
124
+ bf16:
125
+ value: false
126
+ bf16_full_eval:
127
+ value: false
128
+ bos_token_id:
129
+ value: 50257
130
+ chunk_size_feed_forward:
131
+ value: 0
132
+ classifier_proj_size:
133
+ value: 256
134
+ cross_attention_hidden_size:
135
+ value: null
136
+ d_model:
137
+ value: 1280
138
+ data_seed:
139
+ value: null
140
+ dataloader_drop_last:
141
+ value: false
142
+ dataloader_num_workers:
143
+ value: 0
144
+ dataloader_persistent_workers:
145
+ value: false
146
+ dataloader_pin_memory:
147
+ value: true
148
+ dataloader_prefetch_factor:
149
+ value: null
150
+ ddp_backend:
151
+ value: null
152
+ ddp_broadcast_buffers:
153
+ value: null
154
+ ddp_bucket_cap_mb:
155
+ value: null
156
+ ddp_find_unused_parameters:
157
+ value: null
158
+ ddp_timeout:
159
+ value: 1800
160
+ debug:
161
+ value: []
162
+ decoder_attention_heads:
163
+ value: 20
164
+ decoder_ffn_dim:
165
+ value: 5120
166
+ decoder_layerdrop:
167
+ value: 0
168
+ decoder_layers:
169
+ value: 32
170
+ decoder_start_token_id:
171
+ value: 50258
172
+ deepspeed:
173
+ value: null
174
+ disable_tqdm:
175
+ value: false
176
+ dispatch_batches:
177
+ value: null
178
+ diversity_penalty:
179
+ value: 0
180
+ do_eval:
181
+ value: true
182
+ do_predict:
183
+ value: false
184
+ do_sample:
185
+ value: false
186
+ do_train:
187
+ value: false
188
+ dropout:
189
+ value: 0
190
+ early_stopping:
191
+ value: false
192
+ encoder_attention_heads:
193
+ value: 20
194
+ encoder_ffn_dim:
195
+ value: 5120
196
+ encoder_layerdrop:
197
+ value: 0
198
+ encoder_layers:
199
+ value: 32
200
+ encoder_no_repeat_ngram_size:
201
+ value: 0
202
+ eos_token_id:
203
+ value: 50257
204
+ eval_accumulation_steps:
205
+ value: null
206
+ eval_delay:
207
+ value: 0
208
+ eval_do_concat_batches:
209
+ value: true
210
+ eval_on_start:
211
+ value: false
212
+ eval_steps:
213
+ value: 500
214
+ eval_strategy:
215
+ value: steps
216
+ eval_use_gather_object:
217
+ value: false
218
+ evaluation_strategy:
219
+ value: steps
220
+ exponential_decay_length_penalty:
221
+ value: null
222
+ finetuning_task:
223
+ value: null
224
+ forced_bos_token_id:
225
+ value: null
226
+ forced_decoder_ids:
227
+ value: null
228
+ forced_eos_token_id:
229
+ value: null
230
+ fp16:
231
+ value: true
232
+ fp16_backend:
233
+ value: auto
234
+ fp16_full_eval:
235
+ value: false
236
+ fp16_opt_level:
237
+ value: O1
238
+ fsdp:
239
+ value: []
240
+ fsdp_config:
241
+ value:
242
+ min_num_params: 0
243
+ xla: false
244
+ xla_fsdp_grad_ckpt: false
245
+ xla_fsdp_v2: false
246
+ fsdp_min_num_params:
247
+ value: 0
248
+ fsdp_transformer_layer_cls_to_wrap:
249
+ value: null
250
+ full_determinism:
251
+ value: false
252
+ generation_config:
253
+ value: null
254
+ generation_max_length:
255
+ value: 228
256
+ generation_num_beams:
257
+ value: null
258
+ gradient_accumulation_steps:
259
+ value: 1
260
+ gradient_checkpointing:
261
+ value: true
262
+ gradient_checkpointing_kwargs:
263
+ value: null
264
+ greater_is_better:
265
+ value: false
266
+ group_by_length:
267
+ value: false
268
+ half_precision_backend:
269
+ value: auto
270
+ hub_always_push:
271
+ value: false
272
+ hub_model_id:
273
+ value: null
274
+ hub_private_repo:
275
+ value: false
276
+ hub_strategy:
277
+ value: every_save
278
+ hub_token:
279
+ value: <HUB_TOKEN>
280
+ id2label:
281
+ value:
282
+ "0": LABEL_0
283
+ "1": LABEL_1
284
+ ignore_data_skip:
285
+ value: false
286
+ include_for_metrics:
287
+ value: []
288
+ include_inputs_for_metrics:
289
+ value: false
290
+ include_num_input_tokens_seen:
291
+ value: false
292
+ include_tokens_per_second:
293
+ value: false
294
+ init_std:
295
+ value: 0.02
296
+ is_decoder:
297
+ value: false
298
+ is_encoder_decoder:
299
+ value: true
300
+ jit_mode_eval:
301
+ value: false
302
+ label_names:
303
+ value: null
304
+ label_smoothing_factor:
305
+ value: 0
306
+ label2id:
307
+ value:
308
+ LABEL_0: 0
309
+ LABEL_1: 1
310
+ learning_rate:
311
+ value: 4.375e-06
312
+ length_column_name:
313
+ value: input_length
314
+ length_penalty:
315
+ value: 1
316
+ load_best_model_at_end:
317
+ value: true
318
+ local_rank:
319
+ value: 0
320
+ log_level:
321
+ value: passive
322
+ log_level_replica:
323
+ value: warning
324
+ log_on_each_node:
325
+ value: true
326
+ logging_dir:
327
+ value: ./runs/Oct07_11-46-39_tknika
328
+ logging_first_step:
329
+ value: false
330
+ logging_nan_inf_filter:
331
+ value: true
332
+ logging_steps:
333
+ value: 25
334
+ logging_strategy:
335
+ value: steps
336
+ lr_scheduler_type:
337
+ value: linear
338
+ mask_feature_length:
339
+ value: 10
340
+ mask_feature_min_masks:
341
+ value: 0
342
+ mask_feature_prob:
343
+ value: 0
344
+ mask_time_length:
345
+ value: 10
346
+ mask_time_min_masks:
347
+ value: 2
348
+ mask_time_prob:
349
+ value: 0.05
350
+ max_grad_norm:
351
+ value: 1
352
+ max_length:
353
+ value: 448
354
+ max_source_positions:
355
+ value: 1500
356
+ max_steps:
357
+ value: 10000
358
+ max_target_positions:
359
+ value: 448
360
+ median_filter_width:
361
+ value: 7
362
+ metric_for_best_model:
363
+ value: wer
364
+ min_length:
365
+ value: 0
366
+ model/num_parameters:
367
+ value: 1543490560
368
+ model_type:
369
+ value: whisper
370
+ mp_parameters:
371
+ value: ""
372
+ neftune_noise_alpha:
373
+ value: null
374
+ no_cuda:
375
+ value: false
376
+ no_repeat_ngram_size:
377
+ value: 0
378
+ num_beam_groups:
379
+ value: 1
380
+ num_beams:
381
+ value: 1
382
+ num_hidden_layers:
383
+ value: 32
384
+ num_mel_bins:
385
+ value: 128
386
+ num_return_sequences:
387
+ value: 1
388
+ num_train_epochs:
389
+ value: 3
390
+ optim:
391
+ value: adamw_torch
392
+ optim_args:
393
+ value: null
394
+ optim_target_modules:
395
+ value: null
396
+ output_attentions:
397
+ value: false
398
+ output_dir:
399
+ value: ./
400
+ output_hidden_states:
401
+ value: false
402
+ output_scores:
403
+ value: false
404
+ overwrite_output_dir:
405
+ value: true
406
+ pad_token_id:
407
+ value: 50256
408
+ past_index:
409
+ value: -1
410
+ per_device_eval_batch_size:
411
+ value: 8
412
+ per_device_train_batch_size:
413
+ value: 16
414
+ per_gpu_eval_batch_size:
415
+ value: null
416
+ per_gpu_train_batch_size:
417
+ value: null
418
+ predict_with_generate:
419
+ value: true
420
+ prediction_loss_only:
421
+ value: false
422
+ prefix:
423
+ value: null
424
+ problem_type:
425
+ value: null
426
+ push_to_hub:
427
+ value: true
428
+ push_to_hub_model_id:
429
+ value: null
430
+ push_to_hub_organization:
431
+ value: null
432
+ push_to_hub_token:
433
+ value: <PUSH_TO_HUB_TOKEN>
434
+ ray_scope:
435
+ value: last
436
+ remove_invalid_values:
437
+ value: false
438
+ remove_unused_columns:
439
+ value: true
440
+ repetition_penalty:
441
+ value: 1
442
+ report_to:
443
+ value:
444
+ - wandb
445
+ restore_callback_states_from_checkpoint:
446
+ value: false
447
+ resume_from_checkpoint:
448
+ value: null
449
+ return_dict:
450
+ value: true
451
+ return_dict_in_generate:
452
+ value: false
453
+ run_name:
454
+ value: whisper-large-eu
455
+ save_on_each_node:
456
+ value: false
457
+ save_only_model:
458
+ value: false
459
+ save_safetensors:
460
+ value: true
461
+ save_steps:
462
+ value: 1000
463
+ save_strategy:
464
+ value: steps
465
+ save_total_limit:
466
+ value: null
467
+ scale_embedding:
468
+ value: false
469
+ seed:
470
+ value: 42
471
+ sep_token_id:
472
+ value: null
473
+ skip_memory_metrics:
474
+ value: true
475
+ sortish_sampler:
476
+ value: false
477
+ split_batches:
478
+ value: null
479
+ suppress_tokens:
480
+ value: null
481
+ task_specific_params:
482
+ value: null
483
+ temperature:
484
+ value: 1
485
+ tf_legacy_loss:
486
+ value: false
487
+ tf32:
488
+ value: null
489
+ tie_encoder_decoder:
490
+ value: false
491
+ tie_word_embeddings:
492
+ value: true
493
+ tokenizer_class:
494
+ value: null
495
+ top_k:
496
+ value: 50
497
+ top_p:
498
+ value: 1
499
+ torch_compile:
500
+ value: false
501
+ torch_compile_backend:
502
+ value: null
503
+ torch_compile_mode:
504
+ value: null
505
+ torch_dtype:
506
+ value: float16
507
+ torch_empty_cache_steps:
508
+ value: null
509
+ torchdynamo:
510
+ value: null
511
+ torchscript:
512
+ value: false
513
+ tpu_metrics_debug:
514
+ value: false
515
+ tpu_num_cores:
516
+ value: null
517
+ transformers_version:
518
+ value: 4.46.0.dev0
519
+ typical_p:
520
+ value: 1
521
+ use_bfloat16:
522
+ value: false
523
+ use_cache:
524
+ value: false
525
+ use_cpu:
526
+ value: false
527
+ use_ipex:
528
+ value: false
529
+ use_legacy_prediction_loop:
530
+ value: false
531
+ use_liger_kernel:
532
+ value: false
533
+ use_mps_device:
534
+ value: false
535
+ use_weighted_layer_sum:
536
+ value: false
537
+ vocab_size:
538
+ value: 51866
539
+ warmup_ratio:
540
+ value: 0
541
+ warmup_steps:
542
+ value: 500
543
+ weight_decay:
544
+ value: 0
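The captured config pins the optimisation schedule: a linear scheduler with `warmup_steps: 500`, `max_steps: 10000` and a peak `learning_rate` of 4.375e-06. A plain-Python approximation of that schedule, mirroring the standard `transformers` linear-with-warmup rule and shown here only to make the numbers concrete:

```python
# Sketch: learning rate implied by the config above at a given optimizer step.
PEAK_LR = 4.375e-06
WARMUP_STEPS = 500
MAX_STEPS = 10_000

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS                   # linear warmup to the peak
    remaining = max(0, MAX_STEPS - step)
    return PEAK_LR * remaining / (MAX_STEPS - WARMUP_STEPS)    # linear decay to zero

# lr_at(500) == 4.375e-06 (peak); lr_at(9000) ≈ 4.6e-07, where this run stopped.
```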
wandb/run-20241007_125615-a3z1jk8c/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
+ {"_step":0,"_wandb":{"runtime":266},"eval/runtime":4165.1595,"_timestamp":1.7283057757810946e+09,"eval/model_preparation_time":0.0102,"eval/steps_per_second":0.409,"train/global_step":0,"eval/wer":44.29532045879292,"_runtime":0.559212086,"eval/samples_per_second":3.272,"eval/loss":0.9277587532997131}
wandb/run-20241007_125615-a3z1jk8c/logs/debug-core.log CHANGED
@@ -5,3 +5,10 @@
5
  {"time":"2024-10-07T12:56:14.698116167Z","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"127.0.0.1:49446"}
6
  {"time":"2024-10-07T12:56:15.257025559Z","level":"INFO","msg":"handleInformInit: received","streamId":"a3z1jk8c","id":"127.0.0.1:49446"}
7
  {"time":"2024-10-07T12:56:15.264445669Z","level":"INFO","msg":"handleInformInit: stream started","streamId":"a3z1jk8c","id":"127.0.0.1:49446"}
8
+ {"time":"2024-10-07T13:00:42.167739631Z","level":"INFO","msg":"handleInformTeardown: server teardown initiated","id":"127.0.0.1:49446"}
9
+ {"time":"2024-10-07T13:00:42.167972019Z","level":"INFO","msg":"server is shutting down"}
10
+ {"time":"2024-10-07T13:00:42.167958909Z","level":"INFO","msg":"connection: Close: initiating connection closure","id":"127.0.0.1:49446"}
11
+ {"time":"2024-10-07T13:00:42.168215837Z","level":"INFO","msg":"connection: Close: connection successfully closed","id":"127.0.0.1:49446"}
12
+ {"time":"2024-10-07T13:00:45.394906949Z","level":"INFO","msg":"handleInformTeardown: server shutdown complete","id":"127.0.0.1:49446"}
13
+ {"time":"2024-10-07T13:00:45.394950439Z","level":"INFO","msg":"connection: ManageConnectionData: connection closed","id":"127.0.0.1:49446"}
14
+ {"time":"2024-10-07T13:00:45.394977679Z","level":"INFO","msg":"server is closed"}
wandb/run-20241007_125615-a3z1jk8c/logs/debug-internal.log CHANGED
@@ -8,3 +8,11 @@
8
  {"time":"2024-10-07T12:56:15.264497739Z","level":"INFO","msg":"writer: Do: started","stream_id":{"value":"a3z1jk8c"}}
9
  {"time":"2024-10-07T12:56:15.681557119Z","level":"INFO","msg":"wandb-core","!BADKEY":null}
10
  {"time":"2024-10-07T12:56:15.68260129Z","level":"INFO","msg":"Starting system monitor"}
11
+ {"time":"2024-10-07T13:00:42.167889239Z","level":"INFO","msg":"stream: closing","id":"a3z1jk8c"}
12
+ {"time":"2024-10-07T13:00:42.167946829Z","level":"INFO","msg":"Stopping system monitor"}
13
+ {"time":"2024-10-07T13:00:42.173849946Z","level":"INFO","msg":"Stopped system monitor"}
14
+ {"time":"2024-10-07T13:00:45.087931056Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
15
+ {"time":"2024-10-07T13:00:45.394585792Z","level":"INFO","msg":"handler: closed","stream_id":{"value":"a3z1jk8c"}}
16
+ {"time":"2024-10-07T13:00:45.394682341Z","level":"INFO","msg":"sender: closed","stream_id":{"value":"a3z1jk8c"}}
17
+ {"time":"2024-10-07T13:00:45.394644421Z","level":"INFO","msg":"writer: Close: closed","stream_id":{"value":"a3z1jk8c"}}
18
+ {"time":"2024-10-07T13:00:45.39479126Z","level":"INFO","msg":"stream: closed","id":"a3z1jk8c"}
wandb/run-20241007_125615-a3z1jk8c/logs/debug.log CHANGED
@@ -26,3 +26,4 @@ config: {}
26
  2024-10-07 12:56:15,777 INFO MainThread:20958 [wandb_run.py:_config_callback():1394] config_cb None None {'vocab_size': 51866, 'num_mel_bins': 128, 'd_model': 1280, 'encoder_layers': 32, 'encoder_attention_heads': 20, 'decoder_layers': 32, 'decoder_attention_heads': 20, 'decoder_ffn_dim': 5120, 'encoder_ffn_dim': 5120, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'activation_function': 'gelu', 'init_std': 0.02, 'encoder_layerdrop': 0.0, 'decoder_layerdrop': 0.0, 'use_cache': False, 'num_hidden_layers': 32, 'scale_embedding': False, 'max_source_positions': 1500, 'max_target_positions': 448, 'classifier_proj_size': 256, 'use_weighted_layer_sum': False, 'apply_spec_augment': False, 'mask_time_prob': 0.05, 'mask_time_length': 10, 'mask_time_min_masks': 2, 'mask_feature_prob': 0.0, 'mask_feature_length': 10, 'mask_feature_min_masks': 0, 'median_filter_width': 7, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': True, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 448, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': [220, 50257], 'architectures': ['WhisperForConditionalGeneration'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 50257, 'pad_token_id': 50256, 'eos_token_id': 50257, 'sep_token_id': None, 'decoder_start_token_id': 50258, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'openai/whisper-large-v3', 'transformers_version': '4.46.0.dev0', 'model_type': 'whisper', 'forced_decoder_ids': None, 'output_dir': './', 'overwrite_output_dir': True, 'do_train': False, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 16, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 4.375e-06, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': 10000, 'lr_scheduler_type': 'linear', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 500, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': './runs/Oct07_11-46-39_tknika', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 25, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 
'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 500, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': 'whisper-large-eu', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': True, 'metric_for_best_model': 'wer', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'input_length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': True, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'steps', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'eval_use_gather_object': False, 'sortish_sampler': False, 'predict_with_generate': True, 'generation_max_length': 228, 'generation_num_beams': None, 'generation_config': None}
27
  2024-10-07 12:56:15,780 INFO MainThread:20958 [wandb_config.py:__setitem__():154] config set model/num_parameters = 1543490560 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x748ced2ceae0>>
28
  2024-10-07 12:56:15,780 INFO MainThread:20958 [wandb_run.py:_config_callback():1394] config_cb model/num_parameters 1543490560 None
+ 2024-10-07 13:00:42,168 WARNING MsgRouterThr:20958 [router.py:message_loop():77] message_loop has been closed
wandb/run-20241007_125615-a3z1jk8c/run-a3z1jk8c.wandb CHANGED
Binary files a/wandb/run-20241007_125615-a3z1jk8c/run-a3z1jk8c.wandb and b/wandb/run-20241007_125615-a3z1jk8c/run-a3z1jk8c.wandb differ
 
wandb/run-20241007_131849-0rbzerob/files/output.log ADDED
@@ -0,0 +1,34 @@
+ Reading metadata...: 75336it [00:03, 21675.20it/s] | 0/10000 [00:00<?, ?it/s]
+ Reading metadata...: 13630it [00:00, 18317.27it/s]
+ [INFO|trainer_utils.py:830] 2024-10-07 13:19:00,935 >> The following columns in the training set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: input_length. If input_length are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message.
+ [WARNING|trainer.py:2506] 2024-10-07 13:23:37,058 >> There seems not to be a single sample in your epoch_iterator, stopping training at step 9000! This is expected if you're using an IterableDataset and set num_steps (10000) higher than the number of available samples.
+ [INFO|trainer.py:2532] 2024-10-07 13:23:37,059 >>
+
+ Training completed. Do not forget to share your model on huggingface.co/models =)
+
+
+ [INFO|trainer.py:2770] 2024-10-07 13:23:37,059 >> Loading best model from ./checkpoint-9000 (score: 7.215361500971087).
+ [WARNING|trainer.py:2892] 2024-10-07 13:23:38,416 >> There were missing keys in the checkpoint model loaded: ['proj_out.weight'].
+ 0%| | 0/10000 [04:48<?, ?it/s]
+ {'train_runtime': 289.8068, 'train_samples_per_second': 552.092, 'train_steps_per_second': 34.506, 'train_loss': 0.0, 'epoch': 5.05}
+ [INFO|trainer.py:3738] 2024-10-07 13:23:38,418 >> Saving model checkpoint to ./
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py:2774: UserWarning: Moving the following attributes in the config to the generation config: {'max_length': 448, 'begin_suppress_tokens': [220, 50257]}. You are seeing this warning because you've set generation parameters in the model config, as opposed to in the generation config.
+ warnings.warn(
+ [INFO|configuration_utils.py:410] 2024-10-07 13:23:38,421 >> Configuration saved in ./config.json
+ [INFO|configuration_utils.py:868] 2024-10-07 13:23:38,422 >> Configuration saved in ./generation_config.json
+ [INFO|modeling_utils.py:3000] 2024-10-07 13:23:45,845 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./model.safetensors.index.json.
+ [INFO|feature_extraction_utils.py:435] 2024-10-07 13:23:45,849 >> Feature extractor saved in ./preprocessor_config.json
+ [INFO|trainer.py:3738] 2024-10-07 13:23:45,850 >> Saving model checkpoint to ./
+ [INFO|configuration_utils.py:410] 2024-10-07 13:23:45,852 >> Configuration saved in ./config.json
+ [INFO|configuration_utils.py:868] 2024-10-07 13:23:45,852 >> Configuration saved in ./generation_config.json
+ [INFO|modeling_utils.py:3000] 2024-10-07 13:23:53,998 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./model.safetensors.index.json.
+ [INFO|feature_extraction_utils.py:435] 2024-10-07 13:23:53,999 >> Feature extractor saved in ./preprocessor_config.json
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/huggingface_hub/hf_api.py:3889: UserWarning: It seems that you are about to commit a data file (.venv/lib/python3.12/site-packages/pyarrow/tests/data/parquet/v0.7.1.all-named-index.parquet) to a model repository. You are sure this is intended? If you are trying to upload a dataset, please set `repo_type='dataset'` or `--repo-type=dataset` in a CLI.
+ warnings.warn(
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/huggingface_hub/hf_api.py:3889: UserWarning: It seems that you are about to commit a data file (.venv/lib/python3.12/site-packages/pyarrow/tests/data/parquet/v0.7.1.column-metadata-handling.parquet) to a model repository. You are sure this is intended? If you are trying to upload a dataset, please set `repo_type='dataset'` or `--repo-type=dataset` in a CLI.
+ warnings.warn(
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/huggingface_hub/hf_api.py:3889: UserWarning: It seems that you are about to commit a data file (.venv/lib/python3.12/site-packages/pyarrow/tests/data/parquet/v0.7.1.parquet) to a model repository. You are sure this is intended? If you are trying to upload a dataset, please set `repo_type='dataset'` or `--repo-type=dataset` in a CLI.
+ warnings.warn(
+ /home/tknika/whisper-large-eu/.venv/lib/python3.12/site-packages/huggingface_hub/hf_api.py:3889: UserWarning: It seems that you are about to commit a data file (.venv/lib/python3.12/site-packages/pyarrow/tests/data/parquet/v0.7.1.some-named-index.parquet) to a model repository. You are sure this is intended? If you are trying to upload a dataset, please set `repo_type='dataset'` or `--repo-type=dataset` in a CLI.
+ warnings.warn(
+ training_args.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5.37k/5.37k [00:00<00:00, 15.6kB/s]
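
The log above shows the best checkpoint (checkpoint-9000, WER ≈ 7.22) being loaded and pushed to the Hub repository recorded in the run metadata. Purely as a hedged illustration (not part of the training script), a minimal sketch of transcribing Basque audio with that pushed checkpoint, assuming the repo id `xezpeleta/whisper-large-eu` and a hypothetical local file `sample.wav`:

```python
# Minimal sketch (assumptions: the checkpoint pushed by this run lives at
# xezpeleta/whisper-large-eu, and "sample.wav" is a hypothetical local audio file).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="xezpeleta/whisper-large-eu",
)

# Force Basque transcription, matching the --language=basque training flag.
result = asr("sample.wav", generate_kwargs={"language": "basque", "task": "transcribe"})
print(result["text"])
```
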
wandb/run-20241007_131849-0rbzerob/files/requirements.txt ADDED
@@ -0,0 +1,94 @@
+ Markdown==3.7
+ requests==2.32.3
+ RapidFuzz==3.10.0
+ yarl==1.13.1
+ pyarrow==17.0.0
+ docker-pycreds==0.4.0
+ nvidia-cufft-cu12==11.0.2.54
+ PyYAML==6.0.2
+ packaging==24.1
+ librosa==0.10.2.post1
+ soxr==0.5.0.post1
+ multiprocess==0.70.16
+ nvidia-nvjitlink-cu12==12.6.77
+ safetensors==0.4.5
+ joblib==1.4.2
+ pip==24.0
+ wandb==0.18.3
+ networkx==3.3
+ numba==0.60.0
+ scipy==1.14.1
+ MarkupSafe==2.1.5
+ GitPython==3.1.43
+ aiohttp==3.10.9
+ msgpack==1.1.0
+ mpmath==1.3.0
+ tzdata==2024.2
+ nvidia-cudnn-cu12==9.1.0.70
+ scikit-learn==1.5.2
+ pytz==2024.2
+ dill==0.3.8
+ nvidia-cusparse-cu12==12.1.0.106
+ soundfile==0.12.1
+ aiosignal==1.3.1
+ gitdb==4.0.11
+ Jinja2==3.1.4
+ jiwer==3.0.4
+ decorator==5.1.1
+ nvidia-cusolver-cu12==11.4.5.107
+ protobuf==5.28.2
+ idna==3.10
+ tqdm==4.66.5
+ pandas==2.2.3
+ python-dateutil==2.9.0.post0
+ Werkzeug==3.0.4
+ click==8.1.7
+ regex==2024.9.11
+ typing_extensions==4.12.2
+ nvidia-cublas-cu12==12.1.3.1
+ transformers==4.46.0.dev0
+ nvidia-nccl-cu12==2.20.5
+ nvidia-cuda-cupti-cu12==12.1.105
+ triton==3.0.0
+ pooch==1.8.2
+ smmap==5.0.1
+ grpcio==1.66.2
+ setuptools==75.1.0
+ setproctitle==1.3.3
+ accelerate==0.34.2
+ nvidia-cuda-nvrtc-cu12==12.1.105
+ tensorboard==2.18.0
+ absl-py==2.1.0
+ nvidia-nvtx-cu12==12.1.105
+ fsspec==2024.6.1
+ pycparser==2.22
+ lazy_loader==0.4
+ tensorboard-data-server==0.7.2
+ urllib3==2.2.3
+ threadpoolctl==3.5.0
+ llvmlite==0.43.0
+ sympy==1.13.3
+ audioread==3.0.1
+ tokenizers==0.20.0
+ more-itertools==10.5.0
+ cffi==1.17.1
+ evaluate==0.4.3
+ nvidia-curand-cu12==10.3.2.106
+ psutil==6.0.0
+ filelock==3.16.1
+ attrs==24.2.0
+ six==1.16.0
+ frozenlist==1.4.1
+ sentry-sdk==2.15.0
+ nvidia-cuda-runtime-cu12==12.1.105
+ xxhash==3.5.0
+ platformdirs==4.3.6
+ multidict==6.1.0
+ aiohappyeyeballs==2.4.3
+ torch==2.4.1
+ huggingface-hub==0.25.1
+ numpy==2.0.2
+ datasets==3.0.2.dev0
+ torchaudio==2.4.1
+ charset-normalizer==3.3.2
+ certifi==2024.8.30
wandb/run-20241007_131849-0rbzerob/files/wandb-metadata.json ADDED
@@ -0,0 +1,87 @@
+ {
+ "os": "Linux-6.8.0-45-generic-x86_64-with-glibc2.39",
+ "python": "3.12.3",
+ "startedAt": "2024-10-07T13:18:49.425338Z",
+ "args": [
+ "--model_name_or_path=openai/whisper-large-v3",
+ "--dataset_name=mozilla-foundation/common_voice_17_0",
+ "--dataset_config_name=eu",
+ "--language=basque",
+ "--train_split_name=train+validation",
+ "--eval_split_name=test",
+ "--model_index_name=Whisper Large Basque",
+ "--max_steps=10000",
+ "--output_dir=./",
+ "--per_device_train_batch_size=16",
+ "--per_device_eval_batch_size=8",
+ "--gradient_accumulation_steps=1",
+ "--logging_steps=25",
+ "--learning_rate=4.375e-6",
+ "--warmup_steps=500",
+ "--evaluation_strategy=steps",
+ "--eval_steps=500",
+ "--save_strategy=steps",
+ "--save_steps=1000",
+ "--generation_max_length=228",
+ "--length_column_name=input_length",
+ "--max_duration_in_seconds=30",
+ "--text_column_name=sentence",
+ "--freeze_feature_encoder=False",
+ "--report_to=tensorboard",
+ "--metric_for_best_model=wer",
+ "--greater_is_better=False",
+ "--load_best_model_at_end",
+ "--gradient_checkpointing",
+ "--fp16",
+ "--overwrite_output_dir",
+ "--resume_from_checkpoint=checkpoint-9000",
+ "--do_train",
+ "--do_eval",
+ "--predict_with_generate",
+ "--do_normalize_eval",
+ "--streaming",
+ "--push_to_hub",
+ "--report_to",
+ "wandb",
+ "--run_name",
+ "whisper-large-eu"
+ ],
+ "program": "/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py",
+ "codePath": "run_speech_recognition_seq2seq_streaming.py",
+ "git": {
+ "remote": "https://huggingface.co/xezpeleta/whisper-large-eu",
+ "commit": "45227421df6af8836af459c374361e7303a68aea"
+ },
+ "email": "[email protected]",
+ "root": "/home/tknika/whisper-large-eu",
+ "host": "tknika",
+ "username": "tknika",
+ "executable": "/home/tknika/whisper-large-eu/.venv/bin/python",
+ "codePathLocal": "run_speech_recognition_seq2seq_streaming.py",
+ "cpu_count": 8,
+ "cpu_count_logical": 8,
+ "gpu": "[NVIDIA L40-48Q]",
+ "gpu_count": 1,
+ "disk": {
+ "/": {
+ "total": "314615791616",
+ "used": "265684557824"
+ }
+ },
+ "memory": {
+ "total": "33654026240"
+ },
+ "cpu": {
+ "count": 8,
+ "countLogical": 8
+ },
+ "gpu_nvidia": [
+ {
+ "name": "NVIDIA L40-48Q",
+ "memoryTotal": "51539607552",
+ "cudaCores": 18176,
+ "architecture": "Ada"
+ }
+ ],
+ "cudaVersion": "12.4"
+ }
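
The `args` array above records the exact flags passed to `run_speech_recognition_seq2seq_streaming.py`. As a hedged sketch only (the original run used that script, not this snippet), the key flags map onto `transformers` `Seq2SeqTrainingArguments` roughly as follows:

```python
# Sketch of the hyperparameters recorded in the args above, expressed as
# Seq2SeqTrainingArguments; this mirrors the logged configuration and is not
# the original launcher.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    learning_rate=4.375e-6,
    warmup_steps=500,
    max_steps=10000,
    fp16=True,
    gradient_checkpointing=True,
    eval_strategy="steps",          # older releases use evaluation_strategy
    eval_steps=500,
    save_strategy="steps",
    save_steps=1000,
    logging_steps=25,
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    predict_with_generate=True,
    generation_max_length=228,
    report_to=["wandb"],
    run_name="whisper-large-eu",
    push_to_hub=True,
)
```
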
wandb/run-20241007_131849-0rbzerob/logs/debug-core.log ADDED
@@ -0,0 +1,7 @@
+ {"time":"2024-10-07T13:18:48.681266604Z","level":"INFO","msg":"started logging, with flags","port-filename":"/tmp/tmptcl7jc73/port-21209.txt","pid":21209,"debug":false,"disable-analytics":false}
+ {"time":"2024-10-07T13:18:48.681313674Z","level":"INFO","msg":"FeatureState","shutdownOnParentExitEnabled":false}
+ {"time":"2024-10-07T13:18:48.697970073Z","level":"INFO","msg":"Will exit if parent process dies.","ppid":21209}
+ {"time":"2024-10-07T13:18:48.697953933Z","level":"INFO","msg":"server is running","addr":{"IP":"127.0.0.1","Port":33313,"Zone":""}}
+ {"time":"2024-10-07T13:18:48.868568838Z","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"127.0.0.1:55398"}
+ {"time":"2024-10-07T13:18:49.427121931Z","level":"INFO","msg":"handleInformInit: received","streamId":"0rbzerob","id":"127.0.0.1:55398"}
+ {"time":"2024-10-07T13:18:49.436122661Z","level":"INFO","msg":"handleInformInit: stream started","streamId":"0rbzerob","id":"127.0.0.1:55398"}
wandb/run-20241007_131849-0rbzerob/logs/debug-internal.log ADDED
@@ -0,0 +1,10 @@
+ {"time":"2024-10-07T13:18:49.427383059Z","level":"INFO","msg":"using version","core version":"0.18.3"}
+ {"time":"2024-10-07T13:18:49.427406249Z","level":"INFO","msg":"created symlink","path":"/home/tknika/whisper-large-eu/wandb/run-20241007_131849-0rbzerob/logs/debug-core.log"}
+ {"time":"2024-10-07T13:18:49.429699121Z","level":"ERROR","msg":"dialing: google: could not find default credentials. See https://cloud.google.com/docs/authentication/external/set-up-adc for more information"}
+ {"time":"2024-10-07T13:18:49.436074781Z","level":"INFO","msg":"created new stream","id":"0rbzerob"}
+ {"time":"2024-10-07T13:18:49.436114921Z","level":"INFO","msg":"stream: started","id":"0rbzerob"}
+ {"time":"2024-10-07T13:18:49.43615474Z","level":"INFO","msg":"writer: Do: started","stream_id":{"value":"0rbzerob"}}
+ {"time":"2024-10-07T13:18:49.43615079Z","level":"INFO","msg":"sender: started","stream_id":{"value":"0rbzerob"}}
+ {"time":"2024-10-07T13:18:49.43618894Z","level":"INFO","msg":"handler: started","stream_id":{"value":"0rbzerob"}}
+ {"time":"2024-10-07T13:18:49.857246919Z","level":"INFO","msg":"wandb-core","!BADKEY":null}
+ {"time":"2024-10-07T13:18:49.859623571Z","level":"INFO","msg":"Starting system monitor"}
wandb/run-20241007_131849-0rbzerob/logs/debug.log ADDED
@@ -0,0 +1,28 @@
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Current SDK version is 0.18.3
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Configure stats pid to 21209
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/.config/wandb/settings
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Loading settings from /home/tknika/whisper-large-eu/wandb/settings
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Loading settings from environment variables: {'project': 'whisper-medium-eu'}
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Applying setup settings: {'mode': None, '_disable_service': None}
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Inferring run settings from compute environment: {'program_relpath': 'run_speech_recognition_seq2seq_streaming.py', 'program_abspath': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py', 'program': '/home/tknika/whisper-large-eu/run_speech_recognition_seq2seq_streaming.py'}
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_setup.py:_flush():79] Applying login settings: {}
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_init.py:_log_setup():532] Logging user logs to /home/tknika/whisper-large-eu/wandb/run-20241007_131849-0rbzerob/logs/debug.log
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_init.py:_log_setup():533] Logging internal logs to /home/tknika/whisper-large-eu/wandb/run-20241007_131849-0rbzerob/logs/debug-internal.log
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_init.py:init():617] calling init triggers
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_init.py:init():624] wandb.init called with sweep_config: {}
+ config: {}
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_init.py:init():667] starting backend
+ 2024-10-07 13:18:49,422 INFO MainThread:21209 [wandb_init.py:init():671] sending inform_init request
+ 2024-10-07 13:18:49,424 INFO MainThread:21209 [backend.py:_multiprocessing_setup():104] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
+ 2024-10-07 13:18:49,425 INFO MainThread:21209 [wandb_init.py:init():684] backend started and connected
+ 2024-10-07 13:18:49,428 INFO MainThread:21209 [wandb_init.py:init():779] updated telemetry
+ 2024-10-07 13:18:49,436 INFO MainThread:21209 [wandb_init.py:init():812] communicating run to backend with 90.0 second timeout
+ 2024-10-07 13:18:49,852 INFO MainThread:21209 [wandb_init.py:init():863] starting run threads in backend
+ 2024-10-07 13:18:49,971 INFO MainThread:21209 [wandb_run.py:_console_start():2465] atexit reg
+ 2024-10-07 13:18:49,972 INFO MainThread:21209 [wandb_run.py:_redirect():2313] redirect: wrap_raw
+ 2024-10-07 13:18:49,972 INFO MainThread:21209 [wandb_run.py:_redirect():2378] Wrapping output streams.
+ 2024-10-07 13:18:49,972 INFO MainThread:21209 [wandb_run.py:_redirect():2403] Redirects installed.
+ 2024-10-07 13:18:49,973 INFO MainThread:21209 [wandb_init.py:init():907] run started, returning control to user process
+ 2024-10-07 13:18:49,974 INFO MainThread:21209 [wandb_run.py:_config_callback():1394] config_cb None None {'vocab_size': 51866, 'num_mel_bins': 128, 'd_model': 1280, 'encoder_layers': 32, 'encoder_attention_heads': 20, 'decoder_layers': 32, 'decoder_attention_heads': 20, 'decoder_ffn_dim': 5120, 'encoder_ffn_dim': 5120, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'activation_function': 'gelu', 'init_std': 0.02, 'encoder_layerdrop': 0.0, 'decoder_layerdrop': 0.0, 'use_cache': False, 'num_hidden_layers': 32, 'scale_embedding': False, 'max_source_positions': 1500, 'max_target_positions': 448, 'classifier_proj_size': 256, 'use_weighted_layer_sum': False, 'apply_spec_augment': False, 'mask_time_prob': 0.05, 'mask_time_length': 10, 'mask_time_min_masks': 2, 'mask_feature_prob': 0.0, 'mask_feature_length': 10, 'mask_feature_min_masks': 0, 'median_filter_width': 7, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': True, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 448, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': [220, 50257], 'architectures': ['WhisperForConditionalGeneration'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 50257, 'pad_token_id': 50256, 'eos_token_id': 50257, 'sep_token_id': None, 'decoder_start_token_id': 50258, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'openai/whisper-large-v3', 'transformers_version': '4.46.0.dev0', 'model_type': 'whisper', 'forced_decoder_ids': None, 'output_dir': './', 'overwrite_output_dir': True, 'do_train': True, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 16, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 4.375e-06, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': 10000, 'lr_scheduler_type': 'linear', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 500, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': './runs/Oct07_13-18-21_tknika', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 25, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 
'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': True, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 500, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': 'whisper-large-eu', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': True, 'metric_for_best_model': 'wer', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'input_length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': True, 'resume_from_checkpoint': 'checkpoint-9000', 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': True, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'steps', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'eval_use_gather_object': False, 'sortish_sampler': False, 'predict_with_generate': True, 'generation_max_length': 228, 'generation_num_beams': None, 'generation_config': None}
+ 2024-10-07 13:18:49,978 INFO MainThread:21209 [wandb_config.py:__setitem__():154] config set model/num_parameters = 1543490560 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7a0b4e8a1df0>>
+ 2024-10-07 13:18:49,978 INFO MainThread:21209 [wandb_run.py:_config_callback():1394] config_cb model/num_parameters 1543490560 None
wandb/run-20241007_131849-0rbzerob/run-0rbzerob.wandb ADDED
Binary file (32.8 kB)