/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions
  FutureWarning,
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
torch.distributed: socket accepted connection from 127.0.0.1:35102. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
torch.distributed: socket accepted connection from 127.0.0.1:35104. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
2023-05-01 02:32:18.382 - byted-torch - WARNING - /usr/local/lib/python3.7/dist-packages/torch/utils/system/environment.py:44: tcc failed with exception: No module named 'bytedtcc'
2023-05-01 02:32:18.385 - byted-torch - WARNING - /usr/local/lib/python3.7/dist-packages/torch/utils/system/environment.py:44: tcc failed with exception: No module named 'bytedtcc'
2023-05-01 02:32:18.386 - byted-torch - WARNING - /usr/local/lib/python3.7/dist-packages/torch/utils/system/environment.py:44: tcc failed with exception: No module named 'bytedtcc'
2023-05-01 02:32:18.386 - byted-torch - WARNING - /usr/local/lib/python3.7/dist-packages/torch/utils/system/environment.py:44: tcc failed with exception: No module named 'bytedtcc'
2023-05-01 02:32:18.390 - byted-torch - WARNING - /usr/local/lib/python3.7/dist-packages/torch/utils/system/environment.py:44: tcc failed with exception: No module named 'bytedtcc'
2023-05-01 02:32:18.397 - byted-torch - WARNING - /usr/local/lib/python3.7/dist-packages/torch/utils/system/environment.py:44: tcc failed with exception: No module named 'bytedtcc'
2023-05-01 02:32:18.407 - byted-torch - WARNING - /usr/local/lib/python3.7/dist-packages/torch/utils/system/environment.py:44: tcc failed with exception: No module named 'bytedtcc'
2023-05-01 02:32:18.419 - byted-torch - WARNING - /usr/local/lib/python3.7/dist-packages/torch/utils/system/environment.py:44: tcc failed with exception: No module named 'bytedtcc'
invig-fairseq module load success.
invig-fairseq module load success.
invig-fairseq module load success.
invig-fairseq module load success.
invig-fairseq module load success.
invig-fairseq module load success.
invig-fairseq module load success.
invig-fairseq module load success.
2023-05-01 02:32:39 - utils.py[line:258] - INFO: distributed init (rank 6): env://
2023-05-01 02:32:39 - utils.py[line:261] - INFO: Start init
torch.distributed: socket accepted connection from 127.0.0.1:36434. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
torch.distributed: socket accepted connection from 127.0.0.1:36436. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
2023-05-01 02:32:39 - distributed_c10d.py[line:217] - INFO: Added key: store_based_barrier_key:1 to store for rank: 6
2023-05-01 02:32:39 - utils.py[line:258] - INFO: distributed init (rank 7): env://
2023-05-01 02:32:39 - utils.py[line:261] - INFO: Start init
torch.distributed: socket accepted connection from 127.0.0.1:36438. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
torch.distributed: socket accepted connection from 127.0.0.1:36440. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
2023-05-01 02:32:39 - distributed_c10d.py[line:217] - INFO: Added key: store_based_barrier_key:1 to store for rank: 7
2023-05-01 02:32:39 - utils.py[line:258] - INFO: distributed init (rank 2): env://
2023-05-01 02:32:39 - utils.py[line:261] - INFO: Start init
torch.distributed: socket accepted connection from 127.0.0.1:36442. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
torch.distributed: socket accepted connection from 127.0.0.1:36444. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
2023-05-01 02:32:39 - distributed_c10d.py[line:217] - INFO: Added key: store_based_barrier_key:1 to store for rank: 2
2023-05-01 02:32:39 - utils.py[line:258] - INFO: distributed init (rank 0): env://
2023-05-01 02:32:39 - utils.py[line:261] - INFO: Start init
torch.distributed: socket accepted connection from 127.0.0.1:36446. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
torch.distributed: socket accepted connection from 127.0.0.1:36448. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
2023-05-01 02:32:39 - distributed_c10d.py[line:217] - INFO: Added key: store_based_barrier_key:1 to store for rank: 0
2023-05-01 02:32:39 - utils.py[line:258] - INFO: distributed init (rank 1): env://
2023-05-01 02:32:39 - utils.py[line:261] - INFO: Start init
torch.distributed: socket accepted connection from 127.0.0.1:36450. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
torch.distributed: socket accepted connection from 127.0.0.1:36452. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
2023-05-01 02:32:39 - distributed_c10d.py[line:217] - INFO: Added key: store_based_barrier_key:1 to store for rank: 1
2023-05-01 02:32:39 - utils.py[line:258] - INFO: distributed init (rank 5): env://
2023-05-01 02:32:39 - utils.py[line:261] - INFO: Start init
2023-05-01 02:32:39 - utils.py[line:258] - INFO: distributed init (rank 4): env://
2023-05-01 02:32:39 - utils.py[line:261] - INFO: Start init
2023-05-01 02:32:39 - utils.py[line:258] - INFO: distributed init (rank 3): env://
2023-05-01 02:32:39 - utils.py[line:261] - INFO: Start init
torch.distributed: socket accepted connection from 127.0.0.1:36454. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
torch.distributed: socket accepted connection from 127.0.0.1:36456. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
torch.distributed: socket accepted connection from 127.0.0.1:36458. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
torch.distributed: socket accepted connection from 127.0.0.1:36460. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
torch.distributed: socket accepted connection from 127.0.0.1:36462. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
torch.distributed: socket accepted connection from 127.0.0.1:36464. (To turn off this message, please set BYTED_TORCH_C10D_LOG_LEVEL={WARNING, ERROR, CRITICAL})
2023-05-01 02:32:39 - distributed_c10d.py[line:217] - INFO: Added key: store_based_barrier_key:1 to store for rank: 5
2023-05-01 02:32:39 - distributed_c10d.py[line:217] - INFO: Added key: store_based_barrier_key:1 to store for rank: 3
2023-05-01 02:32:39 - distributed_c10d.py[line:217] - INFO: Added key: store_based_barrier_key:1 to store for rank: 4
2023-05-01 02:32:39 - distributed_c10d.py[line:252] - INFO: Rank 4: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2023-05-01 02:32:39 - distributed_c10d.py[line:252] - INFO: Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2023-05-01 02:32:39 - distributed_c10d.py[line:252] - INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2023-05-01 02:32:39 - utils.py[line:274] - INFO: initialized host n176-082-134 as rank 0
single-machine distributed training is initialized.
2023-05-01 02:32:39 - utils.py[line:274] - INFO: initialized host n176-082-134 as rank 2
single-machine distributed training is initialized.
2023-05-01 02:32:39 - utils.py[line:274] - INFO: initialized host n176-082-134 as rank 4
single-machine distributed training is initialized.
2023-05-01 02:32:39 - distributed_c10d.py[line:252] - INFO: Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2023-05-01 02:32:39 - utils.py[line:274] - INFO: initialized host n176-082-134 as rank 1
single-machine distributed training is initialized.
2023-05-01 02:32:39 - distributed_c10d.py[line:252] - INFO: Rank 6: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2023-05-01 02:32:39 - utils.py[line:274] - INFO: initialized host n176-082-134 as rank 6
single-machine distributed training is initialized.
2023-05-01 02:32:39 - distributed_c10d.py[line:252] - INFO: Rank 7: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2023-05-01 02:32:39 - distributed_c10d.py[line:252] - INFO: Rank 5: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2023-05-01 02:32:39 - distributed_c10d.py[line:252] - INFO: Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2023-05-01 02:32:39 - utils.py[line:274] - INFO: initialized host n176-082-134 as rank 5
single-machine distributed training is initialized.
2023-05-01 02:32:39 - utils.py[line:274] - INFO: initialized host n176-082-134 as rank 3
single-machine distributed training is initialized.
2023-05-01 02:32:39 - utils.py[line:274] - INFO: initialized host n176-082-134 as rank 7
single-machine distributed training is initialized.
2023-05-01 02:32:47 - train.py[line:77] - INFO: {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 10, 'log_format': 'simple', 'log_file': None, 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': True, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': 512, 'fp16_scale_tolerance': 0.0, 'on_cpu_convert_precision': False, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'amp': False, 'amp_batch_retries': 2, 'amp_init_scale': 128, 'amp_scale_window': None, 'user_dir': '/mnt/bn/hri-lq/projects/VLDD/OFA-Invig/ofa_invig', 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma'}, 'common_eval': {'_name': None, 'path': None, 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 8, 'distributed_num_procs': 8, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': 'env://', 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': True, 'ddp_backend': 'pytorch_ddp', 'ddp_comm_hook': 'none', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': True, 'gradient_as_bucket_view': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_algorithm': 'LocalSGD', 'localsgd_frequency': 3, 'nprocs_per_node': 8, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 
'zero_sharding': 'none', 'fp16': True, 'memory_efficient_fp16': False, 'tpu': False, 'no_reshard_after_forward': False, 'fp32_reduce_scatter': False, 'cpu_offload': False, 'use_sharded_state': False}, 'dataset': {'_name': None, 'num_workers': 2, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': None, 'batch_size': 5, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'combine_valid_subsets': None, 'ignore_unused_valid_subsets': False, 'validate_interval': 1, 'validate_interval_updates': 1000, 'validate_after_updates': 0, 'fixed_validation_seed': 7, 'disable_validation': False, 'max_tokens_valid': None, 'batch_size_valid': 5, 'max_valid_steps': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'optimization': {'_name': None, 'max_epoch': 10, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 1.0, 'sentence_avg': False, 'update_freq': [3], 'lr': [3e-05], 'stop_min_lr': -1.0, 'use_bmuf': False}, 'checkpoint': {'_name': None, 'save_dir': '/mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232', 'restore_file': '/mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230428-1407/checkpoint_9_52000.pt', 'finetune_from_model': None, 'reset_dataloader': True, 'reset_lr_scheduler': False, 'reset_meters': True, 'reset_optimizer': True, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 1000, 'keep_interval_updates': -1, 'keep_interval_updates_pattern': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': 2, 'no_save': False, 'no_epoch_checkpoints': True, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'score', 'maximize_best_checkpoint_metric': True, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'write_checkpoints_asynchronously': False, 'model_parallel_size': 
1, 'use_ema_weights_to_init_param': False, 'use_latest_weights_to_init_ema': False}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 8}, 'generation': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': Namespace(_name='ofa_huge', activation_fn='gelu', adam_betas='(0.9,0.999)', adam_eps=1e-08, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, add_type_embedding=True, all_gather_list_size=16384, amp=False, amp_batch_retries=2, amp_init_scale=128, amp_scale_window=None, arch='ofa_huge', attention_dropout=0.0, attn_scale_factor=2, azureml_logging=False, batch_size=5, batch_size_valid=5, best_checkpoint_metric='score', bf16=False, bitfit=False, bpe=None, bpe_dir='/mnt/bn/hri-lq/projects/VLDD/OFA/utils/BPE', broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, 
checkpoint_suffix='', clip_norm=1.0, code_dict_size=8192, code_image_size=128, code_layernorm_embedding=True, combine_valid_subsets=None, config_yaml='/mnt/bn/hri-lq/projects/VLDD/OFA-Invig/config/invig_env.yml', constraint_range=None, cpu=False, cpu_offload=False, criterion='adjust_label_smoothed_cross_entropy', cross_self_attention=False, curriculum=0, data='invig,invig', data_buffer_size=10, dataset_impl=None, ddp_backend='pytorch_ddp', ddp_comm_hook='none', debug=False, decoder_attention_heads=16, decoder_drop_path_rate=0.2, decoder_embed_dim=1280, decoder_embed_path=None, decoder_ffn_embed_dim=5120, decoder_input_dim=1280, decoder_layerdrop=0, decoder_layers=12, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=True, decoder_output_dim=1280, device_id=0, disable_entangle=True, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_num_procs=8, distributed_port=-1, distributed_rank=0, distributed_world_size=8, drop_worst_after=0, drop_worst_ratio=0.0, dropout=0.1, ema_decay=0.9999, ema_fp32=False, ema_seed_model=None, ema_start_update=0, ema_update_freq=1, empty_cache_freq=0, encoder_attention_heads=16, encoder_drop_path_rate=0.2, encoder_embed_dim=1280, encoder_embed_path=None, encoder_ffn_embed_dim=5120, encoder_layerdrop=0, encoder_layers=24, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=True, end_learning_rate=0.0, entangle_position_embedding=False, eos=2, eval_acc=True, eval_args='{"beam":5,"min_len":1,"max_len_a":0,"max_len_b":100}', eval_print_samples=True, fast_stat_sync=False, find_unused_parameters=True, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=7, force_anneal=None, fp16=True, fp16_adam_stats=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=512, fp32_reduce_scatter=False, freeze_decoder_embedding=False, freeze_encoder_embedding=False, 
gen_subset='test', gradient_as_bucket_view=False, heartbeat_timeout=-1, ignore_eos=False, ignore_prefix_size=0, ignore_unused_valid_subsets=False, image_bucket_size=42, imagenet_default_mean_and_std=False, keep_best_checkpoints=2, keep_interval_updates=-1, keep_interval_updates_pattern=-1, keep_last_epochs=-1, label_smoothing=0.1, layernorm_embedding=True, load_checkpoint_on_all_dp_ranks=False, localsgd_frequency=3, log_file=None, log_format='simple', log_interval=10, lr=[3e-05], lr_scheduler='polynomial_decay', max_epoch=10, max_image_size=512, max_source_positions=1024, max_src_length=240, max_target_positions=1024, max_tgt_length=280, max_tokens=None, max_tokens_valid=None, max_update=0, max_valid_steps=None, maximize_best_checkpoint_metric=True, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_params_to_wrap=100000000, model_parallel_size=1, no_cross_attention=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_progress_bar=False, no_reshard_after_forward=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_seed_provided=False, no_token_positional_embeddings=False, nprocs_per_node=8, num_bins=1000, num_shards=1, num_workers=2, on_cpu_convert_precision=False, optimizer='adam', optimizer_overrides='{}', orig_patch_image_size=256, pad=1, patch_image_size=512, patch_layernorm_embedding=True, patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, plasma_path='/tmp/plasma', pooler_activation_fn='tanh', pooler_classifier='mlp', pooler_dropout=0.0, power=1.0, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, reg_alpha=1.0, relu_dropout=0.0, report_accuracy=False, required_batch_size_multiple=8, required_seq_len_multiple=1, 
reset_dataloader=True, reset_logging=False, reset_lr_scheduler=False, reset_meters=True, reset_optimizer=True, resnet_drop_path_rate=0.0, resnet_type='resnet152', restore_file='/mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230428-1407/checkpoint_9_52000.pt', sample_patch_num=196, save_dir='/mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232', save_interval=1, save_interval_updates=1000, scale_attn=True, scale_fc=True, scale_heads=True, scale_resids=False, scoring='bleu', scst=False, scst_args='{}', seed=1, selected_cols='0', sentence_avg=False, shard_id=0, share_all_embeddings=True, share_decoder_input_output_embed=True, simul_type=None, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, stop_min_lr=-1.0, stop_time_hours=0, store_ema=False, suppress_crashes=False, sync_bn=False, task='invig', tensorboard_logdir=None, threshold_loss_scale=None, token_bucket_size=256, tokenizer=None, total_num_update=1000000, tpu=False, train_subset='train', unk=3, update_freq=[3], use_bmuf=False, use_ema_weights_to_init_param=False, use_latest_weights_to_init_ema=False, use_old_adam=False, use_plasma_view=False, use_rdrop=False, use_sharded_state=False, user_dir='/mnt/bn/hri-lq/projects/VLDD/OFA-Invig/ofa_invig', valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=1000, wandb_project=None, warmup_ratio=0.06, warmup_updates=0, weight_decay=0.01, write_checkpoints_asynchronously=False, zero_sharding='none'), 'task': {'_name': 'invig', 'data': 'invig,invig', 'selected_cols': '0', 'bpe': None, 'bpe_dir': '/mnt/bn/hri-lq/projects/VLDD/OFA/utils/BPE', 'max_source_positions': 1024, 'max_target_positions': 1024, 'max_src_length': 240, 'max_tgt_length': 280, 'code_dict_size': 8192, 'patch_image_size': 512, 'orig_patch_image_size': 256, 'num_bins': 1000, 'imagenet_default_mean_and_std': False, 'constraint_range': None, 'eval_acc': True, 'eval_args': 
'{"beam":5,"min_len":1,"max_len_a":0,"max_len_b":100}', 'eval_print_samples': True, 'max_image_size': 512, 'scst': False, 'scst_args': '{}', 'debug': False, 'config_yaml': '/mnt/bn/hri-lq/projects/VLDD/OFA-Invig/config/invig_env.yml'}, 'criterion': {'_name': 'adjust_label_smoothed_cross_entropy', 'label_smoothing': 0.1, 'report_accuracy': False, 'ignore_prefix_size': 0, 'ignore_eos': False, 'sentence_avg': False, 'drop_worst_ratio': 0.0, 'drop_worst_after': 0, 'use_rdrop': False, 'reg_alpha': 1.0, 'sample_patch_num': 196, 'constraint_range': None}, 'optimizer': {'_name': 'adam', 'adam_betas': '(0.9,0.999)', 'adam_eps': 1e-08, 'weight_decay': 0.01, 'use_old_adam': False, 'fp16_adam_stats': False, 'tpu': False, 'lr': [3e-05]}, 'lr_scheduler': {'_name': 'polynomial_decay', 'warmup_updates': 0, 'warmup_ratio': 0.06, 'force_anneal': None, 'end_learning_rate': 0.0, 'power': 1.0, 'total_num_update': 1000000.0, 'lr': [3e-05]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None, 'ema': {'_name': None, 'store_ema': False, 'ema_decay': 0.9999, 'ema_start_update': 0, 'ema_seed_model': None, 'ema_update_freq': 1, 'ema_fp32': False}, 'simul_type': None} 2023-05-01 02:32:47 - ofa_task.py[line:109] - INFO: source dictionary: 59457 types 2023-05-01 02:32:47 - ofa_task.py[line:110] - INFO: target dictionary: 59457 types /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large , > /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large , > /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large/mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large , > , > /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large , > /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large , > /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large , > /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large , > /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large, > , 
> /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large , > /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large , > /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large , > /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large , > /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large , > /mnt/bn/hri-lq/projects/VLDD/Link/ofa-pretrain/hf/ofa-large , >
/usr/local/lib/python3.7/dist-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/usr/local/lib/python3.7/dist-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/usr/local/lib/python3.7/dist-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/usr/local/lib/python3.7/dist-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/usr/local/lib/python3.7/dist-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/usr/local/lib/python3.7/dist-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/usr/local/lib/python3.7/dist-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/usr/local/lib/python3.7/dist-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
2023-05-01 02:33:02 - train.py[line:111] - INFO: task: InvigTask
2023-05-01 02:33:02 - train.py[line:112] - INFO: model: OFAModel
2023-05-01 02:33:02 - train.py[line:113] - INFO: criterion: AdjustLabelSmoothedCrossEntropyCriterion
2023-05-01 02:33:02 - train.py[line:117] - INFO: num. shared model params: 929,486,848 (num. trained: 929,486,848)
2023-05-01 02:33:02 - train.py[line:124] - INFO: num. expert model params: 0 (num.
trained: 0)
2023-05-01 02:33:02 - dialog_dataset.py[line:647] - INFO: loading invig-validation from /mnt/bn/hri-lq/datasets/hf-cache/invig
2023-05-01 02:33:02 - dialog_dataset.py[line:647] - INFO: loading guesswhat-validation from /mnt/bn/hri-lq/datasets/hf-cache/guesswhat
2023-05-01 02:33:03 - dialog_dataset.py[line:671] - INFO: load validation data: 2 (256/2048 samples) dataset(s)
2023-05-01 02:33:03 - dialog_dataset.py[line:672] - INFO: Tasks: invig_grounding(498), guesswhat_grounding(1550)
2023-05-01 02:33:04 - distributed_c10d.py[line:217] - INFO: Added key: store_based_barrier_key:2 to store for rank: 0
2023-05-01 02:33:05 - distributed_c10d.py[line:252] - INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes.
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_tokens.weight <- decoder.embed_tokens.weight
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_tokens.weight <- decoder.output_projection.weight
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer1.0.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer1.0.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer1.0.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer1.0.downsample.0.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer1.1.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer1.1.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer1.1.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer1.2.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer1.2.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer1.2.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.0.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.0.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.0.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.0.downsample.0.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.1.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.1.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.1.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.2.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.2.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.2.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.3.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.3.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.3.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.4.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.4.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.4.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.5.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.5.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.5.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.6.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.6.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.6.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.7.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.7.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer2.7.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.0.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.0.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.0.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.0.downsample.0.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.1.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.1.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.1.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.2.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.2.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.2.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.3.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.3.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.3.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.4.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.4.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.4.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.5.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.5.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.5.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.6.conv1.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.6.conv2.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.6.conv3.bias
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias
<- encoder.embed_images.layer3.7.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.7.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.7.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.8.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.8.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.8.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.9.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.9.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.9.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.10.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.10.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.10.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.11.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: 
encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.11.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.11.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.12.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.12.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.12.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.13.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.13.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.13.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.14.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.14.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.14.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.15.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.15.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected 
shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.15.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.16.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.16.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.16.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.17.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.17.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.17.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.18.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.18.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.18.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.19.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.19.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.19.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - 
INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.20.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.20.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.20.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.21.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.21.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.21.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.22.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.22.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.22.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.23.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.23.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.23.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.24.conv1.bias 2023-05-01 02:33:05 - 
trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.24.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.24.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.25.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.25.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.25.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.26.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.26.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.26.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.27.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.27.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.27.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.28.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.28.conv2.bias 
2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.28.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.29.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.29.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.29.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.30.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.30.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.30.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.31.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.31.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.31.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.32.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.32.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- 
encoder.embed_images.layer3.32.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.33.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.33.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.33.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.34.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.34.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.34.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.35.conv1.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.35.conv2.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- encoder.embed_images.layer3.35.conv3.bias 2023-05-01 02:33:05 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- decoder.output_projection.bias 2023-05-01 02:33:08 - utils.py[line:759] - INFO: ***********************CUDA enviroments for all 8 workers*********************** 2023-05-01 02:33:08 - utils.py[line:765] - INFO: rank 0: capabilities = 8.0 ; total memory = 79.347 GB ; name = NVIDIA A100-SXM4-80GB 2023-05-01 02:33:08 - utils.py[line:765] - INFO: rank 1: capabilities = 8.0 ; total memory = 79.347 GB ; name = NVIDIA A100-SXM4-80GB 2023-05-01 02:33:08 
- utils.py[line:765] - INFO: rank 2: capabilities = 8.0 ; total memory = 79.347 GB ; name = NVIDIA A100-SXM4-80GB 2023-05-01 02:33:08 - utils.py[line:765] - INFO: rank 3: capabilities = 8.0 ; total memory = 79.347 GB ; name = NVIDIA A100-SXM4-80GB 2023-05-01 02:33:08 - utils.py[line:765] - INFO: rank 4: capabilities = 8.0 ; total memory = 79.347 GB ; name = NVIDIA A100-SXM4-80GB 2023-05-01 02:33:08 - utils.py[line:765] - INFO: rank 5: capabilities = 8.0 ; total memory = 79.347 GB ; name = NVIDIA A100-SXM4-80GB 2023-05-01 02:33:08 - utils.py[line:765] - INFO: rank 6: capabilities = 8.0 ; total memory = 79.347 GB ; name = NVIDIA A100-SXM4-80GB 2023-05-01 02:33:08 - utils.py[line:765] - INFO: rank 7: capabilities = 8.0 ; total memory = 79.347 GB ; name = NVIDIA A100-SXM4-80GB 2023-05-01 02:33:08 - utils.py[line:767] - INFO: ***********************CUDA enviroments for all 8 workers*********************** 2023-05-01 02:33:08 - train.py[line:154] - INFO: training on 8 devices (GPUs/TPUs) 2023-05-01 02:33:08 - train.py[line:160] - INFO: max tokens per device = None and max sentences per device = 5 2023-05-01 02:33:08 - trainer.py[line:458] - INFO: Preparing to load checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230428-1407/checkpoint_9_52000.pt 2023-05-01 02:33:25 - trainer.py[line:619] - INFO: Loaded checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230428-1407/checkpoint_9_52000.pt (epoch 9 @ 0 updates) 2023-05-01 02:33:25 - trainer.py[line:639] - INFO: loading train data for epoch 1 2023-05-01 02:33:25 - dialog_dataset.py[line:647] - INFO: loading invig-train from /mnt/bn/hri-lq/datasets/hf-cache/invig 2023-05-01 02:33:26 - dialog_dataset.py[line:647] - INFO: loading guesswhat-train from /mnt/bn/hri-lq/datasets/hf-cache/guesswhat 2023-05-01 02:33:27 - dialog_dataset.py[line:647] - INFO: loading visdial-train from /mnt/bn/hri-lq/datasets/hf-cache/visdial 2023-05-01 02:33:30 - dialog_dataset.py[line:647] - 
INFO: loading refcoco-train from /mnt/bn/hri-lq/datasets/hf-cache/refcoco 2023-05-01 02:33:31 - dialog_dataset.py[line:647] - INFO: loading refcocog-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocog 2023-05-01 02:33:32 - dialog_dataset.py[line:647] - INFO: loading refcocoplus-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocoplus 2023-05-01 02:33:36 - dialog_dataset.py[line:647] - INFO: loading cc_sbu_align-train from /mnt/bn/hri-lq/datasets/hf-cache/cc_sbu_align 2023-05-01 02:33:36 - dialog_dataset.py[line:647] - INFO: loading llava_instruct_150k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_instruct_150k 2023-05-01 02:33:40 - dialog_dataset.py[line:647] - INFO: loading llava_conversation_58k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_conversation_58k 2023-05-01 02:33:42 - dialog_dataset.py[line:647] - INFO: loading llava_complex_reasoning_77k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_complex_reasoning_77k 2023-05-01 02:33:43 - dialog_dataset.py[line:647] - INFO: loading llava_detail_23k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_detail_23k 2023-05-01 02:33:43 - dialog_dataset.py[line:647] - INFO: loading openimages-train from /mnt/bn/hri-lq/datasets/hf-cache/openimages_v1.2 2023-05-01 02:33:44 - dialog_dataset.py[line:671] - INFO: load train data: 18 (90624/724992 samples) dataset(s) 2023-05-01 02:33:44 - dialog_dataset.py[line:672] - INFO: Tasks: invig_question(17652), invig_answer(17652), invig_grounding(17652), guesswhat_question(68653), guesswhat_answer(68653), guesswhat_grounding(68653), visdial_question(103447), visdial_answer(103447), visdial_caption(20689), refcoco_grounding(9523), refcoco_grounding(9920), refcoco_grounding(9494), cc_sbu_align_caption(3439), llava_instruct_150k(90372), llava_conversation_58k(46965), llava_complex_reasoning_77k(44353), llava_detail_23k(12471), openimages_detection(11957) Total steps 60420, warmup steps 3625, warmup_factor 0.00027586206896551725 Total steps 60420, warmup steps 3625, 
warmup_factor 0.00027586206896551725 Total steps 60420, warmup steps 3625, warmup_factor 0.00027586206896551725 Total steps 60420, warmup steps 3625, warmup_factor 0.00027586206896551725 Total steps 60420, warmup steps 3625, warmup_factor 0.00027586206896551725 Total steps 60420, warmup steps 3625, warmup_factor 0.00027586206896551725 Total steps 60420, warmup steps 3625, warmup_factor 0.00027586206896551725 Total steps 60420, warmup steps 3625, warmup_factor 0.00027586206896551725 2023-05-01 02:33:45 - trainer.py[line:703] - INFO: begin training epoch 1 2023-05-01 02:33:45 - train.py[line:305] - INFO: Start iterating over samples 2023-05-01 02:33:58 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 02:34:36 - progress_bar.py[line:274] - INFO: epoch 001: 11 / 6042 loss=2.509, loss_v1=0, loss_v2=0, nll_loss=1.267, ntokens=7849.9, nsentences=120, sample_size=4039.6, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1662.3, ups=0.22, wpb=7849.9, bsz=120, num_updates=10, lr=8.27586e-08, gnorm=0.921, clip=0, loss_scale=64, train_wall=49, gb_free=29.8, wall=88 2023-05-01 02:35:15 - progress_bar.py[line:274] - INFO: epoch 001: 21 / 6042 loss=2.535, loss_v1=0, loss_v2=0, nll_loss=1.302, ntokens=7538, nsentences=120, sample_size=3856.2, sample_size_v1=0, sample_size_v2=0, ppl=2.47, wps=1905, ups=0.25, wpb=7538, bsz=120, num_updates=20, lr=1.65517e-07, gnorm=0.933, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=128 2023-05-01 02:35:55 - progress_bar.py[line:274] - INFO: epoch 001: 31 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.248, ntokens=7683.6, nsentences=120, sample_size=4019.3, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1932.7, ups=0.25, wpb=7683.6, bsz=120, num_updates=30, lr=2.48276e-07, gnorm=0.903, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=167 2023-05-01 02:36:35 - progress_bar.py[line:274] - INFO: epoch 001: 41 / 6042 loss=2.534, loss_v1=0, loss_v2=0, 
nll_loss=1.3, ntokens=7367.2, nsentences=120, sample_size=4013.5, sample_size_v1=0, sample_size_v2=0, ppl=2.46, wps=1828.5, ups=0.25, wpb=7367.2, bsz=120, num_updates=40, lr=3.31034e-07, gnorm=0.939, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=208 2023-05-01 02:37:16 - progress_bar.py[line:274] - INFO: epoch 001: 51 / 6042 loss=2.539, loss_v1=0, loss_v2=0, nll_loss=1.308, ntokens=8093, nsentences=120, sample_size=4110.8, sample_size_v1=0, sample_size_v2=0, ppl=2.48, wps=2004.7, ups=0.25, wpb=8093, bsz=120, num_updates=50, lr=4.13793e-07, gnorm=0.917, clip=10, loss_scale=64, train_wall=40, gb_free=27, wall=248 2023-05-01 02:37:55 - progress_bar.py[line:274] - INFO: epoch 001: 61 / 6042 loss=2.524, loss_v1=0, loss_v2=0, nll_loss=1.286, ntokens=7869.6, nsentences=120, sample_size=4122.2, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=1973.3, ups=0.25, wpb=7869.6, bsz=120, num_updates=60, lr=4.96552e-07, gnorm=0.912, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=288 2023-05-01 02:38:36 - progress_bar.py[line:274] - INFO: epoch 001: 71 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.249, ntokens=7596.5, nsentences=120, sample_size=4103.1, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1891.9, ups=0.25, wpb=7596.5, bsz=120, num_updates=70, lr=5.7931e-07, gnorm=0.932, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=328 2023-05-01 02:39:16 - progress_bar.py[line:274] - INFO: epoch 001: 81 / 6042 loss=2.53, loss_v1=0, loss_v2=0, nll_loss=1.296, ntokens=7620.8, nsentences=120, sample_size=3834.8, sample_size_v1=0, sample_size_v2=0, ppl=2.45, wps=1903.5, ups=0.25, wpb=7620.8, bsz=120, num_updates=80, lr=6.62069e-07, gnorm=0.944, clip=10, loss_scale=64, train_wall=40, gb_free=30.5, wall=368 2023-05-01 02:39:56 - progress_bar.py[line:274] - INFO: epoch 001: 91 / 6042 loss=2.545, loss_v1=0, loss_v2=0, nll_loss=1.309, ntokens=7615.3, nsentences=120, sample_size=4206.3, sample_size_v1=0, sample_size_v2=0, ppl=2.48, wps=1882.6, ups=0.25, 
wpb=7615.3, bsz=120, num_updates=90, lr=7.44828e-07, gnorm=0.898, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=408 2023-05-01 02:40:37 - progress_bar.py[line:274] - INFO: epoch 001: 101 / 6042 loss=2.546, loss_v1=0, loss_v2=0, nll_loss=1.312, ntokens=7803.1, nsentences=120, sample_size=4237.5, sample_size_v1=0, sample_size_v2=0, ppl=2.48, wps=1925.4, ups=0.25, wpb=7803.1, bsz=120, num_updates=100, lr=8.27586e-07, gnorm=0.878, clip=0, loss_scale=64, train_wall=40, gb_free=26.2, wall=449 2023-05-01 02:41:16 - progress_bar.py[line:274] - INFO: epoch 001: 111 / 6042 loss=2.532, loss_v1=0, loss_v2=0, nll_loss=1.293, ntokens=7626.4, nsentences=120, sample_size=3992.6, sample_size_v1=0, sample_size_v2=0, ppl=2.45, wps=1910.5, ups=0.25, wpb=7626.4, bsz=120, num_updates=110, lr=9.10345e-07, gnorm=0.927, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=489 2023-05-01 02:41:20 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-01 02:42:00 - progress_bar.py[line:274] - INFO: epoch 001: 122 / 6042 loss=2.52, loss_v1=0, loss_v2=0, nll_loss=1.286, ntokens=7669.5, nsentences=120, sample_size=3999.9, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=1759.4, ups=0.23, wpb=7669.5, bsz=120, num_updates=120, lr=9.93103e-07, gnorm=0.925, clip=20, loss_scale=32, train_wall=44, gb_free=31.1, wall=533 2023-05-01 02:42:40 - progress_bar.py[line:274] - INFO: epoch 001: 132 / 6042 loss=2.529, loss_v1=0, loss_v2=0, nll_loss=1.294, ntokens=7521.4, nsentences=120, sample_size=3988.2, sample_size_v1=0, sample_size_v2=0, ppl=2.45, wps=1877.1, ups=0.25, wpb=7521.4, bsz=120, num_updates=130, lr=1.07586e-06, gnorm=0.927, clip=0, loss_scale=32, train_wall=40, gb_free=30.1, wall=573 2023-05-01 02:43:19 - progress_bar.py[line:274] - INFO: epoch 001: 142 / 6042 loss=2.516, loss_v1=0, loss_v2=0, nll_loss=1.283, ntokens=7583.8, nsentences=120, sample_size=4166.6, sample_size_v1=0, sample_size_v2=0, ppl=2.43, 
wps=1931.1, ups=0.25, wpb=7583.8, bsz=120, num_updates=140, lr=1.15862e-06, gnorm=0.883, clip=10, loss_scale=32, train_wall=39, gb_free=29.4, wall=612 2023-05-01 02:44:00 - progress_bar.py[line:274] - INFO: epoch 001: 152 / 6042 loss=2.547, loss_v1=0, loss_v2=0, nll_loss=1.316, ntokens=7620, nsentences=120, sample_size=4194.4, sample_size_v1=0, sample_size_v2=0, ppl=2.49, wps=1897, ups=0.25, wpb=7620, bsz=120, num_updates=150, lr=1.24138e-06, gnorm=0.888, clip=10, loss_scale=32, train_wall=40, gb_free=30, wall=652 2023-05-01 02:44:40 - progress_bar.py[line:274] - INFO: epoch 001: 162 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7793.9, nsentences=120, sample_size=4162.3, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1950, ups=0.25, wpb=7793.9, bsz=120, num_updates=160, lr=1.32414e-06, gnorm=0.883, clip=10, loss_scale=32, train_wall=40, gb_free=31.2, wall=692 2023-05-01 02:45:22 - progress_bar.py[line:274] - INFO: epoch 001: 172 / 6042 loss=2.531, loss_v1=0, loss_v2=0, nll_loss=1.301, ntokens=8240.5, nsentences=120, sample_size=4191.5, sample_size_v1=0, sample_size_v2=0, ppl=2.46, wps=1951.6, ups=0.24, wpb=8240.5, bsz=120, num_updates=170, lr=1.4069e-06, gnorm=0.886, clip=0, loss_scale=32, train_wall=42, gb_free=28.9, wall=734 2023-05-01 02:46:02 - progress_bar.py[line:274] - INFO: epoch 001: 182 / 6042 loss=2.53, loss_v1=0, loss_v2=0, nll_loss=1.295, ntokens=8075.9, nsentences=120, sample_size=3989.4, sample_size_v1=0, sample_size_v2=0, ppl=2.45, wps=1987.7, ups=0.25, wpb=8075.9, bsz=120, num_updates=180, lr=1.48966e-06, gnorm=0.907, clip=10, loss_scale=32, train_wall=41, gb_free=30.9, wall=775 2023-05-01 02:46:42 - progress_bar.py[line:274] - INFO: epoch 001: 192 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.255, ntokens=7853.2, nsentences=120, sample_size=3994.7, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1987.1, ups=0.25, wpb=7853.2, bsz=120, num_updates=190, lr=1.57241e-06, gnorm=0.905, clip=20, loss_scale=32, train_wall=39, 
gb_free=29.8, wall=814 2023-05-01 02:47:21 - progress_bar.py[line:274] - INFO: epoch 001: 202 / 6042 loss=2.54, loss_v1=0, loss_v2=0, nll_loss=1.306, ntokens=7844.5, nsentences=120, sample_size=4088.3, sample_size_v1=0, sample_size_v2=0, ppl=2.47, wps=1990, ups=0.25, wpb=7844.5, bsz=120, num_updates=200, lr=1.65517e-06, gnorm=0.911, clip=0, loss_scale=32, train_wall=39, gb_free=27.6, wall=854 2023-05-01 02:48:02 - progress_bar.py[line:274] - INFO: epoch 001: 212 / 6042 loss=2.521, loss_v1=0, loss_v2=0, nll_loss=1.285, ntokens=7829.9, nsentences=120, sample_size=3703.6, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=1946.6, ups=0.25, wpb=7829.9, bsz=120, num_updates=210, lr=1.73793e-06, gnorm=0.949, clip=30, loss_scale=32, train_wall=40, gb_free=31.1, wall=894 2023-05-01 02:48:41 - progress_bar.py[line:274] - INFO: epoch 001: 222 / 6042 loss=2.514, loss_v1=0, loss_v2=0, nll_loss=1.277, ntokens=7585.6, nsentences=120, sample_size=3809.8, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1945.8, ups=0.26, wpb=7585.6, bsz=120, num_updates=220, lr=1.82069e-06, gnorm=0.916, clip=10, loss_scale=32, train_wall=39, gb_free=29.8, wall=933 2023-05-01 02:49:20 - progress_bar.py[line:274] - INFO: epoch 001: 232 / 6042 loss=2.533, loss_v1=0, loss_v2=0, nll_loss=1.3, ntokens=7606.3, nsentences=120, sample_size=4015.7, sample_size_v1=0, sample_size_v2=0, ppl=2.46, wps=1927.1, ups=0.25, wpb=7606.3, bsz=120, num_updates=230, lr=1.90345e-06, gnorm=0.908, clip=20, loss_scale=32, train_wall=39, gb_free=30, wall=972 2023-05-01 02:50:00 - progress_bar.py[line:274] - INFO: epoch 001: 242 / 6042 loss=2.532, loss_v1=0, loss_v2=0, nll_loss=1.297, ntokens=7627.1, nsentences=120, sample_size=4148.7, sample_size_v1=0, sample_size_v2=0, ppl=2.46, wps=1907.5, ups=0.25, wpb=7627.1, bsz=120, num_updates=240, lr=1.98621e-06, gnorm=0.895, clip=0, loss_scale=32, train_wall=40, gb_free=29.4, wall=1012 2023-05-01 02:50:40 - progress_bar.py[line:274] - INFO: epoch 001: 252 / 6042 loss=2.493, 
loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7771.2, nsentences=120, sample_size=4006.5, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1956, ups=0.25, wpb=7771.2, bsz=120, num_updates=250, lr=2.06897e-06, gnorm=0.905, clip=10, loss_scale=32, train_wall=40, gb_free=31.3, wall=1052 2023-05-01 02:51:19 - progress_bar.py[line:274] - INFO: epoch 001: 262 / 6042 loss=2.512, loss_v1=0, loss_v2=0, nll_loss=1.267, ntokens=7363.4, nsentences=120, sample_size=3654, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1878.7, ups=0.26, wpb=7363.4, bsz=120, num_updates=260, lr=2.15172e-06, gnorm=0.955, clip=30, loss_scale=32, train_wall=39, gb_free=29.5, wall=1091 2023-05-01 02:51:58 - progress_bar.py[line:274] - INFO: epoch 001: 272 / 6042 loss=2.547, loss_v1=0, loss_v2=0, nll_loss=1.308, ntokens=7623.7, nsentences=120, sample_size=4188.5, sample_size_v1=0, sample_size_v2=0, ppl=2.48, wps=1939.4, ups=0.25, wpb=7623.7, bsz=120, num_updates=270, lr=2.23448e-06, gnorm=0.93, clip=20, loss_scale=32, train_wall=39, gb_free=27.1, wall=1131 2023-05-01 02:52:39 - progress_bar.py[line:274] - INFO: epoch 001: 282 / 6042 loss=2.523, loss_v1=0, loss_v2=0, nll_loss=1.288, ntokens=7935.6, nsentences=120, sample_size=3968.4, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=1971.9, ups=0.25, wpb=7935.6, bsz=120, num_updates=280, lr=2.31724e-06, gnorm=0.873, clip=10, loss_scale=32, train_wall=40, gb_free=29.6, wall=1171 2023-05-01 02:53:18 - progress_bar.py[line:274] - INFO: epoch 001: 292 / 6042 loss=2.504, loss_v1=0, loss_v2=0, nll_loss=1.261, ntokens=7835.4, nsentences=120, sample_size=4165.1, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1969.9, ups=0.25, wpb=7835.4, bsz=120, num_updates=290, lr=2.4e-06, gnorm=0.908, clip=0, loss_scale=32, train_wall=40, gb_free=24.9, wall=1211 2023-05-01 02:53:57 - progress_bar.py[line:274] - INFO: epoch 001: 302 / 6042 loss=2.529, loss_v1=0, loss_v2=0, nll_loss=1.294, ntokens=7593.1, nsentences=120, sample_size=3810.8, sample_size_v1=0, 
sample_size_v2=0, ppl=2.45, wps=1949.7, ups=0.26, wpb=7593.1, bsz=120, num_updates=300, lr=2.48276e-06, gnorm=0.949, clip=30, loss_scale=32, train_wall=39, gb_free=30.8, wall=1250 2023-05-01 02:54:37 - progress_bar.py[line:274] - INFO: epoch 001: 312 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7519.7, nsentences=120, sample_size=3985.5, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1898.5, ups=0.25, wpb=7519.7, bsz=120, num_updates=310, lr=2.56552e-06, gnorm=0.929, clip=30, loss_scale=32, train_wall=40, gb_free=28.4, wall=1289 2023-05-01 02:55:16 - progress_bar.py[line:274] - INFO: epoch 001: 322 / 6042 loss=2.509, loss_v1=0, loss_v2=0, nll_loss=1.276, ntokens=7434.4, nsentences=120, sample_size=4184.1, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1922.4, ups=0.26, wpb=7434.4, bsz=120, num_updates=320, lr=2.64828e-06, gnorm=0.876, clip=0, loss_scale=32, train_wall=39, gb_free=30.2, wall=1328 2023-05-01 02:55:56 - progress_bar.py[line:274] - INFO: epoch 001: 332 / 6042 loss=2.514, loss_v1=0, loss_v2=0, nll_loss=1.279, ntokens=7640.9, nsentences=120, sample_size=3847, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=1907.6, ups=0.25, wpb=7640.9, bsz=120, num_updates=330, lr=2.73103e-06, gnorm=0.919, clip=0, loss_scale=32, train_wall=40, gb_free=29.3, wall=1368 2023-05-01 02:56:35 - progress_bar.py[line:274] - INFO: epoch 001: 342 / 6042 loss=2.538, loss_v1=0, loss_v2=0, nll_loss=1.295, ntokens=7758.5, nsentences=120, sample_size=3903.1, sample_size_v1=0, sample_size_v2=0, ppl=2.45, wps=1982.2, ups=0.26, wpb=7758.5, bsz=120, num_updates=340, lr=2.81379e-06, gnorm=0.942, clip=20, loss_scale=32, train_wall=39, gb_free=27.3, wall=1407 2023-05-01 02:57:15 - progress_bar.py[line:274] - INFO: epoch 001: 352 / 6042 loss=2.547, loss_v1=0, loss_v2=0, nll_loss=1.309, ntokens=7645.1, nsentences=120, sample_size=3933.2, sample_size_v1=0, sample_size_v2=0, ppl=2.48, wps=1874.3, ups=0.25, wpb=7645.1, bsz=120, num_updates=350, lr=2.89655e-06, 
gnorm=0.93, clip=30, loss_scale=32, train_wall=41, gb_free=29.9, wall=1448 2023-05-01 02:57:55 - progress_bar.py[line:274] - INFO: epoch 001: 362 / 6042 loss=2.53, loss_v1=0, loss_v2=0, nll_loss=1.291, ntokens=7940.9, nsentences=120, sample_size=3954.7, sample_size_v1=0, sample_size_v2=0, ppl=2.45, wps=2007.3, ups=0.25, wpb=7940.9, bsz=120, num_updates=360, lr=2.97931e-06, gnorm=0.873, clip=0, loss_scale=32, train_wall=39, gb_free=28.6, wall=1488 2023-05-01 02:58:35 - progress_bar.py[line:274] - INFO: epoch 001: 372 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.261, ntokens=7909.3, nsentences=120, sample_size=4094.4, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1977, ups=0.25, wpb=7909.3, bsz=120, num_updates=370, lr=3.06207e-06, gnorm=0.998, clip=10, loss_scale=32, train_wall=40, gb_free=31.6, wall=1528 2023-05-01 02:59:15 - progress_bar.py[line:274] - INFO: epoch 001: 382 / 6042 loss=2.52, loss_v1=0, loss_v2=0, nll_loss=1.277, ntokens=7878.4, nsentences=120, sample_size=4115.6, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1985.5, ups=0.25, wpb=7878.4, bsz=120, num_updates=380, lr=3.14483e-06, gnorm=0.909, clip=20, loss_scale=32, train_wall=40, gb_free=29.9, wall=1567 2023-05-01 02:59:55 - progress_bar.py[line:274] - INFO: epoch 001: 392 / 6042 loss=2.53, loss_v1=0, loss_v2=0, nll_loss=1.298, ntokens=7832.6, nsentences=120, sample_size=3727.4, sample_size_v1=0, sample_size_v2=0, ppl=2.46, wps=1965, ups=0.25, wpb=7832.6, bsz=120, num_updates=390, lr=3.22759e-06, gnorm=0.947, clip=20, loss_scale=32, train_wall=40, gb_free=30.3, wall=1607 2023-05-01 03:00:34 - progress_bar.py[line:274] - INFO: epoch 001: 402 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.256, ntokens=7607.5, nsentences=120, sample_size=3911.7, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1922.3, ups=0.25, wpb=7607.5, bsz=120, num_updates=400, lr=3.31034e-06, gnorm=0.92, clip=0, loss_scale=32, train_wall=40, gb_free=29.9, wall=1647 2023-05-01 03:01:14 - 
progress_bar.py[line:274] - INFO: epoch 001: 412 / 6042 loss=2.529, loss_v1=0, loss_v2=0, nll_loss=1.291, ntokens=7735.7, nsentences=120, sample_size=4125.2, sample_size_v1=0, sample_size_v2=0, ppl=2.45, wps=1932.8, ups=0.25, wpb=7735.7, bsz=120, num_updates=410, lr=3.3931e-06, gnorm=0.922, clip=30, loss_scale=32, train_wall=40, gb_free=31, wall=1687 2023-05-01 03:01:54 - progress_bar.py[line:274] - INFO: epoch 001: 422 / 6042 loss=2.531, loss_v1=0, loss_v2=0, nll_loss=1.296, ntokens=7895.6, nsentences=120, sample_size=3838.4, sample_size_v1=0, sample_size_v2=0, ppl=2.46, wps=1999.9, ups=0.25, wpb=7895.6, bsz=120, num_updates=420, lr=3.47586e-06, gnorm=0.942, clip=20, loss_scale=32, train_wall=39, gb_free=30.5, wall=1726 2023-05-01 03:02:35 - progress_bar.py[line:274] - INFO: epoch 001: 432 / 6042 loss=2.52, loss_v1=0, loss_v2=0, nll_loss=1.273, ntokens=7743.8, nsentences=120, sample_size=4362.2, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1886.8, ups=0.24, wpb=7743.8, bsz=120, num_updates=430, lr=3.55862e-06, gnorm=0.856, clip=0, loss_scale=32, train_wall=41, gb_free=29.5, wall=1767 2023-05-01 03:02:47 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 16.0 2023-05-01 03:03:19 - progress_bar.py[line:274] - INFO: epoch 001: 443 / 6042 loss=2.495, loss_v1=0, loss_v2=0, nll_loss=1.255, ntokens=7906.3, nsentences=120, sample_size=4257, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1770.8, ups=0.22, wpb=7906.3, bsz=120, num_updates=440, lr=3.64138e-06, gnorm=0.876, clip=0, loss_scale=16, train_wall=45, gb_free=30, wall=1812 2023-05-01 03:03:58 - progress_bar.py[line:274] - INFO: epoch 001: 453 / 6042 loss=2.504, loss_v1=0, loss_v2=0, nll_loss=1.258, ntokens=7535.3, nsentences=120, sample_size=3881.7, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1931.1, ups=0.26, wpb=7535.3, bsz=120, num_updates=450, lr=3.72414e-06, gnorm=0.899, clip=0, loss_scale=16, train_wall=39, gb_free=30, wall=1851 2023-05-01 
03:04:38 - progress_bar.py[line:274] - INFO: epoch 001: 463 / 6042 loss=2.516, loss_v1=0, loss_v2=0, nll_loss=1.274, ntokens=7634.6, nsentences=120, sample_size=4106.7, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1929.8, ups=0.25, wpb=7634.6, bsz=120, num_updates=460, lr=3.8069e-06, gnorm=0.91, clip=10, loss_scale=16, train_wall=39, gb_free=29.7, wall=1890 2023-05-01 03:05:18 - progress_bar.py[line:274] - INFO: epoch 001: 473 / 6042 loss=2.512, loss_v1=0, loss_v2=0, nll_loss=1.27, ntokens=7780.2, nsentences=120, sample_size=3971.3, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1934.6, ups=0.25, wpb=7780.2, bsz=120, num_updates=470, lr=3.88966e-06, gnorm=0.898, clip=0, loss_scale=16, train_wall=40, gb_free=29.6, wall=1931 2023-05-01 03:05:59 - progress_bar.py[line:274] - INFO: epoch 001: 483 / 6042 loss=2.509, loss_v1=0, loss_v2=0, nll_loss=1.269, ntokens=7542.4, nsentences=120, sample_size=4023.4, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1846.4, ups=0.24, wpb=7542.4, bsz=120, num_updates=480, lr=3.97241e-06, gnorm=0.897, clip=20, loss_scale=16, train_wall=41, gb_free=30.2, wall=1972 2023-05-01 03:06:39 - progress_bar.py[line:274] - INFO: epoch 001: 493 / 6042 loss=2.504, loss_v1=0, loss_v2=0, nll_loss=1.262, ntokens=7848.5, nsentences=120, sample_size=3827.3, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1960.8, ups=0.25, wpb=7848.5, bsz=120, num_updates=490, lr=4.05517e-06, gnorm=0.919, clip=20, loss_scale=16, train_wall=40, gb_free=31.2, wall=2012 2023-05-01 03:07:19 - progress_bar.py[line:274] - INFO: epoch 001: 503 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7542.2, nsentences=120, sample_size=4401.6, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1887.9, ups=0.25, wpb=7542.2, bsz=120, num_updates=500, lr=4.13793e-06, gnorm=0.842, clip=0, loss_scale=16, train_wall=40, gb_free=29.8, wall=2052 2023-05-01 03:07:59 - progress_bar.py[line:274] - INFO: epoch 001: 513 / 6042 loss=2.478, loss_v1=0, loss_v2=0, 
nll_loss=1.227, ntokens=7434.7, nsentences=120, sample_size=4026.1, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1879.1, ups=0.25, wpb=7434.7, bsz=120, num_updates=510, lr=4.22069e-06, gnorm=0.874, clip=0, loss_scale=16, train_wall=39, gb_free=30.4, wall=2091 2023-05-01 03:08:38 - progress_bar.py[line:274] - INFO: epoch 001: 523 / 6042 loss=2.511, loss_v1=0, loss_v2=0, nll_loss=1.272, ntokens=7512.2, nsentences=120, sample_size=3948.7, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1929.6, ups=0.26, wpb=7512.2, bsz=120, num_updates=520, lr=4.30345e-06, gnorm=0.878, clip=0, loss_scale=16, train_wall=39, gb_free=29.8, wall=2130 2023-05-01 03:09:17 - progress_bar.py[line:274] - INFO: epoch 001: 533 / 6042 loss=2.506, loss_v1=0, loss_v2=0, nll_loss=1.267, ntokens=7488.1, nsentences=120, sample_size=4237.2, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1884, ups=0.25, wpb=7488.1, bsz=120, num_updates=530, lr=4.38621e-06, gnorm=0.88, clip=0, loss_scale=16, train_wall=40, gb_free=29.8, wall=2170 2023-05-01 03:09:57 - progress_bar.py[line:274] - INFO: epoch 001: 543 / 6042 loss=2.548, loss_v1=0, loss_v2=0, nll_loss=1.301, ntokens=7724.3, nsentences=120, sample_size=3622.4, sample_size_v1=0, sample_size_v2=0, ppl=2.46, wps=1940, ups=0.25, wpb=7724.3, bsz=120, num_updates=540, lr=4.46897e-06, gnorm=0.942, clip=30, loss_scale=16, train_wall=40, gb_free=29.9, wall=2210 2023-05-01 03:10:36 - progress_bar.py[line:274] - INFO: epoch 001: 553 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7577, nsentences=120, sample_size=3818, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1936.1, ups=0.26, wpb=7577, bsz=120, num_updates=550, lr=4.55172e-06, gnorm=0.968, clip=30, loss_scale=16, train_wall=39, gb_free=30.1, wall=2249 2023-05-01 03:11:15 - progress_bar.py[line:274] - INFO: epoch 001: 563 / 6042 loss=2.491, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7607.4, nsentences=120, sample_size=4104.9, sample_size_v1=0, sample_size_v2=0, ppl=2.37, 
wps=1971.1, ups=0.26, wpb=7607.4, bsz=120, num_updates=560, lr=4.63448e-06, gnorm=0.884, clip=0, loss_scale=16, train_wall=39, gb_free=30.2, wall=2287 2023-05-01 03:11:55 - progress_bar.py[line:274] - INFO: epoch 001: 573 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7734.5, nsentences=120, sample_size=4199.8, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1946.7, ups=0.25, wpb=7734.5, bsz=120, num_updates=570, lr=4.71724e-06, gnorm=0.884, clip=0, loss_scale=16, train_wall=40, gb_free=30.2, wall=2327 2023-05-01 03:12:34 - progress_bar.py[line:274] - INFO: epoch 001: 583 / 6042 loss=2.509, loss_v1=0, loss_v2=0, nll_loss=1.269, ntokens=7819.9, nsentences=120, sample_size=4035.9, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1990.3, ups=0.25, wpb=7819.9, bsz=120, num_updates=580, lr=4.8e-06, gnorm=0.909, clip=20, loss_scale=16, train_wall=39, gb_free=29.6, wall=2366 2023-05-01 03:13:14 - progress_bar.py[line:274] - INFO: epoch 001: 593 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.248, ntokens=7453.8, nsentences=120, sample_size=3961.9, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1875.2, ups=0.25, wpb=7453.8, bsz=120, num_updates=590, lr=4.88276e-06, gnorm=0.894, clip=0, loss_scale=16, train_wall=40, gb_free=29.9, wall=2406 2023-05-01 03:13:54 - progress_bar.py[line:274] - INFO: epoch 001: 603 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.231, ntokens=7753, nsentences=120, sample_size=4085.5, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1918.8, ups=0.25, wpb=7753, bsz=120, num_updates=600, lr=4.96552e-06, gnorm=0.891, clip=10, loss_scale=16, train_wall=40, gb_free=30.4, wall=2446 2023-05-01 03:14:34 - progress_bar.py[line:274] - INFO: epoch 001: 613 / 6042 loss=2.528, loss_v1=0, loss_v2=0, nll_loss=1.285, ntokens=7610.8, nsentences=120, sample_size=4110.6, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=1919.3, ups=0.25, wpb=7610.8, bsz=120, num_updates=610, lr=5.04828e-06, gnorm=0.912, clip=10, loss_scale=16, 
train_wall=40, gb_free=29.8, wall=2486 2023-05-01 03:15:13 - progress_bar.py[line:274] - INFO: epoch 001: 623 / 6042 loss=2.522, loss_v1=0, loss_v2=0, nll_loss=1.283, ntokens=7665.7, nsentences=120, sample_size=3878.3, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=1945.9, ups=0.25, wpb=7665.7, bsz=120, num_updates=620, lr=5.13103e-06, gnorm=0.947, clip=20, loss_scale=16, train_wall=39, gb_free=31, wall=2526 2023-05-01 03:15:53 - progress_bar.py[line:274] - INFO: epoch 001: 633 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7943.9, nsentences=120, sample_size=3886.5, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1969.8, ups=0.25, wpb=7943.9, bsz=120, num_updates=630, lr=5.21379e-06, gnorm=0.919, clip=10, loss_scale=16, train_wall=40, gb_free=30.2, wall=2566 2023-05-01 03:16:33 - progress_bar.py[line:274] - INFO: epoch 001: 643 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=7403, nsentences=120, sample_size=4083.6, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1881.6, ups=0.25, wpb=7403, bsz=120, num_updates=640, lr=5.29655e-06, gnorm=0.9, clip=10, loss_scale=16, train_wall=39, gb_free=31.2, wall=2605 2023-05-01 03:17:12 - progress_bar.py[line:274] - INFO: epoch 001: 653 / 6042 loss=2.502, loss_v1=0, loss_v2=0, nll_loss=1.266, ntokens=7863.8, nsentences=120, sample_size=3602.2, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=2003.7, ups=0.25, wpb=7863.8, bsz=120, num_updates=650, lr=5.37931e-06, gnorm=0.936, clip=10, loss_scale=16, train_wall=39, gb_free=28.4, wall=2644 2023-05-01 03:17:52 - progress_bar.py[line:274] - INFO: epoch 001: 663 / 6042 loss=2.526, loss_v1=0, loss_v2=0, nll_loss=1.299, ntokens=7903.1, nsentences=120, sample_size=4032.5, sample_size_v1=0, sample_size_v2=0, ppl=2.46, wps=1969.3, ups=0.25, wpb=7903.1, bsz=120, num_updates=660, lr=5.46207e-06, gnorm=0.953, clip=30, loss_scale=16, train_wall=40, gb_free=29, wall=2685 2023-05-01 03:18:31 - progress_bar.py[line:274] - INFO: epoch 001: 673 / 6042 
loss=2.495, loss_v1=0, loss_v2=0, nll_loss=1.253, ntokens=7644.5, nsentences=120, sample_size=4119, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1951.3, ups=0.26, wpb=7644.5, bsz=120, num_updates=670, lr=5.54483e-06, gnorm=0.909, clip=0, loss_scale=16, train_wall=39, gb_free=29.9, wall=2724 2023-05-01 03:19:12 - progress_bar.py[line:274] - INFO: epoch 001: 683 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.255, ntokens=8050.8, nsentences=120, sample_size=4003.8, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1964.3, ups=0.24, wpb=8050.8, bsz=120, num_updates=680, lr=5.62759e-06, gnorm=0.889, clip=10, loss_scale=16, train_wall=41, gb_free=29.3, wall=2765 2023-05-01 03:19:53 - progress_bar.py[line:274] - INFO: epoch 001: 693 / 6042 loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.258, ntokens=7933.2, nsentences=120, sample_size=4169.9, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1942.3, ups=0.24, wpb=7933.2, bsz=120, num_updates=690, lr=5.71034e-06, gnorm=0.897, clip=0, loss_scale=16, train_wall=41, gb_free=29.8, wall=2806 2023-05-01 03:20:33 - progress_bar.py[line:274] - INFO: epoch 001: 703 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7625.5, nsentences=120, sample_size=3962.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1927.9, ups=0.25, wpb=7625.5, bsz=120, num_updates=700, lr=5.7931e-06, gnorm=0.908, clip=0, loss_scale=16, train_wall=39, gb_free=30.7, wall=2845 2023-05-01 03:21:13 - progress_bar.py[line:274] - INFO: epoch 001: 713 / 6042 loss=2.501, loss_v1=0, loss_v2=0, nll_loss=1.263, ntokens=7565.6, nsentences=120, sample_size=4035.5, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1873.4, ups=0.25, wpb=7565.6, bsz=120, num_updates=710, lr=5.87586e-06, gnorm=0.926, clip=10, loss_scale=16, train_wall=40, gb_free=30.6, wall=2886 2023-05-01 03:21:53 - progress_bar.py[line:274] - INFO: epoch 001: 723 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7858, nsentences=120, sample_size=4229.1, 
sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1967.3, ups=0.25, wpb=7858, bsz=120, num_updates=720, lr=5.95862e-06, gnorm=0.904, clip=0, loss_scale=16, train_wall=40, gb_free=30.4, wall=2925 2023-05-01 03:22:32 - progress_bar.py[line:274] - INFO: epoch 001: 733 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7498.3, nsentences=120, sample_size=3970.9, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1912.3, ups=0.26, wpb=7498.3, bsz=120, num_updates=730, lr=6.04138e-06, gnorm=0.919, clip=0, loss_scale=16, train_wall=39, gb_free=29.1, wall=2965 2023-05-01 03:23:12 - progress_bar.py[line:274] - INFO: epoch 001: 743 / 6042 loss=2.503, loss_v1=0, loss_v2=0, nll_loss=1.26, ntokens=7422.8, nsentences=120, sample_size=4072.3, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1865, ups=0.25, wpb=7422.8, bsz=120, num_updates=740, lr=6.12414e-06, gnorm=0.921, clip=30, loss_scale=16, train_wall=40, gb_free=29.7, wall=3005 2023-05-01 03:23:52 - progress_bar.py[line:274] - INFO: epoch 001: 753 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.258, ntokens=7772.1, nsentences=120, sample_size=3985.9, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1951.8, ups=0.25, wpb=7772.1, bsz=120, num_updates=750, lr=6.2069e-06, gnorm=0.914, clip=0, loss_scale=16, train_wall=40, gb_free=30.7, wall=3044 2023-05-01 03:24:31 - progress_bar.py[line:274] - INFO: epoch 001: 763 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.222, ntokens=7563.8, nsentences=120, sample_size=4197.4, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1926.5, ups=0.25, wpb=7563.8, bsz=120, num_updates=760, lr=6.28966e-06, gnorm=0.87, clip=0, loss_scale=16, train_wall=39, gb_free=27.3, wall=3084 2023-05-01 03:25:11 - progress_bar.py[line:274] - INFO: epoch 001: 773 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.248, ntokens=7780.3, nsentences=120, sample_size=3627.1, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1953.5, ups=0.25, wpb=7780.3, bsz=120, num_updates=770, lr=6.37241e-06, 
gnorm=0.946, clip=10, loss_scale=16, train_wall=40, gb_free=29.7, wall=3123 2023-05-01 03:25:50 - progress_bar.py[line:274] - INFO: epoch 001: 783 / 6042 loss=2.48, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7612, nsentences=120, sample_size=4421.8, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1928.7, ups=0.25, wpb=7612, bsz=120, num_updates=780, lr=6.45517e-06, gnorm=0.88, clip=0, loss_scale=16, train_wall=39, gb_free=28.3, wall=3163 2023-05-01 03:26:30 - progress_bar.py[line:274] - INFO: epoch 001: 793 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.235, ntokens=7811.6, nsentences=120, sample_size=4033.9, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1989.9, ups=0.25, wpb=7811.6, bsz=120, num_updates=790, lr=6.53793e-06, gnorm=0.912, clip=10, loss_scale=16, train_wall=39, gb_free=30.5, wall=3202 2023-05-01 03:27:09 - progress_bar.py[line:274] - INFO: epoch 001: 803 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=7358.9, nsentences=120, sample_size=4078, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1870.4, ups=0.25, wpb=7358.9, bsz=120, num_updates=800, lr=6.62069e-06, gnorm=0.895, clip=10, loss_scale=16, train_wall=39, gb_free=30, wall=3241 2023-05-01 03:27:49 - progress_bar.py[line:274] - INFO: epoch 001: 813 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=7439.8, nsentences=120, sample_size=4194.5, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1884.1, ups=0.25, wpb=7439.8, bsz=120, num_updates=810, lr=6.70345e-06, gnorm=0.884, clip=0, loss_scale=16, train_wall=39, gb_free=30.3, wall=3281 2023-05-01 03:28:28 - progress_bar.py[line:274] - INFO: epoch 001: 823 / 6042 loss=2.512, loss_v1=0, loss_v2=0, nll_loss=1.272, ntokens=7786.7, nsentences=120, sample_size=4060.6, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1953.3, ups=0.25, wpb=7786.7, bsz=120, num_updates=820, lr=6.78621e-06, gnorm=0.91, clip=0, loss_scale=16, train_wall=40, gb_free=31.4, wall=3321 2023-05-01 03:29:08 - progress_bar.py[line:274] 
- INFO: epoch 001: 833 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7449.6, nsentences=120, sample_size=4184.2, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1881.6, ups=0.25, wpb=7449.6, bsz=120, num_updates=830, lr=6.86897e-06, gnorm=0.985, clip=30, loss_scale=16, train_wall=40, gb_free=30.5, wall=3360 2023-05-01 03:29:47 - progress_bar.py[line:274] - INFO: epoch 001: 843 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7797, nsentences=120, sample_size=3801.9, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1983, ups=0.25, wpb=7797, bsz=120, num_updates=840, lr=6.95172e-06, gnorm=0.937, clip=10, loss_scale=16, train_wall=39, gb_free=30.6, wall=3400 2023-05-01 03:30:27 - progress_bar.py[line:274] - INFO: epoch 001: 853 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7467.5, nsentences=120, sample_size=3919.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1902, ups=0.25, wpb=7467.5, bsz=120, num_updates=850, lr=7.03448e-06, gnorm=0.935, clip=10, loss_scale=16, train_wall=39, gb_free=31, wall=3439 2023-05-01 03:31:06 - progress_bar.py[line:274] - INFO: epoch 001: 863 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7396.1, nsentences=120, sample_size=4106.5, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1897.9, ups=0.26, wpb=7396.1, bsz=120, num_updates=860, lr=7.11724e-06, gnorm=0.921, clip=0, loss_scale=16, train_wall=39, gb_free=30.5, wall=3478 2023-05-01 03:31:45 - progress_bar.py[line:274] - INFO: epoch 001: 873 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.262, ntokens=7597.4, nsentences=120, sample_size=4003.2, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1920.1, ups=0.25, wpb=7597.4, bsz=120, num_updates=870, lr=7.2e-06, gnorm=0.893, clip=10, loss_scale=16, train_wall=39, gb_free=29.3, wall=3518 2023-05-01 03:32:25 - progress_bar.py[line:274] - INFO: epoch 001: 883 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7612.4, nsentences=120, 
sample_size=4328.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1893.2, ups=0.25, wpb=7612.4, bsz=120, num_updates=880, lr=7.28276e-06, gnorm=0.884, clip=0, loss_scale=16, train_wall=40, gb_free=30, wall=3558 2023-05-01 03:33:05 - progress_bar.py[line:274] - INFO: epoch 001: 893 / 6042 loss=2.491, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=7886.2, nsentences=120, sample_size=3880.2, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1974.3, ups=0.25, wpb=7886.2, bsz=120, num_updates=890, lr=7.36552e-06, gnorm=0.891, clip=0, loss_scale=16, train_wall=40, gb_free=28.4, wall=3598 2023-05-01 03:33:45 - progress_bar.py[line:274] - INFO: epoch 001: 903 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=7543.8, nsentences=120, sample_size=4136, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1876.1, ups=0.25, wpb=7543.8, bsz=120, num_updates=900, lr=7.44828e-06, gnorm=0.876, clip=0, loss_scale=16, train_wall=40, gb_free=29.4, wall=3638 2023-05-01 03:34:25 - progress_bar.py[line:274] - INFO: epoch 001: 913 / 6042 loss=2.47, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7572.9, nsentences=120, sample_size=4141.1, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1905.7, ups=0.25, wpb=7572.9, bsz=120, num_updates=910, lr=7.53103e-06, gnorm=0.891, clip=10, loss_scale=16, train_wall=40, gb_free=29.1, wall=3678 2023-05-01 03:35:05 - progress_bar.py[line:274] - INFO: epoch 001: 923 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7665.9, nsentences=120, sample_size=3924.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1930.5, ups=0.25, wpb=7665.9, bsz=120, num_updates=920, lr=7.61379e-06, gnorm=0.92, clip=10, loss_scale=16, train_wall=40, gb_free=28.4, wall=3717 2023-05-01 03:35:44 - progress_bar.py[line:274] - INFO: epoch 001: 933 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7673.7, nsentences=120, sample_size=3932.9, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1942.3, ups=0.25, wpb=7673.7, bsz=120, 
num_updates=930, lr=7.69655e-06, gnorm=0.918, clip=10, loss_scale=16, train_wall=39, gb_free=30, wall=3757 2023-05-01 03:36:24 - progress_bar.py[line:274] - INFO: epoch 001: 943 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7942.6, nsentences=120, sample_size=3665.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=2006.6, ups=0.25, wpb=7942.6, bsz=120, num_updates=940, lr=7.77931e-06, gnorm=0.942, clip=30, loss_scale=16, train_wall=40, gb_free=30.3, wall=3796 2023-05-01 03:37:03 - progress_bar.py[line:274] - INFO: epoch 001: 953 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.252, ntokens=7657, nsentences=120, sample_size=3958.6, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1967.2, ups=0.26, wpb=7657, bsz=120, num_updates=950, lr=7.86207e-06, gnorm=0.96, clip=30, loss_scale=32, train_wall=39, gb_free=29.8, wall=3835 2023-05-01 03:37:42 - progress_bar.py[line:274] - INFO: epoch 001: 963 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7877.9, nsentences=120, sample_size=3736.2, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=2000.7, ups=0.25, wpb=7877.9, bsz=120, num_updates=960, lr=7.94483e-06, gnorm=0.954, clip=30, loss_scale=32, train_wall=39, gb_free=29.3, wall=3875 2023-05-01 03:38:22 - progress_bar.py[line:274] - INFO: epoch 001: 973 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=7761.9, nsentences=120, sample_size=3868.2, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1934.7, ups=0.25, wpb=7761.9, bsz=120, num_updates=970, lr=8.02759e-06, gnorm=0.953, clip=20, loss_scale=32, train_wall=40, gb_free=30.6, wall=3915 2023-05-01 03:39:03 - progress_bar.py[line:274] - INFO: epoch 001: 983 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=7853.9, nsentences=120, sample_size=4015.2, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1945.9, ups=0.25, wpb=7853.9, bsz=120, num_updates=980, lr=8.11034e-06, gnorm=0.896, clip=0, loss_scale=32, train_wall=40, gb_free=28.6, wall=3955 
2023-05-01 03:39:42 - progress_bar.py[line:274] - INFO: epoch 001: 993 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=7484.2, nsentences=120, sample_size=3980.3, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1893.4, ups=0.25, wpb=7484.2, bsz=120, num_updates=990, lr=8.1931e-06, gnorm=0.889, clip=0, loss_scale=32, train_wall=39, gb_free=29.9, wall=3995 2023-05-01 03:40:21 - progress_bar.py[line:274] - INFO: epoch 001: 1003 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.217, ntokens=7560.4, nsentences=120, sample_size=3739.1, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1938.6, ups=0.26, wpb=7560.4, bsz=120, num_updates=1000, lr=8.27586e-06, gnorm=0.933, clip=0, loss_scale=32, train_wall=39, gb_free=29.8, wall=4034 2023-05-01 03:40:21 - train.py[line:445] - INFO: begin validation on "valid" subset /mnt/bn/hri-lq/projects/VLDD/OFA/fairseq/fairseq/search.py:140: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). beams_buf = indices_buf // vocab_size /mnt/bn/hri-lq/projects/VLDD/OFA/models/sequence_generator.py:708: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. 
It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). unfin_idx = bbsz_idx // beam_size /mnt/bn/hri-lq/projects/VLDD/OFA/fairseq/fairseq/search.py:140: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). 
This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). beams_buf = indices_buf // vocab_size 2023-05-01 03:40:23 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 03:40:23 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: /mnt/bn/hri-lq/projects/VLDD/OFA/fairseq/fairseq/search.py:140: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). 
beams_buf = indices_buf // vocab_size /mnt/bn/hri-lq/projects/VLDD/OFA/models/sequence_generator.py:708: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). 
unfin_idx = bbsz_idx // beam_size 2023-05-01 03:40:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:31 - task_invig.py[line:181] - INFO: 
example hyp(gen): region: 2023-05-01 03:40:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:40 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 03:40:40 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 03:40:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:53 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 03:40:53 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 03:40:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:40:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:40:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:41:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:41:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:41:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:41:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:41:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:41:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:41:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:41:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:41:04 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 03:41:04 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 03:41:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:41:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:41:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:41:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:41:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:41:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:41:08 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 03:41:08 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 03:41:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:41:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:41:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:41:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:41:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:41:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:41:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:41:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:41:13 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 03:41:13 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 03:41:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 03:41:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 03:41:13 - progress_bar.py[line:282] - INFO: epoch 001 | valid on 'valid' subset | loss 3.186 | loss_v1 0 | loss_v2 0 | nll_loss 2.015 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.04 | score 0.749 | wps 3276.1 | wpb 3202.1 | bsz 39.4 | num_updates 1000 2023-05-01 03:41:13 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 1 @ 1000 updates 2023-05-01 03:41:13 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_1000.pt 2023-05-01 03:41:38 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_1000.pt 2023-05-01 03:42:15 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_1000.pt (epoch 1 @ 1000 updates, score 0.749) (writing took 62.325430693803355 seconds) 2023-05-01 03:42:55 - progress_bar.py[line:274] - INFO: epoch 001: 1013 / 6042 loss=2.5, loss_v1=0, loss_v2=0, nll_loss=1.261, ntokens=7528.3, nsentences=120, sample_size=4009, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=490, ups=0.07, wpb=7528.3, bsz=120, num_updates=1010, lr=8.35862e-06, gnorm=0.895, clip=10, loss_scale=32, train_wall=39, gb_free=29.7, wall=4187 2023-05-01 03:43:36 - progress_bar.py[line:274] - INFO: epoch 001: 1023 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.248, ntokens=7655.8, nsentences=120, sample_size=4237.1, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1876, ups=0.25, wpb=7655.8, bsz=120, num_updates=1020, lr=8.44138e-06, gnorm=0.915, clip=0, loss_scale=32, train_wall=41, gb_free=30.1, wall=4228 2023-05-01 03:44:16 - 
progress_bar.py[line:274] - INFO: epoch 001: 1033 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7661.9, nsentences=120, sample_size=3959.4, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1916.2, ups=0.25, wpb=7661.9, bsz=120, num_updates=1030, lr=8.52414e-06, gnorm=0.951, clip=10, loss_scale=32, train_wall=40, gb_free=30.3, wall=4268 2023-05-01 03:44:56 - progress_bar.py[line:274] - INFO: epoch 001: 1043 / 6042 loss=2.491, loss_v1=0, loss_v2=0, nll_loss=1.252, ntokens=7985.7, nsentences=120, sample_size=3647.9, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1992.6, ups=0.25, wpb=7985.7, bsz=120, num_updates=1040, lr=8.6069e-06, gnorm=0.945, clip=30, loss_scale=32, train_wall=40, gb_free=29.2, wall=4308 2023-05-01 03:45:36 - progress_bar.py[line:274] - INFO: epoch 001: 1053 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.257, ntokens=7757, nsentences=120, sample_size=4236.7, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1924.8, ups=0.25, wpb=7757, bsz=120, num_updates=1050, lr=8.68966e-06, gnorm=0.874, clip=0, loss_scale=32, train_wall=40, gb_free=29.7, wall=4349 2023-05-01 03:46:16 - progress_bar.py[line:274] - INFO: epoch 001: 1063 / 6042 loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.255, ntokens=7758.5, nsentences=120, sample_size=4056, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1958.3, ups=0.25, wpb=7758.5, bsz=120, num_updates=1060, lr=8.77241e-06, gnorm=0.912, clip=10, loss_scale=32, train_wall=40, gb_free=31.1, wall=4388 2023-05-01 03:46:56 - progress_bar.py[line:274] - INFO: epoch 001: 1073 / 6042 loss=2.532, loss_v1=0, loss_v2=0, nll_loss=1.299, ntokens=7827.2, nsentences=120, sample_size=4149.2, sample_size_v1=0, sample_size_v2=0, ppl=2.46, wps=1952.4, ups=0.25, wpb=7827.2, bsz=120, num_updates=1070, lr=8.85517e-06, gnorm=0.914, clip=10, loss_scale=32, train_wall=40, gb_free=29.7, wall=4428 2023-05-01 03:47:35 - progress_bar.py[line:274] - INFO: epoch 001: 1083 / 6042 loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.264, 
ntokens=7394.2, nsentences=120, sample_size=4013.2, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1892.5, ups=0.26, wpb=7394.2, bsz=120, num_updates=1080, lr=8.93793e-06, gnorm=0.929, clip=20, loss_scale=32, train_wall=39, gb_free=30.1, wall=4467 2023-05-01 03:48:14 - progress_bar.py[line:274] - INFO: epoch 001: 1093 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7760.9, nsentences=120, sample_size=3930.6, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1976.2, ups=0.25, wpb=7760.9, bsz=120, num_updates=1090, lr=9.02069e-06, gnorm=0.925, clip=10, loss_scale=32, train_wall=39, gb_free=30.1, wall=4507 2023-05-01 03:48:55 - progress_bar.py[line:274] - INFO: epoch 001: 1103 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.235, ntokens=8083, nsentences=120, sample_size=3983.7, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1993.3, ups=0.25, wpb=8083, bsz=120, num_updates=1100, lr=9.10345e-06, gnorm=0.941, clip=20, loss_scale=32, train_wall=40, gb_free=29.1, wall=4547 2023-05-01 03:49:34 - progress_bar.py[line:274] - INFO: epoch 001: 1113 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7525.3, nsentences=120, sample_size=3817.4, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1917, ups=0.25, wpb=7525.3, bsz=120, num_updates=1110, lr=9.18621e-06, gnorm=0.938, clip=20, loss_scale=32, train_wall=39, gb_free=31, wall=4586 2023-05-01 03:50:14 - progress_bar.py[line:274] - INFO: epoch 001: 1123 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7827.1, nsentences=120, sample_size=3924.5, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1942, ups=0.25, wpb=7827.1, bsz=120, num_updates=1120, lr=9.26897e-06, gnorm=0.926, clip=0, loss_scale=32, train_wall=40, gb_free=29.8, wall=4627 2023-05-01 03:50:54 - progress_bar.py[line:274] - INFO: epoch 001: 1133 / 6042 loss=2.478, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7776.7, nsentences=120, sample_size=4248.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1945.8, 
ups=0.25, wpb=7776.7, bsz=120, num_updates=1130, lr=9.35172e-06, gnorm=0.912, clip=10, loss_scale=32, train_wall=40, gb_free=30.2, wall=4667 2023-05-01 03:51:34 - progress_bar.py[line:274] - INFO: epoch 001: 1143 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.232, ntokens=7517.5, nsentences=120, sample_size=3857.5, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1894.4, ups=0.25, wpb=7517.5, bsz=120, num_updates=1140, lr=9.43448e-06, gnorm=0.924, clip=0, loss_scale=32, train_wall=40, gb_free=30.3, wall=4706 2023-05-01 03:52:14 - progress_bar.py[line:274] - INFO: epoch 001: 1153 / 6042 loss=2.494, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7871.5, nsentences=120, sample_size=3812.8, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1950.2, ups=0.25, wpb=7871.5, bsz=120, num_updates=1150, lr=9.51724e-06, gnorm=0.922, clip=30, loss_scale=32, train_wall=40, gb_free=30.3, wall=4747 2023-05-01 03:52:54 - progress_bar.py[line:274] - INFO: epoch 001: 1163 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=7622.1, nsentences=120, sample_size=4288.1, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1905.7, ups=0.25, wpb=7622.1, bsz=120, num_updates=1160, lr=9.6e-06, gnorm=0.877, clip=10, loss_scale=32, train_wall=40, gb_free=29.7, wall=4787 2023-05-01 03:53:34 - progress_bar.py[line:274] - INFO: epoch 001: 1173 / 6042 loss=2.493, loss_v1=0, loss_v2=0, nll_loss=1.252, ntokens=7747.8, nsentences=120, sample_size=3926.2, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1939.3, ups=0.25, wpb=7747.8, bsz=120, num_updates=1170, lr=9.68276e-06, gnorm=0.89, clip=0, loss_scale=32, train_wall=40, gb_free=25.8, wall=4827 2023-05-01 03:54:14 - progress_bar.py[line:274] - INFO: epoch 001: 1183 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7647.4, nsentences=120, sample_size=4114.2, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1922.2, ups=0.25, wpb=7647.4, bsz=120, num_updates=1180, lr=9.76552e-06, gnorm=0.913, clip=30, loss_scale=32, 
train_wall=40, gb_free=29.9, wall=4867 2023-05-01 03:54:54 - progress_bar.py[line:274] - INFO: epoch 001: 1193 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7750.5, nsentences=120, sample_size=3973.2, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1932.3, ups=0.25, wpb=7750.5, bsz=120, num_updates=1190, lr=9.84828e-06, gnorm=0.903, clip=0, loss_scale=32, train_wall=40, gb_free=29.6, wall=4907 2023-05-01 03:55:34 - progress_bar.py[line:274] - INFO: epoch 001: 1203 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=8052.9, nsentences=120, sample_size=4271.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=2012.2, ups=0.25, wpb=8052.9, bsz=120, num_updates=1200, lr=9.93103e-06, gnorm=0.909, clip=10, loss_scale=32, train_wall=40, gb_free=29.1, wall=4947 2023-05-01 03:56:15 - progress_bar.py[line:274] - INFO: epoch 001: 1213 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=8163.8, nsentences=120, sample_size=3907.6, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=2004.9, ups=0.25, wpb=8163.8, bsz=120, num_updates=1210, lr=1.00138e-05, gnorm=0.89, clip=0, loss_scale=32, train_wall=41, gb_free=30.6, wall=4987 2023-05-01 03:56:55 - progress_bar.py[line:274] - INFO: epoch 001: 1223 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7828.5, nsentences=120, sample_size=3938.3, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1951.3, ups=0.25, wpb=7828.5, bsz=120, num_updates=1220, lr=1.00966e-05, gnorm=0.905, clip=0, loss_scale=32, train_wall=40, gb_free=31, wall=5027 2023-05-01 03:57:35 - progress_bar.py[line:274] - INFO: epoch 001: 1233 / 6042 loss=2.493, loss_v1=0, loss_v2=0, nll_loss=1.252, ntokens=7553.5, nsentences=120, sample_size=4261.2, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1891.8, ups=0.25, wpb=7553.5, bsz=120, num_updates=1230, lr=1.01793e-05, gnorm=0.902, clip=10, loss_scale=32, train_wall=40, gb_free=29.8, wall=5067 2023-05-01 03:58:15 - progress_bar.py[line:274] - INFO: epoch 001: 
1243 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7797.8, nsentences=120, sample_size=4101.5, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1965.5, ups=0.25, wpb=7797.8, bsz=120, num_updates=1240, lr=1.02621e-05, gnorm=0.926, clip=10, loss_scale=32, train_wall=40, gb_free=29.6, wall=5107 2023-05-01 03:58:55 - progress_bar.py[line:274] - INFO: epoch 001: 1253 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=7783.6, nsentences=120, sample_size=4287.2, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1939.1, ups=0.25, wpb=7783.6, bsz=120, num_updates=1250, lr=1.03448e-05, gnorm=0.894, clip=0, loss_scale=32, train_wall=40, gb_free=29.8, wall=5147 2023-05-01 03:59:34 - progress_bar.py[line:274] - INFO: epoch 001: 1263 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7594.9, nsentences=120, sample_size=4145.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1924.1, ups=0.25, wpb=7594.9, bsz=120, num_updates=1260, lr=1.04276e-05, gnorm=0.899, clip=0, loss_scale=32, train_wall=39, gb_free=30, wall=5187 2023-05-01 04:00:14 - progress_bar.py[line:274] - INFO: epoch 001: 1273 / 6042 loss=2.529, loss_v1=0, loss_v2=0, nll_loss=1.285, ntokens=7958.2, nsentences=120, sample_size=4229.4, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=1987.1, ups=0.25, wpb=7958.2, bsz=120, num_updates=1270, lr=1.05103e-05, gnorm=0.918, clip=10, loss_scale=32, train_wall=40, gb_free=30.7, wall=5227 2023-05-01 04:00:54 - progress_bar.py[line:274] - INFO: epoch 001: 1283 / 6042 loss=2.491, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7831.3, nsentences=120, sample_size=4153.2, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1985.2, ups=0.25, wpb=7831.3, bsz=120, num_updates=1280, lr=1.05931e-05, gnorm=0.927, clip=30, loss_scale=32, train_wall=39, gb_free=29.8, wall=5266 2023-05-01 04:01:33 - progress_bar.py[line:274] - INFO: epoch 001: 1293 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7805.6, nsentences=120, 
sample_size=4035.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1978, ups=0.25, wpb=7805.6, bsz=120, num_updates=1290, lr=1.06759e-05, gnorm=0.909, clip=10, loss_scale=32, train_wall=39, gb_free=29.9, wall=5306 2023-05-01 04:02:13 - progress_bar.py[line:274] - INFO: epoch 001: 1303 / 6042 loss=2.478, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7692.9, nsentences=120, sample_size=4059.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1947.3, ups=0.25, wpb=7692.9, bsz=120, num_updates=1300, lr=1.07586e-05, gnorm=0.936, clip=20, loss_scale=32, train_wall=39, gb_free=31.5, wall=5345 2023-05-01 04:02:52 - progress_bar.py[line:274] - INFO: epoch 001: 1313 / 6042 loss=2.505, loss_v1=0, loss_v2=0, nll_loss=1.262, ntokens=8022.6, nsentences=120, sample_size=3851.4, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=2017.4, ups=0.25, wpb=8022.6, bsz=120, num_updates=1310, lr=1.08414e-05, gnorm=0.932, clip=10, loss_scale=32, train_wall=40, gb_free=29.8, wall=5385 2023-05-01 04:03:33 - progress_bar.py[line:274] - INFO: epoch 001: 1323 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7870.6, nsentences=120, sample_size=4110, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1940, ups=0.25, wpb=7870.6, bsz=120, num_updates=1320, lr=1.09241e-05, gnorm=0.886, clip=0, loss_scale=32, train_wall=40, gb_free=30.2, wall=5426 2023-05-01 04:04:13 - progress_bar.py[line:274] - INFO: epoch 001: 1333 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=7812.9, nsentences=120, sample_size=4148.6, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1946.7, ups=0.25, wpb=7812.9, bsz=120, num_updates=1330, lr=1.10069e-05, gnorm=0.9, clip=0, loss_scale=32, train_wall=40, gb_free=30.3, wall=5466 2023-05-01 04:04:53 - progress_bar.py[line:274] - INFO: epoch 001: 1343 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7515.1, nsentences=120, sample_size=3824.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1901.2, ups=0.25, wpb=7515.1, bsz=120, 
num_updates=1340, lr=1.10897e-05, gnorm=0.937, clip=10, loss_scale=32, train_wall=39, gb_free=29.1, wall=5505 2023-05-01 04:05:33 - progress_bar.py[line:274] - INFO: epoch 001: 1353 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7472.4, nsentences=120, sample_size=4179.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1865.7, ups=0.25, wpb=7472.4, bsz=120, num_updates=1350, lr=1.11724e-05, gnorm=0.885, clip=10, loss_scale=32, train_wall=40, gb_free=29.6, wall=5545 2023-05-01 04:06:13 - progress_bar.py[line:274] - INFO: epoch 001: 1363 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7675.9, nsentences=120, sample_size=3970.2, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1924.5, ups=0.25, wpb=7675.9, bsz=120, num_updates=1360, lr=1.12552e-05, gnorm=0.924, clip=10, loss_scale=32, train_wall=40, gb_free=30.4, wall=5585 2023-05-01 04:06:53 - progress_bar.py[line:274] - INFO: epoch 001: 1373 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7758.7, nsentences=120, sample_size=3870.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1923.4, ups=0.25, wpb=7758.7, bsz=120, num_updates=1370, lr=1.13379e-05, gnorm=0.941, clip=10, loss_scale=32, train_wall=40, gb_free=29.8, wall=5625 2023-05-01 04:07:32 - progress_bar.py[line:274] - INFO: epoch 001: 1383 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7573.6, nsentences=120, sample_size=3695.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1937.9, ups=0.26, wpb=7573.6, bsz=120, num_updates=1380, lr=1.14207e-05, gnorm=0.968, clip=40, loss_scale=32, train_wall=39, gb_free=30.4, wall=5665 2023-05-01 04:08:12 - progress_bar.py[line:274] - INFO: epoch 001: 1393 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7743.7, nsentences=120, sample_size=4213.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1962.4, ups=0.25, wpb=7743.7, bsz=120, num_updates=1390, lr=1.15034e-05, gnorm=0.891, clip=0, loss_scale=32, train_wall=39, gb_free=26.3, 
wall=5704 2023-05-01 04:08:51 - progress_bar.py[line:274] - INFO: epoch 001: 1403 / 6042 loss=2.494, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7929.2, nsentences=120, sample_size=3922.8, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1988.3, ups=0.25, wpb=7929.2, bsz=120, num_updates=1400, lr=1.15862e-05, gnorm=0.965, clip=0, loss_scale=32, train_wall=40, gb_free=29.4, wall=5744 2023-05-01 04:09:30 - progress_bar.py[line:274] - INFO: epoch 001: 1413 / 6042 loss=2.529, loss_v1=0, loss_v2=0, nll_loss=1.287, ntokens=7845.4, nsentences=120, sample_size=3980.5, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=2020.1, ups=0.26, wpb=7845.4, bsz=120, num_updates=1410, lr=1.1669e-05, gnorm=0.961, clip=20, loss_scale=32, train_wall=39, gb_free=29.3, wall=5783 2023-05-01 04:10:10 - progress_bar.py[line:274] - INFO: epoch 001: 1423 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.222, ntokens=7779.5, nsentences=120, sample_size=3947.5, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1956.1, ups=0.25, wpb=7779.5, bsz=120, num_updates=1420, lr=1.17517e-05, gnorm=0.933, clip=10, loss_scale=32, train_wall=40, gb_free=30.2, wall=5822 2023-05-01 04:10:49 - progress_bar.py[line:274] - INFO: epoch 001: 1433 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.232, ntokens=7620.6, nsentences=120, sample_size=3900.1, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1938.5, ups=0.25, wpb=7620.6, bsz=120, num_updates=1430, lr=1.18345e-05, gnorm=0.922, clip=0, loss_scale=32, train_wall=39, gb_free=27.5, wall=5862 2023-05-01 04:11:29 - progress_bar.py[line:274] - INFO: epoch 001: 1443 / 6042 loss=2.513, loss_v1=0, loss_v2=0, nll_loss=1.27, ntokens=7920.9, nsentences=120, sample_size=4069.8, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1986.1, ups=0.25, wpb=7920.9, bsz=120, num_updates=1440, lr=1.19172e-05, gnorm=0.891, clip=0, loss_scale=32, train_wall=40, gb_free=27.4, wall=5902 2023-05-01 04:12:09 - progress_bar.py[line:274] - INFO: epoch 001: 1453 / 6042 loss=2.481, 
loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7601.4, nsentences=120, sample_size=4201, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1890.3, ups=0.25, wpb=7601.4, bsz=120, num_updates=1450, lr=1.2e-05, gnorm=0.875, clip=0, loss_scale=32, train_wall=40, gb_free=30, wall=5942 2023-05-01 04:12:50 - progress_bar.py[line:274] - INFO: epoch 001: 1463 / 6042 loss=2.48, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7935.7, nsentences=120, sample_size=3815.8, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1975.2, ups=0.25, wpb=7935.7, bsz=120, num_updates=1460, lr=1.20828e-05, gnorm=0.929, clip=20, loss_scale=64, train_wall=40, gb_free=26, wall=5982 2023-05-01 04:13:30 - progress_bar.py[line:274] - INFO: epoch 001: 1473 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7799.8, nsentences=120, sample_size=4067.8, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1950.8, ups=0.25, wpb=7799.8, bsz=120, num_updates=1470, lr=1.21655e-05, gnorm=0.904, clip=10, loss_scale=64, train_wall=40, gb_free=30.7, wall=6022 2023-05-01 04:14:09 - progress_bar.py[line:274] - INFO: epoch 001: 1483 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.232, ntokens=7842.4, nsentences=120, sample_size=3860.8, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1986.4, ups=0.25, wpb=7842.4, bsz=120, num_updates=1480, lr=1.22483e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=39, gb_free=29.2, wall=6062 2023-05-01 04:14:50 - progress_bar.py[line:274] - INFO: epoch 001: 1493 / 6042 loss=2.509, loss_v1=0, loss_v2=0, nll_loss=1.269, ntokens=7655.8, nsentences=120, sample_size=4301.6, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1891.8, ups=0.25, wpb=7655.8, bsz=120, num_updates=1490, lr=1.2331e-05, gnorm=0.868, clip=0, loss_scale=64, train_wall=40, gb_free=29.1, wall=6102 2023-05-01 04:15:29 - progress_bar.py[line:274] - INFO: epoch 001: 1503 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=7585.7, nsentences=120, sample_size=4002, sample_size_v1=0, 
sample_size_v2=0, ppl=2.34, wps=1910.8, ups=0.25, wpb=7585.7, bsz=120, num_updates=1500, lr=1.24138e-05, gnorm=0.903, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=6142 2023-05-01 04:16:09 - progress_bar.py[line:274] - INFO: epoch 001: 1513 / 6042 loss=2.504, loss_v1=0, loss_v2=0, nll_loss=1.263, ntokens=7894.9, nsentences=120, sample_size=4077.5, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1982.6, ups=0.25, wpb=7894.9, bsz=120, num_updates=1510, lr=1.24966e-05, gnorm=0.89, clip=0, loss_scale=64, train_wall=40, gb_free=28.9, wall=6182 2023-05-01 04:16:50 - progress_bar.py[line:274] - INFO: epoch 001: 1523 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7962.3, nsentences=120, sample_size=3560.3, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1967.4, ups=0.25, wpb=7962.3, bsz=120, num_updates=1520, lr=1.25793e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=6222 2023-05-01 04:17:29 - progress_bar.py[line:274] - INFO: epoch 001: 1533 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7733.8, nsentences=120, sample_size=3858.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1957.3, ups=0.25, wpb=7733.8, bsz=120, num_updates=1530, lr=1.26621e-05, gnorm=0.945, clip=30, loss_scale=64, train_wall=39, gb_free=29.3, wall=6262 2023-05-01 04:18:09 - progress_bar.py[line:274] - INFO: epoch 001: 1543 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7818, nsentences=120, sample_size=4205.3, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1946.2, ups=0.25, wpb=7818, bsz=120, num_updates=1540, lr=1.27448e-05, gnorm=0.892, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=6302 2023-05-01 04:18:48 - progress_bar.py[line:274] - INFO: epoch 001: 1553 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.227, ntokens=7432.8, nsentences=120, sample_size=4059.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1897.6, ups=0.26, wpb=7432.8, bsz=120, num_updates=1550, lr=1.28276e-05, 
gnorm=0.901, clip=0, loss_scale=64, train_wall=39, gb_free=30, wall=6341 2023-05-01 04:19:29 - progress_bar.py[line:274] - INFO: epoch 001: 1563 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.256, ntokens=8168, nsentences=120, sample_size=4106.1, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1994.2, ups=0.24, wpb=8168, bsz=120, num_updates=1560, lr=1.29103e-05, gnorm=0.864, clip=0, loss_scale=64, train_wall=41, gb_free=27.8, wall=6382 2023-05-01 04:20:09 - progress_bar.py[line:274] - INFO: epoch 001: 1573 / 6042 loss=2.495, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7617.7, nsentences=120, sample_size=4325.4, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1909.4, ups=0.25, wpb=7617.7, bsz=120, num_updates=1570, lr=1.29931e-05, gnorm=0.862, clip=0, loss_scale=64, train_wall=40, gb_free=30.8, wall=6422 2023-05-01 04:20:49 - progress_bar.py[line:274] - INFO: epoch 001: 1583 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7703.6, nsentences=120, sample_size=4177, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1943.4, ups=0.25, wpb=7703.6, bsz=120, num_updates=1580, lr=1.30759e-05, gnorm=0.897, clip=0, loss_scale=64, train_wall=40, gb_free=30.3, wall=6461 2023-05-01 04:21:29 - progress_bar.py[line:274] - INFO: epoch 001: 1593 / 6042 loss=2.491, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7755.7, nsentences=120, sample_size=3822.9, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1956.5, ups=0.25, wpb=7755.7, bsz=120, num_updates=1590, lr=1.31586e-05, gnorm=0.923, clip=20, loss_scale=64, train_wall=40, gb_free=31.5, wall=6501 2023-05-01 04:22:08 - progress_bar.py[line:274] - INFO: epoch 001: 1603 / 6042 loss=2.493, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7564.6, nsentences=120, sample_size=3984.6, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1907.1, ups=0.25, wpb=7564.6, bsz=120, num_updates=1600, lr=1.32414e-05, gnorm=0.916, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=6541 2023-05-01 04:22:49 - 
progress_bar.py[line:274] - INFO: epoch 001: 1613 / 6042 loss=2.524, loss_v1=0, loss_v2=0, nll_loss=1.291, ntokens=7736.8, nsentences=120, sample_size=3919.8, sample_size_v1=0, sample_size_v2=0, ppl=2.45, wps=1904.2, ups=0.25, wpb=7736.8, bsz=120, num_updates=1610, lr=1.33241e-05, gnorm=0.895, clip=10, loss_scale=64, train_wall=41, gb_free=29.2, wall=6581 2023-05-01 04:23:29 - progress_bar.py[line:274] - INFO: epoch 001: 1623 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7982.4, nsentences=120, sample_size=3941.7, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1980.5, ups=0.25, wpb=7982.4, bsz=120, num_updates=1620, lr=1.34069e-05, gnorm=0.885, clip=0, loss_scale=64, train_wall=40, gb_free=29.3, wall=6622 2023-05-01 04:24:09 - progress_bar.py[line:274] - INFO: epoch 001: 1633 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7438.9, nsentences=120, sample_size=4146.3, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1862.5, ups=0.25, wpb=7438.9, bsz=120, num_updates=1630, lr=1.34897e-05, gnorm=0.908, clip=10, loss_scale=64, train_wall=40, gb_free=31.1, wall=6662 2023-05-01 04:24:49 - progress_bar.py[line:274] - INFO: epoch 001: 1643 / 6042 loss=2.47, loss_v1=0, loss_v2=0, nll_loss=1.222, ntokens=7746.7, nsentences=120, sample_size=3976, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1957.4, ups=0.25, wpb=7746.7, bsz=120, num_updates=1640, lr=1.35724e-05, gnorm=0.936, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=6701 2023-05-01 04:25:29 - progress_bar.py[line:274] - INFO: epoch 001: 1653 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7736.1, nsentences=120, sample_size=4140.1, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1926.6, ups=0.25, wpb=7736.1, bsz=120, num_updates=1650, lr=1.36552e-05, gnorm=0.913, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=6741 2023-05-01 04:26:10 - progress_bar.py[line:274] - INFO: epoch 001: 1663 / 6042 loss=2.507, loss_v1=0, loss_v2=0, 
nll_loss=1.266, ntokens=7820, nsentences=120, sample_size=4119.2, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1907.6, ups=0.24, wpb=7820, bsz=120, num_updates=1660, lr=1.37379e-05, gnorm=0.91, clip=0, loss_scale=64, train_wall=41, gb_free=30.4, wall=6782 2023-05-01 04:26:51 - progress_bar.py[line:274] - INFO: epoch 001: 1673 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7762.6, nsentences=120, sample_size=3871.2, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1896.1, ups=0.24, wpb=7762.6, bsz=120, num_updates=1670, lr=1.38207e-05, gnorm=0.956, clip=20, loss_scale=64, train_wall=41, gb_free=29.8, wall=6823 2023-05-01 04:27:30 - progress_bar.py[line:274] - INFO: epoch 001: 1683 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7778.5, nsentences=120, sample_size=3998, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1982, ups=0.25, wpb=7778.5, bsz=120, num_updates=1680, lr=1.39034e-05, gnorm=0.901, clip=10, loss_scale=64, train_wall=39, gb_free=30.6, wall=6862 2023-05-01 04:28:09 - progress_bar.py[line:274] - INFO: epoch 001: 1693 / 6042 loss=2.5, loss_v1=0, loss_v2=0, nll_loss=1.259, ntokens=7541.2, nsentences=120, sample_size=4256.3, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1922.1, ups=0.25, wpb=7541.2, bsz=120, num_updates=1690, lr=1.39862e-05, gnorm=0.914, clip=10, loss_scale=64, train_wall=39, gb_free=28.9, wall=6902 2023-05-01 04:28:48 - progress_bar.py[line:274] - INFO: epoch 001: 1703 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.255, ntokens=7738.4, nsentences=120, sample_size=4043.7, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1982.8, ups=0.26, wpb=7738.4, bsz=120, num_updates=1700, lr=1.4069e-05, gnorm=0.921, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=6941 2023-05-01 04:29:28 - progress_bar.py[line:274] - INFO: epoch 001: 1713 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7574, nsentences=120, sample_size=4091, sample_size_v1=0, sample_size_v2=0, ppl=2.37, 
wps=1922.5, ups=0.25, wpb=7574, bsz=120, num_updates=1710, lr=1.41517e-05, gnorm=0.921, clip=10, loss_scale=64, train_wall=39, gb_free=29.6, wall=6980 2023-05-01 04:30:07 - progress_bar.py[line:274] - INFO: epoch 001: 1723 / 6042 loss=2.524, loss_v1=0, loss_v2=0, nll_loss=1.284, ntokens=7899.1, nsentences=120, sample_size=4138, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=1993.6, ups=0.25, wpb=7899.1, bsz=120, num_updates=1720, lr=1.42345e-05, gnorm=0.908, clip=20, loss_scale=64, train_wall=40, gb_free=31, wall=7020 2023-05-01 04:30:47 - progress_bar.py[line:274] - INFO: epoch 001: 1733 / 6042 loss=2.512, loss_v1=0, loss_v2=0, nll_loss=1.269, ntokens=7747.3, nsentences=120, sample_size=4174.5, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1939.8, ups=0.25, wpb=7747.3, bsz=120, num_updates=1730, lr=1.43172e-05, gnorm=0.893, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=7060 2023-05-01 04:31:26 - progress_bar.py[line:274] - INFO: epoch 001: 1743 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7397, nsentences=120, sample_size=4066.2, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1904.5, ups=0.26, wpb=7397, bsz=120, num_updates=1740, lr=1.44e-05, gnorm=0.895, clip=10, loss_scale=64, train_wall=39, gb_free=31.1, wall=7099 2023-05-01 04:32:05 - progress_bar.py[line:274] - INFO: epoch 001: 1753 / 6042 loss=2.48, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=7431.5, nsentences=120, sample_size=4211.8, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1909.2, ups=0.26, wpb=7431.5, bsz=120, num_updates=1750, lr=1.44828e-05, gnorm=0.894, clip=10, loss_scale=64, train_wall=39, gb_free=29.9, wall=7137 2023-05-01 04:32:45 - progress_bar.py[line:274] - INFO: epoch 001: 1763 / 6042 loss=2.493, loss_v1=0, loss_v2=0, nll_loss=1.249, ntokens=7671.3, nsentences=120, sample_size=3852, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1942.1, ups=0.25, wpb=7671.3, bsz=120, num_updates=1760, lr=1.45655e-05, gnorm=0.935, clip=10, loss_scale=64, 
train_wall=39, gb_free=29.5, wall=7177 2023-05-01 04:33:24 - progress_bar.py[line:274] - INFO: epoch 001: 1773 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7819.8, nsentences=120, sample_size=4001.1, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1965.8, ups=0.25, wpb=7819.8, bsz=120, num_updates=1770, lr=1.46483e-05, gnorm=0.934, clip=20, loss_scale=64, train_wall=40, gb_free=29.2, wall=7217 2023-05-01 04:34:04 - progress_bar.py[line:274] - INFO: epoch 001: 1783 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=8108.8, nsentences=120, sample_size=3975.1, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=2021.1, ups=0.25, wpb=8108.8, bsz=120, num_updates=1780, lr=1.4731e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=40, gb_free=31.6, wall=7257 2023-05-01 04:34:44 - progress_bar.py[line:274] - INFO: epoch 001: 1793 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.236, ntokens=7507.1, nsentences=120, sample_size=3863.2, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1876.8, ups=0.25, wpb=7507.1, bsz=120, num_updates=1790, lr=1.48138e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=7297 2023-05-01 04:35:25 - progress_bar.py[line:274] - INFO: epoch 001: 1803 / 6042 loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.259, ntokens=7987, nsentences=120, sample_size=4099.1, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1960.8, ups=0.25, wpb=7987, bsz=120, num_updates=1800, lr=1.48966e-05, gnorm=0.901, clip=0, loss_scale=64, train_wall=41, gb_free=29.6, wall=7338 2023-05-01 04:36:05 - progress_bar.py[line:274] - INFO: epoch 001: 1813 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.256, ntokens=7417.2, nsentences=120, sample_size=4458.5, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1867.8, ups=0.25, wpb=7417.2, bsz=120, num_updates=1810, lr=1.49793e-05, gnorm=0.874, clip=0, loss_scale=64, train_wall=40, gb_free=30.4, wall=7377 2023-05-01 04:36:46 - progress_bar.py[line:274] - INFO: epoch 001: 
1823 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7529.7, nsentences=120, sample_size=3987.8, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1842.8, ups=0.24, wpb=7529.7, bsz=120, num_updates=1820, lr=1.50621e-05, gnorm=0.918, clip=20, loss_scale=64, train_wall=41, gb_free=28.5, wall=7418 2023-05-01 04:37:26 - progress_bar.py[line:274] - INFO: epoch 001: 1833 / 6042 loss=2.519, loss_v1=0, loss_v2=0, nll_loss=1.282, ntokens=7856.2, nsentences=120, sample_size=4028.2, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=1961.4, ups=0.25, wpb=7856.2, bsz=120, num_updates=1830, lr=1.51448e-05, gnorm=0.891, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=7458 2023-05-01 04:38:06 - progress_bar.py[line:274] - INFO: epoch 001: 1843 / 6042 loss=2.503, loss_v1=0, loss_v2=0, nll_loss=1.262, ntokens=7642.3, nsentences=120, sample_size=3966.7, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1899.5, ups=0.25, wpb=7642.3, bsz=120, num_updates=1840, lr=1.52276e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=7498 2023-05-01 04:38:45 - progress_bar.py[line:274] - INFO: epoch 001: 1853 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7695.9, nsentences=120, sample_size=3993.2, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1953, ups=0.25, wpb=7695.9, bsz=120, num_updates=1850, lr=1.53103e-05, gnorm=0.921, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=7538 2023-05-01 04:39:25 - progress_bar.py[line:274] - INFO: epoch 001: 1863 / 6042 loss=2.505, loss_v1=0, loss_v2=0, nll_loss=1.261, ntokens=7725.8, nsentences=120, sample_size=4108.1, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1946.7, ups=0.25, wpb=7725.8, bsz=120, num_updates=1860, lr=1.53931e-05, gnorm=0.929, clip=10, loss_scale=64, train_wall=40, gb_free=28.3, wall=7578 2023-05-01 04:40:05 - progress_bar.py[line:274] - INFO: epoch 001: 1873 / 6042 loss=2.5, loss_v1=0, loss_v2=0, nll_loss=1.259, ntokens=7847.9, nsentences=120, 
sample_size=4339, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1947.5, ups=0.25, wpb=7847.9, bsz=120, num_updates=1870, lr=1.54759e-05, gnorm=0.888, clip=0, loss_scale=64, train_wall=40, gb_free=30.6, wall=7618 2023-05-01 04:40:45 - progress_bar.py[line:274] - INFO: epoch 001: 1883 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7948.3, nsentences=120, sample_size=3881.5, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1991.7, ups=0.25, wpb=7948.3, bsz=120, num_updates=1880, lr=1.55586e-05, gnorm=0.927, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=7658 2023-05-01 04:41:24 - progress_bar.py[line:274] - INFO: epoch 001: 1893 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.232, ntokens=7648.2, nsentences=120, sample_size=3808.5, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1970.1, ups=0.26, wpb=7648.2, bsz=120, num_updates=1890, lr=1.56414e-05, gnorm=0.956, clip=30, loss_scale=64, train_wall=39, gb_free=31.1, wall=7697 2023-05-01 04:42:04 - progress_bar.py[line:274] - INFO: epoch 001: 1903 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7413.8, nsentences=120, sample_size=3915.4, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1858, ups=0.25, wpb=7413.8, bsz=120, num_updates=1900, lr=1.57241e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=7737 2023-05-01 04:42:45 - progress_bar.py[line:274] - INFO: epoch 001: 1913 / 6042 loss=2.505, loss_v1=0, loss_v2=0, nll_loss=1.263, ntokens=7993.4, nsentences=120, sample_size=4114.1, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1952.8, ups=0.24, wpb=7993.4, bsz=120, num_updates=1910, lr=1.58069e-05, gnorm=1.043, clip=20, loss_scale=64, train_wall=41, gb_free=30.2, wall=7777 2023-05-01 04:43:25 - progress_bar.py[line:274] - INFO: epoch 001: 1923 / 6042 loss=2.517, loss_v1=0, loss_v2=0, nll_loss=1.274, ntokens=7735.3, nsentences=120, sample_size=4051.7, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1944, ups=0.25, wpb=7735.3, bsz=120, 
num_updates=1920, lr=1.58897e-05, gnorm=0.936, clip=0, loss_scale=64, train_wall=40, gb_free=30, wall=7817 2023-05-01 04:44:05 - progress_bar.py[line:274] - INFO: epoch 001: 1933 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7740.1, nsentences=120, sample_size=4238.7, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1947.3, ups=0.25, wpb=7740.1, bsz=120, num_updates=1930, lr=1.59724e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=7857 2023-05-01 04:44:44 - progress_bar.py[line:274] - INFO: epoch 001: 1943 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7583.3, nsentences=120, sample_size=3775.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1904.1, ups=0.25, wpb=7583.3, bsz=120, num_updates=1940, lr=1.60552e-05, gnorm=0.947, clip=30, loss_scale=64, train_wall=40, gb_free=30.8, wall=7897 2023-05-01 04:45:24 - progress_bar.py[line:274] - INFO: epoch 001: 1953 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7784.4, nsentences=120, sample_size=4303.8, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1942.5, ups=0.25, wpb=7784.4, bsz=120, num_updates=1950, lr=1.61379e-05, gnorm=0.902, clip=0, loss_scale=64, train_wall=40, gb_free=25.9, wall=7937 2023-05-01 04:46:04 - progress_bar.py[line:274] - INFO: epoch 001: 1963 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=8033.3, nsentences=120, sample_size=4005.5, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=2018.1, ups=0.25, wpb=8033.3, bsz=120, num_updates=1960, lr=1.62207e-05, gnorm=0.923, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=7977 2023-05-01 04:46:44 - progress_bar.py[line:274] - INFO: epoch 001: 1973 / 6042 loss=2.494, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7632.3, nsentences=120, sample_size=4041.2, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1932.4, ups=0.25, wpb=7632.3, bsz=120, num_updates=1970, lr=1.63034e-05, gnorm=0.938, clip=30, loss_scale=128, train_wall=39, gb_free=31.3, 
wall=8016 2023-05-01 04:46:56 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 04:47:27 - progress_bar.py[line:274] - INFO: epoch 001: 1984 / 6042 loss=2.497, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7792.8, nsentences=120, sample_size=4039.6, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1784.9, ups=0.23, wpb=7792.8, bsz=120, num_updates=1980, lr=1.63862e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=44, gb_free=26.1, wall=8060 2023-05-01 04:48:07 - progress_bar.py[line:274] - INFO: epoch 001: 1994 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.235, ntokens=7725.1, nsentences=120, sample_size=3803.6, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1942, ups=0.25, wpb=7725.1, bsz=120, num_updates=1990, lr=1.6469e-05, gnorm=0.949, clip=10, loss_scale=64, train_wall=40, gb_free=31.2, wall=8100 2023-05-01 04:48:47 - progress_bar.py[line:274] - INFO: epoch 001: 2004 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=7545.7, nsentences=120, sample_size=4205.9, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1915.2, ups=0.25, wpb=7545.7, bsz=120, num_updates=2000, lr=1.65517e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=39, gb_free=28.8, wall=8139 2023-05-01 04:48:47 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 04:48:48 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 04:48:48 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 04:48:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:48:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:48:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:48:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:48:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:48:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:48:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:48:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:48:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:48:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:48:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:48:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:48:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:48:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:48:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:48:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:48:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:48:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:48:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:48:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 04:49:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:05 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 04:49:05 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 04:49:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 04:49:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:17 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 04:49:17 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 04:49:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 04:49:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:29 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 04:49:29 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 04:49:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:33 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 04:49:33 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 04:49:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:38 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 04:49:38 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 04:49:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 04:49:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 04:49:38 - progress_bar.py[line:282] - INFO: epoch 001 | valid on 'valid' subset | loss 3.199 | loss_v1 0 | loss_v2 0 | nll_loss 2.029 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.08 | score 0.7471 | wps 3284.8 | wpb 3202.1 | bsz 39.4 | num_updates 2000 | best_score 0.749 2023-05-01 04:49:38 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 1 @ 2000 updates 2023-05-01 04:49:38 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_2000.pt 2023-05-01 04:50:04 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_2000.pt 2023-05-01 04:50:18 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_2000.pt (epoch 1 @ 2000 updates, score 0.7471) (writing took 39.59572662389837 seconds) 2023-05-01 04:50:56 - progress_bar.py[line:274] - INFO: epoch 001: 2014 / 6042 loss=2.518, loss_v1=0, loss_v2=0, nll_loss=1.28, ntokens=7860.2, nsentences=120, sample_size=3876, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=606, ups=0.08, wpb=7860.2, bsz=120, num_updates=2010, lr=1.66345e-05, gnorm=0.932, clip=0, loss_scale=64, train_wall=39, gb_free=29.8, wall=8269 2023-05-01 04:51:36 - progress_bar.py[line:274] - INFO: epoch 001: 2024 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.256, ntokens=7786.6, nsentences=120, sample_size=3644.5, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1954.8, ups=0.25, wpb=7786.6, bsz=120, num_updates=2020, 
lr=1.67172e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=8309 2023-05-01 04:52:15 - progress_bar.py[line:274] - INFO: epoch 001: 2034 / 6042 loss=2.517, loss_v1=0, loss_v2=0, nll_loss=1.272, ntokens=7531.1, nsentences=120, sample_size=4010.6, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1918.5, ups=0.25, wpb=7531.1, bsz=120, num_updates=2030, lr=1.68e-05, gnorm=0.919, clip=10, loss_scale=64, train_wall=39, gb_free=25.2, wall=8348 2023-05-01 04:52:56 - progress_bar.py[line:274] - INFO: epoch 001: 2044 / 6042 loss=2.515, loss_v1=0, loss_v2=0, nll_loss=1.277, ntokens=7632.8, nsentences=120, sample_size=4294.8, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1898.8, ups=0.25, wpb=7632.8, bsz=120, num_updates=2040, lr=1.68828e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=8388 2023-05-01 04:53:36 - progress_bar.py[line:274] - INFO: epoch 001: 2054 / 6042 loss=2.506, loss_v1=0, loss_v2=0, nll_loss=1.264, ntokens=7941.9, nsentences=120, sample_size=4001.4, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1982.2, ups=0.25, wpb=7941.9, bsz=120, num_updates=2050, lr=1.69655e-05, gnorm=0.916, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=8428 2023-05-01 04:54:15 - progress_bar.py[line:274] - INFO: epoch 001: 2064 / 6042 loss=2.509, loss_v1=0, loss_v2=0, nll_loss=1.272, ntokens=7766.5, nsentences=120, sample_size=3760.3, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1971.9, ups=0.25, wpb=7766.5, bsz=120, num_updates=2060, lr=1.70483e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=39, gb_free=29.4, wall=8467 2023-05-01 04:54:55 - progress_bar.py[line:274] - INFO: epoch 001: 2074 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.256, ntokens=7693.1, nsentences=120, sample_size=4304.7, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1924.7, ups=0.25, wpb=7693.1, bsz=120, num_updates=2070, lr=1.7131e-05, gnorm=0.901, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=8507 2023-05-01 
04:55:35 - progress_bar.py[line:274] - INFO: epoch 001: 2084 / 6042 loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.26, ntokens=7575, nsentences=120, sample_size=4455.1, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1876.7, ups=0.25, wpb=7575, bsz=120, num_updates=2080, lr=1.72138e-05, gnorm=0.886, clip=10, loss_scale=64, train_wall=40, gb_free=30.5, wall=8548 2023-05-01 04:56:15 - progress_bar.py[line:274] - INFO: epoch 001: 2094 / 6042 loss=2.508, loss_v1=0, loss_v2=0, nll_loss=1.267, ntokens=7491.4, nsentences=120, sample_size=3911, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1892, ups=0.25, wpb=7491.4, bsz=120, num_updates=2090, lr=1.72966e-05, gnorm=0.929, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=8587 2023-05-01 04:56:54 - progress_bar.py[line:274] - INFO: epoch 001: 2104 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7600.8, nsentences=120, sample_size=3869.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1934.3, ups=0.25, wpb=7600.8, bsz=120, num_updates=2100, lr=1.73793e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=8627 2023-05-01 04:57:33 - progress_bar.py[line:274] - INFO: epoch 001: 2114 / 6042 loss=2.504, loss_v1=0, loss_v2=0, nll_loss=1.267, ntokens=7703.8, nsentences=120, sample_size=3798.8, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1973.5, ups=0.26, wpb=7703.8, bsz=120, num_updates=2110, lr=1.74621e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=8666 2023-05-01 04:58:13 - progress_bar.py[line:274] - INFO: epoch 001: 2124 / 6042 loss=2.488, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=7850.3, nsentences=120, sample_size=4094.4, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1957.5, ups=0.25, wpb=7850.3, bsz=120, num_updates=2120, lr=1.75448e-05, gnorm=0.91, clip=0, loss_scale=64, train_wall=40, gb_free=29.5, wall=8706 2023-05-01 04:58:53 - progress_bar.py[line:274] - INFO: epoch 001: 2134 / 6042 loss=2.477, loss_v1=0, loss_v2=0, 
nll_loss=1.234, ntokens=7605.8, nsentences=120, sample_size=3944, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1939, ups=0.25, wpb=7605.8, bsz=120, num_updates=2130, lr=1.76276e-05, gnorm=0.931, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=8745 2023-05-01 04:59:33 - progress_bar.py[line:274] - INFO: epoch 001: 2144 / 6042 loss=2.516, loss_v1=0, loss_v2=0, nll_loss=1.272, ntokens=8012.1, nsentences=120, sample_size=4201.1, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1976.4, ups=0.25, wpb=8012.1, bsz=120, num_updates=2140, lr=1.77103e-05, gnorm=0.897, clip=0, loss_scale=64, train_wall=40, gb_free=30.9, wall=8786 2023-05-01 05:00:12 - progress_bar.py[line:274] - INFO: epoch 001: 2154 / 6042 loss=2.502, loss_v1=0, loss_v2=0, nll_loss=1.257, ntokens=7998, nsentences=120, sample_size=3827.7, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=2035.3, ups=0.25, wpb=7998, bsz=120, num_updates=2150, lr=1.77931e-05, gnorm=0.935, clip=30, loss_scale=64, train_wall=39, gb_free=28.1, wall=8825 2023-05-01 05:00:53 - progress_bar.py[line:274] - INFO: epoch 001: 2164 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7498, nsentences=120, sample_size=4032.7, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1848.1, ups=0.25, wpb=7498, bsz=120, num_updates=2160, lr=1.78759e-05, gnorm=0.905, clip=20, loss_scale=64, train_wall=40, gb_free=28.4, wall=8865 2023-05-01 05:01:32 - progress_bar.py[line:274] - INFO: epoch 001: 2174 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.258, ntokens=7483.4, nsentences=120, sample_size=4166.4, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1924.9, ups=0.26, wpb=7483.4, bsz=120, num_updates=2170, lr=1.79586e-05, gnorm=0.921, clip=0, loss_scale=64, train_wall=39, gb_free=26.7, wall=8904 2023-05-01 05:02:12 - progress_bar.py[line:274] - INFO: epoch 001: 2184 / 6042 loss=2.513, loss_v1=0, loss_v2=0, nll_loss=1.274, ntokens=7569.6, nsentences=120, sample_size=3927.4, sample_size_v1=0, sample_size_v2=0, ppl=2.42, 
wps=1897.3, ups=0.25, wpb=7569.6, bsz=120, num_updates=2180, lr=1.80414e-05, gnorm=0.919, clip=0, loss_scale=64, train_wall=40, gb_free=27.8, wall=8944 2023-05-01 05:02:52 - progress_bar.py[line:274] - INFO: epoch 001: 2194 / 6042 loss=2.516, loss_v1=0, loss_v2=0, nll_loss=1.278, ntokens=7771.1, nsentences=120, sample_size=4205, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=1948.8, ups=0.25, wpb=7771.1, bsz=120, num_updates=2190, lr=1.81241e-05, gnorm=0.911, clip=0, loss_scale=64, train_wall=40, gb_free=29.4, wall=8984 2023-05-01 05:03:31 - progress_bar.py[line:274] - INFO: epoch 001: 2204 / 6042 loss=2.521, loss_v1=0, loss_v2=0, nll_loss=1.285, ntokens=7790.2, nsentences=120, sample_size=4073.4, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=1962.1, ups=0.25, wpb=7790.2, bsz=120, num_updates=2200, lr=1.82069e-05, gnorm=0.93, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=9024 2023-05-01 05:04:11 - progress_bar.py[line:274] - INFO: epoch 001: 2214 / 6042 loss=2.503, loss_v1=0, loss_v2=0, nll_loss=1.266, ntokens=7740.6, nsentences=120, sample_size=3863, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1955.3, ups=0.25, wpb=7740.6, bsz=120, num_updates=2210, lr=1.82897e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=9063 2023-05-01 05:04:51 - progress_bar.py[line:274] - INFO: epoch 001: 2224 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7894.7, nsentences=120, sample_size=4034, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1963.7, ups=0.25, wpb=7894.7, bsz=120, num_updates=2220, lr=1.83724e-05, gnorm=0.908, clip=10, loss_scale=64, train_wall=40, gb_free=28.3, wall=9104 2023-05-01 05:05:31 - progress_bar.py[line:274] - INFO: epoch 001: 2234 / 6042 loss=2.493, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7958.6, nsentences=120, sample_size=3987.3, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1994.5, ups=0.25, wpb=7958.6, bsz=120, num_updates=2230, lr=1.84552e-05, gnorm=0.897, clip=0, 
loss_scale=64, train_wall=40, gb_free=30.2, wall=9144 2023-05-01 05:06:11 - progress_bar.py[line:274] - INFO: epoch 001: 2244 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.249, ntokens=7956.3, nsentences=120, sample_size=4308.3, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1994.5, ups=0.25, wpb=7956.3, bsz=120, num_updates=2240, lr=1.85379e-05, gnorm=0.874, clip=0, loss_scale=64, train_wall=40, gb_free=28.7, wall=9183 2023-05-01 05:06:52 - progress_bar.py[line:274] - INFO: epoch 001: 2254 / 6042 loss=2.513, loss_v1=0, loss_v2=0, nll_loss=1.281, ntokens=7876, nsentences=120, sample_size=4084.2, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=1931.5, ups=0.25, wpb=7876, bsz=120, num_updates=2250, lr=1.86207e-05, gnorm=0.893, clip=10, loss_scale=64, train_wall=41, gb_free=31, wall=9224 2023-05-01 05:07:31 - progress_bar.py[line:274] - INFO: epoch 001: 2264 / 6042 loss=2.494, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7745.8, nsentences=120, sample_size=3886.8, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1972, ups=0.25, wpb=7745.8, bsz=120, num_updates=2260, lr=1.87034e-05, gnorm=0.965, clip=20, loss_scale=64, train_wall=39, gb_free=30.7, wall=9263 2023-05-01 05:08:11 - progress_bar.py[line:274] - INFO: epoch 001: 2274 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=7611.3, nsentences=120, sample_size=3933.2, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1894.7, ups=0.25, wpb=7611.3, bsz=120, num_updates=2270, lr=1.87862e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=9304 2023-05-01 05:08:52 - progress_bar.py[line:274] - INFO: epoch 001: 2284 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.249, ntokens=7894.6, nsentences=120, sample_size=3994.3, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1953, ups=0.25, wpb=7894.6, bsz=120, num_updates=2280, lr=1.8869e-05, gnorm=0.923, clip=30, loss_scale=64, train_wall=40, gb_free=28.6, wall=9344 2023-05-01 05:09:32 - progress_bar.py[line:274] - INFO: 
epoch 001: 2294 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=7703.2, nsentences=120, sample_size=3737.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1927.9, ups=0.25, wpb=7703.2, bsz=120, num_updates=2290, lr=1.89517e-05, gnorm=0.931, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=9384 2023-05-01 05:10:12 - progress_bar.py[line:274] - INFO: epoch 001: 2304 / 6042 loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.261, ntokens=7923.5, nsentences=120, sample_size=3830.1, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1968.8, ups=0.25, wpb=7923.5, bsz=120, num_updates=2300, lr=1.90345e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=9424 2023-05-01 05:10:52 - progress_bar.py[line:274] - INFO: epoch 001: 2314 / 6042 loss=2.517, loss_v1=0, loss_v2=0, nll_loss=1.279, ntokens=7975.6, nsentences=120, sample_size=4402.7, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=1992.8, ups=0.25, wpb=7975.6, bsz=120, num_updates=2310, lr=1.91172e-05, gnorm=0.884, clip=0, loss_scale=64, train_wall=40, gb_free=28.2, wall=9464 2023-05-01 05:11:30 - progress_bar.py[line:274] - INFO: epoch 001: 2324 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=7711.2, nsentences=120, sample_size=4026.6, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=2002.1, ups=0.26, wpb=7711.2, bsz=120, num_updates=2320, lr=1.92e-05, gnorm=0.937, clip=30, loss_scale=64, train_wall=38, gb_free=28.4, wall=9503 2023-05-01 05:12:10 - progress_bar.py[line:274] - INFO: epoch 001: 2334 / 6042 loss=2.509, loss_v1=0, loss_v2=0, nll_loss=1.268, ntokens=7814.6, nsentences=120, sample_size=3897.3, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1978.5, ups=0.25, wpb=7814.6, bsz=120, num_updates=2330, lr=1.92828e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=9542 2023-05-01 05:12:50 - progress_bar.py[line:274] - INFO: epoch 001: 2344 / 6042 loss=2.512, loss_v1=0, loss_v2=0, nll_loss=1.271, ntokens=7636.5, nsentences=120, 
sample_size=4143.5, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1924.3, ups=0.25, wpb=7636.5, bsz=120, num_updates=2340, lr=1.93655e-05, gnorm=0.907, clip=10, loss_scale=64, train_wall=40, gb_free=31.4, wall=9582 2023-05-01 05:13:29 - progress_bar.py[line:274] - INFO: epoch 001: 2354 / 6042 loss=2.505, loss_v1=0, loss_v2=0, nll_loss=1.263, ntokens=7619.6, nsentences=120, sample_size=4140.6, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1936.9, ups=0.25, wpb=7619.6, bsz=120, num_updates=2350, lr=1.94483e-05, gnorm=0.919, clip=10, loss_scale=64, train_wall=39, gb_free=30.5, wall=9621 2023-05-01 05:14:09 - progress_bar.py[line:274] - INFO: epoch 001: 2364 / 6042 loss=2.502, loss_v1=0, loss_v2=0, nll_loss=1.259, ntokens=7807.4, nsentences=120, sample_size=4120.5, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1928.6, ups=0.25, wpb=7807.4, bsz=120, num_updates=2360, lr=1.9531e-05, gnorm=0.894, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=9662 2023-05-01 05:14:50 - progress_bar.py[line:274] - INFO: epoch 001: 2374 / 6042 loss=2.508, loss_v1=0, loss_v2=0, nll_loss=1.267, ntokens=7897.4, nsentences=120, sample_size=3980.4, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1953.2, ups=0.25, wpb=7897.4, bsz=120, num_updates=2370, lr=1.96138e-05, gnorm=0.94, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=9702 2023-05-01 05:15:31 - progress_bar.py[line:274] - INFO: epoch 001: 2384 / 6042 loss=2.488, loss_v1=0, loss_v2=0, nll_loss=1.249, ntokens=7462.6, nsentences=120, sample_size=4193.9, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1825.9, ups=0.24, wpb=7462.6, bsz=120, num_updates=2380, lr=1.96966e-05, gnorm=0.893, clip=10, loss_scale=64, train_wall=41, gb_free=28.3, wall=9743 2023-05-01 05:16:11 - progress_bar.py[line:274] - INFO: epoch 001: 2394 / 6042 loss=2.501, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7933.1, nsentences=120, sample_size=4073.1, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1971, ups=0.25, wpb=7933.1, 
bsz=120, num_updates=2390, lr=1.97793e-05, gnorm=0.907, clip=0, loss_scale=64, train_wall=40, gb_free=29.3, wall=9783 2023-05-01 05:16:51 - progress_bar.py[line:274] - INFO: epoch 001: 2404 / 6042 loss=2.516, loss_v1=0, loss_v2=0, nll_loss=1.268, ntokens=7857.4, nsentences=120, sample_size=4305, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1970.7, ups=0.25, wpb=7857.4, bsz=120, num_updates=2400, lr=1.98621e-05, gnorm=0.921, clip=10, loss_scale=64, train_wall=40, gb_free=28.9, wall=9823 2023-05-01 05:17:31 - progress_bar.py[line:274] - INFO: epoch 001: 2414 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.256, ntokens=7918.8, nsentences=120, sample_size=3721.8, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1961, ups=0.25, wpb=7918.8, bsz=120, num_updates=2410, lr=1.99448e-05, gnorm=0.931, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=9864 2023-05-01 05:18:11 - progress_bar.py[line:274] - INFO: epoch 001: 2424 / 6042 loss=2.501, loss_v1=0, loss_v2=0, nll_loss=1.265, ntokens=7908.7, nsentences=120, sample_size=4105, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=2002.7, ups=0.25, wpb=7908.7, bsz=120, num_updates=2420, lr=2.00276e-05, gnorm=0.916, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=9903 2023-05-01 05:18:50 - progress_bar.py[line:274] - INFO: epoch 001: 2434 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7615.3, nsentences=120, sample_size=3944.2, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1920, ups=0.25, wpb=7615.3, bsz=120, num_updates=2430, lr=2.01103e-05, gnorm=0.929, clip=10, loss_scale=64, train_wall=40, gb_free=28.6, wall=9943 2023-05-01 05:19:30 - progress_bar.py[line:274] - INFO: epoch 001: 2444 / 6042 loss=2.52, loss_v1=0, loss_v2=0, nll_loss=1.286, ntokens=7937, nsentences=120, sample_size=4215, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=2002.5, ups=0.25, wpb=7937, bsz=120, num_updates=2440, lr=2.01931e-05, gnorm=0.906, clip=0, loss_scale=64, train_wall=40, gb_free=30.1, 
wall=9982 2023-05-01 05:20:10 - progress_bar.py[line:274] - INFO: epoch 001: 2454 / 6042 loss=2.497, loss_v1=0, loss_v2=0, nll_loss=1.257, ntokens=7841, nsentences=120, sample_size=3726.7, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1974.4, ups=0.25, wpb=7841, bsz=120, num_updates=2450, lr=2.02759e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=10022 2023-05-01 05:20:49 - progress_bar.py[line:274] - INFO: epoch 001: 2464 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=7765.1, nsentences=120, sample_size=4267.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1965.7, ups=0.25, wpb=7765.1, bsz=120, num_updates=2460, lr=2.03586e-05, gnorm=0.888, clip=0, loss_scale=64, train_wall=39, gb_free=27.3, wall=10062 2023-05-01 05:21:29 - progress_bar.py[line:274] - INFO: epoch 001: 2474 / 6042 loss=2.514, loss_v1=0, loss_v2=0, nll_loss=1.272, ntokens=7847.7, nsentences=120, sample_size=4208.7, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1951.8, ups=0.25, wpb=7847.7, bsz=120, num_updates=2470, lr=2.04414e-05, gnorm=0.918, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=10102 2023-05-01 05:22:09 - progress_bar.py[line:274] - INFO: epoch 001: 2484 / 6042 loss=2.509, loss_v1=0, loss_v2=0, nll_loss=1.264, ntokens=7438.9, nsentences=120, sample_size=4301, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1878.2, ups=0.25, wpb=7438.9, bsz=120, num_updates=2480, lr=2.05241e-05, gnorm=0.896, clip=10, loss_scale=64, train_wall=40, gb_free=29.3, wall=10141 2023-05-01 05:22:48 - progress_bar.py[line:274] - INFO: epoch 001: 2494 / 6042 loss=2.511, loss_v1=0, loss_v2=0, nll_loss=1.271, ntokens=7794.1, nsentences=120, sample_size=4134.7, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1975.8, ups=0.25, wpb=7794.1, bsz=120, num_updates=2490, lr=2.06069e-05, gnorm=0.885, clip=0, loss_scale=128, train_wall=39, gb_free=29.2, wall=10181 2023-05-01 05:23:29 - progress_bar.py[line:274] - INFO: epoch 001: 2504 / 6042 loss=2.512, 
loss_v1=0, loss_v2=0, nll_loss=1.272, ntokens=7766.7, nsentences=120, sample_size=4106.4, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1915.3, ups=0.25, wpb=7766.7, bsz=120, num_updates=2500, lr=2.06897e-05, gnorm=0.93, clip=20, loss_scale=128, train_wall=40, gb_free=28.9, wall=10221 2023-05-01 05:23:33 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 05:24:13 - progress_bar.py[line:274] - INFO: epoch 001: 2515 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7571.7, nsentences=120, sample_size=4176.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1719.2, ups=0.23, wpb=7571.7, bsz=120, num_updates=2510, lr=2.07724e-05, gnorm=0.902, clip=20, loss_scale=64, train_wall=44, gb_free=30.1, wall=10266 2023-05-01 05:24:52 - progress_bar.py[line:274] - INFO: epoch 001: 2525 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7623.1, nsentences=120, sample_size=4330.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1942.7, ups=0.25, wpb=7623.1, bsz=120, num_updates=2520, lr=2.08552e-05, gnorm=0.896, clip=0, loss_scale=64, train_wall=39, gb_free=28.7, wall=10305 2023-05-01 05:25:32 - progress_bar.py[line:274] - INFO: epoch 001: 2535 / 6042 loss=2.511, loss_v1=0, loss_v2=0, nll_loss=1.27, ntokens=7692.5, nsentences=120, sample_size=4213, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1923.9, ups=0.25, wpb=7692.5, bsz=120, num_updates=2530, lr=2.09379e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=10345 2023-05-01 05:26:12 - progress_bar.py[line:274] - INFO: epoch 001: 2545 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7754.2, nsentences=120, sample_size=3865.8, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1949.5, ups=0.25, wpb=7754.2, bsz=120, num_updates=2540, lr=2.10207e-05, gnorm=0.982, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=10385 2023-05-01 05:26:51 - progress_bar.py[line:274] - INFO: 
epoch 001: 2555 / 6042 loss=2.488, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7621.9, nsentences=120, sample_size=3892.1, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1939.9, ups=0.25, wpb=7621.9, bsz=120, num_updates=2550, lr=2.11034e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=10424 2023-05-01 05:27:31 - progress_bar.py[line:274] - INFO: epoch 001: 2565 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=8196.7, nsentences=120, sample_size=3714.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=2046.8, ups=0.25, wpb=8196.7, bsz=120, num_updates=2560, lr=2.11862e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=40, gb_free=31.7, wall=10464 2023-05-01 05:28:12 - progress_bar.py[line:274] - INFO: epoch 001: 2575 / 6042 loss=2.517, loss_v1=0, loss_v2=0, nll_loss=1.273, ntokens=7819, nsentences=120, sample_size=3842.8, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1947.5, ups=0.25, wpb=7819, bsz=120, num_updates=2570, lr=2.1269e-05, gnorm=0.956, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=10504 2023-05-01 05:28:50 - progress_bar.py[line:274] - INFO: epoch 001: 2585 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.236, ntokens=7862.8, nsentences=120, sample_size=4119.9, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=2037.1, ups=0.26, wpb=7862.8, bsz=120, num_updates=2580, lr=2.13517e-05, gnorm=0.905, clip=0, loss_scale=64, train_wall=39, gb_free=30, wall=10543 2023-05-01 05:29:29 - progress_bar.py[line:274] - INFO: epoch 001: 2595 / 6042 loss=2.502, loss_v1=0, loss_v2=0, nll_loss=1.262, ntokens=7494.5, nsentences=120, sample_size=3905.3, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1914.1, ups=0.26, wpb=7494.5, bsz=120, num_updates=2590, lr=2.14345e-05, gnorm=0.901, clip=0, loss_scale=64, train_wall=39, gb_free=30.2, wall=10582 2023-05-01 05:30:09 - progress_bar.py[line:274] - INFO: epoch 001: 2605 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.227, ntokens=7965.7, nsentences=120, 
sample_size=3989.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=2023.2, ups=0.25, wpb=7965.7, bsz=120, num_updates=2600, lr=2.15172e-05, gnorm=0.937, clip=30, loss_scale=64, train_wall=39, gb_free=30.8, wall=10621 2023-05-01 05:30:48 - progress_bar.py[line:274] - INFO: epoch 001: 2615 / 6042 loss=2.527, loss_v1=0, loss_v2=0, nll_loss=1.282, ntokens=7968, nsentences=120, sample_size=4067.5, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=2014.4, ups=0.25, wpb=7968, bsz=120, num_updates=2610, lr=2.16e-05, gnorm=0.918, clip=10, loss_scale=64, train_wall=39, gb_free=29.9, wall=10661 2023-05-01 05:31:28 - progress_bar.py[line:274] - INFO: epoch 001: 2625 / 6042 loss=2.5, loss_v1=0, loss_v2=0, nll_loss=1.263, ntokens=7901.7, nsentences=120, sample_size=3989.1, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=2011.7, ups=0.25, wpb=7901.7, bsz=120, num_updates=2620, lr=2.16828e-05, gnorm=0.945, clip=30, loss_scale=64, train_wall=39, gb_free=28.7, wall=10700 2023-05-01 05:32:08 - progress_bar.py[line:274] - INFO: epoch 001: 2635 / 6042 loss=2.515, loss_v1=0, loss_v2=0, nll_loss=1.277, ntokens=7733.6, nsentences=120, sample_size=4169.2, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1927, ups=0.25, wpb=7733.6, bsz=120, num_updates=2630, lr=2.17655e-05, gnorm=0.909, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=10740 2023-05-01 05:32:47 - progress_bar.py[line:274] - INFO: epoch 001: 2645 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7767.7, nsentences=120, sample_size=4126.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1957.1, ups=0.25, wpb=7767.7, bsz=120, num_updates=2640, lr=2.18483e-05, gnorm=0.882, clip=0, loss_scale=64, train_wall=40, gb_free=26.2, wall=10780 2023-05-01 05:33:27 - progress_bar.py[line:274] - INFO: epoch 001: 2655 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7917.2, nsentences=120, sample_size=3851.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=2021.8, ups=0.26, wpb=7917.2, 
bsz=120, num_updates=2650, lr=2.1931e-05, gnorm=0.954, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=10819 2023-05-01 05:34:06 - progress_bar.py[line:274] - INFO: epoch 001: 2665 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7733.6, nsentences=120, sample_size=4385.2, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1963, ups=0.25, wpb=7733.6, bsz=120, num_updates=2660, lr=2.20138e-05, gnorm=0.884, clip=0, loss_scale=64, train_wall=39, gb_free=28.6, wall=10858 2023-05-01 05:34:45 - progress_bar.py[line:274] - INFO: epoch 001: 2675 / 6042 loss=2.506, loss_v1=0, loss_v2=0, nll_loss=1.264, ntokens=7773.2, nsentences=120, sample_size=4353.1, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1988.9, ups=0.26, wpb=7773.2, bsz=120, num_updates=2670, lr=2.20966e-05, gnorm=0.912, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=10897 2023-05-01 05:35:25 - progress_bar.py[line:274] - INFO: epoch 001: 2685 / 6042 loss=2.517, loss_v1=0, loss_v2=0, nll_loss=1.283, ntokens=7564.6, nsentences=120, sample_size=4207.6, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=1895.2, ups=0.25, wpb=7564.6, bsz=120, num_updates=2680, lr=2.21793e-05, gnorm=0.9, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=10937 2023-05-01 05:36:04 - progress_bar.py[line:274] - INFO: epoch 001: 2695 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7742.7, nsentences=120, sample_size=4156.7, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1957.4, ups=0.25, wpb=7742.7, bsz=120, num_updates=2690, lr=2.22621e-05, gnorm=0.892, clip=10, loss_scale=64, train_wall=39, gb_free=28, wall=10977 2023-05-01 05:36:45 - progress_bar.py[line:274] - INFO: epoch 001: 2705 / 6042 loss=2.502, loss_v1=0, loss_v2=0, nll_loss=1.262, ntokens=7567.8, nsentences=120, sample_size=4054.9, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1885, ups=0.25, wpb=7567.8, bsz=120, num_updates=2700, lr=2.23448e-05, gnorm=0.909, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, 
wall=11017 2023-05-01 05:37:24 - progress_bar.py[line:274] - INFO: epoch 001: 2715 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7549.6, nsentences=120, sample_size=4056.7, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1893.9, ups=0.25, wpb=7549.6, bsz=120, num_updates=2710, lr=2.24276e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=11057 2023-05-01 05:38:04 - progress_bar.py[line:274] - INFO: epoch 001: 2725 / 6042 loss=2.531, loss_v1=0, loss_v2=0, nll_loss=1.301, ntokens=7681, nsentences=120, sample_size=4183.2, sample_size_v1=0, sample_size_v2=0, ppl=2.46, wps=1922, ups=0.25, wpb=7681, bsz=120, num_updates=2720, lr=2.25103e-05, gnorm=0.903, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=11097 2023-05-01 05:38:45 - progress_bar.py[line:274] - INFO: epoch 001: 2735 / 6042 loss=2.526, loss_v1=0, loss_v2=0, nll_loss=1.283, ntokens=7789.9, nsentences=120, sample_size=4149.7, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=1944.6, ups=0.25, wpb=7789.9, bsz=120, num_updates=2730, lr=2.25931e-05, gnorm=0.903, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=11137 2023-05-01 05:39:25 - progress_bar.py[line:274] - INFO: epoch 001: 2745 / 6042 loss=2.528, loss_v1=0, loss_v2=0, nll_loss=1.291, ntokens=8000.6, nsentences=120, sample_size=4013, sample_size_v1=0, sample_size_v2=0, ppl=2.45, wps=1966.8, ups=0.25, wpb=8000.6, bsz=120, num_updates=2740, lr=2.26759e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=41, gb_free=28.6, wall=11178 2023-05-01 05:40:04 - progress_bar.py[line:274] - INFO: epoch 001: 2755 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7568.2, nsentences=120, sample_size=3964, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1934.5, ups=0.26, wpb=7568.2, bsz=120, num_updates=2750, lr=2.27586e-05, gnorm=0.895, clip=0, loss_scale=64, train_wall=39, gb_free=27.9, wall=11217 2023-05-01 05:40:44 - progress_bar.py[line:274] - INFO: epoch 001: 2765 / 6042 loss=2.518, 
loss_v1=0, loss_v2=0, nll_loss=1.276, ntokens=8073.6, nsentences=120, sample_size=3658.3, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=2035, ups=0.25, wpb=8073.6, bsz=120, num_updates=2760, lr=2.28414e-05, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=31.1, wall=11256 2023-05-01 05:41:24 - progress_bar.py[line:274] - INFO: epoch 001: 2775 / 6042 loss=2.493, loss_v1=0, loss_v2=0, nll_loss=1.252, ntokens=7738.1, nsentences=120, sample_size=4074.2, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1940.5, ups=0.25, wpb=7738.1, bsz=120, num_updates=2770, lr=2.29241e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=11296 2023-05-01 05:42:03 - progress_bar.py[line:274] - INFO: epoch 001: 2785 / 6042 loss=2.517, loss_v1=0, loss_v2=0, nll_loss=1.268, ntokens=7851, nsentences=120, sample_size=3991.5, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=2010.2, ups=0.26, wpb=7851, bsz=120, num_updates=2780, lr=2.30069e-05, gnorm=0.894, clip=0, loss_scale=64, train_wall=39, gb_free=29.8, wall=11335 2023-05-01 05:42:43 - progress_bar.py[line:274] - INFO: epoch 001: 2795 / 6042 loss=2.508, loss_v1=0, loss_v2=0, nll_loss=1.268, ntokens=7679.3, nsentences=120, sample_size=4089.1, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1927.6, ups=0.25, wpb=7679.3, bsz=120, num_updates=2790, lr=2.30897e-05, gnorm=0.925, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=11375 2023-05-01 05:43:22 - progress_bar.py[line:274] - INFO: epoch 001: 2805 / 6042 loss=2.501, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7688.6, nsentences=120, sample_size=4203.8, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1939.5, ups=0.25, wpb=7688.6, bsz=120, num_updates=2800, lr=2.31724e-05, gnorm=0.887, clip=10, loss_scale=64, train_wall=40, gb_free=31.3, wall=11415 2023-05-01 05:44:02 - progress_bar.py[line:274] - INFO: epoch 001: 2815 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7916.3, nsentences=120, sample_size=3891, 
sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=2003.6, ups=0.25, wpb=7916.3, bsz=120, num_updates=2810, lr=2.32552e-05, gnorm=0.927, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=11454 2023-05-01 05:44:41 - progress_bar.py[line:274] - INFO: epoch 001: 2825 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7731.4, nsentences=120, sample_size=3821.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1955.3, ups=0.25, wpb=7731.4, bsz=120, num_updates=2820, lr=2.33379e-05, gnorm=0.96, clip=30, loss_scale=64, train_wall=39, gb_free=29.3, wall=11494 2023-05-01 05:45:22 - progress_bar.py[line:274] - INFO: epoch 001: 2835 / 6042 loss=2.488, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7575.8, nsentences=120, sample_size=4017.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1870.5, ups=0.25, wpb=7575.8, bsz=120, num_updates=2830, lr=2.34207e-05, gnorm=0.903, clip=0, loss_scale=64, train_wall=40, gb_free=28.5, wall=11534 2023-05-01 05:46:01 - progress_bar.py[line:274] - INFO: epoch 001: 2845 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.258, ntokens=7478.2, nsentences=120, sample_size=4031, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1897.3, ups=0.25, wpb=7478.2, bsz=120, num_updates=2840, lr=2.35034e-05, gnorm=0.893, clip=0, loss_scale=64, train_wall=39, gb_free=29.6, wall=11574 2023-05-01 05:46:41 - progress_bar.py[line:274] - INFO: epoch 001: 2855 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=8021.5, nsentences=120, sample_size=3826.4, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=2011, ups=0.25, wpb=8021.5, bsz=120, num_updates=2850, lr=2.35862e-05, gnorm=0.937, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=11614 2023-05-01 05:47:21 - progress_bar.py[line:274] - INFO: epoch 001: 2865 / 6042 loss=2.493, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7783.3, nsentences=120, sample_size=4127.8, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1943.8, ups=0.25, wpb=7783.3, bsz=120, 
num_updates=2860, lr=2.3669e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=40, gb_free=29, wall=11654 2023-05-01 05:48:01 - progress_bar.py[line:274] - INFO: epoch 001: 2875 / 6042 loss=2.497, loss_v1=0, loss_v2=0, nll_loss=1.252, ntokens=7830.6, nsentences=120, sample_size=4114.1, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1952.8, ups=0.25, wpb=7830.6, bsz=120, num_updates=2870, lr=2.37517e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=11694 2023-05-01 05:48:42 - progress_bar.py[line:274] - INFO: epoch 001: 2885 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=7627.1, nsentences=120, sample_size=4022.6, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1888.5, ups=0.25, wpb=7627.1, bsz=120, num_updates=2880, lr=2.38345e-05, gnorm=0.922, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=11734 2023-05-01 05:49:22 - progress_bar.py[line:274] - INFO: epoch 001: 2895 / 6042 loss=2.516, loss_v1=0, loss_v2=0, nll_loss=1.279, ntokens=7781.1, nsentences=120, sample_size=3979.9, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=1957.3, ups=0.25, wpb=7781.1, bsz=120, num_updates=2890, lr=2.39172e-05, gnorm=0.925, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=11774 2023-05-01 05:50:01 - progress_bar.py[line:274] - INFO: epoch 001: 2905 / 6042 loss=2.478, loss_v1=0, loss_v2=0, nll_loss=1.232, ntokens=7671.1, nsentences=120, sample_size=3951.7, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1960.1, ups=0.26, wpb=7671.1, bsz=120, num_updates=2900, lr=2.4e-05, gnorm=0.944, clip=10, loss_scale=64, train_wall=39, gb_free=31.6, wall=11813 2023-05-01 05:50:40 - progress_bar.py[line:274] - INFO: epoch 001: 2915 / 6042 loss=2.493, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7527.4, nsentences=120, sample_size=4191, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1908.6, ups=0.25, wpb=7527.4, bsz=120, num_updates=2910, lr=2.40828e-05, gnorm=0.912, clip=0, loss_scale=64, train_wall=39, gb_free=30.8, 
wall=11853 2023-05-01 05:51:20 - progress_bar.py[line:274] - INFO: epoch 001: 2925 / 6042 loss=2.497, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7852.4, nsentences=120, sample_size=4106.6, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1978.3, ups=0.25, wpb=7852.4, bsz=120, num_updates=2920, lr=2.41655e-05, gnorm=0.89, clip=0, loss_scale=64, train_wall=40, gb_free=30.5, wall=11892 2023-05-01 05:52:00 - progress_bar.py[line:274] - INFO: epoch 001: 2935 / 6042 loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.257, ntokens=7776.6, nsentences=120, sample_size=3929.7, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1932.2, ups=0.25, wpb=7776.6, bsz=120, num_updates=2930, lr=2.42483e-05, gnorm=0.943, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=11933 2023-05-01 05:52:40 - progress_bar.py[line:274] - INFO: epoch 001: 2945 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=7575.8, nsentences=120, sample_size=4040.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1920.7, ups=0.25, wpb=7575.8, bsz=120, num_updates=2940, lr=2.4331e-05, gnorm=0.919, clip=0, loss_scale=64, train_wall=39, gb_free=30.2, wall=11972 2023-05-01 05:53:19 - progress_bar.py[line:274] - INFO: epoch 001: 2955 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7919.2, nsentences=120, sample_size=4057.9, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=2000.6, ups=0.25, wpb=7919.2, bsz=120, num_updates=2950, lr=2.44138e-05, gnorm=0.922, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=12012 2023-05-01 05:53:58 - progress_bar.py[line:274] - INFO: epoch 001: 2965 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7793.8, nsentences=120, sample_size=4164.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1978.5, ups=0.25, wpb=7793.8, bsz=120, num_updates=2960, lr=2.44966e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=39, gb_free=29.9, wall=12051 2023-05-01 05:54:38 - progress_bar.py[line:274] - INFO: epoch 001: 2975 / 6042 
loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7737.9, nsentences=120, sample_size=3834.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1950.5, ups=0.25, wpb=7737.9, bsz=120, num_updates=2970, lr=2.45793e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=12091 2023-05-01 05:55:18 - progress_bar.py[line:274] - INFO: epoch 001: 2985 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7754.5, nsentences=120, sample_size=3982.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1967.2, ups=0.25, wpb=7754.5, bsz=120, num_updates=2980, lr=2.46621e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=39, gb_free=30.4, wall=12130 2023-05-01 05:55:57 - progress_bar.py[line:274] - INFO: epoch 001: 2995 / 6042 loss=2.478, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7578.2, nsentences=120, sample_size=4164.3, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1900.8, ups=0.25, wpb=7578.2, bsz=120, num_updates=2990, lr=2.47448e-05, gnorm=0.888, clip=0, loss_scale=64, train_wall=40, gb_free=25.8, wall=12170 2023-05-01 05:56:38 - progress_bar.py[line:274] - INFO: epoch 001: 3005 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=8012.8, nsentences=120, sample_size=3999.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1983.9, ups=0.25, wpb=8012.8, bsz=120, num_updates=3000, lr=2.48276e-05, gnorm=0.913, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=12210 2023-05-01 05:56:38 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 05:56:40 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 05:56:40 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region:
2023-05-01 05:56:41 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:41 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:42 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:42 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:43 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:43 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:44 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:44 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:44 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:44 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:46 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:46 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:47 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:47 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:48 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:48 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:49 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:49 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:50 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:50 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:51 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:51 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:52 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:52 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:53 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:53 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:54 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:54 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:55 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:55 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:55 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:55 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:56 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region:
2023-05-01 05:56:56 - task_invig.py[line:182] - INFO: example ref(gt): sure. region:
2023-05-01 05:56:58 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:58 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:56:59 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:56:59 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:57:00 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:57:00 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:57:01 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:57:01 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:57:01 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:57:01 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:57:03 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:57:03 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:57:03 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:57:03 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:57:05 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 05:57:05 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 05:57:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:08 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 05:57:08 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 05:57:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 05:57:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:20 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 05:57:20 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 05:57:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:24 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the far left? 2023-05-01 05:57:24 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 05:57:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:29 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 05:57:29 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 05:57:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 05:57:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 05:57:29 - progress_bar.py[line:282] - INFO: epoch 001 | valid on 'valid' subset | loss 3.204 | loss_v1 0 | loss_v2 0 | nll_loss 2.033 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.09 | score 0.7476 | wps 3298.2 | wpb 3202.1 | bsz 39.4 | num_updates 3000 | best_score 0.749 2023-05-01 05:57:29 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 1 @ 3000 updates 2023-05-01 05:57:29 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_3000.pt 2023-05-01 05:57:53 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_3000.pt 2023-05-01 05:58:07 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_3000.pt (epoch 1 @ 3000 updates, score 0.7476) (writing took 37.574248204007745 seconds) 2023-05-01 05:58:45 - progress_bar.py[line:274] - INFO: epoch 001: 3015 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7429.5, nsentences=120, sample_size=4087.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=582.8, ups=0.08, wpb=7429.5, bsz=120, num_updates=3010, lr=2.49103e-05, gnorm=0.916, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=12338 2023-05-01 05:59:25 - progress_bar.py[line:274] - INFO: epoch 001: 3025 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7717.1, nsentences=120, sample_size=4153.3, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1946, ups=0.25, wpb=7717.1, bsz=120, num_updates=3020, 
lr=2.49931e-05, gnorm=0.883, clip=0, loss_scale=128, train_wall=40, gb_free=29.4, wall=12377 2023-05-01 06:00:05 - progress_bar.py[line:274] - INFO: epoch 001: 3035 / 6042 loss=2.506, loss_v1=0, loss_v2=0, nll_loss=1.265, ntokens=7887.1, nsentences=120, sample_size=4114.6, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1983.9, ups=0.25, wpb=7887.1, bsz=120, num_updates=3030, lr=2.50759e-05, gnorm=0.929, clip=10, loss_scale=128, train_wall=40, gb_free=29.9, wall=12417 2023-05-01 06:00:21 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 06:00:49 - progress_bar.py[line:274] - INFO: epoch 001: 3046 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7601, nsentences=120, sample_size=4116, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1710.1, ups=0.22, wpb=7601, bsz=120, num_updates=3040, lr=2.51586e-05, gnorm=0.886, clip=0, loss_scale=64, train_wall=44, gb_free=29.9, wall=12462 2023-05-01 06:01:29 - progress_bar.py[line:274] - INFO: epoch 001: 3056 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7854.6, nsentences=120, sample_size=4289.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1960, ups=0.25, wpb=7854.6, bsz=120, num_updates=3050, lr=2.52414e-05, gnorm=0.877, clip=10, loss_scale=64, train_wall=40, gb_free=28, wall=12502 2023-05-01 06:02:08 - progress_bar.py[line:274] - INFO: epoch 001: 3066 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7697.9, nsentences=120, sample_size=3972, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1980.4, ups=0.26, wpb=7697.9, bsz=120, num_updates=3060, lr=2.53241e-05, gnorm=0.93, clip=0, loss_scale=64, train_wall=39, gb_free=30.8, wall=12541 2023-05-01 06:02:20 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-01 06:02:52 - progress_bar.py[line:274] - INFO: epoch 001: 3077 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.18, 
ntokens=7714.9, nsentences=120, sample_size=4075, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1745.7, ups=0.23, wpb=7714.9, bsz=120, num_updates=3070, lr=2.54069e-05, gnorm=0.918, clip=0, loss_scale=32, train_wall=44, gb_free=30.7, wall=12585 2023-05-01 06:03:32 - progress_bar.py[line:274] - INFO: epoch 001: 3087 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7679.1, nsentences=120, sample_size=4328, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1936.1, ups=0.25, wpb=7679.1, bsz=120, num_updates=3080, lr=2.54897e-05, gnorm=0.879, clip=0, loss_scale=32, train_wall=40, gb_free=30.4, wall=12624 2023-05-01 06:04:12 - progress_bar.py[line:274] - INFO: epoch 001: 3097 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7745.5, nsentences=120, sample_size=4254.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1956.6, ups=0.25, wpb=7745.5, bsz=120, num_updates=3090, lr=2.55724e-05, gnorm=0.878, clip=0, loss_scale=32, train_wall=40, gb_free=30, wall=12664 2023-05-01 06:04:51 - progress_bar.py[line:274] - INFO: epoch 001: 3107 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7788.8, nsentences=120, sample_size=3943, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1966.9, ups=0.25, wpb=7788.8, bsz=120, num_updates=3100, lr=2.56552e-05, gnorm=0.903, clip=0, loss_scale=32, train_wall=40, gb_free=30.2, wall=12704 2023-05-01 06:05:32 - progress_bar.py[line:274] - INFO: epoch 001: 3117 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7277.7, nsentences=120, sample_size=4454.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1799.3, ups=0.25, wpb=7277.7, bsz=120, num_updates=3110, lr=2.57379e-05, gnorm=0.867, clip=0, loss_scale=32, train_wall=40, gb_free=30, wall=12744 2023-05-01 06:06:11 - progress_bar.py[line:274] - INFO: epoch 001: 3127 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7522.3, nsentences=120, sample_size=4127.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1922.5, 
ups=0.26, wpb=7522.3, bsz=120, num_updates=3120, lr=2.58207e-05, gnorm=0.881, clip=0, loss_scale=32, train_wall=39, gb_free=29.5, wall=12783 2023-05-01 06:06:50 - progress_bar.py[line:274] - INFO: epoch 001: 3137 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7967.3, nsentences=120, sample_size=4206.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2036.4, ups=0.26, wpb=7967.3, bsz=120, num_updates=3130, lr=2.59034e-05, gnorm=0.876, clip=10, loss_scale=32, train_wall=39, gb_free=30.3, wall=12822 2023-05-01 06:07:30 - progress_bar.py[line:274] - INFO: epoch 001: 3147 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7605.8, nsentences=120, sample_size=4199.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1914.6, ups=0.25, wpb=7605.8, bsz=120, num_updates=3140, lr=2.59862e-05, gnorm=0.875, clip=0, loss_scale=32, train_wall=40, gb_free=30.5, wall=12862 2023-05-01 06:08:09 - progress_bar.py[line:274] - INFO: epoch 001: 3157 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=8010.4, nsentences=120, sample_size=3901.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2009.6, ups=0.25, wpb=8010.4, bsz=120, num_updates=3150, lr=2.6069e-05, gnorm=0.956, clip=20, loss_scale=32, train_wall=40, gb_free=28.8, wall=12902 2023-05-01 06:08:50 - progress_bar.py[line:274] - INFO: epoch 001: 3167 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7809.5, nsentences=120, sample_size=4075.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1946.6, ups=0.25, wpb=7809.5, bsz=120, num_updates=3160, lr=2.61517e-05, gnorm=0.918, clip=20, loss_scale=32, train_wall=40, gb_free=29.2, wall=12942 2023-05-01 06:09:30 - progress_bar.py[line:274] - INFO: epoch 001: 3177 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7847.5, nsentences=120, sample_size=3827.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1927.9, ups=0.25, wpb=7847.5, bsz=120, num_updates=3170, lr=2.62345e-05, gnorm=0.916, clip=10, 
loss_scale=32, train_wall=41, gb_free=30.6, wall=12983 2023-05-01 06:10:10 - progress_bar.py[line:274] - INFO: epoch 001: 3187 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7645.5, nsentences=120, sample_size=4127.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1926.3, ups=0.25, wpb=7645.5, bsz=120, num_updates=3180, lr=2.63172e-05, gnorm=0.898, clip=10, loss_scale=32, train_wall=40, gb_free=30, wall=13022 2023-05-01 06:10:50 - progress_bar.py[line:274] - INFO: epoch 001: 3197 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7879.1, nsentences=120, sample_size=3831.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1967, ups=0.25, wpb=7879.1, bsz=120, num_updates=3190, lr=2.64e-05, gnorm=0.908, clip=0, loss_scale=32, train_wall=40, gb_free=30.5, wall=13063 2023-05-01 06:11:30 - progress_bar.py[line:274] - INFO: epoch 001: 3207 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7806.1, nsentences=120, sample_size=3847.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1971.7, ups=0.25, wpb=7806.1, bsz=120, num_updates=3200, lr=2.64828e-05, gnorm=0.935, clip=10, loss_scale=32, train_wall=40, gb_free=30, wall=13102 2023-05-01 06:12:09 - progress_bar.py[line:274] - INFO: epoch 001: 3217 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7813, nsentences=120, sample_size=4183, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1966.6, ups=0.25, wpb=7813, bsz=120, num_updates=3210, lr=2.65655e-05, gnorm=0.874, clip=0, loss_scale=32, train_wall=40, gb_free=31, wall=13142 2023-05-01 06:12:48 - progress_bar.py[line:274] - INFO: epoch 001: 3227 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7665.7, nsentences=120, sample_size=4031.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1963.5, ups=0.26, wpb=7665.7, bsz=120, num_updates=3220, lr=2.66483e-05, gnorm=0.886, clip=0, loss_scale=32, train_wall=39, gb_free=30, wall=13181 2023-05-01 06:13:29 - progress_bar.py[line:274] - INFO: epoch 
001: 3237 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7877.8, nsentences=120, sample_size=3962, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1954.6, ups=0.25, wpb=7877.8, bsz=120, num_updates=3230, lr=2.6731e-05, gnorm=0.895, clip=0, loss_scale=32, train_wall=40, gb_free=29.8, wall=13221 2023-05-01 06:14:09 - progress_bar.py[line:274] - INFO: epoch 001: 3247 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7728.4, nsentences=120, sample_size=3990.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1916.9, ups=0.25, wpb=7728.4, bsz=120, num_updates=3240, lr=2.68138e-05, gnorm=0.91, clip=10, loss_scale=32, train_wall=40, gb_free=29.7, wall=13262 2023-05-01 06:14:49 - progress_bar.py[line:274] - INFO: epoch 001: 3257 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7794.4, nsentences=120, sample_size=4167.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1964.6, ups=0.25, wpb=7794.4, bsz=120, num_updates=3250, lr=2.68966e-05, gnorm=0.887, clip=0, loss_scale=32, train_wall=40, gb_free=30.2, wall=13301 2023-05-01 06:15:28 - progress_bar.py[line:274] - INFO: epoch 001: 3267 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7522.7, nsentences=120, sample_size=4236.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1903.7, ups=0.25, wpb=7522.7, bsz=120, num_updates=3260, lr=2.69793e-05, gnorm=0.88, clip=0, loss_scale=32, train_wall=39, gb_free=25.5, wall=13341 2023-05-01 06:16:08 - progress_bar.py[line:274] - INFO: epoch 001: 3277 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7553.4, nsentences=120, sample_size=4074.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1892.1, ups=0.25, wpb=7553.4, bsz=120, num_updates=3270, lr=2.70621e-05, gnorm=0.922, clip=0, loss_scale=32, train_wall=40, gb_free=30.9, wall=13381 2023-05-01 06:16:48 - progress_bar.py[line:274] - INFO: epoch 001: 3287 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7604.1, nsentences=120, 
sample_size=3657.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1910.4, ups=0.25, wpb=7604.1, bsz=120, num_updates=3280, lr=2.71448e-05, gnorm=0.943, clip=10, loss_scale=32, train_wall=40, gb_free=31, wall=13420 2023-05-01 06:17:27 - progress_bar.py[line:274] - INFO: epoch 001: 3297 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7635.6, nsentences=120, sample_size=4254.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1956.8, ups=0.26, wpb=7635.6, bsz=120, num_updates=3290, lr=2.72276e-05, gnorm=0.909, clip=0, loss_scale=32, train_wall=39, gb_free=29.5, wall=13459 2023-05-01 06:18:07 - progress_bar.py[line:274] - INFO: epoch 001: 3307 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7702.3, nsentences=120, sample_size=4082.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1944.7, ups=0.25, wpb=7702.3, bsz=120, num_updates=3300, lr=2.73103e-05, gnorm=0.915, clip=0, loss_scale=32, train_wall=40, gb_free=30.4, wall=13499 2023-05-01 06:18:47 - progress_bar.py[line:274] - INFO: epoch 001: 3317 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7622.7, nsentences=120, sample_size=4073, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1907.1, ups=0.25, wpb=7622.7, bsz=120, num_updates=3310, lr=2.73931e-05, gnorm=0.913, clip=0, loss_scale=32, train_wall=40, gb_free=29.7, wall=13539 2023-05-01 06:19:26 - progress_bar.py[line:274] - INFO: epoch 001: 3327 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7502.3, nsentences=120, sample_size=4280.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1909.7, ups=0.25, wpb=7502.3, bsz=120, num_updates=3320, lr=2.74759e-05, gnorm=0.915, clip=0, loss_scale=32, train_wall=39, gb_free=29.5, wall=13578 2023-05-01 06:20:05 - progress_bar.py[line:274] - INFO: epoch 001: 3337 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7755.4, nsentences=120, sample_size=3803.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1967.6, ups=0.25, wpb=7755.4, 
bsz=120, num_updates=3330, lr=2.75586e-05, gnorm=1.003, clip=40, loss_scale=32, train_wall=39, gb_free=30.7, wall=13618 2023-05-01 06:20:45 - progress_bar.py[line:274] - INFO: epoch 001: 3347 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7803.6, nsentences=120, sample_size=4037, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1950, ups=0.25, wpb=7803.6, bsz=120, num_updates=3340, lr=2.76414e-05, gnorm=0.927, clip=20, loss_scale=32, train_wall=40, gb_free=29.9, wall=13658 2023-05-01 06:21:26 - progress_bar.py[line:274] - INFO: epoch 001: 3357 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7719.9, nsentences=120, sample_size=3983.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1917.5, ups=0.25, wpb=7719.9, bsz=120, num_updates=3350, lr=2.77241e-05, gnorm=0.923, clip=0, loss_scale=32, train_wall=40, gb_free=29.2, wall=13698 2023-05-01 06:22:05 - progress_bar.py[line:274] - INFO: epoch 001: 3367 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7832.9, nsentences=120, sample_size=3993.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1966.5, ups=0.25, wpb=7832.9, bsz=120, num_updates=3360, lr=2.78069e-05, gnorm=0.947, clip=30, loss_scale=32, train_wall=40, gb_free=30.2, wall=13738 2023-05-01 06:22:45 - progress_bar.py[line:274] - INFO: epoch 001: 3377 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7703, nsentences=120, sample_size=4159.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1927.7, ups=0.25, wpb=7703, bsz=120, num_updates=3370, lr=2.78897e-05, gnorm=0.906, clip=0, loss_scale=32, train_wall=40, gb_free=30.6, wall=13778 2023-05-01 06:23:25 - progress_bar.py[line:274] - INFO: epoch 001: 3387 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7585.2, nsentences=120, sample_size=4057.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1910.3, ups=0.25, wpb=7585.2, bsz=120, num_updates=3380, lr=2.79724e-05, gnorm=0.905, clip=0, loss_scale=32, train_wall=40, 
gb_free=27.5, wall=13818 2023-05-01 06:24:05 - progress_bar.py[line:274] - INFO: epoch 001: 3397 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7757.4, nsentences=120, sample_size=3812.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1941.8, ups=0.25, wpb=7757.4, bsz=120, num_updates=3390, lr=2.80552e-05, gnorm=0.911, clip=10, loss_scale=32, train_wall=40, gb_free=31.2, wall=13857 2023-05-01 06:24:45 - progress_bar.py[line:274] - INFO: epoch 001: 3407 / 6042 loss=2.504, loss_v1=0, loss_v2=0, nll_loss=1.258, ntokens=7619.5, nsentences=120, sample_size=3913, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1922, ups=0.25, wpb=7619.5, bsz=120, num_updates=3400, lr=2.81379e-05, gnorm=0.939, clip=10, loss_scale=32, train_wall=40, gb_free=30.8, wall=13897 2023-05-01 06:25:25 - progress_bar.py[line:274] - INFO: epoch 001: 3417 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7418, nsentences=120, sample_size=4305.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1856.7, ups=0.25, wpb=7418, bsz=120, num_updates=3410, lr=2.82207e-05, gnorm=0.883, clip=10, loss_scale=32, train_wall=40, gb_free=30.3, wall=13937 2023-05-01 06:26:04 - progress_bar.py[line:274] - INFO: epoch 001: 3427 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7473.7, nsentences=120, sample_size=4083.9, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1885.3, ups=0.25, wpb=7473.7, bsz=120, num_updates=3420, lr=2.83034e-05, gnorm=0.913, clip=0, loss_scale=32, train_wall=40, gb_free=28.4, wall=13977 2023-05-01 06:26:44 - progress_bar.py[line:274] - INFO: epoch 001: 3437 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.221, ntokens=7529.4, nsentences=120, sample_size=4095.6, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1906.8, ups=0.25, wpb=7529.4, bsz=120, num_updates=3430, lr=2.83862e-05, gnorm=0.91, clip=0, loss_scale=32, train_wall=39, gb_free=29.6, wall=14016 2023-05-01 06:27:24 - progress_bar.py[line:274] - INFO: epoch 001: 3447 / 6042 
loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7707.3, nsentences=120, sample_size=3911.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1927.5, ups=0.25, wpb=7707.3, bsz=120, num_updates=3440, lr=2.8469e-05, gnorm=0.922, clip=20, loss_scale=32, train_wall=40, gb_free=30.7, wall=14056 2023-05-01 06:28:04 - progress_bar.py[line:274] - INFO: epoch 001: 3457 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7867.9, nsentences=120, sample_size=4162.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1969.1, ups=0.25, wpb=7867.9, bsz=120, num_updates=3450, lr=2.85517e-05, gnorm=0.914, clip=10, loss_scale=32, train_wall=40, gb_free=28.7, wall=14096 2023-05-01 06:28:43 - progress_bar.py[line:274] - INFO: epoch 001: 3467 / 6042 loss=2.509, loss_v1=0, loss_v2=0, nll_loss=1.265, ntokens=7690.8, nsentences=120, sample_size=4110.4, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1977.6, ups=0.26, wpb=7690.8, bsz=120, num_updates=3460, lr=2.86345e-05, gnorm=0.941, clip=10, loss_scale=32, train_wall=39, gb_free=30.9, wall=14135 2023-05-01 06:29:22 - progress_bar.py[line:274] - INFO: epoch 001: 3477 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=7728.1, nsentences=120, sample_size=3915.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1938.8, ups=0.25, wpb=7728.1, bsz=120, num_updates=3470, lr=2.87172e-05, gnorm=0.921, clip=20, loss_scale=32, train_wall=40, gb_free=29.1, wall=14175 2023-05-01 06:30:02 - progress_bar.py[line:274] - INFO: epoch 001: 3487 / 6042 loss=2.493, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=7834.2, nsentences=120, sample_size=3888.4, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1966.8, ups=0.25, wpb=7834.2, bsz=120, num_updates=3480, lr=2.88e-05, gnorm=0.939, clip=20, loss_scale=32, train_wall=40, gb_free=28.9, wall=14215 2023-05-01 06:30:42 - progress_bar.py[line:274] - INFO: epoch 001: 3497 / 6042 loss=2.501, loss_v1=0, loss_v2=0, nll_loss=1.262, ntokens=7878.7, nsentences=120, 
sample_size=3995.6, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1970.2, ups=0.25, wpb=7878.7, bsz=120, num_updates=3490, lr=2.88828e-05, gnorm=0.928, clip=20, loss_scale=32, train_wall=40, gb_free=29.6, wall=14255 2023-05-01 06:31:22 - progress_bar.py[line:274] - INFO: epoch 001: 3507 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7417.3, nsentences=120, sample_size=3899, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1888.2, ups=0.25, wpb=7417.3, bsz=120, num_updates=3500, lr=2.89655e-05, gnorm=0.951, clip=30, loss_scale=32, train_wall=39, gb_free=30.3, wall=14294 2023-05-01 06:32:01 - progress_bar.py[line:274] - INFO: epoch 001: 3517 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7433.1, nsentences=120, sample_size=4174.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1881.4, ups=0.25, wpb=7433.1, bsz=120, num_updates=3510, lr=2.90483e-05, gnorm=0.905, clip=0, loss_scale=32, train_wall=39, gb_free=30.1, wall=14334 2023-05-01 06:32:41 - progress_bar.py[line:274] - INFO: epoch 001: 3527 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7606.4, nsentences=120, sample_size=4095.7, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1889.4, ups=0.25, wpb=7606.4, bsz=120, num_updates=3520, lr=2.9131e-05, gnorm=0.925, clip=30, loss_scale=32, train_wall=40, gb_free=29.5, wall=14374 2023-05-01 06:33:21 - progress_bar.py[line:274] - INFO: epoch 001: 3537 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.231, ntokens=7451.9, nsentences=120, sample_size=3851.8, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1868.4, ups=0.25, wpb=7451.9, bsz=120, num_updates=3530, lr=2.92138e-05, gnorm=0.948, clip=10, loss_scale=32, train_wall=40, gb_free=30.6, wall=14414 2023-05-01 06:34:01 - progress_bar.py[line:274] - INFO: epoch 001: 3547 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7644.3, nsentences=120, sample_size=4188.1, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1929.3, ups=0.25, wpb=7644.3, 
bsz=120, num_updates=3540, lr=2.92966e-05, gnorm=0.925, clip=10, loss_scale=32, train_wall=40, gb_free=29.9, wall=14453 2023-05-01 06:34:40 - progress_bar.py[line:274] - INFO: epoch 001: 3557 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7526.1, nsentences=120, sample_size=3989.7, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1899.5, ups=0.25, wpb=7526.1, bsz=120, num_updates=3550, lr=2.93793e-05, gnorm=0.943, clip=20, loss_scale=32, train_wall=40, gb_free=28.1, wall=14493 2023-05-01 06:35:20 - progress_bar.py[line:274] - INFO: epoch 001: 3567 / 6042 loss=2.517, loss_v1=0, loss_v2=0, nll_loss=1.275, ntokens=7863.2, nsentences=120, sample_size=3782.1, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1965.3, ups=0.25, wpb=7863.2, bsz=120, num_updates=3560, lr=2.94621e-05, gnorm=0.965, clip=20, loss_scale=32, train_wall=40, gb_free=29.9, wall=14533 2023-05-01 06:36:01 - progress_bar.py[line:274] - INFO: epoch 001: 3577 / 6042 loss=2.495, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7816.8, nsentences=120, sample_size=4198.9, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1940.2, ups=0.25, wpb=7816.8, bsz=120, num_updates=3570, lr=2.95448e-05, gnorm=1.067, clip=30, loss_scale=32, train_wall=40, gb_free=30.4, wall=14573 2023-05-01 06:36:40 - progress_bar.py[line:274] - INFO: epoch 001: 3587 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=7672.9, nsentences=120, sample_size=3885.9, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1946.2, ups=0.25, wpb=7672.9, bsz=120, num_updates=3580, lr=2.96276e-05, gnorm=0.961, clip=40, loss_scale=64, train_wall=39, gb_free=29.8, wall=14613 2023-05-01 06:37:21 - progress_bar.py[line:274] - INFO: epoch 001: 3597 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7781.8, nsentences=120, sample_size=4371.7, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1927.1, ups=0.25, wpb=7781.8, bsz=120, num_updates=3590, lr=2.97103e-05, gnorm=0.889, clip=10, loss_scale=64, train_wall=40, 
gb_free=30, wall=14653 2023-05-01 06:38:00 - progress_bar.py[line:274] - INFO: epoch 001: 3607 / 6042 loss=2.494, loss_v1=0, loss_v2=0, nll_loss=1.249, ntokens=7644.5, nsentences=120, sample_size=4132.4, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1945.5, ups=0.25, wpb=7644.5, bsz=120, num_updates=3600, lr=2.97931e-05, gnorm=0.91, clip=0, loss_scale=64, train_wall=39, gb_free=29.8, wall=14692 2023-05-01 06:38:39 - progress_bar.py[line:274] - INFO: epoch 001: 3617 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7938.9, nsentences=120, sample_size=4003.2, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=2024.4, ups=0.25, wpb=7938.9, bsz=120, num_updates=3610, lr=2.98759e-05, gnorm=0.903, clip=0, loss_scale=64, train_wall=39, gb_free=29.2, wall=14732 2023-05-01 06:39:19 - progress_bar.py[line:274] - INFO: epoch 001: 3627 / 6042 loss=2.478, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7655, nsentences=120, sample_size=4002.9, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1913.7, ups=0.25, wpb=7655, bsz=120, num_updates=3620, lr=2.99586e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=14772 2023-05-01 06:39:59 - progress_bar.py[line:274] - INFO: epoch 001: 3637 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7460.1, nsentences=120, sample_size=4004.6, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1888.6, ups=0.25, wpb=7460.1, bsz=120, num_updates=3630, lr=2.99974e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=14811 2023-05-01 06:40:38 - progress_bar.py[line:274] - INFO: epoch 001: 3647 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7859.8, nsentences=120, sample_size=3832, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1982.5, ups=0.25, wpb=7859.8, bsz=120, num_updates=3640, lr=2.99921e-05, gnorm=0.917, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=14851 2023-05-01 06:41:18 - progress_bar.py[line:274] - INFO: epoch 001: 3657 / 6042 
loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7536.8, nsentences=120, sample_size=4248.1, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1875.6, ups=0.25, wpb=7536.8, bsz=120, num_updates=3650, lr=2.99868e-05, gnorm=0.906, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=14891 2023-05-01 06:41:58 - progress_bar.py[line:274] - INFO: epoch 001: 3667 / 6042 loss=2.506, loss_v1=0, loss_v2=0, nll_loss=1.264, ntokens=7582.7, nsentences=120, sample_size=4064.1, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1892.4, ups=0.25, wpb=7582.7, bsz=120, num_updates=3660, lr=2.99815e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=14931 2023-05-01 06:42:39 - progress_bar.py[line:274] - INFO: epoch 001: 3677 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7908.6, nsentences=120, sample_size=4228.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1968.2, ups=0.25, wpb=7908.6, bsz=120, num_updates=3670, lr=2.99762e-05, gnorm=0.911, clip=10, loss_scale=64, train_wall=40, gb_free=26.5, wall=14971 2023-05-01 06:43:18 - progress_bar.py[line:274] - INFO: epoch 001: 3687 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7535.2, nsentences=120, sample_size=4156, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1914.5, ups=0.25, wpb=7535.2, bsz=120, num_updates=3680, lr=2.99709e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=15010 2023-05-01 06:43:59 - progress_bar.py[line:274] - INFO: epoch 001: 3697 / 6042 loss=2.495, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=7925.1, nsentences=120, sample_size=4144, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1953.3, ups=0.25, wpb=7925.1, bsz=120, num_updates=3690, lr=2.99657e-05, gnorm=0.926, clip=40, loss_scale=64, train_wall=41, gb_free=29.9, wall=15051 2023-05-01 06:44:38 - progress_bar.py[line:274] - INFO: epoch 001: 3707 / 6042 loss=2.511, loss_v1=0, loss_v2=0, nll_loss=1.269, ntokens=7761.1, nsentences=120, 
sample_size=3826.9, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1951.4, ups=0.25, wpb=7761.1, bsz=120, num_updates=3700, lr=2.99604e-05, gnorm=0.931, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=15091 2023-05-01 06:45:18 - progress_bar.py[line:274] - INFO: epoch 001: 3717 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7804.3, nsentences=120, sample_size=4029, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1984, ups=0.25, wpb=7804.3, bsz=120, num_updates=3710, lr=2.99551e-05, gnorm=0.905, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=15130 2023-05-01 06:45:58 - progress_bar.py[line:274] - INFO: epoch 001: 3727 / 6042 loss=2.509, loss_v1=0, loss_v2=0, nll_loss=1.262, ntokens=7778.9, nsentences=120, sample_size=4118.8, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1935.8, ups=0.25, wpb=7778.9, bsz=120, num_updates=3720, lr=2.99498e-05, gnorm=0.891, clip=0, loss_scale=64, train_wall=40, gb_free=29.4, wall=15170 2023-05-01 06:46:38 - progress_bar.py[line:274] - INFO: epoch 001: 3737 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7843.1, nsentences=120, sample_size=3864.8, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1967.6, ups=0.25, wpb=7843.1, bsz=120, num_updates=3730, lr=2.99445e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=15210 2023-05-01 06:47:17 - progress_bar.py[line:274] - INFO: epoch 001: 3747 / 6042 loss=2.488, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7660.3, nsentences=120, sample_size=3893.3, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1953.6, ups=0.26, wpb=7660.3, bsz=120, num_updates=3740, lr=2.99393e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=39, gb_free=30.7, wall=15249 2023-05-01 06:47:57 - progress_bar.py[line:274] - INFO: epoch 001: 3757 / 6042 loss=2.491, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7842.5, nsentences=120, sample_size=4117.5, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1941.7, ups=0.25, wpb=7842.5, 
bsz=120, num_updates=3750, lr=2.9934e-05, gnorm=0.903, clip=10, loss_scale=64, train_wall=40, gb_free=28.4, wall=15290 2023-05-01 06:48:38 - progress_bar.py[line:274] - INFO: epoch 001: 3767 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7913.3, nsentences=120, sample_size=4047, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1958.1, ups=0.25, wpb=7913.3, bsz=120, num_updates=3760, lr=2.99287e-05, gnorm=0.913, clip=0, loss_scale=64, train_wall=40, gb_free=30.4, wall=15330 2023-05-01 06:49:17 - progress_bar.py[line:274] - INFO: epoch 001: 3777 / 6042 loss=2.52, loss_v1=0, loss_v2=0, nll_loss=1.284, ntokens=7816.8, nsentences=120, sample_size=4045.7, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=1984.3, ups=0.25, wpb=7816.8, bsz=120, num_updates=3770, lr=2.99234e-05, gnorm=0.894, clip=10, loss_scale=64, train_wall=39, gb_free=31.5, wall=15370 2023-05-01 06:49:57 - progress_bar.py[line:274] - INFO: epoch 001: 3787 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7917.4, nsentences=119.2, sample_size=3849.8, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=2007.8, ups=0.25, wpb=7917.4, bsz=119.2, num_updates=3780, lr=2.99181e-05, gnorm=0.918, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=15409 2023-05-01 06:50:36 - progress_bar.py[line:274] - INFO: epoch 001: 3797 / 6042 loss=2.48, loss_v1=0, loss_v2=0, nll_loss=1.231, ntokens=7943.3, nsentences=120, sample_size=3946.1, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=2006.8, ups=0.25, wpb=7943.3, bsz=120, num_updates=3790, lr=2.99128e-05, gnorm=0.918, clip=10, loss_scale=64, train_wall=40, gb_free=31, wall=15449 2023-05-01 06:51:16 - progress_bar.py[line:274] - INFO: epoch 001: 3807 / 6042 loss=2.502, loss_v1=0, loss_v2=0, nll_loss=1.262, ntokens=7703.4, nsentences=120, sample_size=4155, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1927, ups=0.25, wpb=7703.4, bsz=120, num_updates=3800, lr=2.99076e-05, gnorm=0.903, clip=10, loss_scale=64, train_wall=40, 
gb_free=28.3, wall=15489 2023-05-01 06:51:56 - progress_bar.py[line:274] - INFO: epoch 001: 3817 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=7570, nsentences=120, sample_size=4369.4, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1883, ups=0.25, wpb=7570, bsz=120, num_updates=3810, lr=2.99023e-05, gnorm=0.877, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=15529 2023-05-01 06:52:36 - progress_bar.py[line:274] - INFO: epoch 001: 3827 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.258, ntokens=7646.5, nsentences=120, sample_size=4400.8, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1944.7, ups=0.25, wpb=7646.5, bsz=120, num_updates=3820, lr=2.9897e-05, gnorm=0.885, clip=0, loss_scale=64, train_wall=39, gb_free=30.1, wall=15568 2023-05-01 06:53:15 - progress_bar.py[line:274] - INFO: epoch 001: 3837 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=8018.7, nsentences=120, sample_size=4125.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=2019.4, ups=0.25, wpb=8018.7, bsz=120, num_updates=3830, lr=2.98917e-05, gnorm=0.911, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=15608 2023-05-01 06:53:56 - progress_bar.py[line:274] - INFO: epoch 001: 3847 / 6042 loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.248, ntokens=7991.3, nsentences=120, sample_size=3747.7, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1971.8, ups=0.25, wpb=7991.3, bsz=120, num_updates=3840, lr=2.98864e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=15648 2023-05-01 06:54:37 - progress_bar.py[line:274] - INFO: epoch 001: 3857 / 6042 loss=2.504, loss_v1=0, loss_v2=0, nll_loss=1.268, ntokens=7941.9, nsentences=120, sample_size=3910.9, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1952.8, ups=0.25, wpb=7941.9, bsz=120, num_updates=3850, lr=2.98812e-05, gnorm=0.91, clip=10, loss_scale=64, train_wall=41, gb_free=29.8, wall=15689 2023-05-01 06:55:16 - progress_bar.py[line:274] - INFO: epoch 001: 3867 / 6042 
loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7684.6, nsentences=120, sample_size=3853.1, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1946.2, ups=0.25, wpb=7684.6, bsz=120, num_updates=3860, lr=2.98759e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=39, gb_free=29.9, wall=15729 2023-05-01 06:55:56 - progress_bar.py[line:274] - INFO: epoch 001: 3877 / 6042 loss=2.512, loss_v1=0, loss_v2=0, nll_loss=1.275, ntokens=7767.4, nsentences=120, sample_size=4184.6, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1953.9, ups=0.25, wpb=7767.4, bsz=120, num_updates=3870, lr=2.98706e-05, gnorm=0.924, clip=20, loss_scale=64, train_wall=40, gb_free=29.2, wall=15768 2023-05-01 06:56:35 - progress_bar.py[line:274] - INFO: epoch 001: 3887 / 6042 loss=2.493, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7478.5, nsentences=120, sample_size=4284.7, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1895.6, ups=0.25, wpb=7478.5, bsz=120, num_updates=3880, lr=2.98653e-05, gnorm=0.887, clip=0, loss_scale=64, train_wall=39, gb_free=30.1, wall=15808 2023-05-01 06:57:15 - progress_bar.py[line:274] - INFO: epoch 001: 3897 / 6042 loss=2.522, loss_v1=0, loss_v2=0, nll_loss=1.288, ntokens=7702.8, nsentences=120, sample_size=4019.2, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=1919.4, ups=0.25, wpb=7702.8, bsz=120, num_updates=3890, lr=2.986e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=29.3, wall=15848 2023-05-01 06:57:55 - progress_bar.py[line:274] - INFO: epoch 001: 3907 / 6042 loss=2.494, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=7634.7, nsentences=120, sample_size=4218.9, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1942.3, ups=0.25, wpb=7634.7, bsz=120, num_updates=3900, lr=2.98547e-05, gnorm=0.882, clip=0, loss_scale=64, train_wall=39, gb_free=29.3, wall=15887 2023-05-01 06:58:34 - progress_bar.py[line:274] - INFO: epoch 001: 3917 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7670.6, nsentences=120, 
sample_size=4104.6, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1949.7, ups=0.25, wpb=7670.6, bsz=120, num_updates=3910, lr=2.98495e-05, gnorm=0.92, clip=0, loss_scale=64, train_wall=39, gb_free=30.3, wall=15927 2023-05-01 06:59:14 - progress_bar.py[line:274] - INFO: epoch 001: 3927 / 6042 loss=2.488, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7744, nsentences=120, sample_size=4144.8, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1946.5, ups=0.25, wpb=7744, bsz=120, num_updates=3920, lr=2.98442e-05, gnorm=0.909, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=15966 2023-05-01 06:59:54 - progress_bar.py[line:274] - INFO: epoch 001: 3937 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=7735.6, nsentences=120, sample_size=4169, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1946.5, ups=0.25, wpb=7735.6, bsz=120, num_updates=3930, lr=2.98389e-05, gnorm=0.901, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=16006 2023-05-01 07:00:33 - progress_bar.py[line:274] - INFO: epoch 001: 3947 / 6042 loss=2.497, loss_v1=0, loss_v2=0, nll_loss=1.253, ntokens=7792.9, nsentences=120, sample_size=4010.8, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1954.7, ups=0.25, wpb=7792.9, bsz=120, num_updates=3940, lr=2.98336e-05, gnorm=0.906, clip=0, loss_scale=64, train_wall=40, gb_free=30.5, wall=16046 2023-05-01 07:01:14 - progress_bar.py[line:274] - INFO: epoch 001: 3957 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7841, nsentences=120, sample_size=3973.5, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1945, ups=0.25, wpb=7841, bsz=120, num_updates=3950, lr=2.98283e-05, gnorm=0.956, clip=30, loss_scale=64, train_wall=40, gb_free=29, wall=16086 2023-05-01 07:01:53 - progress_bar.py[line:274] - INFO: epoch 001: 3967 / 6042 loss=2.508, loss_v1=0, loss_v2=0, nll_loss=1.267, ntokens=8053.9, nsentences=120, sample_size=4025.1, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=2039.6, ups=0.25, wpb=8053.9, bsz=120, 
num_updates=3960, lr=2.9823e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=39, gb_free=28.4, wall=16126 2023-05-01 07:02:33 - progress_bar.py[line:274] - INFO: epoch 001: 3977 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7941, nsentences=120, sample_size=4306.5, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=2000.2, ups=0.25, wpb=7941, bsz=120, num_updates=3970, lr=2.98178e-05, gnorm=0.887, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=16165 2023-05-01 07:03:13 - progress_bar.py[line:274] - INFO: epoch 001: 3987 / 6042 loss=2.501, loss_v1=0, loss_v2=0, nll_loss=1.259, ntokens=7621, nsentences=120, sample_size=3845.2, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1900.2, ups=0.25, wpb=7621, bsz=120, num_updates=3980, lr=2.98125e-05, gnorm=1.004, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=16206 2023-05-01 07:03:54 - progress_bar.py[line:274] - INFO: epoch 001: 3997 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.249, ntokens=8039, nsentences=120, sample_size=4183.6, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1983.8, ups=0.25, wpb=8039, bsz=120, num_updates=3990, lr=2.98072e-05, gnorm=0.898, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=16246 2023-05-01 07:04:34 - progress_bar.py[line:274] - INFO: epoch 001: 4007 / 6042 loss=2.478, loss_v1=0, loss_v2=0, nll_loss=1.236, ntokens=7616.3, nsentences=120, sample_size=3761.1, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1908.1, ups=0.25, wpb=7616.3, bsz=120, num_updates=4000, lr=2.98019e-05, gnorm=0.919, clip=10, loss_scale=64, train_wall=40, gb_free=31.1, wall=16286 2023-05-01 07:04:34 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 07:04:35 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 07:04:35 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 07:04:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 07:04:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:52 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 07:04:52 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 07:04:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:04:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:04:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 07:05:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:04 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 07:05:04 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 07:05:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 07:05:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:15 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 07:05:15 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 07:05:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:19 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 07:05:19 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 07:05:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 07:05:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 07:05:24 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region:
2023-05-01 07:05:24 - task_invig.py[line:182] - INFO: example ref(gt): sure. region:
2023-05-01 07:05:24 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 07:05:24 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 07:05:24 - progress_bar.py[line:282] - INFO: epoch 001 | valid on 'valid' subset | loss 3.222 | loss_v1 0 | loss_v2 0 | nll_loss 2.053 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.15 | score 0.7427 | wps 3296.3 | wpb 3202.1 | bsz 39.4 | num_updates 4000 | best_score 0.749
2023-05-01 07:05:24 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 1 @ 4000 updates
2023-05-01 07:05:24 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_4000.pt
2023-05-01 07:05:49 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_4000.pt
2023-05-01 07:06:03 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_4000.pt (epoch 1 @ 4000 updates, score 0.7427) (writing took 38.115170079050586 seconds)
2023-05-01 07:06:42 - progress_bar.py[line:274] - INFO: epoch 001: 4017 / 6042 loss=2.493, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7494, nsentences=120, sample_size=3772.8, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=584.7, ups=0.08, wpb=7494, bsz=120, num_updates=4010, lr=2.97966e-05, gnorm=0.978, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=16414
2023-05-01 07:07:21 - progress_bar.py[line:274] - INFO: epoch 001: 4027 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7654.9, nsentences=120, sample_size=3905.7, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1952.2, ups=0.26, wpb=7654.9, bsz=120, num_updates=4020,
lr=2.97914e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=39, gb_free=30.3, wall=16453 2023-05-01 07:08:01 - progress_bar.py[line:274] - INFO: epoch 001: 4037 / 6042 loss=2.537, loss_v1=0, loss_v2=0, nll_loss=1.303, ntokens=7882.2, nsentences=120, sample_size=3897.2, sample_size_v1=0, sample_size_v2=0, ppl=2.47, wps=1975.8, ups=0.25, wpb=7882.2, bsz=120, num_updates=4030, lr=2.97861e-05, gnorm=0.949, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=16493 2023-05-01 07:08:41 - progress_bar.py[line:274] - INFO: epoch 001: 4047 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.232, ntokens=7780, nsentences=120, sample_size=4088.7, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1949.4, ups=0.25, wpb=7780, bsz=120, num_updates=4040, lr=2.97808e-05, gnorm=0.904, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=16533 2023-05-01 07:09:21 - progress_bar.py[line:274] - INFO: epoch 001: 4057 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7818.9, nsentences=120, sample_size=3927.6, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1956.8, ups=0.25, wpb=7818.9, bsz=120, num_updates=4050, lr=2.97755e-05, gnorm=0.911, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=16573 2023-05-01 07:09:59 - progress_bar.py[line:274] - INFO: epoch 001: 4067 / 6042 loss=2.504, loss_v1=0, loss_v2=0, nll_loss=1.265, ntokens=7513.4, nsentences=120, sample_size=4209.6, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1936, ups=0.26, wpb=7513.4, bsz=120, num_updates=4060, lr=2.97702e-05, gnorm=0.927, clip=20, loss_scale=64, train_wall=39, gb_free=30.4, wall=16612 2023-05-01 07:10:40 - progress_bar.py[line:274] - INFO: epoch 001: 4077 / 6042 loss=2.511, loss_v1=0, loss_v2=0, nll_loss=1.273, ntokens=7969.6, nsentences=120, sample_size=3822.7, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1957.6, ups=0.25, wpb=7969.6, bsz=120, num_updates=4070, lr=2.97649e-05, gnorm=0.928, clip=10, loss_scale=64, train_wall=41, gb_free=27.8, wall=16653 2023-05-01 
07:11:21 - progress_bar.py[line:274] - INFO: epoch 001: 4087 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=8017, nsentences=120, sample_size=4033.7, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1967.8, ups=0.25, wpb=8017, bsz=120, num_updates=4080, lr=2.97597e-05, gnorm=0.899, clip=0, loss_scale=64, train_wall=41, gb_free=31.4, wall=16693
2023-05-01 07:12:01 - progress_bar.py[line:274] - INFO: epoch 001: 4097 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7879.6, nsentences=120, sample_size=4052.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1985.1, ups=0.25, wpb=7879.6, bsz=120, num_updates=4090, lr=2.97544e-05, gnorm=0.869, clip=0, loss_scale=128, train_wall=40, gb_free=29.7, wall=16733
2023-05-01 07:12:29 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0
2023-05-01 07:12:44 - progress_bar.py[line:274] - INFO: epoch 001: 4108 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7562.3, nsentences=120, sample_size=3707.3, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1754.9, ups=0.23, wpb=7562.3, bsz=120, num_updates=4100, lr=2.97491e-05, gnorm=0.916, clip=10, loss_scale=64, train_wall=43, gb_free=29.4, wall=16776
2023-05-01 07:13:24 - progress_bar.py[line:274] - INFO: epoch 001: 4118 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7861.8, nsentences=120, sample_size=4046.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1940.4, ups=0.25, wpb=7861.8, bsz=120, num_updates=4110, lr=2.97438e-05, gnorm=0.882, clip=0, loss_scale=64, train_wall=40, gb_free=29.5, wall=16817
2023-05-01 07:14:04 - progress_bar.py[line:274] - INFO: epoch 001: 4128 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7492.7, nsentences=120, sample_size=3982.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1873.8, ups=0.25, wpb=7492.7, bsz=120, num_updates=4120, lr=2.97385e-05, gnorm=0.925, clip=10, loss_scale=64, train_wall=40,
gb_free=29.6, wall=16857 2023-05-01 07:14:44 - progress_bar.py[line:274] - INFO: epoch 001: 4138 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7619, nsentences=120, sample_size=4005.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1925.4, ups=0.25, wpb=7619, bsz=120, num_updates=4130, lr=2.97333e-05, gnorm=0.905, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=16896 2023-05-01 07:15:23 - progress_bar.py[line:274] - INFO: epoch 001: 4148 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7706.5, nsentences=120, sample_size=4117, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1969.7, ups=0.26, wpb=7706.5, bsz=120, num_updates=4140, lr=2.9728e-05, gnorm=0.884, clip=0, loss_scale=64, train_wall=39, gb_free=30.8, wall=16935 2023-05-01 07:16:03 - progress_bar.py[line:274] - INFO: epoch 001: 4158 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7809, nsentences=120, sample_size=3755.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1970.2, ups=0.25, wpb=7809, bsz=120, num_updates=4150, lr=2.97227e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=16975 2023-05-01 07:16:42 - progress_bar.py[line:274] - INFO: epoch 001: 4168 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7713.7, nsentences=120, sample_size=4083.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1958.4, ups=0.25, wpb=7713.7, bsz=120, num_updates=4160, lr=2.97174e-05, gnorm=0.884, clip=0, loss_scale=64, train_wall=39, gb_free=30.6, wall=17014 2023-05-01 07:17:22 - progress_bar.py[line:274] - INFO: epoch 001: 4178 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7582.3, nsentences=120, sample_size=3875, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1900, ups=0.25, wpb=7582.3, bsz=120, num_updates=4170, lr=2.97121e-05, gnorm=0.922, clip=0, loss_scale=64, train_wall=40, gb_free=28, wall=17054 2023-05-01 07:18:02 - progress_bar.py[line:274] - INFO: epoch 001: 4188 / 6042 loss=2.391, 
loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7372.1, nsentences=120, sample_size=4132.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1835.2, ups=0.25, wpb=7372.1, bsz=120, num_updates=4180, lr=2.97068e-05, gnorm=0.925, clip=20, loss_scale=64, train_wall=40, gb_free=24.8, wall=17094 2023-05-01 07:18:41 - progress_bar.py[line:274] - INFO: epoch 001: 4198 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7727.6, nsentences=120, sample_size=3864.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1999.2, ups=0.26, wpb=7727.6, bsz=120, num_updates=4190, lr=2.97016e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=39, gb_free=29.4, wall=17133 2023-05-01 07:19:20 - progress_bar.py[line:274] - INFO: epoch 001: 4208 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7763.1, nsentences=120, sample_size=3961.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1956.7, ups=0.25, wpb=7763.1, bsz=120, num_updates=4200, lr=2.96963e-05, gnorm=0.922, clip=0, loss_scale=64, train_wall=40, gb_free=29.2, wall=17173 2023-05-01 07:20:00 - progress_bar.py[line:274] - INFO: epoch 001: 4218 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7880.8, nsentences=120, sample_size=3770.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1975, ups=0.25, wpb=7880.8, bsz=120, num_updates=4210, lr=2.9691e-05, gnorm=0.91, clip=0, loss_scale=64, train_wall=40, gb_free=30.3, wall=17213 2023-05-01 07:20:40 - progress_bar.py[line:274] - INFO: epoch 001: 4228 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7647.8, nsentences=120, sample_size=3977.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1927.6, ups=0.25, wpb=7647.8, bsz=120, num_updates=4220, lr=2.96857e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=17252 2023-05-01 07:21:20 - progress_bar.py[line:274] - INFO: epoch 001: 4238 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7755.8, nsentences=120, sample_size=4163.1, 
sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1941.4, ups=0.25, wpb=7755.8, bsz=120, num_updates=4230, lr=2.96804e-05, gnorm=0.898, clip=0, loss_scale=64, train_wall=40, gb_free=29.2, wall=17292 2023-05-01 07:22:00 - progress_bar.py[line:274] - INFO: epoch 001: 4248 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7711.2, nsentences=120, sample_size=4276, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1921.2, ups=0.25, wpb=7711.2, bsz=120, num_updates=4240, lr=2.96751e-05, gnorm=0.877, clip=0, loss_scale=64, train_wall=40, gb_free=30.5, wall=17332 2023-05-01 07:22:40 - progress_bar.py[line:274] - INFO: epoch 001: 4258 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7592.2, nsentences=120, sample_size=4111, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1910.4, ups=0.25, wpb=7592.2, bsz=120, num_updates=4250, lr=2.96699e-05, gnorm=0.911, clip=0, loss_scale=64, train_wall=40, gb_free=30.4, wall=17372 2023-05-01 07:23:19 - progress_bar.py[line:274] - INFO: epoch 001: 4268 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7539.4, nsentences=120, sample_size=4035.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1899.7, ups=0.25, wpb=7539.4, bsz=120, num_updates=4260, lr=2.96646e-05, gnorm=0.938, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=17412 2023-05-01 07:23:59 - progress_bar.py[line:274] - INFO: epoch 001: 4278 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7599.6, nsentences=120, sample_size=4330.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1909.2, ups=0.25, wpb=7599.6, bsz=120, num_updates=4270, lr=2.96593e-05, gnorm=0.926, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=17452 2023-05-01 07:24:40 - progress_bar.py[line:274] - INFO: epoch 001: 4288 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7618.9, nsentences=120, sample_size=3838.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1891.6, ups=0.25, wpb=7618.9, bsz=120, 
num_updates=4280, lr=2.9654e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=17492 2023-05-01 07:25:19 - progress_bar.py[line:274] - INFO: epoch 001: 4298 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=8009.8, nsentences=120, sample_size=4037.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2017.6, ups=0.25, wpb=8009.8, bsz=120, num_updates=4290, lr=2.96487e-05, gnorm=0.91, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=17532 2023-05-01 07:25:59 - progress_bar.py[line:274] - INFO: epoch 001: 4308 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7837.6, nsentences=120, sample_size=3956.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1983.8, ups=0.25, wpb=7837.6, bsz=120, num_updates=4300, lr=2.96435e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=39, gb_free=30.6, wall=17571 2023-05-01 07:26:38 - progress_bar.py[line:274] - INFO: epoch 001: 4318 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7744.4, nsentences=120, sample_size=4221.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1960.1, ups=0.25, wpb=7744.4, bsz=120, num_updates=4310, lr=2.96382e-05, gnorm=0.908, clip=0, loss_scale=64, train_wall=39, gb_free=27.6, wall=17611 2023-05-01 07:27:17 - progress_bar.py[line:274] - INFO: epoch 001: 4328 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7930.5, nsentences=120, sample_size=3995, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2022.7, ups=0.26, wpb=7930.5, bsz=120, num_updates=4320, lr=2.96329e-05, gnorm=0.906, clip=0, loss_scale=64, train_wall=39, gb_free=29.7, wall=17650 2023-05-01 07:27:58 - progress_bar.py[line:274] - INFO: epoch 001: 4338 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7535.9, nsentences=120, sample_size=4372.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1878.4, ups=0.25, wpb=7535.9, bsz=120, num_updates=4330, lr=2.96276e-05, gnorm=0.924, clip=20, loss_scale=64, train_wall=40, gb_free=29.1, 
wall=17690 2023-05-01 07:28:37 - progress_bar.py[line:274] - INFO: epoch 001: 4348 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7709.3, nsentences=120, sample_size=3992.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1953.5, ups=0.25, wpb=7709.3, bsz=120, num_updates=4340, lr=2.96223e-05, gnorm=0.913, clip=0, loss_scale=64, train_wall=39, gb_free=26.9, wall=17730 2023-05-01 07:29:17 - progress_bar.py[line:274] - INFO: epoch 001: 4358 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7851.9, nsentences=120, sample_size=4249.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1989.3, ups=0.25, wpb=7851.9, bsz=120, num_updates=4350, lr=2.9617e-05, gnorm=0.895, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=17769 2023-05-01 07:29:56 - progress_bar.py[line:274] - INFO: epoch 001: 4368 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7795.1, nsentences=120, sample_size=4072.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1961.9, ups=0.25, wpb=7795.1, bsz=120, num_updates=4360, lr=2.96118e-05, gnorm=0.922, clip=0, loss_scale=64, train_wall=40, gb_free=30.5, wall=17809 2023-05-01 07:30:37 - progress_bar.py[line:274] - INFO: epoch 001: 4378 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=8095.2, nsentences=120, sample_size=3965.7, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1999.4, ups=0.25, wpb=8095.2, bsz=120, num_updates=4370, lr=2.96065e-05, gnorm=0.915, clip=20, loss_scale=64, train_wall=40, gb_free=26, wall=17849 2023-05-01 07:31:17 - progress_bar.py[line:274] - INFO: epoch 001: 4388 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7863.8, nsentences=120, sample_size=4021.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1960.5, ups=0.25, wpb=7863.8, bsz=120, num_updates=4380, lr=2.96012e-05, gnorm=0.926, clip=0, loss_scale=64, train_wall=40, gb_free=29.7, wall=17889 2023-05-01 07:31:57 - progress_bar.py[line:274] - INFO: epoch 001: 4398 / 6042 loss=2.453, 
loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=8043.1, nsentences=120, sample_size=4257.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1997.7, ups=0.25, wpb=8043.1, bsz=120, num_updates=4390, lr=2.95959e-05, gnorm=0.885, clip=10, loss_scale=64, train_wall=40, gb_free=28.3, wall=17930 2023-05-01 07:32:37 - progress_bar.py[line:274] - INFO: epoch 001: 4408 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7624.6, nsentences=120, sample_size=3848.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1931.3, ups=0.25, wpb=7624.6, bsz=120, num_updates=4400, lr=2.95906e-05, gnorm=0.919, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=17969 2023-05-01 07:33:17 - progress_bar.py[line:274] - INFO: epoch 001: 4418 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7541.3, nsentences=120, sample_size=4061.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1890.5, ups=0.25, wpb=7541.3, bsz=120, num_updates=4410, lr=2.95854e-05, gnorm=0.905, clip=10, loss_scale=64, train_wall=40, gb_free=28.4, wall=18009 2023-05-01 07:33:57 - progress_bar.py[line:274] - INFO: epoch 001: 4428 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7899.8, nsentences=120, sample_size=4086.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1967.5, ups=0.25, wpb=7899.8, bsz=120, num_updates=4420, lr=2.95801e-05, gnorm=0.908, clip=0, loss_scale=64, train_wall=40, gb_free=28, wall=18049 2023-05-01 07:34:37 - progress_bar.py[line:274] - INFO: epoch 001: 4438 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7681.6, nsentences=120, sample_size=4139.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1924.1, ups=0.25, wpb=7681.6, bsz=120, num_updates=4430, lr=2.95748e-05, gnorm=0.905, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=18089 2023-05-01 07:35:16 - progress_bar.py[line:274] - INFO: epoch 001: 4448 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7744.2, nsentences=120, sample_size=3903.4, 
sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1969.2, ups=0.25, wpb=7744.2, bsz=120, num_updates=4440, lr=2.95695e-05, gnorm=0.915, clip=20, loss_scale=64, train_wall=39, gb_free=30.5, wall=18128 2023-05-01 07:35:55 - progress_bar.py[line:274] - INFO: epoch 001: 4458 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7614.7, nsentences=120, sample_size=3696.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1960.4, ups=0.26, wpb=7614.7, bsz=120, num_updates=4450, lr=2.95642e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=39, gb_free=30.7, wall=18167 2023-05-01 07:36:35 - progress_bar.py[line:274] - INFO: epoch 001: 4468 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=8003.2, nsentences=120, sample_size=4132.3, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1984.2, ups=0.25, wpb=8003.2, bsz=120, num_updates=4460, lr=2.95589e-05, gnorm=0.918, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=18208 2023-05-01 07:37:15 - progress_bar.py[line:274] - INFO: epoch 001: 4478 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7803.2, nsentences=120, sample_size=3931.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1976.9, ups=0.25, wpb=7803.2, bsz=120, num_updates=4470, lr=2.95537e-05, gnorm=0.92, clip=0, loss_scale=64, train_wall=39, gb_free=30.3, wall=18247 2023-05-01 07:37:54 - progress_bar.py[line:274] - INFO: epoch 001: 4488 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7682.5, nsentences=120, sample_size=4065.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1950.9, ups=0.25, wpb=7682.5, bsz=120, num_updates=4480, lr=2.95484e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=39, gb_free=29.8, wall=18286 2023-05-01 07:38:34 - progress_bar.py[line:274] - INFO: epoch 001: 4498 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7673.9, nsentences=120, sample_size=4152.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1907.8, ups=0.25, wpb=7673.9, bsz=120, 
num_updates=4490, lr=2.95431e-05, gnorm=0.918, clip=0, loss_scale=64, train_wall=40, gb_free=30.1, wall=18327 2023-05-01 07:39:14 - progress_bar.py[line:274] - INFO: epoch 001: 4508 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7470.9, nsentences=120, sample_size=3885.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1869.6, ups=0.25, wpb=7470.9, bsz=120, num_updates=4500, lr=2.95378e-05, gnorm=0.922, clip=0, loss_scale=64, train_wall=40, gb_free=30.1, wall=18367 2023-05-01 07:39:54 - progress_bar.py[line:274] - INFO: epoch 001: 4518 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7520.1, nsentences=120, sample_size=4187, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1897.1, ups=0.25, wpb=7520.1, bsz=120, num_updates=4510, lr=2.95325e-05, gnorm=0.923, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=18406 2023-05-01 07:40:33 - progress_bar.py[line:274] - INFO: epoch 001: 4528 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7375.4, nsentences=120, sample_size=4021.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1865.6, ups=0.25, wpb=7375.4, bsz=120, num_updates=4520, lr=2.95272e-05, gnorm=0.908, clip=0, loss_scale=64, train_wall=39, gb_free=30.1, wall=18446 2023-05-01 07:41:12 - progress_bar.py[line:274] - INFO: epoch 001: 4538 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7595.2, nsentences=120, sample_size=4095.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1985.2, ups=0.26, wpb=7595.2, bsz=120, num_updates=4530, lr=2.9522e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=38, gb_free=30.5, wall=18484 2023-05-01 07:41:51 - progress_bar.py[line:274] - INFO: epoch 001: 4548 / 6042 loss=2.5, loss_v1=0, loss_v2=0, nll_loss=1.259, ntokens=7878.6, nsentences=120, sample_size=4017.4, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1989.5, ups=0.25, wpb=7878.6, bsz=120, num_updates=4540, lr=2.95167e-05, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=28.1, 
wall=18524 2023-05-01 07:42:31 - progress_bar.py[line:274] - INFO: epoch 001: 4558 / 6042 loss=2.514, loss_v1=0, loss_v2=0, nll_loss=1.28, ntokens=7827.7, nsentences=120, sample_size=4203.8, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=1978.7, ups=0.25, wpb=7827.7, bsz=120, num_updates=4550, lr=2.95114e-05, gnorm=0.904, clip=0, loss_scale=64, train_wall=39, gb_free=30, wall=18563 2023-05-01 07:43:10 - progress_bar.py[line:274] - INFO: epoch 001: 4568 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7501.4, nsentences=120, sample_size=3846.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1886.6, ups=0.25, wpb=7501.4, bsz=120, num_updates=4560, lr=2.95061e-05, gnorm=0.934, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=18603 2023-05-01 07:43:50 - progress_bar.py[line:274] - INFO: epoch 001: 4578 / 6042 loss=2.48, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7627.4, nsentences=120, sample_size=3997.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1943.1, ups=0.25, wpb=7627.4, bsz=120, num_updates=4570, lr=2.95008e-05, gnorm=0.905, clip=0, loss_scale=64, train_wall=39, gb_free=28.5, wall=18642 2023-05-01 07:44:29 - progress_bar.py[line:274] - INFO: epoch 001: 4588 / 6042 loss=2.488, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7603, nsentences=120, sample_size=3961.9, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1949.8, ups=0.26, wpb=7603, bsz=120, num_updates=4580, lr=2.94956e-05, gnorm=0.916, clip=0, loss_scale=64, train_wall=39, gb_free=30.2, wall=18681 2023-05-01 07:45:08 - progress_bar.py[line:274] - INFO: epoch 001: 4598 / 6042 loss=2.502, loss_v1=0, loss_v2=0, nll_loss=1.264, ntokens=7714.1, nsentences=120, sample_size=4132, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1965, ups=0.25, wpb=7714.1, bsz=120, num_updates=4590, lr=2.94903e-05, gnorm=0.881, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=18720 2023-05-01 07:45:48 - progress_bar.py[line:274] - INFO: epoch 001: 4608 / 6042 loss=2.479, 
loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=7957.8, nsentences=120, sample_size=4044.5, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1983.8, ups=0.25, wpb=7957.8, bsz=120, num_updates=4600, lr=2.9485e-05, gnorm=0.896, clip=0, loss_scale=64, train_wall=40, gb_free=30.5, wall=18761 2023-05-01 07:46:28 - progress_bar.py[line:274] - INFO: epoch 001: 4618 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.248, ntokens=7896.1, nsentences=120, sample_size=4056.6, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1970.9, ups=0.25, wpb=7896.1, bsz=120, num_updates=4610, lr=2.94797e-05, gnorm=0.925, clip=10, loss_scale=128, train_wall=40, gb_free=28.4, wall=18801 2023-05-01 07:46:56 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 07:47:12 - progress_bar.py[line:274] - INFO: epoch 001: 4629 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7806.4, nsentences=120, sample_size=3993.5, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1785.3, ups=0.23, wpb=7806.4, bsz=120, num_updates=4620, lr=2.94744e-05, gnorm=0.908, clip=0, loss_scale=64, train_wall=44, gb_free=28.5, wall=18844 2023-05-01 07:47:51 - progress_bar.py[line:274] - INFO: epoch 001: 4639 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7870.6, nsentences=120, sample_size=4163.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1994.8, ups=0.25, wpb=7870.6, bsz=120, num_updates=4630, lr=2.94691e-05, gnorm=0.864, clip=0, loss_scale=64, train_wall=39, gb_free=29.8, wall=18884 2023-05-01 07:48:32 - progress_bar.py[line:274] - INFO: epoch 001: 4649 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=8011.7, nsentences=120, sample_size=3867.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1970, ups=0.25, wpb=8011.7, bsz=120, num_updates=4640, lr=2.94639e-05, gnorm=0.908, clip=0, loss_scale=64, train_wall=41, gb_free=28.3, wall=18925 2023-05-01 07:49:12 - progress_bar.py[line:274] - INFO: 
epoch 001: 4659 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7803.3, nsentences=120, sample_size=4134.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1952.1, ups=0.25, wpb=7803.3, bsz=120, num_updates=4650, lr=2.94586e-05, gnorm=0.906, clip=0, loss_scale=64, train_wall=40, gb_free=28.1, wall=18964 2023-05-01 07:49:52 - progress_bar.py[line:274] - INFO: epoch 001: 4669 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7723.7, nsentences=120, sample_size=3816, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1945.7, ups=0.25, wpb=7723.7, bsz=120, num_updates=4660, lr=2.94533e-05, gnorm=0.914, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=19004 2023-05-01 07:50:31 - progress_bar.py[line:274] - INFO: epoch 001: 4679 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7671, nsentences=120, sample_size=3872.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1944.9, ups=0.25, wpb=7671, bsz=120, num_updates=4670, lr=2.9448e-05, gnorm=0.93, clip=20, loss_scale=64, train_wall=39, gb_free=29.7, wall=19044 2023-05-01 07:51:11 - progress_bar.py[line:274] - INFO: epoch 001: 4689 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7973.9, nsentences=120, sample_size=4170.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2007.2, ups=0.25, wpb=7973.9, bsz=120, num_updates=4680, lr=2.94427e-05, gnorm=0.932, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=19083 2023-05-01 07:51:52 - progress_bar.py[line:274] - INFO: epoch 001: 4699 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7855.1, nsentences=120, sample_size=3882.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1928, ups=0.25, wpb=7855.1, bsz=120, num_updates=4690, lr=2.94375e-05, gnorm=0.903, clip=10, loss_scale=64, train_wall=41, gb_free=30.5, wall=19124 2023-05-01 07:52:32 - progress_bar.py[line:274] - INFO: epoch 001: 4709 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7971.9, nsentences=120, 
sample_size=4132, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1992.5, ups=0.25, wpb=7971.9, bsz=120, num_updates=4700, lr=2.94322e-05, gnorm=0.884, clip=0, loss_scale=64, train_wall=40, gb_free=30.1, wall=19164 2023-05-01 07:53:12 - progress_bar.py[line:274] - INFO: epoch 001: 4719 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7789.1, nsentences=120, sample_size=3961.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1937.6, ups=0.25, wpb=7789.1, bsz=120, num_updates=4710, lr=2.94269e-05, gnorm=0.887, clip=0, loss_scale=64, train_wall=40, gb_free=30.6, wall=19204 2023-05-01 07:53:51 - progress_bar.py[line:274] - INFO: epoch 001: 4729 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7804.3, nsentences=120, sample_size=3763.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1995.3, ups=0.26, wpb=7804.3, bsz=120, num_updates=4720, lr=2.94216e-05, gnorm=0.918, clip=10, loss_scale=64, train_wall=39, gb_free=29.2, wall=19243 2023-05-01 07:54:31 - progress_bar.py[line:274] - INFO: epoch 001: 4739 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7591.5, nsentences=120, sample_size=3864.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1873, ups=0.25, wpb=7591.5, bsz=120, num_updates=4730, lr=2.94163e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=19284 2023-05-01 07:55:11 - progress_bar.py[line:274] - INFO: epoch 001: 4749 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7441.5, nsentences=120, sample_size=4016.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1898.8, ups=0.26, wpb=7441.5, bsz=120, num_updates=4740, lr=2.9411e-05, gnorm=0.915, clip=0, loss_scale=64, train_wall=39, gb_free=30.7, wall=19323 2023-05-01 07:55:51 - progress_bar.py[line:274] - INFO: epoch 001: 4759 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7657.3, nsentences=120, sample_size=4025.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1905.4, ups=0.25, wpb=7657.3, 
bsz=120, num_updates=4750, lr=2.94058e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=40, gb_free=29.6, wall=19363 2023-05-01 07:56:30 - progress_bar.py[line:274] - INFO: epoch 001: 4769 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7527, nsentences=120, sample_size=3948.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1903.1, ups=0.25, wpb=7527, bsz=120, num_updates=4760, lr=2.94005e-05, gnorm=0.91, clip=0, loss_scale=64, train_wall=39, gb_free=29.7, wall=19403 2023-05-01 07:57:09 - progress_bar.py[line:274] - INFO: epoch 001: 4779 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7728.2, nsentences=120, sample_size=4173.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1991.8, ups=0.26, wpb=7728.2, bsz=120, num_updates=4770, lr=2.93952e-05, gnorm=0.883, clip=0, loss_scale=64, train_wall=39, gb_free=28.8, wall=19442 2023-05-01 07:57:49 - progress_bar.py[line:274] - INFO: epoch 001: 4789 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7746.1, nsentences=120, sample_size=4028.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1925.8, ups=0.25, wpb=7746.1, bsz=120, num_updates=4780, lr=2.93899e-05, gnorm=0.898, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=19482 2023-05-01 07:58:29 - progress_bar.py[line:274] - INFO: epoch 001: 4799 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=8093.5, nsentences=120, sample_size=4121, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2024.9, ups=0.25, wpb=8093.5, bsz=120, num_updates=4790, lr=2.93846e-05, gnorm=0.911, clip=0, loss_scale=64, train_wall=40, gb_free=27.2, wall=19522 2023-05-01 07:59:09 - progress_bar.py[line:274] - INFO: epoch 001: 4809 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7500.7, nsentences=120, sample_size=4045.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1910.4, ups=0.25, wpb=7500.7, bsz=120, num_updates=4800, lr=2.93793e-05, gnorm=0.911, clip=10, loss_scale=64, train_wall=39, gb_free=30.9, 
wall=19561 2023-05-01 07:59:48 - progress_bar.py[line:274] - INFO: epoch 001: 4819 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7485.9, nsentences=120, sample_size=4015.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1896, ups=0.25, wpb=7485.9, bsz=120, num_updates=4810, lr=2.93741e-05, gnorm=0.888, clip=0, loss_scale=64, train_wall=39, gb_free=30.2, wall=19601 2023-05-01 08:00:27 - progress_bar.py[line:274] - INFO: epoch 001: 4829 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7988.8, nsentences=120, sample_size=3819.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2045.1, ups=0.26, wpb=7988.8, bsz=120, num_updates=4820, lr=2.93688e-05, gnorm=0.975, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=19640 2023-05-01 08:01:07 - progress_bar.py[line:274] - INFO: epoch 001: 4839 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7736.3, nsentences=120, sample_size=4097, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1943.9, ups=0.25, wpb=7736.3, bsz=120, num_updates=4830, lr=2.93635e-05, gnorm=0.888, clip=0, loss_scale=64, train_wall=40, gb_free=30.3, wall=19679 2023-05-01 08:01:47 - progress_bar.py[line:274] - INFO: epoch 001: 4849 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7801.3, nsentences=120, sample_size=3943, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1940.8, ups=0.25, wpb=7801.3, bsz=120, num_updates=4840, lr=2.93582e-05, gnorm=0.912, clip=0, loss_scale=64, train_wall=40, gb_free=30, wall=19720 2023-05-01 08:02:19 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-01 08:02:31 - progress_bar.py[line:274] - INFO: epoch 001: 4860 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7702.4, nsentences=120, sample_size=3744.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1741.9, ups=0.23, wpb=7702.4, bsz=120, num_updates=4850, lr=2.93529e-05, gnorm=0.939, clip=20, loss_scale=32, 
train_wall=44, gb_free=29.8, wall=19764 2023-05-01 08:03:11 - progress_bar.py[line:274] - INFO: epoch 001: 4870 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7692.2, nsentences=120, sample_size=3854, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1940.5, ups=0.25, wpb=7692.2, bsz=120, num_updates=4860, lr=2.93477e-05, gnorm=0.976, clip=40, loss_scale=32, train_wall=40, gb_free=29.5, wall=19804 2023-05-01 08:03:51 - progress_bar.py[line:274] - INFO: epoch 001: 4880 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7710.1, nsentences=120, sample_size=3942.4, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1921.1, ups=0.25, wpb=7710.1, bsz=120, num_updates=4870, lr=2.93424e-05, gnorm=0.935, clip=10, loss_scale=32, train_wall=40, gb_free=29.3, wall=19844 2023-05-01 08:04:31 - progress_bar.py[line:274] - INFO: epoch 001: 4890 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7840.9, nsentences=120, sample_size=4175.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1968.1, ups=0.25, wpb=7840.9, bsz=120, num_updates=4880, lr=2.93371e-05, gnorm=0.875, clip=0, loss_scale=32, train_wall=40, gb_free=25.5, wall=19884 2023-05-01 08:05:10 - progress_bar.py[line:274] - INFO: epoch 001: 4900 / 6042 loss=2.488, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=7626.9, nsentences=120, sample_size=4240, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1941.9, ups=0.25, wpb=7626.9, bsz=120, num_updates=4890, lr=2.93318e-05, gnorm=0.885, clip=0, loss_scale=32, train_wall=39, gb_free=29.6, wall=19923 2023-05-01 08:05:50 - progress_bar.py[line:274] - INFO: epoch 001: 4910 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.231, ntokens=7406.7, nsentences=120, sample_size=3828.4, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1876.5, ups=0.25, wpb=7406.7, bsz=120, num_updates=4900, lr=2.93265e-05, gnorm=0.931, clip=0, loss_scale=32, train_wall=39, gb_free=29.7, wall=19962 2023-05-01 08:06:29 - progress_bar.py[line:274] - INFO: epoch 
001: 4920 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7700.8, nsentences=120, sample_size=4288.2, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1954.2, ups=0.25, wpb=7700.8, bsz=120, num_updates=4910, lr=2.93212e-05, gnorm=0.884, clip=0, loss_scale=32, train_wall=39, gb_free=29.2, wall=20002 2023-05-01 08:07:10 - progress_bar.py[line:274] - INFO: epoch 001: 4930 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.222, ntokens=7501.8, nsentences=120, sample_size=3941.5, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1850.7, ups=0.25, wpb=7501.8, bsz=120, num_updates=4920, lr=2.9316e-05, gnorm=0.909, clip=0, loss_scale=32, train_wall=40, gb_free=29.6, wall=20042 2023-05-01 08:07:49 - progress_bar.py[line:274] - INFO: epoch 001: 4940 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7520.9, nsentences=120, sample_size=4427.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1902.4, ups=0.25, wpb=7520.9, bsz=120, num_updates=4930, lr=2.93107e-05, gnorm=0.853, clip=0, loss_scale=32, train_wall=39, gb_free=30.3, wall=20082 2023-05-01 08:08:30 - progress_bar.py[line:274] - INFO: epoch 001: 4950 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7687.4, nsentences=120, sample_size=3904.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1910.8, ups=0.25, wpb=7687.4, bsz=120, num_updates=4940, lr=2.93054e-05, gnorm=0.896, clip=0, loss_scale=32, train_wall=40, gb_free=28.5, wall=20122 2023-05-01 08:09:09 - progress_bar.py[line:274] - INFO: epoch 001: 4960 / 6042 loss=2.48, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7421, nsentences=120, sample_size=4124.4, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1856.8, ups=0.25, wpb=7421, bsz=120, num_updates=4950, lr=2.93001e-05, gnorm=0.906, clip=10, loss_scale=32, train_wall=40, gb_free=30.1, wall=20162 2023-05-01 08:09:50 - progress_bar.py[line:274] - INFO: epoch 001: 4970 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.236, ntokens=7684.4, nsentences=120, 
sample_size=3987.7, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1920.1, ups=0.25, wpb=7684.4, bsz=120, num_updates=4960, lr=2.92948e-05, gnorm=0.92, clip=10, loss_scale=32, train_wall=40, gb_free=29.4, wall=20202 2023-05-01 08:10:30 - progress_bar.py[line:274] - INFO: epoch 001: 4980 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.227, ntokens=7881.7, nsentences=120, sample_size=3861.5, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1962.5, ups=0.25, wpb=7881.7, bsz=120, num_updates=4970, lr=2.92896e-05, gnorm=0.934, clip=20, loss_scale=32, train_wall=40, gb_free=29.6, wall=20242 2023-05-01 08:11:09 - progress_bar.py[line:274] - INFO: epoch 001: 4990 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7576.7, nsentences=120, sample_size=4111.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1905.5, ups=0.25, wpb=7576.7, bsz=120, num_updates=4980, lr=2.92843e-05, gnorm=0.886, clip=0, loss_scale=32, train_wall=40, gb_free=29.5, wall=20282 2023-05-01 08:11:49 - progress_bar.py[line:274] - INFO: epoch 001: 5000 / 6042 loss=2.48, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=8180.1, nsentences=120, sample_size=3828.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=2043.8, ups=0.25, wpb=8180.1, bsz=120, num_updates=4990, lr=2.9279e-05, gnorm=0.908, clip=10, loss_scale=32, train_wall=40, gb_free=29.6, wall=20322 2023-05-01 08:12:29 - progress_bar.py[line:274] - INFO: epoch 001: 5010 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7453.9, nsentences=120, sample_size=4299.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1862.1, ups=0.25, wpb=7453.9, bsz=120, num_updates=5000, lr=2.92737e-05, gnorm=0.862, clip=0, loss_scale=32, train_wall=40, gb_free=30.7, wall=20362 2023-05-01 08:12:29 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 08:12:31 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 08:12:31 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 08:12:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 08:12:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:48 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 08:12:48 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 08:12:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 08:12:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:12:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:12:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:00 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 08:13:00 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 08:13:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 08:13:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:12 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 08:13:12 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 08:13:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:16 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 08:13:16 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 08:13:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:20 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 08:13:20 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 08:13:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 08:13:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 08:13:21 - progress_bar.py[line:282] - INFO: epoch 001 | valid on 'valid' subset | loss 3.211 | loss_v1 0 | loss_v2 0 | nll_loss 2.043 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.12 | score 0.7417 | wps 3274.8 | wpb 3202.1 | bsz 39.4 | num_updates 5000 | best_score 0.749 2023-05-01 08:13:21 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 1 @ 5000 updates 2023-05-01 08:13:21 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_5000.pt 2023-05-01 08:13:45 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_5000.pt 2023-05-01 08:13:59 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_5000.pt (epoch 1 @ 5000 updates, score 0.7417) (writing took 37.941789399133995 seconds) 2023-05-01 08:14:38 - progress_bar.py[line:274] - INFO: epoch 001: 5020 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.218, ntokens=7748.6, nsentences=120, sample_size=3848.4, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=604.1, ups=0.08, wpb=7748.6, bsz=120, num_updates=5010, lr=2.92684e-05, gnorm=0.922, clip=20, loss_scale=32, train_wall=39, gb_free=30.2, wall=20490 2023-05-01 08:15:17 - progress_bar.py[line:274] - INFO: epoch 001: 5030 / 6042 loss=2.478, loss_v1=0, loss_v2=0, nll_loss=1.232, ntokens=7733.1, nsentences=120, sample_size=4045.4, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1949, ups=0.25, wpb=7733.1, bsz=120, num_updates=5020, 
lr=2.92631e-05, gnorm=0.899, clip=10, loss_scale=32, train_wall=40, gb_free=29, wall=20530 2023-05-01 08:15:57 - progress_bar.py[line:274] - INFO: epoch 001: 5040 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7653.1, nsentences=120, sample_size=4117, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1934.3, ups=0.25, wpb=7653.1, bsz=120, num_updates=5030, lr=2.92579e-05, gnorm=0.891, clip=10, loss_scale=32, train_wall=39, gb_free=29.4, wall=20569 2023-05-01 08:16:37 - progress_bar.py[line:274] - INFO: epoch 001: 5050 / 6042 loss=2.509, loss_v1=0, loss_v2=0, nll_loss=1.267, ntokens=7735.3, nsentences=120, sample_size=3931.9, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1958.2, ups=0.25, wpb=7735.3, bsz=120, num_updates=5040, lr=2.92526e-05, gnorm=0.918, clip=20, loss_scale=32, train_wall=39, gb_free=29.8, wall=20609 2023-05-01 08:17:17 - progress_bar.py[line:274] - INFO: epoch 001: 5060 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7495.5, nsentences=120, sample_size=4132.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1870.7, ups=0.25, wpb=7495.5, bsz=120, num_updates=5050, lr=2.92473e-05, gnorm=0.943, clip=10, loss_scale=32, train_wall=40, gb_free=29.7, wall=20649 2023-05-01 08:17:57 - progress_bar.py[line:274] - INFO: epoch 001: 5070 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7740.5, nsentences=120, sample_size=4145.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1909.6, ups=0.25, wpb=7740.5, bsz=120, num_updates=5060, lr=2.9242e-05, gnorm=0.878, clip=0, loss_scale=32, train_wall=40, gb_free=27.6, wall=20690 2023-05-01 08:18:37 - progress_bar.py[line:274] - INFO: epoch 001: 5080 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7869.1, nsentences=120, sample_size=3922.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1975.5, ups=0.25, wpb=7869.1, bsz=120, num_updates=5070, lr=2.92367e-05, gnorm=0.896, clip=10, loss_scale=32, train_wall=40, gb_free=29.3, wall=20729 
2023-05-01 08:19:16 - progress_bar.py[line:274] - INFO: epoch 001: 5090 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7622.1, nsentences=120, sample_size=3913.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1936, ups=0.25, wpb=7622.1, bsz=120, num_updates=5080, lr=2.92314e-05, gnorm=0.91, clip=10, loss_scale=32, train_wall=39, gb_free=30, wall=20769 2023-05-01 08:19:56 - progress_bar.py[line:274] - INFO: epoch 001: 5100 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7360.1, nsentences=120, sample_size=3892.1, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1858.7, ups=0.25, wpb=7360.1, bsz=120, num_updates=5090, lr=2.92262e-05, gnorm=0.924, clip=0, loss_scale=32, train_wall=40, gb_free=29, wall=20808 2023-05-01 08:20:35 - progress_bar.py[line:274] - INFO: epoch 001: 5110 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7796.4, nsentences=120, sample_size=4002.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1985.9, ups=0.25, wpb=7796.4, bsz=120, num_updates=5100, lr=2.92209e-05, gnorm=0.901, clip=0, loss_scale=32, train_wall=39, gb_free=30.9, wall=20848 2023-05-01 08:21:16 - progress_bar.py[line:274] - INFO: epoch 001: 5120 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7720.8, nsentences=120, sample_size=4250.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1901, ups=0.25, wpb=7720.8, bsz=120, num_updates=5110, lr=2.92156e-05, gnorm=0.887, clip=10, loss_scale=32, train_wall=41, gb_free=29.6, wall=20888 2023-05-01 08:21:55 - progress_bar.py[line:274] - INFO: epoch 001: 5130 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7845, nsentences=120, sample_size=3954.3, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=2017.8, ups=0.26, wpb=7845, bsz=120, num_updates=5120, lr=2.92103e-05, gnorm=0.901, clip=0, loss_scale=32, train_wall=39, gb_free=30.5, wall=20927 2023-05-01 08:22:35 - progress_bar.py[line:274] - INFO: epoch 001: 5140 / 6042 loss=2.37, loss_v1=0, loss_v2=0, 
nll_loss=1.105, ntokens=7524.4, nsentences=120, sample_size=4282.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1887.7, ups=0.25, wpb=7524.4, bsz=120, num_updates=5130, lr=2.9205e-05, gnorm=0.891, clip=0, loss_scale=32, train_wall=40, gb_free=30.2, wall=20967 2023-05-01 08:23:14 - progress_bar.py[line:274] - INFO: epoch 001: 5150 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7640.9, nsentences=120, sample_size=4085.2, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1929.4, ups=0.25, wpb=7640.9, bsz=120, num_updates=5140, lr=2.91998e-05, gnorm=0.939, clip=20, loss_scale=32, train_wall=40, gb_free=30.6, wall=21007 2023-05-01 08:23:55 - progress_bar.py[line:274] - INFO: epoch 001: 5160 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7763.5, nsentences=120, sample_size=4105.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1904.1, ups=0.25, wpb=7763.5, bsz=120, num_updates=5150, lr=2.91945e-05, gnorm=0.889, clip=0, loss_scale=32, train_wall=41, gb_free=30.1, wall=21047 2023-05-01 08:24:35 - progress_bar.py[line:274] - INFO: epoch 001: 5170 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7920.1, nsentences=120, sample_size=3979.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1963, ups=0.25, wpb=7920.1, bsz=120, num_updates=5160, lr=2.91892e-05, gnorm=0.909, clip=0, loss_scale=32, train_wall=40, gb_free=30, wall=21088 2023-05-01 08:25:15 - progress_bar.py[line:274] - INFO: epoch 001: 5180 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7711.2, nsentences=120, sample_size=4267.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1929, ups=0.25, wpb=7711.2, bsz=120, num_updates=5170, lr=2.91839e-05, gnorm=0.889, clip=10, loss_scale=32, train_wall=40, gb_free=30.2, wall=21128 2023-05-01 08:25:56 - progress_bar.py[line:274] - INFO: epoch 001: 5190 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7714, nsentences=120, sample_size=3727.1, sample_size_v1=0, sample_size_v2=0, 
ppl=2.15, wps=1914.9, ups=0.25, wpb=7714, bsz=120, num_updates=5180, lr=2.91786e-05, gnorm=0.918, clip=10, loss_scale=32, train_wall=40, gb_free=31, wall=21168 2023-05-01 08:26:35 - progress_bar.py[line:274] - INFO: epoch 001: 5200 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7515.7, nsentences=120, sample_size=3830.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1923.6, ups=0.26, wpb=7515.7, bsz=120, num_updates=5190, lr=2.91733e-05, gnorm=0.958, clip=20, loss_scale=32, train_wall=39, gb_free=30.7, wall=21207 2023-05-01 08:27:14 - progress_bar.py[line:274] - INFO: epoch 001: 5210 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7823.1, nsentences=120, sample_size=3780.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2008.7, ups=0.26, wpb=7823.1, bsz=120, num_updates=5200, lr=2.91681e-05, gnorm=0.972, clip=30, loss_scale=32, train_wall=39, gb_free=30.8, wall=21246 2023-05-01 08:27:54 - progress_bar.py[line:274] - INFO: epoch 001: 5220 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=8208.2, nsentences=120, sample_size=3893.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=2042, ups=0.25, wpb=8208.2, bsz=120, num_updates=5210, lr=2.91628e-05, gnorm=0.944, clip=10, loss_scale=32, train_wall=40, gb_free=27.6, wall=21286 2023-05-01 08:28:33 - progress_bar.py[line:274] - INFO: epoch 001: 5230 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7626.7, nsentences=120, sample_size=4116.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1939.9, ups=0.25, wpb=7626.7, bsz=120, num_updates=5220, lr=2.91575e-05, gnorm=0.909, clip=0, loss_scale=32, train_wall=39, gb_free=28, wall=21326 2023-05-01 08:29:13 - progress_bar.py[line:274] - INFO: epoch 001: 5240 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7826.8, nsentences=120, sample_size=3811.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1977.6, ups=0.25, wpb=7826.8, bsz=120, num_updates=5230, lr=2.91522e-05, gnorm=0.913, 
clip=10, loss_scale=32, train_wall=40, gb_free=29.4, wall=21365 2023-05-01 08:29:53 - progress_bar.py[line:274] - INFO: epoch 001: 5250 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7892.4, nsentences=120, sample_size=4031.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1954.2, ups=0.25, wpb=7892.4, bsz=120, num_updates=5240, lr=2.91469e-05, gnorm=0.917, clip=10, loss_scale=32, train_wall=40, gb_free=30.8, wall=21405 2023-05-01 08:30:32 - progress_bar.py[line:274] - INFO: epoch 001: 5260 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7698.5, nsentences=120, sample_size=4241.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1966.7, ups=0.26, wpb=7698.5, bsz=120, num_updates=5250, lr=2.91416e-05, gnorm=0.9, clip=0, loss_scale=32, train_wall=39, gb_free=29.4, wall=21445 2023-05-01 08:31:12 - progress_bar.py[line:274] - INFO: epoch 001: 5270 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7731.4, nsentences=120, sample_size=3854, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1961.4, ups=0.25, wpb=7731.4, bsz=120, num_updates=5260, lr=2.91364e-05, gnorm=0.943, clip=20, loss_scale=32, train_wall=39, gb_free=31, wall=21484 2023-05-01 08:31:51 - progress_bar.py[line:274] - INFO: epoch 001: 5280 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7983.5, nsentences=120, sample_size=3820.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2020.8, ups=0.25, wpb=7983.5, bsz=120, num_updates=5270, lr=2.91311e-05, gnorm=0.914, clip=0, loss_scale=32, train_wall=39, gb_free=30.1, wall=21524 2023-05-01 08:32:30 - progress_bar.py[line:274] - INFO: epoch 001: 5290 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7353.9, nsentences=120, sample_size=4050.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1880.6, ups=0.26, wpb=7353.9, bsz=120, num_updates=5280, lr=2.91258e-05, gnorm=0.905, clip=0, loss_scale=32, train_wall=39, gb_free=30, wall=21563 2023-05-01 08:33:10 - 
progress_bar.py[line:274] - INFO: epoch 001: 5300 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7490.5, nsentences=120, sample_size=3921.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1904.6, ups=0.25, wpb=7490.5, bsz=120, num_updates=5290, lr=2.91205e-05, gnorm=0.922, clip=10, loss_scale=32, train_wall=39, gb_free=30.2, wall=21602 2023-05-01 08:33:49 - progress_bar.py[line:274] - INFO: epoch 001: 5310 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7626.7, nsentences=120, sample_size=3941.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1940.5, ups=0.25, wpb=7626.7, bsz=120, num_updates=5300, lr=2.91152e-05, gnorm=0.909, clip=0, loss_scale=32, train_wall=39, gb_free=29.4, wall=21641 2023-05-01 08:34:28 - progress_bar.py[line:274] - INFO: epoch 001: 5320 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7512, nsentences=120, sample_size=4064.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1911.4, ups=0.25, wpb=7512, bsz=120, num_updates=5310, lr=2.911e-05, gnorm=0.905, clip=10, loss_scale=32, train_wall=39, gb_free=30.8, wall=21681 2023-05-01 08:35:08 - progress_bar.py[line:274] - INFO: epoch 001: 5330 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7867.5, nsentences=120, sample_size=4444.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1985.1, ups=0.25, wpb=7867.5, bsz=120, num_updates=5320, lr=2.91047e-05, gnorm=0.858, clip=0, loss_scale=32, train_wall=40, gb_free=29.1, wall=21720 2023-05-01 08:35:48 - progress_bar.py[line:274] - INFO: epoch 001: 5340 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7761.9, nsentences=120, sample_size=3940.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1950.3, ups=0.25, wpb=7761.9, bsz=120, num_updates=5330, lr=2.90994e-05, gnorm=0.902, clip=10, loss_scale=32, train_wall=40, gb_free=29.7, wall=21760 2023-05-01 08:36:27 - progress_bar.py[line:274] - INFO: epoch 001: 5350 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.109, 
ntokens=7899.6, nsentences=120, sample_size=3658.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1983.6, ups=0.25, wpb=7899.6, bsz=120, num_updates=5340, lr=2.90941e-05, gnorm=0.923, clip=10, loss_scale=32, train_wall=40, gb_free=29.9, wall=21800 2023-05-01 08:37:07 - progress_bar.py[line:274] - INFO: epoch 001: 5360 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=8076.6, nsentences=120, sample_size=4120.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2027.6, ups=0.25, wpb=8076.6, bsz=120, num_updates=5350, lr=2.90888e-05, gnorm=0.904, clip=0, loss_scale=32, train_wall=40, gb_free=30.2, wall=21840 2023-05-01 08:37:47 - progress_bar.py[line:274] - INFO: epoch 001: 5370 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7622.8, nsentences=120, sample_size=4166.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1929.1, ups=0.25, wpb=7622.8, bsz=120, num_updates=5360, lr=2.90835e-05, gnorm=0.888, clip=0, loss_scale=64, train_wall=39, gb_free=31, wall=21879 2023-05-01 08:38:28 - progress_bar.py[line:274] - INFO: epoch 001: 5380 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7796.2, nsentences=120, sample_size=4048.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1908.8, ups=0.24, wpb=7796.2, bsz=120, num_updates=5370, lr=2.90783e-05, gnorm=0.882, clip=10, loss_scale=64, train_wall=41, gb_free=30, wall=21920 2023-05-01 08:39:08 - progress_bar.py[line:274] - INFO: epoch 001: 5390 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7808.4, nsentences=120, sample_size=4018.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1945.2, ups=0.25, wpb=7808.4, bsz=120, num_updates=5380, lr=2.9073e-05, gnorm=0.895, clip=20, loss_scale=64, train_wall=40, gb_free=23.6, wall=21960 2023-05-01 08:39:47 - progress_bar.py[line:274] - INFO: epoch 001: 5400 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7791.3, nsentences=120, sample_size=4299.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, 
wps=1990.1, ups=0.26, wpb=7791.3, bsz=120, num_updates=5390, lr=2.90677e-05, gnorm=0.896, clip=20, loss_scale=64, train_wall=39, gb_free=28.2, wall=21999 2023-05-01 08:40:26 - progress_bar.py[line:274] - INFO: epoch 001: 5410 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7780.3, nsentences=120, sample_size=4022.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1979.6, ups=0.25, wpb=7780.3, bsz=120, num_updates=5400, lr=2.90624e-05, gnorm=0.895, clip=10, loss_scale=64, train_wall=39, gb_free=31.6, wall=22039 2023-05-01 08:41:06 - progress_bar.py[line:274] - INFO: epoch 001: 5420 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7692.3, nsentences=120, sample_size=3988.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1946.4, ups=0.25, wpb=7692.3, bsz=120, num_updates=5410, lr=2.90571e-05, gnorm=0.921, clip=10, loss_scale=64, train_wall=39, gb_free=26.8, wall=22078 2023-05-01 08:41:45 - progress_bar.py[line:274] - INFO: epoch 001: 5430 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7824, nsentences=120, sample_size=3846.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1989.4, ups=0.25, wpb=7824, bsz=120, num_updates=5420, lr=2.90519e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=39, gb_free=29.5, wall=22118 2023-05-01 08:42:25 - progress_bar.py[line:274] - INFO: epoch 001: 5440 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7587.4, nsentences=120, sample_size=3928.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1916.3, ups=0.25, wpb=7587.4, bsz=120, num_updates=5430, lr=2.90466e-05, gnorm=0.92, clip=0, loss_scale=64, train_wall=40, gb_free=30.3, wall=22157 2023-05-01 08:43:05 - progress_bar.py[line:274] - INFO: epoch 001: 5450 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7943.7, nsentences=120, sample_size=4103, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1972.3, ups=0.25, wpb=7943.7, bsz=120, num_updates=5440, lr=2.90413e-05, gnorm=0.907, clip=10, 
loss_scale=64, train_wall=40, gb_free=29.9, wall=22197 2023-05-01 08:43:45 - progress_bar.py[line:274] - INFO: epoch 001: 5460 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7496, nsentences=120, sample_size=4243.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1893.5, ups=0.25, wpb=7496, bsz=120, num_updates=5450, lr=2.9036e-05, gnorm=0.891, clip=0, loss_scale=64, train_wall=40, gb_free=29, wall=22237 2023-05-01 08:44:24 - progress_bar.py[line:274] - INFO: epoch 001: 5470 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7748.4, nsentences=120, sample_size=3955.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1969.2, ups=0.25, wpb=7748.4, bsz=120, num_updates=5460, lr=2.90307e-05, gnorm=0.877, clip=0, loss_scale=64, train_wall=39, gb_free=29.3, wall=22276 2023-05-01 08:45:03 - progress_bar.py[line:274] - INFO: epoch 001: 5480 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7414.4, nsentences=120, sample_size=4391.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1911, ups=0.26, wpb=7414.4, bsz=120, num_updates=5470, lr=2.90254e-05, gnorm=0.882, clip=0, loss_scale=64, train_wall=39, gb_free=26.4, wall=22315 2023-05-01 08:45:43 - progress_bar.py[line:274] - INFO: epoch 001: 5490 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7897.7, nsentences=120, sample_size=3955.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1966.4, ups=0.25, wpb=7897.7, bsz=120, num_updates=5480, lr=2.90202e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=40, gb_free=28.9, wall=22355 2023-05-01 08:46:23 - progress_bar.py[line:274] - INFO: epoch 001: 5500 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7978.7, nsentences=120, sample_size=4284.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1994, ups=0.25, wpb=7978.7, bsz=120, num_updates=5490, lr=2.90149e-05, gnorm=0.872, clip=0, loss_scale=64, train_wall=40, gb_free=29.1, wall=22395 2023-05-01 08:47:03 - progress_bar.py[line:274] - 
INFO: epoch 001: 5510 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7601.5, nsentences=120, sample_size=3824.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1878.5, ups=0.25, wpb=7601.5, bsz=120, num_updates=5500, lr=2.90096e-05, gnorm=0.901, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=22436 2023-05-01 08:47:43 - progress_bar.py[line:274] - INFO: epoch 001: 5520 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7705.8, nsentences=120, sample_size=4029.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1932.1, ups=0.25, wpb=7705.8, bsz=120, num_updates=5510, lr=2.90043e-05, gnorm=0.91, clip=10, loss_scale=64, train_wall=40, gb_free=29.1, wall=22476 2023-05-01 08:48:23 - progress_bar.py[line:274] - INFO: epoch 001: 5530 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7617, nsentences=120, sample_size=4069.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1936.4, ups=0.25, wpb=7617, bsz=120, num_updates=5520, lr=2.8999e-05, gnorm=0.921, clip=30, loss_scale=64, train_wall=39, gb_free=29.4, wall=22515 2023-05-01 08:49:03 - progress_bar.py[line:274] - INFO: epoch 001: 5540 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7639.6, nsentences=120, sample_size=4120.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1908.6, ups=0.25, wpb=7639.6, bsz=120, num_updates=5530, lr=2.89937e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=40, gb_free=30.5, wall=22555 2023-05-01 08:49:42 - progress_bar.py[line:274] - INFO: epoch 001: 5550 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7598, nsentences=120, sample_size=4175.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1932.5, ups=0.25, wpb=7598, bsz=120, num_updates=5540, lr=2.89885e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=39, gb_free=29.4, wall=22594 2023-05-01 08:50:21 - progress_bar.py[line:274] - INFO: epoch 001: 5560 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7674.2, 
nsentences=120, sample_size=4101.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1958.3, ups=0.26, wpb=7674.2, bsz=120, num_updates=5550, lr=2.89832e-05, gnorm=0.907, clip=0, loss_scale=64, train_wall=39, gb_free=29, wall=22634 2023-05-01 08:51:00 - progress_bar.py[line:274] - INFO: epoch 001: 5570 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7885.9, nsentences=120, sample_size=3907.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2006.3, ups=0.25, wpb=7885.9, bsz=120, num_updates=5560, lr=2.89779e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=39, gb_free=30.5, wall=22673 2023-05-01 08:51:41 - progress_bar.py[line:274] - INFO: epoch 001: 5580 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7738.9, nsentences=120, sample_size=4336.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1912.9, ups=0.25, wpb=7738.9, bsz=120, num_updates=5570, lr=2.89726e-05, gnorm=0.889, clip=0, loss_scale=64, train_wall=40, gb_free=28.8, wall=22713 2023-05-01 08:52:21 - progress_bar.py[line:274] - INFO: epoch 001: 5590 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7806.8, nsentences=120, sample_size=4053.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1944.3, ups=0.25, wpb=7806.8, bsz=120, num_updates=5580, lr=2.89673e-05, gnorm=0.89, clip=0, loss_scale=64, train_wall=40, gb_free=30.8, wall=22753 2023-05-01 08:53:00 - progress_bar.py[line:274] - INFO: epoch 001: 5600 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7837.5, nsentences=120, sample_size=3832.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2011.8, ups=0.26, wpb=7837.5, bsz=120, num_updates=5590, lr=2.89621e-05, gnorm=0.925, clip=0, loss_scale=64, train_wall=39, gb_free=28.4, wall=22792 2023-05-01 08:53:39 - progress_bar.py[line:274] - INFO: epoch 001: 5610 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7651.3, nsentences=120, sample_size=4359.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1958.4, ups=0.26, 
wpb=7651.3, bsz=120, num_updates=5600, lr=2.89568e-05, gnorm=0.902, clip=0, loss_scale=64, train_wall=39, gb_free=29.7, wall=22831 2023-05-01 08:54:19 - progress_bar.py[line:274] - INFO: epoch 001: 5620 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7821.7, nsentences=120, sample_size=3762.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1939.5, ups=0.25, wpb=7821.7, bsz=120, num_updates=5610, lr=2.89515e-05, gnorm=0.919, clip=20, loss_scale=64, train_wall=40, gb_free=27.4, wall=22872 2023-05-01 08:54:58 - progress_bar.py[line:274] - INFO: epoch 001: 5630 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7876.3, nsentences=120, sample_size=3874.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2021.6, ups=0.26, wpb=7876.3, bsz=120, num_updates=5620, lr=2.89462e-05, gnorm=0.956, clip=10, loss_scale=64, train_wall=39, gb_free=31.6, wall=22911 2023-05-01 08:55:37 - progress_bar.py[line:274] - INFO: epoch 001: 5640 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7387.2, nsentences=120, sample_size=4182.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1903.5, ups=0.26, wpb=7387.2, bsz=120, num_updates=5630, lr=2.89409e-05, gnorm=0.922, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=22950 2023-05-01 08:56:18 - progress_bar.py[line:274] - INFO: epoch 001: 5650 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7750.5, nsentences=120, sample_size=4112.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1886.5, ups=0.24, wpb=7750.5, bsz=120, num_updates=5640, lr=2.89356e-05, gnorm=0.904, clip=20, loss_scale=64, train_wall=41, gb_free=30.6, wall=22991 2023-05-01 08:56:57 - progress_bar.py[line:274] - INFO: epoch 001: 5660 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7535.2, nsentences=120, sample_size=4049, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1934.3, ups=0.26, wpb=7535.2, bsz=120, num_updates=5650, lr=2.89304e-05, gnorm=0.898, clip=0, loss_scale=64, 
train_wall=39, gb_free=30.9, wall=23030 2023-05-01 08:57:37 - progress_bar.py[line:274] - INFO: epoch 001: 5670 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7711.5, nsentences=120, sample_size=3981.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1943.4, ups=0.25, wpb=7711.5, bsz=120, num_updates=5660, lr=2.89251e-05, gnorm=0.91, clip=0, loss_scale=64, train_wall=40, gb_free=29.4, wall=23069 2023-05-01 08:58:17 - progress_bar.py[line:274] - INFO: epoch 001: 5680 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7475, nsentences=120, sample_size=4030.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1872.8, ups=0.25, wpb=7475, bsz=120, num_updates=5670, lr=2.89198e-05, gnorm=0.911, clip=0, loss_scale=64, train_wall=40, gb_free=29, wall=23109 2023-05-01 08:58:57 - progress_bar.py[line:274] - INFO: epoch 001: 5690 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=8120.6, nsentences=120, sample_size=4091.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2021.9, ups=0.25, wpb=8120.6, bsz=120, num_updates=5680, lr=2.89145e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=40, gb_free=30.5, wall=23149 2023-05-01 08:59:38 - progress_bar.py[line:274] - INFO: epoch 001: 5700 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7654, nsentences=120, sample_size=4241.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1882.9, ups=0.25, wpb=7654, bsz=120, num_updates=5690, lr=2.89092e-05, gnorm=0.895, clip=0, loss_scale=64, train_wall=41, gb_free=31.2, wall=23190 2023-05-01 09:00:18 - progress_bar.py[line:274] - INFO: epoch 001: 5710 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7788.3, nsentences=120, sample_size=4134, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1925.6, ups=0.25, wpb=7788.3, bsz=120, num_updates=5700, lr=2.8904e-05, gnorm=0.912, clip=0, loss_scale=64, train_wall=40, gb_free=30, wall=23230 2023-05-01 09:00:58 - progress_bar.py[line:274] - INFO: epoch 001: 5720 / 6042 
loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7978.3, nsentences=120, sample_size=4392.7, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2003.8, ups=0.25, wpb=7978.3, bsz=120, num_updates=5710, lr=2.88987e-05, gnorm=0.851, clip=0, loss_scale=64, train_wall=40, gb_free=29.3, wall=23270 2023-05-01 09:01:38 - progress_bar.py[line:274] - INFO: epoch 001: 5730 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7686.9, nsentences=120, sample_size=3969.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1931, ups=0.25, wpb=7686.9, bsz=120, num_updates=5720, lr=2.88934e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, gb_free=29.4, wall=23310 2023-05-01 09:02:17 - progress_bar.py[line:274] - INFO: epoch 001: 5740 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7644, nsentences=120, sample_size=4268.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1926.1, ups=0.25, wpb=7644, bsz=120, num_updates=5730, lr=2.88881e-05, gnorm=0.885, clip=0, loss_scale=64, train_wall=40, gb_free=29.2, wall=23350 2023-05-01 09:02:58 - progress_bar.py[line:274] - INFO: epoch 001: 5750 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7750.2, nsentences=120, sample_size=4129.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1927.8, ups=0.25, wpb=7750.2, bsz=120, num_updates=5740, lr=2.88828e-05, gnorm=0.891, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=23390 2023-05-01 09:03:38 - progress_bar.py[line:274] - INFO: epoch 001: 5760 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7557.3, nsentences=120, sample_size=3996.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1885.3, ups=0.25, wpb=7557.3, bsz=120, num_updates=5750, lr=2.88775e-05, gnorm=0.923, clip=0, loss_scale=64, train_wall=40, gb_free=29.4, wall=23430 2023-05-01 09:04:17 - progress_bar.py[line:274] - INFO: epoch 001: 5770 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7586, nsentences=120, sample_size=4288.5, 
sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1915.9, ups=0.25, wpb=7586, bsz=120, num_updates=5760, lr=2.88723e-05, gnorm=0.921, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=23470 2023-05-01 09:04:57 - progress_bar.py[line:274] - INFO: epoch 001: 5780 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7594.9, nsentences=120, sample_size=4552.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1923.1, ups=0.25, wpb=7594.9, bsz=120, num_updates=5770, lr=2.8867e-05, gnorm=0.883, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=23509 2023-05-01 09:05:37 - progress_bar.py[line:274] - INFO: epoch 001: 5790 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7980.3, nsentences=120, sample_size=4092.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1996.8, ups=0.25, wpb=7980.3, bsz=120, num_updates=5780, lr=2.88617e-05, gnorm=0.895, clip=10, loss_scale=64, train_wall=40, gb_free=30.7, wall=23549 2023-05-01 09:06:15 - progress_bar.py[line:274] - INFO: epoch 001: 5800 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7715.8, nsentences=120, sample_size=4229.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1996.9, ups=0.26, wpb=7715.8, bsz=120, num_updates=5790, lr=2.88564e-05, gnorm=0.889, clip=0, loss_scale=64, train_wall=39, gb_free=30.4, wall=23588 2023-05-01 09:06:31 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-01 09:06:58 - progress_bar.py[line:274] - INFO: epoch 001: 5811 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7975.4, nsentences=120, sample_size=4090.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1851.7, ups=0.23, wpb=7975.4, bsz=120, num_updates=5800, lr=2.88511e-05, gnorm=0.901, clip=0, loss_scale=32, train_wall=43, gb_free=27.7, wall=23631 2023-05-01 09:07:38 - progress_bar.py[line:274] - INFO: epoch 001: 5821 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7584.9, 
nsentences=120, sample_size=4038.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1932.2, ups=0.25, wpb=7584.9, bsz=120, num_updates=5810, lr=2.88458e-05, gnorm=0.914, clip=0, loss_scale=32, train_wall=39, gb_free=29.7, wall=23670 2023-05-01 09:08:17 - progress_bar.py[line:274] - INFO: epoch 001: 5831 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7491.7, nsentences=120, sample_size=4005.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1887.2, ups=0.25, wpb=7491.7, bsz=120, num_updates=5820, lr=2.88406e-05, gnorm=0.9, clip=10, loss_scale=32, train_wall=40, gb_free=30.3, wall=23710 2023-05-01 09:08:57 - progress_bar.py[line:274] - INFO: epoch 001: 5841 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7778.3, nsentences=120, sample_size=3939.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1956.4, ups=0.25, wpb=7778.3, bsz=120, num_updates=5830, lr=2.88353e-05, gnorm=0.929, clip=10, loss_scale=32, train_wall=40, gb_free=29.6, wall=23750 2023-05-01 09:09:37 - progress_bar.py[line:274] - INFO: epoch 001: 5851 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7557.1, nsentences=120, sample_size=4023.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1911.4, ups=0.25, wpb=7557.1, bsz=120, num_updates=5840, lr=2.883e-05, gnorm=0.937, clip=10, loss_scale=32, train_wall=39, gb_free=30, wall=23789 2023-05-01 09:10:16 - progress_bar.py[line:274] - INFO: epoch 001: 5861 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7590.9, nsentences=120, sample_size=4072.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1933.3, ups=0.25, wpb=7590.9, bsz=120, num_updates=5850, lr=2.88247e-05, gnorm=0.922, clip=10, loss_scale=32, train_wall=39, gb_free=30, wall=23828 2023-05-01 09:10:57 - progress_bar.py[line:274] - INFO: epoch 001: 5871 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=8039.2, nsentences=120, sample_size=4368, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1975.5, ups=0.25, 
wpb=8039.2, bsz=120, num_updates=5860, lr=2.88194e-05, gnorm=0.884, clip=0, loss_scale=32, train_wall=41, gb_free=29.4, wall=23869 2023-05-01 09:11:36 - progress_bar.py[line:274] - INFO: epoch 001: 5881 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7694.7, nsentences=120, sample_size=4450.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1934.8, ups=0.25, wpb=7694.7, bsz=120, num_updates=5870, lr=2.88142e-05, gnorm=0.861, clip=0, loss_scale=32, train_wall=40, gb_free=30.5, wall=23909 2023-05-01 09:12:16 - progress_bar.py[line:274] - INFO: epoch 001: 5891 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7322, nsentences=120, sample_size=4084.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1825.1, ups=0.25, wpb=7322, bsz=120, num_updates=5880, lr=2.88089e-05, gnorm=0.892, clip=0, loss_scale=32, train_wall=40, gb_free=28.8, wall=23949 2023-05-01 09:12:57 - progress_bar.py[line:274] - INFO: epoch 001: 5901 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7655.9, nsentences=120, sample_size=4058.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1889.2, ups=0.25, wpb=7655.9, bsz=120, num_updates=5890, lr=2.88036e-05, gnorm=0.922, clip=10, loss_scale=32, train_wall=40, gb_free=29.3, wall=23989 2023-05-01 09:13:37 - progress_bar.py[line:274] - INFO: epoch 001: 5911 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7659.7, nsentences=120, sample_size=4259.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1911.4, ups=0.25, wpb=7659.7, bsz=120, num_updates=5900, lr=2.87983e-05, gnorm=0.911, clip=0, loss_scale=32, train_wall=40, gb_free=28.7, wall=24030 2023-05-01 09:14:17 - progress_bar.py[line:274] - INFO: epoch 001: 5921 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7759.2, nsentences=120, sample_size=3991.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1953.3, ups=0.25, wpb=7759.2, bsz=120, num_updates=5910, lr=2.8793e-05, gnorm=0.93, clip=10, loss_scale=32, 
train_wall=40, gb_free=29.7, wall=24069 2023-05-01 09:14:58 - progress_bar.py[line:274] - INFO: epoch 001: 5931 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7808.4, nsentences=120, sample_size=3922.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1905.8, ups=0.24, wpb=7808.4, bsz=120, num_updates=5920, lr=2.87877e-05, gnorm=0.931, clip=0, loss_scale=32, train_wall=41, gb_free=30.1, wall=24110 2023-05-01 09:15:39 - progress_bar.py[line:274] - INFO: epoch 001: 5941 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=8129.7, nsentences=120, sample_size=3610.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1993.3, ups=0.25, wpb=8129.7, bsz=120, num_updates=5930, lr=2.87825e-05, gnorm=0.998, clip=30, loss_scale=32, train_wall=41, gb_free=31.4, wall=24151 2023-05-01 09:16:19 - progress_bar.py[line:274] - INFO: epoch 001: 5951 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7671.1, nsentences=120, sample_size=3962.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1894.4, ups=0.25, wpb=7671.1, bsz=120, num_updates=5940, lr=2.87772e-05, gnorm=0.966, clip=40, loss_scale=32, train_wall=40, gb_free=29.8, wall=24192 2023-05-01 09:16:59 - progress_bar.py[line:274] - INFO: epoch 001: 5961 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7800, nsentences=120, sample_size=4172.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1960.1, ups=0.25, wpb=7800, bsz=120, num_updates=5950, lr=2.87719e-05, gnorm=0.921, clip=10, loss_scale=32, train_wall=40, gb_free=30.2, wall=24231 2023-05-01 09:17:39 - progress_bar.py[line:274] - INFO: epoch 001: 5971 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7825.2, nsentences=120, sample_size=3915.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1961.6, ups=0.25, wpb=7825.2, bsz=120, num_updates=5960, lr=2.87666e-05, gnorm=0.938, clip=20, loss_scale=32, train_wall=40, gb_free=26.8, wall=24271 2023-05-01 09:18:18 - progress_bar.py[line:274] - INFO: epoch 
001: 5981 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7821.8, nsentences=120, sample_size=4070.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1969.5, ups=0.25, wpb=7821.8, bsz=120, num_updates=5970, lr=2.87613e-05, gnorm=0.93, clip=20, loss_scale=32, train_wall=40, gb_free=30.8, wall=24311 2023-05-01 09:18:58 - progress_bar.py[line:274] - INFO: epoch 001: 5991 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7712.9, nsentences=120, sample_size=4027.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1938.7, ups=0.25, wpb=7712.9, bsz=120, num_updates=5980, lr=2.87561e-05, gnorm=0.932, clip=20, loss_scale=32, train_wall=40, gb_free=28.8, wall=24351 2023-05-01 09:19:38 - progress_bar.py[line:274] - INFO: epoch 001: 6001 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7824.1, nsentences=120, sample_size=3827.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1988.6, ups=0.25, wpb=7824.1, bsz=120, num_updates=5990, lr=2.87508e-05, gnorm=0.963, clip=10, loss_scale=32, train_wall=39, gb_free=29.8, wall=24390 2023-05-01 09:20:18 - progress_bar.py[line:274] - INFO: epoch 001: 6011 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7688.3, nsentences=120, sample_size=3867.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1903.9, ups=0.25, wpb=7688.3, bsz=120, num_updates=6000, lr=2.87455e-05, gnorm=0.902, clip=0, loss_scale=32, train_wall=40, gb_free=29.9, wall=24430 2023-05-01 09:20:18 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 09:20:20 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 09:20:20 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 09:20:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 09:20:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:37 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 09:20:37 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 09:20:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 09:20:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:49 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 09:20:49 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 09:20:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 09:20:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:20:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:20:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:21:00 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 09:21:00 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 09:21:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:21:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:21:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:21:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:21:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:21:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:21:04 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the first one from the left? 2023-05-01 09:21:04 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 09:21:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:21:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:21:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:21:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:21:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:21:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:21:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:21:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:21:09 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 09:21:09 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 09:21:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:21:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:21:09 - progress_bar.py[line:282] - INFO: epoch 001 | valid on 'valid' subset | loss 3.23 | loss_v1 0 | loss_v2 0 | nll_loss 2.06 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.17 | score 0.751 | wps 3285.8 | wpb 3202.1 | bsz 39.4 | num_updates 6000 | best_score 0.751 2023-05-01 09:21:09 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 1 @ 6000 updates 2023-05-01 09:21:09 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_6000.pt 2023-05-01 09:21:33 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_6000.pt 2023-05-01 09:22:14 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_1_6000.pt (epoch 1 @ 6000 updates, score 0.751) (writing took 64.77220738097094 seconds) 2023-05-01 09:22:53 - progress_bar.py[line:274] - INFO: epoch 001: 6021 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7881.7, nsentences=120, sample_size=4343.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=508.3, ups=0.06, wpb=7881.7, bsz=120, num_updates=6010, lr=2.87402e-05, gnorm=0.885, clip=0, loss_scale=32, train_wall=39, gb_free=29.2, wall=24586 2023-05-01 09:23:32 - progress_bar.py[line:274] - INFO: epoch 001: 6031 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7883.4, nsentences=120, sample_size=4110.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2000.2, ups=0.25, wpb=7883.4, bsz=120, num_updates=6020, 
lr=2.87349e-05, gnorm=0.907, clip=10, loss_scale=32, train_wall=39, gb_free=30.4, wall=24625 2023-05-01 09:24:13 - progress_bar.py[line:274] - INFO: epoch 001: 6041 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7891, nsentences=120, sample_size=3936.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1964.9, ups=0.25, wpb=7891, bsz=120, num_updates=6030, lr=2.87296e-05, gnorm=0.917, clip=0, loss_scale=32, train_wall=40, gb_free=30.8, wall=24665 2023-05-01 09:24:15 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 09:24:17 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 09:24:17 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 09:24:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 09:24:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:33 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 09:24:33 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 09:24:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:46 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 09:24:46 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 09:24:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:57 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 09:24:57 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 09:24:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:24:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:24:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:25:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:25:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:25:01 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 09:25:01 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 09:25:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:25:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:25:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:25:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:25:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:25:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:25:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:25:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:25:06 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 09:25:06 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 09:25:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 09:25:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 09:25:06 - progress_bar.py[line:282] - INFO: epoch 001 | valid on 'valid' subset | loss 3.233 | loss_v1 0 | loss_v2 0 | nll_loss 2.067 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.19 | score 0.7432 | wps 3288.5 | wpb 3202.1 | bsz 39.4 | num_updates 6031 | best_score 0.751 2023-05-01 09:25:06 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 1 @ 6031 updates 2023-05-01 09:25:06 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-01 09:25:32 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-01 09:25:33 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt (epoch 1 @ 6031 updates, score 0.7432) (writing took 26.493908488890156 seconds) 2023-05-01 09:25:33 - train.py[line:332] - INFO: end of epoch 1 (average epoch stats below) 2023-05-01 09:25:33 - progress_bar.py[line:282] - INFO: epoch 001 | loss 2.465 | loss_v1 0 | loss_v2 0 | nll_loss 1.217 | ntokens 7727.15 | nsentences 119.992 | sample_size 4037.11 | sample_size_v1 0 | sample_size_v2 0 | ppl 2.32 | wps 1886.5 | ups 0.24 | wpb 7727.2 | bsz 120 | num_updates 6031 | lr 2.87291e-05 | gnorm 0.915 | clip 10 | loss_scale 32 | train_wall 23996 | gb_free 31 | wall 24745 2023-05-01 09:25:33 - trainer.py[line:639] - INFO: loading train data for epoch 2 2023-05-01 09:25:33 - dialog_dataset.py[line:647] - INFO: loading invig-train from /mnt/bn/hri-lq/datasets/hf-cache/invig 2023-05-01 09:25:33 - dialog_dataset.py[line:647] - INFO: loading 
guesswhat-train from /mnt/bn/hri-lq/datasets/hf-cache/guesswhat 2023-05-01 09:25:34 - dialog_dataset.py[line:647] - INFO: loading visdial-train from /mnt/bn/hri-lq/datasets/hf-cache/visdial 2023-05-01 09:25:36 - dialog_dataset.py[line:647] - INFO: loading refcoco-train from /mnt/bn/hri-lq/datasets/hf-cache/refcoco 2023-05-01 09:25:36 - dialog_dataset.py[line:647] - INFO: loading refcocog-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocog 2023-05-01 09:25:36 - dialog_dataset.py[line:647] - INFO: loading refcocoplus-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocoplus 2023-05-01 09:25:37 - dialog_dataset.py[line:647] - INFO: loading cc_sbu_align-train from /mnt/bn/hri-lq/datasets/hf-cache/cc_sbu_align 2023-05-01 09:25:37 - dialog_dataset.py[line:647] - INFO: loading llava_instruct_150k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_instruct_150k 2023-05-01 09:25:38 - dialog_dataset.py[line:647] - INFO: loading llava_conversation_58k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_conversation_58k 2023-05-01 09:25:38 - dialog_dataset.py[line:647] - INFO: loading llava_complex_reasoning_77k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_complex_reasoning_77k 2023-05-01 09:25:38 - dialog_dataset.py[line:647] - INFO: loading llava_detail_23k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_detail_23k 2023-05-01 09:25:38 - dialog_dataset.py[line:647] - INFO: loading openimages-train from /mnt/bn/hri-lq/datasets/hf-cache/openimages_v1.2 2023-05-01 09:25:39 - dialog_dataset.py[line:671] - INFO: load train data: 18 (90624/724992 samples) dataset(s) 2023-05-01 09:25:39 - dialog_dataset.py[line:672] - INFO: Tasks: invig_question(17652), invig_answer(17652), invig_grounding(17652), guesswhat_question(68653), guesswhat_answer(68653), guesswhat_grounding(68653), visdial_question(103447), visdial_answer(103447), visdial_caption(20689), refcoco_grounding(9523), refcoco_grounding(9920), refcoco_grounding(9494), cc_sbu_align_caption(3439), llava_instruct_150k(90372), 
llava_conversation_58k(46965), llava_complex_reasoning_77k(44353), llava_detail_23k(12471), openimages_detection(11957) 2023-05-01 09:25:39 - trainer.py[line:703] - INFO: begin training epoch 2 2023-05-01 09:25:39 - train.py[line:305] - INFO: Start iterating over samples 2023-05-01 09:26:14 - progress_bar.py[line:274] - INFO: epoch 002: 9 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7351.2, nsentences=116, sample_size=3936, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=604.8, ups=0.08, wpb=7351.2, bsz=116, num_updates=6040, lr=2.87244e-05, gnorm=0.956, clip=20, loss_scale=32, train_wall=37, gb_free=30.2, wall=24787 2023-05-01 09:26:54 - progress_bar.py[line:274] - INFO: epoch 002: 19 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7696.8, nsentences=120, sample_size=4091.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1949.7, ups=0.25, wpb=7696.8, bsz=120, num_updates=6050, lr=2.87191e-05, gnorm=0.904, clip=0, loss_scale=32, train_wall=39, gb_free=29.7, wall=24826 2023-05-01 09:27:34 - progress_bar.py[line:274] - INFO: epoch 002: 29 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7519.7, nsentences=120, sample_size=3894, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1886.2, ups=0.25, wpb=7519.7, bsz=120, num_updates=6060, lr=2.87138e-05, gnorm=0.917, clip=0, loss_scale=32, train_wall=40, gb_free=29.5, wall=24866 2023-05-01 09:28:13 - progress_bar.py[line:274] - INFO: epoch 002: 39 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7754.2, nsentences=120, sample_size=3822.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1977.1, ups=0.25, wpb=7754.2, bsz=120, num_updates=6070, lr=2.87085e-05, gnorm=0.941, clip=10, loss_scale=32, train_wall=39, gb_free=30.3, wall=24905 2023-05-01 09:28:52 - progress_bar.py[line:274] - INFO: epoch 002: 49 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7849.8, nsentences=120, sample_size=4106.1, sample_size_v1=0, sample_size_v2=0, 
ppl=2.21, wps=1997.4, ups=0.25, wpb=7849.8, bsz=120, num_updates=6080, lr=2.87032e-05, gnorm=0.917, clip=10, loss_scale=32, train_wall=39, gb_free=29, wall=24944 2023-05-01 09:29:31 - progress_bar.py[line:274] - INFO: epoch 002: 59 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7667.9, nsentences=120, sample_size=4159.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1960.5, ups=0.26, wpb=7667.9, bsz=120, num_updates=6090, lr=2.86979e-05, gnorm=0.871, clip=0, loss_scale=32, train_wall=39, gb_free=30.3, wall=24984 2023-05-01 09:30:11 - progress_bar.py[line:274] - INFO: epoch 002: 69 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=8005, nsentences=120, sample_size=4127.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1998.8, ups=0.25, wpb=8005, bsz=120, num_updates=6100, lr=2.86927e-05, gnorm=0.884, clip=0, loss_scale=32, train_wall=40, gb_free=30, wall=25024 2023-05-01 09:30:52 - progress_bar.py[line:274] - INFO: epoch 002: 79 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7638.2, nsentences=120, sample_size=3891.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1887.6, ups=0.25, wpb=7638.2, bsz=120, num_updates=6110, lr=2.86874e-05, gnorm=0.924, clip=10, loss_scale=32, train_wall=40, gb_free=30, wall=25064 2023-05-01 09:31:32 - progress_bar.py[line:274] - INFO: epoch 002: 89 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7716.2, nsentences=120, sample_size=3850.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1934.4, ups=0.25, wpb=7716.2, bsz=120, num_updates=6120, lr=2.86821e-05, gnorm=0.945, clip=0, loss_scale=32, train_wall=40, gb_free=29.8, wall=25104 2023-05-01 09:32:11 - progress_bar.py[line:274] - INFO: epoch 002: 99 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7876.6, nsentences=120, sample_size=4233.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1976.5, ups=0.25, wpb=7876.6, bsz=120, num_updates=6130, lr=2.86768e-05, gnorm=0.925, clip=0, 
loss_scale=32, train_wall=40, gb_free=30, wall=25144 2023-05-01 09:32:51 - progress_bar.py[line:274] - INFO: epoch 002: 109 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7696, nsentences=120, sample_size=3907.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1936, ups=0.25, wpb=7696, bsz=120, num_updates=6140, lr=2.86715e-05, gnorm=0.943, clip=20, loss_scale=32, train_wall=40, gb_free=30.8, wall=25184 2023-05-01 09:33:31 - progress_bar.py[line:274] - INFO: epoch 002: 119 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7865.5, nsentences=120, sample_size=4099.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1990.8, ups=0.25, wpb=7865.5, bsz=120, num_updates=6150, lr=2.86663e-05, gnorm=0.906, clip=0, loss_scale=32, train_wall=39, gb_free=29.8, wall=25223 2023-05-01 09:34:10 - progress_bar.py[line:274] - INFO: epoch 002: 129 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7710.5, nsentences=120, sample_size=4081.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1965.5, ups=0.25, wpb=7710.5, bsz=120, num_updates=6160, lr=2.8661e-05, gnorm=0.899, clip=0, loss_scale=32, train_wall=39, gb_free=28.7, wall=25262 2023-05-01 09:34:50 - progress_bar.py[line:274] - INFO: epoch 002: 139 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7536, nsentences=120, sample_size=4241.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1896.6, ups=0.25, wpb=7536, bsz=120, num_updates=6170, lr=2.86557e-05, gnorm=0.901, clip=0, loss_scale=32, train_wall=40, gb_free=29.6, wall=25302 2023-05-01 09:35:30 - progress_bar.py[line:274] - INFO: epoch 002: 149 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7744.5, nsentences=120, sample_size=4329.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1931, ups=0.25, wpb=7744.5, bsz=120, num_updates=6180, lr=2.86504e-05, gnorm=0.901, clip=0, loss_scale=32, train_wall=40, gb_free=29.3, wall=25342 2023-05-01 09:36:10 - progress_bar.py[line:274] - INFO: epoch 
002: 159 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7609.6, nsentences=120, sample_size=4298.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1903.7, ups=0.25, wpb=7609.6, bsz=120, num_updates=6190, lr=2.86451e-05, gnorm=0.872, clip=0, loss_scale=32, train_wall=40, gb_free=30, wall=25382 2023-05-01 09:36:49 - progress_bar.py[line:274] - INFO: epoch 002: 169 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7887.1, nsentences=120, sample_size=3931.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2018.3, ups=0.26, wpb=7887.1, bsz=120, num_updates=6200, lr=2.86398e-05, gnorm=0.927, clip=0, loss_scale=32, train_wall=39, gb_free=29.7, wall=25421 2023-05-01 09:37:28 - progress_bar.py[line:274] - INFO: epoch 002: 179 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7526.6, nsentences=120, sample_size=3872.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1918.3, ups=0.25, wpb=7526.6, bsz=120, num_updates=6210, lr=2.86346e-05, gnorm=0.909, clip=0, loss_scale=32, train_wall=39, gb_free=31.5, wall=25460 2023-05-01 09:38:08 - progress_bar.py[line:274] - INFO: epoch 002: 189 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7746.3, nsentences=120, sample_size=4151.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1929, ups=0.25, wpb=7746.3, bsz=120, num_updates=6220, lr=2.86293e-05, gnorm=0.905, clip=10, loss_scale=32, train_wall=40, gb_free=28.6, wall=25501 2023-05-01 09:38:47 - progress_bar.py[line:274] - INFO: epoch 002: 199 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7642, nsentences=120, sample_size=4133.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1965.7, ups=0.26, wpb=7642, bsz=120, num_updates=6230, lr=2.8624e-05, gnorm=0.893, clip=0, loss_scale=32, train_wall=39, gb_free=30.5, wall=25540 2023-05-01 09:39:27 - progress_bar.py[line:274] - INFO: epoch 002: 209 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7931.9, nsentences=120, 
sample_size=4278.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1987.3, ups=0.25, wpb=7931.9, bsz=120, num_updates=6240, lr=2.86187e-05, gnorm=0.888, clip=0, loss_scale=32, train_wall=40, gb_free=30.4, wall=25579 2023-05-01 09:40:07 - progress_bar.py[line:274] - INFO: epoch 002: 219 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7754.6, nsentences=120, sample_size=4109.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1924.3, ups=0.25, wpb=7754.6, bsz=120, num_updates=6250, lr=2.86134e-05, gnorm=0.913, clip=0, loss_scale=32, train_wall=40, gb_free=29.9, wall=25620 2023-05-01 09:40:48 - progress_bar.py[line:274] - INFO: epoch 002: 229 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7876.7, nsentences=120, sample_size=4253, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1936.2, ups=0.25, wpb=7876.7, bsz=120, num_updates=6260, lr=2.86082e-05, gnorm=0.92, clip=10, loss_scale=32, train_wall=41, gb_free=29, wall=25660 2023-05-01 09:41:29 - progress_bar.py[line:274] - INFO: epoch 002: 239 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7916, nsentences=120, sample_size=4077.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1947, ups=0.25, wpb=7916, bsz=120, num_updates=6270, lr=2.86029e-05, gnorm=0.908, clip=10, loss_scale=32, train_wall=41, gb_free=29.2, wall=25701 2023-05-01 09:42:09 - progress_bar.py[line:274] - INFO: epoch 002: 249 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7842.3, nsentences=120, sample_size=4093.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1949.9, ups=0.25, wpb=7842.3, bsz=120, num_updates=6280, lr=2.85976e-05, gnorm=0.907, clip=10, loss_scale=32, train_wall=40, gb_free=30.4, wall=25741 2023-05-01 09:42:49 - progress_bar.py[line:274] - INFO: epoch 002: 259 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7941.9, nsentences=120, sample_size=3767.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1973, ups=0.25, wpb=7941.9, bsz=120, 
num_updates=6290, lr=2.85923e-05, gnorm=0.926, clip=10, loss_scale=32, train_wall=40, gb_free=30.1, wall=25782 2023-05-01 09:43:29 - progress_bar.py[line:274] - INFO: epoch 002: 269 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7856.5, nsentences=120, sample_size=4259.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1985.2, ups=0.25, wpb=7856.5, bsz=120, num_updates=6300, lr=2.8587e-05, gnorm=0.904, clip=10, loss_scale=32, train_wall=40, gb_free=26.4, wall=25821 2023-05-01 09:44:09 - progress_bar.py[line:274] - INFO: epoch 002: 279 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7745.6, nsentences=120, sample_size=3969, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1931.3, ups=0.25, wpb=7745.6, bsz=120, num_updates=6310, lr=2.85817e-05, gnorm=0.903, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=25861 2023-05-01 09:44:48 - progress_bar.py[line:274] - INFO: epoch 002: 289 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7370.4, nsentences=120, sample_size=4004.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1863.3, ups=0.25, wpb=7370.4, bsz=120, num_updates=6320, lr=2.85765e-05, gnorm=0.928, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=25901 2023-05-01 09:45:28 - progress_bar.py[line:274] - INFO: epoch 002: 299 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7813.3, nsentences=120, sample_size=4101.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1951, ups=0.25, wpb=7813.3, bsz=120, num_updates=6330, lr=2.85712e-05, gnorm=0.909, clip=10, loss_scale=64, train_wall=40, gb_free=29.1, wall=25941 2023-05-01 09:46:08 - progress_bar.py[line:274] - INFO: epoch 002: 309 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7560.5, nsentences=120, sample_size=3658.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1922.9, ups=0.25, wpb=7560.5, bsz=120, num_updates=6340, lr=2.85659e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, 
wall=25980 2023-05-01 09:46:48 - progress_bar.py[line:274] - INFO: epoch 002: 319 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7595, nsentences=120, sample_size=4279.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1887.6, ups=0.25, wpb=7595, bsz=120, num_updates=6350, lr=2.85606e-05, gnorm=0.92, clip=0, loss_scale=64, train_wall=40, gb_free=29, wall=26020 2023-05-01 09:47:28 - progress_bar.py[line:274] - INFO: epoch 002: 329 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7963.7, nsentences=120, sample_size=3901.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2007.3, ups=0.25, wpb=7963.7, bsz=120, num_updates=6360, lr=2.85553e-05, gnorm=0.922, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=26060 2023-05-01 09:48:08 - progress_bar.py[line:274] - INFO: epoch 002: 339 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7813, nsentences=120, sample_size=3793.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1956.1, ups=0.25, wpb=7813, bsz=120, num_updates=6370, lr=2.855e-05, gnorm=0.942, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=26100 2023-05-01 09:48:47 - progress_bar.py[line:274] - INFO: epoch 002: 349 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7766.6, nsentences=120, sample_size=3937.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1970.9, ups=0.25, wpb=7766.6, bsz=120, num_updates=6380, lr=2.85448e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=26139 2023-05-01 09:49:27 - progress_bar.py[line:274] - INFO: epoch 002: 359 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7797.6, nsentences=120, sample_size=3990.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1928.6, ups=0.25, wpb=7797.6, bsz=120, num_updates=6390, lr=2.85395e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=31.3, wall=26180 2023-05-01 09:50:07 - progress_bar.py[line:274] - INFO: epoch 002: 369 / 6042 loss=2.431, loss_v1=0, 
loss_v2=0, nll_loss=1.184, ntokens=7728.2, nsentences=120, sample_size=4015.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1933.1, ups=0.25, wpb=7728.2, bsz=120, num_updates=6400, lr=2.85342e-05, gnorm=0.921, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=26220 2023-05-01 09:50:48 - progress_bar.py[line:274] - INFO: epoch 002: 379 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7601, nsentences=120, sample_size=3809.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1883.2, ups=0.25, wpb=7601, bsz=120, num_updates=6410, lr=2.85289e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=40, gb_free=31.6, wall=26260 2023-05-01 09:51:28 - progress_bar.py[line:274] - INFO: epoch 002: 389 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7753.2, nsentences=120, sample_size=4147.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1950, ups=0.25, wpb=7753.2, bsz=120, num_updates=6420, lr=2.85236e-05, gnorm=0.926, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=26300 2023-05-01 09:52:07 - progress_bar.py[line:274] - INFO: epoch 002: 399 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7684.5, nsentences=120, sample_size=3875, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1963.6, ups=0.26, wpb=7684.5, bsz=120, num_updates=6430, lr=2.85184e-05, gnorm=0.909, clip=0, loss_scale=64, train_wall=39, gb_free=30.2, wall=26339 2023-05-01 09:52:46 - progress_bar.py[line:274] - INFO: epoch 002: 409 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7794.7, nsentences=120, sample_size=4308.2, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1959.8, ups=0.25, wpb=7794.7, bsz=120, num_updates=6440, lr=2.85131e-05, gnorm=0.885, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=26379 2023-05-01 09:53:26 - progress_bar.py[line:274] - INFO: epoch 002: 419 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7584.8, nsentences=120, sample_size=4230.1, sample_size_v1=0, 
sample_size_v2=0, ppl=2.27, wps=1898.1, ups=0.25, wpb=7584.8, bsz=120, num_updates=6450, lr=2.85078e-05, gnorm=0.925, clip=0, loss_scale=64, train_wall=40, gb_free=30, wall=26419 2023-05-01 09:54:06 - progress_bar.py[line:274] - INFO: epoch 002: 429 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7489.3, nsentences=120, sample_size=4048.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1869.6, ups=0.25, wpb=7489.3, bsz=120, num_updates=6460, lr=2.85025e-05, gnorm=0.911, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=26459 2023-05-01 09:54:46 - progress_bar.py[line:274] - INFO: epoch 002: 439 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7840.4, nsentences=120, sample_size=4006.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1970, ups=0.25, wpb=7840.4, bsz=120, num_updates=6470, lr=2.84972e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=26499 2023-05-01 09:55:26 - progress_bar.py[line:274] - INFO: epoch 002: 449 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7812, nsentences=120, sample_size=4080.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1945, ups=0.25, wpb=7812, bsz=120, num_updates=6480, lr=2.84919e-05, gnorm=0.92, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=26539 2023-05-01 09:56:06 - progress_bar.py[line:274] - INFO: epoch 002: 459 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7444.9, nsentences=120, sample_size=3943.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1899.2, ups=0.26, wpb=7444.9, bsz=120, num_updates=6490, lr=2.84867e-05, gnorm=0.929, clip=0, loss_scale=64, train_wall=39, gb_free=30.4, wall=26578 2023-05-01 09:56:46 - progress_bar.py[line:274] - INFO: epoch 002: 469 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7923.9, nsentences=120, sample_size=3701.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1947.5, ups=0.25, wpb=7923.9, bsz=120, num_updates=6500, lr=2.84814e-05, 
gnorm=0.955, clip=30, loss_scale=64, train_wall=41, gb_free=30, wall=26619
2023-05-01 09:57:27 - progress_bar.py[line:274] - INFO: epoch 002: 479 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7865.1, nsentences=120, sample_size=4169.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1945.6, ups=0.25, wpb=7865.1, bsz=120, num_updates=6510, lr=2.84761e-05, gnorm=0.908, clip=0, loss_scale=64, train_wall=40, gb_free=30.5, wall=26659
2023-05-01 09:58:07 - progress_bar.py[line:274] - INFO: epoch 002: 489 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7559.1, nsentences=120, sample_size=4240.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1865.8, ups=0.25, wpb=7559.1, bsz=120, num_updates=6520, lr=2.84708e-05, gnorm=0.927, clip=30, loss_scale=64, train_wall=40, gb_free=28.1, wall=26700
2023-05-01 09:58:48 - progress_bar.py[line:274] - INFO: epoch 002: 499 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7907.3, nsentences=120, sample_size=3866.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1945.2, ups=0.25, wpb=7907.3, bsz=120, num_updates=6530, lr=2.84655e-05, gnorm=0.925, clip=10, loss_scale=64, train_wall=41, gb_free=30.4, wall=26740
2023-05-01 09:59:16 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0
2023-05-01 09:59:33 - progress_bar.py[line:274] - INFO: epoch 002: 510 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7860.4, nsentences=120, sample_size=4284.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1761.4, ups=0.22, wpb=7860.4, bsz=120, num_updates=6540, lr=2.84603e-05, gnorm=0.87, clip=0, loss_scale=32, train_wall=45, gb_free=28.9, wall=26785
2023-05-01 10:00:13 - progress_bar.py[line:274] - INFO: epoch 002: 520 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7885.3, nsentences=120, sample_size=4132.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1966.3, ups=0.25, wpb=7885.3, bsz=120,
num_updates=6550, lr=2.8455e-05, gnorm=0.903, clip=0, loss_scale=32, train_wall=40, gb_free=31, wall=26825 2023-05-01 10:00:52 - progress_bar.py[line:274] - INFO: epoch 002: 530 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7713, nsentences=120, sample_size=4016.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1943.2, ups=0.25, wpb=7713, bsz=120, num_updates=6560, lr=2.84497e-05, gnorm=0.928, clip=0, loss_scale=32, train_wall=40, gb_free=30.9, wall=26865 2023-05-01 10:01:32 - progress_bar.py[line:274] - INFO: epoch 002: 540 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7533.4, nsentences=120, sample_size=4010.7, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1892.3, ups=0.25, wpb=7533.4, bsz=120, num_updates=6570, lr=2.84444e-05, gnorm=0.911, clip=0, loss_scale=32, train_wall=40, gb_free=30.7, wall=26905 2023-05-01 10:02:11 - progress_bar.py[line:274] - INFO: epoch 002: 550 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7704.3, nsentences=120, sample_size=3907.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1967.4, ups=0.26, wpb=7704.3, bsz=120, num_updates=6580, lr=2.84391e-05, gnorm=0.901, clip=0, loss_scale=32, train_wall=39, gb_free=30.7, wall=26944 2023-05-01 10:02:51 - progress_bar.py[line:274] - INFO: epoch 002: 560 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7881.1, nsentences=120, sample_size=3953.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1979.4, ups=0.25, wpb=7881.1, bsz=120, num_updates=6590, lr=2.84338e-05, gnorm=0.946, clip=30, loss_scale=32, train_wall=40, gb_free=30.2, wall=26984 2023-05-01 10:03:31 - progress_bar.py[line:274] - INFO: epoch 002: 570 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7962.6, nsentences=120, sample_size=4057.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2012.3, ups=0.25, wpb=7962.6, bsz=120, num_updates=6600, lr=2.84286e-05, gnorm=0.903, clip=0, loss_scale=32, train_wall=39, gb_free=28.3, wall=27023 
2023-05-01 10:04:11 - progress_bar.py[line:274] - INFO: epoch 002: 580 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7913, nsentences=120, sample_size=3772, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1982.6, ups=0.25, wpb=7913, bsz=120, num_updates=6610, lr=2.84233e-05, gnorm=0.992, clip=50, loss_scale=32, train_wall=40, gb_free=30.1, wall=27063 2023-05-01 10:04:50 - progress_bar.py[line:274] - INFO: epoch 002: 590 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7585.6, nsentences=120, sample_size=3862.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1913.3, ups=0.25, wpb=7585.6, bsz=120, num_updates=6620, lr=2.8418e-05, gnorm=0.923, clip=10, loss_scale=32, train_wall=40, gb_free=29.9, wall=27103 2023-05-01 10:05:30 - progress_bar.py[line:274] - INFO: epoch 002: 600 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7964, nsentences=120, sample_size=3659.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1984.8, ups=0.25, wpb=7964, bsz=120, num_updates=6630, lr=2.84127e-05, gnorm=0.953, clip=20, loss_scale=32, train_wall=40, gb_free=29.3, wall=27143 2023-05-01 10:06:10 - progress_bar.py[line:274] - INFO: epoch 002: 610 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7443.8, nsentences=120, sample_size=3940.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1891.8, ups=0.25, wpb=7443.8, bsz=120, num_updates=6640, lr=2.84074e-05, gnorm=0.941, clip=20, loss_scale=32, train_wall=39, gb_free=28.8, wall=27182 2023-05-01 10:06:50 - progress_bar.py[line:274] - INFO: epoch 002: 620 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7811.9, nsentences=120, sample_size=4260.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1933.4, ups=0.25, wpb=7811.9, bsz=120, num_updates=6650, lr=2.84021e-05, gnorm=0.905, clip=10, loss_scale=32, train_wall=40, gb_free=27.8, wall=27223 2023-05-01 10:07:30 - progress_bar.py[line:274] - INFO: epoch 002: 630 / 6042 loss=2.375, loss_v1=0, 
loss_v2=0, nll_loss=1.107, ntokens=7437.9, nsentences=120, sample_size=4125.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1859.7, ups=0.25, wpb=7437.9, bsz=120, num_updates=6660, lr=2.83969e-05, gnorm=0.904, clip=10, loss_scale=32, train_wall=40, gb_free=29.9, wall=27263 2023-05-01 10:08:11 - progress_bar.py[line:274] - INFO: epoch 002: 640 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=8010.8, nsentences=120, sample_size=4031.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1981.6, ups=0.25, wpb=8010.8, bsz=120, num_updates=6670, lr=2.83916e-05, gnorm=0.876, clip=0, loss_scale=32, train_wall=40, gb_free=29.2, wall=27303 2023-05-01 10:08:50 - progress_bar.py[line:274] - INFO: epoch 002: 650 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7648.9, nsentences=120, sample_size=4225.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1929.4, ups=0.25, wpb=7648.9, bsz=120, num_updates=6680, lr=2.83863e-05, gnorm=0.93, clip=20, loss_scale=32, train_wall=40, gb_free=29, wall=27343 2023-05-01 10:09:30 - progress_bar.py[line:274] - INFO: epoch 002: 660 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7841.2, nsentences=120, sample_size=4151.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1964.4, ups=0.25, wpb=7841.2, bsz=120, num_updates=6690, lr=2.8381e-05, gnorm=0.926, clip=20, loss_scale=32, train_wall=40, gb_free=31.1, wall=27383 2023-05-01 10:10:10 - progress_bar.py[line:274] - INFO: epoch 002: 670 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7482.3, nsentences=120, sample_size=4218.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1870.5, ups=0.25, wpb=7482.3, bsz=120, num_updates=6700, lr=2.83757e-05, gnorm=0.901, clip=0, loss_scale=32, train_wall=40, gb_free=30, wall=27423 2023-05-01 10:10:49 - progress_bar.py[line:274] - INFO: epoch 002: 680 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7528.7, nsentences=120, sample_size=3969.4, sample_size_v1=0, 
sample_size_v2=0, ppl=2.15, wps=1945.4, ups=0.26, wpb=7528.7, bsz=120, num_updates=6710, lr=2.83705e-05, gnorm=0.9, clip=10, loss_scale=32, train_wall=39, gb_free=31.5, wall=27461 2023-05-01 10:11:28 - progress_bar.py[line:274] - INFO: epoch 002: 690 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7594.1, nsentences=120, sample_size=4039.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1918.5, ups=0.25, wpb=7594.1, bsz=120, num_updates=6720, lr=2.83652e-05, gnorm=0.921, clip=10, loss_scale=32, train_wall=40, gb_free=27.9, wall=27501 2023-05-01 10:12:08 - progress_bar.py[line:274] - INFO: epoch 002: 700 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7599, nsentences=120, sample_size=4004.6, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1917.8, ups=0.25, wpb=7599, bsz=120, num_updates=6730, lr=2.83599e-05, gnorm=0.922, clip=0, loss_scale=32, train_wall=40, gb_free=30.5, wall=27540 2023-05-01 10:12:48 - progress_bar.py[line:274] - INFO: epoch 002: 710 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7783.9, nsentences=120, sample_size=3960.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1963, ups=0.25, wpb=7783.9, bsz=120, num_updates=6740, lr=2.83546e-05, gnorm=0.954, clip=30, loss_scale=32, train_wall=40, gb_free=30.5, wall=27580 2023-05-01 10:13:27 - progress_bar.py[line:274] - INFO: epoch 002: 720 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7285.6, nsentences=120, sample_size=4062.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1846.9, ups=0.25, wpb=7285.6, bsz=120, num_updates=6750, lr=2.83493e-05, gnorm=0.969, clip=30, loss_scale=32, train_wall=39, gb_free=30.8, wall=27620 2023-05-01 10:14:07 - progress_bar.py[line:274] - INFO: epoch 002: 730 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7906.3, nsentences=120, sample_size=3725.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2004.2, ups=0.25, wpb=7906.3, bsz=120, num_updates=6760, lr=2.8344e-05, 
gnorm=0.929, clip=10, loss_scale=32, train_wall=39, gb_free=29.1, wall=27659 2023-05-01 10:14:46 - progress_bar.py[line:274] - INFO: epoch 002: 740 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7480.3, nsentences=120, sample_size=4142.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1884.2, ups=0.25, wpb=7480.3, bsz=120, num_updates=6770, lr=2.83388e-05, gnorm=0.939, clip=10, loss_scale=32, train_wall=40, gb_free=30, wall=27699 2023-05-01 10:15:26 - progress_bar.py[line:274] - INFO: epoch 002: 750 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7733.8, nsentences=120, sample_size=3829.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1953.7, ups=0.25, wpb=7733.8, bsz=120, num_updates=6780, lr=2.83335e-05, gnorm=0.926, clip=10, loss_scale=32, train_wall=40, gb_free=28.8, wall=27738 2023-05-01 10:16:06 - progress_bar.py[line:274] - INFO: epoch 002: 760 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7673.6, nsentences=120, sample_size=4179.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1904, ups=0.25, wpb=7673.6, bsz=120, num_updates=6790, lr=2.83282e-05, gnorm=0.9, clip=10, loss_scale=32, train_wall=40, gb_free=31, wall=27779 2023-05-01 10:16:45 - progress_bar.py[line:274] - INFO: epoch 002: 770 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7488.5, nsentences=120, sample_size=3952.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1910.9, ups=0.26, wpb=7488.5, bsz=120, num_updates=6800, lr=2.83229e-05, gnorm=0.906, clip=0, loss_scale=32, train_wall=39, gb_free=28, wall=27818 2023-05-01 10:17:25 - progress_bar.py[line:274] - INFO: epoch 002: 780 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7726.1, nsentences=120, sample_size=4144.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1966.5, ups=0.25, wpb=7726.1, bsz=120, num_updates=6810, lr=2.83176e-05, gnorm=0.958, clip=30, loss_scale=32, train_wall=39, gb_free=29.8, wall=27857 2023-05-01 10:18:05 - 
progress_bar.py[line:274] - INFO: epoch 002: 790 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7548.9, nsentences=120, sample_size=3884.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1888, ups=0.25, wpb=7548.9, bsz=120, num_updates=6820, lr=2.83124e-05, gnorm=0.944, clip=20, loss_scale=32, train_wall=40, gb_free=30, wall=27897 2023-05-01 10:18:44 - progress_bar.py[line:274] - INFO: epoch 002: 800 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7653.3, nsentences=120, sample_size=3941.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1949.7, ups=0.25, wpb=7653.3, bsz=120, num_updates=6830, lr=2.83071e-05, gnorm=0.928, clip=0, loss_scale=32, train_wall=39, gb_free=29.3, wall=27936 2023-05-01 10:19:24 - progress_bar.py[line:274] - INFO: epoch 002: 810 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7643.4, nsentences=120, sample_size=4003.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1912.4, ups=0.25, wpb=7643.4, bsz=120, num_updates=6840, lr=2.83018e-05, gnorm=0.929, clip=0, loss_scale=32, train_wall=40, gb_free=29.1, wall=27976 2023-05-01 10:20:04 - progress_bar.py[line:274] - INFO: epoch 002: 820 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7698, nsentences=120, sample_size=4080.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1923.6, ups=0.25, wpb=7698, bsz=120, num_updates=6850, lr=2.82965e-05, gnorm=0.93, clip=0, loss_scale=32, train_wall=40, gb_free=29, wall=28016 2023-05-01 10:20:44 - progress_bar.py[line:274] - INFO: epoch 002: 830 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7869, nsentences=120, sample_size=4170.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1961.9, ups=0.25, wpb=7869, bsz=120, num_updates=6860, lr=2.82912e-05, gnorm=0.896, clip=0, loss_scale=32, train_wall=40, gb_free=29.4, wall=28056 2023-05-01 10:21:24 - progress_bar.py[line:274] - INFO: epoch 002: 840 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.145, 
ntokens=7523.9, nsentences=120, sample_size=4189.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1899.5, ups=0.25, wpb=7523.9, bsz=120, num_updates=6870, lr=2.82859e-05, gnorm=0.901, clip=10, loss_scale=32, train_wall=40, gb_free=30.9, wall=28096 2023-05-01 10:22:04 - progress_bar.py[line:274] - INFO: epoch 002: 850 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7709.1, nsentences=120, sample_size=3798.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1918.2, ups=0.25, wpb=7709.1, bsz=120, num_updates=6880, lr=2.82807e-05, gnorm=0.943, clip=30, loss_scale=32, train_wall=40, gb_free=29.4, wall=28136 2023-05-01 10:22:44 - progress_bar.py[line:274] - INFO: epoch 002: 860 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7863.5, nsentences=120, sample_size=4057, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1973.5, ups=0.25, wpb=7863.5, bsz=120, num_updates=6890, lr=2.82754e-05, gnorm=0.94, clip=20, loss_scale=32, train_wall=40, gb_free=31, wall=28176 2023-05-01 10:23:23 - progress_bar.py[line:274] - INFO: epoch 002: 870 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7524.6, nsentences=120, sample_size=4003.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1903.8, ups=0.25, wpb=7524.6, bsz=120, num_updates=6900, lr=2.82701e-05, gnorm=0.952, clip=10, loss_scale=32, train_wall=39, gb_free=29.7, wall=28216 2023-05-01 10:24:04 - progress_bar.py[line:274] - INFO: epoch 002: 880 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7854.3, nsentences=120, sample_size=4144.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1931.3, ups=0.25, wpb=7854.3, bsz=120, num_updates=6910, lr=2.82648e-05, gnorm=0.922, clip=0, loss_scale=32, train_wall=41, gb_free=24.9, wall=28256 2023-05-01 10:24:43 - progress_bar.py[line:274] - INFO: epoch 002: 890 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7703.4, nsentences=120, sample_size=4134, sample_size_v1=0, sample_size_v2=0, ppl=2.21, 
wps=1958.1, ups=0.25, wpb=7703.4, bsz=120, num_updates=6920, lr=2.82595e-05, gnorm=0.9, clip=0, loss_scale=32, train_wall=39, gb_free=29.6, wall=28296 2023-05-01 10:25:23 - progress_bar.py[line:274] - INFO: epoch 002: 900 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7607.5, nsentences=120, sample_size=3992.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1922.2, ups=0.25, wpb=7607.5, bsz=120, num_updates=6930, lr=2.82542e-05, gnorm=0.909, clip=10, loss_scale=32, train_wall=40, gb_free=30.7, wall=28335 2023-05-01 10:26:03 - progress_bar.py[line:274] - INFO: epoch 002: 910 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7780.1, nsentences=120, sample_size=4138.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1910.9, ups=0.25, wpb=7780.1, bsz=120, num_updates=6940, lr=2.8249e-05, gnorm=0.923, clip=20, loss_scale=32, train_wall=41, gb_free=29.3, wall=28376 2023-05-01 10:26:43 - progress_bar.py[line:274] - INFO: epoch 002: 920 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7840.7, nsentences=120, sample_size=3953.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1969.2, ups=0.25, wpb=7840.7, bsz=120, num_updates=6950, lr=2.82437e-05, gnorm=0.938, clip=20, loss_scale=32, train_wall=40, gb_free=30.6, wall=28416 2023-05-01 10:27:23 - progress_bar.py[line:274] - INFO: epoch 002: 930 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7597.3, nsentences=120, sample_size=4107.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1905.5, ups=0.25, wpb=7597.3, bsz=120, num_updates=6960, lr=2.82384e-05, gnorm=0.92, clip=10, loss_scale=32, train_wall=40, gb_free=31.1, wall=28456 2023-05-01 10:28:03 - progress_bar.py[line:274] - INFO: epoch 002: 940 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7817.3, nsentences=120, sample_size=4088.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1976.3, ups=0.25, wpb=7817.3, bsz=120, num_updates=6970, lr=2.82331e-05, gnorm=0.932, clip=10, 
loss_scale=32, train_wall=39, gb_free=31.1, wall=28495
2023-05-01 10:28:43 - progress_bar.py[line:274] - INFO: epoch 002: 950 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7799.7, nsentences=120, sample_size=4092.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1930.6, ups=0.25, wpb=7799.7, bsz=120, num_updates=6980, lr=2.82278e-05, gnorm=0.919, clip=0, loss_scale=32, train_wall=40, gb_free=29.3, wall=28536
2023-05-01 10:29:23 - progress_bar.py[line:274] - INFO: epoch 002: 960 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7716.8, nsentences=120, sample_size=4152.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1936.9, ups=0.25, wpb=7716.8, bsz=120, num_updates=6990, lr=2.82226e-05, gnorm=0.902, clip=0, loss_scale=32, train_wall=40, gb_free=29.5, wall=28575
2023-05-01 10:30:02 - progress_bar.py[line:274] - INFO: epoch 002: 970 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7807.9, nsentences=120, sample_size=3958.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1986.2, ups=0.25, wpb=7807.9, bsz=120, num_updates=7000, lr=2.82173e-05, gnorm=0.939, clip=10, loss_scale=32, train_wall=39, gb_free=29.4, wall=28615
2023-05-01 10:30:02 - train.py[line:445] - INFO: begin validation on "valid" subset
2023-05-01 10:30:04 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region:
2023-05-01 10:30:04 - task_invig.py[line:182] - INFO: example ref(gt): sure. region:
2023-05-01 10:30:05 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:05 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:06 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:06 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:07 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:07 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:08 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:08 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:08 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:08 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:10 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:10 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:11 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:11 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:12 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:12 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:13 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:13 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:14 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:14 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:15 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:15 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:16 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:16 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:17 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:17 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:18 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:18 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:19 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:19 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:19 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:19 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:21 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region:
2023-05-01 10:30:21 - task_invig.py[line:182] - INFO: example ref(gt): sure. region:
2023-05-01 10:30:22 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:22 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:23 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:23 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:24 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:24 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:25 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:25 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:26 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:26 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:27 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:27 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:28 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:28 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:29 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-01 10:30:29 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-01 10:30:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:33 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 10:30:33 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 10:30:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 10:30:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:44 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 10:30:44 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 10:30:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:48 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 10:30:48 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 10:30:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:53 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 10:30:53 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 10:30:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 10:30:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 10:30:53 - progress_bar.py[line:282] - INFO: epoch 002 | valid on 'valid' subset | loss 3.239 | loss_v1 0 | loss_v2 0 | nll_loss 2.074 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.21 | score 0.7305 | wps 3282.2 | wpb 3202.1 | bsz 39.4 | num_updates 7000 | best_score 0.751 2023-05-01 10:30:53 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 2 @ 7000 updates 2023-05-01 10:30:53 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_7000.pt 2023-05-01 10:31:18 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_7000.pt 2023-05-01 10:31:32 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_7000.pt (epoch 2 @ 7000 updates, score 0.7305) (writing took 38.61496375105344 seconds) 2023-05-01 10:32:11 - progress_bar.py[line:274] - INFO: epoch 002: 980 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7983, nsentences=120, sample_size=3977.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=619.7, ups=0.08, wpb=7983, bsz=120, num_updates=7010, lr=2.8212e-05, gnorm=0.906, clip=0, loss_scale=32, train_wall=39, gb_free=30.8, wall=28744 2023-05-01 10:32:51 - progress_bar.py[line:274] - INFO: epoch 002: 990 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7498.7, nsentences=120, sample_size=4032.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1899.6, ups=0.25, wpb=7498.7, bsz=120, num_updates=7020, lr=2.82067e-05, 
gnorm=0.902, clip=10, loss_scale=32, train_wall=39, gb_free=31, wall=28783 2023-05-01 10:33:32 - progress_bar.py[line:274] - INFO: epoch 002: 1000 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=8082.9, nsentences=120, sample_size=4264.7, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1959.8, ups=0.24, wpb=8082.9, bsz=120, num_updates=7030, lr=2.82014e-05, gnorm=0.913, clip=10, loss_scale=32, train_wall=41, gb_free=30, wall=28824 2023-05-01 10:34:12 - progress_bar.py[line:274] - INFO: epoch 002: 1010 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7856.7, nsentences=120, sample_size=4115.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1958.1, ups=0.25, wpb=7856.7, bsz=120, num_updates=7040, lr=2.81961e-05, gnorm=0.906, clip=0, loss_scale=32, train_wall=40, gb_free=30.7, wall=28864 2023-05-01 10:34:52 - progress_bar.py[line:274] - INFO: epoch 002: 1020 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7759.9, nsentences=120, sample_size=3780.1, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1943.1, ups=0.25, wpb=7759.9, bsz=120, num_updates=7050, lr=2.81909e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=40, gb_free=28.9, wall=28904 2023-05-01 10:35:32 - progress_bar.py[line:274] - INFO: epoch 002: 1030 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.209, ntokens=7819.3, nsentences=120, sample_size=4234.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1939.7, ups=0.25, wpb=7819.3, bsz=120, num_updates=7060, lr=2.81856e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=40, gb_free=28.2, wall=28945 2023-05-01 10:36:12 - progress_bar.py[line:274] - INFO: epoch 002: 1040 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7845.1, nsentences=120, sample_size=4059.6, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1953.6, ups=0.25, wpb=7845.1, bsz=120, num_updates=7070, lr=2.81803e-05, gnorm=0.994, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=28985 2023-05-01 10:36:53 - 
progress_bar.py[line:274] - INFO: epoch 002: 1050 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7502.7, nsentences=120, sample_size=4203.5, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1863.9, ups=0.25, wpb=7502.7, bsz=120, num_updates=7080, lr=2.8175e-05, gnorm=0.925, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=29025 2023-05-01 10:37:33 - progress_bar.py[line:274] - INFO: epoch 002: 1060 / 6042 loss=2.525, loss_v1=0, loss_v2=0, nll_loss=1.292, ntokens=7779.5, nsentences=120, sample_size=4127.6, sample_size_v1=0, sample_size_v2=0, ppl=2.45, wps=1947.8, ups=0.25, wpb=7779.5, bsz=120, num_updates=7090, lr=2.81697e-05, gnorm=0.932, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=29065 2023-05-01 10:38:13 - progress_bar.py[line:274] - INFO: epoch 002: 1070 / 6042 loss=2.491, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=8042.4, nsentences=120, sample_size=3886.7, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1984.8, ups=0.25, wpb=8042.4, bsz=120, num_updates=7100, lr=2.81645e-05, gnorm=0.928, clip=10, loss_scale=64, train_wall=40, gb_free=31.3, wall=29106 2023-05-01 10:38:53 - progress_bar.py[line:274] - INFO: epoch 002: 1080 / 6042 loss=2.511, loss_v1=0, loss_v2=0, nll_loss=1.268, ntokens=7669.8, nsentences=120, sample_size=4080.5, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1897.6, ups=0.25, wpb=7669.8, bsz=120, num_updates=7110, lr=2.81592e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=29146 2023-05-01 10:39:33 - progress_bar.py[line:274] - INFO: epoch 002: 1090 / 6042 loss=2.502, loss_v1=0, loss_v2=0, nll_loss=1.256, ntokens=7988.9, nsentences=120, sample_size=3923.8, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=2017.5, ups=0.25, wpb=7988.9, bsz=120, num_updates=7120, lr=2.81539e-05, gnorm=0.958, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=29186 2023-05-01 10:40:12 - progress_bar.py[line:274] - INFO: epoch 002: 1100 / 6042 loss=2.48, loss_v1=0, loss_v2=0, 
nll_loss=1.236, ntokens=7726.6, nsentences=120, sample_size=4225, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1968.5, ups=0.25, wpb=7726.6, bsz=120, num_updates=7130, lr=2.81486e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=39, gb_free=27.1, wall=29225 2023-05-01 10:40:52 - progress_bar.py[line:274] - INFO: epoch 002: 1110 / 6042 loss=2.513, loss_v1=0, loss_v2=0, nll_loss=1.281, ntokens=8028.3, nsentences=120, sample_size=4007.5, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=2027.7, ups=0.25, wpb=8028.3, bsz=120, num_updates=7140, lr=2.81433e-05, gnorm=0.922, clip=0, loss_scale=64, train_wall=40, gb_free=30.8, wall=29264 2023-05-01 10:41:32 - progress_bar.py[line:274] - INFO: epoch 002: 1120 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7810.9, nsentences=120, sample_size=3726.5, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1939.2, ups=0.25, wpb=7810.9, bsz=120, num_updates=7150, lr=2.8138e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=29305 2023-05-01 10:42:12 - progress_bar.py[line:274] - INFO: epoch 002: 1130 / 6042 loss=2.501, loss_v1=0, loss_v2=0, nll_loss=1.265, ntokens=7861.7, nsentences=120, sample_size=3812.7, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1966.5, ups=0.25, wpb=7861.7, bsz=120, num_updates=7160, lr=2.81328e-05, gnorm=0.943, clip=40, loss_scale=64, train_wall=40, gb_free=29.1, wall=29345 2023-05-01 10:42:52 - progress_bar.py[line:274] - INFO: epoch 002: 1140 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7959.2, nsentences=120, sample_size=3964.8, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1991.6, ups=0.25, wpb=7959.2, bsz=120, num_updates=7170, lr=2.81275e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=29385 2023-05-01 10:43:32 - progress_bar.py[line:274] - INFO: epoch 002: 1150 / 6042 loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.26, ntokens=7813.5, nsentences=120, sample_size=4031.9, sample_size_v1=0, sample_size_v2=0, 
ppl=2.39, wps=1976.2, ups=0.25, wpb=7813.5, bsz=120, num_updates=7180, lr=2.81222e-05, gnorm=0.957, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=29424 2023-05-01 10:44:12 - progress_bar.py[line:274] - INFO: epoch 002: 1160 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.231, ntokens=7809, nsentences=120, sample_size=3978.8, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1926.1, ups=0.25, wpb=7809, bsz=120, num_updates=7190, lr=2.81169e-05, gnorm=0.903, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=29465 2023-05-01 10:44:53 - progress_bar.py[line:274] - INFO: epoch 002: 1170 / 6042 loss=2.503, loss_v1=0, loss_v2=0, nll_loss=1.264, ntokens=7606.2, nsentences=120, sample_size=3987.6, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1889, ups=0.25, wpb=7606.2, bsz=120, num_updates=7200, lr=2.81116e-05, gnorm=0.948, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=29505 2023-05-01 10:45:32 - progress_bar.py[line:274] - INFO: epoch 002: 1180 / 6042 loss=2.495, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7824.3, nsentences=120, sample_size=3767.3, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1973.5, ups=0.25, wpb=7824.3, bsz=120, num_updates=7210, lr=2.81063e-05, gnorm=0.964, clip=30, loss_scale=64, train_wall=40, gb_free=28.8, wall=29545 2023-05-01 10:46:12 - progress_bar.py[line:274] - INFO: epoch 002: 1190 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7482.8, nsentences=120, sample_size=4261.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1887.1, ups=0.25, wpb=7482.8, bsz=120, num_updates=7220, lr=2.81011e-05, gnorm=0.898, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=29584 2023-05-01 10:46:51 - progress_bar.py[line:274] - INFO: epoch 002: 1200 / 6042 loss=2.503, loss_v1=0, loss_v2=0, nll_loss=1.263, ntokens=7569.7, nsentences=120, sample_size=4095.1, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1919.5, ups=0.25, wpb=7569.7, bsz=120, num_updates=7230, lr=2.80958e-05, gnorm=0.948, 
clip=0, loss_scale=64, train_wall=39, gb_free=30.3, wall=29624 2023-05-01 10:47:31 - progress_bar.py[line:274] - INFO: epoch 002: 1210 / 6042 loss=2.526, loss_v1=0, loss_v2=0, nll_loss=1.291, ntokens=7918, nsentences=120, sample_size=3804.4, sample_size_v1=0, sample_size_v2=0, ppl=2.45, wps=2003.4, ups=0.25, wpb=7918, bsz=120, num_updates=7240, lr=2.80905e-05, gnorm=0.954, clip=30, loss_scale=64, train_wall=39, gb_free=29.7, wall=29663 2023-05-01 10:48:10 - progress_bar.py[line:274] - INFO: epoch 002: 1220 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7728.4, nsentences=120, sample_size=4050.7, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1960.6, ups=0.25, wpb=7728.4, bsz=120, num_updates=7250, lr=2.80852e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=39, gb_free=30.8, wall=29703 2023-05-01 10:48:50 - progress_bar.py[line:274] - INFO: epoch 002: 1230 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.248, ntokens=7344.9, nsentences=120, sample_size=3955.1, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1862, ups=0.25, wpb=7344.9, bsz=120, num_updates=7260, lr=2.80799e-05, gnorm=0.962, clip=30, loss_scale=64, train_wall=39, gb_free=31.1, wall=29742 2023-05-01 10:49:30 - progress_bar.py[line:274] - INFO: epoch 002: 1240 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.248, ntokens=7891.4, nsentences=120, sample_size=3878.9, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1974.3, ups=0.25, wpb=7891.4, bsz=120, num_updates=7270, lr=2.80747e-05, gnorm=0.944, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=29782 2023-05-01 10:50:09 - progress_bar.py[line:274] - INFO: epoch 002: 1250 / 6042 loss=2.5, loss_v1=0, loss_v2=0, nll_loss=1.262, ntokens=8084.5, nsentences=120, sample_size=3896.3, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=2028.5, ups=0.25, wpb=8084.5, bsz=120, num_updates=7280, lr=2.80694e-05, gnorm=0.92, clip=0, loss_scale=64, train_wall=40, gb_free=29.3, wall=29822 2023-05-01 10:50:49 - 
progress_bar.py[line:274] - INFO: epoch 002: 1260 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.248, ntokens=7630.6, nsentences=120, sample_size=4190.3, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1925.9, ups=0.25, wpb=7630.6, bsz=120, num_updates=7290, lr=2.80641e-05, gnorm=0.893, clip=0, loss_scale=64, train_wall=40, gb_free=28.2, wall=29862 2023-05-01 10:51:29 - progress_bar.py[line:274] - INFO: epoch 002: 1270 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.249, ntokens=7794.8, nsentences=120, sample_size=4145.5, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1975.5, ups=0.25, wpb=7794.8, bsz=120, num_updates=7300, lr=2.80588e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=29901 2023-05-01 10:52:09 - progress_bar.py[line:274] - INFO: epoch 002: 1280 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.261, ntokens=7753.4, nsentences=120, sample_size=4489.2, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1901.4, ups=0.25, wpb=7753.4, bsz=120, num_updates=7310, lr=2.80535e-05, gnorm=0.892, clip=10, loss_scale=64, train_wall=41, gb_free=30.2, wall=29942 2023-05-01 10:52:49 - progress_bar.py[line:274] - INFO: epoch 002: 1290 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7790.8, nsentences=120, sample_size=3754.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1982.6, ups=0.25, wpb=7790.8, bsz=120, num_updates=7320, lr=2.80482e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=29981 2023-05-01 10:53:28 - progress_bar.py[line:274] - INFO: epoch 002: 1300 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=7656.4, nsentences=120, sample_size=4091, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1946.4, ups=0.25, wpb=7656.4, bsz=120, num_updates=7330, lr=2.8043e-05, gnorm=0.908, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=30020 2023-05-01 10:54:08 - progress_bar.py[line:274] - INFO: epoch 002: 1310 / 6042 loss=2.464, loss_v1=0, loss_v2=0, 
nll_loss=1.217, ntokens=7658.4, nsentences=120, sample_size=4328.1, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1893.6, ups=0.25, wpb=7658.4, bsz=120, num_updates=7340, lr=2.80377e-05, gnorm=0.884, clip=0, loss_scale=64, train_wall=40, gb_free=28.7, wall=30061 2023-05-01 10:54:49 - progress_bar.py[line:274] - INFO: epoch 002: 1320 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.227, ntokens=7959.3, nsentences=120, sample_size=3872, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1977.7, ups=0.25, wpb=7959.3, bsz=120, num_updates=7350, lr=2.80324e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=30101 2023-05-01 10:55:28 - progress_bar.py[line:274] - INFO: epoch 002: 1330 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7561.9, nsentences=120, sample_size=4134.8, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1923.6, ups=0.25, wpb=7561.9, bsz=120, num_updates=7360, lr=2.80271e-05, gnorm=0.919, clip=10, loss_scale=64, train_wall=39, gb_free=29.6, wall=30140 2023-05-01 10:56:09 - progress_bar.py[line:274] - INFO: epoch 002: 1340 / 6042 loss=2.5, loss_v1=0, loss_v2=0, nll_loss=1.258, ntokens=7804.6, nsentences=120, sample_size=4121.7, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1924.6, ups=0.25, wpb=7804.6, bsz=120, num_updates=7370, lr=2.80218e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=40, gb_free=26.1, wall=30181 2023-05-01 10:56:48 - progress_bar.py[line:274] - INFO: epoch 002: 1350 / 6042 loss=2.513, loss_v1=0, loss_v2=0, nll_loss=1.273, ntokens=7722.8, nsentences=120, sample_size=4236.1, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1971.6, ups=0.26, wpb=7722.8, bsz=120, num_updates=7380, lr=2.80166e-05, gnorm=0.92, clip=0, loss_scale=64, train_wall=39, gb_free=30.6, wall=30220 2023-05-01 10:57:27 - progress_bar.py[line:274] - INFO: epoch 002: 1360 / 6042 loss=2.513, loss_v1=0, loss_v2=0, nll_loss=1.272, ntokens=7774.8, nsentences=120, sample_size=4260.9, sample_size_v1=0, 
sample_size_v2=0, ppl=2.41, wps=1967.8, ups=0.25, wpb=7774.8, bsz=120, num_updates=7390, lr=2.80113e-05, gnorm=0.91, clip=30, loss_scale=64, train_wall=39, gb_free=30.9, wall=30260 2023-05-01 10:58:07 - progress_bar.py[line:274] - INFO: epoch 002: 1370 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7809.5, nsentences=120, sample_size=4044.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1983.7, ups=0.25, wpb=7809.5, bsz=120, num_updates=7400, lr=2.8006e-05, gnorm=0.911, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=30299 2023-05-01 10:58:47 - progress_bar.py[line:274] - INFO: epoch 002: 1380 / 6042 loss=2.491, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7577, nsentences=120, sample_size=4285.6, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1891.4, ups=0.25, wpb=7577, bsz=120, num_updates=7410, lr=2.80007e-05, gnorm=0.891, clip=0, loss_scale=64, train_wall=40, gb_free=30.5, wall=30339 2023-05-01 10:59:27 - progress_bar.py[line:274] - INFO: epoch 002: 1390 / 6042 loss=2.494, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=7820.9, nsentences=120, sample_size=4025.5, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1961.5, ups=0.25, wpb=7820.9, bsz=120, num_updates=7420, lr=2.79954e-05, gnorm=0.947, clip=10, loss_scale=64, train_wall=40, gb_free=27.3, wall=30379 2023-05-01 11:00:06 - progress_bar.py[line:274] - INFO: epoch 002: 1400 / 6042 loss=2.511, loss_v1=0, loss_v2=0, nll_loss=1.273, ntokens=7344.3, nsentences=120, sample_size=4214.6, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1844.7, ups=0.25, wpb=7344.3, bsz=120, num_updates=7430, lr=2.79901e-05, gnorm=0.946, clip=30, loss_scale=64, train_wall=40, gb_free=28.3, wall=30419 2023-05-01 11:00:46 - progress_bar.py[line:274] - INFO: epoch 002: 1410 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7643.3, nsentences=120, sample_size=4035.2, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1929.3, ups=0.25, wpb=7643.3, bsz=120, num_updates=7440, 
lr=2.79849e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=30458 2023-05-01 11:01:25 - progress_bar.py[line:274] - INFO: epoch 002: 1420 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=8090, nsentences=120, sample_size=3740.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=2048.7, ups=0.25, wpb=8090, bsz=120, num_updates=7450, lr=2.79796e-05, gnorm=0.969, clip=20, loss_scale=64, train_wall=39, gb_free=29.6, wall=30498 2023-05-01 11:02:06 - progress_bar.py[line:274] - INFO: epoch 002: 1430 / 6042 loss=2.512, loss_v1=0, loss_v2=0, nll_loss=1.278, ntokens=7695.8, nsentences=120, sample_size=4311.7, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1905.4, ups=0.25, wpb=7695.8, bsz=120, num_updates=7460, lr=2.79743e-05, gnorm=0.886, clip=10, loss_scale=64, train_wall=40, gb_free=31, wall=30538 2023-05-01 11:02:45 - progress_bar.py[line:274] - INFO: epoch 002: 1440 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.235, ntokens=7704.7, nsentences=120, sample_size=4010.6, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1978, ups=0.26, wpb=7704.7, bsz=120, num_updates=7470, lr=2.7969e-05, gnorm=0.927, clip=20, loss_scale=64, train_wall=39, gb_free=29.1, wall=30577 2023-05-01 11:03:25 - progress_bar.py[line:274] - INFO: epoch 002: 1450 / 6042 loss=2.508, loss_v1=0, loss_v2=0, nll_loss=1.275, ntokens=7633.6, nsentences=120, sample_size=3992.2, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1907.3, ups=0.25, wpb=7633.6, bsz=120, num_updates=7480, lr=2.79637e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=30617 2023-05-01 11:04:04 - progress_bar.py[line:274] - INFO: epoch 002: 1460 / 6042 loss=2.516, loss_v1=0, loss_v2=0, nll_loss=1.276, ntokens=7712.3, nsentences=120, sample_size=3774.5, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1961.9, ups=0.25, wpb=7712.3, bsz=120, num_updates=7490, lr=2.79584e-05, gnorm=0.964, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=30657 2023-05-01 
11:04:43 - progress_bar.py[line:274] - INFO: epoch 002: 1470 / 6042 loss=2.535, loss_v1=0, loss_v2=0, nll_loss=1.3, ntokens=7713.8, nsentences=120, sample_size=4277.2, sample_size_v1=0, sample_size_v2=0, ppl=2.46, wps=1964.9, ups=0.25, wpb=7713.8, bsz=120, num_updates=7500, lr=2.79532e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=30696 2023-05-01 11:05:23 - progress_bar.py[line:274] - INFO: epoch 002: 1480 / 6042 loss=2.491, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=8006.3, nsentences=120, sample_size=4314.2, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=2042.5, ups=0.26, wpb=8006.3, bsz=120, num_updates=7510, lr=2.79479e-05, gnorm=0.898, clip=10, loss_scale=64, train_wall=39, gb_free=28.8, wall=30735 2023-05-01 11:06:02 - progress_bar.py[line:274] - INFO: epoch 002: 1490 / 6042 loss=2.507, loss_v1=0, loss_v2=0, nll_loss=1.266, ntokens=7798.1, nsentences=120, sample_size=3425.8, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1970.2, ups=0.25, wpb=7798.1, bsz=120, num_updates=7520, lr=2.79426e-05, gnorm=0.988, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=30775 2023-05-01 11:06:43 - progress_bar.py[line:274] - INFO: epoch 002: 1500 / 6042 loss=2.506, loss_v1=0, loss_v2=0, nll_loss=1.263, ntokens=7562.9, nsentences=120, sample_size=4323.2, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1871.2, ups=0.25, wpb=7562.9, bsz=120, num_updates=7530, lr=2.79373e-05, gnorm=0.907, clip=10, loss_scale=64, train_wall=40, gb_free=30.5, wall=30815 2023-05-01 11:07:22 - progress_bar.py[line:274] - INFO: epoch 002: 1510 / 6042 loss=2.494, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=7906.9, nsentences=120, sample_size=4198.9, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1985.8, ups=0.25, wpb=7906.9, bsz=120, num_updates=7540, lr=2.7932e-05, gnorm=0.911, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=30855 2023-05-01 11:08:01 - progress_bar.py[line:274] - INFO: epoch 002: 1520 / 6042 loss=2.499, loss_v1=0, loss_v2=0, 
nll_loss=1.264, ntokens=7454.1, nsentences=120, sample_size=4320.4, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1922.9, ups=0.26, wpb=7454.1, bsz=120, num_updates=7550, lr=2.79268e-05, gnorm=0.904, clip=10, loss_scale=64, train_wall=39, gb_free=30.4, wall=30894 2023-05-01 11:08:41 - progress_bar.py[line:274] - INFO: epoch 002: 1530 / 6042 loss=2.54, loss_v1=0, loss_v2=0, nll_loss=1.307, ntokens=7566.4, nsentences=120, sample_size=4069.7, sample_size_v1=0, sample_size_v2=0, ppl=2.47, wps=1876.2, ups=0.25, wpb=7566.4, bsz=120, num_updates=7560, lr=2.79215e-05, gnorm=0.92, clip=10, loss_scale=128, train_wall=40, gb_free=31, wall=30934 2023-05-01 11:08:50 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 11:09:26 - progress_bar.py[line:274] - INFO: epoch 002: 1541 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=7883.7, nsentences=120, sample_size=4089.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1787.9, ups=0.23, wpb=7883.7, bsz=120, num_updates=7570, lr=2.79162e-05, gnorm=0.891, clip=0, loss_scale=64, train_wall=44, gb_free=30, wall=30978 2023-05-01 11:10:05 - progress_bar.py[line:274] - INFO: epoch 002: 1551 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7618.7, nsentences=120, sample_size=4079.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1940, ups=0.25, wpb=7618.7, bsz=120, num_updates=7580, lr=2.79109e-05, gnorm=0.914, clip=0, loss_scale=64, train_wall=39, gb_free=29, wall=31017 2023-05-01 11:10:45 - progress_bar.py[line:274] - INFO: epoch 002: 1561 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7741.1, nsentences=120, sample_size=4057.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1932.8, ups=0.25, wpb=7741.1, bsz=120, num_updates=7590, lr=2.79056e-05, gnorm=0.934, clip=0, loss_scale=64, train_wall=40, gb_free=30, wall=31057 2023-05-01 11:11:25 - progress_bar.py[line:274] - INFO: epoch 002: 1571 / 6042 loss=2.452, 
loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7771.2, nsentences=120, sample_size=3901.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1940.3, ups=0.25, wpb=7771.2, bsz=120, num_updates=7600, lr=2.79003e-05, gnorm=0.909, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=31097 2023-05-01 11:12:04 - progress_bar.py[line:274] - INFO: epoch 002: 1581 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7645.9, nsentences=120, sample_size=3995.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1945.8, ups=0.25, wpb=7645.9, bsz=120, num_updates=7610, lr=2.78951e-05, gnorm=0.916, clip=0, loss_scale=64, train_wall=39, gb_free=31.1, wall=31137 2023-05-01 11:12:44 - progress_bar.py[line:274] - INFO: epoch 002: 1591 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7723.7, nsentences=120, sample_size=4088.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1925.8, ups=0.25, wpb=7723.7, bsz=120, num_updates=7620, lr=2.78898e-05, gnorm=0.911, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=31177 2023-05-01 11:13:25 - progress_bar.py[line:274] - INFO: epoch 002: 1601 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7866.4, nsentences=120, sample_size=4032.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1954.3, ups=0.25, wpb=7866.4, bsz=120, num_updates=7630, lr=2.78845e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=31217 2023-05-01 11:14:04 - progress_bar.py[line:274] - INFO: epoch 002: 1611 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7737.3, nsentences=120, sample_size=3835.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1962, ups=0.25, wpb=7737.3, bsz=120, num_updates=7640, lr=2.78792e-05, gnorm=0.939, clip=0, loss_scale=64, train_wall=39, gb_free=30.7, wall=31257 2023-05-01 11:14:44 - progress_bar.py[line:274] - INFO: epoch 002: 1621 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7897.1, nsentences=120, sample_size=4095.9, 
sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1967.9, ups=0.25, wpb=7897.1, bsz=120, num_updates=7650, lr=2.78739e-05, gnorm=0.918, clip=0, loss_scale=64, train_wall=40, gb_free=29.6, wall=31297 2023-05-01 11:15:24 - progress_bar.py[line:274] - INFO: epoch 002: 1631 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7469.5, nsentences=120, sample_size=4482.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1857.1, ups=0.25, wpb=7469.5, bsz=120, num_updates=7660, lr=2.78687e-05, gnorm=0.891, clip=0, loss_scale=64, train_wall=40, gb_free=31.5, wall=31337 2023-05-01 11:16:04 - progress_bar.py[line:274] - INFO: epoch 002: 1641 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=8033.5, nsentences=120, sample_size=4179.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=2027.6, ups=0.25, wpb=8033.5, bsz=120, num_updates=7670, lr=2.78634e-05, gnorm=0.907, clip=10, loss_scale=64, train_wall=40, gb_free=30.5, wall=31376 2023-05-01 11:16:45 - progress_bar.py[line:274] - INFO: epoch 002: 1651 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7992.2, nsentences=120, sample_size=4024.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1966.1, ups=0.25, wpb=7992.2, bsz=120, num_updates=7680, lr=2.78581e-05, gnorm=0.914, clip=0, loss_scale=64, train_wall=41, gb_free=23.6, wall=31417 2023-05-01 11:17:24 - progress_bar.py[line:274] - INFO: epoch 002: 1661 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7344.9, nsentences=120, sample_size=4230.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1852.1, ups=0.25, wpb=7344.9, bsz=120, num_updates=7690, lr=2.78528e-05, gnorm=0.913, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=31457 2023-05-01 11:18:04 - progress_bar.py[line:274] - INFO: epoch 002: 1671 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7760.6, nsentences=120, sample_size=3952.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1956.4, ups=0.25, wpb=7760.6, bsz=120, 
num_updates=7700, lr=2.78475e-05, gnorm=0.925, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=31496 2023-05-01 11:18:44 - progress_bar.py[line:274] - INFO: epoch 002: 1681 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7872.9, nsentences=120, sample_size=4027.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1980.5, ups=0.25, wpb=7872.9, bsz=120, num_updates=7710, lr=2.78422e-05, gnorm=0.92, clip=10, loss_scale=64, train_wall=40, gb_free=28.2, wall=31536 2023-05-01 11:19:24 - progress_bar.py[line:274] - INFO: epoch 002: 1691 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=8018.3, nsentences=120, sample_size=4090.9, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1978.9, ups=0.25, wpb=8018.3, bsz=120, num_updates=7720, lr=2.7837e-05, gnorm=0.889, clip=0, loss_scale=64, train_wall=40, gb_free=27.9, wall=31577 2023-05-01 11:20:04 - progress_bar.py[line:274] - INFO: epoch 002: 1701 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7763.9, nsentences=120, sample_size=4085.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1948.1, ups=0.25, wpb=7763.9, bsz=120, num_updates=7730, lr=2.78317e-05, gnorm=0.92, clip=10, loss_scale=64, train_wall=40, gb_free=28.1, wall=31617 2023-05-01 11:20:44 - progress_bar.py[line:274] - INFO: epoch 002: 1711 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7691.9, nsentences=120, sample_size=3965.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1932.9, ups=0.25, wpb=7691.9, bsz=120, num_updates=7740, lr=2.78264e-05, gnorm=0.909, clip=0, loss_scale=64, train_wall=40, gb_free=30.3, wall=31656 2023-05-01 11:21:23 - progress_bar.py[line:274] - INFO: epoch 002: 1721 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7525.1, nsentences=120, sample_size=4007.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1902.5, ups=0.25, wpb=7525.1, bsz=120, num_updates=7750, lr=2.78211e-05, gnorm=0.895, clip=0, loss_scale=64, train_wall=39, gb_free=31, 
wall=31696 2023-05-01 11:22:04 - progress_bar.py[line:274] - INFO: epoch 002: 1731 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7969.6, nsentences=120, sample_size=3559.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1981.9, ups=0.25, wpb=7969.6, bsz=120, num_updates=7760, lr=2.78158e-05, gnorm=0.975, clip=20, loss_scale=64, train_wall=40, gb_free=29.1, wall=31736 2023-05-01 11:22:43 - progress_bar.py[line:274] - INFO: epoch 002: 1741 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7493.9, nsentences=120, sample_size=3987.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1896, ups=0.25, wpb=7493.9, bsz=120, num_updates=7770, lr=2.78105e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=39, gb_free=27.9, wall=31776 2023-05-01 11:23:24 - progress_bar.py[line:274] - INFO: epoch 002: 1751 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=8051.4, nsentences=120, sample_size=3954.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1995, ups=0.25, wpb=8051.4, bsz=120, num_updates=7780, lr=2.78053e-05, gnorm=0.922, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=31816 2023-05-01 11:24:03 - progress_bar.py[line:274] - INFO: epoch 002: 1761 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7392.8, nsentences=120, sample_size=4065.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1879.9, ups=0.25, wpb=7392.8, bsz=120, num_updates=7790, lr=2.78e-05, gnorm=0.914, clip=0, loss_scale=64, train_wall=39, gb_free=30.1, wall=31855 2023-05-01 11:24:44 - progress_bar.py[line:274] - INFO: epoch 002: 1771 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7649.4, nsentences=120, sample_size=4599.4, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1874.8, ups=0.25, wpb=7649.4, bsz=120, num_updates=7800, lr=2.77947e-05, gnorm=0.9, clip=0, loss_scale=64, train_wall=41, gb_free=28.4, wall=31896 2023-05-01 11:25:23 - progress_bar.py[line:274] - INFO: epoch 002: 1781 / 6042 loss=2.416, 
loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7863.9, nsentences=120, sample_size=3874.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1977.3, ups=0.25, wpb=7863.9, bsz=120, num_updates=7810, lr=2.77894e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=31936 2023-05-01 11:26:03 - progress_bar.py[line:274] - INFO: epoch 002: 1791 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7534.4, nsentences=120, sample_size=4024.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1901, ups=0.25, wpb=7534.4, bsz=120, num_updates=7820, lr=2.77841e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=31976 2023-05-01 11:26:43 - progress_bar.py[line:274] - INFO: epoch 002: 1801 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=8002.2, nsentences=120, sample_size=3841, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1991.7, ups=0.25, wpb=8002.2, bsz=120, num_updates=7830, lr=2.77789e-05, gnorm=0.939, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=32016 2023-05-01 11:27:24 - progress_bar.py[line:274] - INFO: epoch 002: 1811 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7500.6, nsentences=120, sample_size=3925.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1859.2, ups=0.25, wpb=7500.6, bsz=120, num_updates=7840, lr=2.77736e-05, gnorm=0.951, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=32056 2023-05-01 11:28:04 - progress_bar.py[line:274] - INFO: epoch 002: 1821 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7410.6, nsentences=120, sample_size=4294.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1853.9, ups=0.25, wpb=7410.6, bsz=120, num_updates=7850, lr=2.77683e-05, gnorm=0.899, clip=0, loss_scale=64, train_wall=40, gb_free=30.7, wall=32096 2023-05-01 11:28:43 - progress_bar.py[line:274] - INFO: epoch 002: 1831 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7899.2, nsentences=120, sample_size=4025.9, 
sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1988, ups=0.25, wpb=7899.2, bsz=120, num_updates=7860, lr=2.7763e-05, gnorm=0.914, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=32136 2023-05-01 11:29:23 - progress_bar.py[line:274] - INFO: epoch 002: 1841 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7647.3, nsentences=120, sample_size=3968, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1939.4, ups=0.25, wpb=7647.3, bsz=120, num_updates=7870, lr=2.77577e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=39, gb_free=28.9, wall=32175 2023-05-01 11:30:02 - progress_bar.py[line:274] - INFO: epoch 002: 1851 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7455.6, nsentences=120, sample_size=4172.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1893.8, ups=0.25, wpb=7455.6, bsz=120, num_updates=7880, lr=2.77524e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=39, gb_free=27.7, wall=32215 2023-05-01 11:30:42 - progress_bar.py[line:274] - INFO: epoch 002: 1861 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7493.7, nsentences=120, sample_size=4189.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1878.4, ups=0.25, wpb=7493.7, bsz=120, num_updates=7890, lr=2.77472e-05, gnorm=0.92, clip=0, loss_scale=64, train_wall=40, gb_free=30.4, wall=32255 2023-05-01 11:31:22 - progress_bar.py[line:274] - INFO: epoch 002: 1871 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7825.1, nsentences=120, sample_size=4314.8, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1962.9, ups=0.25, wpb=7825.1, bsz=120, num_updates=7900, lr=2.77419e-05, gnorm=0.875, clip=0, loss_scale=64, train_wall=40, gb_free=30.8, wall=32294 2023-05-01 11:32:02 - progress_bar.py[line:274] - INFO: epoch 002: 1881 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7995.3, nsentences=120, sample_size=3869, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1998.9, ups=0.25, wpb=7995.3, bsz=120, num_updates=7910, 
lr=2.77366e-05, gnorm=0.928, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=32334 2023-05-01 11:32:41 - progress_bar.py[line:274] - INFO: epoch 002: 1891 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7777.6, nsentences=120, sample_size=4163.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2001.2, ups=0.26, wpb=7777.6, bsz=120, num_updates=7920, lr=2.77313e-05, gnorm=0.901, clip=0, loss_scale=64, train_wall=39, gb_free=29.8, wall=32373 2023-05-01 11:33:21 - progress_bar.py[line:274] - INFO: epoch 002: 1901 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7712.1, nsentences=120, sample_size=3930.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1919.6, ups=0.25, wpb=7712.1, bsz=120, num_updates=7930, lr=2.7726e-05, gnorm=0.938, clip=0, loss_scale=64, train_wall=40, gb_free=31.5, wall=32413 2023-05-01 11:34:01 - progress_bar.py[line:274] - INFO: epoch 002: 1911 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7904.4, nsentences=120, sample_size=3949.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1959.2, ups=0.25, wpb=7904.4, bsz=120, num_updates=7940, lr=2.77208e-05, gnorm=0.903, clip=0, loss_scale=64, train_wall=40, gb_free=29.4, wall=32454 2023-05-01 11:34:42 - progress_bar.py[line:274] - INFO: epoch 002: 1921 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=8028.6, nsentences=120, sample_size=4329, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1953.6, ups=0.24, wpb=8028.6, bsz=120, num_updates=7950, lr=2.77155e-05, gnorm=0.916, clip=10, loss_scale=64, train_wall=41, gb_free=30, wall=32495 2023-05-01 11:35:22 - progress_bar.py[line:274] - INFO: epoch 002: 1931 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7725.9, nsentences=120, sample_size=3918.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1940.1, ups=0.25, wpb=7725.9, bsz=120, num_updates=7960, lr=2.77102e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=32535 2023-05-01 
11:36:02 - progress_bar.py[line:274] - INFO: epoch 002: 1941 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7486.7, nsentences=120, sample_size=3894.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1876.8, ups=0.25, wpb=7486.7, bsz=120, num_updates=7970, lr=2.77049e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=32575 2023-05-01 11:36:43 - progress_bar.py[line:274] - INFO: epoch 002: 1951 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7833.3, nsentences=120, sample_size=3825.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1932.2, ups=0.25, wpb=7833.3, bsz=120, num_updates=7980, lr=2.76996e-05, gnorm=0.932, clip=10, loss_scale=64, train_wall=40, gb_free=29.1, wall=32615 2023-05-01 11:37:23 - progress_bar.py[line:274] - INFO: epoch 002: 1961 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7495.6, nsentences=120, sample_size=4045, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1857.9, ups=0.25, wpb=7495.6, bsz=120, num_updates=7990, lr=2.76943e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=32655 2023-05-01 11:38:03 - progress_bar.py[line:274] - INFO: epoch 002: 1971 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7676.8, nsentences=120, sample_size=3940.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1912.5, ups=0.25, wpb=7676.8, bsz=120, num_updates=8000, lr=2.76891e-05, gnorm=0.915, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=32696 2023-05-01 11:38:03 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 11:38:05 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 11:38:05 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 11:38:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 11:38:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:22 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 11:38:22 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 11:38:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 11:38:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:34 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 11:38:34 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 11:38:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 11:38:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:45 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 11:38:45 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 11:38:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:49 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 11:38:49 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 11:38:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:54 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 11:38:54 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 11:38:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 11:38:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 11:38:54 - progress_bar.py[line:282] - INFO: epoch 002 | valid on 'valid' subset | loss 3.205 | loss_v1 0 | loss_v2 0 | nll_loss 2.04 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.11 | score 0.7402 | wps 3291 | wpb 3202.1 | bsz 39.4 | num_updates 8000 | best_score 0.751 2023-05-01 11:38:54 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 2 @ 8000 updates 2023-05-01 11:38:54 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_8000.pt 2023-05-01 11:39:18 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_8000.pt 2023-05-01 11:39:33 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_8000.pt (epoch 2 @ 8000 updates, score 0.7402) (writing took 38.841108654858544 seconds) 2023-05-01 11:40:13 - progress_bar.py[line:274] - INFO: epoch 002: 1981 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7865.2, nsentences=120, sample_size=3917.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=604.6, ups=0.08, wpb=7865.2, bsz=120, num_updates=8010, lr=2.76838e-05, gnorm=0.917, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=32826 2023-05-01 11:40:52 - progress_bar.py[line:274] - INFO: epoch 002: 1991 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7586.2, nsentences=120, sample_size=4053.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1945.7, ups=0.26, wpb=7586.2, bsz=120, num_updates=8020, 
lr=2.76785e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=32865 2023-05-01 11:41:32 - progress_bar.py[line:274] - INFO: epoch 002: 2001 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7760.7, nsentences=120, sample_size=4022.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1954.3, ups=0.25, wpb=7760.7, bsz=120, num_updates=8030, lr=2.76732e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=32904 2023-05-01 11:42:12 - progress_bar.py[line:274] - INFO: epoch 002: 2011 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7559.8, nsentences=120, sample_size=3810.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1895.1, ups=0.25, wpb=7559.8, bsz=120, num_updates=8040, lr=2.76679e-05, gnorm=0.928, clip=0, loss_scale=64, train_wall=40, gb_free=30.7, wall=32944 2023-05-01 11:42:52 - progress_bar.py[line:274] - INFO: epoch 002: 2021 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7911.3, nsentences=120, sample_size=4028.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1980.8, ups=0.25, wpb=7911.3, bsz=120, num_updates=8050, lr=2.76626e-05, gnorm=0.949, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=32984 2023-05-01 11:43:31 - progress_bar.py[line:274] - INFO: epoch 002: 2031 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7811.9, nsentences=120, sample_size=4324.7, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1987, ups=0.25, wpb=7811.9, bsz=120, num_updates=8060, lr=2.76574e-05, gnorm=0.893, clip=0, loss_scale=64, train_wall=39, gb_free=30.2, wall=33024 2023-05-01 11:44:11 - progress_bar.py[line:274] - INFO: epoch 002: 2041 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7581.9, nsentences=120, sample_size=4181.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1923.4, ups=0.25, wpb=7581.9, bsz=120, num_updates=8070, lr=2.76521e-05, gnorm=0.907, clip=0, loss_scale=64, train_wall=39, gb_free=31.1, wall=33063 2023-05-01 
11:44:50 - progress_bar.py[line:274] - INFO: epoch 002: 2051 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7835.3, nsentences=120, sample_size=4134.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2002, ups=0.26, wpb=7835.3, bsz=120, num_updates=8080, lr=2.76468e-05, gnorm=0.91, clip=10, loss_scale=128, train_wall=39, gb_free=30, wall=33102 2023-05-01 11:44:54 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 11:45:33 - progress_bar.py[line:274] - INFO: epoch 002: 2062 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7718.8, nsentences=120, sample_size=4094.3, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1786, ups=0.23, wpb=7718.8, bsz=120, num_updates=8090, lr=2.76415e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=43, gb_free=30, wall=33145 2023-05-01 11:46:12 - progress_bar.py[line:274] - INFO: epoch 002: 2072 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7438.7, nsentences=120, sample_size=4002, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1887.1, ups=0.25, wpb=7438.7, bsz=120, num_updates=8100, lr=2.76362e-05, gnorm=0.903, clip=0, loss_scale=64, train_wall=39, gb_free=30.3, wall=33185 2023-05-01 11:46:51 - progress_bar.py[line:274] - INFO: epoch 002: 2082 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7736.2, nsentences=120, sample_size=4122.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2002.1, ups=0.26, wpb=7736.2, bsz=120, num_updates=8110, lr=2.7631e-05, gnorm=0.923, clip=0, loss_scale=64, train_wall=39, gb_free=29.7, wall=33223 2023-05-01 11:47:31 - progress_bar.py[line:274] - INFO: epoch 002: 2092 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=8075, nsentences=120, sample_size=3911.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2019.1, ups=0.25, wpb=8075, bsz=120, num_updates=8120, lr=2.76257e-05, gnorm=0.932, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, 
wall=33263 2023-05-01 11:48:11 - progress_bar.py[line:274] - INFO: epoch 002: 2102 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7831, nsentences=120, sample_size=3927, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1968, ups=0.25, wpb=7831, bsz=120, num_updates=8130, lr=2.76204e-05, gnorm=0.932, clip=0, loss_scale=64, train_wall=40, gb_free=30.7, wall=33303 2023-05-01 11:48:51 - progress_bar.py[line:274] - INFO: epoch 002: 2112 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7590.8, nsentences=120, sample_size=4021.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1906.5, ups=0.25, wpb=7590.8, bsz=120, num_updates=8140, lr=2.76151e-05, gnorm=0.92, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=33343 2023-05-01 11:49:30 - progress_bar.py[line:274] - INFO: epoch 002: 2122 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7734.9, nsentences=120, sample_size=4318.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1943.5, ups=0.25, wpb=7734.9, bsz=120, num_updates=8150, lr=2.76098e-05, gnorm=0.907, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=33383 2023-05-01 11:50:10 - progress_bar.py[line:274] - INFO: epoch 002: 2132 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7736.3, nsentences=120, sample_size=4080.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1965.7, ups=0.25, wpb=7736.3, bsz=120, num_updates=8160, lr=2.76045e-05, gnorm=0.905, clip=0, loss_scale=64, train_wall=39, gb_free=27.3, wall=33422 2023-05-01 11:50:49 - progress_bar.py[line:274] - INFO: epoch 002: 2142 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=8125.1, nsentences=120, sample_size=3729.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2051.7, ups=0.25, wpb=8125.1, bsz=120, num_updates=8170, lr=2.75993e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=33462 2023-05-01 11:51:29 - progress_bar.py[line:274] - INFO: epoch 002: 2152 / 6042 loss=2.402, 
loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7852.1, nsentences=120, sample_size=4013.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1994.4, ups=0.25, wpb=7852.1, bsz=120, num_updates=8180, lr=2.7594e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=39, gb_free=28.4, wall=33501 2023-05-01 11:52:10 - progress_bar.py[line:274] - INFO: epoch 002: 2162 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7701.5, nsentences=120, sample_size=3997.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1885.1, ups=0.24, wpb=7701.5, bsz=120, num_updates=8190, lr=2.75887e-05, gnorm=0.918, clip=20, loss_scale=64, train_wall=41, gb_free=29.9, wall=33542 2023-05-01 11:52:50 - progress_bar.py[line:274] - INFO: epoch 002: 2172 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7648, nsentences=120, sample_size=4222.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1893.8, ups=0.25, wpb=7648, bsz=120, num_updates=8200, lr=2.75834e-05, gnorm=0.921, clip=10, loss_scale=64, train_wall=40, gb_free=30.5, wall=33582 2023-05-01 11:53:29 - progress_bar.py[line:274] - INFO: epoch 002: 2182 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7945.7, nsentences=120, sample_size=3700.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2028, ups=0.26, wpb=7945.7, bsz=120, num_updates=8210, lr=2.75781e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=33622 2023-05-01 11:54:09 - progress_bar.py[line:274] - INFO: epoch 002: 2192 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7634, nsentences=120, sample_size=3917.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1914, ups=0.25, wpb=7634, bsz=120, num_updates=8220, lr=2.75728e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=33661 2023-05-01 11:54:49 - progress_bar.py[line:274] - INFO: epoch 002: 2202 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7656.9, nsentences=120, sample_size=3912.2, sample_size_v1=0, 
sample_size_v2=0, ppl=2.18, wps=1909.8, ups=0.25, wpb=7656.9, bsz=120, num_updates=8230, lr=2.75676e-05, gnorm=0.924, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=33702 2023-05-01 11:55:29 - progress_bar.py[line:274] - INFO: epoch 002: 2212 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7559.9, nsentences=120, sample_size=4035.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1905.1, ups=0.25, wpb=7559.9, bsz=120, num_updates=8240, lr=2.75623e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=33741 2023-05-01 11:56:09 - progress_bar.py[line:274] - INFO: epoch 002: 2222 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=8073, nsentences=120, sample_size=4224.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2017.9, ups=0.25, wpb=8073, bsz=120, num_updates=8250, lr=2.7557e-05, gnorm=0.913, clip=0, loss_scale=64, train_wall=40, gb_free=29.2, wall=33781 2023-05-01 11:56:49 - progress_bar.py[line:274] - INFO: epoch 002: 2232 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7793.8, nsentences=120, sample_size=4395, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1936.2, ups=0.25, wpb=7793.8, bsz=120, num_updates=8260, lr=2.75517e-05, gnorm=0.882, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=33821 2023-05-01 11:57:30 - progress_bar.py[line:274] - INFO: epoch 002: 2242 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7719, nsentences=120, sample_size=3796.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1876.6, ups=0.24, wpb=7719, bsz=120, num_updates=8270, lr=2.75464e-05, gnorm=0.934, clip=0, loss_scale=64, train_wall=41, gb_free=29.3, wall=33863 2023-05-01 11:58:10 - progress_bar.py[line:274] - INFO: epoch 002: 2252 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7773.2, nsentences=120, sample_size=3964.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1941.1, ups=0.25, wpb=7773.2, bsz=120, num_updates=8280, lr=2.75412e-05, 
gnorm=0.931, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=33903 2023-05-01 11:58:50 - progress_bar.py[line:274] - INFO: epoch 002: 2262 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7776.2, nsentences=120, sample_size=4020, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1950.4, ups=0.25, wpb=7776.2, bsz=120, num_updates=8290, lr=2.75359e-05, gnorm=0.933, clip=0, loss_scale=64, train_wall=40, gb_free=29.3, wall=33943 2023-05-01 11:59:29 - progress_bar.py[line:274] - INFO: epoch 002: 2272 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7436.5, nsentences=120, sample_size=4095.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1913.3, ups=0.26, wpb=7436.5, bsz=120, num_updates=8300, lr=2.75306e-05, gnorm=0.932, clip=10, loss_scale=64, train_wall=39, gb_free=30.5, wall=33981 2023-05-01 12:00:08 - progress_bar.py[line:274] - INFO: epoch 002: 2282 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7561.4, nsentences=120, sample_size=4095.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1926.4, ups=0.25, wpb=7561.4, bsz=120, num_updates=8310, lr=2.75253e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=39, gb_free=30.3, wall=34021 2023-05-01 12:00:48 - progress_bar.py[line:274] - INFO: epoch 002: 2292 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7582.6, nsentences=120, sample_size=4015.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1924.6, ups=0.25, wpb=7582.6, bsz=120, num_updates=8320, lr=2.752e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=39, gb_free=30.7, wall=34060 2023-05-01 12:01:28 - progress_bar.py[line:274] - INFO: epoch 002: 2302 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7937.7, nsentences=120, sample_size=4068.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1966.4, ups=0.25, wpb=7937.7, bsz=120, num_updates=8330, lr=2.75147e-05, gnorm=0.905, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=34100 2023-05-01 12:02:08 - 
progress_bar.py[line:274] - INFO: epoch 002: 2312 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7708.3, nsentences=120, sample_size=3913.3, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1936.3, ups=0.25, wpb=7708.3, bsz=120, num_updates=8340, lr=2.75095e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=34140 2023-05-01 12:02:47 - progress_bar.py[line:274] - INFO: epoch 002: 2322 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7580.7, nsentences=120, sample_size=3834.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1910.7, ups=0.25, wpb=7580.7, bsz=120, num_updates=8350, lr=2.75042e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=40, gb_free=31.1, wall=34180 2023-05-01 12:03:27 - progress_bar.py[line:274] - INFO: epoch 002: 2332 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7604.5, nsentences=120, sample_size=3831, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1905.5, ups=0.25, wpb=7604.5, bsz=120, num_updates=8360, lr=2.74989e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=40, gb_free=28.6, wall=34220 2023-05-01 12:04:07 - progress_bar.py[line:274] - INFO: epoch 002: 2342 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7926.5, nsentences=120, sample_size=3957.7, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1974.5, ups=0.25, wpb=7926.5, bsz=120, num_updates=8370, lr=2.74936e-05, gnorm=0.943, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=34260 2023-05-01 12:04:48 - progress_bar.py[line:274] - INFO: epoch 002: 2352 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7722.3, nsentences=120, sample_size=4101, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1913.7, ups=0.25, wpb=7722.3, bsz=120, num_updates=8380, lr=2.74883e-05, gnorm=0.917, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=34300 2023-05-01 12:05:29 - progress_bar.py[line:274] - INFO: epoch 002: 2362 / 6042 loss=2.445, loss_v1=0, loss_v2=0, 
nll_loss=1.186, ntokens=7841.5, nsentences=120, sample_size=4060, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1927.1, ups=0.25, wpb=7841.5, bsz=120, num_updates=8390, lr=2.74831e-05, gnorm=0.916, clip=10, loss_scale=64, train_wall=41, gb_free=30.2, wall=34341 2023-05-01 12:06:08 - progress_bar.py[line:274] - INFO: epoch 002: 2372 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7471.6, nsentences=120, sample_size=4151.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1871.1, ups=0.25, wpb=7471.6, bsz=120, num_updates=8400, lr=2.74778e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=25.7, wall=34381 2023-05-01 12:06:48 - progress_bar.py[line:274] - INFO: epoch 002: 2382 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7505.4, nsentences=120, sample_size=3850.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1887.2, ups=0.25, wpb=7505.4, bsz=120, num_updates=8410, lr=2.74725e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=34421 2023-05-01 12:07:28 - progress_bar.py[line:274] - INFO: epoch 002: 2392 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7572.4, nsentences=120, sample_size=4093.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1903.2, ups=0.25, wpb=7572.4, bsz=120, num_updates=8420, lr=2.74672e-05, gnorm=0.921, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=34460 2023-05-01 12:08:07 - progress_bar.py[line:274] - INFO: epoch 002: 2402 / 6042 loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.258, ntokens=7808.2, nsentences=120, sample_size=4202.9, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1985.3, ups=0.25, wpb=7808.2, bsz=120, num_updates=8430, lr=2.74619e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=39, gb_free=30, wall=34500 2023-05-01 12:08:47 - progress_bar.py[line:274] - INFO: epoch 002: 2412 / 6042 loss=2.525, loss_v1=0, loss_v2=0, nll_loss=1.294, ntokens=7843.2, nsentences=120, sample_size=3999.6, sample_size_v1=0, 
sample_size_v2=0, ppl=2.45, wps=1983.7, ups=0.25, wpb=7843.2, bsz=120, num_updates=8440, lr=2.74566e-05, gnorm=0.965, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=34539 2023-05-01 12:09:27 - progress_bar.py[line:274] - INFO: epoch 002: 2422 / 6042 loss=2.504, loss_v1=0, loss_v2=0, nll_loss=1.265, ntokens=7722.1, nsentences=120, sample_size=3735.6, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1932.8, ups=0.25, wpb=7722.1, bsz=120, num_updates=8450, lr=2.74514e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=40, gb_free=29.4, wall=34579 2023-05-01 12:10:07 - progress_bar.py[line:274] - INFO: epoch 002: 2432 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7805.8, nsentences=120, sample_size=4162.5, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1957.3, ups=0.25, wpb=7805.8, bsz=120, num_updates=8460, lr=2.74461e-05, gnorm=0.891, clip=0, loss_scale=64, train_wall=40, gb_free=30.4, wall=34619 2023-05-01 12:10:46 - progress_bar.py[line:274] - INFO: epoch 002: 2442 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7538.9, nsentences=120, sample_size=4066.1, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1925.3, ups=0.26, wpb=7538.9, bsz=120, num_updates=8470, lr=2.74408e-05, gnorm=0.939, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=34658 2023-05-01 12:11:26 - progress_bar.py[line:274] - INFO: epoch 002: 2452 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.264, ntokens=7995.4, nsentences=120, sample_size=4123.7, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1982, ups=0.25, wpb=7995.4, bsz=120, num_updates=8480, lr=2.74355e-05, gnorm=0.928, clip=20, loss_scale=64, train_wall=40, gb_free=27.9, wall=34699 2023-05-01 12:12:05 - progress_bar.py[line:274] - INFO: epoch 002: 2462 / 6042 loss=2.507, loss_v1=0, loss_v2=0, nll_loss=1.27, ntokens=7758.4, nsentences=120, sample_size=3890.9, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1991.4, ups=0.26, wpb=7758.4, bsz=120, num_updates=8490, 
lr=2.74302e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=39, gb_free=29.2, wall=34738 2023-05-01 12:12:46 - progress_bar.py[line:274] - INFO: epoch 002: 2472 / 6042 loss=2.506, loss_v1=0, loss_v2=0, nll_loss=1.261, ntokens=7852.5, nsentences=120, sample_size=3678.7, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1946.8, ups=0.25, wpb=7852.5, bsz=120, num_updates=8500, lr=2.74249e-05, gnorm=0.955, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=34778 2023-05-01 12:13:25 - progress_bar.py[line:274] - INFO: epoch 002: 2482 / 6042 loss=2.503, loss_v1=0, loss_v2=0, nll_loss=1.26, ntokens=7850.8, nsentences=120, sample_size=3904.3, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1971.1, ups=0.25, wpb=7850.8, bsz=120, num_updates=8510, lr=2.74197e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=34818 2023-05-01 12:14:05 - progress_bar.py[line:274] - INFO: epoch 002: 2492 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7609.7, nsentences=120, sample_size=4062.4, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1932.9, ups=0.25, wpb=7609.7, bsz=120, num_updates=8520, lr=2.74144e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=34857 2023-05-01 12:14:44 - progress_bar.py[line:274] - INFO: epoch 002: 2502 / 6042 loss=2.506, loss_v1=0, loss_v2=0, nll_loss=1.263, ntokens=7501.1, nsentences=120, sample_size=4330.9, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1903.1, ups=0.25, wpb=7501.1, bsz=120, num_updates=8530, lr=2.74091e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=39, gb_free=31.3, wall=34897 2023-05-01 12:15:25 - progress_bar.py[line:274] - INFO: epoch 002: 2512 / 6042 loss=2.571, loss_v1=0, loss_v2=0, nll_loss=1.343, ntokens=8044.1, nsentences=120, sample_size=4183.1, sample_size_v1=0, sample_size_v2=0, ppl=2.54, wps=1980.2, ups=0.25, wpb=8044.1, bsz=120, num_updates=8540, lr=2.74038e-05, gnorm=0.902, clip=0, loss_scale=64, train_wall=41, gb_free=29.3, wall=34937 
2023-05-01 12:16:04 - progress_bar.py[line:274] - INFO: epoch 002: 2522 / 6042 loss=2.523, loss_v1=0, loss_v2=0, nll_loss=1.286, ntokens=7712.6, nsentences=120, sample_size=4069.5, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=1969.7, ups=0.26, wpb=7712.6, bsz=120, num_updates=8550, lr=2.73985e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=39, gb_free=28.2, wall=34976 2023-05-01 12:16:43 - progress_bar.py[line:274] - INFO: epoch 002: 2532 / 6042 loss=2.505, loss_v1=0, loss_v2=0, nll_loss=1.27, ntokens=7733.3, nsentences=120, sample_size=3846.8, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1963.7, ups=0.25, wpb=7733.3, bsz=120, num_updates=8560, lr=2.73933e-05, gnorm=0.931, clip=20, loss_scale=64, train_wall=39, gb_free=30.4, wall=35016 2023-05-01 12:17:23 - progress_bar.py[line:274] - INFO: epoch 002: 2542 / 6042 loss=2.513, loss_v1=0, loss_v2=0, nll_loss=1.275, ntokens=7713.6, nsentences=120, sample_size=4384.5, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1938.4, ups=0.25, wpb=7713.6, bsz=120, num_updates=8570, lr=2.7388e-05, gnorm=0.892, clip=0, loss_scale=64, train_wall=40, gb_free=28.6, wall=35056 2023-05-01 12:18:03 - progress_bar.py[line:274] - INFO: epoch 002: 2552 / 6042 loss=2.522, loss_v1=0, loss_v2=0, nll_loss=1.285, ntokens=7810, nsentences=120, sample_size=4097.7, sample_size_v1=0, sample_size_v2=0, ppl=2.44, wps=1942.2, ups=0.25, wpb=7810, bsz=120, num_updates=8580, lr=2.73827e-05, gnorm=0.915, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=35096 2023-05-01 12:18:44 - progress_bar.py[line:274] - INFO: epoch 002: 2562 / 6042 loss=2.493, loss_v1=0, loss_v2=0, nll_loss=1.259, ntokens=7747.3, nsentences=120, sample_size=3980.6, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1921.9, ups=0.25, wpb=7747.3, bsz=120, num_updates=8590, lr=2.73774e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=40, gb_free=31.1, wall=35136 2023-05-01 12:19:22 - progress_bar.py[line:274] - INFO: epoch 002: 2572 / 6042 loss=2.499, loss_v1=0, 
loss_v2=0, nll_loss=1.256, ntokens=7516.8, nsentences=120, sample_size=4022.3, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1938.2, ups=0.26, wpb=7516.8, bsz=120, num_updates=8600, lr=2.73721e-05, gnorm=0.952, clip=10, loss_scale=128, train_wall=39, gb_free=29.5, wall=35175 2023-05-01 12:19:30 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 12:20:06 - progress_bar.py[line:274] - INFO: epoch 002: 2583 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7656.7, nsentences=120, sample_size=4027, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1765.1, ups=0.23, wpb=7656.7, bsz=120, num_updates=8610, lr=2.73668e-05, gnorm=0.892, clip=10, loss_scale=64, train_wall=43, gb_free=29.4, wall=35218 2023-05-01 12:20:45 - progress_bar.py[line:274] - INFO: epoch 002: 2593 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7658.4, nsentences=120, sample_size=3932.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1929.5, ups=0.25, wpb=7658.4, bsz=120, num_updates=8620, lr=2.73616e-05, gnorm=0.902, clip=0, loss_scale=64, train_wall=40, gb_free=23.6, wall=35258 2023-05-01 12:21:26 - progress_bar.py[line:274] - INFO: epoch 002: 2603 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7929.7, nsentences=120, sample_size=3872.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1980.4, ups=0.25, wpb=7929.7, bsz=120, num_updates=8630, lr=2.73563e-05, gnorm=0.951, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=35298 2023-05-01 12:22:06 - progress_bar.py[line:274] - INFO: epoch 002: 2613 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7949.9, nsentences=120, sample_size=3793.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1978.8, ups=0.25, wpb=7949.9, bsz=120, num_updates=8640, lr=2.7351e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=29.2, wall=35338 2023-05-01 12:22:47 - progress_bar.py[line:274] - INFO: epoch 002: 
2623 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7474.9, nsentences=120, sample_size=4040.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1829.9, ups=0.24, wpb=7474.9, bsz=120, num_updates=8650, lr=2.73457e-05, gnorm=0.904, clip=10, loss_scale=64, train_wall=41, gb_free=30.8, wall=35379 2023-05-01 12:23:26 - progress_bar.py[line:274] - INFO: epoch 002: 2633 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7789.9, nsentences=120, sample_size=3815.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1951.2, ups=0.25, wpb=7789.9, bsz=120, num_updates=8660, lr=2.73404e-05, gnorm=0.933, clip=0, loss_scale=64, train_wall=40, gb_free=30, wall=35419 2023-05-01 12:24:07 - progress_bar.py[line:274] - INFO: epoch 002: 2643 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7543.1, nsentences=120, sample_size=4099.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1882.5, ups=0.25, wpb=7543.1, bsz=120, num_updates=8670, lr=2.73352e-05, gnorm=0.91, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=35459 2023-05-01 12:24:46 - progress_bar.py[line:274] - INFO: epoch 002: 2653 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7750.6, nsentences=120, sample_size=3971.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1947.2, ups=0.25, wpb=7750.6, bsz=120, num_updates=8680, lr=2.73299e-05, gnorm=0.917, clip=10, loss_scale=64, train_wall=40, gb_free=30.7, wall=35499 2023-05-01 12:25:27 - progress_bar.py[line:274] - INFO: epoch 002: 2663 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7751.9, nsentences=120, sample_size=4124, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1931, ups=0.25, wpb=7751.9, bsz=120, num_updates=8690, lr=2.73246e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=40, gb_free=28.8, wall=35539 2023-05-01 12:26:07 - progress_bar.py[line:274] - INFO: epoch 002: 2673 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=8001.5, nsentences=120, 
sample_size=3771.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1995.4, ups=0.25, wpb=8001.5, bsz=120, num_updates=8700, lr=2.73193e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=35579 2023-05-01 12:26:47 - progress_bar.py[line:274] - INFO: epoch 002: 2683 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7894.1, nsentences=120, sample_size=4203.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1975, ups=0.25, wpb=7894.1, bsz=120, num_updates=8710, lr=2.7314e-05, gnorm=0.917, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=35619 2023-05-01 12:27:27 - progress_bar.py[line:274] - INFO: epoch 002: 2693 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7793.9, nsentences=120, sample_size=3840.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1939.3, ups=0.25, wpb=7793.9, bsz=120, num_updates=8720, lr=2.73087e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=35659 2023-05-01 12:28:07 - progress_bar.py[line:274] - INFO: epoch 002: 2703 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7782.4, nsentences=120, sample_size=4189.5, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1928.4, ups=0.25, wpb=7782.4, bsz=120, num_updates=8730, lr=2.73035e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=35700 2023-05-01 12:28:48 - progress_bar.py[line:274] - INFO: epoch 002: 2713 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=8031.9, nsentences=120, sample_size=4034.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1976.8, ups=0.25, wpb=8031.9, bsz=120, num_updates=8740, lr=2.72982e-05, gnorm=0.908, clip=10, loss_scale=64, train_wall=41, gb_free=26.4, wall=35740 2023-05-01 12:29:28 - progress_bar.py[line:274] - INFO: epoch 002: 2723 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7723, nsentences=120, sample_size=4043.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1922.3, ups=0.25, wpb=7723, 
bsz=120, num_updates=8750, lr=2.72929e-05, gnorm=0.908, clip=0, loss_scale=64, train_wall=40, gb_free=30.8, wall=35780 2023-05-01 12:30:07 - progress_bar.py[line:274] - INFO: epoch 002: 2733 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7549.7, nsentences=120, sample_size=3982.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1909.5, ups=0.25, wpb=7549.7, bsz=120, num_updates=8760, lr=2.72876e-05, gnorm=0.931, clip=0, loss_scale=64, train_wall=39, gb_free=29.6, wall=35820 2023-05-01 12:30:47 - progress_bar.py[line:274] - INFO: epoch 002: 2743 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7502.5, nsentences=120, sample_size=4232.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1886.8, ups=0.25, wpb=7502.5, bsz=120, num_updates=8770, lr=2.72823e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=35860 2023-05-01 12:31:27 - progress_bar.py[line:274] - INFO: epoch 002: 2753 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7475.5, nsentences=120, sample_size=4103.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1889.2, ups=0.25, wpb=7475.5, bsz=120, num_updates=8780, lr=2.7277e-05, gnorm=0.913, clip=10, loss_scale=64, train_wall=39, gb_free=30.9, wall=35899 2023-05-01 12:32:06 - progress_bar.py[line:274] - INFO: epoch 002: 2763 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7595, nsentences=120, sample_size=3805.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1941.7, ups=0.26, wpb=7595, bsz=120, num_updates=8790, lr=2.72718e-05, gnorm=0.949, clip=0, loss_scale=64, train_wall=39, gb_free=24.9, wall=35938 2023-05-01 12:32:46 - progress_bar.py[line:274] - INFO: epoch 002: 2773 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7457.8, nsentences=120, sample_size=4054.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1865.4, ups=0.25, wpb=7457.8, bsz=120, num_updates=8800, lr=2.72665e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=40, 
gb_free=29.4, wall=35978 2023-05-01 12:33:26 - progress_bar.py[line:274] - INFO: epoch 002: 2783 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7721.8, nsentences=120, sample_size=3768.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1922.7, ups=0.25, wpb=7721.8, bsz=120, num_updates=8810, lr=2.72612e-05, gnorm=0.958, clip=10, loss_scale=64, train_wall=40, gb_free=26.9, wall=36019 2023-05-01 12:34:06 - progress_bar.py[line:274] - INFO: epoch 002: 2793 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7599.8, nsentences=120, sample_size=3761.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1893.2, ups=0.25, wpb=7599.8, bsz=120, num_updates=8820, lr=2.72559e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=27.6, wall=36059 2023-05-01 12:34:47 - progress_bar.py[line:274] - INFO: epoch 002: 2803 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7603.6, nsentences=120, sample_size=3908.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1886.6, ups=0.25, wpb=7603.6, bsz=120, num_updates=8830, lr=2.72506e-05, gnorm=0.939, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=36099 2023-05-01 12:35:26 - progress_bar.py[line:274] - INFO: epoch 002: 2813 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7752.4, nsentences=120, sample_size=3887.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1947.6, ups=0.25, wpb=7752.4, bsz=120, num_updates=8840, lr=2.72454e-05, gnorm=0.932, clip=10, loss_scale=64, train_wall=40, gb_free=28.7, wall=36139 2023-05-01 12:36:06 - progress_bar.py[line:274] - INFO: epoch 002: 2823 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7717.2, nsentences=120, sample_size=4107.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1940.4, ups=0.25, wpb=7717.2, bsz=120, num_updates=8850, lr=2.72401e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=36179 2023-05-01 12:36:47 - progress_bar.py[line:274] - INFO: epoch 002: 
2833 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7680.4, nsentences=120, sample_size=4213.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1868.4, ups=0.24, wpb=7680.4, bsz=120, num_updates=8860, lr=2.72348e-05, gnorm=0.902, clip=0, loss_scale=64, train_wall=41, gb_free=30.5, wall=36220 2023-05-01 12:37:27 - progress_bar.py[line:274] - INFO: epoch 002: 2843 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7815, nsentences=120, sample_size=4270.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1981.6, ups=0.25, wpb=7815, bsz=120, num_updates=8870, lr=2.72295e-05, gnorm=0.897, clip=0, loss_scale=64, train_wall=39, gb_free=29.9, wall=36259 2023-05-01 12:38:07 - progress_bar.py[line:274] - INFO: epoch 002: 2853 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7647.6, nsentences=120, sample_size=4328.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1883.5, ups=0.25, wpb=7647.6, bsz=120, num_updates=8880, lr=2.72242e-05, gnorm=0.905, clip=0, loss_scale=64, train_wall=41, gb_free=30.4, wall=36300 2023-05-01 12:38:48 - progress_bar.py[line:274] - INFO: epoch 002: 2863 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7868.3, nsentences=120, sample_size=4016.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1924.9, ups=0.24, wpb=7868.3, bsz=120, num_updates=8890, lr=2.72189e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=41, gb_free=26.6, wall=36341 2023-05-01 12:39:27 - progress_bar.py[line:274] - INFO: epoch 002: 2873 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7526, nsentences=120, sample_size=4108.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1939.1, ups=0.26, wpb=7526, bsz=120, num_updates=8900, lr=2.72137e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=39, gb_free=30.6, wall=36379 2023-05-01 12:40:07 - progress_bar.py[line:274] - INFO: epoch 002: 2883 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7859.8, nsentences=120, 
sample_size=4161, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1973.5, ups=0.25, wpb=7859.8, bsz=120, num_updates=8910, lr=2.72084e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=40, gb_free=28, wall=36419 2023-05-01 12:40:46 - progress_bar.py[line:274] - INFO: epoch 002: 2893 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7491.8, nsentences=120, sample_size=4067, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1911.6, ups=0.26, wpb=7491.8, bsz=120, num_updates=8920, lr=2.72031e-05, gnorm=0.928, clip=0, loss_scale=64, train_wall=39, gb_free=29.2, wall=36458 2023-05-01 12:41:26 - progress_bar.py[line:274] - INFO: epoch 002: 2903 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7628, nsentences=120, sample_size=4112.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1912.1, ups=0.25, wpb=7628, bsz=120, num_updates=8930, lr=2.71978e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=40, gb_free=29.4, wall=36498 2023-05-01 12:42:05 - progress_bar.py[line:274] - INFO: epoch 002: 2913 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7756.8, nsentences=120, sample_size=4290.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1964.5, ups=0.25, wpb=7756.8, bsz=120, num_updates=8940, lr=2.71925e-05, gnorm=0.92, clip=10, loss_scale=64, train_wall=39, gb_free=30, wall=36538 2023-05-01 12:42:45 - progress_bar.py[line:274] - INFO: epoch 002: 2923 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7496.8, nsentences=120, sample_size=3819.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1884.6, ups=0.25, wpb=7496.8, bsz=120, num_updates=8950, lr=2.71873e-05, gnorm=0.948, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=36578 2023-05-01 12:43:24 - progress_bar.py[line:274] - INFO: epoch 002: 2933 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7396.3, nsentences=120, sample_size=4561.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1890.7, ups=0.26, wpb=7396.3, bsz=120, 
num_updates=8960, lr=2.7182e-05, gnorm=0.89, clip=0, loss_scale=64, train_wall=39, gb_free=29.2, wall=36617 2023-05-01 12:44:04 - progress_bar.py[line:274] - INFO: epoch 002: 2943 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7544.5, nsentences=120, sample_size=4243.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1901.5, ups=0.25, wpb=7544.5, bsz=120, num_updates=8970, lr=2.71767e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=36656 2023-05-01 12:44:44 - progress_bar.py[line:274] - INFO: epoch 002: 2953 / 6042 loss=2.511, loss_v1=0, loss_v2=0, nll_loss=1.271, ntokens=7986.1, nsentences=120, sample_size=3970.4, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1999.1, ups=0.25, wpb=7986.1, bsz=120, num_updates=8980, lr=2.71714e-05, gnorm=1.012, clip=30, loss_scale=64, train_wall=40, gb_free=31.2, wall=36696 2023-05-01 12:45:24 - progress_bar.py[line:274] - INFO: epoch 002: 2963 / 6042 loss=2.519, loss_v1=0, loss_v2=0, nll_loss=1.278, ntokens=7593.5, nsentences=120, sample_size=4047.9, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=1882.6, ups=0.25, wpb=7593.5, bsz=120, num_updates=8990, lr=2.71661e-05, gnorm=0.954, clip=40, loss_scale=64, train_wall=40, gb_free=29.2, wall=36737 2023-05-01 12:46:03 - progress_bar.py[line:274] - INFO: epoch 002: 2973 / 6042 loss=2.55, loss_v1=0, loss_v2=0, nll_loss=1.317, ntokens=7694.4, nsentences=120, sample_size=4057.1, sample_size_v1=0, sample_size_v2=0, ppl=2.49, wps=1963.5, ups=0.26, wpb=7694.4, bsz=120, num_updates=9000, lr=2.71608e-05, gnorm=0.951, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=36776 2023-05-01 12:46:03 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 12:46:05 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 12:46:05 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 12:46:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 12:46:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:22 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 12:46:22 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 12:46:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 12:46:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:34 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 12:46:34 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 12:46:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 12:46:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:46 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 12:46:46 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 12:46:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:50 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 12:46:50 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 12:46:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:54 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 12:46:54 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 12:46:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 12:46:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 12:46:55 - progress_bar.py[line:282] - INFO: epoch 002 | valid on 'valid' subset | loss 3.198 | loss_v1 0 | loss_v2 0 | nll_loss 2.032 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.09 | score 0.7402 | wps 3282.8 | wpb 3202.1 | bsz 39.4 | num_updates 9000 | best_score 0.751 2023-05-01 12:46:55 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 2 @ 9000 updates 2023-05-01 12:46:55 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_9000.pt 2023-05-01 12:47:19 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_9000.pt 2023-05-01 12:47:32 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_9000.pt (epoch 2 @ 9000 updates, score 0.7402) (writing took 37.295877914875746 seconds) 2023-05-01 12:48:11 - progress_bar.py[line:274] - INFO: epoch 002: 2983 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7591.9, nsentences=120, sample_size=3781.1, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=594.2, ups=0.08, wpb=7591.9, bsz=120, num_updates=9010, lr=2.71556e-05, gnorm=0.951, clip=40, loss_scale=64, train_wall=39, gb_free=28.3, wall=36904 2023-05-01 12:48:52 - progress_bar.py[line:274] - INFO: epoch 002: 2993 / 6042 loss=2.493, loss_v1=0, loss_v2=0, nll_loss=1.249, ntokens=7640.8, nsentences=120, sample_size=4163.7, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1886, ups=0.25, wpb=7640.8, bsz=120, num_updates=9020, 
lr=2.71503e-05, gnorm=0.911, clip=0, loss_scale=64, train_wall=40, gb_free=30.1, wall=36944 2023-05-01 12:49:32 - progress_bar.py[line:274] - INFO: epoch 002: 3003 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7687, nsentences=120, sample_size=4242.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1913.1, ups=0.25, wpb=7687, bsz=120, num_updates=9030, lr=2.7145e-05, gnorm=0.909, clip=0, loss_scale=64, train_wall=40, gb_free=28.2, wall=36984 2023-05-01 12:50:12 - progress_bar.py[line:274] - INFO: epoch 002: 3013 / 6042 loss=2.505, loss_v1=0, loss_v2=0, nll_loss=1.267, ntokens=7636.6, nsentences=120, sample_size=4169, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1906, ups=0.25, wpb=7636.6, bsz=120, num_updates=9040, lr=2.71397e-05, gnorm=0.918, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=37024 2023-05-01 12:50:52 - progress_bar.py[line:274] - INFO: epoch 002: 3023 / 6042 loss=2.517, loss_v1=0, loss_v2=0, nll_loss=1.28, ntokens=7502.3, nsentences=120, sample_size=4146.9, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=1893.4, ups=0.25, wpb=7502.3, bsz=120, num_updates=9050, lr=2.71344e-05, gnorm=0.932, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=37064 2023-05-01 12:51:31 - progress_bar.py[line:274] - INFO: epoch 002: 3033 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7834.8, nsentences=120, sample_size=3907.9, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1976.2, ups=0.25, wpb=7834.8, bsz=120, num_updates=9060, lr=2.71291e-05, gnorm=0.956, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=37104 2023-05-01 12:52:11 - progress_bar.py[line:274] - INFO: epoch 002: 3043 / 6042 loss=2.52, loss_v1=0, loss_v2=0, nll_loss=1.275, ntokens=7931.5, nsentences=120, sample_size=3626.4, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1973.9, ups=0.25, wpb=7931.5, bsz=120, num_updates=9070, lr=2.71239e-05, gnorm=1, clip=50, loss_scale=64, train_wall=40, gb_free=29.6, wall=37144 2023-05-01 12:52:51 
- progress_bar.py[line:274] - INFO: epoch 002: 3053 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7629, nsentences=120, sample_size=3956.7, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1943.3, ups=0.25, wpb=7629, bsz=120, num_updates=9080, lr=2.71186e-05, gnorm=0.966, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=37183 2023-05-01 12:53:30 - progress_bar.py[line:274] - INFO: epoch 002: 3063 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7709.4, nsentences=120, sample_size=3790.9, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1941.3, ups=0.25, wpb=7709.4, bsz=120, num_updates=9090, lr=2.71133e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=37223 2023-05-01 12:54:10 - progress_bar.py[line:274] - INFO: epoch 002: 3073 / 6042 loss=2.507, loss_v1=0, loss_v2=0, nll_loss=1.268, ntokens=7716.2, nsentences=120, sample_size=3869, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1958.4, ups=0.25, wpb=7716.2, bsz=120, num_updates=9100, lr=2.7108e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=37262 2023-05-01 12:54:48 - progress_bar.py[line:274] - INFO: epoch 002: 3083 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.218, ntokens=7487, nsentences=120, sample_size=4025.5, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1942.6, ups=0.26, wpb=7487, bsz=120, num_updates=9110, lr=2.71027e-05, gnorm=0.931, clip=20, loss_scale=64, train_wall=38, gb_free=30.1, wall=37301 2023-05-01 12:55:16 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 12:55:32 - progress_bar.py[line:274] - INFO: epoch 002: 3094 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7684.7, nsentences=120, sample_size=3994.4, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1750.6, ups=0.23, wpb=7684.7, bsz=120, num_updates=9120, lr=2.70975e-05, gnorm=0.923, clip=20, loss_scale=64, train_wall=44, gb_free=30.1, 
wall=37345 2023-05-01 12:56:12 - progress_bar.py[line:274] - INFO: epoch 002: 3104 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7506.7, nsentences=120, sample_size=4135.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1893.3, ups=0.25, wpb=7506.7, bsz=120, num_updates=9130, lr=2.70922e-05, gnorm=0.883, clip=0, loss_scale=64, train_wall=40, gb_free=30.3, wall=37384 2023-05-01 12:56:51 - progress_bar.py[line:274] - INFO: epoch 002: 3114 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7796.4, nsentences=120, sample_size=3954.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1972.2, ups=0.25, wpb=7796.4, bsz=120, num_updates=9140, lr=2.70869e-05, gnorm=0.912, clip=10, loss_scale=64, train_wall=39, gb_free=29, wall=37424 2023-05-01 12:57:32 - progress_bar.py[line:274] - INFO: epoch 002: 3124 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7627, nsentences=120, sample_size=3961.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1859.7, ups=0.24, wpb=7627, bsz=120, num_updates=9150, lr=2.70816e-05, gnorm=0.916, clip=0, loss_scale=64, train_wall=41, gb_free=26, wall=37465 2023-05-01 12:58:12 - progress_bar.py[line:274] - INFO: epoch 002: 3134 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7621.1, nsentences=120, sample_size=3899.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1922.5, ups=0.25, wpb=7621.1, bsz=120, num_updates=9160, lr=2.70763e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=37504 2023-05-01 12:58:52 - progress_bar.py[line:274] - INFO: epoch 002: 3144 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7800.7, nsentences=120, sample_size=3945.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1968.6, ups=0.25, wpb=7800.7, bsz=120, num_updates=9170, lr=2.7071e-05, gnorm=0.924, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=37544 2023-05-01 12:59:31 - progress_bar.py[line:274] - INFO: epoch 002: 3154 / 6042 loss=2.384, 
loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7450.4, nsentences=120, sample_size=4111.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1873.8, ups=0.25, wpb=7450.4, bsz=120, num_updates=9180, lr=2.70658e-05, gnorm=0.944, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=37584 2023-05-01 13:00:11 - progress_bar.py[line:274] - INFO: epoch 002: 3164 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7668.2, nsentences=120, sample_size=4200.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1916.1, ups=0.25, wpb=7668.2, bsz=120, num_updates=9190, lr=2.70605e-05, gnorm=0.909, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=37624 2023-05-01 13:00:51 - progress_bar.py[line:274] - INFO: epoch 002: 3174 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7785.7, nsentences=120, sample_size=3911, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1973.3, ups=0.25, wpb=7785.7, bsz=120, num_updates=9200, lr=2.70552e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=39, gb_free=30.6, wall=37663 2023-05-01 13:01:30 - progress_bar.py[line:274] - INFO: epoch 002: 3184 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7641.6, nsentences=120, sample_size=3762.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1965.2, ups=0.26, wpb=7641.6, bsz=120, num_updates=9210, lr=2.70499e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=37702 2023-05-01 13:02:10 - progress_bar.py[line:274] - INFO: epoch 002: 3194 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7666.7, nsentences=120, sample_size=3875, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1930.2, ups=0.25, wpb=7666.7, bsz=120, num_updates=9220, lr=2.70446e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, wall=37742 2023-05-01 13:02:49 - progress_bar.py[line:274] - INFO: epoch 002: 3204 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7893.2, nsentences=120, sample_size=3951.6, 
sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1981.8, ups=0.25, wpb=7893.2, bsz=120, num_updates=9230, lr=2.70394e-05, gnorm=0.91, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=37782 2023-05-01 13:03:29 - progress_bar.py[line:274] - INFO: epoch 002: 3214 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7621.8, nsentences=120, sample_size=4254.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1920.8, ups=0.25, wpb=7621.8, bsz=120, num_updates=9240, lr=2.70341e-05, gnorm=0.913, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=37821 2023-05-01 13:04:09 - progress_bar.py[line:274] - INFO: epoch 002: 3224 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7798.6, nsentences=120, sample_size=4226.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1944.3, ups=0.25, wpb=7798.6, bsz=120, num_updates=9250, lr=2.70288e-05, gnorm=0.916, clip=0, loss_scale=64, train_wall=40, gb_free=30.9, wall=37862 2023-05-01 13:04:49 - progress_bar.py[line:274] - INFO: epoch 002: 3234 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7682, nsentences=120, sample_size=4110.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1930.5, ups=0.25, wpb=7682, bsz=120, num_updates=9260, lr=2.70235e-05, gnorm=0.907, clip=0, loss_scale=64, train_wall=40, gb_free=29.5, wall=37901 2023-05-01 13:05:29 - progress_bar.py[line:274] - INFO: epoch 002: 3244 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=8109.3, nsentences=120, sample_size=3841, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2039.5, ups=0.25, wpb=8109.3, bsz=120, num_updates=9270, lr=2.70182e-05, gnorm=0.897, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=37941 2023-05-01 13:06:08 - progress_bar.py[line:274] - INFO: epoch 002: 3254 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7643.3, nsentences=120, sample_size=3863, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1920.7, ups=0.25, wpb=7643.3, bsz=120, num_updates=9280, 
lr=2.70129e-05, gnorm=0.928, clip=0, loss_scale=64, train_wall=40, gb_free=30, wall=37981 2023-05-01 13:06:49 - progress_bar.py[line:274] - INFO: epoch 002: 3264 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7482.1, nsentences=120, sample_size=4344.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1838.1, ups=0.25, wpb=7482.1, bsz=120, num_updates=9290, lr=2.70077e-05, gnorm=0.888, clip=0, loss_scale=64, train_wall=41, gb_free=30.2, wall=38022 2023-05-01 13:07:30 - progress_bar.py[line:274] - INFO: epoch 002: 3274 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=8042.3, nsentences=120, sample_size=4319.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1990.5, ups=0.25, wpb=8042.3, bsz=120, num_updates=9300, lr=2.70024e-05, gnorm=0.9, clip=10, loss_scale=64, train_wall=40, gb_free=29.3, wall=38062 2023-05-01 13:08:10 - progress_bar.py[line:274] - INFO: epoch 002: 3284 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7432.2, nsentences=120, sample_size=3998.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1834.4, ups=0.25, wpb=7432.2, bsz=120, num_updates=9310, lr=2.69971e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=40, gb_free=27.7, wall=38103 2023-05-01 13:08:50 - progress_bar.py[line:274] - INFO: epoch 002: 3294 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7625.3, nsentences=120, sample_size=4160.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1906.8, ups=0.25, wpb=7625.3, bsz=120, num_updates=9320, lr=2.69918e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=40, gb_free=31.3, wall=38143 2023-05-01 13:09:31 - progress_bar.py[line:274] - INFO: epoch 002: 3304 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7749.8, nsentences=120, sample_size=4390.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1879.1, ups=0.24, wpb=7749.8, bsz=120, num_updates=9330, lr=2.69865e-05, gnorm=0.897, clip=0, loss_scale=64, train_wall=41, gb_free=30.2, wall=38184 2023-05-01 
13:10:11 - progress_bar.py[line:274] - INFO: epoch 002: 3314 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7481.3, nsentences=120, sample_size=3771.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1881.6, ups=0.25, wpb=7481.3, bsz=120, num_updates=9340, lr=2.69812e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=38224 2023-05-01 13:10:51 - progress_bar.py[line:274] - INFO: epoch 002: 3324 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7548.2, nsentences=120, sample_size=4088.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1906.3, ups=0.25, wpb=7548.2, bsz=120, num_updates=9350, lr=2.6976e-05, gnorm=0.928, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=38263 2023-05-01 13:11:30 - progress_bar.py[line:274] - INFO: epoch 002: 3334 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7593.3, nsentences=120, sample_size=3861.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1930.8, ups=0.25, wpb=7593.3, bsz=120, num_updates=9360, lr=2.69707e-05, gnorm=0.96, clip=40, loss_scale=64, train_wall=39, gb_free=30.4, wall=38303 2023-05-01 13:12:10 - progress_bar.py[line:274] - INFO: epoch 002: 3344 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7900.2, nsentences=120, sample_size=3966.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1964.8, ups=0.25, wpb=7900.2, bsz=120, num_updates=9370, lr=2.69654e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=38343 2023-05-01 13:12:50 - progress_bar.py[line:274] - INFO: epoch 002: 3354 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7979.9, nsentences=120, sample_size=4063.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1988.2, ups=0.25, wpb=7979.9, bsz=120, num_updates=9380, lr=2.69601e-05, gnorm=0.927, clip=0, loss_scale=64, train_wall=40, gb_free=30.8, wall=38383 2023-05-01 13:13:30 - progress_bar.py[line:274] - INFO: epoch 002: 3364 / 6042 loss=2.364, loss_v1=0, loss_v2=0, 
nll_loss=1.099, ntokens=7635.8, nsentences=120, sample_size=4093.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1930.4, ups=0.25, wpb=7635.8, bsz=120, num_updates=9390, lr=2.69548e-05, gnorm=0.899, clip=0, loss_scale=64, train_wall=39, gb_free=26.9, wall=38422 2023-05-01 13:14:09 - progress_bar.py[line:274] - INFO: epoch 002: 3374 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7585.5, nsentences=120, sample_size=3797, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1941.5, ups=0.26, wpb=7585.5, bsz=120, num_updates=9400, lr=2.69496e-05, gnorm=0.952, clip=10, loss_scale=64, train_wall=39, gb_free=30.6, wall=38461 2023-05-01 13:14:49 - progress_bar.py[line:274] - INFO: epoch 002: 3384 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7641.4, nsentences=120, sample_size=4264.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1907.2, ups=0.25, wpb=7641.4, bsz=120, num_updates=9410, lr=2.69443e-05, gnorm=0.913, clip=0, loss_scale=64, train_wall=40, gb_free=30.4, wall=38502 2023-05-01 13:15:29 - progress_bar.py[line:274] - INFO: epoch 002: 3394 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7467.1, nsentences=120, sample_size=3934.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1888.2, ups=0.25, wpb=7467.1, bsz=120, num_updates=9420, lr=2.6939e-05, gnorm=0.936, clip=0, loss_scale=64, train_wall=39, gb_free=30.1, wall=38541 2023-05-01 13:16:08 - progress_bar.py[line:274] - INFO: epoch 002: 3404 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7745.1, nsentences=120, sample_size=3887.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1980.2, ups=0.26, wpb=7745.1, bsz=120, num_updates=9430, lr=2.69337e-05, gnorm=0.942, clip=30, loss_scale=64, train_wall=39, gb_free=29.7, wall=38580 2023-05-01 13:16:47 - progress_bar.py[line:274] - INFO: epoch 002: 3414 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7787, nsentences=120, sample_size=3938.9, sample_size_v1=0, sample_size_v2=0, 
ppl=2.21, wps=1990.4, ups=0.26, wpb=7787, bsz=120, num_updates=9440, lr=2.69284e-05, gnorm=0.955, clip=30, loss_scale=64, train_wall=39, gb_free=29.5, wall=38619 2023-05-01 13:17:27 - progress_bar.py[line:274] - INFO: epoch 002: 3424 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7597.5, nsentences=120, sample_size=4272.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1903.4, ups=0.25, wpb=7597.5, bsz=120, num_updates=9450, lr=2.69231e-05, gnorm=0.922, clip=10, loss_scale=64, train_wall=40, gb_free=28.5, wall=38659 2023-05-01 13:18:06 - progress_bar.py[line:274] - INFO: epoch 002: 3434 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7838.3, nsentences=120, sample_size=3946.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1979.2, ups=0.25, wpb=7838.3, bsz=120, num_updates=9460, lr=2.69179e-05, gnorm=0.912, clip=0, loss_scale=64, train_wall=40, gb_free=29.3, wall=38699 2023-05-01 13:18:46 - progress_bar.py[line:274] - INFO: epoch 002: 3444 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7532.9, nsentences=120, sample_size=4079.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1919.7, ups=0.25, wpb=7532.9, bsz=120, num_updates=9470, lr=2.69126e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=38738 2023-05-01 13:19:25 - progress_bar.py[line:274] - INFO: epoch 002: 3454 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=8021, nsentences=120, sample_size=4102.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2056.8, ups=0.26, wpb=8021, bsz=120, num_updates=9480, lr=2.69073e-05, gnorm=0.916, clip=10, loss_scale=64, train_wall=39, gb_free=29.5, wall=38777 2023-05-01 13:20:04 - progress_bar.py[line:274] - INFO: epoch 002: 3464 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=8019.4, nsentences=120, sample_size=3914, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2020.1, ups=0.25, wpb=8019.4, bsz=120, num_updates=9490, lr=2.6902e-05, gnorm=0.94, 
clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=38817 2023-05-01 13:20:45 - progress_bar.py[line:274] - INFO: epoch 002: 3474 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7696.4, nsentences=120, sample_size=4049.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1899.9, ups=0.25, wpb=7696.4, bsz=120, num_updates=9500, lr=2.68967e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=38857 2023-05-01 13:21:24 - progress_bar.py[line:274] - INFO: epoch 002: 3484 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7863.4, nsentences=120, sample_size=3852.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1986.5, ups=0.25, wpb=7863.4, bsz=120, num_updates=9510, lr=2.68915e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=38897 2023-05-01 13:22:04 - progress_bar.py[line:274] - INFO: epoch 002: 3494 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7383.8, nsentences=120, sample_size=3994.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1880.9, ups=0.25, wpb=7383.8, bsz=120, num_updates=9520, lr=2.68862e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=39, gb_free=28.7, wall=38936 2023-05-01 13:22:43 - progress_bar.py[line:274] - INFO: epoch 002: 3504 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7631.8, nsentences=120, sample_size=4110, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1919.6, ups=0.25, wpb=7631.8, bsz=120, num_updates=9530, lr=2.68809e-05, gnorm=0.939, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=38976 2023-05-01 13:23:23 - progress_bar.py[line:274] - INFO: epoch 002: 3514 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7673.9, nsentences=120, sample_size=4223.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1941.2, ups=0.25, wpb=7673.9, bsz=120, num_updates=9540, lr=2.68756e-05, gnorm=0.909, clip=0, loss_scale=64, train_wall=39, gb_free=30.1, wall=39015 2023-05-01 13:24:03 - 
progress_bar.py[line:274] - INFO: epoch 002: 3524 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=8014.2, nsentences=120, sample_size=3942.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1986.6, ups=0.25, wpb=8014.2, bsz=120, num_updates=9550, lr=2.68703e-05, gnorm=0.922, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=39056 2023-05-01 13:24:42 - progress_bar.py[line:274] - INFO: epoch 002: 3534 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7768.8, nsentences=120, sample_size=3726.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2010.8, ups=0.26, wpb=7768.8, bsz=120, num_updates=9560, lr=2.6865e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=39, gb_free=30.9, wall=39094 2023-05-01 13:25:22 - progress_bar.py[line:274] - INFO: epoch 002: 3544 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7705, nsentences=120, sample_size=4181.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1909.5, ups=0.25, wpb=7705, bsz=120, num_updates=9570, lr=2.68598e-05, gnorm=0.899, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=39135 2023-05-01 13:26:02 - progress_bar.py[line:274] - INFO: epoch 002: 3554 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7956.6, nsentences=120, sample_size=4011.3, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1984.6, ups=0.25, wpb=7956.6, bsz=120, num_updates=9580, lr=2.68545e-05, gnorm=0.932, clip=20, loss_scale=64, train_wall=40, gb_free=31.1, wall=39175 2023-05-01 13:26:43 - progress_bar.py[line:274] - INFO: epoch 002: 3564 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7412.3, nsentences=120, sample_size=4056.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1832.4, ups=0.25, wpb=7412.3, bsz=120, num_updates=9590, lr=2.68492e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=39215 2023-05-01 13:27:23 - progress_bar.py[line:274] - INFO: epoch 002: 3574 / 6042 loss=2.405, loss_v1=0, loss_v2=0, 
nll_loss=1.142, ntokens=7661, nsentences=120, sample_size=4135, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1923.5, ups=0.25, wpb=7661, bsz=120, num_updates=9600, lr=2.68439e-05, gnorm=0.916, clip=0, loss_scale=64, train_wall=40, gb_free=29, wall=39255 2023-05-01 13:28:02 - progress_bar.py[line:274] - INFO: epoch 002: 3584 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7714.2, nsentences=120, sample_size=4005.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1975.4, ups=0.26, wpb=7714.2, bsz=120, num_updates=9610, lr=2.68386e-05, gnorm=0.913, clip=0, loss_scale=64, train_wall=39, gb_free=29.3, wall=39294 2023-05-01 13:28:43 - progress_bar.py[line:274] - INFO: epoch 002: 3594 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7701.9, nsentences=120, sample_size=4007.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1887, ups=0.25, wpb=7701.9, bsz=120, num_updates=9620, lr=2.68333e-05, gnorm=0.913, clip=10, loss_scale=64, train_wall=41, gb_free=29.2, wall=39335 2023-05-01 13:29:22 - progress_bar.py[line:274] - INFO: epoch 002: 3604 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7622.1, nsentences=120, sample_size=4047.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1916.8, ups=0.25, wpb=7622.1, bsz=120, num_updates=9630, lr=2.68281e-05, gnorm=0.895, clip=0, loss_scale=128, train_wall=40, gb_free=30.3, wall=39375 2023-05-01 13:29:51 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 13:30:07 - progress_bar.py[line:274] - INFO: epoch 002: 3615 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7847.9, nsentences=120, sample_size=4094.8, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1768.3, ups=0.23, wpb=7847.9, bsz=120, num_updates=9640, lr=2.68228e-05, gnorm=0.973, clip=40, loss_scale=64, train_wall=44, gb_free=31.1, wall=39419 2023-05-01 13:30:47 - progress_bar.py[line:274] - INFO: epoch 002: 3625 / 6042 
loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7673.8, nsentences=120, sample_size=4141.6, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1909, ups=0.25, wpb=7673.8, bsz=120, num_updates=9650, lr=2.68175e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=40, gb_free=28.8, wall=39459 2023-05-01 13:31:27 - progress_bar.py[line:274] - INFO: epoch 002: 3635 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7570.8, nsentences=120, sample_size=4030.7, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1898.7, ups=0.25, wpb=7570.8, bsz=120, num_updates=9660, lr=2.68122e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=39499 2023-05-01 13:32:06 - progress_bar.py[line:274] - INFO: epoch 002: 3645 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7648.9, nsentences=120, sample_size=3754.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1928.4, ups=0.25, wpb=7648.9, bsz=120, num_updates=9670, lr=2.68069e-05, gnorm=0.96, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=39539 2023-05-01 13:32:46 - progress_bar.py[line:274] - INFO: epoch 002: 3655 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7565.7, nsentences=120, sample_size=4216.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1889.7, ups=0.25, wpb=7565.7, bsz=120, num_updates=9680, lr=2.68017e-05, gnorm=0.89, clip=0, loss_scale=64, train_wall=40, gb_free=30.5, wall=39579 2023-05-01 13:33:27 - progress_bar.py[line:274] - INFO: epoch 002: 3665 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=8140.5, nsentences=120, sample_size=4019.9, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=2012.5, ups=0.25, wpb=8140.5, bsz=120, num_updates=9690, lr=2.67964e-05, gnorm=0.902, clip=10, loss_scale=64, train_wall=40, gb_free=28.7, wall=39619 2023-05-01 13:34:08 - progress_bar.py[line:274] - INFO: epoch 002: 3675 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7652, nsentences=120, sample_size=3703.1, 
sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1878.2, ups=0.25, wpb=7652, bsz=120, num_updates=9700, lr=2.67911e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=41, gb_free=30.3, wall=39660 2023-05-01 13:34:47 - progress_bar.py[line:274] - INFO: epoch 002: 3685 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7524.4, nsentences=120, sample_size=3854.3, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1900.7, ups=0.25, wpb=7524.4, bsz=120, num_updates=9710, lr=2.67858e-05, gnorm=0.932, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=39700 2023-05-01 13:35:26 - progress_bar.py[line:274] - INFO: epoch 002: 3695 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7356.5, nsentences=120, sample_size=3895.2, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1895.4, ups=0.26, wpb=7356.5, bsz=120, num_updates=9720, lr=2.67805e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=39739 2023-05-01 13:36:06 - progress_bar.py[line:274] - INFO: epoch 002: 3705 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7652.4, nsentences=120, sample_size=4164.1, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1940.2, ups=0.25, wpb=7652.4, bsz=120, num_updates=9730, lr=2.67752e-05, gnorm=0.899, clip=0, loss_scale=64, train_wall=39, gb_free=31.2, wall=39778 2023-05-01 13:36:45 - progress_bar.py[line:274] - INFO: epoch 002: 3715 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7721.5, nsentences=120, sample_size=4126.9, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1953.3, ups=0.25, wpb=7721.5, bsz=120, num_updates=9740, lr=2.677e-05, gnorm=0.905, clip=10, loss_scale=64, train_wall=39, gb_free=30.5, wall=39818 2023-05-01 13:37:25 - progress_bar.py[line:274] - INFO: epoch 002: 3725 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7860, nsentences=120, sample_size=3851.6, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1946.6, ups=0.25, wpb=7860, bsz=120, 
num_updates=9750, lr=2.67647e-05, gnorm=0.928, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=39858 2023-05-01 13:38:05 - progress_bar.py[line:274] - INFO: epoch 002: 3735 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7864.2, nsentences=120, sample_size=4057.5, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1991.2, ups=0.25, wpb=7864.2, bsz=120, num_updates=9760, lr=2.67594e-05, gnorm=0.888, clip=0, loss_scale=64, train_wall=39, gb_free=29.8, wall=39897 2023-05-01 13:38:44 - progress_bar.py[line:274] - INFO: epoch 002: 3745 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7718.3, nsentences=120, sample_size=3881.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1967.2, ups=0.25, wpb=7718.3, bsz=120, num_updates=9770, lr=2.67541e-05, gnorm=0.908, clip=10, loss_scale=64, train_wall=39, gb_free=26.6, wall=39937 2023-05-01 13:39:24 - progress_bar.py[line:274] - INFO: epoch 002: 3755 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7735.8, nsentences=120, sample_size=3917.1, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1961.7, ups=0.25, wpb=7735.8, bsz=120, num_updates=9780, lr=2.67488e-05, gnorm=0.91, clip=10, loss_scale=64, train_wall=39, gb_free=29, wall=39976 2023-05-01 13:40:03 - progress_bar.py[line:274] - INFO: epoch 002: 3765 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7853.8, nsentences=120, sample_size=4203.2, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1974.7, ups=0.25, wpb=7853.8, bsz=120, num_updates=9790, lr=2.67436e-05, gnorm=0.911, clip=0, loss_scale=64, train_wall=40, gb_free=29.5, wall=40016 2023-05-01 13:40:42 - progress_bar.py[line:274] - INFO: epoch 002: 3775 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7755.6, nsentences=120, sample_size=3994.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1994, ups=0.26, wpb=7755.6, bsz=120, num_updates=9800, lr=2.67383e-05, gnorm=0.926, clip=20, loss_scale=64, train_wall=39, gb_free=31.2, 
wall=40055 2023-05-01 13:41:22 - progress_bar.py[line:274] - INFO: epoch 002: 3785 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7576.7, nsentences=120, sample_size=3880.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1898.7, ups=0.25, wpb=7576.7, bsz=120, num_updates=9810, lr=2.6733e-05, gnorm=0.928, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=40095 2023-05-01 13:42:02 - progress_bar.py[line:274] - INFO: epoch 002: 3795 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7780.3, nsentences=120, sample_size=3981.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1965.8, ups=0.25, wpb=7780.3, bsz=120, num_updates=9820, lr=2.67277e-05, gnorm=0.889, clip=0, loss_scale=64, train_wall=40, gb_free=29.4, wall=40134 2023-05-01 13:42:43 - progress_bar.py[line:274] - INFO: epoch 002: 3805 / 6042 loss=2.506, loss_v1=0, loss_v2=0, nll_loss=1.264, ntokens=7950.6, nsentences=120, sample_size=3798.7, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1948.8, ups=0.25, wpb=7950.6, bsz=120, num_updates=9830, lr=2.67224e-05, gnorm=0.948, clip=20, loss_scale=64, train_wall=41, gb_free=28.2, wall=40175 2023-05-01 13:43:23 - progress_bar.py[line:274] - INFO: epoch 002: 3815 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7867.3, nsentences=120, sample_size=3846.6, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1946.8, ups=0.25, wpb=7867.3, bsz=120, num_updates=9840, lr=2.67171e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=31.1, wall=40215 2023-05-01 13:44:04 - progress_bar.py[line:274] - INFO: epoch 002: 3825 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7655.2, nsentences=120, sample_size=4110.8, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1883.4, ups=0.25, wpb=7655.2, bsz=120, num_updates=9850, lr=2.67119e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=41, gb_free=27.4, wall=40256 2023-05-01 13:44:43 - progress_bar.py[line:274] - INFO: epoch 002: 3835 / 6042 loss=2.47, 
loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7819.4, nsentences=120, sample_size=3996.3, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1963.9, ups=0.25, wpb=7819.4, bsz=120, num_updates=9860, lr=2.67066e-05, gnorm=0.904, clip=0, loss_scale=64, train_wall=40, gb_free=27.8, wall=40296 2023-05-01 13:45:23 - progress_bar.py[line:274] - INFO: epoch 002: 3845 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7527.8, nsentences=120, sample_size=3834.1, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1907.1, ups=0.25, wpb=7527.8, bsz=120, num_updates=9870, lr=2.67013e-05, gnorm=0.962, clip=40, loss_scale=64, train_wall=39, gb_free=30.9, wall=40335 2023-05-01 13:46:02 - progress_bar.py[line:274] - INFO: epoch 002: 3855 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7713.7, nsentences=120, sample_size=4044.1, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1956.4, ups=0.25, wpb=7713.7, bsz=120, num_updates=9880, lr=2.6696e-05, gnorm=0.929, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=40375 2023-05-01 13:46:42 - progress_bar.py[line:274] - INFO: epoch 002: 3865 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.221, ntokens=7608.7, nsentences=120, sample_size=4161.9, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1900, ups=0.25, wpb=7608.7, bsz=120, num_updates=9890, lr=2.66907e-05, gnorm=0.913, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=40415 2023-05-01 13:47:22 - progress_bar.py[line:274] - INFO: epoch 002: 3875 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7770.5, nsentences=120, sample_size=4308.8, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1955.1, ups=0.25, wpb=7770.5, bsz=120, num_updates=9900, lr=2.66854e-05, gnorm=0.893, clip=0, loss_scale=64, train_wall=40, gb_free=27.5, wall=40455 2023-05-01 13:48:02 - progress_bar.py[line:274] - INFO: epoch 002: 3885 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.222, ntokens=7534.8, nsentences=120, sample_size=3902.3, 
sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1868.6, ups=0.25, wpb=7534.8, bsz=120, num_updates=9910, lr=2.66802e-05, gnorm=0.933, clip=0, loss_scale=64, train_wall=40, gb_free=28.1, wall=40495 2023-05-01 13:48:42 - progress_bar.py[line:274] - INFO: epoch 002: 3895 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.231, ntokens=8002.8, nsentences=120, sample_size=3721.1, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=2008.2, ups=0.25, wpb=8002.8, bsz=120, num_updates=9920, lr=2.66749e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=40535 2023-05-01 13:49:22 - progress_bar.py[line:274] - INFO: epoch 002: 3905 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7691.2, nsentences=120, sample_size=3863.4, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1915, ups=0.25, wpb=7691.2, bsz=120, num_updates=9930, lr=2.66696e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=40575 2023-05-01 13:50:02 - progress_bar.py[line:274] - INFO: epoch 002: 3915 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.227, ntokens=7555.8, nsentences=120, sample_size=4067.3, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1910.9, ups=0.25, wpb=7555.8, bsz=120, num_updates=9940, lr=2.66643e-05, gnorm=0.916, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=40614 2023-05-01 13:50:42 - progress_bar.py[line:274] - INFO: epoch 002: 3925 / 6042 loss=2.509, loss_v1=0, loss_v2=0, nll_loss=1.266, ntokens=7733.7, nsentences=120, sample_size=4190.9, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1940.3, ups=0.25, wpb=7733.7, bsz=120, num_updates=9950, lr=2.6659e-05, gnorm=0.891, clip=0, loss_scale=64, train_wall=40, gb_free=30.8, wall=40654 2023-05-01 13:51:22 - progress_bar.py[line:274] - INFO: epoch 002: 3935 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7904.5, nsentences=120, sample_size=4015.7, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1982, ups=0.25, wpb=7904.5, bsz=120, 
num_updates=9960, lr=2.66538e-05, gnorm=0.922, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=40694 2023-05-01 13:52:02 - progress_bar.py[line:274] - INFO: epoch 002: 3945 / 6042 loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.259, ntokens=7547.2, nsentences=120, sample_size=4151.4, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1882.7, ups=0.25, wpb=7547.2, bsz=120, num_updates=9970, lr=2.66485e-05, gnorm=0.881, clip=0, loss_scale=64, train_wall=40, gb_free=28.8, wall=40734 2023-05-01 13:52:41 - progress_bar.py[line:274] - INFO: epoch 002: 3955 / 6042 loss=2.478, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7430.1, nsentences=120, sample_size=3826, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1889.1, ups=0.25, wpb=7430.1, bsz=120, num_updates=9980, lr=2.66432e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=39, gb_free=28.4, wall=40774 2023-05-01 13:53:20 - progress_bar.py[line:274] - INFO: epoch 002: 3965 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7825.9, nsentences=120, sample_size=4020.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=2006.5, ups=0.26, wpb=7825.9, bsz=120, num_updates=9990, lr=2.66379e-05, gnorm=0.92, clip=0, loss_scale=64, train_wall=39, gb_free=30.8, wall=40813 2023-05-01 13:54:00 - progress_bar.py[line:274] - INFO: epoch 002: 3975 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7506, nsentences=120, sample_size=3977.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1891.1, ups=0.25, wpb=7506, bsz=120, num_updates=10000, lr=2.66326e-05, gnorm=0.878, clip=0, loss_scale=64, train_wall=40, gb_free=30.1, wall=40852 2023-05-01 13:54:00 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 13:54:02 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 13:54:02 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 13:54:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 13:54:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:19 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 13:54:19 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 13:54:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 13:54:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:31 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 13:54:31 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 13:54:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 13:54:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:42 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 13:54:42 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 13:54:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:46 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 13:54:46 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 13:54:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:51 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 13:54:51 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 13:54:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 13:54:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 13:54:51 - progress_bar.py[line:282] - INFO: epoch 002 | valid on 'valid' subset | loss 3.193 | loss_v1 0 | loss_v2 0 | nll_loss 2.023 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.06 | score 0.7461 | wps 3286 | wpb 3202.1 | bsz 39.4 | num_updates 10000 | best_score 0.751 2023-05-01 13:54:51 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 2 @ 10000 updates 2023-05-01 13:54:51 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_10000.pt 2023-05-01 13:55:16 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_10000.pt 2023-05-01 13:55:30 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_10000.pt (epoch 2 @ 10000 updates, score 0.7461) (writing took 38.34923820500262 seconds) 2023-05-01 13:56:10 - progress_bar.py[line:274] - INFO: epoch 002: 3985 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7977, nsentences=120, sample_size=3829.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=613.8, ups=0.08, wpb=7977, bsz=120, num_updates=10010, lr=2.66273e-05, gnorm=0.92, clip=0, loss_scale=64, train_wall=40, gb_free=28.9, wall=40982 2023-05-01 13:56:51 - progress_bar.py[line:274] - INFO: epoch 002: 3995 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7669.8, nsentences=120, sample_size=4122.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1882.8, ups=0.25, wpb=7669.8, bsz=120, num_updates=10020, 
lr=2.66221e-05, gnorm=0.951, clip=10, loss_scale=64, train_wall=41, gb_free=30.9, wall=41023 2023-05-01 13:57:30 - progress_bar.py[line:274] - INFO: epoch 002: 4005 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7993.8, nsentences=120, sample_size=4097, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=2009.4, ups=0.25, wpb=7993.8, bsz=120, num_updates=10030, lr=2.66168e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=26.9, wall=41063 2023-05-01 13:58:10 - progress_bar.py[line:274] - INFO: epoch 002: 4015 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.209, ntokens=7599.3, nsentences=120, sample_size=3872, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1902.3, ups=0.25, wpb=7599.3, bsz=120, num_updates=10040, lr=2.66115e-05, gnorm=0.953, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=41103 2023-05-01 13:58:49 - progress_bar.py[line:274] - INFO: epoch 002: 4025 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.231, ntokens=7503.2, nsentences=120, sample_size=4264.4, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1921.1, ups=0.26, wpb=7503.2, bsz=120, num_updates=10050, lr=2.66062e-05, gnorm=0.947, clip=10, loss_scale=64, train_wall=39, gb_free=30.5, wall=41142 2023-05-01 13:59:29 - progress_bar.py[line:274] - INFO: epoch 002: 4035 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.209, ntokens=7803.2, nsentences=120, sample_size=4004.7, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1952.4, ups=0.25, wpb=7803.2, bsz=120, num_updates=10060, lr=2.66009e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=41182 2023-05-01 14:00:09 - progress_bar.py[line:274] - INFO: epoch 002: 4045 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7759.7, nsentences=120, sample_size=4345.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1938.1, ups=0.25, wpb=7759.7, bsz=120, num_updates=10070, lr=2.65957e-05, gnorm=0.871, clip=0, loss_scale=64, train_wall=40, gb_free=30.1, wall=41222 
2023-05-01 14:00:50 - progress_bar.py[line:274] - INFO: epoch 002: 4055 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.231, ntokens=7781.9, nsentences=120, sample_size=4035.2, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1920.3, ups=0.25, wpb=7781.9, bsz=120, num_updates=10080, lr=2.65904e-05, gnorm=0.894, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=41262 2023-05-01 14:01:30 - progress_bar.py[line:274] - INFO: epoch 002: 4065 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.236, ntokens=7340.5, nsentences=120, sample_size=4194.5, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1850.8, ups=0.25, wpb=7340.5, bsz=120, num_updates=10090, lr=2.65851e-05, gnorm=0.929, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=41302 2023-05-01 14:02:10 - progress_bar.py[line:274] - INFO: epoch 002: 4075 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7992.7, nsentences=120, sample_size=4074.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1981.6, ups=0.25, wpb=7992.7, bsz=120, num_updates=10100, lr=2.65798e-05, gnorm=0.895, clip=0, loss_scale=64, train_wall=40, gb_free=30.6, wall=41342 2023-05-01 14:02:50 - progress_bar.py[line:274] - INFO: epoch 002: 4085 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=7783.9, nsentences=120, sample_size=3667.6, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1951.1, ups=0.25, wpb=7783.9, bsz=120, num_updates=10110, lr=2.65745e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=41382 2023-05-01 14:03:29 - progress_bar.py[line:274] - INFO: epoch 002: 4095 / 6042 loss=2.505, loss_v1=0, loss_v2=0, nll_loss=1.266, ntokens=7643.5, nsentences=120, sample_size=4342.9, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1933.3, ups=0.25, wpb=7643.5, bsz=120, num_updates=10120, lr=2.65692e-05, gnorm=0.918, clip=0, loss_scale=64, train_wall=39, gb_free=29.9, wall=41422 2023-05-01 14:04:09 - progress_bar.py[line:274] - INFO: epoch 002: 4105 / 6042 loss=2.45, 
loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7569.1, nsentences=120, sample_size=4100.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1896.4, ups=0.25, wpb=7569.1, bsz=120, num_updates=10130, lr=2.6564e-05, gnorm=0.901, clip=10, loss_scale=64, train_wall=40, gb_free=31.6, wall=41462 2023-05-01 14:04:49 - progress_bar.py[line:274] - INFO: epoch 002: 4115 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7736.7, nsentences=120, sample_size=4134.5, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1970.2, ups=0.25, wpb=7736.7, bsz=120, num_updates=10140, lr=2.65587e-05, gnorm=0.896, clip=0, loss_scale=64, train_wall=39, gb_free=30.4, wall=41501 2023-05-01 14:05:29 - progress_bar.py[line:274] - INFO: epoch 002: 4125 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7902.4, nsentences=120, sample_size=4054.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1959.2, ups=0.25, wpb=7902.4, bsz=120, num_updates=10150, lr=2.65534e-05, gnorm=0.912, clip=10, loss_scale=128, train_wall=40, gb_free=29.6, wall=41541 2023-05-01 14:05:37 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 14:06:12 - progress_bar.py[line:274] - INFO: epoch 002: 4136 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7490, nsentences=120, sample_size=4237.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1734.8, ups=0.23, wpb=7490, bsz=120, num_updates=10160, lr=2.65481e-05, gnorm=0.887, clip=10, loss_scale=64, train_wall=43, gb_free=30.6, wall=41584 2023-05-01 14:06:52 - progress_bar.py[line:274] - INFO: epoch 002: 4146 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7780.7, nsentences=120, sample_size=4137.5, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1939.8, ups=0.25, wpb=7780.7, bsz=120, num_updates=10170, lr=2.65428e-05, gnorm=0.877, clip=10, loss_scale=64, train_wall=40, gb_free=29.1, wall=41625 2023-05-01 14:07:32 - progress_bar.py[line:274] - INFO: 
epoch 002: 4156 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7895.8, nsentences=120, sample_size=4143.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1978.3, ups=0.25, wpb=7895.8, bsz=120, num_updates=10180, lr=2.65375e-05, gnorm=0.911, clip=10, loss_scale=64, train_wall=40, gb_free=28.5, wall=41665 2023-05-01 14:08:12 - progress_bar.py[line:274] - INFO: epoch 002: 4166 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7445.8, nsentences=120, sample_size=4217.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1847.8, ups=0.25, wpb=7445.8, bsz=120, num_updates=10190, lr=2.65323e-05, gnorm=0.887, clip=0, loss_scale=64, train_wall=40, gb_free=31.1, wall=41705 2023-05-01 14:08:51 - progress_bar.py[line:274] - INFO: epoch 002: 4176 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7692.9, nsentences=120, sample_size=4349.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1973.5, ups=0.26, wpb=7692.9, bsz=120, num_updates=10200, lr=2.6527e-05, gnorm=0.896, clip=0, loss_scale=64, train_wall=39, gb_free=29.6, wall=41744 2023-05-01 14:09:31 - progress_bar.py[line:274] - INFO: epoch 002: 4186 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7802.8, nsentences=120, sample_size=4532.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1962.1, ups=0.25, wpb=7802.8, bsz=120, num_updates=10210, lr=2.65217e-05, gnorm=0.869, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=41784 2023-05-01 14:10:11 - progress_bar.py[line:274] - INFO: epoch 002: 4196 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7465.6, nsentences=120, sample_size=4125.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1870, ups=0.25, wpb=7465.6, bsz=120, num_updates=10220, lr=2.65164e-05, gnorm=0.903, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=41823 2023-05-01 14:10:51 - progress_bar.py[line:274] - INFO: epoch 002: 4206 / 6042 loss=2.47, loss_v1=0, loss_v2=0, nll_loss=1.218, ntokens=7858.9, 
nsentences=120, sample_size=4111, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1959.1, ups=0.25, wpb=7858.9, bsz=120, num_updates=10230, lr=2.65111e-05, gnorm=0.914, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=41864 2023-05-01 14:11:31 - progress_bar.py[line:274] - INFO: epoch 002: 4216 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7702.8, nsentences=120, sample_size=4056.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1911.8, ups=0.25, wpb=7702.8, bsz=120, num_updates=10240, lr=2.65059e-05, gnorm=0.91, clip=0, loss_scale=64, train_wall=40, gb_free=28.1, wall=41904 2023-05-01 14:12:11 - progress_bar.py[line:274] - INFO: epoch 002: 4226 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7722.9, nsentences=120, sample_size=3916.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1960.7, ups=0.25, wpb=7722.9, bsz=120, num_updates=10250, lr=2.65006e-05, gnorm=0.935, clip=0, loss_scale=64, train_wall=39, gb_free=30.1, wall=41943 2023-05-01 14:12:51 - progress_bar.py[line:274] - INFO: epoch 002: 4236 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7741.1, nsentences=120, sample_size=3708.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1934.4, ups=0.25, wpb=7741.1, bsz=120, num_updates=10260, lr=2.64953e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=41983 2023-05-01 14:13:30 - progress_bar.py[line:274] - INFO: epoch 002: 4246 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7574.8, nsentences=120, sample_size=4070.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1940.4, ups=0.26, wpb=7574.8, bsz=120, num_updates=10270, lr=2.649e-05, gnorm=0.955, clip=30, loss_scale=64, train_wall=39, gb_free=31.3, wall=42022 2023-05-01 14:14:09 - progress_bar.py[line:274] - INFO: epoch 002: 4256 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7578.3, nsentences=120, sample_size=4171.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1925.5, 
ups=0.25, wpb=7578.3, bsz=120, num_updates=10280, lr=2.64847e-05, gnorm=0.905, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=42062 2023-05-01 14:14:49 - progress_bar.py[line:274] - INFO: epoch 002: 4266 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7645.6, nsentences=120, sample_size=3815.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1927.1, ups=0.25, wpb=7645.6, bsz=120, num_updates=10290, lr=2.64794e-05, gnorm=0.929, clip=0, loss_scale=64, train_wall=40, gb_free=30.1, wall=42101 2023-05-01 14:15:29 - progress_bar.py[line:274] - INFO: epoch 002: 4276 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7932.9, nsentences=120, sample_size=4042.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1966.3, ups=0.25, wpb=7932.9, bsz=120, num_updates=10300, lr=2.64742e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=42142 2023-05-01 14:16:09 - progress_bar.py[line:274] - INFO: epoch 002: 4286 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7569.2, nsentences=120, sample_size=3882.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1890.9, ups=0.25, wpb=7569.2, bsz=120, num_updates=10310, lr=2.64689e-05, gnorm=0.947, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=42182 2023-05-01 14:16:49 - progress_bar.py[line:274] - INFO: epoch 002: 4296 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7803.2, nsentences=120, sample_size=4029, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1960.8, ups=0.25, wpb=7803.2, bsz=120, num_updates=10320, lr=2.64636e-05, gnorm=0.91, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=42222 2023-05-01 14:17:29 - progress_bar.py[line:274] - INFO: epoch 002: 4306 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7527.1, nsentences=120, sample_size=4131.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1908.6, ups=0.25, wpb=7527.1, bsz=120, num_updates=10330, lr=2.64583e-05, gnorm=0.91, clip=10, 
loss_scale=64, train_wall=39, gb_free=30, wall=42261 2023-05-01 14:18:09 - progress_bar.py[line:274] - INFO: epoch 002: 4316 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7598.5, nsentences=120, sample_size=4083.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1889.5, ups=0.25, wpb=7598.5, bsz=120, num_updates=10340, lr=2.6453e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=23.6, wall=42301 2023-05-01 14:18:48 - progress_bar.py[line:274] - INFO: epoch 002: 4326 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7787.3, nsentences=120, sample_size=3819.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1960.1, ups=0.25, wpb=7787.3, bsz=120, num_updates=10350, lr=2.64478e-05, gnorm=0.941, clip=0, loss_scale=64, train_wall=40, gb_free=27.2, wall=42341 2023-05-01 14:19:28 - progress_bar.py[line:274] - INFO: epoch 002: 4336 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7930.1, nsentences=120, sample_size=3781.7, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2000.6, ups=0.25, wpb=7930.1, bsz=120, num_updates=10360, lr=2.64425e-05, gnorm=0.925, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, wall=42381 2023-05-01 14:20:09 - progress_bar.py[line:274] - INFO: epoch 002: 4346 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7835.1, nsentences=120, sample_size=4341.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1928.1, ups=0.25, wpb=7835.1, bsz=120, num_updates=10370, lr=2.64372e-05, gnorm=0.852, clip=0, loss_scale=64, train_wall=41, gb_free=30, wall=42421 2023-05-01 14:20:48 - progress_bar.py[line:274] - INFO: epoch 002: 4356 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7643.7, nsentences=120, sample_size=3956.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1941.2, ups=0.25, wpb=7643.7, bsz=120, num_updates=10380, lr=2.64319e-05, gnorm=0.932, clip=20, loss_scale=64, train_wall=39, gb_free=29.1, wall=42461 2023-05-01 14:21:28 - 
progress_bar.py[line:274] - INFO: epoch 002: 4366 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7795.9, nsentences=120, sample_size=3947.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1944.9, ups=0.25, wpb=7795.9, bsz=120, num_updates=10390, lr=2.64266e-05, gnorm=0.933, clip=0, loss_scale=64, train_wall=40, gb_free=30.6, wall=42501 2023-05-01 14:22:08 - progress_bar.py[line:274] - INFO: epoch 002: 4376 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7630.5, nsentences=120, sample_size=3984.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1932.8, ups=0.25, wpb=7630.5, bsz=120, num_updates=10400, lr=2.64213e-05, gnorm=0.944, clip=0, loss_scale=64, train_wall=39, gb_free=29.9, wall=42540 2023-05-01 14:22:47 - progress_bar.py[line:274] - INFO: epoch 002: 4386 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7492.1, nsentences=120, sample_size=4448.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1906.5, ups=0.25, wpb=7492.1, bsz=120, num_updates=10410, lr=2.64161e-05, gnorm=0.894, clip=0, loss_scale=64, train_wall=39, gb_free=31, wall=42579 2023-05-01 14:23:27 - progress_bar.py[line:274] - INFO: epoch 002: 4396 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7751.6, nsentences=120, sample_size=3960.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1920.3, ups=0.25, wpb=7751.6, bsz=120, num_updates=10420, lr=2.64108e-05, gnorm=0.902, clip=0, loss_scale=64, train_wall=40, gb_free=26.5, wall=42620 2023-05-01 14:24:07 - progress_bar.py[line:274] - INFO: epoch 002: 4406 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7837.4, nsentences=120, sample_size=4021.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1953.2, ups=0.25, wpb=7837.4, bsz=120, num_updates=10430, lr=2.64055e-05, gnorm=0.918, clip=0, loss_scale=64, train_wall=40, gb_free=29.1, wall=42660 2023-05-01 14:24:48 - progress_bar.py[line:274] - INFO: epoch 002: 4416 / 6042 loss=2.434, loss_v1=0, loss_v2=0, 
nll_loss=1.179, ntokens=7908.3, nsentences=120, sample_size=4264.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1967.3, ups=0.25, wpb=7908.3, bsz=120, num_updates=10440, lr=2.64002e-05, gnorm=0.93, clip=30, loss_scale=64, train_wall=40, gb_free=27.9, wall=42700 2023-05-01 14:25:27 - progress_bar.py[line:274] - INFO: epoch 002: 4426 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7811.3, nsentences=120, sample_size=3860.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1978.5, ups=0.25, wpb=7811.3, bsz=120, num_updates=10450, lr=2.63949e-05, gnorm=0.963, clip=20, loss_scale=64, train_wall=39, gb_free=30.9, wall=42740 2023-05-01 14:26:07 - progress_bar.py[line:274] - INFO: epoch 002: 4436 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7872, nsentences=120, sample_size=3751.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1982.5, ups=0.25, wpb=7872, bsz=120, num_updates=10460, lr=2.63896e-05, gnorm=0.948, clip=0, loss_scale=64, train_wall=40, gb_free=29.3, wall=42779 2023-05-01 14:26:46 - progress_bar.py[line:274] - INFO: epoch 002: 4446 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7682.2, nsentences=120, sample_size=3937.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1959.3, ups=0.26, wpb=7682.2, bsz=120, num_updates=10470, lr=2.63844e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=42819 2023-05-01 14:27:26 - progress_bar.py[line:274] - INFO: epoch 002: 4456 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7717.2, nsentences=120, sample_size=4237.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1913.7, ups=0.25, wpb=7717.2, bsz=120, num_updates=10480, lr=2.63791e-05, gnorm=0.919, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=42859 2023-05-01 14:28:05 - progress_bar.py[line:274] - INFO: epoch 002: 4466 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7606, nsentences=120, sample_size=4064.8, sample_size_v1=0, 
sample_size_v2=0, ppl=2.21, wps=1955.3, ups=0.26, wpb=7606, bsz=120, num_updates=10490, lr=2.63738e-05, gnorm=0.919, clip=0, loss_scale=64, train_wall=39, gb_free=30.7, wall=42898 2023-05-01 14:28:45 - progress_bar.py[line:274] - INFO: epoch 002: 4476 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7956.4, nsentences=120, sample_size=3915.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2014.3, ups=0.25, wpb=7956.4, bsz=120, num_updates=10500, lr=2.63685e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=39, gb_free=30, wall=42937 2023-05-01 14:29:24 - progress_bar.py[line:274] - INFO: epoch 002: 4486 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7922.9, nsentences=120, sample_size=3649.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2006.1, ups=0.25, wpb=7922.9, bsz=120, num_updates=10510, lr=2.63632e-05, gnorm=0.986, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=42977 2023-05-01 14:30:03 - progress_bar.py[line:274] - INFO: epoch 002: 4496 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7717.8, nsentences=120, sample_size=4082.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1974.8, ups=0.26, wpb=7717.8, bsz=120, num_updates=10520, lr=2.6358e-05, gnorm=0.913, clip=0, loss_scale=64, train_wall=39, gb_free=31, wall=43016 2023-05-01 14:30:45 - progress_bar.py[line:274] - INFO: epoch 002: 4506 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7648.2, nsentences=120, sample_size=4080.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1854.7, ups=0.24, wpb=7648.2, bsz=120, num_updates=10530, lr=2.63527e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=41, gb_free=27.3, wall=43057 2023-05-01 14:31:24 - progress_bar.py[line:274] - INFO: epoch 002: 4516 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7978.7, nsentences=120, sample_size=4225.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2007.3, ups=0.25, wpb=7978.7, bsz=120, num_updates=10540, 
lr=2.63474e-05, gnorm=0.907, clip=10, loss_scale=64, train_wall=40, gb_free=29.3, wall=43097 2023-05-01 14:32:04 - progress_bar.py[line:274] - INFO: epoch 002: 4526 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7457, nsentences=120, sample_size=4021.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1893.9, ups=0.25, wpb=7457, bsz=120, num_updates=10550, lr=2.63421e-05, gnorm=0.907, clip=10, loss_scale=64, train_wall=39, gb_free=28.3, wall=43136 2023-05-01 14:32:44 - progress_bar.py[line:274] - INFO: epoch 002: 4536 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7512.8, nsentences=120, sample_size=4082, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1884.6, ups=0.25, wpb=7512.8, bsz=120, num_updates=10560, lr=2.63368e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=43176 2023-05-01 14:33:24 - progress_bar.py[line:274] - INFO: epoch 002: 4546 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7922.5, nsentences=120, sample_size=3988.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1961.8, ups=0.25, wpb=7922.5, bsz=120, num_updates=10570, lr=2.63315e-05, gnorm=0.951, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=43216 2023-05-01 14:34:05 - progress_bar.py[line:274] - INFO: epoch 002: 4556 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7909.5, nsentences=120, sample_size=3992, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1929, ups=0.24, wpb=7909.5, bsz=120, num_updates=10580, lr=2.63263e-05, gnorm=0.912, clip=0, loss_scale=64, train_wall=41, gb_free=26.1, wall=43257 2023-05-01 14:34:45 - progress_bar.py[line:274] - INFO: epoch 002: 4566 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7772, nsentences=120, sample_size=4174.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1943.1, ups=0.25, wpb=7772, bsz=120, num_updates=10590, lr=2.6321e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, gb_free=31.1, wall=43297 2023-05-01 
14:35:25 - progress_bar.py[line:274] - INFO: epoch 002: 4576 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7675.6, nsentences=120, sample_size=4131.3, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1920.2, ups=0.25, wpb=7675.6, bsz=120, num_updates=10600, lr=2.63157e-05, gnorm=1.001, clip=60, loss_scale=64, train_wall=40, gb_free=29.3, wall=43337 2023-05-01 14:36:05 - progress_bar.py[line:274] - INFO: epoch 002: 4586 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7752.5, nsentences=120, sample_size=4013.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1942.6, ups=0.25, wpb=7752.5, bsz=120, num_updates=10610, lr=2.63104e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=29.2, wall=43377 2023-05-01 14:36:45 - progress_bar.py[line:274] - INFO: epoch 002: 4596 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7970.7, nsentences=120, sample_size=4188.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1974.8, ups=0.25, wpb=7970.7, bsz=120, num_updates=10620, lr=2.63051e-05, gnorm=0.889, clip=0, loss_scale=64, train_wall=40, gb_free=30.4, wall=43418 2023-05-01 14:37:25 - progress_bar.py[line:274] - INFO: epoch 002: 4606 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7893.3, nsentences=120, sample_size=3932.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1976.5, ups=0.25, wpb=7893.3, bsz=120, num_updates=10630, lr=2.62999e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=43458 2023-05-01 14:38:05 - progress_bar.py[line:274] - INFO: epoch 002: 4616 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7940.5, nsentences=120, sample_size=4071.7, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=2019, ups=0.25, wpb=7940.5, bsz=120, num_updates=10640, lr=2.62946e-05, gnorm=0.902, clip=0, loss_scale=64, train_wall=39, gb_free=29.7, wall=43497 2023-05-01 14:38:44 - progress_bar.py[line:274] - INFO: epoch 002: 4626 / 6042 loss=2.488, loss_v1=0, 
loss_v2=0, nll_loss=1.245, ntokens=7783.9, nsentences=120, sample_size=3960.1, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1955.1, ups=0.25, wpb=7783.9, bsz=120, num_updates=10650, lr=2.62893e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=43537 2023-05-01 14:39:24 - progress_bar.py[line:274] - INFO: epoch 002: 4636 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7626, nsentences=120, sample_size=3805, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1914.8, ups=0.25, wpb=7626, bsz=120, num_updates=10660, lr=2.6284e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=43577 2023-05-01 14:39:48 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 14:40:09 - progress_bar.py[line:274] - INFO: epoch 002: 4647 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7990, nsentences=120, sample_size=4159.8, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1801.3, ups=0.23, wpb=7990, bsz=120, num_updates=10670, lr=2.62787e-05, gnorm=0.887, clip=0, loss_scale=64, train_wall=44, gb_free=28.2, wall=43621 2023-05-01 14:40:47 - progress_bar.py[line:274] - INFO: epoch 002: 4657 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7358.1, nsentences=120, sample_size=4056.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1898.9, ups=0.26, wpb=7358.1, bsz=120, num_updates=10680, lr=2.62734e-05, gnorm=0.903, clip=0, loss_scale=64, train_wall=39, gb_free=30.3, wall=43660 2023-05-01 14:41:27 - progress_bar.py[line:274] - INFO: epoch 002: 4667 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7725.3, nsentences=120, sample_size=3780.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1950.3, ups=0.25, wpb=7725.3, bsz=120, num_updates=10690, lr=2.62682e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=43699 2023-05-01 14:42:06 - progress_bar.py[line:274] - INFO: epoch 002: 4677 / 
6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7573.3, nsentences=120, sample_size=3986.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1924.6, ups=0.25, wpb=7573.3, bsz=120, num_updates=10700, lr=2.62629e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=39, gb_free=30.9, wall=43739 2023-05-01 14:42:46 - progress_bar.py[line:274] - INFO: epoch 002: 4687 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7687.4, nsentences=120, sample_size=4344.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1933.6, ups=0.25, wpb=7687.4, bsz=120, num_updates=10710, lr=2.62576e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=43778 2023-05-01 14:43:26 - progress_bar.py[line:274] - INFO: epoch 002: 4697 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7755.1, nsentences=120, sample_size=4142.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1962.2, ups=0.25, wpb=7755.1, bsz=120, num_updates=10720, lr=2.62523e-05, gnorm=0.913, clip=10, loss_scale=64, train_wall=39, gb_free=30.4, wall=43818 2023-05-01 14:44:05 - progress_bar.py[line:274] - INFO: epoch 002: 4707 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=8011.9, nsentences=120, sample_size=3770.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2019.7, ups=0.25, wpb=8011.9, bsz=120, num_updates=10730, lr=2.6247e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=31, wall=43858 2023-05-01 14:44:45 - progress_bar.py[line:274] - INFO: epoch 002: 4717 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=8019.3, nsentences=120, sample_size=4063.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1990.4, ups=0.25, wpb=8019.3, bsz=120, num_updates=10740, lr=2.62417e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=26.7, wall=43898 2023-05-01 14:45:26 - progress_bar.py[line:274] - INFO: epoch 002: 4727 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7913.2, nsentences=120, 
sample_size=3963, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1962.5, ups=0.25, wpb=7913.2, bsz=120, num_updates=10750, lr=2.62365e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=43938 2023-05-01 14:46:06 - progress_bar.py[line:274] - INFO: epoch 002: 4737 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7806.4, nsentences=120, sample_size=4081.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1963.5, ups=0.25, wpb=7806.4, bsz=120, num_updates=10760, lr=2.62312e-05, gnorm=0.94, clip=30, loss_scale=64, train_wall=40, gb_free=31.3, wall=43978 2023-05-01 14:46:45 - progress_bar.py[line:274] - INFO: epoch 002: 4747 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7696.9, nsentences=120, sample_size=3771.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1941.8, ups=0.25, wpb=7696.9, bsz=120, num_updates=10770, lr=2.62259e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=44018 2023-05-01 14:47:26 - progress_bar.py[line:274] - INFO: epoch 002: 4757 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=8001.4, nsentences=120, sample_size=4091.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1943.2, ups=0.24, wpb=8001.4, bsz=120, num_updates=10780, lr=2.62206e-05, gnorm=0.906, clip=0, loss_scale=64, train_wall=41, gb_free=29.8, wall=44059 2023-05-01 14:48:06 - progress_bar.py[line:274] - INFO: epoch 002: 4767 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7813.1, nsentences=120, sample_size=3841.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1975.4, ups=0.25, wpb=7813.1, bsz=120, num_updates=10790, lr=2.62153e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=44098 2023-05-01 14:48:45 - progress_bar.py[line:274] - INFO: epoch 002: 4777 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7717.2, nsentences=120, sample_size=3875.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1974.9, ups=0.26, 
wpb=7717.2, bsz=120, num_updates=10800, lr=2.62101e-05, gnorm=0.96, clip=10, loss_scale=64, train_wall=39, gb_free=28.6, wall=44137 2023-05-01 14:49:25 - progress_bar.py[line:274] - INFO: epoch 002: 4787 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7614.2, nsentences=120, sample_size=3830.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1918.1, ups=0.25, wpb=7614.2, bsz=120, num_updates=10810, lr=2.62048e-05, gnorm=0.953, clip=40, loss_scale=64, train_wall=40, gb_free=31.4, wall=44177 2023-05-01 14:50:04 - progress_bar.py[line:274] - INFO: epoch 002: 4797 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7648.1, nsentences=120, sample_size=3900, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1967.8, ups=0.26, wpb=7648.1, bsz=120, num_updates=10820, lr=2.61995e-05, gnorm=0.928, clip=10, loss_scale=64, train_wall=39, gb_free=27.9, wall=44216 2023-05-01 14:50:44 - progress_bar.py[line:274] - INFO: epoch 002: 4807 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7675.8, nsentences=120, sample_size=4279.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1907.1, ups=0.25, wpb=7675.8, bsz=120, num_updates=10830, lr=2.61942e-05, gnorm=0.911, clip=10, loss_scale=64, train_wall=40, gb_free=28.5, wall=44256 2023-05-01 14:51:23 - progress_bar.py[line:274] - INFO: epoch 002: 4817 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7492.5, nsentences=120, sample_size=4087.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1895.4, ups=0.25, wpb=7492.5, bsz=120, num_updates=10840, lr=2.61889e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=39, gb_free=31, wall=44296 2023-05-01 14:52:04 - progress_bar.py[line:274] - INFO: epoch 002: 4827 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=8087.1, nsentences=120, sample_size=4007.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2003.8, ups=0.25, wpb=8087.1, bsz=120, num_updates=10850, lr=2.61836e-05, gnorm=0.918, clip=0, loss_scale=64, 
train_wall=40, gb_free=29.8, wall=44336 2023-05-01 14:52:43 - progress_bar.py[line:274] - INFO: epoch 002: 4837 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7772.5, nsentences=120, sample_size=4179.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1960.9, ups=0.25, wpb=7772.5, bsz=120, num_updates=10860, lr=2.61784e-05, gnorm=0.92, clip=0, loss_scale=64, train_wall=40, gb_free=30.8, wall=44376 2023-05-01 14:53:23 - progress_bar.py[line:274] - INFO: epoch 002: 4847 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=8049.6, nsentences=120, sample_size=3976.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=2017.2, ups=0.25, wpb=8049.6, bsz=120, num_updates=10870, lr=2.61731e-05, gnorm=0.933, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=44416 2023-05-01 14:54:03 - progress_bar.py[line:274] - INFO: epoch 002: 4857 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7640.9, nsentences=120, sample_size=4014.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1915, ups=0.25, wpb=7640.9, bsz=120, num_updates=10880, lr=2.61678e-05, gnorm=0.947, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=44456 2023-05-01 14:54:43 - progress_bar.py[line:274] - INFO: epoch 002: 4867 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=8222.9, nsentences=120, sample_size=3683.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2056.4, ups=0.25, wpb=8222.9, bsz=120, num_updates=10890, lr=2.61625e-05, gnorm=0.947, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=44496 2023-05-01 14:55:24 - progress_bar.py[line:274] - INFO: epoch 002: 4877 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7379.6, nsentences=120, sample_size=4256.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1816, ups=0.25, wpb=7379.6, bsz=120, num_updates=10900, lr=2.61572e-05, gnorm=0.911, clip=0, loss_scale=64, train_wall=41, gb_free=30.5, wall=44536 2023-05-01 14:56:04 - progress_bar.py[line:274] - INFO: 
epoch 002: 4887 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7726.3, nsentences=120, sample_size=3904.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1919.2, ups=0.25, wpb=7726.3, bsz=120, num_updates=10910, lr=2.61519e-05, gnorm=0.937, clip=10, loss_scale=64, train_wall=40, gb_free=28.7, wall=44577 2023-05-01 14:56:44 - progress_bar.py[line:274] - INFO: epoch 002: 4897 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7629.1, nsentences=120, sample_size=3908.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1915.9, ups=0.25, wpb=7629.1, bsz=120, num_updates=10920, lr=2.61467e-05, gnorm=0.939, clip=10, loss_scale=64, train_wall=40, gb_free=23.7, wall=44616 2023-05-01 14:57:24 - progress_bar.py[line:274] - INFO: epoch 002: 4907 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7697.3, nsentences=120, sample_size=3823.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1935.8, ups=0.25, wpb=7697.3, bsz=120, num_updates=10930, lr=2.61414e-05, gnorm=0.963, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=44656 2023-05-01 14:58:03 - progress_bar.py[line:274] - INFO: epoch 002: 4917 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7889.2, nsentences=120, sample_size=4180.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1982.6, ups=0.25, wpb=7889.2, bsz=120, num_updates=10940, lr=2.61361e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=44696 2023-05-01 14:58:43 - progress_bar.py[line:274] - INFO: epoch 002: 4927 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7825, nsentences=120, sample_size=4092.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1958.3, ups=0.25, wpb=7825, bsz=120, num_updates=10950, lr=2.61308e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=40, gb_free=26.7, wall=44736 2023-05-01 14:59:23 - progress_bar.py[line:274] - INFO: epoch 002: 4937 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7750.2, 
nsentences=120, sample_size=3921.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1944.6, ups=0.25, wpb=7750.2, bsz=120, num_updates=10960, lr=2.61255e-05, gnorm=0.924, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=44776 2023-05-01 15:00:03 - progress_bar.py[line:274] - INFO: epoch 002: 4947 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7906.2, nsentences=120, sample_size=4007.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1994.5, ups=0.25, wpb=7906.2, bsz=120, num_updates=10970, lr=2.61203e-05, gnorm=0.957, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=44815 2023-05-01 15:00:42 - progress_bar.py[line:274] - INFO: epoch 002: 4957 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7903, nsentences=120, sample_size=3707.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2017, ups=0.26, wpb=7903, bsz=120, num_updates=10980, lr=2.6115e-05, gnorm=0.989, clip=60, loss_scale=64, train_wall=39, gb_free=29.9, wall=44855 2023-05-01 15:01:22 - progress_bar.py[line:274] - INFO: epoch 002: 4967 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7635.9, nsentences=120, sample_size=3680.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1930.8, ups=0.25, wpb=7635.9, bsz=120, num_updates=10990, lr=2.61097e-05, gnorm=0.963, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=44894 2023-05-01 15:02:02 - progress_bar.py[line:274] - INFO: epoch 002: 4977 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7787, nsentences=120, sample_size=4109.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1935, ups=0.25, wpb=7787, bsz=120, num_updates=11000, lr=2.61044e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=44934 2023-05-01 15:02:02 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 15:02:04 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 15:02:04 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 15:02:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 15:02:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:21 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 15:02:21 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 15:02:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 15:02:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:33 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 15:02:33 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 15:02:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 15:02:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:44 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 15:02:44 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 15:02:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:49 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 15:02:49 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 15:02:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:53 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 15:02:53 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 15:02:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 15:02:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 15:02:54 - progress_bar.py[line:282] - INFO: epoch 002 | valid on 'valid' subset | loss 3.229 | loss_v1 0 | loss_v2 0 | nll_loss 2.063 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.18 | score 0.7495 | wps 3279.1 | wpb 3202.1 | bsz 39.4 | num_updates 11000 | best_score 0.751 2023-05-01 15:02:54 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 2 @ 11000 updates 2023-05-01 15:02:54 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_11000.pt 2023-05-01 15:03:18 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_11000.pt 2023-05-01 15:03:44 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_11000.pt (epoch 2 @ 11000 updates, score 0.7495) (writing took 50.66567183402367 seconds) 2023-05-01 15:04:23 - progress_bar.py[line:274] - INFO: epoch 002: 4987 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7646.9, nsentences=120, sample_size=3922.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=542.1, ups=0.07, wpb=7646.9, bsz=120, num_updates=11010, lr=2.60991e-05, gnorm=0.967, clip=40, loss_scale=64, train_wall=38, gb_free=27, wall=45075 2023-05-01 15:05:03 - progress_bar.py[line:274] - INFO: epoch 002: 4997 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7999.2, nsentences=120, sample_size=3834.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2014.9, ups=0.25, wpb=7999.2, bsz=120, num_updates=11020, 
lr=2.60938e-05, gnorm=0.921, clip=0, loss_scale=64, train_wall=40, gb_free=30.7, wall=45115 2023-05-01 15:05:43 - progress_bar.py[line:274] - INFO: epoch 002: 5007 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7457.7, nsentences=120, sample_size=4044.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1868.9, ups=0.25, wpb=7457.7, bsz=120, num_updates=11030, lr=2.60886e-05, gnorm=0.931, clip=20, loss_scale=64, train_wall=40, gb_free=28.3, wall=45155 2023-05-01 15:06:22 - progress_bar.py[line:274] - INFO: epoch 002: 5017 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7797.2, nsentences=120, sample_size=4081.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1956.1, ups=0.25, wpb=7797.2, bsz=120, num_updates=11040, lr=2.60833e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=45195 2023-05-01 15:07:02 - progress_bar.py[line:274] - INFO: epoch 002: 5027 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7827.3, nsentences=120, sample_size=4290.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1980.2, ups=0.25, wpb=7827.3, bsz=120, num_updates=11050, lr=2.6078e-05, gnorm=0.899, clip=0, loss_scale=64, train_wall=39, gb_free=28.7, wall=45234 2023-05-01 15:07:42 - progress_bar.py[line:274] - INFO: epoch 002: 5037 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7936.9, nsentences=120, sample_size=3744.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1962.8, ups=0.25, wpb=7936.9, bsz=120, num_updates=11060, lr=2.60727e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=28.9, wall=45275 2023-05-01 15:08:23 - progress_bar.py[line:274] - INFO: epoch 002: 5047 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7759.4, nsentences=120, sample_size=4158.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1888.6, ups=0.24, wpb=7759.4, bsz=120, num_updates=11070, lr=2.60674e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=41, gb_free=29.6, wall=45316 
2023-05-01 15:09:04 - progress_bar.py[line:274] - INFO: epoch 002: 5057 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7498.4, nsentences=120, sample_size=3903.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1847.8, ups=0.25, wpb=7498.4, bsz=120, num_updates=11080, lr=2.60622e-05, gnorm=0.952, clip=10, loss_scale=64, train_wall=41, gb_free=29, wall=45357 2023-05-01 15:09:44 - progress_bar.py[line:274] - INFO: epoch 002: 5067 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7327.7, nsentences=120, sample_size=4173, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1846.9, ups=0.25, wpb=7327.7, bsz=120, num_updates=11090, lr=2.60569e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=45396 2023-05-01 15:10:24 - progress_bar.py[line:274] - INFO: epoch 002: 5077 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7828.9, nsentences=120, sample_size=4214.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1961.9, ups=0.25, wpb=7828.9, bsz=120, num_updates=11100, lr=2.60516e-05, gnorm=0.932, clip=20, loss_scale=64, train_wall=40, gb_free=26.2, wall=45436 2023-05-01 15:10:47 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-01 15:11:07 - progress_bar.py[line:274] - INFO: epoch 002: 5088 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7911.1, nsentences=120, sample_size=4028.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1805.4, ups=0.23, wpb=7911.1, bsz=120, num_updates=11110, lr=2.60463e-05, gnorm=0.925, clip=10, loss_scale=32, train_wall=44, gb_free=29.8, wall=45480 2023-05-01 15:11:47 - progress_bar.py[line:274] - INFO: epoch 002: 5098 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.209, ntokens=7847.6, nsentences=120, sample_size=3945.1, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1964.4, ups=0.25, wpb=7847.6, bsz=120, num_updates=11120, lr=2.6041e-05, gnorm=0.934, clip=10, loss_scale=32, 
train_wall=40, gb_free=29.6, wall=45520 2023-05-01 15:12:26 - progress_bar.py[line:274] - INFO: epoch 002: 5108 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7518, nsentences=120, sample_size=4444.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1924.7, ups=0.26, wpb=7518, bsz=120, num_updates=11130, lr=2.60357e-05, gnorm=0.905, clip=0, loss_scale=32, train_wall=39, gb_free=27.5, wall=45559 2023-05-01 15:13:06 - progress_bar.py[line:274] - INFO: epoch 002: 5118 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7838, nsentences=120, sample_size=3618.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=2002.3, ups=0.26, wpb=7838, bsz=120, num_updates=11140, lr=2.60305e-05, gnorm=0.965, clip=20, loss_scale=32, train_wall=39, gb_free=29.1, wall=45598 2023-05-01 15:13:45 - progress_bar.py[line:274] - INFO: epoch 002: 5128 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7726.6, nsentences=120, sample_size=3952.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1949.7, ups=0.25, wpb=7726.6, bsz=120, num_updates=11150, lr=2.60252e-05, gnorm=0.955, clip=10, loss_scale=32, train_wall=40, gb_free=29, wall=45638 2023-05-01 15:14:25 - progress_bar.py[line:274] - INFO: epoch 002: 5138 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7791.4, nsentences=120, sample_size=4049.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1964.4, ups=0.25, wpb=7791.4, bsz=120, num_updates=11160, lr=2.60199e-05, gnorm=0.958, clip=40, loss_scale=32, train_wall=40, gb_free=31.7, wall=45677 2023-05-01 15:15:05 - progress_bar.py[line:274] - INFO: epoch 002: 5148 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7709.9, nsentences=120, sample_size=3967.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1936.2, ups=0.25, wpb=7709.9, bsz=120, num_updates=11170, lr=2.60146e-05, gnorm=0.934, clip=20, loss_scale=32, train_wall=40, gb_free=29.6, wall=45717 2023-05-01 15:15:44 - progress_bar.py[line:274] - INFO: 
epoch 002: 5158 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7862.4, nsentences=120, sample_size=3957.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1998.7, ups=0.25, wpb=7862.4, bsz=120, num_updates=11180, lr=2.60093e-05, gnorm=0.963, clip=40, loss_scale=32, train_wall=39, gb_free=30.6, wall=45757 2023-05-01 15:16:24 - progress_bar.py[line:274] - INFO: epoch 002: 5168 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7422.6, nsentences=120, sample_size=3847.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1871.7, ups=0.25, wpb=7422.6, bsz=120, num_updates=11190, lr=2.6004e-05, gnorm=0.939, clip=10, loss_scale=32, train_wall=40, gb_free=30, wall=45796 2023-05-01 15:17:04 - progress_bar.py[line:274] - INFO: epoch 002: 5178 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=8000.3, nsentences=120, sample_size=4262.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1966, ups=0.25, wpb=8000.3, bsz=120, num_updates=11200, lr=2.59988e-05, gnorm=0.91, clip=0, loss_scale=32, train_wall=41, gb_free=29, wall=45837 2023-05-01 15:17:44 - progress_bar.py[line:274] - INFO: epoch 002: 5188 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7864.1, nsentences=120, sample_size=4202.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1988, ups=0.25, wpb=7864.1, bsz=120, num_updates=11210, lr=2.59935e-05, gnorm=0.925, clip=10, loss_scale=32, train_wall=39, gb_free=30.6, wall=45876 2023-05-01 15:18:24 - progress_bar.py[line:274] - INFO: epoch 002: 5198 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7847.6, nsentences=120, sample_size=4239.1, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1962.2, ups=0.25, wpb=7847.6, bsz=120, num_updates=11220, lr=2.59882e-05, gnorm=0.919, clip=0, loss_scale=32, train_wall=40, gb_free=30.5, wall=45916 2023-05-01 15:19:04 - progress_bar.py[line:274] - INFO: epoch 002: 5208 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7601.6, nsentences=120, 
sample_size=4067.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1923, ups=0.25, wpb=7601.6, bsz=120, num_updates=11230, lr=2.59829e-05, gnorm=0.931, clip=10, loss_scale=32, train_wall=39, gb_free=29.5, wall=45956 2023-05-01 15:19:44 - progress_bar.py[line:274] - INFO: epoch 002: 5218 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7687.4, nsentences=120, sample_size=4110.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1888.7, ups=0.25, wpb=7687.4, bsz=120, num_updates=11240, lr=2.59776e-05, gnorm=0.93, clip=10, loss_scale=32, train_wall=41, gb_free=30.1, wall=45997 2023-05-01 15:20:24 - progress_bar.py[line:274] - INFO: epoch 002: 5228 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7714.7, nsentences=120, sample_size=3922.4, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1958.2, ups=0.25, wpb=7714.7, bsz=120, num_updates=11250, lr=2.59724e-05, gnorm=0.931, clip=20, loss_scale=32, train_wall=39, gb_free=29.6, wall=46036 2023-05-01 15:21:04 - progress_bar.py[line:274] - INFO: epoch 002: 5238 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7859.9, nsentences=120, sample_size=3966.5, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1969.9, ups=0.25, wpb=7859.9, bsz=120, num_updates=11260, lr=2.59671e-05, gnorm=0.932, clip=20, loss_scale=32, train_wall=40, gb_free=30.2, wall=46076 2023-05-01 15:21:43 - progress_bar.py[line:274] - INFO: epoch 002: 5248 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=8069.4, nsentences=120, sample_size=4263.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=2033.3, ups=0.25, wpb=8069.4, bsz=120, num_updates=11270, lr=2.59618e-05, gnorm=0.915, clip=10, loss_scale=32, train_wall=40, gb_free=29.5, wall=46116 2023-05-01 15:22:24 - progress_bar.py[line:274] - INFO: epoch 002: 5258 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7879.5, nsentences=120, sample_size=4139.7, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1953, ups=0.25, 
wpb=7879.5, bsz=120, num_updates=11280, lr=2.59565e-05, gnorm=0.904, clip=0, loss_scale=32, train_wall=40, gb_free=29.4, wall=46156 2023-05-01 15:23:03 - progress_bar.py[line:274] - INFO: epoch 002: 5268 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7418.6, nsentences=120, sample_size=4034.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1859.6, ups=0.25, wpb=7418.6, bsz=120, num_updates=11290, lr=2.59512e-05, gnorm=0.918, clip=10, loss_scale=32, train_wall=40, gb_free=30.3, wall=46196 2023-05-01 15:23:43 - progress_bar.py[line:274] - INFO: epoch 002: 5278 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7381.3, nsentences=120, sample_size=4264.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1867.4, ups=0.25, wpb=7381.3, bsz=120, num_updates=11300, lr=2.59459e-05, gnorm=0.91, clip=10, loss_scale=32, train_wall=39, gb_free=28, wall=46235 2023-05-01 15:24:23 - progress_bar.py[line:274] - INFO: epoch 002: 5288 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7691.3, nsentences=120, sample_size=3939.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1905.7, ups=0.25, wpb=7691.3, bsz=120, num_updates=11310, lr=2.59407e-05, gnorm=0.956, clip=40, loss_scale=32, train_wall=40, gb_free=28.9, wall=46276 2023-05-01 15:25:03 - progress_bar.py[line:274] - INFO: epoch 002: 5298 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7430.3, nsentences=120, sample_size=3753.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1895.2, ups=0.26, wpb=7430.3, bsz=120, num_updates=11320, lr=2.59354e-05, gnorm=0.98, clip=20, loss_scale=32, train_wall=39, gb_free=29.8, wall=46315 2023-05-01 15:25:43 - progress_bar.py[line:274] - INFO: epoch 002: 5308 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7747.1, nsentences=120, sample_size=3978.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1925.4, ups=0.25, wpb=7747.1, bsz=120, num_updates=11330, lr=2.59301e-05, gnorm=0.964, clip=40, loss_scale=32, 
train_wall=40, gb_free=25.9, wall=46355 2023-05-01 15:26:23 - progress_bar.py[line:274] - INFO: epoch 002: 5318 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7907.8, nsentences=120, sample_size=3986.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1988, ups=0.25, wpb=7907.8, bsz=120, num_updates=11340, lr=2.59248e-05, gnorm=0.911, clip=0, loss_scale=32, train_wall=40, gb_free=29.9, wall=46395 2023-05-01 15:27:02 - progress_bar.py[line:274] - INFO: epoch 002: 5328 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7945.6, nsentences=120, sample_size=3887.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=2004.3, ups=0.25, wpb=7945.6, bsz=120, num_updates=11350, lr=2.59195e-05, gnorm=0.899, clip=0, loss_scale=32, train_wall=40, gb_free=30.4, wall=46435 2023-05-01 15:27:43 - progress_bar.py[line:274] - INFO: epoch 002: 5338 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=8015.1, nsentences=120, sample_size=4227.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1970, ups=0.25, wpb=8015.1, bsz=120, num_updates=11360, lr=2.59143e-05, gnorm=0.92, clip=10, loss_scale=32, train_wall=41, gb_free=29.3, wall=46475 2023-05-01 15:28:24 - progress_bar.py[line:274] - INFO: epoch 002: 5348 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7481.9, nsentences=120, sample_size=3858.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1825, ups=0.24, wpb=7481.9, bsz=120, num_updates=11370, lr=2.5909e-05, gnorm=0.925, clip=10, loss_scale=32, train_wall=41, gb_free=28.3, wall=46516 2023-05-01 15:29:04 - progress_bar.py[line:274] - INFO: epoch 002: 5358 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7797.9, nsentences=120, sample_size=3947.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1950.6, ups=0.25, wpb=7797.9, bsz=120, num_updates=11380, lr=2.59037e-05, gnorm=0.954, clip=10, loss_scale=32, train_wall=40, gb_free=30.4, wall=46556 2023-05-01 15:29:43 - progress_bar.py[line:274] - INFO: 
epoch 002: 5368 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7319.1, nsentences=120, sample_size=4299.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1857, ups=0.25, wpb=7319.1, bsz=120, num_updates=11390, lr=2.58984e-05, gnorm=0.913, clip=0, loss_scale=32, train_wall=39, gb_free=29.8, wall=46596 2023-05-01 15:30:23 - progress_bar.py[line:274] - INFO: epoch 002: 5378 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7781.5, nsentences=120, sample_size=3701.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1952.7, ups=0.25, wpb=7781.5, bsz=120, num_updates=11400, lr=2.58931e-05, gnorm=0.945, clip=20, loss_scale=32, train_wall=40, gb_free=28.6, wall=46636 2023-05-01 15:31:03 - progress_bar.py[line:274] - INFO: epoch 002: 5388 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7640.1, nsentences=120, sample_size=4167.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1903.2, ups=0.25, wpb=7640.1, bsz=120, num_updates=11410, lr=2.58878e-05, gnorm=0.912, clip=0, loss_scale=32, train_wall=40, gb_free=30.5, wall=46676 2023-05-01 15:31:43 - progress_bar.py[line:274] - INFO: epoch 002: 5398 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7602.7, nsentences=120, sample_size=4202.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1928.6, ups=0.25, wpb=7602.7, bsz=120, num_updates=11420, lr=2.58826e-05, gnorm=0.917, clip=0, loss_scale=32, train_wall=39, gb_free=28.5, wall=46715 2023-05-01 15:32:22 - progress_bar.py[line:274] - INFO: epoch 002: 5408 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7675.6, nsentences=120, sample_size=3920.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1940.2, ups=0.25, wpb=7675.6, bsz=120, num_updates=11430, lr=2.58773e-05, gnorm=0.966, clip=20, loss_scale=32, train_wall=39, gb_free=29.4, wall=46755 2023-05-01 15:33:02 - progress_bar.py[line:274] - INFO: epoch 002: 5418 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7686.7, 
nsentences=120, sample_size=4155.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1924.4, ups=0.25, wpb=7686.7, bsz=120, num_updates=11440, lr=2.5872e-05, gnorm=0.928, clip=10, loss_scale=32, train_wall=40, gb_free=29.4, wall=46795 2023-05-01 15:33:42 - progress_bar.py[line:274] - INFO: epoch 002: 5428 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7778.8, nsentences=120, sample_size=3931.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1930.9, ups=0.25, wpb=7778.8, bsz=120, num_updates=11450, lr=2.58667e-05, gnorm=0.919, clip=20, loss_scale=32, train_wall=40, gb_free=30, wall=46835 2023-05-01 15:34:23 - progress_bar.py[line:274] - INFO: epoch 002: 5438 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7593.8, nsentences=120, sample_size=4132.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1878.8, ups=0.25, wpb=7593.8, bsz=120, num_updates=11460, lr=2.58614e-05, gnorm=0.914, clip=0, loss_scale=32, train_wall=40, gb_free=28.5, wall=46875 2023-05-01 15:35:03 - progress_bar.py[line:274] - INFO: epoch 002: 5448 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7775.3, nsentences=120, sample_size=4260.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1939.6, ups=0.25, wpb=7775.3, bsz=120, num_updates=11470, lr=2.58561e-05, gnorm=0.899, clip=0, loss_scale=32, train_wall=40, gb_free=30.5, wall=46915 2023-05-01 15:35:43 - progress_bar.py[line:274] - INFO: epoch 002: 5458 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7877.6, nsentences=120, sample_size=4016.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1957.8, ups=0.25, wpb=7877.6, bsz=120, num_updates=11480, lr=2.58509e-05, gnorm=0.903, clip=0, loss_scale=32, train_wall=40, gb_free=30.8, wall=46956 2023-05-01 15:36:23 - progress_bar.py[line:274] - INFO: epoch 002: 5468 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7927.2, nsentences=120, sample_size=4345, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1990, 
ups=0.25, wpb=7927.2, bsz=120, num_updates=11490, lr=2.58456e-05, gnorm=0.91, clip=0, loss_scale=32, train_wall=40, gb_free=28.4, wall=46996 2023-05-01 15:37:03 - progress_bar.py[line:274] - INFO: epoch 002: 5478 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7589.5, nsentences=120, sample_size=4022.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1903.6, ups=0.25, wpb=7589.5, bsz=120, num_updates=11500, lr=2.58403e-05, gnorm=0.935, clip=0, loss_scale=32, train_wall=40, gb_free=30.3, wall=47035 2023-05-01 15:37:42 - progress_bar.py[line:274] - INFO: epoch 002: 5488 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7991, nsentences=120, sample_size=3855.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2039.5, ups=0.26, wpb=7991, bsz=120, num_updates=11510, lr=2.5835e-05, gnorm=0.962, clip=20, loss_scale=32, train_wall=39, gb_free=27.8, wall=47075 2023-05-01 15:38:22 - progress_bar.py[line:274] - INFO: epoch 002: 5498 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7650.6, nsentences=120, sample_size=3798.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1928.8, ups=0.25, wpb=7650.6, bsz=120, num_updates=11520, lr=2.58297e-05, gnorm=0.965, clip=30, loss_scale=32, train_wall=40, gb_free=30.8, wall=47114 2023-05-01 15:39:01 - progress_bar.py[line:274] - INFO: epoch 002: 5508 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7701.7, nsentences=120, sample_size=3703.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1944.2, ups=0.25, wpb=7701.7, bsz=120, num_updates=11530, lr=2.58245e-05, gnorm=0.992, clip=40, loss_scale=32, train_wall=40, gb_free=31.1, wall=47154 2023-05-01 15:39:42 - progress_bar.py[line:274] - INFO: epoch 002: 5518 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7626.6, nsentences=120, sample_size=3925.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1889.2, ups=0.25, wpb=7626.6, bsz=120, num_updates=11540, lr=2.58192e-05, gnorm=0.954, clip=20, 
loss_scale=32, train_wall=40, gb_free=30.7, wall=47194 2023-05-01 15:40:21 - progress_bar.py[line:274] - INFO: epoch 002: 5528 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7601.8, nsentences=120, sample_size=4053.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1919, ups=0.25, wpb=7601.8, bsz=120, num_updates=11550, lr=2.58139e-05, gnorm=0.922, clip=10, loss_scale=32, train_wall=40, gb_free=27.6, wall=47234 2023-05-01 15:41:01 - progress_bar.py[line:274] - INFO: epoch 002: 5538 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7729.5, nsentences=120, sample_size=4066, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1932.3, ups=0.25, wpb=7729.5, bsz=120, num_updates=11560, lr=2.58086e-05, gnorm=0.904, clip=0, loss_scale=32, train_wall=40, gb_free=28.8, wall=47274 2023-05-01 15:41:41 - progress_bar.py[line:274] - INFO: epoch 002: 5548 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7574.8, nsentences=120, sample_size=4258.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1901.4, ups=0.25, wpb=7574.8, bsz=120, num_updates=11570, lr=2.58033e-05, gnorm=0.91, clip=10, loss_scale=32, train_wall=40, gb_free=30.7, wall=47314 2023-05-01 15:42:21 - progress_bar.py[line:274] - INFO: epoch 002: 5558 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7978.4, nsentences=120, sample_size=3971, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2010.1, ups=0.25, wpb=7978.4, bsz=120, num_updates=11580, lr=2.5798e-05, gnorm=0.926, clip=20, loss_scale=32, train_wall=40, gb_free=29.2, wall=47353 2023-05-01 15:43:02 - progress_bar.py[line:274] - INFO: epoch 002: 5568 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7897.8, nsentences=120, sample_size=4386.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1922.1, ups=0.24, wpb=7897.8, bsz=120, num_updates=11590, lr=2.57928e-05, gnorm=0.904, clip=0, loss_scale=32, train_wall=41, gb_free=29.7, wall=47394 2023-05-01 15:43:42 - progress_bar.py[line:274] 
- INFO: epoch 002: 5578 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7549.6, nsentences=120, sample_size=4201, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1873.8, ups=0.25, wpb=7549.6, bsz=120, num_updates=11600, lr=2.57875e-05, gnorm=0.94, clip=20, loss_scale=32, train_wall=40, gb_free=29.3, wall=47435 2023-05-01 15:44:22 - progress_bar.py[line:274] - INFO: epoch 002: 5588 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7984, nsentences=120, sample_size=3966.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1995.3, ups=0.25, wpb=7984, bsz=120, num_updates=11610, lr=2.57822e-05, gnorm=0.957, clip=20, loss_scale=32, train_wall=40, gb_free=28.6, wall=47475 2023-05-01 15:45:01 - progress_bar.py[line:274] - INFO: epoch 002: 5598 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7466.8, nsentences=119.2, sample_size=3859.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1909.3, ups=0.26, wpb=7466.8, bsz=119.2, num_updates=11620, lr=2.57769e-05, gnorm=0.929, clip=10, loss_scale=64, train_wall=39, gb_free=29.3, wall=47514 2023-05-01 15:45:13 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-01 15:45:46 - progress_bar.py[line:274] - INFO: epoch 002: 5609 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7417.3, nsentences=120, sample_size=4151.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1674.4, ups=0.23, wpb=7417.3, bsz=120, num_updates=11630, lr=2.57716e-05, gnorm=0.957, clip=10, loss_scale=32, train_wall=44, gb_free=30.9, wall=47558 2023-05-01 15:46:25 - progress_bar.py[line:274] - INFO: epoch 002: 5619 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7539.9, nsentences=120, sample_size=3766.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1927.1, ups=0.26, wpb=7539.9, bsz=120, num_updates=11640, lr=2.57664e-05, gnorm=0.976, clip=30, loss_scale=32, train_wall=39, gb_free=29.8, wall=47597 2023-05-01 
15:47:05 - progress_bar.py[line:274] - INFO: epoch 002: 5629 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7835.8, nsentences=120, sample_size=4021, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1953.1, ups=0.25, wpb=7835.8, bsz=120, num_updates=11650, lr=2.57611e-05, gnorm=0.965, clip=20, loss_scale=32, train_wall=40, gb_free=30, wall=47637 2023-05-01 15:47:45 - progress_bar.py[line:274] - INFO: epoch 002: 5639 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7631.4, nsentences=120, sample_size=4292.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1900.4, ups=0.25, wpb=7631.4, bsz=120, num_updates=11660, lr=2.57558e-05, gnorm=0.924, clip=10, loss_scale=32, train_wall=40, gb_free=30.4, wall=47678 2023-05-01 15:48:25 - progress_bar.py[line:274] - INFO: epoch 002: 5649 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7490.5, nsentences=120, sample_size=3785, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1891.7, ups=0.25, wpb=7490.5, bsz=120, num_updates=11670, lr=2.57505e-05, gnorm=0.953, clip=30, loss_scale=32, train_wall=40, gb_free=30.4, wall=47717 2023-05-01 15:49:04 - progress_bar.py[line:274] - INFO: epoch 002: 5659 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7766.6, nsentences=120, sample_size=4158.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1968.5, ups=0.25, wpb=7766.6, bsz=120, num_updates=11680, lr=2.57452e-05, gnorm=0.949, clip=20, loss_scale=32, train_wall=39, gb_free=30.8, wall=47757 2023-05-01 15:49:45 - progress_bar.py[line:274] - INFO: epoch 002: 5669 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7788.3, nsentences=120, sample_size=3967.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1909.8, ups=0.25, wpb=7788.3, bsz=120, num_updates=11690, lr=2.57399e-05, gnorm=0.932, clip=20, loss_scale=32, train_wall=41, gb_free=30.4, wall=47797 2023-05-01 15:50:24 - progress_bar.py[line:274] - INFO: epoch 002: 5679 / 6042 loss=2.408, loss_v1=0, 
loss_v2=0, nll_loss=1.155, ntokens=7584, nsentences=120, sample_size=4122.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1927.6, ups=0.25, wpb=7584, bsz=120, num_updates=11700, lr=2.57347e-05, gnorm=0.918, clip=10, loss_scale=32, train_wall=39, gb_free=30.3, wall=47837 2023-05-01 15:51:04 - progress_bar.py[line:274] - INFO: epoch 002: 5689 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=7545.1, nsentences=120, sample_size=4340.4, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1879.3, ups=0.25, wpb=7545.1, bsz=120, num_updates=11710, lr=2.57294e-05, gnorm=0.916, clip=0, loss_scale=32, train_wall=40, gb_free=29.8, wall=47877 2023-05-01 15:51:45 - progress_bar.py[line:274] - INFO: epoch 002: 5699 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7743.3, nsentences=120, sample_size=4101.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1916.3, ups=0.25, wpb=7743.3, bsz=120, num_updates=11720, lr=2.57241e-05, gnorm=0.937, clip=20, loss_scale=32, train_wall=40, gb_free=30.8, wall=47917 2023-05-01 15:52:25 - progress_bar.py[line:274] - INFO: epoch 002: 5709 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7609.1, nsentences=120, sample_size=3856, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1892.8, ups=0.25, wpb=7609.1, bsz=120, num_updates=11730, lr=2.57188e-05, gnorm=0.938, clip=0, loss_scale=32, train_wall=40, gb_free=29.9, wall=47958 2023-05-01 15:53:05 - progress_bar.py[line:274] - INFO: epoch 002: 5719 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7915, nsentences=120, sample_size=3921.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1987.8, ups=0.25, wpb=7915, bsz=120, num_updates=11740, lr=2.57135e-05, gnorm=0.963, clip=40, loss_scale=32, train_wall=40, gb_free=29.8, wall=47997 2023-05-01 15:53:45 - progress_bar.py[line:274] - INFO: epoch 002: 5729 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7689.7, nsentences=120, sample_size=4098.2, sample_size_v1=0, 
sample_size_v2=0, ppl=2.3, wps=1936.3, ups=0.25, wpb=7689.7, bsz=120, num_updates=11750, lr=2.57082e-05, gnorm=0.934, clip=20, loss_scale=32, train_wall=40, gb_free=30.1, wall=48037 2023-05-01 15:54:25 - progress_bar.py[line:274] - INFO: epoch 002: 5739 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7603.3, nsentences=120, sample_size=4335.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1881.1, ups=0.25, wpb=7603.3, bsz=120, num_updates=11760, lr=2.5703e-05, gnorm=0.916, clip=0, loss_scale=32, train_wall=40, gb_free=29.4, wall=48077 2023-05-01 15:55:05 - progress_bar.py[line:274] - INFO: epoch 002: 5749 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7705.3, nsentences=120, sample_size=4205.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1912, ups=0.25, wpb=7705.3, bsz=120, num_updates=11770, lr=2.56977e-05, gnorm=0.9, clip=10, loss_scale=32, train_wall=40, gb_free=29.2, wall=48118 2023-05-01 15:55:45 - progress_bar.py[line:274] - INFO: epoch 002: 5759 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7694.5, nsentences=120, sample_size=4205.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1941.9, ups=0.25, wpb=7694.5, bsz=120, num_updates=11780, lr=2.56924e-05, gnorm=0.934, clip=0, loss_scale=32, train_wall=40, gb_free=29.9, wall=48157 2023-05-01 15:56:24 - progress_bar.py[line:274] - INFO: epoch 002: 5769 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7784.3, nsentences=120, sample_size=3810.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1983.3, ups=0.25, wpb=7784.3, bsz=120, num_updates=11790, lr=2.56871e-05, gnorm=0.961, clip=20, loss_scale=32, train_wall=39, gb_free=29.4, wall=48197 2023-05-01 15:57:03 - progress_bar.py[line:274] - INFO: epoch 002: 5779 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7863.7, nsentences=120, sample_size=3945, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2006.1, ups=0.26, wpb=7863.7, bsz=120, num_updates=11800, 
lr=2.56818e-05, gnorm=0.959, clip=20, loss_scale=32, train_wall=39, gb_free=29.9, wall=48236 2023-05-01 15:57:44 - progress_bar.py[line:274] - INFO: epoch 002: 5789 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7775.7, nsentences=120, sample_size=4230.4, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1929.3, ups=0.25, wpb=7775.7, bsz=120, num_updates=11810, lr=2.56766e-05, gnorm=0.921, clip=10, loss_scale=32, train_wall=40, gb_free=27.2, wall=48276 2023-05-01 15:58:24 - progress_bar.py[line:274] - INFO: epoch 002: 5799 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7662.2, nsentences=120, sample_size=4089, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1902.1, ups=0.25, wpb=7662.2, bsz=120, num_updates=11820, lr=2.56713e-05, gnorm=0.956, clip=30, loss_scale=32, train_wall=40, gb_free=29.6, wall=48316 2023-05-01 15:59:03 - progress_bar.py[line:274] - INFO: epoch 002: 5809 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7676.3, nsentences=120, sample_size=4129.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1969.3, ups=0.26, wpb=7676.3, bsz=120, num_updates=11830, lr=2.5666e-05, gnorm=0.948, clip=20, loss_scale=32, train_wall=39, gb_free=29.9, wall=48355 2023-05-01 15:59:44 - progress_bar.py[line:274] - INFO: epoch 002: 5819 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7821.8, nsentences=120, sample_size=4247.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1927, ups=0.25, wpb=7821.8, bsz=120, num_updates=11840, lr=2.56607e-05, gnorm=0.918, clip=0, loss_scale=32, train_wall=41, gb_free=28.5, wall=48396 2023-05-01 16:00:24 - progress_bar.py[line:274] - INFO: epoch 002: 5829 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7716.6, nsentences=120, sample_size=4051.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1917.8, ups=0.25, wpb=7716.6, bsz=120, num_updates=11850, lr=2.56554e-05, gnorm=0.928, clip=10, loss_scale=32, train_wall=40, gb_free=29, wall=48436 
2023-05-01 16:01:04 - progress_bar.py[line:274] - INFO: epoch 002: 5839 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7853.2, nsentences=120, sample_size=4122.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1959.8, ups=0.25, wpb=7853.2, bsz=120, num_updates=11860, lr=2.56501e-05, gnorm=0.92, clip=10, loss_scale=32, train_wall=40, gb_free=30.8, wall=48476 2023-05-01 16:01:44 - progress_bar.py[line:274] - INFO: epoch 002: 5849 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7548.3, nsentences=120, sample_size=4065.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1902.6, ups=0.25, wpb=7548.3, bsz=120, num_updates=11870, lr=2.56449e-05, gnorm=0.944, clip=10, loss_scale=32, train_wall=40, gb_free=28.6, wall=48516 2023-05-01 16:02:24 - progress_bar.py[line:274] - INFO: epoch 002: 5859 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7782.3, nsentences=120, sample_size=3892.3, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1946.6, ups=0.25, wpb=7782.3, bsz=120, num_updates=11880, lr=2.56396e-05, gnorm=0.964, clip=20, loss_scale=32, train_wall=40, gb_free=26.7, wall=48556 2023-05-01 16:03:03 - progress_bar.py[line:274] - INFO: epoch 002: 5869 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7414, nsentences=120, sample_size=4143.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1867.5, ups=0.25, wpb=7414, bsz=120, num_updates=11890, lr=2.56343e-05, gnorm=0.961, clip=30, loss_scale=32, train_wall=40, gb_free=29.4, wall=48596 2023-05-01 16:03:43 - progress_bar.py[line:274] - INFO: epoch 002: 5879 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7780.8, nsentences=120, sample_size=4070.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1969.6, ups=0.25, wpb=7780.8, bsz=120, num_updates=11900, lr=2.5629e-05, gnorm=0.92, clip=10, loss_scale=32, train_wall=39, gb_free=29.3, wall=48635 2023-05-01 16:04:22 - progress_bar.py[line:274] - INFO: epoch 002: 5889 / 6042 loss=2.416, 
loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7793, nsentences=120, sample_size=4068, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1987.7, ups=0.26, wpb=7793, bsz=120, num_updates=11910, lr=2.56237e-05, gnorm=0.942, clip=30, loss_scale=32, train_wall=39, gb_free=25.6, wall=48674 2023-05-01 16:05:02 - progress_bar.py[line:274] - INFO: epoch 002: 5899 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7712.1, nsentences=120, sample_size=3961.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1916.5, ups=0.25, wpb=7712.1, bsz=120, num_updates=11920, lr=2.56185e-05, gnorm=0.958, clip=30, loss_scale=32, train_wall=40, gb_free=29.2, wall=48715 2023-05-01 16:05:42 - progress_bar.py[line:274] - INFO: epoch 002: 5909 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7242.2, nsentences=120, sample_size=4200.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1836.5, ups=0.25, wpb=7242.2, bsz=120, num_updates=11930, lr=2.56132e-05, gnorm=0.959, clip=20, loss_scale=32, train_wall=39, gb_free=29.9, wall=48754 2023-05-01 16:06:22 - progress_bar.py[line:274] - INFO: epoch 002: 5919 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7724.2, nsentences=120, sample_size=4059.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1934, ups=0.25, wpb=7724.2, bsz=120, num_updates=11940, lr=2.56079e-05, gnorm=0.928, clip=0, loss_scale=32, train_wall=40, gb_free=30.6, wall=48794 2023-05-01 16:07:02 - progress_bar.py[line:274] - INFO: epoch 002: 5929 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7819.4, nsentences=120, sample_size=4288.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1934, ups=0.25, wpb=7819.4, bsz=120, num_updates=11950, lr=2.56026e-05, gnorm=0.897, clip=0, loss_scale=32, train_wall=40, gb_free=27.9, wall=48834 2023-05-01 16:07:42 - progress_bar.py[line:274] - INFO: epoch 002: 5939 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7650.5, nsentences=120, sample_size=4178.3, 
sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1929, ups=0.25, wpb=7650.5, bsz=120, num_updates=11960, lr=2.55973e-05, gnorm=0.916, clip=30, loss_scale=32, train_wall=40, gb_free=30.5, wall=48874 2023-05-01 16:08:22 - progress_bar.py[line:274] - INFO: epoch 002: 5949 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7979.2, nsentences=120, sample_size=3699.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1988.5, ups=0.25, wpb=7979.2, bsz=120, num_updates=11970, lr=2.5592e-05, gnorm=0.954, clip=20, loss_scale=32, train_wall=40, gb_free=30.3, wall=48914 2023-05-01 16:09:02 - progress_bar.py[line:274] - INFO: epoch 002: 5959 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7708.5, nsentences=120, sample_size=3989.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1923.8, ups=0.25, wpb=7708.5, bsz=120, num_updates=11980, lr=2.55868e-05, gnorm=0.929, clip=10, loss_scale=32, train_wall=40, gb_free=30.5, wall=48954 2023-05-01 16:09:42 - progress_bar.py[line:274] - INFO: epoch 002: 5969 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7391, nsentences=120, sample_size=4199.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1860, ups=0.25, wpb=7391, bsz=120, num_updates=11990, lr=2.55815e-05, gnorm=0.914, clip=0, loss_scale=32, train_wall=40, gb_free=30.8, wall=48994 2023-05-01 16:10:22 - progress_bar.py[line:274] - INFO: epoch 002: 5979 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7801.7, nsentences=120, sample_size=4276.9, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1935.7, ups=0.25, wpb=7801.7, bsz=120, num_updates=12000, lr=2.55762e-05, gnorm=0.913, clip=0, loss_scale=32, train_wall=40, gb_free=23.6, wall=49034 2023-05-01 16:10:22 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 16:10:24 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 16:10:24 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 16:10:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 16:10:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:41 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 16:10:41 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 16:10:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 16:10:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:53 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 16:10:53 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 16:10:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:10:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:10:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:11:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:11:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:11:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 16:11:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:11:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:11:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:11:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:11:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:11:04 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 16:11:04 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 16:11:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:11:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:11:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:11:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:11:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:11:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:11:08 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 16:11:08 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 16:11:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:11:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:11:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:11:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:11:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:11:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:11:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:11:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:11:13 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 16:11:13 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 16:11:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:11:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:11:13 - progress_bar.py[line:282] - INFO: epoch 002 | valid on 'valid' subset | loss 3.206 | loss_v1 0 | loss_v2 0 | nll_loss 2.039 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.11 | score 0.7451 | wps 3292.7 | wpb 3202.1 | bsz 39.4 | num_updates 12000 | best_score 0.751 2023-05-01 16:11:13 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 2 @ 12000 updates 2023-05-01 16:11:13 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_12000.pt 2023-05-01 16:11:37 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_12000.pt 2023-05-01 16:11:51 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_2_12000.pt (epoch 2 @ 12000 updates, score 0.7451) (writing took 37.79766811709851 seconds) 2023-05-01 16:12:31 - progress_bar.py[line:274] - INFO: epoch 002: 5989 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7818, nsentences=120, sample_size=3746.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=607.7, ups=0.08, wpb=7818, bsz=120, num_updates=12010, lr=2.55709e-05, gnorm=0.983, clip=40, loss_scale=32, train_wall=39, gb_free=30.7, wall=49163 2023-05-01 16:13:10 - progress_bar.py[line:274] - INFO: epoch 002: 5999 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7409.6, nsentences=120, sample_size=4025, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1897.4, ups=0.26, wpb=7409.6, bsz=120, num_updates=12020, 
lr=2.55656e-05, gnorm=0.959, clip=10, loss_scale=32, train_wall=39, gb_free=31.3, wall=49202 2023-05-01 16:13:50 - progress_bar.py[line:274] - INFO: epoch 002: 6009 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7988.9, nsentences=120, sample_size=4142.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1977.7, ups=0.25, wpb=7988.9, bsz=120, num_updates=12030, lr=2.55603e-05, gnorm=0.906, clip=10, loss_scale=32, train_wall=40, gb_free=30.5, wall=49242 2023-05-01 16:14:30 - progress_bar.py[line:274] - INFO: epoch 002: 6019 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7920.7, nsentences=120, sample_size=4272.3, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1997.3, ups=0.25, wpb=7920.7, bsz=120, num_updates=12040, lr=2.55551e-05, gnorm=0.904, clip=0, loss_scale=32, train_wall=40, gb_free=29.6, wall=49282 2023-05-01 16:15:11 - progress_bar.py[line:274] - INFO: epoch 002: 6029 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7832.8, nsentences=120, sample_size=3814.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1915.7, ups=0.24, wpb=7832.8, bsz=120, num_updates=12050, lr=2.55498e-05, gnorm=0.928, clip=10, loss_scale=32, train_wall=41, gb_free=30.7, wall=49323 2023-05-01 16:15:50 - progress_bar.py[line:274] - INFO: epoch 002: 6039 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7633.4, nsentences=120, sample_size=4099.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1920.2, ups=0.25, wpb=7633.4, bsz=120, num_updates=12060, lr=2.55445e-05, gnorm=0.938, clip=40, loss_scale=32, train_wall=40, gb_free=30, wall=49363 2023-05-01 16:16:01 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 16:16:03 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 16:16:03 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 16:16:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 16:16:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:20 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 16:16:20 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 16:16:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 16:16:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:32 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 16:16:32 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 16:16:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 16:16:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:44 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 16:16:44 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 16:16:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:48 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 16:16:48 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 16:16:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:52 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 16:16:52 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 16:16:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 16:16:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 16:16:53 - progress_bar.py[line:282] - INFO: epoch 002 | valid on 'valid' subset | loss 3.202 | loss_v1 0 | loss_v2 0 | nll_loss 2.035 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.1 | score 0.7407 | wps 3290.9 | wpb 3202.1 | bsz 39.4 | num_updates 12063 | best_score 0.751 2023-05-01 16:16:53 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 2 @ 12063 updates 2023-05-01 16:16:53 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-01 16:17:21 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-01 16:17:21 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt (epoch 2 @ 12063 updates, score 0.7407) (writing took 28.077118862885982 seconds) 2023-05-01 16:17:21 - train.py[line:332] - INFO: end of epoch 2 (average epoch stats below) 2023-05-01 16:17:21 - progress_bar.py[line:282] - INFO: epoch 002 | loss 2.432 | loss_v1 0 | loss_v2 0 | nll_loss 1.179 | ntokens 7724.21 | nsentences 119.992 | sample_size 4035.43 | sample_size_v1 0 | sample_size_v2 0 | ppl 2.26 | wps 1885.7 | ups 0.24 | wpb 7724.2 | bsz 120 | num_updates 12063 | lr 2.55429e-05 | gnorm 0.927 | clip 11.8 | loss_scale 32 | train_wall 24026 | gb_free 30.4 | wall 49453 2023-05-01 16:17:21 - trainer.py[line:639] - INFO: loading train data for epoch 3 2023-05-01 16:17:21 - dialog_dataset.py[line:647] - INFO: loading invig-train from 
/mnt/bn/hri-lq/datasets/hf-cache/invig 2023-05-01 16:17:21 - dialog_dataset.py[line:647] - INFO: loading guesswhat-train from /mnt/bn/hri-lq/datasets/hf-cache/guesswhat 2023-05-01 16:17:23 - dialog_dataset.py[line:647] - INFO: loading visdial-train from /mnt/bn/hri-lq/datasets/hf-cache/visdial 2023-05-01 16:17:24 - dialog_dataset.py[line:647] - INFO: loading refcoco-train from /mnt/bn/hri-lq/datasets/hf-cache/refcoco 2023-05-01 16:17:25 - dialog_dataset.py[line:647] - INFO: loading refcocog-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocog 2023-05-01 16:17:25 - dialog_dataset.py[line:647] - INFO: loading refcocoplus-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocoplus 2023-05-01 16:17:25 - dialog_dataset.py[line:647] - INFO: loading cc_sbu_align-train from /mnt/bn/hri-lq/datasets/hf-cache/cc_sbu_align 2023-05-01 16:17:25 - dialog_dataset.py[line:647] - INFO: loading llava_instruct_150k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_instruct_150k 2023-05-01 16:17:26 - dialog_dataset.py[line:647] - INFO: loading llava_conversation_58k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_conversation_58k 2023-05-01 16:17:26 - dialog_dataset.py[line:647] - INFO: loading llava_complex_reasoning_77k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_complex_reasoning_77k 2023-05-01 16:17:27 - dialog_dataset.py[line:647] - INFO: loading llava_detail_23k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_detail_23k 2023-05-01 16:17:27 - dialog_dataset.py[line:647] - INFO: loading openimages-train from /mnt/bn/hri-lq/datasets/hf-cache/openimages_v1.2 2023-05-01 16:17:27 - dialog_dataset.py[line:671] - INFO: load train data: 18 (90624/724992 samples) dataset(s) 2023-05-01 16:17:27 - dialog_dataset.py[line:672] - INFO: Tasks: invig_question(17652), invig_answer(17652), invig_grounding(17652), guesswhat_question(68653), guesswhat_answer(68653), guesswhat_grounding(68653), visdial_question(103447), visdial_answer(103447), visdial_caption(20689), refcoco_grounding(9523), 
refcoco_grounding(9920), refcoco_grounding(9494), cc_sbu_align_caption(3439), llava_instruct_150k(90372), llava_conversation_58k(46965), llava_complex_reasoning_77k(44353), llava_detail_23k(12471), openimages_detection(11957) 2023-05-01 16:17:28 - trainer.py[line:703] - INFO: begin training epoch 3 2023-05-01 16:17:28 - train.py[line:305] - INFO: Start iterating over samples 2023-05-01 16:17:55 - progress_bar.py[line:274] - INFO: epoch 003: 7 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7451.1, nsentences=116, sample_size=3853.7, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=596.3, ups=0.08, wpb=7451.1, bsz=116, num_updates=12070, lr=2.55392e-05, gnorm=0.945, clip=20, loss_scale=32, train_wall=38, gb_free=30.2, wall=49488 2023-05-01 16:18:35 - progress_bar.py[line:274] - INFO: epoch 003: 17 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7720.8, nsentences=120, sample_size=4270.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1957, ups=0.25, wpb=7720.8, bsz=120, num_updates=12080, lr=2.55339e-05, gnorm=0.91, clip=0, loss_scale=32, train_wall=39, gb_free=29.4, wall=49527 2023-05-01 16:19:14 - progress_bar.py[line:274] - INFO: epoch 003: 27 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7643, nsentences=120, sample_size=4280.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1922.5, ups=0.25, wpb=7643, bsz=120, num_updates=12090, lr=2.55287e-05, gnorm=0.919, clip=0, loss_scale=32, train_wall=40, gb_free=28.8, wall=49567 2023-05-01 16:19:54 - progress_bar.py[line:274] - INFO: epoch 003: 37 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7685.9, nsentences=120, sample_size=4156.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1955.1, ups=0.25, wpb=7685.9, bsz=120, num_updates=12100, lr=2.55234e-05, gnorm=0.926, clip=0, loss_scale=32, train_wall=39, gb_free=31.3, wall=49606 2023-05-01 16:20:34 - progress_bar.py[line:274] - INFO: epoch 003: 47 / 6042 loss=2.39, loss_v1=0, loss_v2=0, 
nll_loss=1.13, ntokens=7704.9, nsentences=120, sample_size=3971.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1933.4, ups=0.25, wpb=7704.9, bsz=120, num_updates=12110, lr=2.55181e-05, gnorm=0.945, clip=20, loss_scale=32, train_wall=40, gb_free=28.4, wall=49646 2023-05-01 16:21:13 - progress_bar.py[line:274] - INFO: epoch 003: 57 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7462.4, nsentences=120, sample_size=3753.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1903.7, ups=0.26, wpb=7462.4, bsz=120, num_updates=12120, lr=2.55128e-05, gnorm=0.97, clip=20, loss_scale=32, train_wall=39, gb_free=29.4, wall=49685 2023-05-01 16:21:53 - progress_bar.py[line:274] - INFO: epoch 003: 67 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=8009.3, nsentences=120, sample_size=4112, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1997.4, ups=0.25, wpb=8009.3, bsz=120, num_updates=12130, lr=2.55075e-05, gnorm=0.925, clip=0, loss_scale=32, train_wall=40, gb_free=28.8, wall=49725 2023-05-01 16:22:33 - progress_bar.py[line:274] - INFO: epoch 003: 77 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7698.2, nsentences=120, sample_size=3764.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1933.3, ups=0.25, wpb=7698.2, bsz=120, num_updates=12140, lr=2.55022e-05, gnorm=0.98, clip=40, loss_scale=64, train_wall=40, gb_free=23.6, wall=49765 2023-05-01 16:23:12 - progress_bar.py[line:274] - INFO: epoch 003: 87 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7673.9, nsentences=120, sample_size=3977.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1952.6, ups=0.25, wpb=7673.9, bsz=120, num_updates=12150, lr=2.5497e-05, gnorm=0.97, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=49805 2023-05-01 16:23:51 - progress_bar.py[line:274] - INFO: epoch 003: 97 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7754, nsentences=120, sample_size=3831.7, sample_size_v1=0, sample_size_v2=0, 
ppl=2.22, wps=1971, ups=0.25, wpb=7754, bsz=120, num_updates=12160, lr=2.54917e-05, gnorm=0.939, clip=10, loss_scale=64, train_wall=39, gb_free=28.8, wall=49844 2023-05-01 16:24:31 - progress_bar.py[line:274] - INFO: epoch 003: 107 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7597.6, nsentences=120, sample_size=3841.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1938.6, ups=0.26, wpb=7597.6, bsz=120, num_updates=12170, lr=2.54864e-05, gnorm=0.971, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=49883 2023-05-01 16:25:11 - progress_bar.py[line:274] - INFO: epoch 003: 117 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7865.3, nsentences=120, sample_size=3820.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1952.1, ups=0.25, wpb=7865.3, bsz=120, num_updates=12180, lr=2.54811e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=49923 2023-05-01 16:25:50 - progress_bar.py[line:274] - INFO: epoch 003: 127 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7953.4, nsentences=120, sample_size=4178.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2027.9, ups=0.25, wpb=7953.4, bsz=120, num_updates=12190, lr=2.54758e-05, gnorm=0.918, clip=20, loss_scale=64, train_wall=39, gb_free=31, wall=49963 2023-05-01 16:26:30 - progress_bar.py[line:274] - INFO: epoch 003: 137 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7609.6, nsentences=120, sample_size=4115.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1924, ups=0.25, wpb=7609.6, bsz=120, num_updates=12200, lr=2.54706e-05, gnorm=0.966, clip=30, loss_scale=64, train_wall=39, gb_free=29.7, wall=50002 2023-05-01 16:27:09 - progress_bar.py[line:274] - INFO: epoch 003: 147 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7967.2, nsentences=120, sample_size=3996.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2005.2, ups=0.25, wpb=7967.2, bsz=120, num_updates=12210, lr=2.54653e-05, gnorm=0.92, 
clip=0, loss_scale=64, train_wall=40, gb_free=30, wall=50042 2023-05-01 16:27:50 - progress_bar.py[line:274] - INFO: epoch 003: 157 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7386.9, nsentences=120, sample_size=4064.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1835.8, ups=0.25, wpb=7386.9, bsz=120, num_updates=12220, lr=2.546e-05, gnorm=0.99, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=50082 2023-05-01 16:28:29 - progress_bar.py[line:274] - INFO: epoch 003: 167 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=7817.9, nsentences=120, sample_size=3964.9, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1991.9, ups=0.25, wpb=7817.9, bsz=120, num_updates=12230, lr=2.54547e-05, gnorm=0.999, clip=70, loss_scale=64, train_wall=39, gb_free=29.7, wall=50121 2023-05-01 16:29:09 - progress_bar.py[line:274] - INFO: epoch 003: 177 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=7605.7, nsentences=120, sample_size=4389.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1911.4, ups=0.25, wpb=7605.7, bsz=120, num_updates=12240, lr=2.54494e-05, gnorm=0.925, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=50161 2023-05-01 16:29:49 - progress_bar.py[line:274] - INFO: epoch 003: 187 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7892, nsentences=120, sample_size=4133, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1958.8, ups=0.25, wpb=7892, bsz=120, num_updates=12250, lr=2.54441e-05, gnorm=0.906, clip=10, loss_scale=64, train_wall=40, gb_free=28.7, wall=50201 2023-05-01 16:30:29 - progress_bar.py[line:274] - INFO: epoch 003: 197 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7601.8, nsentences=120, sample_size=4013, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1913.2, ups=0.25, wpb=7601.8, bsz=120, num_updates=12260, lr=2.54389e-05, gnorm=0.93, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=50241 2023-05-01 16:31:08 - progress_bar.py[line:274] 
- INFO: epoch 003: 207 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=8074.6, nsentences=120, sample_size=3918, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=2051.1, ups=0.25, wpb=8074.6, bsz=120, num_updates=12270, lr=2.54336e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=39, gb_free=29.2, wall=50281 2023-05-01 16:31:48 - progress_bar.py[line:274] - INFO: epoch 003: 217 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=7758.7, nsentences=120, sample_size=4012.9, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1961.4, ups=0.25, wpb=7758.7, bsz=120, num_updates=12280, lr=2.54283e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=39, gb_free=27.8, wall=50320 2023-05-01 16:32:28 - progress_bar.py[line:274] - INFO: epoch 003: 227 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=7689.3, nsentences=120, sample_size=4163.1, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1920.2, ups=0.25, wpb=7689.3, bsz=120, num_updates=12290, lr=2.5423e-05, gnorm=0.929, clip=20, loss_scale=64, train_wall=40, gb_free=27.1, wall=50360 2023-05-01 16:33:08 - progress_bar.py[line:274] - INFO: epoch 003: 237 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7605.3, nsentences=120, sample_size=4140.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1900.2, ups=0.25, wpb=7605.3, bsz=120, num_updates=12300, lr=2.54177e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=40, gb_free=27.7, wall=50400 2023-05-01 16:33:48 - progress_bar.py[line:274] - INFO: epoch 003: 247 / 6042 loss=2.495, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=8034.4, nsentences=120, sample_size=3887, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=2017.8, ups=0.25, wpb=8034.4, bsz=120, num_updates=12310, lr=2.54124e-05, gnorm=0.937, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=50440 2023-05-01 16:34:27 - progress_bar.py[line:274] - INFO: epoch 003: 257 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.227, ntokens=7583.1, 
nsentences=120, sample_size=4069.3, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1918, ups=0.25, wpb=7583.1, bsz=120, num_updates=12320, lr=2.54072e-05, gnorm=0.927, clip=0, loss_scale=64, train_wall=39, gb_free=30.8, wall=50480 2023-05-01 16:35:07 - progress_bar.py[line:274] - INFO: epoch 003: 267 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7661.9, nsentences=120, sample_size=3847, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1925.1, ups=0.25, wpb=7661.9, bsz=120, num_updates=12330, lr=2.54019e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=50519 2023-05-01 16:35:48 - progress_bar.py[line:274] - INFO: epoch 003: 277 / 6042 loss=2.501, loss_v1=0, loss_v2=0, nll_loss=1.261, ntokens=7986.1, nsentences=120, sample_size=4150.8, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1961, ups=0.25, wpb=7986.1, bsz=120, num_updates=12340, lr=2.53966e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=41, gb_free=30.9, wall=50560 2023-05-01 16:36:29 - progress_bar.py[line:274] - INFO: epoch 003: 287 / 6042 loss=2.507, loss_v1=0, loss_v2=0, nll_loss=1.264, ntokens=7692.2, nsentences=120, sample_size=4117.6, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1873.6, ups=0.24, wpb=7692.2, bsz=120, num_updates=12350, lr=2.53913e-05, gnorm=0.946, clip=30, loss_scale=64, train_wall=41, gb_free=30.5, wall=50601 2023-05-01 16:37:08 - progress_bar.py[line:274] - INFO: epoch 003: 297 / 6042 loss=2.491, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7951.5, nsentences=120, sample_size=4150, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=2022.5, ups=0.25, wpb=7951.5, bsz=120, num_updates=12360, lr=2.5386e-05, gnorm=0.983, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=50640 2023-05-01 16:37:48 - progress_bar.py[line:274] - INFO: epoch 003: 307 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7435.5, nsentences=120, sample_size=4006.2, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1867.9, ups=0.25, 
wpb=7435.5, bsz=120, num_updates=12370, lr=2.53808e-05, gnorm=0.971, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=50680 2023-05-01 16:38:28 - progress_bar.py[line:274] - INFO: epoch 003: 317 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7519.4, nsentences=120, sample_size=4313.7, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1891, ups=0.25, wpb=7519.4, bsz=120, num_updates=12380, lr=2.53755e-05, gnorm=0.894, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=50720 2023-05-01 16:39:07 - progress_bar.py[line:274] - INFO: epoch 003: 327 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7762.5, nsentences=120, sample_size=3783.6, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1958.2, ups=0.25, wpb=7762.5, bsz=120, num_updates=12390, lr=2.53702e-05, gnorm=0.97, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=50760 2023-05-01 16:39:47 - progress_bar.py[line:274] - INFO: epoch 003: 337 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7795.8, nsentences=120, sample_size=4199.8, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1956.2, ups=0.25, wpb=7795.8, bsz=120, num_updates=12400, lr=2.53649e-05, gnorm=0.92, clip=0, loss_scale=64, train_wall=40, gb_free=27.6, wall=50799 2023-05-01 16:40:27 - progress_bar.py[line:274] - INFO: epoch 003: 347 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.231, ntokens=7951.7, nsentences=120, sample_size=3956.9, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=2003.2, ups=0.25, wpb=7951.7, bsz=120, num_updates=12410, lr=2.53596e-05, gnorm=0.926, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=50839 2023-05-01 16:41:07 - progress_bar.py[line:274] - INFO: epoch 003: 357 / 6042 loss=2.513, loss_v1=0, loss_v2=0, nll_loss=1.273, ntokens=7903.6, nsentences=120, sample_size=4189.8, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1984, ups=0.25, wpb=7903.6, bsz=120, num_updates=12420, lr=2.53543e-05, gnorm=0.902, clip=0, loss_scale=64, 
train_wall=40, gb_free=27.2, wall=50879 2023-05-01 16:41:46 - progress_bar.py[line:274] - INFO: epoch 003: 367 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.249, ntokens=7596.3, nsentences=120, sample_size=3996.3, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1926.8, ups=0.25, wpb=7596.3, bsz=120, num_updates=12430, lr=2.53491e-05, gnorm=0.956, clip=20, loss_scale=64, train_wall=39, gb_free=30.8, wall=50918 2023-05-01 16:42:26 - progress_bar.py[line:274] - INFO: epoch 003: 377 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7718.3, nsentences=120, sample_size=3984.8, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1932.8, ups=0.25, wpb=7718.3, bsz=120, num_updates=12440, lr=2.53438e-05, gnorm=0.951, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=50958 2023-05-01 16:43:06 - progress_bar.py[line:274] - INFO: epoch 003: 387 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7715.3, nsentences=120, sample_size=4454.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1915.3, ups=0.25, wpb=7715.3, bsz=120, num_updates=12450, lr=2.53385e-05, gnorm=0.914, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=50999 2023-05-01 16:43:46 - progress_bar.py[line:274] - INFO: epoch 003: 397 / 6042 loss=2.528, loss_v1=0, loss_v2=0, nll_loss=1.296, ntokens=7649.7, nsentences=120, sample_size=4125.7, sample_size_v1=0, sample_size_v2=0, ppl=2.46, wps=1934.1, ups=0.25, wpb=7649.7, bsz=120, num_updates=12460, lr=2.53332e-05, gnorm=0.922, clip=10, loss_scale=64, train_wall=39, gb_free=29.3, wall=51038 2023-05-01 16:44:26 - progress_bar.py[line:274] - INFO: epoch 003: 407 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7877.8, nsentences=120, sample_size=4305.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1935.7, ups=0.25, wpb=7877.8, bsz=120, num_updates=12470, lr=2.53279e-05, gnorm=0.946, clip=30, loss_scale=64, train_wall=41, gb_free=30.3, wall=51079 2023-05-01 16:45:06 - progress_bar.py[line:274] - INFO: 
epoch 003: 417 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7650.8, nsentences=120, sample_size=4096.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1946.6, ups=0.25, wpb=7650.8, bsz=120, num_updates=12480, lr=2.53227e-05, gnorm=0.923, clip=20, loss_scale=64, train_wall=39, gb_free=30, wall=51118 2023-05-01 16:45:46 - progress_bar.py[line:274] - INFO: epoch 003: 427 / 6042 loss=2.515, loss_v1=0, loss_v2=0, nll_loss=1.275, ntokens=7748, nsentences=120, sample_size=4478.3, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1923.1, ups=0.25, wpb=7748, bsz=120, num_updates=12490, lr=2.53174e-05, gnorm=0.89, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=51159 2023-05-01 16:46:26 - progress_bar.py[line:274] - INFO: epoch 003: 437 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7747.4, nsentences=120, sample_size=4002.4, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1944.6, ups=0.25, wpb=7747.4, bsz=120, num_updates=12500, lr=2.53121e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=40, gb_free=30.7, wall=51198 2023-05-01 16:47:06 - progress_bar.py[line:274] - INFO: epoch 003: 447 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=8301.8, nsentences=120, sample_size=4325.8, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=2063.1, ups=0.25, wpb=8301.8, bsz=120, num_updates=12510, lr=2.53068e-05, gnorm=0.878, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=51239 2023-05-01 16:47:46 - progress_bar.py[line:274] - INFO: epoch 003: 457 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7666.4, nsentences=120, sample_size=3934.7, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1945.6, ups=0.25, wpb=7666.4, bsz=120, num_updates=12520, lr=2.53015e-05, gnorm=0.956, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=51278 2023-05-01 16:48:25 - progress_bar.py[line:274] - INFO: epoch 003: 467 / 6042 loss=2.504, loss_v1=0, loss_v2=0, nll_loss=1.267, ntokens=7630.1, nsentences=120, 
sample_size=3469.3, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1928.8, ups=0.25, wpb=7630.1, bsz=120, num_updates=12530, lr=2.52962e-05, gnorm=1, clip=40, loss_scale=64, train_wall=39, gb_free=30.8, wall=51318 2023-05-01 16:49:05 - progress_bar.py[line:274] - INFO: epoch 003: 477 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7621.2, nsentences=120, sample_size=4000.7, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1913.5, ups=0.25, wpb=7621.2, bsz=120, num_updates=12540, lr=2.5291e-05, gnorm=0.991, clip=40, loss_scale=64, train_wall=40, gb_free=28.4, wall=51357 2023-05-01 16:49:44 - progress_bar.py[line:274] - INFO: epoch 003: 487 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7628.6, nsentences=120, sample_size=3939.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1932.8, ups=0.25, wpb=7628.6, bsz=120, num_updates=12550, lr=2.52857e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=51397 2023-05-01 16:50:24 - progress_bar.py[line:274] - INFO: epoch 003: 497 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7436, nsentences=120, sample_size=3827.3, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1855.7, ups=0.25, wpb=7436, bsz=120, num_updates=12560, lr=2.52804e-05, gnorm=0.982, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=51437 2023-05-01 16:51:04 - progress_bar.py[line:274] - INFO: epoch 003: 507 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.253, ntokens=7959.5, nsentences=120, sample_size=4210.3, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=2000.5, ups=0.25, wpb=7959.5, bsz=120, num_updates=12570, lr=2.52751e-05, gnorm=0.929, clip=30, loss_scale=64, train_wall=40, gb_free=30.8, wall=51477 2023-05-01 16:51:43 - progress_bar.py[line:274] - INFO: epoch 003: 517 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.221, ntokens=7607.6, nsentences=120, sample_size=3927.4, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1938.8, ups=0.25, wpb=7607.6, 
bsz=120, num_updates=12580, lr=2.52698e-05, gnorm=0.995, clip=40, loss_scale=64, train_wall=39, gb_free=29.4, wall=51516 2023-05-01 16:52:24 - progress_bar.py[line:274] - INFO: epoch 003: 527 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7665.9, nsentences=120, sample_size=4493.6, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1915.5, ups=0.25, wpb=7665.9, bsz=120, num_updates=12590, lr=2.52645e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=40, gb_free=29, wall=51556 2023-05-01 16:53:04 - progress_bar.py[line:274] - INFO: epoch 003: 537 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7740.8, nsentences=120, sample_size=4231.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1919, ups=0.25, wpb=7740.8, bsz=120, num_updates=12600, lr=2.52593e-05, gnorm=0.896, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=51596 2023-05-01 16:53:43 - progress_bar.py[line:274] - INFO: epoch 003: 547 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.231, ntokens=7675.8, nsentences=120, sample_size=3764.5, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1943.1, ups=0.25, wpb=7675.8, bsz=120, num_updates=12610, lr=2.5254e-05, gnorm=0.974, clip=50, loss_scale=64, train_wall=39, gb_free=30.9, wall=51636 2023-05-01 16:54:23 - progress_bar.py[line:274] - INFO: epoch 003: 557 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7687.5, nsentences=120, sample_size=4145.2, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1928.4, ups=0.25, wpb=7687.5, bsz=120, num_updates=12620, lr=2.52487e-05, gnorm=0.917, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=51676 2023-05-01 16:55:03 - progress_bar.py[line:274] - INFO: epoch 003: 567 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=7706.8, nsentences=120, sample_size=3941.7, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1952.4, ups=0.25, wpb=7706.8, bsz=120, num_updates=12630, lr=2.52434e-05, gnorm=0.947, clip=10, loss_scale=64, train_wall=39, 
gb_free=29.6, wall=51715 2023-05-01 16:55:42 - progress_bar.py[line:274] - INFO: epoch 003: 577 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.217, ntokens=7753, nsentences=120, sample_size=4098.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1966.1, ups=0.25, wpb=7753, bsz=120, num_updates=12640, lr=2.52381e-05, gnorm=0.933, clip=30, loss_scale=64, train_wall=39, gb_free=28.6, wall=51755 2023-05-01 16:56:22 - progress_bar.py[line:274] - INFO: epoch 003: 587 / 6042 loss=2.504, loss_v1=0, loss_v2=0, nll_loss=1.263, ntokens=7898.7, nsentences=120, sample_size=3996.7, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1983.9, ups=0.25, wpb=7898.7, bsz=120, num_updates=12650, lr=2.52329e-05, gnorm=0.944, clip=20, loss_scale=128, train_wall=40, gb_free=30.7, wall=51794 2023-05-01 16:56:50 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 16:57:05 - progress_bar.py[line:274] - INFO: epoch 003: 598 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7865.7, nsentences=120, sample_size=3900.4, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1808.4, ups=0.23, wpb=7865.7, bsz=120, num_updates=12660, lr=2.52276e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=43, gb_free=29.5, wall=51838 2023-05-01 16:57:45 - progress_bar.py[line:274] - INFO: epoch 003: 608 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7760.7, nsentences=120, sample_size=3932, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1939.7, ups=0.25, wpb=7760.7, bsz=120, num_updates=12670, lr=2.52223e-05, gnorm=0.904, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=51878 2023-05-01 16:58:25 - progress_bar.py[line:274] - INFO: epoch 003: 618 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7395.3, nsentences=120, sample_size=4301.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1873.1, ups=0.25, wpb=7395.3, bsz=120, num_updates=12680, lr=2.5217e-05, gnorm=0.931, clip=10, 
loss_scale=64, train_wall=39, gb_free=29.4, wall=51917 2023-05-01 16:59:05 - progress_bar.py[line:274] - INFO: epoch 003: 628 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7951.5, nsentences=120, sample_size=4009.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1999, ups=0.25, wpb=7951.5, bsz=120, num_updates=12690, lr=2.52117e-05, gnorm=0.921, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=51957 2023-05-01 16:59:44 - progress_bar.py[line:274] - INFO: epoch 003: 638 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7499.5, nsentences=120, sample_size=4019.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1918.8, ups=0.26, wpb=7499.5, bsz=120, num_updates=12700, lr=2.52064e-05, gnorm=0.932, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=51996 2023-05-01 17:00:25 - progress_bar.py[line:274] - INFO: epoch 003: 648 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7653.9, nsentences=120, sample_size=3861.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1881.1, ups=0.25, wpb=7653.9, bsz=120, num_updates=12710, lr=2.52012e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=41, gb_free=29.9, wall=52037 2023-05-01 17:01:05 - progress_bar.py[line:274] - INFO: epoch 003: 658 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7681.2, nsentences=120, sample_size=4020.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1915.4, ups=0.25, wpb=7681.2, bsz=120, num_updates=12720, lr=2.51959e-05, gnorm=0.927, clip=0, loss_scale=64, train_wall=40, gb_free=30.6, wall=52077 2023-05-01 17:01:45 - progress_bar.py[line:274] - INFO: epoch 003: 668 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7658.8, nsentences=120, sample_size=4409.7, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1904.8, ups=0.25, wpb=7658.8, bsz=120, num_updates=12730, lr=2.51906e-05, gnorm=0.948, clip=10, loss_scale=64, train_wall=40, gb_free=27.5, wall=52117 2023-05-01 17:02:25 - 
progress_bar.py[line:274] - INFO: epoch 003: 678 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7552.4, nsentences=120, sample_size=3814.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1890.8, ups=0.25, wpb=7552.4, bsz=120, num_updates=12740, lr=2.51853e-05, gnorm=0.994, clip=40, loss_scale=64, train_wall=40, gb_free=28.3, wall=52157 2023-05-01 17:03:04 - progress_bar.py[line:274] - INFO: epoch 003: 688 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7455.5, nsentences=120, sample_size=4145.2, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1916.4, ups=0.26, wpb=7455.5, bsz=120, num_updates=12750, lr=2.518e-05, gnorm=0.917, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=52196 2023-05-01 17:03:42 - progress_bar.py[line:274] - INFO: epoch 003: 698 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.231, ntokens=7744, nsentences=120, sample_size=4165.7, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1997, ups=0.26, wpb=7744, bsz=120, num_updates=12760, lr=2.51748e-05, gnorm=0.923, clip=20, loss_scale=64, train_wall=39, gb_free=30.6, wall=52235 2023-05-01 17:04:22 - progress_bar.py[line:274] - INFO: epoch 003: 708 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=8007.6, nsentences=120, sample_size=3766.6, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=2015.2, ups=0.25, wpb=8007.6, bsz=120, num_updates=12770, lr=2.51695e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=28, wall=52275 2023-05-01 17:05:03 - progress_bar.py[line:274] - INFO: epoch 003: 718 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7672.2, nsentences=120, sample_size=4364, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1890, ups=0.25, wpb=7672.2, bsz=120, num_updates=12780, lr=2.51642e-05, gnorm=0.905, clip=0, loss_scale=64, train_wall=41, gb_free=28.4, wall=52315 2023-05-01 17:05:42 - progress_bar.py[line:274] - INFO: epoch 003: 728 / 6042 loss=2.505, loss_v1=0, loss_v2=0, nll_loss=1.269, 
ntokens=7557.3, nsentences=120, sample_size=3913.1, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1912.8, ups=0.25, wpb=7557.3, bsz=120, num_updates=12790, lr=2.51589e-05, gnorm=0.947, clip=30, loss_scale=64, train_wall=39, gb_free=29.1, wall=52355 2023-05-01 17:06:22 - progress_bar.py[line:274] - INFO: epoch 003: 738 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7667.2, nsentences=120, sample_size=3944, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1913.9, ups=0.25, wpb=7667.2, bsz=120, num_updates=12800, lr=2.51536e-05, gnorm=0.96, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=52395 2023-05-01 17:07:02 - progress_bar.py[line:274] - INFO: epoch 003: 748 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7551.8, nsentences=120, sample_size=4377.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1925.8, ups=0.26, wpb=7551.8, bsz=120, num_updates=12810, lr=2.51483e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=39, gb_free=30.4, wall=52434 2023-05-01 17:07:41 - progress_bar.py[line:274] - INFO: epoch 003: 758 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7466.4, nsentences=120, sample_size=4015.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1879.7, ups=0.25, wpb=7466.4, bsz=120, num_updates=12820, lr=2.51431e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=52474 2023-05-01 17:08:21 - progress_bar.py[line:274] - INFO: epoch 003: 768 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7737.9, nsentences=120, sample_size=3912.7, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1952.5, ups=0.25, wpb=7737.9, bsz=120, num_updates=12830, lr=2.51378e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=52513 2023-05-01 17:09:01 - progress_bar.py[line:274] - INFO: epoch 003: 778 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7689.1, nsentences=120, sample_size=4058.1, sample_size_v1=0, sample_size_v2=0, ppl=2.36, 
wps=1921.6, ups=0.25, wpb=7689.1, bsz=120, num_updates=12840, lr=2.51325e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=28.7, wall=52553 2023-05-01 17:09:41 - progress_bar.py[line:274] - INFO: epoch 003: 788 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7625.2, nsentences=120, sample_size=4275.5, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1922.2, ups=0.25, wpb=7625.2, bsz=120, num_updates=12850, lr=2.51272e-05, gnorm=0.918, clip=10, loss_scale=64, train_wall=40, gb_free=29, wall=52593 2023-05-01 17:10:20 - progress_bar.py[line:274] - INFO: epoch 003: 798 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=7553.6, nsentences=120, sample_size=4330.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1904.9, ups=0.25, wpb=7553.6, bsz=120, num_updates=12860, lr=2.51219e-05, gnorm=0.911, clip=10, loss_scale=64, train_wall=40, gb_free=30.5, wall=52633 2023-05-01 17:10:59 - progress_bar.py[line:274] - INFO: epoch 003: 808 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7874.9, nsentences=120, sample_size=3960.3, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=2014, ups=0.26, wpb=7874.9, bsz=120, num_updates=12870, lr=2.51166e-05, gnorm=0.951, clip=30, loss_scale=64, train_wall=39, gb_free=29, wall=52672 2023-05-01 17:11:39 - progress_bar.py[line:274] - INFO: epoch 003: 818 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7655.8, nsentences=120, sample_size=3960.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1938.8, ups=0.25, wpb=7655.8, bsz=120, num_updates=12880, lr=2.51114e-05, gnorm=0.945, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=52711 2023-05-01 17:12:19 - progress_bar.py[line:274] - INFO: epoch 003: 828 / 6042 loss=2.494, loss_v1=0, loss_v2=0, nll_loss=1.258, ntokens=7641.9, nsentences=120, sample_size=3919.2, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1903, ups=0.25, wpb=7641.9, bsz=120, num_updates=12890, lr=2.51061e-05, gnorm=0.933, clip=10, 
loss_scale=64, train_wall=40, gb_free=30.6, wall=52751 2023-05-01 17:12:58 - progress_bar.py[line:274] - INFO: epoch 003: 838 / 6042 loss=2.491, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=7745.2, nsentences=120, sample_size=3865.4, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1963.7, ups=0.25, wpb=7745.2, bsz=120, num_updates=12900, lr=2.51008e-05, gnorm=0.956, clip=30, loss_scale=64, train_wall=39, gb_free=29.2, wall=52791 2023-05-01 17:13:38 - progress_bar.py[line:274] - INFO: epoch 003: 848 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=7532, nsentences=120, sample_size=4338.2, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1883.2, ups=0.25, wpb=7532, bsz=120, num_updates=12910, lr=2.50955e-05, gnorm=0.935, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=52831 2023-05-01 17:14:18 - progress_bar.py[line:274] - INFO: epoch 003: 858 / 6042 loss=2.512, loss_v1=0, loss_v2=0, nll_loss=1.276, ntokens=7734.1, nsentences=120, sample_size=3951.5, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1958.9, ups=0.25, wpb=7734.1, bsz=120, num_updates=12920, lr=2.50902e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=39, gb_free=28.1, wall=52870 2023-05-01 17:14:57 - progress_bar.py[line:274] - INFO: epoch 003: 868 / 6042 loss=2.47, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7709.8, nsentences=120, sample_size=4083.2, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1983.2, ups=0.26, wpb=7709.8, bsz=120, num_updates=12930, lr=2.5085e-05, gnorm=0.951, clip=30, loss_scale=64, train_wall=39, gb_free=30.8, wall=52909 2023-05-01 17:15:37 - progress_bar.py[line:274] - INFO: epoch 003: 878 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7633.6, nsentences=120, sample_size=3825.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1922.5, ups=0.25, wpb=7633.6, bsz=120, num_updates=12940, lr=2.50797e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=52949 2023-05-01 17:16:17 - 
progress_bar.py[line:274] - INFO: epoch 003: 888 / 6042 loss=2.522, loss_v1=0, loss_v2=0, nll_loss=1.284, ntokens=7792.8, nsentences=120, sample_size=4103.3, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=1925.1, ups=0.25, wpb=7792.8, bsz=120, num_updates=12950, lr=2.50744e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=52989 2023-05-01 17:16:56 - progress_bar.py[line:274] - INFO: epoch 003: 898 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7626.6, nsentences=120, sample_size=4118.8, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1937.8, ups=0.25, wpb=7626.6, bsz=120, num_updates=12960, lr=2.50691e-05, gnorm=0.931, clip=0, loss_scale=64, train_wall=39, gb_free=30.5, wall=53029 2023-05-01 17:17:36 - progress_bar.py[line:274] - INFO: epoch 003: 908 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=7594.3, nsentences=120, sample_size=4007.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1908.1, ups=0.25, wpb=7594.3, bsz=120, num_updates=12970, lr=2.50638e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=53069 2023-05-01 17:18:15 - progress_bar.py[line:274] - INFO: epoch 003: 918 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7400.2, nsentences=120, sample_size=4183.4, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1886.5, ups=0.25, wpb=7400.2, bsz=120, num_updates=12980, lr=2.50585e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=53108 2023-05-01 17:18:54 - progress_bar.py[line:274] - INFO: epoch 003: 928 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.221, ntokens=7595.1, nsentences=120, sample_size=4154.5, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1948.9, ups=0.26, wpb=7595.1, bsz=120, num_updates=12990, lr=2.50533e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=53147 2023-05-01 17:19:35 - progress_bar.py[line:274] - INFO: epoch 003: 938 / 6042 loss=2.461, loss_v1=0, loss_v2=0, 
nll_loss=1.211, ntokens=7608.8, nsentences=120, sample_size=4181.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1891, ups=0.25, wpb=7608.8, bsz=120, num_updates=13000, lr=2.5048e-05, gnorm=0.926, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=53187 2023-05-01 17:19:35 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 17:19:37 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 17:19:37 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 17:19:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:47 - task_invig.py[line:181] - 
INFO: example hyp(gen): region: 2023-05-01 17:19:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:53 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 17:19:53 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 17:19:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:19:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:19:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:05 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 17:20:05 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 17:20:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:17 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 17:20:17 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 17:20:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:21 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 17:20:21 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 17:20:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:26 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 17:20:26 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 17:20:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 17:20:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 17:20:26 - progress_bar.py[line:282] - INFO: epoch 003 | valid on 'valid' subset | loss 3.208 | loss_v1 0 | loss_v2 0 | nll_loss 2.04 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.11 | score 0.7461 | wps 3295.3 | wpb 3202.1 | bsz 39.4 | num_updates 13000 | best_score 0.751 2023-05-01 17:20:26 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 3 @ 13000 updates 2023-05-01 17:20:26 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_13000.pt 2023-05-01 17:20:51 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_13000.pt 2023-05-01 17:21:05 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_13000.pt (epoch 3 @ 13000 updates, score 0.7461) (writing took 38.66314581502229 seconds) 2023-05-01 17:21:44 - progress_bar.py[line:274] - INFO: epoch 003: 948 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7471.6, nsentences=120, sample_size=4283.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=575.9, ups=0.08, wpb=7471.6, bsz=120, num_updates=13010, lr=2.50427e-05, gnorm=0.928, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=53317 2023-05-01 17:22:25 - progress_bar.py[line:274] - INFO: epoch 003: 958 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7697.7, nsentences=120, sample_size=3726.9, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1912.6, ups=0.25, wpb=7697.7, bsz=120, num_updates=13020, lr=2.50374e-05, gnorm=0.958, clip=20, loss_scale=64, train_wall=40, gb_free=30, 
wall=53357 2023-05-01 17:23:05 - progress_bar.py[line:274] - INFO: epoch 003: 968 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7866.2, nsentences=120, sample_size=3952, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1966.9, ups=0.25, wpb=7866.2, bsz=120, num_updates=13030, lr=2.50321e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=53397 2023-05-01 17:23:44 - progress_bar.py[line:274] - INFO: epoch 003: 978 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=7926.4, nsentences=120, sample_size=3914.3, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1995.9, ups=0.25, wpb=7926.4, bsz=120, num_updates=13040, lr=2.50269e-05, gnorm=0.956, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=53437 2023-05-01 17:24:23 - progress_bar.py[line:274] - INFO: epoch 003: 988 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7399.9, nsentences=120, sample_size=4016.3, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1890.3, ups=0.26, wpb=7399.9, bsz=120, num_updates=13050, lr=2.50216e-05, gnorm=0.923, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=53476 2023-05-01 17:25:03 - progress_bar.py[line:274] - INFO: epoch 003: 998 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7698.5, nsentences=120, sample_size=3969.9, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1965.7, ups=0.26, wpb=7698.5, bsz=120, num_updates=13060, lr=2.50163e-05, gnorm=0.916, clip=10, loss_scale=64, train_wall=39, gb_free=31.1, wall=53515 2023-05-01 17:25:42 - progress_bar.py[line:274] - INFO: epoch 003: 1008 / 6042 loss=2.506, loss_v1=0, loss_v2=0, nll_loss=1.266, ntokens=7974.8, nsentences=120, sample_size=4125.6, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=2011, ups=0.25, wpb=7974.8, bsz=120, num_updates=13070, lr=2.5011e-05, gnorm=0.961, clip=50, loss_scale=64, train_wall=40, gb_free=29.6, wall=53555 2023-05-01 17:26:22 - progress_bar.py[line:274] - INFO: epoch 003: 1018 / 6042 
loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7815, nsentences=120, sample_size=4086, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1967.6, ups=0.25, wpb=7815, bsz=120, num_updates=13080, lr=2.50057e-05, gnorm=0.951, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=53594 2023-05-01 17:27:04 - progress_bar.py[line:274] - INFO: epoch 003: 1028 / 6042 loss=2.488, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7953.5, nsentences=120, sample_size=4362.9, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1905.8, ups=0.24, wpb=7953.5, bsz=120, num_updates=13090, lr=2.50004e-05, gnorm=0.935, clip=30, loss_scale=64, train_wall=42, gb_free=30, wall=53636 2023-05-01 17:27:44 - progress_bar.py[line:274] - INFO: epoch 003: 1038 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7740.7, nsentences=120, sample_size=4106.2, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1916.8, ups=0.25, wpb=7740.7, bsz=120, num_updates=13100, lr=2.49952e-05, gnorm=0.946, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=53677 2023-05-01 17:28:24 - progress_bar.py[line:274] - INFO: epoch 003: 1048 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.236, ntokens=7501, nsentences=120, sample_size=4097.1, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1897, ups=0.25, wpb=7501, bsz=120, num_updates=13110, lr=2.49899e-05, gnorm=0.925, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=53716 2023-05-01 17:29:04 - progress_bar.py[line:274] - INFO: epoch 003: 1058 / 6042 loss=2.488, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7600, nsentences=120, sample_size=4181.6, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1889.5, ups=0.25, wpb=7600, bsz=120, num_updates=13120, lr=2.49846e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=53756 2023-05-01 17:29:44 - progress_bar.py[line:274] - INFO: epoch 003: 1068 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7865.3, nsentences=120, sample_size=4282.5, 
sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1946.8, ups=0.25, wpb=7865.3, bsz=120, num_updates=13130, lr=2.49793e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, gb_free=26.3, wall=53797 2023-05-01 17:30:25 - progress_bar.py[line:274] - INFO: epoch 003: 1078 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7860.6, nsentences=120, sample_size=4166.5, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1945.4, ups=0.25, wpb=7860.6, bsz=120, num_updates=13140, lr=2.4974e-05, gnorm=0.929, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=53837 2023-05-01 17:31:04 - progress_bar.py[line:274] - INFO: epoch 003: 1088 / 6042 loss=2.491, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7871.5, nsentences=120, sample_size=3989, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=2012.1, ups=0.26, wpb=7871.5, bsz=120, num_updates=13150, lr=2.49687e-05, gnorm=0.955, clip=30, loss_scale=64, train_wall=39, gb_free=28.9, wall=53876 2023-05-01 17:31:43 - progress_bar.py[line:274] - INFO: epoch 003: 1098 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.236, ntokens=7881, nsentences=120, sample_size=3987, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1986.4, ups=0.25, wpb=7881, bsz=120, num_updates=13160, lr=2.49635e-05, gnorm=0.98, clip=30, loss_scale=64, train_wall=40, gb_free=29.2, wall=53916 2023-05-01 17:32:23 - progress_bar.py[line:274] - INFO: epoch 003: 1108 / 6042 loss=2.494, loss_v1=0, loss_v2=0, nll_loss=1.257, ntokens=8101.9, nsentences=120, sample_size=3946.8, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=2031.9, ups=0.25, wpb=8101.9, bsz=120, num_updates=13170, lr=2.49582e-05, gnorm=0.921, clip=10, loss_scale=128, train_wall=40, gb_free=29, wall=53956 2023-05-01 17:33:03 - progress_bar.py[line:274] - INFO: epoch 003: 1118 / 6042 loss=2.495, loss_v1=0, loss_v2=0, nll_loss=1.256, ntokens=7939.5, nsentences=120, sample_size=3915, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1983.1, ups=0.25, wpb=7939.5, bsz=120, 
num_updates=13180, lr=2.49529e-05, gnorm=0.943, clip=20, loss_scale=128, train_wall=40, gb_free=29.4, wall=53996 2023-05-01 17:33:19 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 17:33:47 - progress_bar.py[line:274] - INFO: epoch 003: 1129 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7728.2, nsentences=120, sample_size=4120.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1768.1, ups=0.23, wpb=7728.2, bsz=120, num_updates=13190, lr=2.49476e-05, gnorm=0.906, clip=0, loss_scale=64, train_wall=44, gb_free=30.5, wall=54040 2023-05-01 17:34:27 - progress_bar.py[line:274] - INFO: epoch 003: 1139 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7790.8, nsentences=120, sample_size=4098.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1971, ups=0.25, wpb=7790.8, bsz=120, num_updates=13200, lr=2.49423e-05, gnorm=0.874, clip=0, loss_scale=64, train_wall=39, gb_free=28.6, wall=54079 2023-05-01 17:35:06 - progress_bar.py[line:274] - INFO: epoch 003: 1149 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7807, nsentences=120, sample_size=3785.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1985, ups=0.25, wpb=7807, bsz=120, num_updates=13210, lr=2.49371e-05, gnorm=0.964, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=54118 2023-05-01 17:35:46 - progress_bar.py[line:274] - INFO: epoch 003: 1159 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7576.1, nsentences=120, sample_size=4042.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1914.2, ups=0.25, wpb=7576.1, bsz=120, num_updates=13220, lr=2.49318e-05, gnorm=0.925, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=54158 2023-05-01 17:36:26 - progress_bar.py[line:274] - INFO: epoch 003: 1169 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7578.6, nsentences=120, sample_size=4132.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1860.1, 
ups=0.25, wpb=7578.6, bsz=120, num_updates=13230, lr=2.49265e-05, gnorm=0.922, clip=10, loss_scale=64, train_wall=41, gb_free=30.7, wall=54199 2023-05-01 17:37:07 - progress_bar.py[line:274] - INFO: epoch 003: 1179 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7816.1, nsentences=120, sample_size=4229.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1935.7, ups=0.25, wpb=7816.1, bsz=120, num_updates=13240, lr=2.49212e-05, gnorm=0.91, clip=0, loss_scale=64, train_wall=40, gb_free=30.4, wall=54239 2023-05-01 17:37:47 - progress_bar.py[line:274] - INFO: epoch 003: 1189 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7829.4, nsentences=120, sample_size=4103, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1922.4, ups=0.25, wpb=7829.4, bsz=120, num_updates=13250, lr=2.49159e-05, gnorm=0.956, clip=30, loss_scale=64, train_wall=41, gb_free=23.6, wall=54280 2023-05-01 17:38:28 - progress_bar.py[line:274] - INFO: epoch 003: 1199 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7595.3, nsentences=120, sample_size=3979.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1883.1, ups=0.25, wpb=7595.3, bsz=120, num_updates=13260, lr=2.49106e-05, gnorm=0.903, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=54320 2023-05-01 17:39:08 - progress_bar.py[line:274] - INFO: epoch 003: 1209 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7738, nsentences=120, sample_size=4201, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1933.7, ups=0.25, wpb=7738, bsz=120, num_updates=13270, lr=2.49054e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=54360 2023-05-01 17:39:48 - progress_bar.py[line:274] - INFO: epoch 003: 1219 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=8167.3, nsentences=120, sample_size=3673.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2039.1, ups=0.25, wpb=8167.3, bsz=120, num_updates=13280, lr=2.49001e-05, gnorm=0.952, clip=30, 
loss_scale=64, train_wall=40, gb_free=30, wall=54400 2023-05-01 17:40:28 - progress_bar.py[line:274] - INFO: epoch 003: 1229 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7619.5, nsentences=120, sample_size=3592.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1908.4, ups=0.25, wpb=7619.5, bsz=120, num_updates=13290, lr=2.48948e-05, gnorm=1.001, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=54440 2023-05-01 17:41:08 - progress_bar.py[line:274] - INFO: epoch 003: 1239 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7749.4, nsentences=120, sample_size=4026.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1945.6, ups=0.25, wpb=7749.4, bsz=120, num_updates=13300, lr=2.48895e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=54480 2023-05-01 17:41:48 - progress_bar.py[line:274] - INFO: epoch 003: 1249 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7886.9, nsentences=120, sample_size=3974.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1966.4, ups=0.25, wpb=7886.9, bsz=120, num_updates=13310, lr=2.48842e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=28.3, wall=54520 2023-05-01 17:42:27 - progress_bar.py[line:274] - INFO: epoch 003: 1259 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7723.2, nsentences=120, sample_size=3950.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1950.7, ups=0.25, wpb=7723.2, bsz=120, num_updates=13320, lr=2.4879e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=40, gb_free=28.9, wall=54560 2023-05-01 17:43:08 - progress_bar.py[line:274] - INFO: epoch 003: 1269 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7822.5, nsentences=120, sample_size=3938.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1944.3, ups=0.25, wpb=7822.5, bsz=120, num_updates=13330, lr=2.48737e-05, gnorm=0.952, clip=30, loss_scale=64, train_wall=40, gb_free=30.8, wall=54600 2023-05-01 17:43:48 - 
progress_bar.py[line:274] - INFO: epoch 003: 1279 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7756.2, nsentences=120, sample_size=4077.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1911.9, ups=0.25, wpb=7756.2, bsz=120, num_updates=13340, lr=2.48684e-05, gnorm=0.921, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=54641 2023-05-01 17:44:27 - progress_bar.py[line:274] - INFO: epoch 003: 1289 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7505.2, nsentences=120, sample_size=4085.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1908.8, ups=0.25, wpb=7505.2, bsz=120, num_updates=13350, lr=2.48631e-05, gnorm=0.932, clip=20, loss_scale=64, train_wall=39, gb_free=28.9, wall=54680 2023-05-01 17:45:07 - progress_bar.py[line:274] - INFO: epoch 003: 1299 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7769.6, nsentences=120, sample_size=3762.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1975.8, ups=0.25, wpb=7769.6, bsz=120, num_updates=13360, lr=2.48578e-05, gnorm=0.965, clip=20, loss_scale=64, train_wall=39, gb_free=31, wall=54719 2023-05-01 17:45:47 - progress_bar.py[line:274] - INFO: epoch 003: 1309 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7759.6, nsentences=120, sample_size=3987.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1947, ups=0.25, wpb=7759.6, bsz=120, num_updates=13370, lr=2.48525e-05, gnorm=0.934, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=54759 2023-05-01 17:46:27 - progress_bar.py[line:274] - INFO: epoch 003: 1319 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7472.9, nsentences=120, sample_size=4189, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1843.9, ups=0.25, wpb=7472.9, bsz=120, num_updates=13380, lr=2.48473e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=27.5, wall=54800 2023-05-01 17:47:07 - progress_bar.py[line:274] - INFO: epoch 003: 1329 / 6042 loss=2.415, loss_v1=0, loss_v2=0, 
nll_loss=1.158, ntokens=7596.2, nsentences=120, sample_size=3797.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1919.7, ups=0.25, wpb=7596.2, bsz=120, num_updates=13390, lr=2.4842e-05, gnorm=0.964, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=54839 2023-05-01 17:47:46 - progress_bar.py[line:274] - INFO: epoch 003: 1339 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=8031.6, nsentences=120, sample_size=4310.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2040.7, ups=0.25, wpb=8031.6, bsz=120, num_updates=13400, lr=2.48367e-05, gnorm=0.898, clip=10, loss_scale=64, train_wall=39, gb_free=30.8, wall=54879 2023-05-01 17:48:26 - progress_bar.py[line:274] - INFO: epoch 003: 1349 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7724.1, nsentences=120, sample_size=4066.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1937.6, ups=0.25, wpb=7724.1, bsz=120, num_updates=13410, lr=2.48314e-05, gnorm=0.934, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=54918 2023-05-01 17:49:05 - progress_bar.py[line:274] - INFO: epoch 003: 1359 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7515.7, nsentences=120, sample_size=4015.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1918.3, ups=0.26, wpb=7515.7, bsz=120, num_updates=13420, lr=2.48261e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=39, gb_free=28.3, wall=54958 2023-05-01 17:49:45 - progress_bar.py[line:274] - INFO: epoch 003: 1369 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7725.5, nsentences=120, sample_size=4017.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1937.2, ups=0.25, wpb=7725.5, bsz=120, num_updates=13430, lr=2.48208e-05, gnorm=0.964, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=54997 2023-05-01 17:50:25 - progress_bar.py[line:274] - INFO: epoch 003: 1379 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7831.7, nsentences=120, sample_size=3972.3, sample_size_v1=0, 
sample_size_v2=0, ppl=2.18, wps=1936.4, ups=0.25, wpb=7831.7, bsz=120, num_updates=13440, lr=2.48156e-05, gnorm=0.958, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=55038 2023-05-01 17:51:05 - progress_bar.py[line:274] - INFO: epoch 003: 1389 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7765.4, nsentences=120, sample_size=4164.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1948.3, ups=0.25, wpb=7765.4, bsz=120, num_updates=13450, lr=2.48103e-05, gnorm=0.926, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=55078 2023-05-01 17:51:45 - progress_bar.py[line:274] - INFO: epoch 003: 1399 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7691.1, nsentences=120, sample_size=3976.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1916.6, ups=0.25, wpb=7691.1, bsz=120, num_updates=13460, lr=2.4805e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=55118 2023-05-01 17:52:25 - progress_bar.py[line:274] - INFO: epoch 003: 1409 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7518.2, nsentences=120, sample_size=4162.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1876.4, ups=0.25, wpb=7518.2, bsz=120, num_updates=13470, lr=2.47997e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=55158 2023-05-01 17:53:05 - progress_bar.py[line:274] - INFO: epoch 003: 1419 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7977.6, nsentences=120, sample_size=4043, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2021.3, ups=0.25, wpb=7977.6, bsz=120, num_updates=13480, lr=2.47944e-05, gnorm=0.948, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=55197 2023-05-01 17:53:45 - progress_bar.py[line:274] - INFO: epoch 003: 1429 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7816.9, nsentences=120, sample_size=3954.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1956.6, ups=0.25, wpb=7816.9, bsz=120, num_updates=13490, 
lr=2.47892e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=55237 2023-05-01 17:54:24 - progress_bar.py[line:274] - INFO: epoch 003: 1439 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7818.7, nsentences=120, sample_size=4020.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1987.5, ups=0.25, wpb=7818.7, bsz=120, num_updates=13500, lr=2.47839e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=39, gb_free=29.4, wall=55277 2023-05-01 17:55:03 - progress_bar.py[line:274] - INFO: epoch 003: 1449 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7532.6, nsentences=120, sample_size=3999.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1921.4, ups=0.26, wpb=7532.6, bsz=120, num_updates=13510, lr=2.47786e-05, gnorm=0.949, clip=10, loss_scale=64, train_wall=39, gb_free=30.4, wall=55316 2023-05-01 17:55:43 - progress_bar.py[line:274] - INFO: epoch 003: 1459 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7725.7, nsentences=120, sample_size=3823, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1960.4, ups=0.25, wpb=7725.7, bsz=120, num_updates=13520, lr=2.47733e-05, gnorm=0.967, clip=30, loss_scale=64, train_wall=39, gb_free=31.1, wall=55355 2023-05-01 17:56:22 - progress_bar.py[line:274] - INFO: epoch 003: 1469 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=8115.9, nsentences=120, sample_size=3819.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2055.3, ups=0.25, wpb=8115.9, bsz=120, num_updates=13530, lr=2.4768e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=39, gb_free=30.5, wall=55395 2023-05-01 17:57:02 - progress_bar.py[line:274] - INFO: epoch 003: 1479 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=8221.8, nsentences=120, sample_size=3812.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2052.9, ups=0.25, wpb=8221.8, bsz=120, num_updates=13540, lr=2.47627e-05, gnorm=0.953, clip=10, loss_scale=64, train_wall=40, gb_free=27.2, wall=55435 
2023-05-01 17:57:43 - progress_bar.py[line:274] - INFO: epoch 003: 1489 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7524.9, nsentences=120, sample_size=4149.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1875.7, ups=0.25, wpb=7524.9, bsz=120, num_updates=13550, lr=2.47575e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=55475 2023-05-01 17:58:22 - progress_bar.py[line:274] - INFO: epoch 003: 1499 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7681.5, nsentences=120, sample_size=3643.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1942.8, ups=0.25, wpb=7681.5, bsz=120, num_updates=13560, lr=2.47522e-05, gnorm=0.98, clip=50, loss_scale=64, train_wall=39, gb_free=29.9, wall=55515 2023-05-01 17:59:02 - progress_bar.py[line:274] - INFO: epoch 003: 1509 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=8060.9, nsentences=120, sample_size=4275.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2016.6, ups=0.25, wpb=8060.9, bsz=120, num_updates=13570, lr=2.47469e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=55554 2023-05-01 17:59:42 - progress_bar.py[line:274] - INFO: epoch 003: 1519 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7639.5, nsentences=120, sample_size=3800.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1925.7, ups=0.25, wpb=7639.5, bsz=120, num_updates=13580, lr=2.47416e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=55594 2023-05-01 18:00:22 - progress_bar.py[line:274] - INFO: epoch 003: 1529 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7662.3, nsentences=120, sample_size=4370.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1919.4, ups=0.25, wpb=7662.3, bsz=120, num_updates=13590, lr=2.47363e-05, gnorm=0.914, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=55634 2023-05-01 18:01:00 - progress_bar.py[line:274] - INFO: epoch 003: 1539 / 6042 loss=2.424, 
loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7498.2, nsentences=120, sample_size=4140.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1951.5, ups=0.26, wpb=7498.2, bsz=120, num_updates=13600, lr=2.47311e-05, gnorm=0.931, clip=20, loss_scale=64, train_wall=38, gb_free=30.9, wall=55673 2023-05-01 18:01:41 - progress_bar.py[line:274] - INFO: epoch 003: 1549 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7874.8, nsentences=120, sample_size=3709.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1920.7, ups=0.24, wpb=7874.8, bsz=120, num_updates=13610, lr=2.47258e-05, gnorm=0.97, clip=20, loss_scale=64, train_wall=41, gb_free=30.1, wall=55714 2023-05-01 18:02:21 - progress_bar.py[line:274] - INFO: epoch 003: 1559 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7803.3, nsentences=120, sample_size=4117.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1957.7, ups=0.25, wpb=7803.3, bsz=120, num_updates=13620, lr=2.47205e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=55753 2023-05-01 18:03:01 - progress_bar.py[line:274] - INFO: epoch 003: 1569 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7638, nsentences=120, sample_size=4016.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1887.1, ups=0.25, wpb=7638, bsz=120, num_updates=13630, lr=2.47152e-05, gnorm=0.929, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=55794 2023-05-01 18:03:42 - progress_bar.py[line:274] - INFO: epoch 003: 1579 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7739.9, nsentences=120, sample_size=3651.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1910, ups=0.25, wpb=7739.9, bsz=120, num_updates=13640, lr=2.47099e-05, gnorm=0.986, clip=50, loss_scale=64, train_wall=40, gb_free=30.9, wall=55834 2023-05-01 18:04:22 - progress_bar.py[line:274] - INFO: epoch 003: 1589 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7849.7, nsentences=120, sample_size=4020.8, 
sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1941.4, ups=0.25, wpb=7849.7, bsz=120, num_updates=13650, lr=2.47046e-05, gnorm=0.919, clip=0, loss_scale=64, train_wall=40, gb_free=30, wall=55875 2023-05-01 18:05:03 - progress_bar.py[line:274] - INFO: epoch 003: 1599 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7767.8, nsentences=120, sample_size=3958.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1926.1, ups=0.25, wpb=7767.8, bsz=120, num_updates=13660, lr=2.46994e-05, gnorm=0.956, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=55915 2023-05-01 18:05:42 - progress_bar.py[line:274] - INFO: epoch 003: 1609 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7547.5, nsentences=120, sample_size=4112, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1914.2, ups=0.25, wpb=7547.5, bsz=120, num_updates=13670, lr=2.46941e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=39, gb_free=28.9, wall=55955 2023-05-01 18:06:22 - progress_bar.py[line:274] - INFO: epoch 003: 1619 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7923, nsentences=120, sample_size=4129, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1987.9, ups=0.25, wpb=7923, bsz=120, num_updates=13680, lr=2.46888e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=28.8, wall=55994 2023-05-01 18:07:01 - progress_bar.py[line:274] - INFO: epoch 003: 1629 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7416.1, nsentences=120, sample_size=4132.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1881.5, ups=0.25, wpb=7416.1, bsz=120, num_updates=13690, lr=2.46835e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=39, gb_free=30.8, wall=56034 2023-05-01 18:07:25 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 18:07:45 - progress_bar.py[line:274] - INFO: epoch 003: 1640 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7750.2, 
nsentences=120, sample_size=3930.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1762.9, ups=0.23, wpb=7750.2, bsz=120, num_updates=13700, lr=2.46782e-05, gnorm=1.006, clip=50, loss_scale=64, train_wall=44, gb_free=29.7, wall=56078 2023-05-01 18:08:25 - progress_bar.py[line:274] - INFO: epoch 003: 1650 / 6042 loss=2.47, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7839.5, nsentences=120, sample_size=3722.2, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1975.7, ups=0.25, wpb=7839.5, bsz=120, num_updates=13710, lr=2.46729e-05, gnorm=0.991, clip=50, loss_scale=64, train_wall=40, gb_free=30.7, wall=56117 2023-05-01 18:09:05 - progress_bar.py[line:274] - INFO: epoch 003: 1660 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7463.3, nsentences=120, sample_size=4045.6, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1852.9, ups=0.25, wpb=7463.3, bsz=120, num_updates=13720, lr=2.46677e-05, gnorm=0.92, clip=10, loss_scale=64, train_wall=40, gb_free=29.4, wall=56158 2023-05-01 18:09:44 - progress_bar.py[line:274] - INFO: epoch 003: 1670 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.227, ntokens=7458.5, nsentences=120, sample_size=3932, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1921.6, ups=0.26, wpb=7458.5, bsz=120, num_updates=13730, lr=2.46624e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=39, gb_free=29.1, wall=56197 2023-05-01 18:10:23 - progress_bar.py[line:274] - INFO: epoch 003: 1680 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=7716.9, nsentences=120, sample_size=3927, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1965.9, ups=0.25, wpb=7716.9, bsz=120, num_updates=13740, lr=2.46571e-05, gnorm=0.952, clip=30, loss_scale=64, train_wall=39, gb_free=29.6, wall=56236 2023-05-01 18:11:04 - progress_bar.py[line:274] - INFO: epoch 003: 1690 / 6042 loss=2.503, loss_v1=0, loss_v2=0, nll_loss=1.262, ntokens=8040.9, nsentences=120, sample_size=3743.3, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1994.9, 
ups=0.25, wpb=8040.9, bsz=120, num_updates=13750, lr=2.46518e-05, gnorm=0.99, clip=50, loss_scale=64, train_wall=40, gb_free=30.3, wall=56276 2023-05-01 18:11:44 - progress_bar.py[line:274] - INFO: epoch 003: 1700 / 6042 loss=2.48, loss_v1=0, loss_v2=0, nll_loss=1.232, ntokens=7842.1, nsentences=120, sample_size=4093, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1948.7, ups=0.25, wpb=7842.1, bsz=120, num_updates=13760, lr=2.46465e-05, gnorm=0.947, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=56316 2023-05-01 18:12:24 - progress_bar.py[line:274] - INFO: epoch 003: 1710 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7654.8, nsentences=120, sample_size=3921.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1923.1, ups=0.25, wpb=7654.8, bsz=120, num_updates=13770, lr=2.46413e-05, gnorm=0.914, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=56356 2023-05-01 18:13:03 - progress_bar.py[line:274] - INFO: epoch 003: 1720 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7423.2, nsentences=120, sample_size=4119.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1883.9, ups=0.25, wpb=7423.2, bsz=120, num_updates=13780, lr=2.4636e-05, gnorm=0.912, clip=0, loss_scale=64, train_wall=39, gb_free=30.2, wall=56396 2023-05-01 18:13:43 - progress_bar.py[line:274] - INFO: epoch 003: 1730 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7797.3, nsentences=120, sample_size=3990.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1950.2, ups=0.25, wpb=7797.3, bsz=120, num_updates=13790, lr=2.46307e-05, gnorm=0.922, clip=0, loss_scale=64, train_wall=40, gb_free=29.4, wall=56436 2023-05-01 18:14:23 - progress_bar.py[line:274] - INFO: epoch 003: 1740 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7458.6, nsentences=120, sample_size=4066.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1882, ups=0.25, wpb=7458.6, bsz=120, num_updates=13800, lr=2.46254e-05, gnorm=0.956, clip=30, 
loss_scale=64, train_wall=40, gb_free=30.1, wall=56475 2023-05-01 18:15:02 - progress_bar.py[line:274] - INFO: epoch 003: 1750 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7483.5, nsentences=120, sample_size=4198.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1885.3, ups=0.25, wpb=7483.5, bsz=120, num_updates=13810, lr=2.46201e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=56515 2023-05-01 18:15:42 - progress_bar.py[line:274] - INFO: epoch 003: 1760 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7673.9, nsentences=120, sample_size=3781.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1930.2, ups=0.25, wpb=7673.9, bsz=120, num_updates=13820, lr=2.46148e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=56555 2023-05-01 18:16:23 - progress_bar.py[line:274] - INFO: epoch 003: 1770 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7967.1, nsentences=120, sample_size=3878.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1963.1, ups=0.25, wpb=7967.1, bsz=120, num_updates=13830, lr=2.46096e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=41, gb_free=26.8, wall=56595 2023-05-01 18:17:03 - progress_bar.py[line:274] - INFO: epoch 003: 1780 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7581.6, nsentences=120, sample_size=3859.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1904.7, ups=0.25, wpb=7581.6, bsz=120, num_updates=13840, lr=2.46043e-05, gnorm=0.964, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=56635 2023-05-01 18:17:43 - progress_bar.py[line:274] - INFO: epoch 003: 1790 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7521.3, nsentences=120, sample_size=4185.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1868.2, ups=0.25, wpb=7521.3, bsz=120, num_updates=13850, lr=2.4599e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=56675 2023-05-01 18:18:23 - 
progress_bar.py[line:274] - INFO: epoch 003: 1800 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7599, nsentences=120, sample_size=4010, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1905.5, ups=0.25, wpb=7599, bsz=120, num_updates=13860, lr=2.45937e-05, gnorm=0.926, clip=0, loss_scale=64, train_wall=40, gb_free=31.4, wall=56715 2023-05-01 18:18:43 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-01 18:19:07 - progress_bar.py[line:274] - INFO: epoch 003: 1811 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7810.7, nsentences=120, sample_size=3973.7, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1760.9, ups=0.23, wpb=7810.7, bsz=120, num_updates=13870, lr=2.45884e-05, gnorm=0.988, clip=50, loss_scale=32, train_wall=44, gb_free=29.8, wall=56760 2023-05-01 18:19:46 - progress_bar.py[line:274] - INFO: epoch 003: 1821 / 6042 loss=2.502, loss_v1=0, loss_v2=0, nll_loss=1.264, ntokens=7631.9, nsentences=120, sample_size=4070.9, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1941.3, ups=0.25, wpb=7631.9, bsz=120, num_updates=13880, lr=2.45831e-05, gnorm=0.974, clip=40, loss_scale=32, train_wall=39, gb_free=29.6, wall=56799 2023-05-01 18:20:26 - progress_bar.py[line:274] - INFO: epoch 003: 1831 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7891.5, nsentences=120, sample_size=4114.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1994.1, ups=0.25, wpb=7891.5, bsz=120, num_updates=13890, lr=2.45779e-05, gnorm=0.916, clip=20, loss_scale=32, train_wall=39, gb_free=30.7, wall=56838 2023-05-01 18:21:06 - progress_bar.py[line:274] - INFO: epoch 003: 1841 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7517.1, nsentences=120, sample_size=3891.5, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1879.8, ups=0.25, wpb=7517.1, bsz=120, num_updates=13900, lr=2.45726e-05, gnorm=0.995, clip=40, loss_scale=32, train_wall=40, 
gb_free=29.6, wall=56878 2023-05-01 18:21:46 - progress_bar.py[line:274] - INFO: epoch 003: 1851 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.257, ntokens=7619.9, nsentences=120, sample_size=4177.1, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1894.5, ups=0.25, wpb=7619.9, bsz=120, num_updates=13910, lr=2.45673e-05, gnorm=0.944, clip=20, loss_scale=32, train_wall=40, gb_free=30.1, wall=56919 2023-05-01 18:22:26 - progress_bar.py[line:274] - INFO: epoch 003: 1861 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7607.2, nsentences=120, sample_size=3690.1, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1906.2, ups=0.25, wpb=7607.2, bsz=120, num_updates=13920, lr=2.4562e-05, gnorm=0.951, clip=30, loss_scale=32, train_wall=40, gb_free=31.3, wall=56959 2023-05-01 18:23:06 - progress_bar.py[line:274] - INFO: epoch 003: 1871 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7633.6, nsentences=120, sample_size=4120.3, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1936.5, ups=0.25, wpb=7633.6, bsz=120, num_updates=13930, lr=2.45567e-05, gnorm=0.93, clip=20, loss_scale=32, train_wall=39, gb_free=31.1, wall=56998 2023-05-01 18:23:45 - progress_bar.py[line:274] - INFO: epoch 003: 1881 / 6042 loss=2.51, loss_v1=0, loss_v2=0, nll_loss=1.271, ntokens=7826.1, nsentences=120, sample_size=3899.6, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1980.9, ups=0.25, wpb=7826.1, bsz=120, num_updates=13940, lr=2.45515e-05, gnorm=0.96, clip=30, loss_scale=32, train_wall=39, gb_free=30, wall=57037 2023-05-01 18:24:25 - progress_bar.py[line:274] - INFO: epoch 003: 1891 / 6042 loss=2.478, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7680.4, nsentences=120, sample_size=4275.9, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1906, ups=0.25, wpb=7680.4, bsz=120, num_updates=13950, lr=2.45462e-05, gnorm=0.891, clip=0, loss_scale=32, train_wall=40, gb_free=29.7, wall=57078 2023-05-01 18:25:05 - progress_bar.py[line:274] - INFO: epoch 003: 1901 / 
6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7784.8, nsentences=120, sample_size=4320.5, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1975, ups=0.25, wpb=7784.8, bsz=120, num_updates=13960, lr=2.45409e-05, gnorm=0.895, clip=0, loss_scale=32, train_wall=39, gb_free=30.6, wall=57117 2023-05-01 18:25:45 - progress_bar.py[line:274] - INFO: epoch 003: 1911 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.232, ntokens=7681, nsentences=120, sample_size=3788.2, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1928.8, ups=0.25, wpb=7681, bsz=120, num_updates=13970, lr=2.45356e-05, gnorm=0.981, clip=40, loss_scale=32, train_wall=40, gb_free=29.5, wall=57157 2023-05-01 18:26:25 - progress_bar.py[line:274] - INFO: epoch 003: 1921 / 6042 loss=2.48, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=7872, nsentences=120, sample_size=4268.4, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1960.2, ups=0.25, wpb=7872, bsz=120, num_updates=13980, lr=2.45303e-05, gnorm=0.924, clip=10, loss_scale=32, train_wall=40, gb_free=30.7, wall=57197 2023-05-01 18:27:05 - progress_bar.py[line:274] - INFO: epoch 003: 1931 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.222, ntokens=7976.8, nsentences=120, sample_size=3849.6, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1970.2, ups=0.25, wpb=7976.8, bsz=120, num_updates=13990, lr=2.4525e-05, gnorm=0.977, clip=50, loss_scale=32, train_wall=40, gb_free=31, wall=57238 2023-05-01 18:27:45 - progress_bar.py[line:274] - INFO: epoch 003: 1941 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7690, nsentences=120, sample_size=4046.1, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1925.3, ups=0.25, wpb=7690, bsz=120, num_updates=14000, lr=2.45198e-05, gnorm=0.932, clip=10, loss_scale=32, train_wall=40, gb_free=29.3, wall=57278 2023-05-01 18:27:45 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 18:27:47 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 18:27:47 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 18:27:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:27:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:27:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:27:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:27:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:27:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:27:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:27:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:27:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:27:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:27:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:27:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:27:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:27:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:27:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:27:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:27:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:27:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:27:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:27:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:27:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:27:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:27:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:27:59 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-01 18:28:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:04 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 18:28:04 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 18:28:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:12 - task_invig.py[line:181] - INFO: example hyp(gen): 
region: 2023-05-01 18:28:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:16 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 18:28:16 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 18:28:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:24 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-01 18:28:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:27 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 18:28:27 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 18:28:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:32 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 18:28:32 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 18:28:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:36 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 18:28:36 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 18:28:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 18:28:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 18:28:37 - progress_bar.py[line:282] - INFO: epoch 003 | valid on 'valid' subset | loss 3.203 | loss_v1 0 | loss_v2 0 | nll_loss 2.036 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.1 | score 0.7441 | wps 3298.5 | wpb 3202.1 | bsz 39.4 | num_updates 14000 | best_score 0.751 2023-05-01 18:28:37 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 3 @ 14000 updates 2023-05-01 18:28:37 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_14000.pt 2023-05-01 18:29:01 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_14000.pt 2023-05-01 18:29:15 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_14000.pt (epoch 3 @ 14000 updates, 
score 0.7441) (writing took 37.9841252991464 seconds) 2023-05-01 18:29:54 - progress_bar.py[line:274] - INFO: epoch 003: 1951 / 6042 loss=2.507, loss_v1=0, loss_v2=0, nll_loss=1.269, ntokens=7712.9, nsentences=120, sample_size=3835.8, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=599.7, ups=0.08, wpb=7712.9, bsz=120, num_updates=14010, lr=2.45145e-05, gnorm=0.939, clip=30, loss_scale=32, train_wall=39, gb_free=30.1, wall=57406 2023-05-01 18:30:34 - progress_bar.py[line:274] - INFO: epoch 003: 1961 / 6042 loss=2.5, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7850.3, nsentences=120, sample_size=3994.9, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1949.3, ups=0.25, wpb=7850.3, bsz=120, num_updates=14020, lr=2.45092e-05, gnorm=1.066, clip=40, loss_scale=32, train_wall=40, gb_free=30.6, wall=57447 2023-05-01 18:31:14 - progress_bar.py[line:274] - INFO: epoch 003: 1971 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.227, ntokens=7800.7, nsentences=120, sample_size=3784, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1956.6, ups=0.25, wpb=7800.7, bsz=120, num_updates=14030, lr=2.45039e-05, gnorm=0.961, clip=40, loss_scale=32, train_wall=40, gb_free=30, wall=57486 2023-05-01 18:31:54 - progress_bar.py[line:274] - INFO: epoch 003: 1981 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.227, ntokens=7704.7, nsentences=120, sample_size=4168.5, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1933.9, ups=0.25, wpb=7704.7, bsz=120, num_updates=14040, lr=2.44986e-05, gnorm=0.937, clip=10, loss_scale=32, train_wall=40, gb_free=30.8, wall=57526 2023-05-01 18:32:34 - progress_bar.py[line:274] - INFO: epoch 003: 1991 / 6042 loss=2.48, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7651.6, nsentences=120, sample_size=4201.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1916.6, ups=0.25, wpb=7651.6, bsz=120, num_updates=14050, lr=2.44934e-05, gnorm=0.944, clip=20, loss_scale=32, train_wall=40, gb_free=29.6, wall=57566 2023-05-01 18:33:14 - 
progress_bar.py[line:274] - INFO: epoch 003: 2001 / 6042 loss=2.504, loss_v1=0, loss_v2=0, nll_loss=1.274, ntokens=7709.9, nsentences=120, sample_size=3972.9, sample_size_v1=0, sample_size_v2=0, ppl=2.42, wps=1901.3, ups=0.25, wpb=7709.9, bsz=120, num_updates=14060, lr=2.44881e-05, gnorm=0.967, clip=30, loss_scale=32, train_wall=40, gb_free=28.9, wall=57607 2023-05-01 18:33:54 - progress_bar.py[line:274] - INFO: epoch 003: 2011 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7690.7, nsentences=120, sample_size=4109.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1925.3, ups=0.25, wpb=7690.7, bsz=120, num_updates=14070, lr=2.44828e-05, gnorm=0.949, clip=30, loss_scale=32, train_wall=40, gb_free=30.5, wall=57647 2023-05-01 18:34:35 - progress_bar.py[line:274] - INFO: epoch 003: 2021 / 6042 loss=2.494, loss_v1=0, loss_v2=0, nll_loss=1.259, ntokens=7842.8, nsentences=120, sample_size=4022.7, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1906, ups=0.24, wpb=7842.8, bsz=120, num_updates=14080, lr=2.44775e-05, gnorm=0.955, clip=30, loss_scale=32, train_wall=41, gb_free=30.3, wall=57688 2023-05-01 18:35:16 - progress_bar.py[line:274] - INFO: epoch 003: 2031 / 6042 loss=2.515, loss_v1=0, loss_v2=0, nll_loss=1.281, ntokens=8100.5, nsentences=120, sample_size=4163.9, sample_size_v1=0, sample_size_v2=0, ppl=2.43, wps=2011.1, ups=0.25, wpb=8100.5, bsz=120, num_updates=14090, lr=2.44722e-05, gnorm=0.937, clip=20, loss_scale=32, train_wall=40, gb_free=25.2, wall=57728 2023-05-01 18:35:55 - progress_bar.py[line:274] - INFO: epoch 003: 2041 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7688.7, nsentences=120, sample_size=3913.2, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1957.6, ups=0.25, wpb=7688.7, bsz=120, num_updates=14100, lr=2.44669e-05, gnorm=0.974, clip=30, loss_scale=32, train_wall=39, gb_free=29.8, wall=57767 2023-05-01 18:36:34 - progress_bar.py[line:274] - INFO: epoch 003: 2051 / 6042 loss=2.49, loss_v1=0, loss_v2=0, 
nll_loss=1.244, ntokens=7912.6, nsentences=120, sample_size=4015.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=2018.6, ups=0.26, wpb=7912.6, bsz=120, num_updates=14110, lr=2.44617e-05, gnorm=0.947, clip=20, loss_scale=32, train_wall=39, gb_free=30.2, wall=57807 2023-05-01 18:37:14 - progress_bar.py[line:274] - INFO: epoch 003: 2061 / 6042 loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.256, ntokens=7647, nsentences=120, sample_size=4369.9, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1904.8, ups=0.25, wpb=7647, bsz=120, num_updates=14120, lr=2.44564e-05, gnorm=0.928, clip=30, loss_scale=32, train_wall=40, gb_free=30.3, wall=57847 2023-05-01 18:37:54 - progress_bar.py[line:274] - INFO: epoch 003: 2071 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7828.1, nsentences=120, sample_size=4089.2, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1952.4, ups=0.25, wpb=7828.1, bsz=120, num_updates=14130, lr=2.44511e-05, gnorm=0.943, clip=40, loss_scale=32, train_wall=40, gb_free=29.8, wall=57887 2023-05-01 18:38:34 - progress_bar.py[line:274] - INFO: epoch 003: 2081 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7815.7, nsentences=120, sample_size=3967.9, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1955.7, ups=0.25, wpb=7815.7, bsz=120, num_updates=14140, lr=2.44458e-05, gnorm=0.941, clip=10, loss_scale=32, train_wall=40, gb_free=29.9, wall=57927 2023-05-01 18:39:13 - progress_bar.py[line:274] - INFO: epoch 003: 2091 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.222, ntokens=7566.9, nsentences=120, sample_size=3919, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1954, ups=0.26, wpb=7566.9, bsz=120, num_updates=14150, lr=2.44405e-05, gnorm=0.945, clip=30, loss_scale=32, train_wall=39, gb_free=30.7, wall=57966 2023-05-01 18:39:53 - progress_bar.py[line:274] - INFO: epoch 003: 2101 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7655.9, nsentences=120, sample_size=4098.9, sample_size_v1=0, 
sample_size_v2=0, ppl=2.32, wps=1907.8, ups=0.25, wpb=7655.9, bsz=120, num_updates=14160, lr=2.44352e-05, gnorm=0.942, clip=20, loss_scale=32, train_wall=40, gb_free=30.4, wall=58006 2023-05-01 18:40:34 - progress_bar.py[line:274] - INFO: epoch 003: 2111 / 6042 loss=2.505, loss_v1=0, loss_v2=0, nll_loss=1.262, ntokens=7689, nsentences=120, sample_size=3833.7, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1894.7, ups=0.25, wpb=7689, bsz=120, num_updates=14170, lr=2.443e-05, gnorm=0.968, clip=30, loss_scale=32, train_wall=41, gb_free=30.4, wall=58046 2023-05-01 18:41:13 - progress_bar.py[line:274] - INFO: epoch 003: 2121 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7642.5, nsentences=120, sample_size=3869.5, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1948.4, ups=0.25, wpb=7642.5, bsz=120, num_updates=14180, lr=2.44247e-05, gnorm=0.964, clip=30, loss_scale=32, train_wall=39, gb_free=30.8, wall=58085 2023-05-01 18:41:52 - progress_bar.py[line:274] - INFO: epoch 003: 2131 / 6042 loss=2.501, loss_v1=0, loss_v2=0, nll_loss=1.26, ntokens=7596.2, nsentences=120, sample_size=4136.6, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1930.6, ups=0.25, wpb=7596.2, bsz=120, num_updates=14190, lr=2.44194e-05, gnorm=0.958, clip=30, loss_scale=32, train_wall=39, gb_free=29.5, wall=58125 2023-05-01 18:42:32 - progress_bar.py[line:274] - INFO: epoch 003: 2141 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7671.4, nsentences=120, sample_size=4129.8, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1914.8, ups=0.25, wpb=7671.4, bsz=120, num_updates=14200, lr=2.44141e-05, gnorm=0.927, clip=10, loss_scale=32, train_wall=40, gb_free=30.4, wall=58165 2023-05-01 18:43:12 - progress_bar.py[line:274] - INFO: epoch 003: 2151 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.256, ntokens=7755, nsentences=120, sample_size=4048.8, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1965.8, ups=0.25, wpb=7755, bsz=120, num_updates=14210, 
lr=2.44088e-05, gnorm=0.933, clip=10, loss_scale=32, train_wall=39, gb_free=30.9, wall=58204 2023-05-01 18:43:52 - progress_bar.py[line:274] - INFO: epoch 003: 2161 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7489.5, nsentences=120, sample_size=4052.3, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1859.6, ups=0.25, wpb=7489.5, bsz=120, num_updates=14220, lr=2.44036e-05, gnorm=0.934, clip=10, loss_scale=32, train_wall=40, gb_free=30.5, wall=58245 2023-05-01 18:44:31 - progress_bar.py[line:274] - INFO: epoch 003: 2171 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.221, ntokens=7718.7, nsentences=120, sample_size=3912.3, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1970.2, ups=0.26, wpb=7718.7, bsz=120, num_updates=14230, lr=2.43983e-05, gnorm=0.972, clip=20, loss_scale=32, train_wall=39, gb_free=29.1, wall=58284 2023-05-01 18:45:11 - progress_bar.py[line:274] - INFO: epoch 003: 2181 / 6042 loss=2.526, loss_v1=0, loss_v2=0, nll_loss=1.29, ntokens=7881.5, nsentences=120, sample_size=4376.9, sample_size_v1=0, sample_size_v2=0, ppl=2.45, wps=1998.7, ups=0.25, wpb=7881.5, bsz=120, num_updates=14240, lr=2.4393e-05, gnorm=0.92, clip=10, loss_scale=32, train_wall=39, gb_free=30.1, wall=58323 2023-05-01 18:45:51 - progress_bar.py[line:274] - INFO: epoch 003: 2191 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7938.7, nsentences=120, sample_size=3690.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1979, ups=0.25, wpb=7938.7, bsz=120, num_updates=14250, lr=2.43877e-05, gnorm=0.977, clip=30, loss_scale=32, train_wall=40, gb_free=28.5, wall=58363 2023-05-01 18:46:31 - progress_bar.py[line:274] - INFO: epoch 003: 2201 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=7462.9, nsentences=120, sample_size=3939.2, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1860.8, ups=0.25, wpb=7462.9, bsz=120, num_updates=14260, lr=2.43824e-05, gnorm=0.943, clip=10, loss_scale=32, train_wall=40, gb_free=29.8, wall=58403 
2023-05-01 18:47:11 - progress_bar.py[line:274] - INFO: epoch 003: 2211 / 6042 loss=2.497, loss_v1=0, loss_v2=0, nll_loss=1.26, ntokens=7883.6, nsentences=120, sample_size=3930.6, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1968.5, ups=0.25, wpb=7883.6, bsz=120, num_updates=14270, lr=2.43771e-05, gnorm=0.922, clip=20, loss_scale=32, train_wall=40, gb_free=28.1, wall=58443 2023-05-01 18:47:51 - progress_bar.py[line:274] - INFO: epoch 003: 2221 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7902.7, nsentences=120, sample_size=3922.3, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1999.3, ups=0.25, wpb=7902.7, bsz=120, num_updates=14280, lr=2.43719e-05, gnorm=0.946, clip=20, loss_scale=32, train_wall=39, gb_free=30.1, wall=58483 2023-05-01 18:48:31 - progress_bar.py[line:274] - INFO: epoch 003: 2231 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7626.6, nsentences=120, sample_size=4031.3, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1904.2, ups=0.25, wpb=7626.6, bsz=120, num_updates=14290, lr=2.43666e-05, gnorm=0.95, clip=30, loss_scale=32, train_wall=40, gb_free=30.6, wall=58523 2023-05-01 18:49:12 - progress_bar.py[line:274] - INFO: epoch 003: 2241 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7906, nsentences=120, sample_size=3735.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1930.9, ups=0.24, wpb=7906, bsz=120, num_updates=14300, lr=2.43613e-05, gnorm=0.92, clip=20, loss_scale=32, train_wall=41, gb_free=29.3, wall=58564 2023-05-01 18:49:51 - progress_bar.py[line:274] - INFO: epoch 003: 2251 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7581.3, nsentences=120, sample_size=4251.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1904.8, ups=0.25, wpb=7581.3, bsz=120, num_updates=14310, lr=2.4356e-05, gnorm=0.942, clip=30, loss_scale=32, train_wall=40, gb_free=29.5, wall=58604 2023-05-01 18:50:31 - progress_bar.py[line:274] - INFO: epoch 003: 2261 / 6042 loss=2.417, 
loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7814.5, nsentences=120, sample_size=3940.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1976.9, ups=0.25, wpb=7814.5, bsz=120, num_updates=14320, lr=2.43507e-05, gnorm=0.928, clip=10, loss_scale=32, train_wall=39, gb_free=29.4, wall=58643 2023-05-01 18:51:11 - progress_bar.py[line:274] - INFO: epoch 003: 2271 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7697.6, nsentences=120, sample_size=3989.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1935.6, ups=0.25, wpb=7697.6, bsz=120, num_updates=14330, lr=2.43455e-05, gnorm=0.936, clip=0, loss_scale=32, train_wall=40, gb_free=30.2, wall=58683 2023-05-01 18:51:51 - progress_bar.py[line:274] - INFO: epoch 003: 2281 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7908.9, nsentences=120, sample_size=4205.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1955.9, ups=0.25, wpb=7908.9, bsz=120, num_updates=14340, lr=2.43402e-05, gnorm=0.889, clip=0, loss_scale=32, train_wall=40, gb_free=29.9, wall=58724 2023-05-01 18:52:32 - progress_bar.py[line:274] - INFO: epoch 003: 2291 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7645.6, nsentences=120, sample_size=4081.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1886.4, ups=0.25, wpb=7645.6, bsz=120, num_updates=14350, lr=2.43349e-05, gnorm=0.94, clip=10, loss_scale=32, train_wall=40, gb_free=30.2, wall=58764 2023-05-01 18:53:11 - progress_bar.py[line:274] - INFO: epoch 003: 2301 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7723.5, nsentences=120, sample_size=4191.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1970.7, ups=0.26, wpb=7723.5, bsz=120, num_updates=14360, lr=2.43296e-05, gnorm=0.93, clip=10, loss_scale=32, train_wall=39, gb_free=29.9, wall=58803 2023-05-01 18:53:50 - progress_bar.py[line:274] - INFO: epoch 003: 2311 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7652.7, nsentences=120, sample_size=3947.5, 
sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1934.1, ups=0.25, wpb=7652.7, bsz=120, num_updates=14370, lr=2.43243e-05, gnorm=0.905, clip=10, loss_scale=32, train_wall=39, gb_free=31, wall=58843 2023-05-01 18:54:30 - progress_bar.py[line:274] - INFO: epoch 003: 2321 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7765.9, nsentences=120, sample_size=3798.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1970.8, ups=0.25, wpb=7765.9, bsz=120, num_updates=14380, lr=2.4319e-05, gnorm=0.96, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=58882 2023-05-01 18:55:09 - progress_bar.py[line:274] - INFO: epoch 003: 2331 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7567.3, nsentences=120, sample_size=3941.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1918.6, ups=0.25, wpb=7567.3, bsz=120, num_updates=14390, lr=2.43138e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=39, gb_free=30.5, wall=58922 2023-05-01 18:55:49 - progress_bar.py[line:274] - INFO: epoch 003: 2341 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7693.6, nsentences=120, sample_size=3759.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1922, ups=0.25, wpb=7693.6, bsz=120, num_updates=14400, lr=2.43085e-05, gnorm=0.966, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=58962 2023-05-01 18:56:29 - progress_bar.py[line:274] - INFO: epoch 003: 2351 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7497.3, nsentences=120, sample_size=4435.8, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1873.5, ups=0.25, wpb=7497.3, bsz=120, num_updates=14410, lr=2.43032e-05, gnorm=0.897, clip=0, loss_scale=64, train_wall=40, gb_free=29.4, wall=59002 2023-05-01 18:57:08 - progress_bar.py[line:274] - INFO: epoch 003: 2361 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7719.5, nsentences=120, sample_size=4069.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1969.7, ups=0.26, wpb=7719.5, bsz=120, 
num_updates=14420, lr=2.42979e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=39, gb_free=30.3, wall=59041 2023-05-01 18:57:48 - progress_bar.py[line:274] - INFO: epoch 003: 2371 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=8077.1, nsentences=120, sample_size=3887.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2023.3, ups=0.25, wpb=8077.1, bsz=120, num_updates=14430, lr=2.42926e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=59081 2023-05-01 18:58:28 - progress_bar.py[line:274] - INFO: epoch 003: 2381 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7912.1, nsentences=120, sample_size=3790.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2010, ups=0.25, wpb=7912.1, bsz=120, num_updates=14440, lr=2.42873e-05, gnorm=0.97, clip=40, loss_scale=64, train_wall=39, gb_free=29.5, wall=59120 2023-05-01 18:59:07 - progress_bar.py[line:274] - INFO: epoch 003: 2391 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7963.9, nsentences=120, sample_size=3794, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2017.5, ups=0.25, wpb=7963.9, bsz=120, num_updates=14450, lr=2.42821e-05, gnorm=0.989, clip=40, loss_scale=64, train_wall=39, gb_free=31, wall=59160 2023-05-01 18:59:47 - progress_bar.py[line:274] - INFO: epoch 003: 2401 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7652.5, nsentences=120, sample_size=4132.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1923, ups=0.25, wpb=7652.5, bsz=120, num_updates=14460, lr=2.42768e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=40, gb_free=31.1, wall=59199 2023-05-01 19:00:27 - progress_bar.py[line:274] - INFO: epoch 003: 2411 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7947.9, nsentences=120, sample_size=4224.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1991.2, ups=0.25, wpb=7947.9, bsz=120, num_updates=14470, lr=2.42715e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=40, gb_free=27.1, 
wall=59239 2023-05-01 19:01:06 - progress_bar.py[line:274] - INFO: epoch 003: 2421 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7637.5, nsentences=120, sample_size=3796.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1944, ups=0.25, wpb=7637.5, bsz=120, num_updates=14480, lr=2.42662e-05, gnorm=0.96, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=59279 2023-05-01 19:01:45 - progress_bar.py[line:274] - INFO: epoch 003: 2431 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7684, nsentences=120, sample_size=4138.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1974.8, ups=0.26, wpb=7684, bsz=120, num_updates=14490, lr=2.42609e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=39, gb_free=26.3, wall=59318 2023-05-01 19:02:25 - progress_bar.py[line:274] - INFO: epoch 003: 2441 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7779.3, nsentences=120, sample_size=4026.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1954.9, ups=0.25, wpb=7779.3, bsz=120, num_updates=14500, lr=2.42557e-05, gnorm=0.932, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=59357 2023-05-01 19:03:05 - progress_bar.py[line:274] - INFO: epoch 003: 2451 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=8104.1, nsentences=120, sample_size=4033.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=2006.3, ups=0.25, wpb=8104.1, bsz=120, num_updates=14510, lr=2.42504e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=40, gb_free=28.8, wall=59398 2023-05-01 19:03:46 - progress_bar.py[line:274] - INFO: epoch 003: 2461 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7680.1, nsentences=120, sample_size=4158.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1893.1, ups=0.25, wpb=7680.1, bsz=120, num_updates=14520, lr=2.42451e-05, gnorm=0.916, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=59438 2023-05-01 19:04:26 - progress_bar.py[line:274] - INFO: epoch 003: 2471 / 6042 
loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7830.1, nsentences=120, sample_size=4023.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1942.1, ups=0.25, wpb=7830.1, bsz=120, num_updates=14530, lr=2.42398e-05, gnorm=0.951, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=59479 2023-05-01 19:05:05 - progress_bar.py[line:274] - INFO: epoch 003: 2481 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7638.7, nsentences=120, sample_size=4236.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1944.5, ups=0.25, wpb=7638.7, bsz=120, num_updates=14540, lr=2.42345e-05, gnorm=0.925, clip=0, loss_scale=64, train_wall=39, gb_free=30.6, wall=59518 2023-05-01 19:05:45 - progress_bar.py[line:274] - INFO: epoch 003: 2491 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7571.1, nsentences=120, sample_size=4124.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1927.3, ups=0.25, wpb=7571.1, bsz=120, num_updates=14550, lr=2.42292e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=59557 2023-05-01 19:06:25 - progress_bar.py[line:274] - INFO: epoch 003: 2501 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7788.2, nsentences=120, sample_size=3887.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1952.4, ups=0.25, wpb=7788.2, bsz=120, num_updates=14560, lr=2.4224e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=59597 2023-05-01 19:07:05 - progress_bar.py[line:274] - INFO: epoch 003: 2511 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7836.1, nsentences=120, sample_size=4236.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1943.6, ups=0.25, wpb=7836.1, bsz=120, num_updates=14570, lr=2.42187e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=28.6, wall=59637 2023-05-01 19:07:45 - progress_bar.py[line:274] - INFO: epoch 003: 2521 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7952.1, nsentences=120, 
sample_size=4081.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1969.3, ups=0.25, wpb=7952.1, bsz=120, num_updates=14580, lr=2.42134e-05, gnorm=0.92, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=59678 2023-05-01 19:08:26 - progress_bar.py[line:274] - INFO: epoch 003: 2531 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7816.1, nsentences=120, sample_size=4244, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1914.3, ups=0.24, wpb=7816.1, bsz=120, num_updates=14590, lr=2.42081e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=41, gb_free=29.2, wall=59719 2023-05-01 19:09:06 - progress_bar.py[line:274] - INFO: epoch 003: 2541 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7542.2, nsentences=120, sample_size=4150.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1891.4, ups=0.25, wpb=7542.2, bsz=120, num_updates=14600, lr=2.42028e-05, gnorm=0.903, clip=10, loss_scale=64, train_wall=40, gb_free=29.1, wall=59759 2023-05-01 19:09:46 - progress_bar.py[line:274] - INFO: epoch 003: 2551 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7513.1, nsentences=120, sample_size=3671.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1873, ups=0.25, wpb=7513.1, bsz=120, num_updates=14610, lr=2.41976e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=59799 2023-05-01 19:10:26 - progress_bar.py[line:274] - INFO: epoch 003: 2561 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7922.4, nsentences=120, sample_size=3754.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=2008.2, ups=0.25, wpb=7922.4, bsz=120, num_updates=14620, lr=2.41923e-05, gnorm=0.965, clip=20, loss_scale=64, train_wall=39, gb_free=27, wall=59838 2023-05-01 19:11:05 - progress_bar.py[line:274] - INFO: epoch 003: 2571 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7570.3, nsentences=120, sample_size=4224.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1920.8, ups=0.25, wpb=7570.3, 
bsz=120, num_updates=14630, lr=2.4187e-05, gnorm=0.913, clip=0, loss_scale=64, train_wall=39, gb_free=30.6, wall=59878 2023-05-01 19:11:45 - progress_bar.py[line:274] - INFO: epoch 003: 2581 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7661.4, nsentences=120, sample_size=3822.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1936.3, ups=0.25, wpb=7661.4, bsz=120, num_updates=14640, lr=2.41817e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=39, gb_free=29.4, wall=59917 2023-05-01 19:12:25 - progress_bar.py[line:274] - INFO: epoch 003: 2591 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7576.7, nsentences=120, sample_size=4336.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1872.9, ups=0.25, wpb=7576.7, bsz=120, num_updates=14650, lr=2.41764e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=59958 2023-05-01 19:13:05 - progress_bar.py[line:274] - INFO: epoch 003: 2601 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7799.9, nsentences=120, sample_size=3807.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1976.5, ups=0.25, wpb=7799.9, bsz=120, num_updates=14660, lr=2.41711e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=39, gb_free=31.5, wall=59997 2023-05-01 19:13:44 - progress_bar.py[line:274] - INFO: epoch 003: 2611 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7794.2, nsentences=120, sample_size=3686.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1986.4, ups=0.25, wpb=7794.2, bsz=120, num_updates=14670, lr=2.41659e-05, gnorm=0.97, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=60036 2023-05-01 19:14:23 - progress_bar.py[line:274] - INFO: epoch 003: 2621 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7545.2, nsentences=120, sample_size=3934.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1943.3, ups=0.26, wpb=7545.2, bsz=120, num_updates=14680, lr=2.41606e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=39, 
gb_free=29.2, wall=60075 2023-05-01 19:15:03 - progress_bar.py[line:274] - INFO: epoch 003: 2631 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7827.8, nsentences=120, sample_size=3929.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1956.8, ups=0.25, wpb=7827.8, bsz=120, num_updates=14690, lr=2.41553e-05, gnorm=0.918, clip=0, loss_scale=64, train_wall=40, gb_free=30.4, wall=60115 2023-05-01 19:15:43 - progress_bar.py[line:274] - INFO: epoch 003: 2641 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7735.8, nsentences=120, sample_size=4064.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1935.6, ups=0.25, wpb=7735.8, bsz=120, num_updates=14700, lr=2.415e-05, gnorm=0.968, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=60155 2023-05-01 19:16:23 - progress_bar.py[line:274] - INFO: epoch 003: 2651 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7554.7, nsentences=120, sample_size=4034.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1877, ups=0.25, wpb=7554.7, bsz=120, num_updates=14710, lr=2.41447e-05, gnorm=0.958, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=60195 2023-05-01 19:17:03 - progress_bar.py[line:274] - INFO: epoch 003: 2661 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7872.4, nsentences=120, sample_size=4167.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1942.2, ups=0.25, wpb=7872.4, bsz=120, num_updates=14720, lr=2.41394e-05, gnorm=0.931, clip=0, loss_scale=64, train_wall=40, gb_free=30.4, wall=60236 2023-05-01 19:17:44 - progress_bar.py[line:274] - INFO: epoch 003: 2671 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7792.6, nsentences=120, sample_size=3641.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1904.1, ups=0.24, wpb=7792.6, bsz=120, num_updates=14730, lr=2.41342e-05, gnorm=0.958, clip=20, loss_scale=64, train_wall=41, gb_free=30.7, wall=60277 2023-05-01 19:18:24 - progress_bar.py[line:274] - INFO: epoch 003: 2681 / 
6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7913.5, nsentences=120, sample_size=4002.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2006.9, ups=0.25, wpb=7913.5, bsz=120, num_updates=14740, lr=2.41289e-05, gnorm=0.968, clip=30, loss_scale=64, train_wall=39, gb_free=25.1, wall=60316 2023-05-01 19:19:04 - progress_bar.py[line:274] - INFO: epoch 003: 2691 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7782.4, nsentences=120, sample_size=4350.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1949, ups=0.25, wpb=7782.4, bsz=120, num_updates=14750, lr=2.41236e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=29.2, wall=60356 2023-05-01 19:19:43 - progress_bar.py[line:274] - INFO: epoch 003: 2701 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7712.2, nsentences=120, sample_size=4401.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1962.8, ups=0.25, wpb=7712.2, bsz=120, num_updates=14760, lr=2.41183e-05, gnorm=0.885, clip=10, loss_scale=64, train_wall=39, gb_free=29.6, wall=60395 2023-05-01 19:20:23 - progress_bar.py[line:274] - INFO: epoch 003: 2711 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7669.2, nsentences=120, sample_size=4301.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1929.3, ups=0.25, wpb=7669.2, bsz=120, num_updates=14770, lr=2.4113e-05, gnorm=0.923, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=60435 2023-05-01 19:21:03 - progress_bar.py[line:274] - INFO: epoch 003: 2721 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7980.7, nsentences=120, sample_size=3906.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1968.1, ups=0.25, wpb=7980.7, bsz=120, num_updates=14780, lr=2.41078e-05, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=60476 2023-05-01 19:21:42 - progress_bar.py[line:274] - INFO: epoch 003: 2731 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7666.5, nsentences=120, 
sample_size=4374, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1968.2, ups=0.26, wpb=7666.5, bsz=120, num_updates=14790, lr=2.41025e-05, gnorm=0.934, clip=20, loss_scale=64, train_wall=39, gb_free=29.3, wall=60515 2023-05-01 19:22:21 - progress_bar.py[line:274] - INFO: epoch 003: 2741 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7446.5, nsentences=120, sample_size=4006, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1921.6, ups=0.26, wpb=7446.5, bsz=120, num_updates=14800, lr=2.40972e-05, gnorm=0.92, clip=10, loss_scale=64, train_wall=39, gb_free=29.9, wall=60553 2023-05-01 19:23:01 - progress_bar.py[line:274] - INFO: epoch 003: 2751 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7529.7, nsentences=120, sample_size=3835.2, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1899.2, ups=0.25, wpb=7529.7, bsz=120, num_updates=14810, lr=2.40919e-05, gnorm=1.033, clip=70, loss_scale=64, train_wall=40, gb_free=30.5, wall=60593 2023-05-01 19:23:40 - progress_bar.py[line:274] - INFO: epoch 003: 2761 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7524.1, nsentences=120, sample_size=3984.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1896.5, ups=0.25, wpb=7524.1, bsz=120, num_updates=14820, lr=2.40866e-05, gnorm=0.972, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=60633 2023-05-01 19:24:20 - progress_bar.py[line:274] - INFO: epoch 003: 2771 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.235, ntokens=7594, nsentences=120, sample_size=3899.3, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1922.8, ups=0.25, wpb=7594, bsz=120, num_updates=14830, lr=2.40813e-05, gnorm=0.973, clip=40, loss_scale=64, train_wall=39, gb_free=30.8, wall=60672 2023-05-01 19:24:59 - progress_bar.py[line:274] - INFO: epoch 003: 2781 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7517.4, nsentences=120, sample_size=3775.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1892.6, ups=0.25, 
wpb=7517.4, bsz=120, num_updates=14840, lr=2.40761e-05, gnorm=0.955, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=60712 2023-05-01 19:25:39 - progress_bar.py[line:274] - INFO: epoch 003: 2791 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7828.8, nsentences=120, sample_size=3918.8, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1982.3, ups=0.25, wpb=7828.8, bsz=120, num_updates=14850, lr=2.40708e-05, gnorm=0.956, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=60751 2023-05-01 19:26:18 - progress_bar.py[line:274] - INFO: epoch 003: 2801 / 6042 loss=2.505, loss_v1=0, loss_v2=0, nll_loss=1.261, ntokens=7880.3, nsentences=120, sample_size=4083.6, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1997.7, ups=0.25, wpb=7880.3, bsz=120, num_updates=14860, lr=2.40655e-05, gnorm=0.939, clip=40, loss_scale=64, train_wall=39, gb_free=30.3, wall=60791 2023-05-01 19:26:59 - progress_bar.py[line:274] - INFO: epoch 003: 2811 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=7779.2, nsentences=120, sample_size=4075.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1926, ups=0.25, wpb=7779.2, bsz=120, num_updates=14870, lr=2.40602e-05, gnorm=0.942, clip=0, loss_scale=64, train_wall=40, gb_free=30.3, wall=60831 2023-05-01 19:27:38 - progress_bar.py[line:274] - INFO: epoch 003: 2821 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7468.6, nsentences=120, sample_size=3930.3, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1890.3, ups=0.25, wpb=7468.6, bsz=120, num_updates=14880, lr=2.40549e-05, gnorm=0.968, clip=30, loss_scale=64, train_wall=39, gb_free=29.7, wall=60871 2023-05-01 19:28:18 - progress_bar.py[line:274] - INFO: epoch 003: 2831 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7746.6, nsentences=120, sample_size=4028.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1967.9, ups=0.25, wpb=7746.6, bsz=120, num_updates=14890, lr=2.40497e-05, gnorm=0.939, clip=20, loss_scale=128, 
train_wall=39, gb_free=28.8, wall=60910 2023-05-01 19:28:29 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 19:29:01 - progress_bar.py[line:274] - INFO: epoch 003: 2842 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7483.2, nsentences=120, sample_size=4099.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1734.5, ups=0.23, wpb=7483.2, bsz=120, num_updates=14900, lr=2.40444e-05, gnorm=0.883, clip=10, loss_scale=64, train_wall=43, gb_free=30.1, wall=60953 2023-05-01 19:29:41 - progress_bar.py[line:274] - INFO: epoch 003: 2852 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7561.4, nsentences=120, sample_size=4298.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1884.7, ups=0.25, wpb=7561.4, bsz=120, num_updates=14910, lr=2.40391e-05, gnorm=0.885, clip=0, loss_scale=64, train_wall=40, gb_free=30, wall=60993 2023-05-01 19:30:20 - progress_bar.py[line:274] - INFO: epoch 003: 2862 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7357, nsentences=120, sample_size=4254.7, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1866.8, ups=0.25, wpb=7357, bsz=120, num_updates=14920, lr=2.40338e-05, gnorm=0.895, clip=0, loss_scale=64, train_wall=39, gb_free=27.1, wall=61033 2023-05-01 19:31:00 - progress_bar.py[line:274] - INFO: epoch 003: 2872 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7801.3, nsentences=120, sample_size=3988.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1971.4, ups=0.25, wpb=7801.3, bsz=120, num_updates=14930, lr=2.40285e-05, gnorm=0.938, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=61072 2023-05-01 19:31:40 - progress_bar.py[line:274] - INFO: epoch 003: 2882 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7866.4, nsentences=120, sample_size=4173.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1945.5, ups=0.25, wpb=7866.4, bsz=120, num_updates=14940, lr=2.40232e-05, 
gnorm=0.913, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=61113 2023-05-01 19:32:20 - progress_bar.py[line:274] - INFO: epoch 003: 2892 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7660, nsentences=120, sample_size=3886.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1919.9, ups=0.25, wpb=7660, bsz=120, num_updates=14950, lr=2.4018e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=31, wall=61153 2023-05-01 19:33:01 - progress_bar.py[line:274] - INFO: epoch 003: 2902 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7828, nsentences=120, sample_size=3941.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1938.2, ups=0.25, wpb=7828, bsz=120, num_updates=14960, lr=2.40127e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, gb_free=29.4, wall=61193 2023-05-01 19:33:41 - progress_bar.py[line:274] - INFO: epoch 003: 2912 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7642.6, nsentences=120, sample_size=4338.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1885.6, ups=0.25, wpb=7642.6, bsz=120, num_updates=14970, lr=2.40074e-05, gnorm=0.908, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=61234 2023-05-01 19:34:21 - progress_bar.py[line:274] - INFO: epoch 003: 2922 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7335.1, nsentences=120, sample_size=4203.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1827.7, ups=0.25, wpb=7335.1, bsz=120, num_updates=14980, lr=2.40021e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=61274 2023-05-01 19:35:00 - progress_bar.py[line:274] - INFO: epoch 003: 2932 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7825.9, nsentences=120, sample_size=3963.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2000.6, ups=0.26, wpb=7825.9, bsz=120, num_updates=14990, lr=2.39968e-05, gnorm=0.935, clip=0, loss_scale=64, train_wall=39, gb_free=30.1, wall=61313 2023-05-01 19:35:41 - 
progress_bar.py[line:274] - INFO: epoch 003: 2942 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7958.2, nsentences=120, sample_size=4175.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1954.5, ups=0.25, wpb=7958.2, bsz=120, num_updates=15000, lr=2.39915e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=41, gb_free=26.9, wall=61354 2023-05-01 19:35:41 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 19:35:43 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 19:35:43 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 19:35:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:52 - 
task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:35:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:35:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:00 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 19:36:00 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 19:36:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:12 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 19:36:12 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 19:36:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:24 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 19:36:24 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 19:36:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:28 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 19:36:28 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 19:36:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:32 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 19:36:32 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 19:36:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 19:36:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 19:36:33 - progress_bar.py[line:282] - INFO: epoch 003 | valid on 'valid' subset | loss 3.226 | loss_v1 0 | loss_v2 0 | nll_loss 2.062 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.18 | score 0.7505 | wps 3270.2 | wpb 3202.1 | bsz 39.4 | num_updates 15000 | best_score 0.751 2023-05-01 19:36:33 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 3 @ 15000 updates 2023-05-01 19:36:33 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_15000.pt 2023-05-01 19:36:57 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_15000.pt 2023-05-01 19:37:23 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_15000.pt (epoch 3 @ 15000 updates, score 0.7505) (writing took 50.45187128917314 seconds) 2023-05-01 19:38:04 - progress_bar.py[line:274] - INFO: epoch 003: 2952 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7808.5, nsentences=120, sample_size=4216.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=547.3, ups=0.07, wpb=7808.5, bsz=120, num_updates=15010, lr=2.39863e-05, gnorm=0.896, clip=0, loss_scale=64, train_wall=40, gb_free=28.7, wall=61496 2023-05-01 19:38:44 - progress_bar.py[line:274] - INFO: epoch 003: 2962 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7540.9, nsentences=120, sample_size=4252, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1895.8, ups=0.25, wpb=7540.9, bsz=120, num_updates=15020, lr=2.3981e-05, gnorm=0.91, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, 
wall=61536 2023-05-01 19:39:24 - progress_bar.py[line:274] - INFO: epoch 003: 2972 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7658.4, nsentences=120, sample_size=4129.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1902.5, ups=0.25, wpb=7658.4, bsz=120, num_updates=15030, lr=2.39757e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=40, gb_free=28.9, wall=61576 2023-05-01 19:40:04 - progress_bar.py[line:274] - INFO: epoch 003: 2982 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7598.2, nsentences=120, sample_size=4259.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1898.9, ups=0.25, wpb=7598.2, bsz=120, num_updates=15040, lr=2.39704e-05, gnorm=0.922, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=61616 2023-05-01 19:40:44 - progress_bar.py[line:274] - INFO: epoch 003: 2992 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7936.2, nsentences=120, sample_size=4126.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1986.7, ups=0.25, wpb=7936.2, bsz=120, num_updates=15050, lr=2.39651e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=24.6, wall=61656 2023-05-01 19:41:24 - progress_bar.py[line:274] - INFO: epoch 003: 3002 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7698.9, nsentences=120, sample_size=4290.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1907.5, ups=0.25, wpb=7698.9, bsz=120, num_updates=15060, lr=2.39599e-05, gnorm=0.918, clip=10, loss_scale=64, train_wall=40, gb_free=31.2, wall=61697 2023-05-01 19:42:04 - progress_bar.py[line:274] - INFO: epoch 003: 3012 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7767.7, nsentences=120, sample_size=3960.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1959.2, ups=0.25, wpb=7767.7, bsz=120, num_updates=15070, lr=2.39546e-05, gnorm=0.96, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=61736 2023-05-01 19:42:44 - progress_bar.py[line:274] - INFO: epoch 003: 3022 / 6042 
loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7625.1, nsentences=120, sample_size=3837.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1922, ups=0.25, wpb=7625.1, bsz=120, num_updates=15080, lr=2.39493e-05, gnorm=0.968, clip=30, loss_scale=64, train_wall=40, gb_free=30.8, wall=61776 2023-05-01 19:43:23 - progress_bar.py[line:274] - INFO: epoch 003: 3032 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7743.8, nsentences=120, sample_size=3931.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1947, ups=0.25, wpb=7743.8, bsz=120, num_updates=15090, lr=2.3944e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=40, gb_free=31.3, wall=61816 2023-05-01 19:44:03 - progress_bar.py[line:274] - INFO: epoch 003: 3042 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7944.5, nsentences=120, sample_size=3861.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=2013.9, ups=0.25, wpb=7944.5, bsz=120, num_updates=15100, lr=2.39387e-05, gnorm=0.952, clip=30, loss_scale=64, train_wall=39, gb_free=30.8, wall=61855 2023-05-01 19:44:43 - progress_bar.py[line:274] - INFO: epoch 003: 3052 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7816.3, nsentences=120, sample_size=3982.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1920.5, ups=0.25, wpb=7816.3, bsz=120, num_updates=15110, lr=2.39334e-05, gnorm=0.925, clip=10, loss_scale=64, train_wall=41, gb_free=29.2, wall=61896 2023-05-01 19:45:23 - progress_bar.py[line:274] - INFO: epoch 003: 3062 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7647.3, nsentences=120, sample_size=4414.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1912.3, ups=0.25, wpb=7647.3, bsz=120, num_updates=15120, lr=2.39282e-05, gnorm=0.928, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=61936 2023-05-01 19:46:03 - progress_bar.py[line:274] - INFO: epoch 003: 3072 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7566, nsentences=120, 
sample_size=4063, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1933.3, ups=0.26, wpb=7566, bsz=120, num_updates=15130, lr=2.39229e-05, gnorm=0.951, clip=10, loss_scale=64, train_wall=39, gb_free=29.2, wall=61975 2023-05-01 19:46:42 - progress_bar.py[line:274] - INFO: epoch 003: 3082 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7834.7, nsentences=120, sample_size=3778.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1991.1, ups=0.25, wpb=7834.7, bsz=120, num_updates=15140, lr=2.39176e-05, gnorm=0.971, clip=40, loss_scale=64, train_wall=39, gb_free=29.6, wall=62014 2023-05-01 19:47:22 - progress_bar.py[line:274] - INFO: epoch 003: 3092 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7802.8, nsentences=120, sample_size=3938.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1957.6, ups=0.25, wpb=7802.8, bsz=120, num_updates=15150, lr=2.39123e-05, gnorm=0.971, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=62054 2023-05-01 19:48:01 - progress_bar.py[line:274] - INFO: epoch 003: 3102 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7870.7, nsentences=120, sample_size=3960.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1995.9, ups=0.25, wpb=7870.7, bsz=120, num_updates=15160, lr=2.3907e-05, gnorm=0.977, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=62094 2023-05-01 19:48:41 - progress_bar.py[line:274] - INFO: epoch 003: 3112 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7588.8, nsentences=120, sample_size=4137.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1887.5, ups=0.25, wpb=7588.8, bsz=120, num_updates=15170, lr=2.39018e-05, gnorm=0.913, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=62134 2023-05-01 19:49:22 - progress_bar.py[line:274] - INFO: epoch 003: 3122 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7484.5, nsentences=120, sample_size=3977, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1864.4, ups=0.25, 
wpb=7484.5, bsz=120, num_updates=15180, lr=2.38965e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=62174 2023-05-01 19:50:02 - progress_bar.py[line:274] - INFO: epoch 003: 3132 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7535.7, nsentences=120, sample_size=4053.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1859.3, ups=0.25, wpb=7535.7, bsz=120, num_updates=15190, lr=2.38912e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=62215 2023-05-01 19:50:42 - progress_bar.py[line:274] - INFO: epoch 003: 3142 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7548, nsentences=120, sample_size=4027.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1904.4, ups=0.25, wpb=7548, bsz=120, num_updates=15200, lr=2.38859e-05, gnorm=0.947, clip=10, loss_scale=64, train_wall=40, gb_free=26.8, wall=62254 2023-05-01 19:51:22 - progress_bar.py[line:274] - INFO: epoch 003: 3152 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7812.5, nsentences=120, sample_size=4242.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1949.1, ups=0.25, wpb=7812.5, bsz=120, num_updates=15210, lr=2.38806e-05, gnorm=0.96, clip=30, loss_scale=64, train_wall=40, gb_free=29.3, wall=62294 2023-05-01 19:52:01 - progress_bar.py[line:274] - INFO: epoch 003: 3162 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7761.8, nsentences=120, sample_size=3840.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1992.6, ups=0.26, wpb=7761.8, bsz=120, num_updates=15220, lr=2.38753e-05, gnorm=0.964, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=62333 2023-05-01 19:52:40 - progress_bar.py[line:274] - INFO: epoch 003: 3172 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7453.2, nsentences=120, sample_size=4387.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1882, ups=0.25, wpb=7453.2, bsz=120, num_updates=15230, lr=2.38701e-05, gnorm=0.925, clip=10, loss_scale=64, 
train_wall=40, gb_free=28, wall=62373 2023-05-01 19:53:21 - progress_bar.py[line:274] - INFO: epoch 003: 3182 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7871.9, nsentences=120, sample_size=3701.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1960, ups=0.25, wpb=7871.9, bsz=120, num_updates=15240, lr=2.38648e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=62413 2023-05-01 19:54:00 - progress_bar.py[line:274] - INFO: epoch 003: 3192 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=8056.5, nsentences=120, sample_size=3891.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2025.5, ups=0.25, wpb=8056.5, bsz=120, num_updates=15250, lr=2.38595e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=40, gb_free=24.9, wall=62453 2023-05-01 19:54:40 - progress_bar.py[line:274] - INFO: epoch 003: 3202 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7629.3, nsentences=120, sample_size=4051.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1919, ups=0.25, wpb=7629.3, bsz=120, num_updates=15260, lr=2.38542e-05, gnorm=0.95, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=62493 2023-05-01 19:55:21 - progress_bar.py[line:274] - INFO: epoch 003: 3212 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7878.6, nsentences=120, sample_size=3959.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1940.5, ups=0.25, wpb=7878.6, bsz=120, num_updates=15270, lr=2.38489e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=41, gb_free=30.1, wall=62533 2023-05-01 19:56:01 - progress_bar.py[line:274] - INFO: epoch 003: 3222 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=8076.8, nsentences=120, sample_size=4133.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1999.4, ups=0.25, wpb=8076.8, bsz=120, num_updates=15280, lr=2.38436e-05, gnorm=0.927, clip=0, loss_scale=64, train_wall=40, gb_free=29.2, wall=62574 2023-05-01 19:56:41 - progress_bar.py[line:274] - INFO: 
epoch 003: 3232 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7786.5, nsentences=120, sample_size=4047.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1946.6, ups=0.25, wpb=7786.5, bsz=120, num_updates=15290, lr=2.38384e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=62614 2023-05-01 19:57:22 - progress_bar.py[line:274] - INFO: epoch 003: 3242 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=8029.8, nsentences=120, sample_size=4148.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1980.1, ups=0.25, wpb=8029.8, bsz=120, num_updates=15300, lr=2.38331e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=62654 2023-05-01 19:58:01 - progress_bar.py[line:274] - INFO: epoch 003: 3252 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7773.1, nsentences=120, sample_size=3740.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1960.1, ups=0.25, wpb=7773.1, bsz=120, num_updates=15310, lr=2.38278e-05, gnorm=0.966, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=62694 2023-05-01 19:58:42 - progress_bar.py[line:274] - INFO: epoch 003: 3262 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7813.4, nsentences=120, sample_size=4190.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1936.3, ups=0.25, wpb=7813.4, bsz=120, num_updates=15320, lr=2.38225e-05, gnorm=0.909, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=62734 2023-05-01 19:59:21 - progress_bar.py[line:274] - INFO: epoch 003: 3272 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=7923.9, nsentences=120, sample_size=4113, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=2000.8, ups=0.25, wpb=7923.9, bsz=120, num_updates=15330, lr=2.38172e-05, gnorm=1.037, clip=70, loss_scale=64, train_wall=40, gb_free=30.3, wall=62774 2023-05-01 20:00:02 - progress_bar.py[line:274] - INFO: epoch 003: 3282 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.222, ntokens=7704, 
nsentences=120, sample_size=3867.6, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1911.5, ups=0.25, wpb=7704, bsz=120, num_updates=15340, lr=2.3812e-05, gnorm=0.963, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=62814 2023-05-01 20:00:42 - progress_bar.py[line:274] - INFO: epoch 003: 3292 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7698.2, nsentences=120, sample_size=3882.7, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1918.4, ups=0.25, wpb=7698.2, bsz=120, num_updates=15350, lr=2.38067e-05, gnorm=0.949, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=62854 2023-05-01 20:01:21 - progress_bar.py[line:274] - INFO: epoch 003: 3302 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7551.2, nsentences=120, sample_size=4228, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1910.1, ups=0.25, wpb=7551.2, bsz=120, num_updates=15360, lr=2.38014e-05, gnorm=0.929, clip=10, loss_scale=64, train_wall=39, gb_free=30.3, wall=62894 2023-05-01 20:02:01 - progress_bar.py[line:274] - INFO: epoch 003: 3312 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=7366.8, nsentences=120, sample_size=4087.6, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1860.3, ups=0.25, wpb=7366.8, bsz=120, num_updates=15370, lr=2.37961e-05, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=62933 2023-05-01 20:02:40 - progress_bar.py[line:274] - INFO: epoch 003: 3322 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7771.7, nsentences=120, sample_size=4180.8, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1982.6, ups=0.26, wpb=7771.7, bsz=120, num_updates=15380, lr=2.37908e-05, gnorm=0.939, clip=10, loss_scale=64, train_wall=39, gb_free=29, wall=62973 2023-05-01 20:03:19 - progress_bar.py[line:274] - INFO: epoch 003: 3332 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7224.8, nsentences=120, sample_size=4483.9, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1839, 
ups=0.25, wpb=7224.8, bsz=120, num_updates=15390, lr=2.37855e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=39, gb_free=24.9, wall=63012 2023-05-01 20:04:00 - progress_bar.py[line:274] - INFO: epoch 003: 3342 / 6042 loss=2.511, loss_v1=0, loss_v2=0, nll_loss=1.271, ntokens=7868.5, nsentences=120, sample_size=4117.2, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1958.2, ups=0.25, wpb=7868.5, bsz=120, num_updates=15400, lr=2.37803e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=27.2, wall=63052 2023-05-01 20:04:23 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 20:04:43 - progress_bar.py[line:274] - INFO: epoch 003: 3353 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7587.7, nsentences=120, sample_size=4250.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1761.5, ups=0.23, wpb=7587.7, bsz=120, num_updates=15410, lr=2.3775e-05, gnorm=0.91, clip=20, loss_scale=64, train_wall=43, gb_free=30.1, wall=63095 2023-05-01 20:05:23 - progress_bar.py[line:274] - INFO: epoch 003: 3363 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7997, nsentences=120, sample_size=3990.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1991.8, ups=0.25, wpb=7997, bsz=120, num_updates=15420, lr=2.37697e-05, gnorm=0.916, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=63135 2023-05-01 20:06:02 - progress_bar.py[line:274] - INFO: epoch 003: 3373 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7703.9, nsentences=120, sample_size=3741.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1951.1, ups=0.25, wpb=7703.9, bsz=120, num_updates=15430, lr=2.37644e-05, gnorm=0.97, clip=20, loss_scale=64, train_wall=39, gb_free=29.3, wall=63175 2023-05-01 20:06:42 - progress_bar.py[line:274] - INFO: epoch 003: 3383 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7888.6, nsentences=120, sample_size=3713.5, sample_size_v1=0, 
sample_size_v2=0, ppl=2.26, wps=1985.7, ups=0.25, wpb=7888.6, bsz=120, num_updates=15440, lr=2.37591e-05, gnorm=0.976, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=63214 2023-05-01 20:07:21 - progress_bar.py[line:274] - INFO: epoch 003: 3393 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7430.3, nsentences=120, sample_size=4071.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1900.4, ups=0.26, wpb=7430.3, bsz=120, num_updates=15450, lr=2.37539e-05, gnorm=0.938, clip=20, loss_scale=64, train_wall=39, gb_free=28.6, wall=63254 2023-05-01 20:08:01 - progress_bar.py[line:274] - INFO: epoch 003: 3403 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7623.5, nsentences=120, sample_size=4174.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1922.2, ups=0.25, wpb=7623.5, bsz=120, num_updates=15460, lr=2.37486e-05, gnorm=0.928, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=63293 2023-05-01 20:08:40 - progress_bar.py[line:274] - INFO: epoch 003: 3413 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7670.1, nsentences=120, sample_size=4108.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1958.2, ups=0.26, wpb=7670.1, bsz=120, num_updates=15470, lr=2.37433e-05, gnorm=0.937, clip=10, loss_scale=64, train_wall=39, gb_free=30, wall=63332 2023-05-01 20:09:20 - progress_bar.py[line:274] - INFO: epoch 003: 3423 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7700.7, nsentences=120, sample_size=3942.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1935.8, ups=0.25, wpb=7700.7, bsz=120, num_updates=15480, lr=2.3738e-05, gnorm=0.948, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=63372 2023-05-01 20:10:00 - progress_bar.py[line:274] - INFO: epoch 003: 3433 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7626.6, nsentences=120, sample_size=3777.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1899.8, ups=0.25, wpb=7626.6, bsz=120, num_updates=15490, 
lr=2.37327e-05, gnorm=0.957, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=63412 2023-05-01 20:10:40 - progress_bar.py[line:274] - INFO: epoch 003: 3443 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7706.9, nsentences=120, sample_size=4069.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1942.2, ups=0.25, wpb=7706.9, bsz=120, num_updates=15500, lr=2.37274e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, gb_free=29.3, wall=63452 2023-05-01 20:11:20 - progress_bar.py[line:274] - INFO: epoch 003: 3453 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7961.9, nsentences=120, sample_size=4252.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1967.3, ups=0.25, wpb=7961.9, bsz=120, num_updates=15510, lr=2.37222e-05, gnorm=0.926, clip=10, loss_scale=64, train_wall=40, gb_free=31.3, wall=63492 2023-05-01 20:12:01 - progress_bar.py[line:274] - INFO: epoch 003: 3463 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7987.1, nsentences=120, sample_size=4239.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1967.4, ups=0.25, wpb=7987.1, bsz=120, num_updates=15520, lr=2.37169e-05, gnorm=0.909, clip=0, loss_scale=64, train_wall=41, gb_free=29.5, wall=63533 2023-05-01 20:12:40 - progress_bar.py[line:274] - INFO: epoch 003: 3473 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7540.9, nsentences=120, sample_size=3968.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1907, ups=0.25, wpb=7540.9, bsz=120, num_updates=15530, lr=2.37116e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=39, gb_free=29.7, wall=63573 2023-05-01 20:13:21 - progress_bar.py[line:274] - INFO: epoch 003: 3483 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7787.4, nsentences=120, sample_size=4096.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1917.4, ups=0.25, wpb=7787.4, bsz=120, num_updates=15540, lr=2.37063e-05, gnorm=0.922, clip=10, loss_scale=64, train_wall=41, gb_free=29.4, wall=63613 
2023-05-01 20:14:00 - progress_bar.py[line:274] - INFO: epoch 003: 3493 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7445.6, nsentences=120, sample_size=4197.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1875.9, ups=0.25, wpb=7445.6, bsz=120, num_updates=15550, lr=2.3701e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=63653 2023-05-01 20:14:41 - progress_bar.py[line:274] - INFO: epoch 003: 3503 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7666.9, nsentences=120, sample_size=4136.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1892.8, ups=0.25, wpb=7666.9, bsz=120, num_updates=15560, lr=2.36957e-05, gnorm=0.94, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=63693 2023-05-01 20:15:20 - progress_bar.py[line:274] - INFO: epoch 003: 3513 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7818.9, nsentences=120, sample_size=4011.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1986.2, ups=0.25, wpb=7818.9, bsz=120, num_updates=15570, lr=2.36905e-05, gnorm=0.99, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=63733 2023-05-01 20:16:00 - progress_bar.py[line:274] - INFO: epoch 003: 3523 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7520, nsentences=120, sample_size=4176.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1894.5, ups=0.25, wpb=7520, bsz=120, num_updates=15580, lr=2.36852e-05, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=63772 2023-05-01 20:16:39 - progress_bar.py[line:274] - INFO: epoch 003: 3533 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7550, nsentences=120, sample_size=3935, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1933.4, ups=0.26, wpb=7550, bsz=120, num_updates=15590, lr=2.36799e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=63812 2023-05-01 20:17:19 - progress_bar.py[line:274] - INFO: epoch 003: 3543 / 6042 loss=2.424, loss_v1=0, 
loss_v2=0, nll_loss=1.175, ntokens=7575.2, nsentences=120, sample_size=4319.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1888.3, ups=0.25, wpb=7575.2, bsz=120, num_updates=15600, lr=2.36746e-05, gnorm=0.916, clip=20, loss_scale=64, train_wall=40, gb_free=28.8, wall=63852 2023-05-01 20:17:59 - progress_bar.py[line:274] - INFO: epoch 003: 3553 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7663.8, nsentences=120, sample_size=3958.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1919.2, ups=0.25, wpb=7663.8, bsz=120, num_updates=15610, lr=2.36693e-05, gnorm=0.944, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=63892 2023-05-01 20:18:38 - progress_bar.py[line:274] - INFO: epoch 003: 3563 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7543.9, nsentences=120, sample_size=3811.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1924.5, ups=0.26, wpb=7543.9, bsz=120, num_updates=15620, lr=2.36641e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=39, gb_free=28.9, wall=63931 2023-05-01 20:19:18 - progress_bar.py[line:274] - INFO: epoch 003: 3573 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7743.9, nsentences=120, sample_size=3942.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1949.3, ups=0.25, wpb=7743.9, bsz=120, num_updates=15630, lr=2.36588e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=63970 2023-05-01 20:19:58 - progress_bar.py[line:274] - INFO: epoch 003: 3583 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7640.3, nsentences=120, sample_size=4132.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1915.5, ups=0.25, wpb=7640.3, bsz=120, num_updates=15640, lr=2.36535e-05, gnorm=0.968, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=64010 2023-05-01 20:20:38 - progress_bar.py[line:274] - INFO: epoch 003: 3593 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7718, nsentences=120, sample_size=4146.5, 
sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1924.7, ups=0.25, wpb=7718, bsz=120, num_updates=15650, lr=2.36482e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=64050 2023-05-01 20:21:18 - progress_bar.py[line:274] - INFO: epoch 003: 3603 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7995.4, nsentences=120, sample_size=4002.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1985.2, ups=0.25, wpb=7995.4, bsz=120, num_updates=15660, lr=2.36429e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=64091 2023-05-01 20:21:58 - progress_bar.py[line:274] - INFO: epoch 003: 3613 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7594, nsentences=120, sample_size=3884.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1926.9, ups=0.25, wpb=7594, bsz=120, num_updates=15670, lr=2.36376e-05, gnorm=0.984, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=64130 2023-05-01 20:22:38 - progress_bar.py[line:274] - INFO: epoch 003: 3623 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7689, nsentences=120, sample_size=4160.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1929.4, ups=0.25, wpb=7689, bsz=120, num_updates=15680, lr=2.36324e-05, gnorm=0.964, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=64170 2023-05-01 20:23:17 - progress_bar.py[line:274] - INFO: epoch 003: 3633 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7474.6, nsentences=120, sample_size=4084.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1880.9, ups=0.25, wpb=7474.6, bsz=120, num_updates=15690, lr=2.36271e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=64210 2023-05-01 20:23:58 - progress_bar.py[line:274] - INFO: epoch 003: 3643 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7866.7, nsentences=120, sample_size=3758.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1918.2, ups=0.24, wpb=7866.7, bsz=120, 
num_updates=15700, lr=2.36218e-05, gnorm=0.984, clip=50, loss_scale=64, train_wall=41, gb_free=30.5, wall=64251 2023-05-01 20:24:39 - progress_bar.py[line:274] - INFO: epoch 003: 3653 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7665.1, nsentences=120, sample_size=4154.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1885, ups=0.25, wpb=7665.1, bsz=120, num_updates=15710, lr=2.36165e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=41, gb_free=28.2, wall=64291 2023-05-01 20:25:18 - progress_bar.py[line:274] - INFO: epoch 003: 3663 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7829.6, nsentences=120, sample_size=4068.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2009.6, ups=0.26, wpb=7829.6, bsz=120, num_updates=15720, lr=2.36112e-05, gnorm=1.01, clip=30, loss_scale=64, train_wall=39, gb_free=30.6, wall=64330 2023-05-01 20:25:59 - progress_bar.py[line:274] - INFO: epoch 003: 3673 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7766.8, nsentences=120, sample_size=3805.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1903.9, ups=0.25, wpb=7766.8, bsz=120, num_updates=15730, lr=2.3606e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=41, gb_free=30.9, wall=64371 2023-05-01 20:26:39 - progress_bar.py[line:274] - INFO: epoch 003: 3683 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.232, ntokens=7737.5, nsentences=120, sample_size=3697.2, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1914.5, ups=0.25, wpb=7737.5, bsz=120, num_updates=15740, lr=2.36007e-05, gnorm=0.993, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=64412 2023-05-01 20:27:18 - progress_bar.py[line:274] - INFO: epoch 003: 3693 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7615.4, nsentences=120, sample_size=4028.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1937.4, ups=0.25, wpb=7615.4, bsz=120, num_updates=15750, lr=2.35954e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=39, 
gb_free=30, wall=64451 2023-05-01 20:27:59 - progress_bar.py[line:274] - INFO: epoch 003: 3703 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7611.9, nsentences=120, sample_size=3804.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1889.7, ups=0.25, wpb=7611.9, bsz=120, num_updates=15760, lr=2.35901e-05, gnorm=0.952, clip=10, loss_scale=64, train_wall=40, gb_free=28.9, wall=64491 2023-05-01 20:28:38 - progress_bar.py[line:274] - INFO: epoch 003: 3713 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7731.5, nsentences=120, sample_size=3952.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1969.4, ups=0.25, wpb=7731.5, bsz=120, num_updates=15770, lr=2.35848e-05, gnorm=0.966, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=64530 2023-05-01 20:29:18 - progress_bar.py[line:274] - INFO: epoch 003: 3723 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7759.7, nsentences=120, sample_size=4249.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1937.5, ups=0.25, wpb=7759.7, bsz=120, num_updates=15780, lr=2.35795e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=40, gb_free=30.7, wall=64571 2023-05-01 20:29:30 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-01 20:30:02 - progress_bar.py[line:274] - INFO: epoch 003: 3734 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7810.1, nsentences=120, sample_size=3708, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1776.6, ups=0.23, wpb=7810.1, bsz=120, num_updates=15790, lr=2.35743e-05, gnorm=0.992, clip=40, loss_scale=32, train_wall=44, gb_free=30.3, wall=64614 2023-05-01 20:30:41 - progress_bar.py[line:274] - INFO: epoch 003: 3744 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7681, nsentences=120, sample_size=3884.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1950.5, ups=0.25, wpb=7681, bsz=120, num_updates=15800, lr=2.3569e-05, gnorm=0.961, clip=10, 
loss_scale=32, train_wall=39, gb_free=30.7, wall=64654 2023-05-01 20:31:21 - progress_bar.py[line:274] - INFO: epoch 003: 3754 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7747.3, nsentences=120, sample_size=4128.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1941.1, ups=0.25, wpb=7747.3, bsz=120, num_updates=15810, lr=2.35637e-05, gnorm=0.946, clip=20, loss_scale=32, train_wall=40, gb_free=30.3, wall=64694 2023-05-01 20:32:01 - progress_bar.py[line:274] - INFO: epoch 003: 3764 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7974.2, nsentences=120, sample_size=3924.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=2018.8, ups=0.25, wpb=7974.2, bsz=120, num_updates=15820, lr=2.35584e-05, gnorm=0.97, clip=10, loss_scale=32, train_wall=39, gb_free=30, wall=64733 2023-05-01 20:32:40 - progress_bar.py[line:274] - INFO: epoch 003: 3774 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7793.5, nsentences=120, sample_size=4138.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1972.8, ups=0.25, wpb=7793.5, bsz=120, num_updates=15830, lr=2.35531e-05, gnorm=0.916, clip=0, loss_scale=32, train_wall=39, gb_free=30, wall=64773 2023-05-01 20:33:20 - progress_bar.py[line:274] - INFO: epoch 003: 3784 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7694.8, nsentences=120, sample_size=3993.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1930.1, ups=0.25, wpb=7694.8, bsz=120, num_updates=15840, lr=2.35478e-05, gnorm=0.939, clip=10, loss_scale=32, train_wall=40, gb_free=29.5, wall=64813 2023-05-01 20:34:00 - progress_bar.py[line:274] - INFO: epoch 003: 3794 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7916.1, nsentences=120, sample_size=3714.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2001.2, ups=0.25, wpb=7916.1, bsz=120, num_updates=15850, lr=2.35426e-05, gnorm=0.976, clip=40, loss_scale=32, train_wall=39, gb_free=29.9, wall=64852 2023-05-01 20:34:40 - 
progress_bar.py[line:274] - INFO: epoch 003: 3804 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7583.3, nsentences=120, sample_size=3736.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1892.4, ups=0.25, wpb=7583.3, bsz=120, num_updates=15860, lr=2.35373e-05, gnorm=0.982, clip=40, loss_scale=32, train_wall=40, gb_free=30, wall=64892 2023-05-01 20:35:20 - progress_bar.py[line:274] - INFO: epoch 003: 3814 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7610.1, nsentences=120, sample_size=3761.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1899.2, ups=0.25, wpb=7610.1, bsz=120, num_updates=15870, lr=2.3532e-05, gnorm=0.963, clip=30, loss_scale=32, train_wall=40, gb_free=27.8, wall=64932 2023-05-01 20:35:59 - progress_bar.py[line:274] - INFO: epoch 003: 3824 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7607.4, nsentences=120, sample_size=3971.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1945.2, ups=0.26, wpb=7607.4, bsz=120, num_updates=15880, lr=2.35267e-05, gnorm=0.961, clip=30, loss_scale=32, train_wall=39, gb_free=29.7, wall=64971 2023-05-01 20:36:40 - progress_bar.py[line:274] - INFO: epoch 003: 3834 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7544.9, nsentences=120, sample_size=3971.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1854.1, ups=0.25, wpb=7544.9, bsz=120, num_updates=15890, lr=2.35214e-05, gnorm=0.948, clip=20, loss_scale=32, train_wall=41, gb_free=30.8, wall=65012 2023-05-01 20:37:19 - progress_bar.py[line:274] - INFO: epoch 003: 3844 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7692.4, nsentences=120, sample_size=3960.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1947.2, ups=0.25, wpb=7692.4, bsz=120, num_updates=15900, lr=2.35162e-05, gnorm=0.953, clip=20, loss_scale=32, train_wall=39, gb_free=29.6, wall=65052 2023-05-01 20:38:00 - progress_bar.py[line:274] - INFO: epoch 003: 3854 / 6042 loss=2.437, loss_v1=0, loss_v2=0, 
nll_loss=1.183, ntokens=7763.2, nsentences=120, sample_size=3976.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1918, ups=0.25, wpb=7763.2, bsz=120, num_updates=15910, lr=2.35109e-05, gnorm=0.951, clip=20, loss_scale=32, train_wall=40, gb_free=29.8, wall=65092 2023-05-01 20:38:39 - progress_bar.py[line:274] - INFO: epoch 003: 3864 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7646.5, nsentences=120, sample_size=4109.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1921.8, ups=0.25, wpb=7646.5, bsz=120, num_updates=15920, lr=2.35056e-05, gnorm=0.98, clip=20, loss_scale=32, train_wall=40, gb_free=27.3, wall=65132 2023-05-01 20:39:20 - progress_bar.py[line:274] - INFO: epoch 003: 3874 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7681, nsentences=120, sample_size=4086.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1900.7, ups=0.25, wpb=7681, bsz=120, num_updates=15930, lr=2.35003e-05, gnorm=0.937, clip=10, loss_scale=32, train_wall=40, gb_free=29.9, wall=65172 2023-05-01 20:39:59 - progress_bar.py[line:274] - INFO: epoch 003: 3884 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7570.8, nsentences=120, sample_size=3716.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1933.8, ups=0.26, wpb=7570.8, bsz=120, num_updates=15940, lr=2.3495e-05, gnorm=0.962, clip=10, loss_scale=32, train_wall=39, gb_free=29.8, wall=65211 2023-05-01 20:40:40 - progress_bar.py[line:274] - INFO: epoch 003: 3894 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7790.6, nsentences=120, sample_size=4137.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1918.6, ups=0.25, wpb=7790.6, bsz=120, num_updates=15950, lr=2.34897e-05, gnorm=0.954, clip=20, loss_scale=32, train_wall=41, gb_free=29.7, wall=65252 2023-05-01 20:41:20 - progress_bar.py[line:274] - INFO: epoch 003: 3904 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7651.2, nsentences=120, sample_size=3686.1, sample_size_v1=0, 
sample_size_v2=0, ppl=2.21, wps=1918.3, ups=0.25, wpb=7651.2, bsz=120, num_updates=15960, lr=2.34845e-05, gnorm=0.943, clip=20, loss_scale=32, train_wall=40, gb_free=29.6, wall=65292 2023-05-01 20:41:59 - progress_bar.py[line:274] - INFO: epoch 003: 3914 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7688.3, nsentences=120, sample_size=4171.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1940.7, ups=0.25, wpb=7688.3, bsz=120, num_updates=15970, lr=2.34792e-05, gnorm=0.923, clip=0, loss_scale=32, train_wall=40, gb_free=29.3, wall=65332 2023-05-01 20:42:40 - progress_bar.py[line:274] - INFO: epoch 003: 3924 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7668.6, nsentences=120, sample_size=3899, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1883.7, ups=0.25, wpb=7668.6, bsz=120, num_updates=15980, lr=2.34739e-05, gnorm=0.955, clip=10, loss_scale=32, train_wall=41, gb_free=29.8, wall=65372 2023-05-01 20:43:19 - progress_bar.py[line:274] - INFO: epoch 003: 3934 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7707.6, nsentences=120, sample_size=4443.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1955.1, ups=0.25, wpb=7707.6, bsz=120, num_updates=15990, lr=2.34686e-05, gnorm=0.925, clip=10, loss_scale=32, train_wall=39, gb_free=28.8, wall=65412 2023-05-01 20:43:59 - progress_bar.py[line:274] - INFO: epoch 003: 3944 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7598.8, nsentences=120, sample_size=3868.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1920, ups=0.25, wpb=7598.8, bsz=120, num_updates=16000, lr=2.34633e-05, gnorm=0.95, clip=10, loss_scale=32, train_wall=40, gb_free=28.1, wall=65451 2023-05-01 20:43:59 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 20:44:01 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 20:44:01 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 20:44:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 20:44:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:18 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 20:44:18 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 20:44:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 20:44:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:30 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 20:44:30 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 20:44:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 20:44:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:41 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 20:44:41 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 20:44:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:45 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 20:44:45 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 20:44:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:50 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 20:44:50 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 20:44:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 20:44:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 20:44:50 - progress_bar.py[line:282] - INFO: epoch 003 | valid on 'valid' subset | loss 3.216 | loss_v1 0 | loss_v2 0 | nll_loss 2.049 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.14 | score 0.7461 | wps 3289.1 | wpb 3202.1 | bsz 39.4 | num_updates 16000 | best_score 0.751 2023-05-01 20:44:50 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 3 @ 16000 updates 2023-05-01 20:44:51 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_16000.pt 2023-05-01 20:45:15 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_16000.pt 2023-05-01 20:45:28 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_16000.pt (epoch 3 @ 16000 updates, score 0.7461) (writing took 37.86358413286507 seconds) 2023-05-01 20:46:08 - progress_bar.py[line:274] - INFO: epoch 003: 3954 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7627.6, nsentences=120, sample_size=3886.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=592.3, ups=0.08, wpb=7627.6, bsz=120, num_updates=16010, lr=2.34581e-05, gnorm=0.951, clip=30, loss_scale=32, train_wall=39, gb_free=31.2, wall=65580 2023-05-01 20:46:48 - progress_bar.py[line:274] - INFO: epoch 003: 3964 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7710.9, nsentences=120, sample_size=4131.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1898.8, ups=0.25, wpb=7710.9, bsz=120, 
num_updates=16020, lr=2.34528e-05, gnorm=0.938, clip=10, loss_scale=32, train_wall=41, gb_free=29.5, wall=65621 2023-05-01 20:47:28 - progress_bar.py[line:274] - INFO: epoch 003: 3974 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7503.8, nsentences=120, sample_size=4039.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1873, ups=0.25, wpb=7503.8, bsz=120, num_updates=16030, lr=2.34475e-05, gnorm=0.96, clip=30, loss_scale=32, train_wall=40, gb_free=30.3, wall=65661 2023-05-01 20:48:08 - progress_bar.py[line:274] - INFO: epoch 003: 3984 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7789.9, nsentences=120, sample_size=3984.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1964.6, ups=0.25, wpb=7789.9, bsz=120, num_updates=16040, lr=2.34422e-05, gnorm=0.917, clip=0, loss_scale=32, train_wall=40, gb_free=29.9, wall=65700 2023-05-01 20:48:48 - progress_bar.py[line:274] - INFO: epoch 003: 3994 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7812.2, nsentences=120, sample_size=3919.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1960.6, ups=0.25, wpb=7812.2, bsz=120, num_updates=16050, lr=2.34369e-05, gnorm=0.931, clip=0, loss_scale=32, train_wall=40, gb_free=29.7, wall=65740 2023-05-01 20:49:28 - progress_bar.py[line:274] - INFO: epoch 003: 4004 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7666, nsentences=120, sample_size=3942.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1903.7, ups=0.25, wpb=7666, bsz=120, num_updates=16060, lr=2.34316e-05, gnorm=0.95, clip=20, loss_scale=32, train_wall=40, gb_free=30, wall=65781 2023-05-01 20:50:08 - progress_bar.py[line:274] - INFO: epoch 003: 4014 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7731.6, nsentences=120, sample_size=4167.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1924.4, ups=0.25, wpb=7731.6, bsz=120, num_updates=16070, lr=2.34264e-05, gnorm=0.945, clip=20, loss_scale=32, train_wall=40, gb_free=30.4, 
wall=65821 2023-05-01 20:50:48 - progress_bar.py[line:274] - INFO: epoch 003: 4024 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7625.8, nsentences=119.2, sample_size=4019.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1899.8, ups=0.25, wpb=7625.8, bsz=119.2, num_updates=16080, lr=2.34211e-05, gnorm=0.933, clip=0, loss_scale=32, train_wall=40, gb_free=30.4, wall=65861 2023-05-01 20:51:27 - progress_bar.py[line:274] - INFO: epoch 003: 4034 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7920.5, nsentences=120, sample_size=3737.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2035.9, ups=0.26, wpb=7920.5, bsz=120, num_updates=16090, lr=2.34158e-05, gnorm=0.954, clip=20, loss_scale=32, train_wall=39, gb_free=29.5, wall=65900 2023-05-01 20:52:08 - progress_bar.py[line:274] - INFO: epoch 003: 4044 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7983.5, nsentences=120, sample_size=4085.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1984.3, ups=0.25, wpb=7983.5, bsz=120, num_updates=16100, lr=2.34105e-05, gnorm=0.908, clip=0, loss_scale=32, train_wall=40, gb_free=30.4, wall=65940 2023-05-01 20:52:48 - progress_bar.py[line:274] - INFO: epoch 003: 4054 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7477.6, nsentences=120, sample_size=4095.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1843.2, ups=0.25, wpb=7477.6, bsz=120, num_updates=16110, lr=2.34052e-05, gnorm=0.947, clip=30, loss_scale=32, train_wall=40, gb_free=29.2, wall=65981 2023-05-01 20:53:28 - progress_bar.py[line:274] - INFO: epoch 003: 4064 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7561.8, nsentences=120, sample_size=4208.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1911.5, ups=0.25, wpb=7561.8, bsz=120, num_updates=16120, lr=2.33999e-05, gnorm=0.916, clip=0, loss_scale=32, train_wall=39, gb_free=29.2, wall=66020 2023-05-01 20:54:07 - progress_bar.py[line:274] - INFO: epoch 003: 4074 / 6042 
loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7762.1, nsentences=120, sample_size=4167.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1963, ups=0.25, wpb=7762.1, bsz=120, num_updates=16130, lr=2.33947e-05, gnorm=0.935, clip=10, loss_scale=32, train_wall=39, gb_free=29.4, wall=66060 2023-05-01 20:54:47 - progress_bar.py[line:274] - INFO: epoch 003: 4084 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7899.8, nsentences=120, sample_size=3733.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1995.2, ups=0.25, wpb=7899.8, bsz=120, num_updates=16140, lr=2.33894e-05, gnorm=0.991, clip=40, loss_scale=32, train_wall=40, gb_free=29.8, wall=66099 2023-05-01 20:55:27 - progress_bar.py[line:274] - INFO: epoch 003: 4094 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7709.3, nsentences=120, sample_size=4027.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1912.7, ups=0.25, wpb=7709.3, bsz=120, num_updates=16150, lr=2.33841e-05, gnorm=0.933, clip=10, loss_scale=32, train_wall=40, gb_free=30.8, wall=66140 2023-05-01 20:56:07 - progress_bar.py[line:274] - INFO: epoch 003: 4104 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7625.9, nsentences=120, sample_size=4571, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1903, ups=0.25, wpb=7625.9, bsz=120, num_updates=16160, lr=2.33788e-05, gnorm=0.912, clip=0, loss_scale=32, train_wall=40, gb_free=30.2, wall=66180 2023-05-01 20:56:47 - progress_bar.py[line:274] - INFO: epoch 003: 4114 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7839.3, nsentences=120, sample_size=4059.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1964, ups=0.25, wpb=7839.3, bsz=120, num_updates=16170, lr=2.33735e-05, gnorm=0.95, clip=20, loss_scale=32, train_wall=40, gb_free=29.8, wall=66220 2023-05-01 20:57:27 - progress_bar.py[line:274] - INFO: epoch 003: 4124 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7550.3, nsentences=120, sample_size=4270.3, 
sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1905.5, ups=0.25, wpb=7550.3, bsz=120, num_updates=16180, lr=2.33683e-05, gnorm=0.931, clip=0, loss_scale=32, train_wall=40, gb_free=30.7, wall=66259 2023-05-01 20:58:07 - progress_bar.py[line:274] - INFO: epoch 003: 4134 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7801.6, nsentences=120, sample_size=4463.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1960.6, ups=0.25, wpb=7801.6, bsz=120, num_updates=16190, lr=2.3363e-05, gnorm=0.907, clip=10, loss_scale=32, train_wall=40, gb_free=31, wall=66299 2023-05-01 20:58:46 - progress_bar.py[line:274] - INFO: epoch 003: 4144 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=7931.2, nsentences=120, sample_size=4081.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1995.2, ups=0.25, wpb=7931.2, bsz=120, num_updates=16200, lr=2.33577e-05, gnorm=0.938, clip=10, loss_scale=32, train_wall=40, gb_free=29.4, wall=66339 2023-05-01 20:59:26 - progress_bar.py[line:274] - INFO: epoch 003: 4154 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7812.6, nsentences=120, sample_size=4390, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1978.3, ups=0.25, wpb=7812.6, bsz=120, num_updates=16210, lr=2.33524e-05, gnorm=0.91, clip=0, loss_scale=32, train_wall=39, gb_free=30.4, wall=66378 2023-05-01 21:00:05 - progress_bar.py[line:274] - INFO: epoch 003: 4164 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7563, nsentences=120, sample_size=4034.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1930.1, ups=0.26, wpb=7563, bsz=120, num_updates=16220, lr=2.33471e-05, gnorm=0.921, clip=0, loss_scale=32, train_wall=39, gb_free=30.7, wall=66417 2023-05-01 21:00:44 - progress_bar.py[line:274] - INFO: epoch 003: 4174 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7521.4, nsentences=120, sample_size=4251.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1908.1, ups=0.25, wpb=7521.4, bsz=120, 
num_updates=16230, lr=2.33418e-05, gnorm=0.934, clip=10, loss_scale=32, train_wall=39, gb_free=30, wall=66457 2023-05-01 21:01:24 - progress_bar.py[line:274] - INFO: epoch 003: 4184 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7833, nsentences=120, sample_size=4151.3, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1975.2, ups=0.25, wpb=7833, bsz=120, num_updates=16240, lr=2.33366e-05, gnorm=0.953, clip=20, loss_scale=32, train_wall=40, gb_free=30.3, wall=66497 2023-05-01 21:02:04 - progress_bar.py[line:274] - INFO: epoch 003: 4194 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7549.9, nsentences=120, sample_size=4182.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1889, ups=0.25, wpb=7549.9, bsz=120, num_updates=16250, lr=2.33313e-05, gnorm=0.932, clip=10, loss_scale=32, train_wall=40, gb_free=26.9, wall=66536 2023-05-01 21:02:44 - progress_bar.py[line:274] - INFO: epoch 003: 4204 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7725.8, nsentences=120, sample_size=4108.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1954.2, ups=0.25, wpb=7725.8, bsz=120, num_updates=16260, lr=2.3326e-05, gnorm=0.962, clip=30, loss_scale=32, train_wall=39, gb_free=30.1, wall=66576 2023-05-01 21:03:23 - progress_bar.py[line:274] - INFO: epoch 003: 4214 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7753.9, nsentences=120, sample_size=4049.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1953.2, ups=0.25, wpb=7753.9, bsz=120, num_updates=16270, lr=2.33207e-05, gnorm=0.936, clip=10, loss_scale=32, train_wall=40, gb_free=29.4, wall=66616 2023-05-01 21:04:03 - progress_bar.py[line:274] - INFO: epoch 003: 4224 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7753.5, nsentences=120, sample_size=3884.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1964.6, ups=0.25, wpb=7753.5, bsz=120, num_updates=16280, lr=2.33154e-05, gnorm=0.938, clip=20, loss_scale=32, train_wall=39, gb_free=31, 
wall=66655 2023-05-01 21:04:42 - progress_bar.py[line:274] - INFO: epoch 003: 4234 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7637.6, nsentences=120, sample_size=4205.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1929.6, ups=0.25, wpb=7637.6, bsz=120, num_updates=16290, lr=2.33102e-05, gnorm=0.945, clip=20, loss_scale=32, train_wall=40, gb_free=28.9, wall=66695 2023-05-01 21:05:22 - progress_bar.py[line:274] - INFO: epoch 003: 4244 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7483.1, nsentences=120, sample_size=3905.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1900.5, ups=0.25, wpb=7483.1, bsz=120, num_updates=16300, lr=2.33049e-05, gnorm=0.987, clip=40, loss_scale=64, train_wall=39, gb_free=30.7, wall=66734 2023-05-01 21:06:01 - progress_bar.py[line:274] - INFO: epoch 003: 4254 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7873, nsentences=120, sample_size=4410.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1976.9, ups=0.25, wpb=7873, bsz=120, num_updates=16310, lr=2.32996e-05, gnorm=0.905, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=66774 2023-05-01 21:06:41 - progress_bar.py[line:274] - INFO: epoch 003: 4264 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7305.6, nsentences=120, sample_size=4155.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1846, ups=0.25, wpb=7305.6, bsz=120, num_updates=16320, lr=2.32943e-05, gnorm=0.929, clip=0, loss_scale=64, train_wall=39, gb_free=29.4, wall=66814 2023-05-01 21:07:22 - progress_bar.py[line:274] - INFO: epoch 003: 4274 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=8165.7, nsentences=120, sample_size=3844, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=2004.2, ups=0.25, wpb=8165.7, bsz=120, num_updates=16330, lr=2.3289e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=41, gb_free=29.7, wall=66854 2023-05-01 21:08:02 - progress_bar.py[line:274] - INFO: epoch 003: 4284 / 6042 loss=2.438, 
loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7650.1, nsentences=120, sample_size=4291.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1884.4, ups=0.25, wpb=7650.1, bsz=120, num_updates=16340, lr=2.32837e-05, gnorm=0.93, clip=30, loss_scale=64, train_wall=41, gb_free=29.8, wall=66895 2023-05-01 21:08:42 - progress_bar.py[line:274] - INFO: epoch 003: 4294 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7705.6, nsentences=120, sample_size=3792.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1926.2, ups=0.25, wpb=7705.6, bsz=120, num_updates=16350, lr=2.32785e-05, gnorm=0.951, clip=10, loss_scale=64, train_wall=40, gb_free=31, wall=66935 2023-05-01 21:09:23 - progress_bar.py[line:274] - INFO: epoch 003: 4304 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7777.7, nsentences=120, sample_size=3947.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1920.5, ups=0.25, wpb=7777.7, bsz=120, num_updates=16360, lr=2.32732e-05, gnorm=0.938, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=66975 2023-05-01 21:10:03 - progress_bar.py[line:274] - INFO: epoch 003: 4314 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7393.8, nsentences=120, sample_size=3837.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1864.7, ups=0.25, wpb=7393.8, bsz=120, num_updates=16370, lr=2.32679e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=67015 2023-05-01 21:10:42 - progress_bar.py[line:274] - INFO: epoch 003: 4324 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7940.7, nsentences=120, sample_size=4161.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=2004.6, ups=0.25, wpb=7940.7, bsz=120, num_updates=16380, lr=2.32626e-05, gnorm=0.948, clip=20, loss_scale=64, train_wall=40, gb_free=27.2, wall=67055 2023-05-01 21:11:22 - progress_bar.py[line:274] - INFO: epoch 003: 4334 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7725.4, nsentences=120, sample_size=3869.7, 
sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1962.9, ups=0.25, wpb=7725.4, bsz=120, num_updates=16390, lr=2.32573e-05, gnorm=0.953, clip=30, loss_scale=64, train_wall=39, gb_free=31.5, wall=67094 2023-05-01 21:12:01 - progress_bar.py[line:274] - INFO: epoch 003: 4344 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7609, nsentences=120, sample_size=4017, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1905.7, ups=0.25, wpb=7609, bsz=120, num_updates=16400, lr=2.3252e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=67134 2023-05-01 21:12:41 - progress_bar.py[line:274] - INFO: epoch 003: 4354 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7547.6, nsentences=120, sample_size=4199.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1915.9, ups=0.25, wpb=7547.6, bsz=120, num_updates=16410, lr=2.32468e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=39, gb_free=30.6, wall=67173 2023-05-01 21:13:20 - progress_bar.py[line:274] - INFO: epoch 003: 4364 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7406.4, nsentences=120, sample_size=4024, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1873.7, ups=0.25, wpb=7406.4, bsz=120, num_updates=16420, lr=2.32415e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=67213 2023-05-01 21:14:01 - progress_bar.py[line:274] - INFO: epoch 003: 4374 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7639.3, nsentences=120, sample_size=4061.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1899.7, ups=0.25, wpb=7639.3, bsz=120, num_updates=16430, lr=2.32362e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=67253 2023-05-01 21:14:41 - progress_bar.py[line:274] - INFO: epoch 003: 4384 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7544.3, nsentences=120, sample_size=3773.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1881.5, ups=0.25, wpb=7544.3, bsz=120, 
num_updates=16440, lr=2.32309e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=67293 2023-05-01 21:15:21 - progress_bar.py[line:274] - INFO: epoch 003: 4394 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7948.1, nsentences=120, sample_size=4036.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1975.7, ups=0.25, wpb=7948.1, bsz=120, num_updates=16450, lr=2.32256e-05, gnorm=0.926, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=67333 2023-05-01 21:16:01 - progress_bar.py[line:274] - INFO: epoch 003: 4404 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7945.9, nsentences=120, sample_size=3991.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1984.2, ups=0.25, wpb=7945.9, bsz=120, num_updates=16460, lr=2.32204e-05, gnorm=0.991, clip=30, loss_scale=64, train_wall=40, gb_free=29.2, wall=67373 2023-05-01 21:16:41 - progress_bar.py[line:274] - INFO: epoch 003: 4414 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7648.6, nsentences=120, sample_size=3873.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1933.9, ups=0.25, wpb=7648.6, bsz=120, num_updates=16470, lr=2.32151e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=67413 2023-05-01 21:17:20 - progress_bar.py[line:274] - INFO: epoch 003: 4424 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7753.4, nsentences=120, sample_size=4055.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1951.5, ups=0.25, wpb=7753.4, bsz=120, num_updates=16480, lr=2.32098e-05, gnorm=0.92, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=67453 2023-05-01 21:18:00 - progress_bar.py[line:274] - INFO: epoch 003: 4434 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7699.5, nsentences=120, sample_size=4289, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1933, ups=0.25, wpb=7699.5, bsz=120, num_updates=16490, lr=2.32045e-05, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, 
gb_free=29.9, wall=67493 2023-05-01 21:18:40 - progress_bar.py[line:274] - INFO: epoch 003: 4444 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.257, ntokens=8071.5, nsentences=120, sample_size=4313.5, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=2048.1, ups=0.25, wpb=8071.5, bsz=120, num_updates=16500, lr=2.31992e-05, gnorm=0.921, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=67532 2023-05-01 21:19:19 - progress_bar.py[line:274] - INFO: epoch 003: 4454 / 6042 loss=2.47, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7596.3, nsentences=120, sample_size=4009.3, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1927.9, ups=0.25, wpb=7596.3, bsz=120, num_updates=16510, lr=2.31939e-05, gnorm=0.967, clip=20, loss_scale=64, train_wall=39, gb_free=30.7, wall=67571 2023-05-01 21:20:00 - progress_bar.py[line:274] - INFO: epoch 003: 4464 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7778, nsentences=120, sample_size=4309.5, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1905.5, ups=0.24, wpb=7778, bsz=120, num_updates=16520, lr=2.31887e-05, gnorm=0.918, clip=0, loss_scale=64, train_wall=41, gb_free=30.3, wall=67612 2023-05-01 21:20:39 - progress_bar.py[line:274] - INFO: epoch 003: 4474 / 6042 loss=2.478, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7490, nsentences=120, sample_size=4009.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1918.4, ups=0.26, wpb=7490, bsz=120, num_updates=16530, lr=2.31834e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=39, gb_free=29.7, wall=67651 2023-05-01 21:21:18 - progress_bar.py[line:274] - INFO: epoch 003: 4484 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7749.2, nsentences=120, sample_size=4336.8, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1961.6, ups=0.25, wpb=7749.2, bsz=120, num_updates=16540, lr=2.31781e-05, gnorm=0.896, clip=10, loss_scale=64, train_wall=39, gb_free=30.6, wall=67691 2023-05-01 21:21:58 - progress_bar.py[line:274] - INFO: epoch 003: 4494 / 
6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.232, ntokens=7982.1, nsentences=120, sample_size=4064.2, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=2003.7, ups=0.25, wpb=7982.1, bsz=120, num_updates=16550, lr=2.31728e-05, gnorm=0.921, clip=0, loss_scale=64, train_wall=40, gb_free=30.3, wall=67731 2023-05-01 21:22:38 - progress_bar.py[line:274] - INFO: epoch 003: 4504 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7296.8, nsentences=120, sample_size=3859.3, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1837.1, ups=0.25, wpb=7296.8, bsz=120, num_updates=16560, lr=2.31675e-05, gnorm=0.968, clip=40, loss_scale=64, train_wall=40, gb_free=28.4, wall=67770 2023-05-01 21:23:18 - progress_bar.py[line:274] - INFO: epoch 003: 4514 / 6042 loss=2.5, loss_v1=0, loss_v2=0, nll_loss=1.263, ntokens=7901.5, nsentences=120, sample_size=4138.6, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1983.3, ups=0.25, wpb=7901.5, bsz=120, num_updates=16570, lr=2.31623e-05, gnorm=0.927, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=67810 2023-05-01 21:23:58 - progress_bar.py[line:274] - INFO: epoch 003: 4524 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=7687.9, nsentences=120, sample_size=4146.5, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1901.2, ups=0.25, wpb=7687.9, bsz=120, num_updates=16580, lr=2.3157e-05, gnorm=0.928, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=67851 2023-05-01 21:24:37 - progress_bar.py[line:274] - INFO: epoch 003: 4534 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7402.6, nsentences=120, sample_size=4092.1, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1896.5, ups=0.26, wpb=7402.6, bsz=120, num_updates=16590, lr=2.31517e-05, gnorm=0.937, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=67890 2023-05-01 21:25:16 - progress_bar.py[line:274] - INFO: epoch 003: 4544 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.232, ntokens=7723.7, nsentences=120, 
sample_size=3861.6, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1999.8, ups=0.26, wpb=7723.7, bsz=120, num_updates=16600, lr=2.31464e-05, gnorm=0.975, clip=20, loss_scale=64, train_wall=39, gb_free=30.6, wall=67928 2023-05-01 21:25:20 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-01 21:26:00 - progress_bar.py[line:274] - INFO: epoch 003: 4555 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7774.8, nsentences=120, sample_size=4211.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1774.4, ups=0.23, wpb=7774.8, bsz=120, num_updates=16610, lr=2.31411e-05, gnorm=0.872, clip=0, loss_scale=32, train_wall=44, gb_free=29, wall=67972 2023-05-01 21:26:40 - progress_bar.py[line:274] - INFO: epoch 003: 4565 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=8041.7, nsentences=120, sample_size=3503.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1978.8, ups=0.25, wpb=8041.7, bsz=120, num_updates=16620, lr=2.31358e-05, gnorm=0.983, clip=40, loss_scale=32, train_wall=41, gb_free=30.8, wall=68013 2023-05-01 21:27:20 - progress_bar.py[line:274] - INFO: epoch 003: 4575 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7370.3, nsentences=120, sample_size=4352.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1858.4, ups=0.25, wpb=7370.3, bsz=120, num_updates=16630, lr=2.31306e-05, gnorm=0.932, clip=10, loss_scale=32, train_wall=40, gb_free=29.5, wall=68052 2023-05-01 21:27:59 - progress_bar.py[line:274] - INFO: epoch 003: 4585 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=8087.2, nsentences=120, sample_size=3920.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=2054.5, ups=0.25, wpb=8087.2, bsz=120, num_updates=16640, lr=2.31253e-05, gnorm=0.936, clip=30, loss_scale=32, train_wall=39, gb_free=28.5, wall=68092 2023-05-01 21:28:40 - progress_bar.py[line:274] - INFO: epoch 003: 4595 / 6042 loss=2.393, loss_v1=0, loss_v2=0, 
nll_loss=1.141, ntokens=7745.7, nsentences=120, sample_size=4122.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1909.2, ups=0.25, wpb=7745.7, bsz=120, num_updates=16650, lr=2.312e-05, gnorm=0.929, clip=10, loss_scale=32, train_wall=40, gb_free=28.9, wall=68132 2023-05-01 21:29:20 - progress_bar.py[line:274] - INFO: epoch 003: 4605 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7704.3, nsentences=120, sample_size=4193.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1938.5, ups=0.25, wpb=7704.3, bsz=120, num_updates=16660, lr=2.31147e-05, gnorm=0.936, clip=0, loss_scale=32, train_wall=40, gb_free=30.5, wall=68172 2023-05-01 21:29:59 - progress_bar.py[line:274] - INFO: epoch 003: 4615 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7856.4, nsentences=120, sample_size=3936.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1980, ups=0.25, wpb=7856.4, bsz=120, num_updates=16670, lr=2.31094e-05, gnorm=0.932, clip=0, loss_scale=32, train_wall=40, gb_free=30.3, wall=68212 2023-05-01 21:30:39 - progress_bar.py[line:274] - INFO: epoch 003: 4625 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7664.5, nsentences=120, sample_size=4157.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1910.2, ups=0.25, wpb=7664.5, bsz=120, num_updates=16680, lr=2.31041e-05, gnorm=0.941, clip=20, loss_scale=32, train_wall=40, gb_free=30, wall=68252 2023-05-01 21:31:19 - progress_bar.py[line:274] - INFO: epoch 003: 4635 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7455.9, nsentences=120, sample_size=4442.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1877.8, ups=0.25, wpb=7455.9, bsz=120, num_updates=16690, lr=2.30989e-05, gnorm=0.894, clip=0, loss_scale=32, train_wall=40, gb_free=29.8, wall=68292 2023-05-01 21:31:58 - progress_bar.py[line:274] - INFO: epoch 003: 4645 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7748.4, nsentences=120, sample_size=4188.7, sample_size_v1=0, 
sample_size_v2=0, ppl=2.21, wps=1973.3, ups=0.25, wpb=7748.4, bsz=120, num_updates=16700, lr=2.30936e-05, gnorm=0.934, clip=20, loss_scale=32, train_wall=39, gb_free=29.4, wall=68331 2023-05-01 21:32:38 - progress_bar.py[line:274] - INFO: epoch 003: 4655 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7548.6, nsentences=120, sample_size=4194.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1886.1, ups=0.25, wpb=7548.6, bsz=120, num_updates=16710, lr=2.30883e-05, gnorm=0.951, clip=30, loss_scale=32, train_wall=40, gb_free=29.6, wall=68371 2023-05-01 21:33:19 - progress_bar.py[line:274] - INFO: epoch 003: 4665 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7745.3, nsentences=120, sample_size=4192.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1906.2, ups=0.25, wpb=7745.3, bsz=120, num_updates=16720, lr=2.3083e-05, gnorm=0.93, clip=10, loss_scale=32, train_wall=41, gb_free=29.7, wall=68411 2023-05-01 21:33:59 - progress_bar.py[line:274] - INFO: epoch 003: 4675 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7776.9, nsentences=120, sample_size=4113.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1927.9, ups=0.25, wpb=7776.9, bsz=120, num_updates=16730, lr=2.30777e-05, gnorm=0.961, clip=30, loss_scale=32, train_wall=40, gb_free=30.5, wall=68452 2023-05-01 21:34:40 - progress_bar.py[line:274] - INFO: epoch 003: 4685 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7770.2, nsentences=120, sample_size=4092.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1933.7, ups=0.25, wpb=7770.2, bsz=120, num_updates=16740, lr=2.30725e-05, gnorm=0.92, clip=10, loss_scale=32, train_wall=40, gb_free=29.6, wall=68492 2023-05-01 21:35:20 - progress_bar.py[line:274] - INFO: epoch 003: 4695 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7832.4, nsentences=120, sample_size=4059.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1956.1, ups=0.25, wpb=7832.4, bsz=120, num_updates=16750, 
lr=2.30672e-05, gnorm=0.924, clip=0, loss_scale=32, train_wall=40, gb_free=24.8, wall=68532 2023-05-01 21:35:59 - progress_bar.py[line:274] - INFO: epoch 003: 4705 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7707.1, nsentences=120, sample_size=3833.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1966.7, ups=0.26, wpb=7707.1, bsz=120, num_updates=16760, lr=2.30619e-05, gnorm=0.951, clip=10, loss_scale=32, train_wall=39, gb_free=30.5, wall=68571 2023-05-01 21:36:38 - progress_bar.py[line:274] - INFO: epoch 003: 4715 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7717.4, nsentences=120, sample_size=3911.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1958, ups=0.25, wpb=7717.4, bsz=120, num_updates=16770, lr=2.30566e-05, gnorm=0.982, clip=30, loss_scale=32, train_wall=39, gb_free=28.7, wall=68611 2023-05-01 21:37:17 - progress_bar.py[line:274] - INFO: epoch 003: 4725 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7804.5, nsentences=120, sample_size=4159.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2009.6, ups=0.26, wpb=7804.5, bsz=120, num_updates=16780, lr=2.30513e-05, gnorm=0.921, clip=0, loss_scale=32, train_wall=39, gb_free=30.2, wall=68649 2023-05-01 21:37:57 - progress_bar.py[line:274] - INFO: epoch 003: 4735 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7466.9, nsentences=120, sample_size=4091.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1868.2, ups=0.25, wpb=7466.9, bsz=120, num_updates=16790, lr=2.3046e-05, gnorm=0.923, clip=10, loss_scale=32, train_wall=40, gb_free=30, wall=68689 2023-05-01 21:38:37 - progress_bar.py[line:274] - INFO: epoch 003: 4745 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7743.5, nsentences=120, sample_size=4156.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1929.1, ups=0.25, wpb=7743.5, bsz=120, num_updates=16800, lr=2.30408e-05, gnorm=0.95, clip=20, loss_scale=32, train_wall=40, gb_free=30.7, wall=68730 
2023-05-01 21:39:17 - progress_bar.py[line:274] - INFO: epoch 003: 4755 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7690.2, nsentences=120, sample_size=3848.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1932.2, ups=0.25, wpb=7690.2, bsz=120, num_updates=16810, lr=2.30355e-05, gnorm=0.929, clip=10, loss_scale=32, train_wall=40, gb_free=29, wall=68769 2023-05-01 21:39:57 - progress_bar.py[line:274] - INFO: epoch 003: 4765 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7865.8, nsentences=120, sample_size=4220.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1975, ups=0.25, wpb=7865.8, bsz=120, num_updates=16820, lr=2.30302e-05, gnorm=0.912, clip=0, loss_scale=32, train_wall=40, gb_free=28.3, wall=68809 2023-05-01 21:40:38 - progress_bar.py[line:274] - INFO: epoch 003: 4775 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7959.2, nsentences=120, sample_size=3884.5, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1948.5, ups=0.24, wpb=7959.2, bsz=120, num_updates=16830, lr=2.30249e-05, gnorm=0.932, clip=20, loss_scale=32, train_wall=41, gb_free=29.9, wall=68850 2023-05-01 21:41:17 - progress_bar.py[line:274] - INFO: epoch 003: 4785 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7559.9, nsentences=120, sample_size=4017.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1900.8, ups=0.25, wpb=7559.9, bsz=120, num_updates=16840, lr=2.30196e-05, gnorm=0.939, clip=10, loss_scale=32, train_wall=40, gb_free=28.4, wall=68890 2023-05-01 21:41:58 - progress_bar.py[line:274] - INFO: epoch 003: 4795 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7880.1, nsentences=120, sample_size=3955.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1955.3, ups=0.25, wpb=7880.1, bsz=120, num_updates=16850, lr=2.30143e-05, gnorm=0.946, clip=20, loss_scale=32, train_wall=40, gb_free=30.3, wall=68930 2023-05-01 21:42:37 - progress_bar.py[line:274] - INFO: epoch 003: 4805 / 6042 loss=2.41, 
loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7871.6, nsentences=120, sample_size=3849, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1997.1, ups=0.25, wpb=7871.6, bsz=120, num_updates=16860, lr=2.30091e-05, gnorm=0.952, clip=40, loss_scale=32, train_wall=39, gb_free=29.8, wall=68970 2023-05-01 21:43:17 - progress_bar.py[line:274] - INFO: epoch 003: 4815 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7898.3, nsentences=120, sample_size=4010.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1982.2, ups=0.25, wpb=7898.3, bsz=120, num_updates=16870, lr=2.30038e-05, gnorm=0.945, clip=30, loss_scale=32, train_wall=40, gb_free=28, wall=69009 2023-05-01 21:43:57 - progress_bar.py[line:274] - INFO: epoch 003: 4825 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7885, nsentences=120, sample_size=4217, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1974.6, ups=0.25, wpb=7885, bsz=120, num_updates=16880, lr=2.29985e-05, gnorm=0.941, clip=10, loss_scale=32, train_wall=40, gb_free=29.5, wall=69049 2023-05-01 21:44:36 - progress_bar.py[line:274] - INFO: epoch 003: 4835 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7654.9, nsentences=120, sample_size=3988.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1944.7, ups=0.25, wpb=7654.9, bsz=120, num_updates=16890, lr=2.29932e-05, gnorm=0.952, clip=30, loss_scale=32, train_wall=39, gb_free=30, wall=69089 2023-05-01 21:45:16 - progress_bar.py[line:274] - INFO: epoch 003: 4845 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=8137.2, nsentences=120, sample_size=3965.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2044.9, ups=0.25, wpb=8137.2, bsz=120, num_updates=16900, lr=2.29879e-05, gnorm=0.94, clip=20, loss_scale=32, train_wall=40, gb_free=30.5, wall=69129 2023-05-01 21:45:56 - progress_bar.py[line:274] - INFO: epoch 003: 4855 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7963.3, nsentences=120, sample_size=4274.3, 
sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1998, ups=0.25, wpb=7963.3, bsz=120, num_updates=16910, lr=2.29827e-05, gnorm=0.923, clip=0, loss_scale=32, train_wall=40, gb_free=29.4, wall=69168 2023-05-01 21:46:36 - progress_bar.py[line:274] - INFO: epoch 003: 4865 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7665, nsentences=120, sample_size=4190, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1930.1, ups=0.25, wpb=7665, bsz=120, num_updates=16920, lr=2.29774e-05, gnorm=0.916, clip=0, loss_scale=32, train_wall=40, gb_free=29.7, wall=69208 2023-05-01 21:47:15 - progress_bar.py[line:274] - INFO: epoch 003: 4875 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7612.8, nsentences=120, sample_size=3880.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1933.7, ups=0.25, wpb=7612.8, bsz=120, num_updates=16930, lr=2.29721e-05, gnorm=0.961, clip=20, loss_scale=32, train_wall=39, gb_free=30.8, wall=69247 2023-05-01 21:47:54 - progress_bar.py[line:274] - INFO: epoch 003: 4885 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7785.9, nsentences=120, sample_size=4081.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1979.5, ups=0.25, wpb=7785.9, bsz=120, num_updates=16940, lr=2.29668e-05, gnorm=0.956, clip=10, loss_scale=32, train_wall=39, gb_free=29.3, wall=69287 2023-05-01 21:48:34 - progress_bar.py[line:274] - INFO: epoch 003: 4895 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7767.9, nsentences=120, sample_size=4009.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1938.4, ups=0.25, wpb=7767.9, bsz=120, num_updates=16950, lr=2.29615e-05, gnorm=0.95, clip=20, loss_scale=32, train_wall=40, gb_free=30.9, wall=69327 2023-05-01 21:49:14 - progress_bar.py[line:274] - INFO: epoch 003: 4905 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7694.8, nsentences=120, sample_size=3906.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1927.5, ups=0.25, wpb=7694.8, bsz=120, 
num_updates=16960, lr=2.29562e-05, gnorm=0.976, clip=40, loss_scale=32, train_wall=40, gb_free=30.6, wall=69367 2023-05-01 21:49:54 - progress_bar.py[line:274] - INFO: epoch 003: 4915 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7542.9, nsentences=120, sample_size=4181, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1895.1, ups=0.25, wpb=7542.9, bsz=120, num_updates=16970, lr=2.2951e-05, gnorm=0.897, clip=0, loss_scale=32, train_wall=40, gb_free=27.7, wall=69407 2023-05-01 21:50:34 - progress_bar.py[line:274] - INFO: epoch 003: 4925 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7891.7, nsentences=120, sample_size=4177.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1966.9, ups=0.25, wpb=7891.7, bsz=120, num_updates=16980, lr=2.29457e-05, gnorm=0.911, clip=0, loss_scale=32, train_wall=40, gb_free=30, wall=69447 2023-05-01 21:51:15 - progress_bar.py[line:274] - INFO: epoch 003: 4935 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7913.4, nsentences=120, sample_size=3814.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1966.2, ups=0.25, wpb=7913.4, bsz=120, num_updates=16990, lr=2.29404e-05, gnorm=0.943, clip=20, loss_scale=32, train_wall=40, gb_free=30.1, wall=69487 2023-05-01 21:51:54 - progress_bar.py[line:274] - INFO: epoch 003: 4945 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7942, nsentences=120, sample_size=4088.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1989.4, ups=0.25, wpb=7942, bsz=120, num_updates=17000, lr=2.29351e-05, gnorm=0.957, clip=20, loss_scale=32, train_wall=40, gb_free=29.6, wall=69527 2023-05-01 21:51:54 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 21:51:56 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 21:51:56 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 21:51:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:51:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:51:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:51:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:51:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:51:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 21:52:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:13 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 21:52:13 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 21:52:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 21:52:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:25 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 21:52:25 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 21:52:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 21:52:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:36 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 21:52:36 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 21:52:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:40 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 21:52:40 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 21:52:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:45 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 21:52:45 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 21:52:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 21:52:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 21:52:46 - progress_bar.py[line:282] - INFO: epoch 003 | valid on 'valid' subset | loss 3.221 | loss_v1 0 | loss_v2 0 | nll_loss 2.058 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.16 | score 0.7393 | wps 3296.9 | wpb 3202.1 | bsz 39.4 | num_updates 17000 | best_score 0.751 2023-05-01 21:52:46 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 3 @ 17000 updates 2023-05-01 21:52:46 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_17000.pt 2023-05-01 21:53:10 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_17000.pt 2023-05-01 21:53:23 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_17000.pt (epoch 3 @ 17000 updates, score 0.7393) (writing took 37.729026313871145 seconds) 2023-05-01 21:54:02 - progress_bar.py[line:274] - INFO: epoch 003: 4955 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7923.6, nsentences=120, sample_size=3795.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=620.1, ups=0.08, wpb=7923.6, bsz=120, num_updates=17010, lr=2.29298e-05, gnorm=0.966, clip=20, loss_scale=32, train_wall=39, gb_free=30.1, wall=69655 2023-05-01 21:54:42 - progress_bar.py[line:274] - INFO: epoch 003: 4965 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7757.7, nsentences=120, sample_size=3930.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1944.8, ups=0.25, wpb=7757.7, bsz=120, 
num_updates=17020, lr=2.29246e-05, gnorm=0.968, clip=40, loss_scale=32, train_wall=40, gb_free=29.6, wall=69695 2023-05-01 21:55:21 - progress_bar.py[line:274] - INFO: epoch 003: 4975 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7881, nsentences=120, sample_size=4094, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2001.8, ups=0.25, wpb=7881, bsz=120, num_updates=17030, lr=2.29193e-05, gnorm=0.945, clip=20, loss_scale=32, train_wall=39, gb_free=30.2, wall=69734 2023-05-01 21:56:01 - progress_bar.py[line:274] - INFO: epoch 003: 4985 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7530.7, nsentences=120, sample_size=4115.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1914.3, ups=0.25, wpb=7530.7, bsz=120, num_updates=17040, lr=2.2914e-05, gnorm=0.968, clip=30, loss_scale=32, train_wall=39, gb_free=29.7, wall=69773 2023-05-01 21:56:41 - progress_bar.py[line:274] - INFO: epoch 003: 4995 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=8132.1, nsentences=120, sample_size=4091.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=2035.2, ups=0.25, wpb=8132.1, bsz=120, num_updates=17050, lr=2.29087e-05, gnorm=0.937, clip=10, loss_scale=32, train_wall=40, gb_free=28.5, wall=69813 2023-05-01 21:57:20 - progress_bar.py[line:274] - INFO: epoch 003: 5005 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7582.3, nsentences=120, sample_size=3850.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1933.3, ups=0.25, wpb=7582.3, bsz=120, num_updates=17060, lr=2.29034e-05, gnorm=1.019, clip=20, loss_scale=32, train_wall=39, gb_free=29.3, wall=69852 2023-05-01 21:58:00 - progress_bar.py[line:274] - INFO: epoch 003: 5015 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7825.5, nsentences=120, sample_size=3842.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1966.6, ups=0.25, wpb=7825.5, bsz=120, num_updates=17070, lr=2.28981e-05, gnorm=0.978, clip=40, loss_scale=32, train_wall=40, 
gb_free=30.4, wall=69892 2023-05-01 21:58:41 - progress_bar.py[line:274] - INFO: epoch 003: 5025 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7671, nsentences=120, sample_size=3975, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1882.6, ups=0.25, wpb=7671, bsz=120, num_updates=17080, lr=2.28929e-05, gnorm=0.938, clip=10, loss_scale=32, train_wall=41, gb_free=28.8, wall=69933 2023-05-01 21:59:20 - progress_bar.py[line:274] - INFO: epoch 003: 5035 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7770.9, nsentences=120, sample_size=4146.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1959.9, ups=0.25, wpb=7770.9, bsz=120, num_updates=17090, lr=2.28876e-05, gnorm=0.906, clip=0, loss_scale=32, train_wall=40, gb_free=30.2, wall=69973 2023-05-01 22:00:00 - progress_bar.py[line:274] - INFO: epoch 003: 5045 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=8062.5, nsentences=120, sample_size=3951, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2032.9, ups=0.25, wpb=8062.5, bsz=120, num_updates=17100, lr=2.28823e-05, gnorm=0.922, clip=10, loss_scale=32, train_wall=40, gb_free=29.4, wall=70012 2023-05-01 22:00:40 - progress_bar.py[line:274] - INFO: epoch 003: 5055 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7615.8, nsentences=120, sample_size=3885.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1885.2, ups=0.25, wpb=7615.8, bsz=120, num_updates=17110, lr=2.2877e-05, gnorm=0.952, clip=10, loss_scale=32, train_wall=40, gb_free=29.6, wall=70053 2023-05-01 22:01:20 - progress_bar.py[line:274] - INFO: epoch 003: 5065 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7587.4, nsentences=120, sample_size=4128.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1927.4, ups=0.25, wpb=7587.4, bsz=120, num_updates=17120, lr=2.28717e-05, gnorm=0.933, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=70092 2023-05-01 22:01:58 - progress_bar.py[line:274] - INFO: epoch 003: 5075 / 
6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7703.1, nsentences=120, sample_size=3890.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1981.8, ups=0.26, wpb=7703.1, bsz=120, num_updates=17130, lr=2.28664e-05, gnorm=0.948, clip=10, loss_scale=64, train_wall=39, gb_free=30.9, wall=70131 2023-05-01 22:02:39 - progress_bar.py[line:274] - INFO: epoch 003: 5085 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=8014, nsentences=120, sample_size=4293.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1983, ups=0.25, wpb=8014, bsz=120, num_updates=17140, lr=2.28612e-05, gnorm=0.905, clip=0, loss_scale=64, train_wall=40, gb_free=28.7, wall=70171 2023-05-01 22:03:19 - progress_bar.py[line:274] - INFO: epoch 003: 5095 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7445, nsentences=120, sample_size=4057.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1848.3, ups=0.25, wpb=7445, bsz=120, num_updates=17150, lr=2.28559e-05, gnorm=0.951, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=70212 2023-05-01 22:04:00 - progress_bar.py[line:274] - INFO: epoch 003: 5105 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7844.1, nsentences=120, sample_size=3920.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1929.6, ups=0.25, wpb=7844.1, bsz=120, num_updates=17160, lr=2.28506e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=41, gb_free=29.7, wall=70252 2023-05-01 22:04:39 - progress_bar.py[line:274] - INFO: epoch 003: 5115 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7905.5, nsentences=120, sample_size=4175.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1998.8, ups=0.25, wpb=7905.5, bsz=120, num_updates=17170, lr=2.28453e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=70292 2023-05-01 22:05:20 - progress_bar.py[line:274] - INFO: epoch 003: 5125 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7643.4, nsentences=120, sample_size=3984, 
sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1893.7, ups=0.25, wpb=7643.4, bsz=120, num_updates=17180, lr=2.284e-05, gnorm=0.962, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=70332 2023-05-01 22:06:01 - progress_bar.py[line:274] - INFO: epoch 003: 5135 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7525.7, nsentences=120, sample_size=4537.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1832.4, ups=0.24, wpb=7525.7, bsz=120, num_updates=17190, lr=2.28348e-05, gnorm=0.889, clip=0, loss_scale=64, train_wall=41, gb_free=30.2, wall=70373 2023-05-01 22:06:41 - progress_bar.py[line:274] - INFO: epoch 003: 5145 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7607.9, nsentences=120, sample_size=4092.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1876.2, ups=0.25, wpb=7607.9, bsz=120, num_updates=17200, lr=2.28295e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=70414 2023-05-01 22:07:21 - progress_bar.py[line:274] - INFO: epoch 003: 5155 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7642.6, nsentences=120, sample_size=3856.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1906.3, ups=0.25, wpb=7642.6, bsz=120, num_updates=17210, lr=2.28242e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=70454 2023-05-01 22:08:01 - progress_bar.py[line:274] - INFO: epoch 003: 5165 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7801.3, nsentences=120, sample_size=4400.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1961.4, ups=0.25, wpb=7801.3, bsz=120, num_updates=17220, lr=2.28189e-05, gnorm=0.93, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=70494 2023-05-01 22:08:41 - progress_bar.py[line:274] - INFO: epoch 003: 5175 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7483.3, nsentences=120, sample_size=4148.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1892.8, ups=0.25, wpb=7483.3, bsz=120, 
num_updates=17230, lr=2.28136e-05, gnorm=0.938, clip=0, loss_scale=64, train_wall=39, gb_free=28.4, wall=70533 2023-05-01 22:09:20 - progress_bar.py[line:274] - INFO: epoch 003: 5185 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=8105.6, nsentences=120, sample_size=4025.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=2055.4, ups=0.25, wpb=8105.6, bsz=120, num_updates=17240, lr=2.28083e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=39, gb_free=30.4, wall=70573 2023-05-01 22:10:00 - progress_bar.py[line:274] - INFO: epoch 003: 5195 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7780.1, nsentences=120, sample_size=4099.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1956.4, ups=0.25, wpb=7780.1, bsz=120, num_updates=17250, lr=2.28031e-05, gnorm=0.931, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=70612 2023-05-01 22:10:40 - progress_bar.py[line:274] - INFO: epoch 003: 5205 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7754.6, nsentences=120, sample_size=4073.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1933.7, ups=0.25, wpb=7754.6, bsz=120, num_updates=17260, lr=2.27978e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=70653 2023-05-01 22:11:19 - progress_bar.py[line:274] - INFO: epoch 003: 5215 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7637.1, nsentences=120, sample_size=3688.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1948.8, ups=0.26, wpb=7637.1, bsz=120, num_updates=17270, lr=2.27925e-05, gnorm=0.991, clip=50, loss_scale=64, train_wall=39, gb_free=30.8, wall=70692 2023-05-01 22:11:59 - progress_bar.py[line:274] - INFO: epoch 003: 5225 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=8043.6, nsentences=120, sample_size=4058.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2020, ups=0.25, wpb=8043.6, bsz=120, num_updates=17280, lr=2.27872e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, 
gb_free=29.8, wall=70732 2023-05-01 22:12:39 - progress_bar.py[line:274] - INFO: epoch 003: 5235 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7508.4, nsentences=120, sample_size=4408.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1868.4, ups=0.25, wpb=7508.4, bsz=120, num_updates=17290, lr=2.27819e-05, gnorm=0.93, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=70772 2023-05-01 22:13:19 - progress_bar.py[line:274] - INFO: epoch 003: 5245 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7849.2, nsentences=120, sample_size=4040, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1978.4, ups=0.25, wpb=7849.2, bsz=120, num_updates=17300, lr=2.27767e-05, gnorm=0.968, clip=20, loss_scale=64, train_wall=40, gb_free=28.9, wall=70811 2023-05-01 22:13:58 - progress_bar.py[line:274] - INFO: epoch 003: 5255 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7720.8, nsentences=120, sample_size=3875.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1962, ups=0.25, wpb=7720.8, bsz=120, num_updates=17310, lr=2.27714e-05, gnorm=0.974, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=70851 2023-05-01 22:14:39 - progress_bar.py[line:274] - INFO: epoch 003: 5265 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7847.5, nsentences=120, sample_size=3700.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1921.1, ups=0.24, wpb=7847.5, bsz=120, num_updates=17320, lr=2.27661e-05, gnorm=0.981, clip=40, loss_scale=64, train_wall=41, gb_free=27.3, wall=70892 2023-05-01 22:15:19 - progress_bar.py[line:274] - INFO: epoch 003: 5275 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7652.4, nsentences=120, sample_size=3975.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1942.5, ups=0.25, wpb=7652.4, bsz=120, num_updates=17330, lr=2.27608e-05, gnorm=0.967, clip=30, loss_scale=64, train_wall=39, gb_free=30.5, wall=70931 2023-05-01 22:15:58 - progress_bar.py[line:274] - INFO: epoch 003: 5285 
/ 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7715.7, nsentences=120, sample_size=4200.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1934.2, ups=0.25, wpb=7715.7, bsz=120, num_updates=17340, lr=2.27555e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=31, wall=70971 2023-05-01 22:16:38 - progress_bar.py[line:274] - INFO: epoch 003: 5295 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7649, nsentences=120, sample_size=3566.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1925.4, ups=0.25, wpb=7649, bsz=120, num_updates=17350, lr=2.27502e-05, gnorm=0.999, clip=50, loss_scale=64, train_wall=40, gb_free=28, wall=71011 2023-05-01 22:17:18 - progress_bar.py[line:274] - INFO: epoch 003: 5305 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7663.3, nsentences=120, sample_size=4044.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1926.2, ups=0.25, wpb=7663.3, bsz=120, num_updates=17360, lr=2.2745e-05, gnorm=0.936, clip=0, loss_scale=64, train_wall=40, gb_free=30, wall=71050 2023-05-01 22:17:57 - progress_bar.py[line:274] - INFO: epoch 003: 5315 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7625, nsentences=120, sample_size=4235.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1945.5, ups=0.26, wpb=7625, bsz=120, num_updates=17370, lr=2.27397e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=71090 2023-05-01 22:18:37 - progress_bar.py[line:274] - INFO: epoch 003: 5325 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7875.6, nsentences=120, sample_size=4023.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1969.8, ups=0.25, wpb=7875.6, bsz=120, num_updates=17380, lr=2.27344e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=40, gb_free=26.9, wall=71130 2023-05-01 22:19:17 - progress_bar.py[line:274] - INFO: epoch 003: 5335 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=8187.1, nsentences=120, 
sample_size=3844.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2032.7, ups=0.25, wpb=8187.1, bsz=120, num_updates=17390, lr=2.27291e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=71170 2023-05-01 22:19:57 - progress_bar.py[line:274] - INFO: epoch 003: 5345 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7651.3, nsentences=120, sample_size=4282.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1910.3, ups=0.25, wpb=7651.3, bsz=120, num_updates=17400, lr=2.27238e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=71210 2023-05-01 22:20:37 - progress_bar.py[line:274] - INFO: epoch 003: 5355 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7784.2, nsentences=120, sample_size=4014.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1975.3, ups=0.25, wpb=7784.2, bsz=120, num_updates=17410, lr=2.27185e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=39, gb_free=30.5, wall=71249 2023-05-01 22:21:17 - progress_bar.py[line:274] - INFO: epoch 003: 5365 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7569.9, nsentences=120, sample_size=3987.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1881.9, ups=0.25, wpb=7569.9, bsz=120, num_updates=17420, lr=2.27133e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=40, gb_free=29.2, wall=71290 2023-05-01 22:21:57 - progress_bar.py[line:274] - INFO: epoch 003: 5375 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7795.4, nsentences=120, sample_size=3760.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1975.4, ups=0.25, wpb=7795.4, bsz=120, num_updates=17430, lr=2.2708e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=71329 2023-05-01 22:22:37 - progress_bar.py[line:274] - INFO: epoch 003: 5385 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7765, nsentences=120, sample_size=3891.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1924.7, ups=0.25, 
wpb=7765, bsz=120, num_updates=17440, lr=2.27027e-05, gnorm=0.947, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=71369 2023-05-01 22:23:17 - progress_bar.py[line:274] - INFO: epoch 003: 5395 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7696.4, nsentences=120, sample_size=3914.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1942.3, ups=0.25, wpb=7696.4, bsz=120, num_updates=17450, lr=2.26974e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=71409 2023-05-01 22:23:56 - progress_bar.py[line:274] - INFO: epoch 003: 5405 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7419.4, nsentences=120, sample_size=4409.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1893.5, ups=0.26, wpb=7419.4, bsz=120, num_updates=17460, lr=2.26921e-05, gnorm=0.908, clip=10, loss_scale=64, train_wall=39, gb_free=28.9, wall=71448 2023-05-01 22:24:35 - progress_bar.py[line:274] - INFO: epoch 003: 5415 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7693.2, nsentences=120, sample_size=3840.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1941.6, ups=0.25, wpb=7693.2, bsz=120, num_updates=17470, lr=2.26869e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=40, gb_free=31.3, wall=71488 2023-05-01 22:25:15 - progress_bar.py[line:274] - INFO: epoch 003: 5425 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7521.1, nsentences=120, sample_size=4029.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1898.2, ups=0.25, wpb=7521.1, bsz=120, num_updates=17480, lr=2.26816e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=71527 2023-05-01 22:25:55 - progress_bar.py[line:274] - INFO: epoch 003: 5435 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7798.5, nsentences=120, sample_size=4295.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1940.8, ups=0.25, wpb=7798.5, bsz=120, num_updates=17490, lr=2.26763e-05, gnorm=0.905, clip=0, loss_scale=64, 
train_wall=40, gb_free=30.4, wall=71568 2023-05-01 22:26:35 - progress_bar.py[line:274] - INFO: epoch 003: 5445 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7677.6, nsentences=120, sample_size=4040.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1921.8, ups=0.25, wpb=7677.6, bsz=120, num_updates=17500, lr=2.2671e-05, gnorm=0.96, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=71608 2023-05-01 22:27:15 - progress_bar.py[line:274] - INFO: epoch 003: 5455 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7901.5, nsentences=120, sample_size=4236.7, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1976.3, ups=0.25, wpb=7901.5, bsz=120, num_updates=17510, lr=2.26657e-05, gnorm=0.994, clip=40, loss_scale=64, train_wall=40, gb_free=27.3, wall=71648 2023-05-01 22:27:55 - progress_bar.py[line:274] - INFO: epoch 003: 5465 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7530.8, nsentences=120, sample_size=3959.1, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1901.6, ups=0.25, wpb=7530.8, bsz=120, num_updates=17520, lr=2.26604e-05, gnorm=0.984, clip=50, loss_scale=64, train_wall=40, gb_free=30.7, wall=71687 2023-05-01 22:28:34 - progress_bar.py[line:274] - INFO: epoch 003: 5475 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7643, nsentences=120, sample_size=4041, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1923, ups=0.25, wpb=7643, bsz=120, num_updates=17530, lr=2.26552e-05, gnorm=0.971, clip=40, loss_scale=64, train_wall=40, gb_free=28.9, wall=71727 2023-05-01 22:29:14 - progress_bar.py[line:274] - INFO: epoch 003: 5485 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7688.3, nsentences=120, sample_size=3761.7, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1955.8, ups=0.25, wpb=7688.3, bsz=120, num_updates=17540, lr=2.26499e-05, gnorm=0.987, clip=50, loss_scale=64, train_wall=39, gb_free=29.9, wall=71766 2023-05-01 22:29:53 - progress_bar.py[line:274] - INFO: epoch 
003: 5495 / 6042 loss=2.486, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7649.1, nsentences=120, sample_size=4068.3, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1926.3, ups=0.25, wpb=7649.1, bsz=120, num_updates=17550, lr=2.26446e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=31.3, wall=71806 2023-05-01 22:30:33 - progress_bar.py[line:274] - INFO: epoch 003: 5505 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7665.4, nsentences=120, sample_size=3855.8, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1927.1, ups=0.25, wpb=7665.4, bsz=120, num_updates=17560, lr=2.26393e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=71846 2023-05-01 22:31:13 - progress_bar.py[line:274] - INFO: epoch 003: 5515 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7824.9, nsentences=120, sample_size=3799, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1968.7, ups=0.25, wpb=7824.9, bsz=120, num_updates=17570, lr=2.2634e-05, gnorm=0.973, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=71885 2023-05-01 22:31:53 - progress_bar.py[line:274] - INFO: epoch 003: 5525 / 6042 loss=2.51, loss_v1=0, loss_v2=0, nll_loss=1.27, ntokens=7504.6, nsentences=120, sample_size=4048.2, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1853.4, ups=0.25, wpb=7504.6, bsz=120, num_updates=17580, lr=2.26288e-05, gnorm=0.975, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=71926 2023-05-01 22:32:33 - progress_bar.py[line:274] - INFO: epoch 003: 5535 / 6042 loss=2.495, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7596.8, nsentences=120, sample_size=4382.8, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1898.7, ups=0.25, wpb=7596.8, bsz=120, num_updates=17590, lr=2.26235e-05, gnorm=0.909, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=71966 2023-05-01 22:33:13 - progress_bar.py[line:274] - INFO: epoch 003: 5545 / 6042 loss=2.502, loss_v1=0, loss_v2=0, nll_loss=1.257, ntokens=7864.8, nsentences=120, 
sample_size=3840.6, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=2008.7, ups=0.26, wpb=7864.8, bsz=120, num_updates=17600, lr=2.26182e-05, gnorm=0.989, clip=50, loss_scale=64, train_wall=39, gb_free=30.1, wall=72005 2023-05-01 22:33:53 - progress_bar.py[line:274] - INFO: epoch 003: 5555 / 6042 loss=2.47, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7769.9, nsentences=120, sample_size=3857.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1928.8, ups=0.25, wpb=7769.9, bsz=120, num_updates=17610, lr=2.26129e-05, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=72045 2023-05-01 22:34:34 - progress_bar.py[line:274] - INFO: epoch 003: 5565 / 6042 loss=2.492, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=8021.9, nsentences=120, sample_size=3873.3, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1972.2, ups=0.25, wpb=8021.9, bsz=120, num_updates=17620, lr=2.26076e-05, gnorm=0.948, clip=20, loss_scale=64, train_wall=41, gb_free=29.7, wall=72086 2023-05-01 22:35:13 - progress_bar.py[line:274] - INFO: epoch 003: 5575 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7655.3, nsentences=120, sample_size=4103.9, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1965.9, ups=0.26, wpb=7655.3, bsz=120, num_updates=17630, lr=2.26023e-05, gnorm=0.933, clip=20, loss_scale=128, train_wall=39, gb_free=30.9, wall=72125 2023-05-01 22:35:52 - progress_bar.py[line:274] - INFO: epoch 003: 5585 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7787.5, nsentences=120, sample_size=4335.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1954.3, ups=0.25, wpb=7787.5, bsz=120, num_updates=17640, lr=2.25971e-05, gnorm=0.91, clip=10, loss_scale=128, train_wall=40, gb_free=31.2, wall=72165 2023-05-01 22:36:32 - progress_bar.py[line:274] - INFO: epoch 003: 5595 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=7579.9, nsentences=120, sample_size=4029.3, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1897.7, ups=0.25, 
wpb=7579.9, bsz=120, num_updates=17650, lr=2.25918e-05, gnorm=0.965, clip=40, loss_scale=128, train_wall=40, gb_free=31, wall=72205 2023-05-01 22:36:48 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 22:37:16 - progress_bar.py[line:274] - INFO: epoch 003: 5606 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=8086.6, nsentences=120, sample_size=3817.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1838.7, ups=0.23, wpb=8086.6, bsz=120, num_updates=17660, lr=2.25865e-05, gnorm=0.91, clip=0, loss_scale=64, train_wall=44, gb_free=30.4, wall=72249 2023-05-01 22:37:55 - progress_bar.py[line:274] - INFO: epoch 003: 5616 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7533.1, nsentences=120, sample_size=3975.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1937.6, ups=0.26, wpb=7533.1, bsz=120, num_updates=17670, lr=2.25812e-05, gnorm=0.955, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=72288 2023-05-01 22:38:35 - progress_bar.py[line:274] - INFO: epoch 003: 5626 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=8017, nsentences=120, sample_size=3858.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2030.9, ups=0.25, wpb=8017, bsz=120, num_updates=17680, lr=2.25759e-05, gnorm=0.97, clip=30, loss_scale=64, train_wall=39, gb_free=29.3, wall=72327 2023-05-01 22:39:14 - progress_bar.py[line:274] - INFO: epoch 003: 5636 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7810.3, nsentences=120, sample_size=3928, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1977.7, ups=0.25, wpb=7810.3, bsz=120, num_updates=17690, lr=2.25706e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=39, gb_free=29.1, wall=72367 2023-05-01 22:39:53 - progress_bar.py[line:274] - INFO: epoch 003: 5646 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7753.5, nsentences=120, sample_size=4022.3, sample_size_v1=0, sample_size_v2=0, 
ppl=2.26, wps=1992.3, ups=0.26, wpb=7753.5, bsz=120, num_updates=17700, lr=2.25654e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=39, gb_free=29.9, wall=72406 2023-05-01 22:40:33 - progress_bar.py[line:274] - INFO: epoch 003: 5656 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7749.2, nsentences=120, sample_size=3952.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1931.4, ups=0.25, wpb=7749.2, bsz=120, num_updates=17710, lr=2.25601e-05, gnorm=0.965, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=72446 2023-05-01 22:41:13 - progress_bar.py[line:274] - INFO: epoch 003: 5666 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7654.4, nsentences=120, sample_size=4096.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1901.6, ups=0.25, wpb=7654.4, bsz=120, num_updates=17720, lr=2.25548e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=72486 2023-05-01 22:41:54 - progress_bar.py[line:274] - INFO: epoch 003: 5676 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7756.5, nsentences=120, sample_size=4082.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1928.2, ups=0.25, wpb=7756.5, bsz=120, num_updates=17730, lr=2.25495e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=72526 2023-05-01 22:42:33 - progress_bar.py[line:274] - INFO: epoch 003: 5686 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7403.7, nsentences=120, sample_size=4153.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1867.1, ups=0.25, wpb=7403.7, bsz=120, num_updates=17740, lr=2.25442e-05, gnorm=0.92, clip=0, loss_scale=64, train_wall=40, gb_free=30.1, wall=72566 2023-05-01 22:43:13 - progress_bar.py[line:274] - INFO: epoch 003: 5696 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7774.2, nsentences=120, sample_size=3962, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1939.8, ups=0.25, wpb=7774.2, bsz=120, num_updates=17750, lr=2.2539e-05, 
gnorm=0.941, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=72606 2023-05-01 22:43:54 - progress_bar.py[line:274] - INFO: epoch 003: 5706 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7469.3, nsentences=120, sample_size=4248, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1836.1, ups=0.25, wpb=7469.3, bsz=120, num_updates=17760, lr=2.25337e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=41, gb_free=30, wall=72647 2023-05-01 22:44:34 - progress_bar.py[line:274] - INFO: epoch 003: 5716 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7664.4, nsentences=120, sample_size=4029.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1916.5, ups=0.25, wpb=7664.4, bsz=120, num_updates=17770, lr=2.25284e-05, gnorm=0.957, clip=10, loss_scale=64, train_wall=40, gb_free=28.9, wall=72687 2023-05-01 22:45:13 - progress_bar.py[line:274] - INFO: epoch 003: 5726 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7700, nsentences=120, sample_size=4273.3, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1955.2, ups=0.25, wpb=7700, bsz=120, num_updates=17780, lr=2.25231e-05, gnorm=0.911, clip=0, loss_scale=64, train_wall=39, gb_free=30.3, wall=72726 2023-05-01 22:45:53 - progress_bar.py[line:274] - INFO: epoch 003: 5736 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7688.2, nsentences=120, sample_size=3901.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1943.8, ups=0.25, wpb=7688.2, bsz=120, num_updates=17790, lr=2.25178e-05, gnorm=0.963, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=72766 2023-05-01 22:46:33 - progress_bar.py[line:274] - INFO: epoch 003: 5746 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7839.2, nsentences=120, sample_size=3871.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1938.5, ups=0.25, wpb=7839.2, bsz=120, num_updates=17800, lr=2.25125e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=31.4, wall=72806 2023-05-01 22:47:13 - 
progress_bar.py[line:274] - INFO: epoch 003: 5756 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7749.5, nsentences=120, sample_size=4006.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1982.7, ups=0.26, wpb=7749.5, bsz=120, num_updates=17810, lr=2.25073e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=39, gb_free=29.4, wall=72845 2023-05-01 22:47:52 - progress_bar.py[line:274] - INFO: epoch 003: 5766 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7381.1, nsentences=120, sample_size=3979.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1888.4, ups=0.26, wpb=7381.1, bsz=120, num_updates=17820, lr=2.2502e-05, gnorm=0.964, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=72884 2023-05-01 22:48:31 - progress_bar.py[line:274] - INFO: epoch 003: 5776 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7683.6, nsentences=120, sample_size=4095.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1938.2, ups=0.25, wpb=7683.6, bsz=120, num_updates=17830, lr=2.24967e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=72924 2023-05-01 22:49:11 - progress_bar.py[line:274] - INFO: epoch 003: 5786 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7682.9, nsentences=120, sample_size=3782.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1947.9, ups=0.25, wpb=7682.9, bsz=120, num_updates=17840, lr=2.24914e-05, gnorm=0.949, clip=30, loss_scale=64, train_wall=39, gb_free=28.5, wall=72963 2023-05-01 22:49:51 - progress_bar.py[line:274] - INFO: epoch 003: 5796 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7850.4, nsentences=120, sample_size=3919.5, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1936, ups=0.25, wpb=7850.4, bsz=120, num_updates=17850, lr=2.24861e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=73004 2023-05-01 22:50:32 - progress_bar.py[line:274] - INFO: epoch 003: 5806 / 6042 loss=2.429, loss_v1=0, loss_v2=0, 
nll_loss=1.175, ntokens=7501.7, nsentences=120, sample_size=4186.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1850.4, ups=0.25, wpb=7501.7, bsz=120, num_updates=17860, lr=2.24809e-05, gnorm=0.929, clip=0, loss_scale=64, train_wall=40, gb_free=30, wall=73044 2023-05-01 22:51:13 - progress_bar.py[line:274] - INFO: epoch 003: 5816 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7623.6, nsentences=120, sample_size=4095, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1858.5, ups=0.24, wpb=7623.6, bsz=120, num_updates=17870, lr=2.24756e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=41, gb_free=25.5, wall=73085 2023-05-01 22:51:53 - progress_bar.py[line:274] - INFO: epoch 003: 5826 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7878, nsentences=120, sample_size=3884, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1942.9, ups=0.25, wpb=7878, bsz=120, num_updates=17880, lr=2.24703e-05, gnorm=0.958, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=73126 2023-05-01 22:52:32 - progress_bar.py[line:274] - INFO: epoch 003: 5836 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7916.9, nsentences=120, sample_size=4151.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=2027.5, ups=0.26, wpb=7916.9, bsz=120, num_updates=17890, lr=2.2465e-05, gnorm=0.939, clip=20, loss_scale=64, train_wall=39, gb_free=27.9, wall=73165 2023-05-01 22:53:13 - progress_bar.py[line:274] - INFO: epoch 003: 5846 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7809, nsentences=120, sample_size=4059.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1931.2, ups=0.25, wpb=7809, bsz=120, num_updates=17900, lr=2.24597e-05, gnorm=0.938, clip=0, loss_scale=64, train_wall=40, gb_free=29, wall=73205 2023-05-01 22:53:52 - progress_bar.py[line:274] - INFO: epoch 003: 5856 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7493.3, nsentences=120, sample_size=4187.8, sample_size_v1=0, sample_size_v2=0, 
ppl=2.26, wps=1904.7, ups=0.25, wpb=7493.3, bsz=120, num_updates=17910, lr=2.24544e-05, gnorm=0.978, clip=40, loss_scale=64, train_wall=39, gb_free=29.5, wall=73245 2023-05-01 22:54:32 - progress_bar.py[line:274] - INFO: epoch 003: 5866 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7887.3, nsentences=120, sample_size=3802.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1967.6, ups=0.25, wpb=7887.3, bsz=120, num_updates=17920, lr=2.24492e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=73285 2023-05-01 22:55:12 - progress_bar.py[line:274] - INFO: epoch 003: 5876 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7853.5, nsentences=120, sample_size=3979.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1955.8, ups=0.25, wpb=7853.5, bsz=120, num_updates=17930, lr=2.24439e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=73325 2023-05-01 22:55:52 - progress_bar.py[line:274] - INFO: epoch 003: 5886 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7741.3, nsentences=120, sample_size=3952.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1940.9, ups=0.25, wpb=7741.3, bsz=120, num_updates=17940, lr=2.24386e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=73365 2023-05-01 22:56:31 - progress_bar.py[line:274] - INFO: epoch 003: 5896 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7311, nsentences=120, sample_size=4118.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1879.9, ups=0.26, wpb=7311, bsz=120, num_updates=17950, lr=2.24333e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=39, gb_free=28.3, wall=73404 2023-05-01 22:57:11 - progress_bar.py[line:274] - INFO: epoch 003: 5906 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7183.2, nsentences=120, sample_size=3893.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1796.3, ups=0.25, wpb=7183.2, bsz=120, num_updates=17960, lr=2.2428e-05, 
gnorm=0.981, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=73444 2023-05-01 22:57:51 - progress_bar.py[line:274] - INFO: epoch 003: 5916 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7439.9, nsentences=120, sample_size=4360.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1877.5, ups=0.25, wpb=7439.9, bsz=120, num_updates=17970, lr=2.24227e-05, gnorm=0.905, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=73483 2023-05-01 22:58:31 - progress_bar.py[line:274] - INFO: epoch 003: 5926 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7856.3, nsentences=120, sample_size=4202.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1950.6, ups=0.25, wpb=7856.3, bsz=120, num_updates=17980, lr=2.24175e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=29.4, wall=73524 2023-05-01 22:59:10 - progress_bar.py[line:274] - INFO: epoch 003: 5936 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7683, nsentences=120, sample_size=3912.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1974, ups=0.26, wpb=7683, bsz=120, num_updates=17990, lr=2.24122e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=39, gb_free=29.5, wall=73563 2023-05-01 22:59:49 - progress_bar.py[line:274] - INFO: epoch 003: 5946 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7492.4, nsentences=120, sample_size=3894.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1903.6, ups=0.25, wpb=7492.4, bsz=120, num_updates=18000, lr=2.24069e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=73602 2023-05-01 22:59:49 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 22:59:52 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 22:59:52 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 22:59:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 22:59:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 22:59:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 22:59:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 22:59:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 22:59:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 22:59:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 22:59:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 22:59:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 22:59:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 22:59:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 22:59:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 22:59:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 22:59:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 23:00:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:09 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 23:00:09 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 23:00:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-01 23:00:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:21 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 23:00:21 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 23:00:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-01 23:00:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:32 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 23:00:32 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 23:00:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:36 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 23:00:36 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 23:00:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:41 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-01 23:00:41 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 23:00:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:00:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:00:42 - progress_bar.py[line:282] - INFO: epoch 003 | valid on 'valid' subset | loss 3.211 | loss_v1 0 | loss_v2 0 | nll_loss 2.047 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.13 | score 0.7432 | wps 3273.7 | wpb 3202.1 | bsz 39.4 | num_updates 18000 | best_score 0.751 2023-05-01 23:00:42 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 3 @ 18000 updates 2023-05-01 23:00:42 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_18000.pt 2023-05-01 23:01:05 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_18000.pt 2023-05-01 23:01:19 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_3_18000.pt (epoch 3 @ 18000 updates, score 0.7432) (writing took 37.61120864190161 seconds) 2023-05-01 23:01:58 - progress_bar.py[line:274] - INFO: epoch 003: 5956 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7839.6, nsentences=120, sample_size=4170.8, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=607.9, ups=0.08, wpb=7839.6, bsz=120, num_updates=18010, lr=2.24016e-05, gnorm=0.913, clip=0, loss_scale=64, train_wall=39, gb_free=29.8, wall=73731 2023-05-01 23:02:38 - progress_bar.py[line:274] - INFO: epoch 003: 5966 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7485.2, nsentences=120, sample_size=3875.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1903.3, ups=0.25, wpb=7485.2, bsz=120, num_updates=18020, 
lr=2.23963e-05, gnorm=1.011, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=73770 2023-05-01 23:03:17 - progress_bar.py[line:274] - INFO: epoch 003: 5976 / 6042 loss=2.506, loss_v1=0, loss_v2=0, nll_loss=1.27, ntokens=7571, nsentences=120, sample_size=4079, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1917.1, ups=0.25, wpb=7571, bsz=120, num_updates=18030, lr=2.23911e-05, gnorm=1.009, clip=50, loss_scale=64, train_wall=39, gb_free=29.9, wall=73810 2023-05-01 23:03:57 - progress_bar.py[line:274] - INFO: epoch 003: 5986 / 6042 loss=2.491, loss_v1=0, loss_v2=0, nll_loss=1.249, ntokens=7887.3, nsentences=120, sample_size=4124.5, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1999.4, ups=0.25, wpb=7887.3, bsz=120, num_updates=18040, lr=2.23858e-05, gnorm=0.916, clip=0, loss_scale=64, train_wall=39, gb_free=30.5, wall=73849 2023-05-01 23:04:35 - progress_bar.py[line:274] - INFO: epoch 003: 5996 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7629.9, nsentences=120, sample_size=4134.5, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1964.9, ups=0.26, wpb=7629.9, bsz=120, num_updates=18050, lr=2.23805e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=39, gb_free=31, wall=73888 2023-05-01 23:05:16 - progress_bar.py[line:274] - INFO: epoch 003: 6006 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7749.6, nsentences=120, sample_size=4193.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1918.6, ups=0.25, wpb=7749.6, bsz=120, num_updates=18060, lr=2.23752e-05, gnorm=0.9, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=73928 2023-05-01 23:05:56 - progress_bar.py[line:274] - INFO: epoch 003: 6016 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7658, nsentences=120, sample_size=4144.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1925.9, ups=0.25, wpb=7658, bsz=120, num_updates=18070, lr=2.23699e-05, gnorm=0.972, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=73968 2023-05-01 
23:06:36 - progress_bar.py[line:274] - INFO: epoch 003: 6026 / 6042 loss=2.488, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=8093.5, nsentences=120, sample_size=3786.8, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=2029.9, ups=0.25, wpb=8093.5, bsz=120, num_updates=18080, lr=2.23646e-05, gnorm=0.997, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=74008 2023-05-01 23:07:15 - progress_bar.py[line:274] - INFO: epoch 003: 6036 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.254, ntokens=7685.4, nsentences=120, sample_size=4123.3, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1941.3, ups=0.25, wpb=7685.4, bsz=120, num_updates=18090, lr=2.23594e-05, gnorm=0.95, clip=10, loss_scale=64, train_wall=40, gb_free=28.4, wall=74048 2023-05-01 23:07:37 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-01 23:07:39 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 23:07:39 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 23:07:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:46 - 
task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:56 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 23:07:56 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 23:07:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:07:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:07:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:08 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 23:08:08 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 23:08:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:19 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 23:08:19 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 23:08:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:24 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-01 23:08:24 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-01 23:08:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:28 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-01 23:08:28 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-01 23:08:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-01 23:08:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-01 23:08:29 - progress_bar.py[line:282] - INFO: epoch 003 | valid on 'valid' subset | loss 3.202 | loss_v1 0 | loss_v2 0 | nll_loss 2.038 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.11 | score 0.7485 | wps 3302.2 | wpb 3202.1 | bsz 39.4 | num_updates 18096 | best_score 0.751 2023-05-01 23:08:29 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 3 @ 18096 updates 2023-05-01 23:08:29 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-01 23:08:55 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-01 23:08:55 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt (epoch 3 @ 18096 updates, score 0.7485) (writing took 26.300863459939137 seconds) 2023-05-01 23:08:55 - train.py[line:332] - INFO: end of epoch 3 (average epoch stats below) 2023-05-01 23:08:55 - progress_bar.py[line:282] - INFO: epoch 003 | loss 2.436 | loss_v1 0 | loss_v2 0 | nll_loss 1.185 | ntokens 7719.81 | nsentences 119.992 | sample_size 4036.03 | sample_size_v1 0 | sample_size_v2 0 | ppl 2.27 | wps 1886 | ups 0.24 | wpb 7719.8 | bsz 120 | num_updates 18096 | lr 2.23562e-05 | gnorm 0.944 | clip 19.1 | loss_scale 64 | train_wall 24013 | gb_free 30.3 | wall 74147 2023-05-01 23:08:55 - trainer.py[line:639] - INFO: loading train data for epoch 4 2023-05-01 23:08:55 - dialog_dataset.py[line:647] - INFO: loading invig-train from /mnt/bn/hri-lq/datasets/hf-cache/invig 2023-05-01 23:08:55 - dialog_dataset.py[line:647] - INFO: 
loading guesswhat-train from /mnt/bn/hri-lq/datasets/hf-cache/guesswhat 2023-05-01 23:08:57 - dialog_dataset.py[line:647] - INFO: loading visdial-train from /mnt/bn/hri-lq/datasets/hf-cache/visdial 2023-05-01 23:08:58 - dialog_dataset.py[line:647] - INFO: loading refcoco-train from /mnt/bn/hri-lq/datasets/hf-cache/refcoco 2023-05-01 23:08:59 - dialog_dataset.py[line:647] - INFO: loading refcocog-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocog 2023-05-01 23:08:59 - dialog_dataset.py[line:647] - INFO: loading refcocoplus-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocoplus 2023-05-01 23:08:59 - dialog_dataset.py[line:647] - INFO: loading cc_sbu_align-train from /mnt/bn/hri-lq/datasets/hf-cache/cc_sbu_align 2023-05-01 23:08:59 - dialog_dataset.py[line:647] - INFO: loading llava_instruct_150k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_instruct_150k 2023-05-01 23:09:00 - dialog_dataset.py[line:647] - INFO: loading llava_conversation_58k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_conversation_58k 2023-05-01 23:09:00 - dialog_dataset.py[line:647] - INFO: loading llava_complex_reasoning_77k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_complex_reasoning_77k 2023-05-01 23:09:01 - dialog_dataset.py[line:647] - INFO: loading llava_detail_23k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_detail_23k 2023-05-01 23:09:01 - dialog_dataset.py[line:647] - INFO: loading openimages-train from /mnt/bn/hri-lq/datasets/hf-cache/openimages_v1.2 2023-05-01 23:09:01 - dialog_dataset.py[line:671] - INFO: load train data: 18 (90624/724992 samples) dataset(s) 2023-05-01 23:09:01 - dialog_dataset.py[line:672] - INFO: Tasks: invig_question(17652), invig_answer(17652), invig_grounding(17652), guesswhat_question(68653), guesswhat_answer(68653), guesswhat_grounding(68653), visdial_question(103447), visdial_answer(103447), visdial_caption(20689), refcoco_grounding(9523), refcoco_grounding(9920), refcoco_grounding(9494), cc_sbu_align_caption(3439), 
llava_instruct_150k(90372), llava_conversation_58k(46965), llava_complex_reasoning_77k(44353), llava_detail_23k(12471), openimages_detection(11957) 2023-05-01 23:09:02 - trainer.py[line:703] - INFO: begin training epoch 4 2023-05-01 23:09:02 - train.py[line:305] - INFO: Start iterating over samples 2023-05-01 23:09:17 - progress_bar.py[line:274] - INFO: epoch 004: 4 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.221, ntokens=7138.1, nsentences=116, sample_size=3743.7, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=583.4, ups=0.08, wpb=7138.1, bsz=116, num_updates=18100, lr=2.23541e-05, gnorm=0.993, clip=50, loss_scale=64, train_wall=37, gb_free=31.2, wall=74170 2023-05-01 23:09:57 - progress_bar.py[line:274] - INFO: epoch 004: 14 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7771.6, nsentences=120, sample_size=3892, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1955, ups=0.25, wpb=7771.6, bsz=120, num_updates=18110, lr=2.23488e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=74210 2023-05-01 23:10:37 - progress_bar.py[line:274] - INFO: epoch 004: 24 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7639.8, nsentences=120, sample_size=3738.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1897.3, ups=0.25, wpb=7639.8, bsz=120, num_updates=18120, lr=2.23435e-05, gnorm=1.002, clip=50, loss_scale=64, train_wall=40, gb_free=31, wall=74250 2023-05-01 23:11:18 - progress_bar.py[line:274] - INFO: epoch 004: 34 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7733.4, nsentences=120, sample_size=4150.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1931.8, ups=0.25, wpb=7733.4, bsz=120, num_updates=18130, lr=2.23382e-05, gnorm=0.924, clip=30, loss_scale=64, train_wall=40, gb_free=31.6, wall=74290 2023-05-01 23:11:57 - progress_bar.py[line:274] - INFO: epoch 004: 44 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7627, nsentences=120, sample_size=4285.5, 
sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1945.2, ups=0.26, wpb=7627, bsz=120, num_updates=18140, lr=2.2333e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=39, gb_free=30.7, wall=74329 2023-05-01 23:12:36 - progress_bar.py[line:274] - INFO: epoch 004: 54 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7527.6, nsentences=120, sample_size=4104.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1909.7, ups=0.25, wpb=7527.6, bsz=120, num_updates=18150, lr=2.23277e-05, gnorm=0.914, clip=0, loss_scale=64, train_wall=39, gb_free=28.4, wall=74369 2023-05-01 23:13:16 - progress_bar.py[line:274] - INFO: epoch 004: 64 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7884.3, nsentences=120, sample_size=4175, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1986, ups=0.25, wpb=7884.3, bsz=120, num_updates=18160, lr=2.23224e-05, gnorm=0.928, clip=0, loss_scale=64, train_wall=40, gb_free=30.8, wall=74408 2023-05-01 23:13:56 - progress_bar.py[line:274] - INFO: epoch 004: 74 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7799.5, nsentences=120, sample_size=4100.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1931.5, ups=0.25, wpb=7799.5, bsz=120, num_updates=18170, lr=2.23171e-05, gnorm=0.945, clip=30, loss_scale=128, train_wall=40, gb_free=30.5, wall=74449 2023-05-01 23:14:04 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 23:14:39 - progress_bar.py[line:274] - INFO: epoch 004: 85 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7511.9, nsentences=120, sample_size=4024, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1745.9, ups=0.23, wpb=7511.9, bsz=120, num_updates=18180, lr=2.23118e-05, gnorm=0.967, clip=30, loss_scale=64, train_wall=43, gb_free=30.4, wall=74492 2023-05-01 23:15:20 - progress_bar.py[line:274] - INFO: epoch 004: 95 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7894, nsentences=120, 
sample_size=4261.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1940.6, ups=0.25, wpb=7894, bsz=120, num_updates=18190, lr=2.23065e-05, gnorm=0.907, clip=10, loss_scale=64, train_wall=41, gb_free=30.1, wall=74532 2023-05-01 23:16:00 - progress_bar.py[line:274] - INFO: epoch 004: 105 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=7919.3, nsentences=120, sample_size=3967.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1992.2, ups=0.25, wpb=7919.3, bsz=120, num_updates=18200, lr=2.23013e-05, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=74572 2023-05-01 23:16:40 - progress_bar.py[line:274] - INFO: epoch 004: 115 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7533.2, nsentences=120, sample_size=4310.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1857.7, ups=0.25, wpb=7533.2, bsz=120, num_updates=18210, lr=2.2296e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=40, gb_free=30.5, wall=74613 2023-05-01 23:17:20 - progress_bar.py[line:274] - INFO: epoch 004: 125 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=8137.4, nsentences=120, sample_size=3921.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=2072.5, ups=0.25, wpb=8137.4, bsz=120, num_updates=18220, lr=2.22907e-05, gnorm=0.952, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=74652 2023-05-01 23:18:01 - progress_bar.py[line:274] - INFO: epoch 004: 135 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7608.8, nsentences=120, sample_size=4142.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1855, ups=0.24, wpb=7608.8, bsz=120, num_updates=18230, lr=2.22854e-05, gnorm=0.948, clip=20, loss_scale=64, train_wall=41, gb_free=30.4, wall=74693 2023-05-01 23:18:40 - progress_bar.py[line:274] - INFO: epoch 004: 145 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7696.2, nsentences=120, sample_size=3983.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1952.9, ups=0.25, wpb=7696.2, 
bsz=120, num_updates=18240, lr=2.22801e-05, gnorm=0.951, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=74732 2023-05-01 23:19:19 - progress_bar.py[line:274] - INFO: epoch 004: 155 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7315.8, nsentences=120, sample_size=4170.5, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1872.5, ups=0.26, wpb=7315.8, bsz=120, num_updates=18250, lr=2.22748e-05, gnorm=0.939, clip=0, loss_scale=64, train_wall=39, gb_free=28, wall=74771 2023-05-01 23:19:59 - progress_bar.py[line:274] - INFO: epoch 004: 165 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=7922.5, nsentences=120, sample_size=3904.6, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1999.8, ups=0.25, wpb=7922.5, bsz=120, num_updates=18260, lr=2.22696e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=40, gb_free=27.1, wall=74811 2023-05-01 23:20:39 - progress_bar.py[line:274] - INFO: epoch 004: 175 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7937.2, nsentences=120, sample_size=4072.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1975.6, ups=0.25, wpb=7937.2, bsz=120, num_updates=18270, lr=2.22643e-05, gnorm=0.928, clip=20, loss_scale=64, train_wall=40, gb_free=28.8, wall=74851 2023-05-01 23:21:19 - progress_bar.py[line:274] - INFO: epoch 004: 185 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.231, ntokens=8161.7, nsentences=120, sample_size=3779.5, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=2020.4, ups=0.25, wpb=8161.7, bsz=120, num_updates=18280, lr=2.2259e-05, gnorm=0.937, clip=10, loss_scale=64, train_wall=40, gb_free=29.4, wall=74892 2023-05-01 23:21:59 - progress_bar.py[line:274] - INFO: epoch 004: 195 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7657.4, nsentences=120, sample_size=4062.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1921.3, ups=0.25, wpb=7657.4, bsz=120, num_updates=18290, lr=2.22537e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, 
gb_free=29.7, wall=74932 2023-05-01 23:22:40 - progress_bar.py[line:274] - INFO: epoch 004: 205 / 6042 loss=2.47, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7904, nsentences=120, sample_size=4071.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1952.1, ups=0.25, wpb=7904, bsz=120, num_updates=18300, lr=2.22484e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=30.7, wall=74972 2023-05-01 23:23:20 - progress_bar.py[line:274] - INFO: epoch 004: 215 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7630.4, nsentences=120, sample_size=4090.5, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1891.2, ups=0.25, wpb=7630.4, bsz=120, num_updates=18310, lr=2.22432e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=75012 2023-05-01 23:24:01 - progress_bar.py[line:274] - INFO: epoch 004: 225 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7853.1, nsentences=120, sample_size=4081.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1915, ups=0.24, wpb=7853.1, bsz=120, num_updates=18320, lr=2.22379e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=41, gb_free=31, wall=75053 2023-05-01 23:24:41 - progress_bar.py[line:274] - INFO: epoch 004: 235 / 6042 loss=2.527, loss_v1=0, loss_v2=0, nll_loss=1.295, ntokens=7996.3, nsentences=120, sample_size=3918.4, sample_size_v1=0, sample_size_v2=0, ppl=2.45, wps=2008.6, ups=0.25, wpb=7996.3, bsz=120, num_updates=18330, lr=2.22326e-05, gnorm=0.997, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=75093 2023-05-01 23:25:21 - progress_bar.py[line:274] - INFO: epoch 004: 245 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7923.2, nsentences=120, sample_size=4321, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1948.4, ups=0.25, wpb=7923.2, bsz=120, num_updates=18340, lr=2.22273e-05, gnorm=0.92, clip=10, loss_scale=64, train_wall=41, gb_free=29.3, wall=75134 2023-05-01 23:26:02 - progress_bar.py[line:274] - INFO: epoch 004: 255 / 6042 
loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=7869.8, nsentences=120, sample_size=4090.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1950.6, ups=0.25, wpb=7869.8, bsz=120, num_updates=18350, lr=2.2222e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=75174 2023-05-01 23:26:41 - progress_bar.py[line:274] - INFO: epoch 004: 265 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7686.6, nsentences=120, sample_size=3938.7, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1944.8, ups=0.25, wpb=7686.6, bsz=120, num_updates=18360, lr=2.22167e-05, gnorm=0.955, clip=10, loss_scale=64, train_wall=39, gb_free=31.5, wall=75214 2023-05-01 23:27:21 - progress_bar.py[line:274] - INFO: epoch 004: 275 / 6042 loss=2.47, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=7948.1, nsentences=120, sample_size=4319.8, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1996.3, ups=0.25, wpb=7948.1, bsz=120, num_updates=18370, lr=2.22115e-05, gnorm=0.908, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=75254 2023-05-01 23:28:01 - progress_bar.py[line:274] - INFO: epoch 004: 285 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7547, nsentences=120, sample_size=4344.5, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1887.8, ups=0.25, wpb=7547, bsz=120, num_updates=18380, lr=2.22062e-05, gnorm=0.913, clip=10, loss_scale=64, train_wall=40, gb_free=31.4, wall=75294 2023-05-01 23:28:41 - progress_bar.py[line:274] - INFO: epoch 004: 295 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7844.6, nsentences=120, sample_size=4162.1, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1980.3, ups=0.25, wpb=7844.6, bsz=120, num_updates=18390, lr=2.22009e-05, gnorm=0.934, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=75333 2023-05-01 23:29:20 - progress_bar.py[line:274] - INFO: epoch 004: 305 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7442.5, nsentences=120, sample_size=3968.8, 
sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1870.6, ups=0.25, wpb=7442.5, bsz=120, num_updates=18400, lr=2.21956e-05, gnorm=0.948, clip=30, loss_scale=64, train_wall=40, gb_free=28.7, wall=75373 2023-05-01 23:30:00 - progress_bar.py[line:274] - INFO: epoch 004: 315 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7424.1, nsentences=120, sample_size=3908.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1890.9, ups=0.25, wpb=7424.1, bsz=120, num_updates=18410, lr=2.21903e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=39, gb_free=30.4, wall=75412 2023-05-01 23:30:39 - progress_bar.py[line:274] - INFO: epoch 004: 325 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=7490.4, nsentences=120, sample_size=4243.9, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1883.2, ups=0.25, wpb=7490.4, bsz=120, num_updates=18420, lr=2.21851e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=75452 2023-05-01 23:31:19 - progress_bar.py[line:274] - INFO: epoch 004: 335 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7749.8, nsentences=120, sample_size=4348.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1959.3, ups=0.25, wpb=7749.8, bsz=120, num_updates=18430, lr=2.21798e-05, gnorm=0.929, clip=20, loss_scale=64, train_wall=39, gb_free=30.8, wall=75492 2023-05-01 23:32:00 - progress_bar.py[line:274] - INFO: epoch 004: 345 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7712.4, nsentences=120, sample_size=4018.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1903.3, ups=0.25, wpb=7712.4, bsz=120, num_updates=18440, lr=2.21745e-05, gnorm=1.01, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=75532 2023-05-01 23:32:41 - progress_bar.py[line:274] - INFO: epoch 004: 355 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7865.9, nsentences=120, sample_size=4004.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1911, ups=0.24, wpb=7865.9, bsz=120, 
num_updates=18450, lr=2.21692e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=41, gb_free=28.8, wall=75573 2023-05-01 23:33:20 - progress_bar.py[line:274] - INFO: epoch 004: 365 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.222, ntokens=7656.5, nsentences=120, sample_size=4129.5, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1933.4, ups=0.25, wpb=7656.5, bsz=120, num_updates=18460, lr=2.21639e-05, gnorm=0.956, clip=10, loss_scale=64, train_wall=40, gb_free=29.2, wall=75613 2023-05-01 23:34:00 - progress_bar.py[line:274] - INFO: epoch 004: 375 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7445.8, nsentences=120, sample_size=4098.3, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1883, ups=0.25, wpb=7445.8, bsz=120, num_updates=18470, lr=2.21586e-05, gnorm=0.951, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=75652 2023-05-01 23:34:40 - progress_bar.py[line:274] - INFO: epoch 004: 385 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7672.8, nsentences=120, sample_size=3743.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1932.7, ups=0.25, wpb=7672.8, bsz=120, num_updates=18480, lr=2.21534e-05, gnorm=0.994, clip=60, loss_scale=64, train_wall=40, gb_free=30.7, wall=75692 2023-05-01 23:35:19 - progress_bar.py[line:274] - INFO: epoch 004: 395 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7702.3, nsentences=120, sample_size=4010.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1972.1, ups=0.26, wpb=7702.3, bsz=120, num_updates=18490, lr=2.21481e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=39, gb_free=27.8, wall=75731 2023-05-01 23:35:59 - progress_bar.py[line:274] - INFO: epoch 004: 405 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7981.1, nsentences=120, sample_size=3857.6, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1966.3, ups=0.25, wpb=7981.1, bsz=120, num_updates=18500, lr=2.21428e-05, gnorm=0.964, clip=40, loss_scale=64, train_wall=41, 
gb_free=29, wall=75772 2023-05-01 23:36:40 - progress_bar.py[line:274] - INFO: epoch 004: 415 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7532.1, nsentences=120, sample_size=4180.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1867.5, ups=0.25, wpb=7532.1, bsz=120, num_updates=18510, lr=2.21375e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, gb_free=27.7, wall=75812 2023-05-01 23:37:19 - progress_bar.py[line:274] - INFO: epoch 004: 425 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7870.7, nsentences=120, sample_size=3796.8, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1974.4, ups=0.25, wpb=7870.7, bsz=120, num_updates=18520, lr=2.21322e-05, gnorm=0.973, clip=10, loss_scale=64, train_wall=40, gb_free=28.8, wall=75852 2023-05-01 23:37:59 - progress_bar.py[line:274] - INFO: epoch 004: 435 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7631.1, nsentences=120, sample_size=4245.2, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1911.1, ups=0.25, wpb=7631.1, bsz=120, num_updates=18530, lr=2.21269e-05, gnorm=0.945, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=75892 2023-05-01 23:38:40 - progress_bar.py[line:274] - INFO: epoch 004: 445 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7850.8, nsentences=120, sample_size=3977.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1950.7, ups=0.25, wpb=7850.8, bsz=120, num_updates=18540, lr=2.21217e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=29.1, wall=75932 2023-05-01 23:39:19 - progress_bar.py[line:274] - INFO: epoch 004: 455 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7865.2, nsentences=120, sample_size=4142.4, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1980.5, ups=0.25, wpb=7865.2, bsz=120, num_updates=18550, lr=2.21164e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=75972 2023-05-01 23:39:59 - progress_bar.py[line:274] - INFO: epoch 004: 465 / 
6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7763.1, nsentences=120, sample_size=4096.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1966, ups=0.25, wpb=7763.1, bsz=120, num_updates=18560, lr=2.21111e-05, gnorm=0.948, clip=10, loss_scale=64, train_wall=39, gb_free=30.4, wall=76011 2023-05-01 23:40:38 - progress_bar.py[line:274] - INFO: epoch 004: 475 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7808.2, nsentences=120, sample_size=4182, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1980.3, ups=0.25, wpb=7808.2, bsz=120, num_updates=18570, lr=2.21058e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=39, gb_free=31.4, wall=76051 2023-05-01 23:41:19 - progress_bar.py[line:274] - INFO: epoch 004: 485 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7893.2, nsentences=120, sample_size=4271.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1934.7, ups=0.25, wpb=7893.2, bsz=120, num_updates=18580, lr=2.21005e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=41, gb_free=29.7, wall=76092 2023-05-01 23:41:59 - progress_bar.py[line:274] - INFO: epoch 004: 495 / 6042 loss=2.478, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7700.2, nsentences=120, sample_size=3875.8, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1946.1, ups=0.25, wpb=7700.2, bsz=120, num_updates=18590, lr=2.20953e-05, gnorm=0.974, clip=50, loss_scale=64, train_wall=39, gb_free=30, wall=76131 2023-05-01 23:42:39 - progress_bar.py[line:274] - INFO: epoch 004: 505 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7824.8, nsentences=120, sample_size=4240.3, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1941, ups=0.25, wpb=7824.8, bsz=120, num_updates=18600, lr=2.209e-05, gnorm=0.918, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=76171 2023-05-01 23:43:19 - progress_bar.py[line:274] - INFO: epoch 004: 515 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7964.4, nsentences=120, sample_size=3807.2, 
sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1967.2, ups=0.25, wpb=7964.4, bsz=120, num_updates=18610, lr=2.20847e-05, gnorm=0.97, clip=40, loss_scale=64, train_wall=40, gb_free=30.9, wall=76212 2023-05-01 23:43:59 - progress_bar.py[line:274] - INFO: epoch 004: 525 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7645.8, nsentences=120, sample_size=3948.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1941.8, ups=0.25, wpb=7645.8, bsz=120, num_updates=18620, lr=2.20794e-05, gnorm=0.956, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=76251 2023-05-01 23:44:38 - progress_bar.py[line:274] - INFO: epoch 004: 535 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7580, nsentences=120, sample_size=3784.1, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1911.7, ups=0.25, wpb=7580, bsz=120, num_updates=18630, lr=2.20741e-05, gnorm=0.992, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=76291 2023-05-01 23:45:18 - progress_bar.py[line:274] - INFO: epoch 004: 545 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7863.4, nsentences=120, sample_size=4114.5, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1986.4, ups=0.25, wpb=7863.4, bsz=120, num_updates=18640, lr=2.20688e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=30.9, wall=76330 2023-05-01 23:45:58 - progress_bar.py[line:274] - INFO: epoch 004: 555 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7959.1, nsentences=120, sample_size=3935, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1983.2, ups=0.25, wpb=7959.1, bsz=120, num_updates=18650, lr=2.20636e-05, gnorm=0.982, clip=50, loss_scale=64, train_wall=40, gb_free=30.6, wall=76371 2023-05-01 23:46:38 - progress_bar.py[line:274] - INFO: epoch 004: 565 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7963.5, nsentences=120, sample_size=3776.1, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1997.1, ups=0.25, wpb=7963.5, bsz=120, 
num_updates=18660, lr=2.20583e-05, gnorm=0.965, clip=40, loss_scale=64, train_wall=40, gb_free=26.3, wall=76410 2023-05-01 23:47:18 - progress_bar.py[line:274] - INFO: epoch 004: 575 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7708.6, nsentences=120, sample_size=4005.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1940.8, ups=0.25, wpb=7708.6, bsz=120, num_updates=18670, lr=2.2053e-05, gnorm=0.948, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=76450 2023-05-01 23:47:58 - progress_bar.py[line:274] - INFO: epoch 004: 585 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7578.2, nsentences=120, sample_size=4255.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1893.5, ups=0.25, wpb=7578.2, bsz=120, num_updates=18680, lr=2.20477e-05, gnorm=0.922, clip=10, loss_scale=64, train_wall=40, gb_free=28.5, wall=76490 2023-05-01 23:48:21 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-01 23:48:41 - progress_bar.py[line:274] - INFO: epoch 004: 596 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7798.3, nsentences=120, sample_size=3807.1, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1787.3, ups=0.23, wpb=7798.3, bsz=120, num_updates=18690, lr=2.20424e-05, gnorm=0.994, clip=50, loss_scale=64, train_wall=44, gb_free=28.1, wall=76534 2023-05-01 23:49:22 - progress_bar.py[line:274] - INFO: epoch 004: 606 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7803.3, nsentences=120, sample_size=3878.4, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1944.6, ups=0.25, wpb=7803.3, bsz=120, num_updates=18700, lr=2.20372e-05, gnorm=0.975, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=76574 2023-05-01 23:50:02 - progress_bar.py[line:274] - INFO: epoch 004: 616 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7906.7, nsentences=120, sample_size=4305, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1950.2, 
ups=0.25, wpb=7906.7, bsz=120, num_updates=18710, lr=2.20319e-05, gnorm=0.92, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=76615 2023-05-01 23:50:41 - progress_bar.py[line:274] - INFO: epoch 004: 626 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7518.5, nsentences=120, sample_size=3923.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1939, ups=0.26, wpb=7518.5, bsz=120, num_updates=18720, lr=2.20266e-05, gnorm=0.957, clip=40, loss_scale=64, train_wall=39, gb_free=30.6, wall=76653 2023-05-01 23:51:21 - progress_bar.py[line:274] - INFO: epoch 004: 636 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.235, ntokens=7474.5, nsentences=120, sample_size=3854.8, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1862.4, ups=0.25, wpb=7474.5, bsz=120, num_updates=18730, lr=2.20213e-05, gnorm=0.992, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=76693 2023-05-01 23:52:01 - progress_bar.py[line:274] - INFO: epoch 004: 646 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7568.1, nsentences=120, sample_size=4295.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1886.2, ups=0.25, wpb=7568.1, bsz=120, num_updates=18740, lr=2.2016e-05, gnorm=0.917, clip=20, loss_scale=64, train_wall=40, gb_free=31, wall=76734 2023-05-01 23:52:41 - progress_bar.py[line:274] - INFO: epoch 004: 656 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7971.8, nsentences=120, sample_size=4052.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=2014.4, ups=0.25, wpb=7971.8, bsz=120, num_updates=18750, lr=2.20107e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=28.7, wall=76773 2023-05-01 23:53:20 - progress_bar.py[line:274] - INFO: epoch 004: 666 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7556.8, nsentences=120, sample_size=4007.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1944.8, ups=0.26, wpb=7556.8, bsz=120, num_updates=18760, lr=2.20055e-05, gnorm=0.948, clip=20, loss_scale=64, 
train_wall=39, gb_free=29.6, wall=76812 2023-05-01 23:53:59 - progress_bar.py[line:274] - INFO: epoch 004: 676 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7528.5, nsentences=120, sample_size=4207.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1903.9, ups=0.25, wpb=7528.5, bsz=120, num_updates=18770, lr=2.20002e-05, gnorm=0.935, clip=0, loss_scale=64, train_wall=39, gb_free=29.8, wall=76852 2023-05-01 23:54:38 - progress_bar.py[line:274] - INFO: epoch 004: 686 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7408.1, nsentences=120, sample_size=3943.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1900.8, ups=0.26, wpb=7408.1, bsz=120, num_updates=18780, lr=2.19949e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=39, gb_free=29.6, wall=76891 2023-05-01 23:55:18 - progress_bar.py[line:274] - INFO: epoch 004: 696 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7548.2, nsentences=120, sample_size=3880, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1871.9, ups=0.25, wpb=7548.2, bsz=120, num_updates=18790, lr=2.19896e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=76931 2023-05-01 23:55:58 - progress_bar.py[line:274] - INFO: epoch 004: 706 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7790.2, nsentences=120, sample_size=3975, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1972.9, ups=0.25, wpb=7790.2, bsz=120, num_updates=18800, lr=2.19843e-05, gnorm=0.937, clip=10, loss_scale=64, train_wall=39, gb_free=29.2, wall=76970 2023-05-01 23:56:38 - progress_bar.py[line:274] - INFO: epoch 004: 716 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7756.4, nsentences=120, sample_size=4320.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1940.4, ups=0.25, wpb=7756.4, bsz=120, num_updates=18810, lr=2.1979e-05, gnorm=0.918, clip=0, loss_scale=64, train_wall=40, gb_free=29.3, wall=77010 2023-05-01 23:57:17 - progress_bar.py[line:274] - INFO: epoch 
004: 726 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7474.2, nsentences=120, sample_size=3566.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1900.5, ups=0.25, wpb=7474.2, bsz=120, num_updates=18820, lr=2.19738e-05, gnorm=0.984, clip=30, loss_scale=64, train_wall=39, gb_free=29.2, wall=77050 2023-05-01 23:57:56 - progress_bar.py[line:274] - INFO: epoch 004: 736 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7812.9, nsentences=120, sample_size=3889, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1990.9, ups=0.25, wpb=7812.9, bsz=120, num_updates=18830, lr=2.19685e-05, gnorm=0.99, clip=40, loss_scale=64, train_wall=39, gb_free=31.1, wall=77089 2023-05-01 23:58:36 - progress_bar.py[line:274] - INFO: epoch 004: 746 / 6042 loss=2.488, loss_v1=0, loss_v2=0, nll_loss=1.25, ntokens=7534.8, nsentences=120, sample_size=4356.6, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1923.5, ups=0.26, wpb=7534.8, bsz=120, num_updates=18840, lr=2.19632e-05, gnorm=0.929, clip=20, loss_scale=64, train_wall=39, gb_free=29.3, wall=77128 2023-05-01 23:59:15 - progress_bar.py[line:274] - INFO: epoch 004: 756 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.217, ntokens=7553, nsentences=120, sample_size=4039.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1910.6, ups=0.25, wpb=7553, bsz=120, num_updates=18850, lr=2.19579e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=39, gb_free=30.5, wall=77168 2023-05-01 23:59:55 - progress_bar.py[line:274] - INFO: epoch 004: 766 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=8109.7, nsentences=120, sample_size=3996.1, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=2027.2, ups=0.25, wpb=8109.7, bsz=120, num_updates=18860, lr=2.19526e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=77208 2023-05-02 00:00:35 - progress_bar.py[line:274] - INFO: epoch 004: 776 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7613.9, nsentences=120, 
sample_size=3907, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1894.5, ups=0.25, wpb=7613.9, bsz=120, num_updates=18870, lr=2.19474e-05, gnorm=0.951, clip=30, loss_scale=64, train_wall=40, gb_free=23.6, wall=77248 2023-05-02 00:01:15 - progress_bar.py[line:274] - INFO: epoch 004: 786 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7651.3, nsentences=120, sample_size=4304.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1921.3, ups=0.25, wpb=7651.3, bsz=120, num_updates=18880, lr=2.19421e-05, gnorm=0.92, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=77288 2023-05-02 00:01:55 - progress_bar.py[line:274] - INFO: epoch 004: 796 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7389.8, nsentences=120, sample_size=3999.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1861.1, ups=0.25, wpb=7389.8, bsz=120, num_updates=18890, lr=2.19368e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=77327 2023-05-02 00:02:34 - progress_bar.py[line:274] - INFO: epoch 004: 806 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7581.4, nsentences=120, sample_size=4161.1, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1942.5, ups=0.26, wpb=7581.4, bsz=120, num_updates=18900, lr=2.19315e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=39, gb_free=29.2, wall=77366 2023-05-02 00:03:13 - progress_bar.py[line:274] - INFO: epoch 004: 816 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=7518.2, nsentences=120, sample_size=4330.8, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1901.2, ups=0.25, wpb=7518.2, bsz=120, num_updates=18910, lr=2.19262e-05, gnorm=0.911, clip=20, loss_scale=64, train_wall=39, gb_free=30, wall=77406 2023-05-02 00:03:53 - progress_bar.py[line:274] - INFO: epoch 004: 826 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7664.7, nsentences=120, sample_size=3975.4, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1951, ups=0.25, wpb=7664.7, 
bsz=120, num_updates=18920, lr=2.19209e-05, gnorm=0.956, clip=20, loss_scale=64, train_wall=39, gb_free=29.3, wall=77445 2023-05-02 00:04:33 - progress_bar.py[line:274] - INFO: epoch 004: 836 / 6042 loss=2.47, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7891.2, nsentences=120, sample_size=3956, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1977.9, ups=0.25, wpb=7891.2, bsz=120, num_updates=18930, lr=2.19157e-05, gnorm=0.939, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=77485 2023-05-02 00:05:12 - progress_bar.py[line:274] - INFO: epoch 004: 846 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7599.7, nsentences=120, sample_size=3753.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1934.7, ups=0.25, wpb=7599.7, bsz=120, num_updates=18940, lr=2.19104e-05, gnorm=0.972, clip=20, loss_scale=64, train_wall=39, gb_free=30.9, wall=77524 2023-05-02 00:05:51 - progress_bar.py[line:274] - INFO: epoch 004: 856 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7740.9, nsentences=120, sample_size=3911.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1958, ups=0.25, wpb=7740.9, bsz=120, num_updates=18950, lr=2.19051e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=39, gb_free=27.8, wall=77564 2023-05-02 00:06:31 - progress_bar.py[line:274] - INFO: epoch 004: 866 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7668, nsentences=120, sample_size=3960.6, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1936, ups=0.25, wpb=7668, bsz=120, num_updates=18960, lr=2.18998e-05, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=31.1, wall=77604 2023-05-02 00:07:12 - progress_bar.py[line:274] - INFO: epoch 004: 876 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7656.9, nsentences=120, sample_size=4508.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1860.9, ups=0.24, wpb=7656.9, bsz=120, num_updates=18970, lr=2.18945e-05, gnorm=0.921, clip=20, loss_scale=64, train_wall=41, 
gb_free=29.6, wall=77645 2023-05-02 00:07:52 - progress_bar.py[line:274] - INFO: epoch 004: 886 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=7880.4, nsentences=120, sample_size=4273.7, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1990.4, ups=0.25, wpb=7880.4, bsz=120, num_updates=18980, lr=2.18893e-05, gnorm=0.928, clip=20, loss_scale=64, train_wall=40, gb_free=25.5, wall=77684 2023-05-02 00:08:32 - progress_bar.py[line:274] - INFO: epoch 004: 896 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7690.3, nsentences=120, sample_size=4108, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1919.3, ups=0.25, wpb=7690.3, bsz=120, num_updates=18990, lr=2.1884e-05, gnorm=0.94, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=77724 2023-05-02 00:09:11 - progress_bar.py[line:274] - INFO: epoch 004: 906 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7467.3, nsentences=120, sample_size=4308.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1914.6, ups=0.26, wpb=7467.3, bsz=120, num_updates=19000, lr=2.18787e-05, gnorm=0.926, clip=20, loss_scale=64, train_wall=39, gb_free=26.7, wall=77763 2023-05-02 00:09:11 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 00:09:13 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 00:09:13 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 00:09:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 00:09:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:30 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 00:09:30 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 00:09:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-02 00:09:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:42 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 00:09:42 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 00:09:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 00:09:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:53 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 00:09:53 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 00:09:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:57 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 00:09:57 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 00:09:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:09:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:09:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:10:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:10:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:10:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:10:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:10:02 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-02 00:10:02 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 00:10:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 00:10:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 00:10:02 - progress_bar.py[line:282] - INFO: epoch 004 | valid on 'valid' subset | loss 3.222 | loss_v1 0 | loss_v2 0 | nll_loss 2.055 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.15 | score 0.7476 | wps 3299.2 | wpb 3202.1 | bsz 39.4 | num_updates 19000 | best_score 0.751 2023-05-02 00:10:02 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 4 @ 19000 updates 2023-05-02 00:10:02 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_19000.pt 2023-05-02 00:10:28 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_19000.pt 2023-05-02 00:10:42 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_19000.pt (epoch 4 @ 19000 updates, score 0.7476) (writing took 39.814127509947866 seconds) 2023-05-02 00:11:22 - progress_bar.py[line:274] - INFO: epoch 004: 916 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7796.5, nsentences=120, sample_size=3947.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=594.5, ups=0.08, wpb=7796.5, bsz=120, num_updates=19010, lr=2.18734e-05, gnorm=0.921, clip=0, loss_scale=64, train_wall=40, gb_free=30.8, wall=77894 2023-05-02 00:12:02 - progress_bar.py[line:274] - INFO: epoch 004: 926 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7537.9, nsentences=120, sample_size=4038.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1864.3, ups=0.25, wpb=7537.9, bsz=120, num_updates=19020, 
lr=2.18681e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=77935 2023-05-02 00:12:42 - progress_bar.py[line:274] - INFO: epoch 004: 936 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7694.1, nsentences=120, sample_size=3940.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1949.9, ups=0.25, wpb=7694.1, bsz=120, num_updates=19030, lr=2.18628e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=39, gb_free=31.1, wall=77974 2023-05-02 00:13:21 - progress_bar.py[line:274] - INFO: epoch 004: 946 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.217, ntokens=7554.1, nsentences=120, sample_size=4095.9, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1913.5, ups=0.25, wpb=7554.1, bsz=120, num_updates=19040, lr=2.18576e-05, gnorm=0.932, clip=10, loss_scale=64, train_wall=39, gb_free=30.6, wall=78014 2023-05-02 00:14:01 - progress_bar.py[line:274] - INFO: epoch 004: 956 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7719.6, nsentences=120, sample_size=3854.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1928.1, ups=0.25, wpb=7719.6, bsz=120, num_updates=19050, lr=2.18523e-05, gnorm=0.958, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=78054 2023-05-02 00:14:42 - progress_bar.py[line:274] - INFO: epoch 004: 966 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.235, ntokens=7801, nsentences=120, sample_size=4058.4, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1927.4, ups=0.25, wpb=7801, bsz=120, num_updates=19060, lr=2.1847e-05, gnorm=0.918, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=78094 2023-05-02 00:15:22 - progress_bar.py[line:274] - INFO: epoch 004: 976 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7934.5, nsentences=120, sample_size=4141.3, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1983.6, ups=0.25, wpb=7934.5, bsz=120, num_updates=19070, lr=2.18417e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=78134 
2023-05-02 00:16:01 - progress_bar.py[line:274] - INFO: epoch 004: 986 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.209, ntokens=7672.1, nsentences=120, sample_size=4240.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1950.5, ups=0.25, wpb=7672.1, bsz=120, num_updates=19080, lr=2.18364e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=39, gb_free=29.4, wall=78174 2023-05-02 00:16:41 - progress_bar.py[line:274] - INFO: epoch 004: 996 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7920, nsentences=120, sample_size=3986.7, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1971, ups=0.25, wpb=7920, bsz=120, num_updates=19090, lr=2.18311e-05, gnorm=0.953, clip=40, loss_scale=64, train_wall=40, gb_free=28, wall=78214 2023-05-02 00:17:20 - progress_bar.py[line:274] - INFO: epoch 004: 1006 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7919.6, nsentences=120, sample_size=3895.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=2032.7, ups=0.26, wpb=7919.6, bsz=120, num_updates=19100, lr=2.18259e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=78253 2023-05-02 00:18:00 - progress_bar.py[line:274] - INFO: epoch 004: 1016 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=7693.9, nsentences=120, sample_size=3918.8, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1939.7, ups=0.25, wpb=7693.9, bsz=120, num_updates=19110, lr=2.18206e-05, gnorm=0.985, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=78293 2023-05-02 00:18:40 - progress_bar.py[line:274] - INFO: epoch 004: 1026 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7759.4, nsentences=120, sample_size=4147.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1964, ups=0.25, wpb=7759.4, bsz=120, num_updates=19120, lr=2.18153e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=39, gb_free=24.3, wall=78332 2023-05-02 00:19:19 - progress_bar.py[line:274] - INFO: epoch 004: 1036 / 6042 loss=2.42, loss_v1=0, 
loss_v2=0, nll_loss=1.17, ntokens=7664.3, nsentences=120, sample_size=4028.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1928.5, ups=0.25, wpb=7664.3, bsz=120, num_updates=19130, lr=2.181e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=78372 2023-05-02 00:19:59 - progress_bar.py[line:274] - INFO: epoch 004: 1046 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.218, ntokens=7948.5, nsentences=120, sample_size=3922.7, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1980.2, ups=0.25, wpb=7948.5, bsz=120, num_updates=19140, lr=2.18047e-05, gnorm=0.976, clip=40, loss_scale=64, train_wall=40, gb_free=29, wall=78412 2023-05-02 00:20:40 - progress_bar.py[line:274] - INFO: epoch 004: 1056 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7832.2, nsentences=120, sample_size=3858.6, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1925.5, ups=0.25, wpb=7832.2, bsz=120, num_updates=19150, lr=2.17995e-05, gnorm=0.983, clip=50, loss_scale=64, train_wall=41, gb_free=23.6, wall=78453 2023-05-02 00:21:20 - progress_bar.py[line:274] - INFO: epoch 004: 1066 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.26, ntokens=8072.4, nsentences=120, sample_size=4178.1, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=2014.5, ups=0.25, wpb=8072.4, bsz=120, num_updates=19160, lr=2.17942e-05, gnorm=0.925, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=78493 2023-05-02 00:22:00 - progress_bar.py[line:274] - INFO: epoch 004: 1076 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7839.3, nsentences=120, sample_size=4019.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1945.5, ups=0.25, wpb=7839.3, bsz=120, num_updates=19170, lr=2.17889e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=78533 2023-05-02 00:22:40 - progress_bar.py[line:274] - INFO: epoch 004: 1086 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7894.8, nsentences=120, sample_size=3786.3, sample_size_v1=0, 
sample_size_v2=0, ppl=2.29, wps=1984, ups=0.25, wpb=7894.8, bsz=120, num_updates=19180, lr=2.17836e-05, gnorm=0.986, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=78573 2023-05-02 00:23:20 - progress_bar.py[line:274] - INFO: epoch 004: 1096 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=8021.2, nsentences=120, sample_size=4250.7, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=2016.6, ups=0.25, wpb=8021.2, bsz=120, num_updates=19190, lr=2.17783e-05, gnorm=0.913, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=78613 2023-05-02 00:23:59 - progress_bar.py[line:274] - INFO: epoch 004: 1106 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7898.9, nsentences=120, sample_size=3809.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2013.8, ups=0.25, wpb=7898.9, bsz=120, num_updates=19200, lr=2.1773e-05, gnorm=0.951, clip=20, loss_scale=128, train_wall=39, gb_free=29.8, wall=78652 2023-05-02 00:24:20 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 00:24:43 - progress_bar.py[line:274] - INFO: epoch 004: 1117 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=8143.1, nsentences=120, sample_size=3654.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1855.4, ups=0.23, wpb=8143.1, bsz=120, num_updates=19210, lr=2.17678e-05, gnorm=0.926, clip=30, loss_scale=64, train_wall=44, gb_free=30.3, wall=78696 2023-05-02 00:25:23 - progress_bar.py[line:274] - INFO: epoch 004: 1127 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7566.5, nsentences=120, sample_size=3781.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1893, ups=0.25, wpb=7566.5, bsz=120, num_updates=19220, lr=2.17625e-05, gnorm=0.934, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=78736 2023-05-02 00:26:04 - progress_bar.py[line:274] - INFO: epoch 004: 1137 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=8063.1, nsentences=120, 
sample_size=4035.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1981.8, ups=0.25, wpb=8063.1, bsz=120, num_updates=19230, lr=2.17572e-05, gnorm=0.939, clip=0, loss_scale=64, train_wall=41, gb_free=29.7, wall=78776 2023-05-02 00:26:44 - progress_bar.py[line:274] - INFO: epoch 004: 1147 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=8007.7, nsentences=120, sample_size=4219.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2017.1, ups=0.25, wpb=8007.7, bsz=120, num_updates=19240, lr=2.17519e-05, gnorm=0.907, clip=0, loss_scale=64, train_wall=40, gb_free=27.5, wall=78816 2023-05-02 00:27:24 - progress_bar.py[line:274] - INFO: epoch 004: 1157 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7806.5, nsentences=120, sample_size=4196.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1927.8, ups=0.25, wpb=7806.5, bsz=120, num_updates=19250, lr=2.17466e-05, gnorm=0.91, clip=0, loss_scale=64, train_wall=40, gb_free=28.6, wall=78857 2023-05-02 00:28:04 - progress_bar.py[line:274] - INFO: epoch 004: 1167 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7692.8, nsentences=120, sample_size=4039.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1931.6, ups=0.25, wpb=7692.8, bsz=120, num_updates=19260, lr=2.17414e-05, gnorm=0.95, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=78896 2023-05-02 00:28:43 - progress_bar.py[line:274] - INFO: epoch 004: 1177 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7992.3, nsentences=120, sample_size=3801.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2027, ups=0.25, wpb=7992.3, bsz=120, num_updates=19270, lr=2.17361e-05, gnorm=0.965, clip=10, loss_scale=64, train_wall=39, gb_free=29.1, wall=78936 2023-05-02 00:29:23 - progress_bar.py[line:274] - INFO: epoch 004: 1187 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7867.8, nsentences=120, sample_size=3943.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1991.8, ups=0.25, 
wpb=7867.8, bsz=120, num_updates=19280, lr=2.17308e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=39, gb_free=30.8, wall=78975 2023-05-02 00:30:02 - progress_bar.py[line:274] - INFO: epoch 004: 1197 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7651.9, nsentences=120, sample_size=3852.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1937.9, ups=0.25, wpb=7651.9, bsz=120, num_updates=19290, lr=2.17255e-05, gnorm=0.93, clip=0, loss_scale=64, train_wall=39, gb_free=30.3, wall=79015 2023-05-02 00:30:42 - progress_bar.py[line:274] - INFO: epoch 004: 1207 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7866, nsentences=120, sample_size=4031.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1979.1, ups=0.25, wpb=7866, bsz=120, num_updates=19300, lr=2.17202e-05, gnorm=0.938, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=79054 2023-05-02 00:31:22 - progress_bar.py[line:274] - INFO: epoch 004: 1217 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7888.2, nsentences=120, sample_size=4332.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1957, ups=0.25, wpb=7888.2, bsz=120, num_updates=19310, lr=2.17149e-05, gnorm=0.928, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=79095 2023-05-02 00:32:03 - progress_bar.py[line:274] - INFO: epoch 004: 1227 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7832.3, nsentences=120, sample_size=4090.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1944.8, ups=0.25, wpb=7832.3, bsz=120, num_updates=19320, lr=2.17097e-05, gnorm=0.96, clip=30, loss_scale=64, train_wall=40, gb_free=28, wall=79135 2023-05-02 00:32:43 - progress_bar.py[line:274] - INFO: epoch 004: 1237 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7854.3, nsentences=120, sample_size=4133.5, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1938.1, ups=0.25, wpb=7854.3, bsz=120, num_updates=19330, lr=2.17044e-05, gnorm=0.983, clip=20, loss_scale=64, 
train_wall=40, gb_free=30.4, wall=79176 2023-05-02 00:33:22 - progress_bar.py[line:274] - INFO: epoch 004: 1247 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7804.7, nsentences=120, sample_size=3834.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1997.8, ups=0.26, wpb=7804.7, bsz=120, num_updates=19340, lr=2.16991e-05, gnorm=1.005, clip=50, loss_scale=64, train_wall=39, gb_free=30.4, wall=79215 2023-05-02 00:34:02 - progress_bar.py[line:274] - INFO: epoch 004: 1257 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7458.2, nsentences=120, sample_size=4121.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1870.3, ups=0.25, wpb=7458.2, bsz=120, num_updates=19350, lr=2.16938e-05, gnorm=0.97, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=79255 2023-05-02 00:34:43 - progress_bar.py[line:274] - INFO: epoch 004: 1267 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7823.2, nsentences=120, sample_size=3901.8, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1929, ups=0.25, wpb=7823.2, bsz=120, num_updates=19360, lr=2.16885e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=79295 2023-05-02 00:35:23 - progress_bar.py[line:274] - INFO: epoch 004: 1277 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7593, nsentences=120, sample_size=4208.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1895.5, ups=0.25, wpb=7593, bsz=120, num_updates=19370, lr=2.16832e-05, gnorm=0.925, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=79335 2023-05-02 00:36:03 - progress_bar.py[line:274] - INFO: epoch 004: 1287 / 6042 loss=2.47, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=7716.7, nsentences=120, sample_size=4270.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1898.7, ups=0.25, wpb=7716.7, bsz=120, num_updates=19380, lr=2.1678e-05, gnorm=0.931, clip=0, loss_scale=64, train_wall=41, gb_free=29.8, wall=79376 2023-05-02 00:36:43 - progress_bar.py[line:274] - INFO: 
epoch 004: 1297 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7907.6, nsentences=120, sample_size=4115.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1990.6, ups=0.25, wpb=7907.6, bsz=120, num_updates=19390, lr=2.16727e-05, gnorm=0.959, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=79416 2023-05-02 00:37:23 - progress_bar.py[line:274] - INFO: epoch 004: 1307 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7832.4, nsentences=120, sample_size=4279.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1980.2, ups=0.25, wpb=7832.4, bsz=120, num_updates=19400, lr=2.16674e-05, gnorm=0.921, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=79455 2023-05-02 00:38:02 - progress_bar.py[line:274] - INFO: epoch 004: 1317 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7439.4, nsentences=120, sample_size=3967.1, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1896.8, ups=0.25, wpb=7439.4, bsz=120, num_updates=19410, lr=2.16621e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=79494 2023-05-02 00:38:42 - progress_bar.py[line:274] - INFO: epoch 004: 1327 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7441.2, nsentences=120, sample_size=3912.5, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1874.7, ups=0.25, wpb=7441.2, bsz=120, num_updates=19420, lr=2.16568e-05, gnorm=0.972, clip=40, loss_scale=64, train_wall=40, gb_free=26.9, wall=79534 2023-05-02 00:39:22 - progress_bar.py[line:274] - INFO: epoch 004: 1337 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=7960.9, nsentences=120, sample_size=3881, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1972.4, ups=0.25, wpb=7960.9, bsz=120, num_updates=19430, lr=2.16516e-05, gnorm=0.959, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=79574 2023-05-02 00:40:02 - progress_bar.py[line:274] - INFO: epoch 004: 1347 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=7926.5, 
nsentences=120, sample_size=3942.3, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=2000.9, ups=0.25, wpb=7926.5, bsz=120, num_updates=19440, lr=2.16463e-05, gnorm=0.94, clip=30, loss_scale=64, train_wall=40, gb_free=31.3, wall=79614 2023-05-02 00:40:41 - progress_bar.py[line:274] - INFO: epoch 004: 1357 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7490, nsentences=120, sample_size=4291.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1876.1, ups=0.25, wpb=7490, bsz=120, num_updates=19450, lr=2.1641e-05, gnorm=0.914, clip=0, loss_scale=64, train_wall=40, gb_free=30.9, wall=79654 2023-05-02 00:41:21 - progress_bar.py[line:274] - INFO: epoch 004: 1367 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7747.8, nsentences=120, sample_size=4021.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1970.9, ups=0.25, wpb=7747.8, bsz=120, num_updates=19460, lr=2.16357e-05, gnorm=0.943, clip=30, loss_scale=64, train_wall=39, gb_free=31.3, wall=79693 2023-05-02 00:42:01 - progress_bar.py[line:274] - INFO: epoch 004: 1377 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7868.8, nsentences=120, sample_size=3696.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1976.2, ups=0.25, wpb=7868.8, bsz=120, num_updates=19470, lr=2.16304e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=31.2, wall=79733 2023-05-02 00:42:40 - progress_bar.py[line:274] - INFO: epoch 004: 1387 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7872.5, nsentences=120, sample_size=4126.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1974.8, ups=0.25, wpb=7872.5, bsz=120, num_updates=19480, lr=2.16251e-05, gnorm=0.936, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=79773 2023-05-02 00:43:21 - progress_bar.py[line:274] - INFO: epoch 004: 1397 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7484.2, nsentences=120, sample_size=4277.4, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1845.1, 
ups=0.25, wpb=7484.2, bsz=120, num_updates=19490, lr=2.16199e-05, gnorm=0.953, clip=30, loss_scale=64, train_wall=40, gb_free=28.9, wall=79813 2023-05-02 00:44:01 - progress_bar.py[line:274] - INFO: epoch 004: 1407 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7859.6, nsentences=120, sample_size=3957.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1978.6, ups=0.25, wpb=7859.6, bsz=120, num_updates=19500, lr=2.16146e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=31.2, wall=79853 2023-05-02 00:44:41 - progress_bar.py[line:274] - INFO: epoch 004: 1417 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7939.4, nsentences=120, sample_size=3871.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1981.5, ups=0.25, wpb=7939.4, bsz=120, num_updates=19510, lr=2.16093e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=29, wall=79893 2023-05-02 00:45:20 - progress_bar.py[line:274] - INFO: epoch 004: 1427 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7583.7, nsentences=120, sample_size=4115.7, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1915.7, ups=0.25, wpb=7583.7, bsz=120, num_updates=19520, lr=2.1604e-05, gnorm=0.951, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=79933 2023-05-02 00:46:00 - progress_bar.py[line:274] - INFO: epoch 004: 1437 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=8101.8, nsentences=120, sample_size=3807.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=2035.8, ups=0.25, wpb=8101.8, bsz=120, num_updates=19530, lr=2.15987e-05, gnorm=0.958, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=79973 2023-05-02 00:46:40 - progress_bar.py[line:274] - INFO: epoch 004: 1447 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7361.8, nsentences=120, sample_size=3944.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1845.1, ups=0.25, wpb=7361.8, bsz=120, num_updates=19540, lr=2.15935e-05, gnorm=0.952, clip=20, 
loss_scale=64, train_wall=40, gb_free=30.3, wall=80013 2023-05-02 00:47:19 - progress_bar.py[line:274] - INFO: epoch 004: 1457 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7541.2, nsentences=120, sample_size=4034.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1935.7, ups=0.26, wpb=7541.2, bsz=120, num_updates=19550, lr=2.15882e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=39, gb_free=28.8, wall=80052 2023-05-02 00:47:59 - progress_bar.py[line:274] - INFO: epoch 004: 1467 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7983.2, nsentences=120, sample_size=4286.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1997.9, ups=0.25, wpb=7983.2, bsz=120, num_updates=19560, lr=2.15829e-05, gnorm=0.913, clip=0, loss_scale=64, train_wall=40, gb_free=29.3, wall=80091 2023-05-02 00:48:40 - progress_bar.py[line:274] - INFO: epoch 004: 1477 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7850.9, nsentences=120, sample_size=4041.5, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1935.9, ups=0.25, wpb=7850.9, bsz=120, num_updates=19570, lr=2.15776e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=80132 2023-05-02 00:49:20 - progress_bar.py[line:274] - INFO: epoch 004: 1487 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7718.2, nsentences=120, sample_size=4058.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1931.5, ups=0.25, wpb=7718.2, bsz=120, num_updates=19580, lr=2.15723e-05, gnorm=0.938, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, wall=80172 2023-05-02 00:49:59 - progress_bar.py[line:274] - INFO: epoch 004: 1497 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7796.5, nsentences=120, sample_size=3847.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1952.7, ups=0.25, wpb=7796.5, bsz=120, num_updates=19590, lr=2.1567e-05, gnorm=0.967, clip=50, loss_scale=64, train_wall=40, gb_free=29.1, wall=80212 2023-05-02 00:50:38 - 
progress_bar.py[line:274] - INFO: epoch 004: 1507 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7711.5, nsentences=120, sample_size=4317.1, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1975.8, ups=0.26, wpb=7711.5, bsz=120, num_updates=19600, lr=2.15618e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=39, gb_free=25, wall=80251 2023-05-02 00:51:17 - progress_bar.py[line:274] - INFO: epoch 004: 1517 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7651, nsentences=120, sample_size=4002.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1962.2, ups=0.26, wpb=7651, bsz=120, num_updates=19610, lr=2.15565e-05, gnorm=0.934, clip=20, loss_scale=64, train_wall=39, gb_free=31.1, wall=80290 2023-05-02 00:51:58 - progress_bar.py[line:274] - INFO: epoch 004: 1527 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7669.7, nsentences=120, sample_size=3867.8, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1895.2, ups=0.25, wpb=7669.7, bsz=120, num_updates=19620, lr=2.15512e-05, gnorm=0.956, clip=40, loss_scale=64, train_wall=40, gb_free=27.9, wall=80330 2023-05-02 00:52:38 - progress_bar.py[line:274] - INFO: epoch 004: 1537 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7928.6, nsentences=120, sample_size=4028.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1985.3, ups=0.25, wpb=7928.6, bsz=120, num_updates=19630, lr=2.15459e-05, gnorm=0.917, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=80370 2023-05-02 00:53:18 - progress_bar.py[line:274] - INFO: epoch 004: 1547 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.222, ntokens=7947.7, nsentences=120, sample_size=3623.5, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1964.2, ups=0.25, wpb=7947.7, bsz=120, num_updates=19640, lr=2.15406e-05, gnorm=0.977, clip=40, loss_scale=64, train_wall=40, gb_free=27.4, wall=80411 2023-05-02 00:53:59 - progress_bar.py[line:274] - INFO: epoch 004: 1557 / 6042 loss=2.446, loss_v1=0, loss_v2=0, 
nll_loss=1.197, ntokens=7478.8, nsentences=120, sample_size=4211.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1856, ups=0.25, wpb=7478.8, bsz=120, num_updates=19650, lr=2.15353e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=80451 2023-05-02 00:54:38 - progress_bar.py[line:274] - INFO: epoch 004: 1567 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7389.4, nsentences=120, sample_size=3912, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1886.6, ups=0.26, wpb=7389.4, bsz=120, num_updates=19660, lr=2.15301e-05, gnorm=0.971, clip=30, loss_scale=64, train_wall=39, gb_free=29.9, wall=80490 2023-05-02 00:55:18 - progress_bar.py[line:274] - INFO: epoch 004: 1577 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=7979.3, nsentences=120, sample_size=3930, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1975.2, ups=0.25, wpb=7979.3, bsz=120, num_updates=19670, lr=2.15248e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=29.2, wall=80531 2023-05-02 00:55:58 - progress_bar.py[line:274] - INFO: epoch 004: 1587 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=7676.1, nsentences=120, sample_size=4374.6, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1918, ups=0.25, wpb=7676.1, bsz=120, num_updates=19680, lr=2.15195e-05, gnorm=0.921, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=80571 2023-05-02 00:56:38 - progress_bar.py[line:274] - INFO: epoch 004: 1597 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7776.3, nsentences=120, sample_size=3933.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1939.9, ups=0.25, wpb=7776.3, bsz=120, num_updates=19690, lr=2.15142e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=80611 2023-05-02 00:57:19 - progress_bar.py[line:274] - INFO: epoch 004: 1607 / 6042 loss=2.498, loss_v1=0, loss_v2=0, nll_loss=1.252, ntokens=7820.4, nsentences=120, sample_size=4198.9, sample_size_v1=0, 
sample_size_v2=0, ppl=2.38, wps=1935.1, ups=0.25, wpb=7820.4, bsz=120, num_updates=19700, lr=2.15089e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=29.4, wall=80651 2023-05-02 00:57:58 - progress_bar.py[line:274] - INFO: epoch 004: 1617 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.217, ntokens=7942.5, nsentences=120, sample_size=4310.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=2001.6, ups=0.25, wpb=7942.5, bsz=120, num_updates=19710, lr=2.15037e-05, gnorm=0.922, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=80691 2023-05-02 00:58:39 - progress_bar.py[line:274] - INFO: epoch 004: 1627 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7621.5, nsentences=120, sample_size=4090.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1893.3, ups=0.25, wpb=7621.5, bsz=120, num_updates=19720, lr=2.14984e-05, gnorm=0.93, clip=10, loss_scale=128, train_wall=40, gb_free=30.3, wall=80731 2023-05-02 00:59:19 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 00:59:22 - progress_bar.py[line:274] - INFO: epoch 004: 1638 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7854.4, nsentences=120, sample_size=4044.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1795.2, ups=0.23, wpb=7854.4, bsz=120, num_updates=19730, lr=2.14931e-05, gnorm=0.944, clip=30, loss_scale=64, train_wall=44, gb_free=29.6, wall=80775 2023-05-02 01:00:02 - progress_bar.py[line:274] - INFO: epoch 004: 1648 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7734.8, nsentences=120, sample_size=4003.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1947.2, ups=0.25, wpb=7734.8, bsz=120, num_updates=19740, lr=2.14878e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=80815 2023-05-02 01:00:42 - progress_bar.py[line:274] - INFO: epoch 004: 1658 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7577.3, nsentences=120, 
sample_size=3892.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1916.9, ups=0.25, wpb=7577.3, bsz=120, num_updates=19750, lr=2.14825e-05, gnorm=0.945, clip=30, loss_scale=64, train_wall=39, gb_free=29.9, wall=80854 2023-05-02 01:01:21 - progress_bar.py[line:274] - INFO: epoch 004: 1668 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7658.5, nsentences=120, sample_size=3977.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1947, ups=0.25, wpb=7658.5, bsz=120, num_updates=19760, lr=2.14772e-05, gnorm=0.945, clip=10, loss_scale=64, train_wall=39, gb_free=30.9, wall=80893 2023-05-02 01:02:00 - progress_bar.py[line:274] - INFO: epoch 004: 1678 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7511.8, nsentences=120, sample_size=4007.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1937.4, ups=0.26, wpb=7511.8, bsz=120, num_updates=19770, lr=2.1472e-05, gnorm=0.945, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=80932 2023-05-02 01:02:40 - progress_bar.py[line:274] - INFO: epoch 004: 1688 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7739.9, nsentences=120, sample_size=3877.7, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1943.7, ups=0.25, wpb=7739.9, bsz=120, num_updates=19780, lr=2.14667e-05, gnorm=0.939, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=80972 2023-05-02 01:03:19 - progress_bar.py[line:274] - INFO: epoch 004: 1698 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7582.9, nsentences=120, sample_size=3869.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1903.1, ups=0.25, wpb=7582.9, bsz=120, num_updates=19790, lr=2.14614e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=81012 2023-05-02 01:03:59 - progress_bar.py[line:274] - INFO: epoch 004: 1708 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7578.4, nsentences=120, sample_size=4066, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1918.4, ups=0.25, 
wpb=7578.4, bsz=120, num_updates=19800, lr=2.14561e-05, gnorm=0.923, clip=0, loss_scale=64, train_wall=39, gb_free=29.6, wall=81051 2023-05-02 01:04:39 - progress_bar.py[line:274] - INFO: epoch 004: 1718 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7772.7, nsentences=120, sample_size=4018.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1925.9, ups=0.25, wpb=7772.7, bsz=120, num_updates=19810, lr=2.14508e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=40, gb_free=29, wall=81092 2023-05-02 01:05:19 - progress_bar.py[line:274] - INFO: epoch 004: 1728 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7823.2, nsentences=120, sample_size=4193.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1958.1, ups=0.25, wpb=7823.2, bsz=120, num_updates=19820, lr=2.14455e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=81132 2023-05-02 01:05:58 - progress_bar.py[line:274] - INFO: epoch 004: 1738 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7772.5, nsentences=120, sample_size=4085, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1989.4, ups=0.26, wpb=7772.5, bsz=120, num_updates=19830, lr=2.14403e-05, gnorm=0.937, clip=30, loss_scale=64, train_wall=39, gb_free=29.4, wall=81171 2023-05-02 01:06:39 - progress_bar.py[line:274] - INFO: epoch 004: 1748 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7895.9, nsentences=120, sample_size=4076.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1953.3, ups=0.25, wpb=7895.9, bsz=120, num_updates=19840, lr=2.1435e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=81211 2023-05-02 01:07:18 - progress_bar.py[line:274] - INFO: epoch 004: 1758 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7962.9, nsentences=120, sample_size=4054, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2019.3, ups=0.25, wpb=7962.9, bsz=120, num_updates=19850, lr=2.14297e-05, gnorm=0.941, clip=30, loss_scale=64, 
train_wall=39, gb_free=29.7, wall=81251 2023-05-02 01:07:58 - progress_bar.py[line:274] - INFO: epoch 004: 1768 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7714.1, nsentences=120, sample_size=3825.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1937.3, ups=0.25, wpb=7714.1, bsz=120, num_updates=19860, lr=2.14244e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=81290 2023-05-02 01:08:39 - progress_bar.py[line:274] - INFO: epoch 004: 1778 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7609.8, nsentences=120, sample_size=4201.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1877.1, ups=0.25, wpb=7609.8, bsz=120, num_updates=19870, lr=2.14191e-05, gnorm=0.928, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=81331 2023-05-02 01:09:18 - progress_bar.py[line:274] - INFO: epoch 004: 1788 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7412.1, nsentences=120, sample_size=4119.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1872.5, ups=0.25, wpb=7412.1, bsz=120, num_updates=19880, lr=2.14139e-05, gnorm=0.923, clip=0, loss_scale=64, train_wall=40, gb_free=28.1, wall=81371 2023-05-02 01:09:57 - progress_bar.py[line:274] - INFO: epoch 004: 1798 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7577.4, nsentences=120, sample_size=4086.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1930.3, ups=0.25, wpb=7577.4, bsz=120, num_updates=19890, lr=2.14086e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=39, gb_free=29.6, wall=81410 2023-05-02 01:10:37 - progress_bar.py[line:274] - INFO: epoch 004: 1808 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7757.4, nsentences=120, sample_size=3759.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1957.7, ups=0.25, wpb=7757.4, bsz=120, num_updates=19900, lr=2.14033e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=81450 2023-05-02 01:11:16 - progress_bar.py[line:274] - INFO: 
epoch 004: 1818 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7675.4, nsentences=120, sample_size=3675.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1949.2, ups=0.25, wpb=7675.4, bsz=120, num_updates=19910, lr=2.1398e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=39, gb_free=28.8, wall=81489 2023-05-02 01:11:56 - progress_bar.py[line:274] - INFO: epoch 004: 1828 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7601.2, nsentences=120, sample_size=3772.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1926.7, ups=0.25, wpb=7601.2, bsz=120, num_updates=19920, lr=2.13927e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=39, gb_free=28.1, wall=81528 2023-05-02 01:12:35 - progress_bar.py[line:274] - INFO: epoch 004: 1838 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7633.4, nsentences=120, sample_size=4108.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1953.9, ups=0.26, wpb=7633.4, bsz=120, num_updates=19930, lr=2.13874e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=39, gb_free=28.7, wall=81567 2023-05-02 01:13:14 - progress_bar.py[line:274] - INFO: epoch 004: 1848 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7683.6, nsentences=120, sample_size=3982.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1965.3, ups=0.26, wpb=7683.6, bsz=120, num_updates=19940, lr=2.13822e-05, gnorm=0.924, clip=20, loss_scale=64, train_wall=39, gb_free=30.8, wall=81606 2023-05-02 01:13:54 - progress_bar.py[line:274] - INFO: epoch 004: 1858 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7633.8, nsentences=120, sample_size=4059.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1905.6, ups=0.25, wpb=7633.8, bsz=120, num_updates=19950, lr=2.13769e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=81647 2023-05-02 01:14:33 - progress_bar.py[line:274] - INFO: epoch 004: 1868 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7639.1, 
nsentences=120, sample_size=4097.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1953.1, ups=0.26, wpb=7639.1, bsz=120, num_updates=19960, lr=2.13716e-05, gnorm=0.938, clip=20, loss_scale=64, train_wall=39, gb_free=29.6, wall=81686 2023-05-02 01:15:13 - progress_bar.py[line:274] - INFO: epoch 004: 1878 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7735.4, nsentences=120, sample_size=3958.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1941.8, ups=0.25, wpb=7735.4, bsz=120, num_updates=19970, lr=2.13663e-05, gnorm=0.917, clip=10, loss_scale=64, train_wall=40, gb_free=31.4, wall=81726 2023-05-02 01:15:53 - progress_bar.py[line:274] - INFO: epoch 004: 1888 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7783.6, nsentences=120, sample_size=4191.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1953.4, ups=0.25, wpb=7783.6, bsz=120, num_updates=19980, lr=2.1361e-05, gnorm=0.906, clip=0, loss_scale=64, train_wall=40, gb_free=29.7, wall=81765 2023-05-02 01:16:33 - progress_bar.py[line:274] - INFO: epoch 004: 1898 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=8144.3, nsentences=120, sample_size=4235.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2018.3, ups=0.25, wpb=8144.3, bsz=120, num_updates=19990, lr=2.13558e-05, gnorm=0.903, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=81806 2023-05-02 01:17:14 - progress_bar.py[line:274] - INFO: epoch 004: 1908 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7874.5, nsentences=120, sample_size=4231.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1947.8, ups=0.25, wpb=7874.5, bsz=120, num_updates=20000, lr=2.13505e-05, gnorm=0.905, clip=0, loss_scale=64, train_wall=40, gb_free=28.1, wall=81846 2023-05-02 01:17:14 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 01:17:15 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-02 01:17:15 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 01:17:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:27 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-02 01:17:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:32 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 01:17:32 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 01:17:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:40 - task_invig.py[line:181] - INFO: example hyp(gen): 
region: 2023-05-02 01:17:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:44 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 01:17:44 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 01:17:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:52 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-02 01:17:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:56 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 01:17:56 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 01:17:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:17:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:17:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:18:00 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 01:18:00 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 01:18:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:18:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:18:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:18:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:18:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:18:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:18:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:18:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:18:05 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 01:18:05 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 01:18:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 01:18:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 01:18:05 - progress_bar.py[line:282] - INFO: epoch 004 | valid on 'valid' subset | loss 3.241 | loss_v1 0 | loss_v2 0 | nll_loss 2.077 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.22 | score 0.7432 | wps 3283.6 | wpb 3202.1 | bsz 39.4 | num_updates 20000 | best_score 0.751 2023-05-02 01:18:05 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 4 @ 20000 updates 2023-05-02 01:18:05 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_20000.pt 2023-05-02 01:18:30 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_20000.pt 2023-05-02 01:18:43 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_20000.pt (epoch 4 @ 20000 updates, 
score 0.7432) (writing took 38.37041026703082 seconds) 2023-05-02 01:19:23 - progress_bar.py[line:274] - INFO: epoch 004: 1918 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7721.2, nsentences=120, sample_size=4057.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=598.9, ups=0.08, wpb=7721.2, bsz=120, num_updates=20010, lr=2.13452e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=81975 2023-05-02 01:20:02 - progress_bar.py[line:274] - INFO: epoch 004: 1928 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7624.8, nsentences=120, sample_size=3966.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1938.2, ups=0.25, wpb=7624.8, bsz=120, num_updates=20020, lr=2.13399e-05, gnorm=0.951, clip=10, loss_scale=64, train_wall=39, gb_free=30.6, wall=82014 2023-05-02 01:20:42 - progress_bar.py[line:274] - INFO: epoch 004: 1938 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7541.9, nsentences=120, sample_size=4261.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1890.5, ups=0.25, wpb=7541.9, bsz=120, num_updates=20030, lr=2.13346e-05, gnorm=1.043, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=82054 2023-05-02 01:21:22 - progress_bar.py[line:274] - INFO: epoch 004: 1948 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7917.2, nsentences=120, sample_size=3937.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1955.6, ups=0.25, wpb=7917.2, bsz=120, num_updates=20040, lr=2.13293e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=31.3, wall=82095 2023-05-02 01:22:02 - progress_bar.py[line:274] - INFO: epoch 004: 1958 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7695, nsentences=120, sample_size=3939.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1954, ups=0.25, wpb=7695, bsz=120, num_updates=20050, lr=2.13241e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=82134 2023-05-02 01:22:42 - 
progress_bar.py[line:274] - INFO: epoch 004: 1968 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7765.4, nsentences=120, sample_size=3873.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1938.2, ups=0.25, wpb=7765.4, bsz=120, num_updates=20060, lr=2.13188e-05, gnorm=0.937, clip=0, loss_scale=64, train_wall=40, gb_free=30.6, wall=82174 2023-05-02 01:23:22 - progress_bar.py[line:274] - INFO: epoch 004: 1978 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7755.6, nsentences=120, sample_size=4362.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1931.1, ups=0.25, wpb=7755.6, bsz=120, num_updates=20070, lr=2.13135e-05, gnorm=0.928, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=82214 2023-05-02 01:24:01 - progress_bar.py[line:274] - INFO: epoch 004: 1988 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7704.1, nsentences=120, sample_size=3822.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1948.5, ups=0.25, wpb=7704.1, bsz=120, num_updates=20080, lr=2.13082e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=82254 2023-05-02 01:24:41 - progress_bar.py[line:274] - INFO: epoch 004: 1998 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7639.8, nsentences=120, sample_size=4114.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1914.6, ups=0.25, wpb=7639.8, bsz=120, num_updates=20090, lr=2.13029e-05, gnorm=0.924, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=82294 2023-05-02 01:25:20 - progress_bar.py[line:274] - INFO: epoch 004: 2008 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7678.7, nsentences=120, sample_size=4276.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1971.1, ups=0.26, wpb=7678.7, bsz=120, num_updates=20100, lr=2.12976e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=39, gb_free=29.7, wall=82333 2023-05-02 01:26:00 - progress_bar.py[line:274] - INFO: epoch 004: 2018 / 6042 loss=2.395, loss_v1=0, loss_v2=0, 
nll_loss=1.139, ntokens=7686.3, nsentences=120, sample_size=4027.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1949.3, ups=0.25, wpb=7686.3, bsz=120, num_updates=20110, lr=2.12924e-05, gnorm=0.932, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=82372 2023-05-02 01:26:39 - progress_bar.py[line:274] - INFO: epoch 004: 2028 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7908.6, nsentences=120, sample_size=4145.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1992.5, ups=0.25, wpb=7908.6, bsz=120, num_updates=20120, lr=2.12871e-05, gnorm=0.931, clip=20, loss_scale=64, train_wall=40, gb_free=28.7, wall=82412 2023-05-02 01:27:19 - progress_bar.py[line:274] - INFO: epoch 004: 2038 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7923.8, nsentences=120, sample_size=4082.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2012.1, ups=0.25, wpb=7923.8, bsz=120, num_updates=20130, lr=2.12818e-05, gnorm=0.925, clip=10, loss_scale=64, train_wall=39, gb_free=29.6, wall=82451 2023-05-02 01:27:58 - progress_bar.py[line:274] - INFO: epoch 004: 2048 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7754.9, nsentences=120, sample_size=3803.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1983.1, ups=0.26, wpb=7754.9, bsz=120, num_updates=20140, lr=2.12765e-05, gnorm=0.967, clip=50, loss_scale=64, train_wall=39, gb_free=29.3, wall=82490 2023-05-02 01:28:37 - progress_bar.py[line:274] - INFO: epoch 004: 2058 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7440.3, nsentences=120, sample_size=4377.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1885.2, ups=0.25, wpb=7440.3, bsz=120, num_updates=20150, lr=2.12712e-05, gnorm=0.912, clip=0, loss_scale=64, train_wall=39, gb_free=30.1, wall=82530 2023-05-02 01:29:17 - progress_bar.py[line:274] - INFO: epoch 004: 2068 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7800.2, nsentences=120, sample_size=3898.2, sample_size_v1=0, 
sample_size_v2=0, ppl=2.18, wps=1965.8, ups=0.25, wpb=7800.2, bsz=120, num_updates=20160, lr=2.1266e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=82570 2023-05-02 01:29:57 - progress_bar.py[line:274] - INFO: epoch 004: 2078 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7691.4, nsentences=120, sample_size=3964.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1920.7, ups=0.25, wpb=7691.4, bsz=120, num_updates=20170, lr=2.12607e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=82610 2023-05-02 01:30:37 - progress_bar.py[line:274] - INFO: epoch 004: 2088 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7788.6, nsentences=120, sample_size=4176.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1934.1, ups=0.25, wpb=7788.6, bsz=120, num_updates=20180, lr=2.12554e-05, gnorm=0.932, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=82650 2023-05-02 01:31:17 - progress_bar.py[line:274] - INFO: epoch 004: 2098 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7751.5, nsentences=120, sample_size=4183.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1947.1, ups=0.25, wpb=7751.5, bsz=120, num_updates=20190, lr=2.12501e-05, gnorm=0.931, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=82690 2023-05-02 01:31:57 - progress_bar.py[line:274] - INFO: epoch 004: 2108 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7584.3, nsentences=120, sample_size=3962.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1900.1, ups=0.25, wpb=7584.3, bsz=120, num_updates=20200, lr=2.12448e-05, gnorm=0.988, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=82730 2023-05-02 01:32:36 - progress_bar.py[line:274] - INFO: epoch 004: 2118 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7793, nsentences=120, sample_size=4083.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1982.7, ups=0.25, wpb=7793, bsz=120, num_updates=20210, 
lr=2.12395e-05, gnorm=0.953, clip=30, loss_scale=64, train_wall=39, gb_free=29.7, wall=82769 2023-05-02 01:33:16 - progress_bar.py[line:274] - INFO: epoch 004: 2128 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7770.9, nsentences=120, sample_size=4105, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1963.8, ups=0.25, wpb=7770.9, bsz=120, num_updates=20220, lr=2.12343e-05, gnorm=0.934, clip=30, loss_scale=64, train_wall=39, gb_free=29.9, wall=82808 2023-05-02 01:33:56 - progress_bar.py[line:274] - INFO: epoch 004: 2138 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7864.5, nsentences=120, sample_size=3643.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1981.4, ups=0.25, wpb=7864.5, bsz=120, num_updates=20230, lr=2.1229e-05, gnorm=0.983, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=82848 2023-05-02 01:34:35 - progress_bar.py[line:274] - INFO: epoch 004: 2148 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7813.8, nsentences=120, sample_size=3817.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1970.3, ups=0.25, wpb=7813.8, bsz=120, num_updates=20240, lr=2.12237e-05, gnorm=0.976, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=82888 2023-05-02 01:35:14 - progress_bar.py[line:274] - INFO: epoch 004: 2158 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7561.5, nsentences=120, sample_size=4041.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1946.1, ups=0.26, wpb=7561.5, bsz=120, num_updates=20250, lr=2.12184e-05, gnorm=0.939, clip=10, loss_scale=128, train_wall=39, gb_free=29.6, wall=82927 2023-05-02 01:35:42 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 01:35:58 - progress_bar.py[line:274] - INFO: epoch 004: 2169 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7826.8, nsentences=120, sample_size=3994.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1789.9, ups=0.23, 
wpb=7826.8, bsz=120, num_updates=20260, lr=2.12131e-05, gnorm=0.988, clip=40, loss_scale=64, train_wall=44, gb_free=30.3, wall=82970 2023-05-02 01:36:37 - progress_bar.py[line:274] - INFO: epoch 004: 2179 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7786.7, nsentences=120, sample_size=4165.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1973.5, ups=0.25, wpb=7786.7, bsz=120, num_updates=20270, lr=2.12079e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=39, gb_free=30.9, wall=83010 2023-05-02 01:37:18 - progress_bar.py[line:274] - INFO: epoch 004: 2189 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7702, nsentences=120, sample_size=4078.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1908.5, ups=0.25, wpb=7702, bsz=120, num_updates=20280, lr=2.12026e-05, gnorm=0.963, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, wall=83050 2023-05-02 01:37:58 - progress_bar.py[line:274] - INFO: epoch 004: 2199 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7464.5, nsentences=120, sample_size=4160.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1867.7, ups=0.25, wpb=7464.5, bsz=120, num_updates=20290, lr=2.11973e-05, gnorm=0.952, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=83090 2023-05-02 01:38:38 - progress_bar.py[line:274] - INFO: epoch 004: 2209 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7704.3, nsentences=120, sample_size=4223.1, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1895, ups=0.25, wpb=7704.3, bsz=120, num_updates=20300, lr=2.1192e-05, gnorm=0.962, clip=30, loss_scale=64, train_wall=41, gb_free=29.2, wall=83131 2023-05-02 01:39:18 - progress_bar.py[line:274] - INFO: epoch 004: 2219 / 6042 loss=2.507, loss_v1=0, loss_v2=0, nll_loss=1.268, ntokens=7936.8, nsentences=120, sample_size=4071.3, sample_size_v1=0, sample_size_v2=0, ppl=2.41, wps=1982.6, ups=0.25, wpb=7936.8, bsz=120, num_updates=20310, lr=2.11867e-05, gnorm=0.952, clip=20, loss_scale=64, 
train_wall=40, gb_free=29.8, wall=83171 2023-05-02 01:39:58 - progress_bar.py[line:274] - INFO: epoch 004: 2229 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7840.3, nsentences=120, sample_size=3844.1, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1957.5, ups=0.25, wpb=7840.3, bsz=120, num_updates=20320, lr=2.11814e-05, gnorm=0.987, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=83211 2023-05-02 01:40:38 - progress_bar.py[line:274] - INFO: epoch 004: 2239 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7744.1, nsentences=120, sample_size=4065.6, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1944.7, ups=0.25, wpb=7744.1, bsz=120, num_updates=20330, lr=2.11762e-05, gnorm=0.972, clip=60, loss_scale=64, train_wall=40, gb_free=30.2, wall=83251 2023-05-02 01:41:18 - progress_bar.py[line:274] - INFO: epoch 004: 2249 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7467.6, nsentences=120, sample_size=4148.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1902.3, ups=0.25, wpb=7467.6, bsz=120, num_updates=20340, lr=2.11709e-05, gnorm=0.927, clip=30, loss_scale=64, train_wall=39, gb_free=31, wall=83290 2023-05-02 01:41:57 - progress_bar.py[line:274] - INFO: epoch 004: 2259 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7793.1, nsentences=120, sample_size=4072.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1960.9, ups=0.25, wpb=7793.1, bsz=120, num_updates=20350, lr=2.11656e-05, gnorm=0.954, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=83330 2023-05-02 01:42:37 - progress_bar.py[line:274] - INFO: epoch 004: 2269 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=7810.4, nsentences=120, sample_size=3924.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1958.8, ups=0.25, wpb=7810.4, bsz=120, num_updates=20360, lr=2.11603e-05, gnorm=0.998, clip=40, loss_scale=64, train_wall=40, gb_free=29.3, wall=83370 2023-05-02 01:43:18 - progress_bar.py[line:274] - INFO: 
epoch 004: 2279 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7627.6, nsentences=120, sample_size=4294.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1887.6, ups=0.25, wpb=7627.6, bsz=120, num_updates=20370, lr=2.1155e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=40, gb_free=30.5, wall=83410 2023-05-02 01:43:57 - progress_bar.py[line:274] - INFO: epoch 004: 2289 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7772, nsentences=120, sample_size=4140, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1976.7, ups=0.25, wpb=7772, bsz=120, num_updates=20380, lr=2.11497e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=83449 2023-05-02 01:44:37 - progress_bar.py[line:274] - INFO: epoch 004: 2299 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7627.6, nsentences=120, sample_size=4041.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1914.8, ups=0.25, wpb=7627.6, bsz=120, num_updates=20390, lr=2.11445e-05, gnorm=0.968, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=83489 2023-05-02 01:45:17 - progress_bar.py[line:274] - INFO: epoch 004: 2309 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7949.3, nsentences=120, sample_size=4065.8, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1984.5, ups=0.25, wpb=7949.3, bsz=120, num_updates=20400, lr=2.11392e-05, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=83529 2023-05-02 01:45:57 - progress_bar.py[line:274] - INFO: epoch 004: 2319 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7589.2, nsentences=120, sample_size=4078.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1885.6, ups=0.25, wpb=7589.2, bsz=120, num_updates=20410, lr=2.11339e-05, gnorm=0.946, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=83570 2023-05-02 01:46:37 - progress_bar.py[line:274] - INFO: epoch 004: 2329 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7627.3, 
nsentences=120, sample_size=3874.6, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1910.4, ups=0.25, wpb=7627.3, bsz=120, num_updates=20420, lr=2.11286e-05, gnorm=0.978, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=83609 2023-05-02 01:47:17 - progress_bar.py[line:274] - INFO: epoch 004: 2339 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.209, ntokens=7809.5, nsentences=120, sample_size=4079.4, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1950.3, ups=0.25, wpb=7809.5, bsz=120, num_updates=20430, lr=2.11233e-05, gnorm=0.925, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=83649 2023-05-02 01:47:57 - progress_bar.py[line:274] - INFO: epoch 004: 2349 / 6042 loss=2.47, loss_v1=0, loss_v2=0, nll_loss=1.227, ntokens=7750, nsentences=120, sample_size=3956.1, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1932.9, ups=0.25, wpb=7750, bsz=120, num_updates=20440, lr=2.11181e-05, gnorm=0.953, clip=30, loss_scale=64, train_wall=40, gb_free=27.1, wall=83690 2023-05-02 01:48:38 - progress_bar.py[line:274] - INFO: epoch 004: 2359 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7923.3, nsentences=120, sample_size=4313.1, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1947.8, ups=0.25, wpb=7923.3, bsz=120, num_updates=20450, lr=2.11128e-05, gnorm=0.898, clip=0, loss_scale=64, train_wall=41, gb_free=29.5, wall=83730 2023-05-02 01:49:19 - progress_bar.py[line:274] - INFO: epoch 004: 2369 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7900.4, nsentences=120, sample_size=3834.9, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1939.1, ups=0.25, wpb=7900.4, bsz=120, num_updates=20460, lr=2.11075e-05, gnorm=1.011, clip=40, loss_scale=64, train_wall=41, gb_free=28.3, wall=83771 2023-05-02 01:49:58 - progress_bar.py[line:274] - INFO: epoch 004: 2379 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7752.7, nsentences=120, sample_size=4355.9, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1941, 
ups=0.25, wpb=7752.7, bsz=120, num_updates=20470, lr=2.11022e-05, gnorm=0.915, clip=20, loss_scale=64, train_wall=40, gb_free=28.4, wall=83811 2023-05-02 01:50:38 - progress_bar.py[line:274] - INFO: epoch 004: 2389 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7815, nsentences=120, sample_size=4182.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1963.3, ups=0.25, wpb=7815, bsz=120, num_updates=20480, lr=2.10969e-05, gnorm=0.923, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=83851 2023-05-02 01:51:18 - progress_bar.py[line:274] - INFO: epoch 004: 2399 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7666.3, nsentences=120, sample_size=3973.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1943.3, ups=0.25, wpb=7666.3, bsz=120, num_updates=20490, lr=2.10916e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=39, gb_free=29.3, wall=83890 2023-05-02 01:51:59 - progress_bar.py[line:274] - INFO: epoch 004: 2409 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7640.6, nsentences=120, sample_size=4090.3, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1874.1, ups=0.25, wpb=7640.6, bsz=120, num_updates=20500, lr=2.10864e-05, gnorm=0.934, clip=20, loss_scale=64, train_wall=41, gb_free=30.3, wall=83931 2023-05-02 01:52:39 - progress_bar.py[line:274] - INFO: epoch 004: 2419 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7628.8, nsentences=120, sample_size=4050.2, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1891.4, ups=0.25, wpb=7628.8, bsz=120, num_updates=20510, lr=2.10811e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=26, wall=83971 2023-05-02 01:53:19 - progress_bar.py[line:274] - INFO: epoch 004: 2429 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7692.8, nsentences=120, sample_size=4114.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1933.4, ups=0.25, wpb=7692.8, bsz=120, num_updates=20520, lr=2.10758e-05, gnorm=0.936, clip=0, 
loss_scale=64, train_wall=40, gb_free=30.4, wall=84011 2023-05-02 01:53:57 - progress_bar.py[line:274] - INFO: epoch 004: 2439 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7737.4, nsentences=120, sample_size=4232.5, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=2001.4, ups=0.26, wpb=7737.4, bsz=120, num_updates=20530, lr=2.10705e-05, gnorm=0.949, clip=10, loss_scale=64, train_wall=39, gb_free=30.7, wall=84050 2023-05-02 01:54:37 - progress_bar.py[line:274] - INFO: epoch 004: 2449 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7980.5, nsentences=120, sample_size=4003.3, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=2007.9, ups=0.25, wpb=7980.5, bsz=120, num_updates=20540, lr=2.10652e-05, gnorm=0.944, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=84090 2023-05-02 01:55:17 - progress_bar.py[line:274] - INFO: epoch 004: 2459 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7765.2, nsentences=120, sample_size=3913.8, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1946.8, ups=0.25, wpb=7765.2, bsz=120, num_updates=20550, lr=2.106e-05, gnorm=0.939, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=84129 2023-05-02 01:55:55 - progress_bar.py[line:274] - INFO: epoch 004: 2469 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7486.7, nsentences=120, sample_size=3979.3, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1953.1, ups=0.26, wpb=7486.7, bsz=120, num_updates=20560, lr=2.10547e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=38, gb_free=31, wall=84168 2023-05-02 01:56:35 - progress_bar.py[line:274] - INFO: epoch 004: 2479 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7766.5, nsentences=120, sample_size=4169.4, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1947.3, ups=0.25, wpb=7766.5, bsz=120, num_updates=20570, lr=2.10494e-05, gnorm=0.919, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=84208 2023-05-02 01:57:15 - 
progress_bar.py[line:274] - INFO: epoch 004: 2489 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7492.7, nsentences=120, sample_size=4229.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1896.4, ups=0.25, wpb=7492.7, bsz=120, num_updates=20580, lr=2.10441e-05, gnorm=0.939, clip=30, loss_scale=64, train_wall=39, gb_free=30.9, wall=84247 2023-05-02 01:57:54 - progress_bar.py[line:274] - INFO: epoch 004: 2499 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7560.3, nsentences=120, sample_size=4000.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1909.4, ups=0.25, wpb=7560.3, bsz=120, num_updates=20590, lr=2.10388e-05, gnorm=0.96, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=84287 2023-05-02 01:58:34 - progress_bar.py[line:274] - INFO: epoch 004: 2509 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7752.9, nsentences=120, sample_size=4225.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1974.8, ups=0.25, wpb=7752.9, bsz=120, num_updates=20600, lr=2.10335e-05, gnorm=0.923, clip=0, loss_scale=64, train_wall=39, gb_free=30.6, wall=84326 2023-05-02 01:59:14 - progress_bar.py[line:274] - INFO: epoch 004: 2519 / 6042 loss=2.48, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=8080.5, nsentences=120, sample_size=4042.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=2003.8, ups=0.25, wpb=8080.5, bsz=120, num_updates=20610, lr=2.10283e-05, gnorm=0.945, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=84366 2023-05-02 01:59:54 - progress_bar.py[line:274] - INFO: epoch 004: 2529 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.218, ntokens=7778.2, nsentences=120, sample_size=4162.9, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1927.9, ups=0.25, wpb=7778.2, bsz=120, num_updates=20620, lr=2.1023e-05, gnorm=0.941, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=84407 2023-05-02 02:00:34 - progress_bar.py[line:274] - INFO: epoch 004: 2539 / 6042 loss=2.484, loss_v1=0, loss_v2=0, 
nll_loss=1.241, ntokens=7630.8, nsentences=120, sample_size=4163.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1918.7, ups=0.25, wpb=7630.8, bsz=120, num_updates=20630, lr=2.10177e-05, gnorm=0.937, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=84446 2023-05-02 02:01:15 - progress_bar.py[line:274] - INFO: epoch 004: 2549 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7714.4, nsentences=120, sample_size=3936, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1884.8, ups=0.24, wpb=7714.4, bsz=120, num_updates=20640, lr=2.10124e-05, gnorm=0.971, clip=40, loss_scale=64, train_wall=41, gb_free=29.1, wall=84487 2023-05-02 02:01:55 - progress_bar.py[line:274] - INFO: epoch 004: 2559 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7805, nsentences=120, sample_size=4150.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1968.7, ups=0.25, wpb=7805, bsz=120, num_updates=20650, lr=2.10071e-05, gnorm=0.926, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=84527 2023-05-02 02:02:35 - progress_bar.py[line:274] - INFO: epoch 004: 2569 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=8107, nsentences=120, sample_size=3618.3, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=2015.2, ups=0.25, wpb=8107, bsz=120, num_updates=20660, lr=2.10018e-05, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=84567 2023-05-02 02:03:15 - progress_bar.py[line:274] - INFO: epoch 004: 2579 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7590.8, nsentences=120, sample_size=3875.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1874.6, ups=0.25, wpb=7590.8, bsz=120, num_updates=20670, lr=2.09966e-05, gnorm=0.965, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=84608 2023-05-02 02:03:55 - progress_bar.py[line:274] - INFO: epoch 004: 2589 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7682.9, nsentences=120, sample_size=3992.2, sample_size_v1=0, 
sample_size_v2=0, ppl=2.34, wps=1913.3, ups=0.25, wpb=7682.9, bsz=120, num_updates=20680, lr=2.09913e-05, gnorm=0.954, clip=30, loss_scale=64, train_wall=40, gb_free=29.1, wall=84648 2023-05-02 02:04:35 - progress_bar.py[line:274] - INFO: epoch 004: 2599 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7843.8, nsentences=120, sample_size=4094.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1974.2, ups=0.25, wpb=7843.8, bsz=120, num_updates=20690, lr=2.0986e-05, gnorm=0.919, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=84688 2023-05-02 02:05:15 - progress_bar.py[line:274] - INFO: epoch 004: 2609 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7745.8, nsentences=120, sample_size=4041.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1946.7, ups=0.25, wpb=7745.8, bsz=120, num_updates=20700, lr=2.09807e-05, gnorm=0.926, clip=0, loss_scale=64, train_wall=40, gb_free=28.5, wall=84727 2023-05-02 02:05:55 - progress_bar.py[line:274] - INFO: epoch 004: 2619 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7799.9, nsentences=120, sample_size=3974.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1965.6, ups=0.25, wpb=7799.9, bsz=120, num_updates=20710, lr=2.09754e-05, gnorm=0.946, clip=0, loss_scale=64, train_wall=40, gb_free=29.1, wall=84767 2023-05-02 02:06:35 - progress_bar.py[line:274] - INFO: epoch 004: 2629 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7737.4, nsentences=120, sample_size=4067, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1903.7, ups=0.25, wpb=7737.4, bsz=120, num_updates=20720, lr=2.09702e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=41, gb_free=28.9, wall=84808 2023-05-02 02:07:16 - progress_bar.py[line:274] - INFO: epoch 004: 2639 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7729.7, nsentences=120, sample_size=4183.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1906.1, ups=0.25, wpb=7729.7, bsz=120, num_updates=20730, 
lr=2.09649e-05, gnorm=0.96, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=84848 2023-05-02 02:07:56 - progress_bar.py[line:274] - INFO: epoch 004: 2649 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7529.1, nsentences=120, sample_size=4122.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1881.6, ups=0.25, wpb=7529.1, bsz=120, num_updates=20740, lr=2.09596e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=84888 2023-05-02 02:08:36 - progress_bar.py[line:274] - INFO: epoch 004: 2659 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7648.1, nsentences=120, sample_size=4257.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1907.6, ups=0.25, wpb=7648.1, bsz=120, num_updates=20750, lr=2.09543e-05, gnorm=0.937, clip=10, loss_scale=64, train_wall=40, gb_free=30.5, wall=84928 2023-05-02 02:09:15 - progress_bar.py[line:274] - INFO: epoch 004: 2669 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7886.8, nsentences=120, sample_size=3690.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1999.4, ups=0.25, wpb=7886.8, bsz=120, num_updates=20760, lr=2.0949e-05, gnorm=0.944, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=84968 2023-05-02 02:09:55 - progress_bar.py[line:274] - INFO: epoch 004: 2679 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7556.9, nsentences=120, sample_size=4263, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1906.1, ups=0.25, wpb=7556.9, bsz=120, num_updates=20770, lr=2.09437e-05, gnorm=0.924, clip=0, loss_scale=128, train_wall=40, gb_free=30.7, wall=85008 2023-05-02 02:10:20 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 02:10:39 - progress_bar.py[line:274] - INFO: epoch 004: 2690 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7880.2, nsentences=120, sample_size=4075.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1782.7, ups=0.23, 
wpb=7880.2, bsz=120, num_updates=20780, lr=2.09385e-05, gnorm=1.001, clip=40, loss_scale=64, train_wall=44, gb_free=29.6, wall=85052 2023-05-02 02:11:19 - progress_bar.py[line:274] - INFO: epoch 004: 2700 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7656, nsentences=120, sample_size=4226.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1935.3, ups=0.25, wpb=7656, bsz=120, num_updates=20790, lr=2.09332e-05, gnorm=0.984, clip=40, loss_scale=64, train_wall=39, gb_free=28.5, wall=85091 2023-05-02 02:11:59 - progress_bar.py[line:274] - INFO: epoch 004: 2710 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7946.9, nsentences=120, sample_size=3881.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1960.3, ups=0.25, wpb=7946.9, bsz=120, num_updates=20800, lr=2.09279e-05, gnorm=0.991, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=85132 2023-05-02 02:12:39 - progress_bar.py[line:274] - INFO: epoch 004: 2720 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7875.3, nsentences=120, sample_size=3978.8, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=2003, ups=0.25, wpb=7875.3, bsz=120, num_updates=20810, lr=2.09226e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=85171 2023-05-02 02:13:19 - progress_bar.py[line:274] - INFO: epoch 004: 2730 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7771.6, nsentences=120, sample_size=4217.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1925.5, ups=0.25, wpb=7771.6, bsz=120, num_updates=20820, lr=2.09173e-05, gnorm=0.951, clip=20, loss_scale=64, train_wall=40, gb_free=31.4, wall=85211 2023-05-02 02:13:59 - progress_bar.py[line:274] - INFO: epoch 004: 2740 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7696, nsentences=120, sample_size=4153.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1923.4, ups=0.25, wpb=7696, bsz=120, num_updates=20830, lr=2.09121e-05, gnorm=0.954, clip=20, loss_scale=64, 
train_wall=40, gb_free=30.2, wall=85252 2023-05-02 02:14:39 - progress_bar.py[line:274] - INFO: epoch 004: 2750 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7334.6, nsentences=120, sample_size=4346.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1836.1, ups=0.25, wpb=7334.6, bsz=120, num_updates=20840, lr=2.09068e-05, gnorm=0.914, clip=0, loss_scale=64, train_wall=40, gb_free=30.8, wall=85291 2023-05-02 02:15:19 - progress_bar.py[line:274] - INFO: epoch 004: 2760 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7645.9, nsentences=120, sample_size=4135.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1913.8, ups=0.25, wpb=7645.9, bsz=120, num_updates=20850, lr=2.09015e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=85331 2023-05-02 02:15:59 - progress_bar.py[line:274] - INFO: epoch 004: 2770 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7715, nsentences=120, sample_size=4200.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1946.9, ups=0.25, wpb=7715, bsz=120, num_updates=20860, lr=2.08962e-05, gnorm=0.963, clip=20, loss_scale=64, train_wall=40, gb_free=31.2, wall=85371 2023-05-02 02:16:38 - progress_bar.py[line:274] - INFO: epoch 004: 2780 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7733.9, nsentences=120, sample_size=4047.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1938.1, ups=0.25, wpb=7733.9, bsz=120, num_updates=20870, lr=2.08909e-05, gnorm=0.938, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=85411 2023-05-02 02:17:18 - progress_bar.py[line:274] - INFO: epoch 004: 2790 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7545.8, nsentences=120, sample_size=3764.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1900.6, ups=0.25, wpb=7545.8, bsz=120, num_updates=20880, lr=2.08856e-05, gnorm=0.996, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=85451 2023-05-02 02:17:59 - progress_bar.py[line:274] - 
INFO: epoch 004: 2800 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7674.6, nsentences=120, sample_size=4186, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1874.5, ups=0.24, wpb=7674.6, bsz=120, num_updates=20890, lr=2.08804e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=41, gb_free=29.4, wall=85492 2023-05-02 02:18:39 - progress_bar.py[line:274] - INFO: epoch 004: 2810 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7638.3, nsentences=120, sample_size=3950.7, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1915.8, ups=0.25, wpb=7638.3, bsz=120, num_updates=20900, lr=2.08751e-05, gnorm=0.951, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=85531 2023-05-02 02:19:19 - progress_bar.py[line:274] - INFO: epoch 004: 2820 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7621, nsentences=120, sample_size=3718.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1902, ups=0.25, wpb=7621, bsz=120, num_updates=20910, lr=2.08698e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=85572 2023-05-02 02:19:59 - progress_bar.py[line:274] - INFO: epoch 004: 2830 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.218, ntokens=7812.3, nsentences=120, sample_size=4139.4, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1963.6, ups=0.25, wpb=7812.3, bsz=120, num_updates=20920, lr=2.08645e-05, gnorm=0.929, clip=20, loss_scale=64, train_wall=40, gb_free=27.4, wall=85611 2023-05-02 02:20:39 - progress_bar.py[line:274] - INFO: epoch 004: 2840 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.221, ntokens=7909, nsentences=120, sample_size=4278, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1960.4, ups=0.25, wpb=7909, bsz=120, num_updates=20930, lr=2.08592e-05, gnorm=0.912, clip=0, loss_scale=64, train_wall=40, gb_free=30.8, wall=85652 2023-05-02 02:21:18 - progress_bar.py[line:274] - INFO: epoch 004: 2850 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=7803.7, 
nsentences=120, sample_size=4034.5, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1990.7, ups=0.26, wpb=7803.7, bsz=120, num_updates=20940, lr=2.08539e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=39, gb_free=29.6, wall=85691 2023-05-02 02:21:59 - progress_bar.py[line:274] - INFO: epoch 004: 2860 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7588.7, nsentences=120, sample_size=4089.3, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1887.4, ups=0.25, wpb=7588.7, bsz=120, num_updates=20950, lr=2.08487e-05, gnorm=0.951, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=85731 2023-05-02 02:22:38 - progress_bar.py[line:274] - INFO: epoch 004: 2870 / 6042 loss=2.479, loss_v1=0, loss_v2=0, nll_loss=1.236, ntokens=7718.7, nsentences=120, sample_size=3917, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1943.1, ups=0.25, wpb=7718.7, bsz=120, num_updates=20960, lr=2.08434e-05, gnorm=0.938, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=85771 2023-05-02 02:23:18 - progress_bar.py[line:274] - INFO: epoch 004: 2880 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=7928.5, nsentences=120, sample_size=3684.6, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1996.9, ups=0.25, wpb=7928.5, bsz=120, num_updates=20970, lr=2.08381e-05, gnorm=0.988, clip=60, loss_scale=64, train_wall=40, gb_free=29.1, wall=85811 2023-05-02 02:23:58 - progress_bar.py[line:274] - INFO: epoch 004: 2890 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7418.1, nsentences=120, sample_size=4235.5, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1846.3, ups=0.25, wpb=7418.1, bsz=120, num_updates=20980, lr=2.08328e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=85851 2023-05-02 02:24:38 - progress_bar.py[line:274] - INFO: epoch 004: 2900 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7725.2, nsentences=120, sample_size=4065.2, sample_size_v1=0, sample_size_v2=0, ppl=2.33, 
wps=1927.3, ups=0.25, wpb=7725.2, bsz=120, num_updates=20990, lr=2.08275e-05, gnorm=0.958, clip=40, loss_scale=64, train_wall=40, gb_free=29.4, wall=85891 2023-05-02 02:25:17 - progress_bar.py[line:274] - INFO: epoch 004: 2910 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7612.2, nsentences=120, sample_size=3954.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1945.5, ups=0.26, wpb=7612.2, bsz=120, num_updates=21000, lr=2.08223e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=39, gb_free=30.6, wall=85930 2023-05-02 02:25:17 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 02:25:19 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 02:25:19 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 02:25:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 
02:25:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:36 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 02:25:36 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 02:25:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:48 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 02:25:48 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 02:25:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:25:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:25:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:26:00 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 02:26:00 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 02:26:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:26:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:26:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:26:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:26:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:26:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:26:04 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 02:26:04 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 02:26:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:26:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:26:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:26:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:26:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:26:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:26:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:26:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:26:08 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 02:26:08 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 02:26:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 02:26:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 02:26:09 - progress_bar.py[line:282] - INFO: epoch 004 | valid on 'valid' subset | loss 3.196 | loss_v1 0 | loss_v2 0 | nll_loss 2.031 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.09 | score 0.7456 | wps 3290 | wpb 3202.1 | bsz 39.4 | num_updates 21000 | best_score 0.751 2023-05-02 02:26:09 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 4 @ 21000 updates 2023-05-02 02:26:09 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_21000.pt 2023-05-02 02:26:33 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_21000.pt 2023-05-02 02:26:47 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_21000.pt (epoch 4 @ 21000 updates, score 0.7456) (writing took 38.24780947109684 seconds) 2023-05-02 02:27:27 - progress_bar.py[line:274] - INFO: epoch 004: 2920 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7210.7, nsentences=120, sample_size=3909.3, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=557.9, ups=0.08, wpb=7210.7, bsz=120, num_updates=21010, lr=2.0817e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=40, gb_free=29.3, wall=86059 2023-05-02 02:28:06 - progress_bar.py[line:274] - INFO: epoch 004: 2930 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7545.4, nsentences=120, sample_size=3899, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1928.3, ups=0.26, wpb=7545.4, bsz=120, num_updates=21020, lr=2.08117e-05, gnorm=0.973, clip=50, loss_scale=64, train_wall=39, gb_free=30.1, 
wall=86098 2023-05-02 02:28:46 - progress_bar.py[line:274] - INFO: epoch 004: 2940 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7811.7, nsentences=120, sample_size=4320.6, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1924.5, ups=0.25, wpb=7811.7, bsz=120, num_updates=21030, lr=2.08064e-05, gnorm=0.904, clip=0, loss_scale=64, train_wall=41, gb_free=29.3, wall=86139 2023-05-02 02:29:27 - progress_bar.py[line:274] - INFO: epoch 004: 2950 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7867.8, nsentences=120, sample_size=3808.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1947.4, ups=0.25, wpb=7867.8, bsz=120, num_updates=21040, lr=2.08011e-05, gnorm=0.98, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=86179 2023-05-02 02:30:06 - progress_bar.py[line:274] - INFO: epoch 004: 2960 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7647.9, nsentences=120, sample_size=4264.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1930.3, ups=0.25, wpb=7647.9, bsz=120, num_updates=21050, lr=2.07958e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=86219 2023-05-02 02:30:46 - progress_bar.py[line:274] - INFO: epoch 004: 2970 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7540.1, nsentences=120, sample_size=4104.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1919.4, ups=0.25, wpb=7540.1, bsz=120, num_updates=21060, lr=2.07906e-05, gnorm=0.943, clip=30, loss_scale=64, train_wall=39, gb_free=28.8, wall=86258 2023-05-02 02:31:25 - progress_bar.py[line:274] - INFO: epoch 004: 2980 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7917.7, nsentences=120, sample_size=3828.6, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=2007.3, ups=0.25, wpb=7917.7, bsz=120, num_updates=21070, lr=2.07853e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=39, gb_free=30.9, wall=86298 2023-05-02 02:32:05 - progress_bar.py[line:274] - INFO: epoch 004: 2990 / 6042 
loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7718.8, nsentences=120, sample_size=3862.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1913.7, ups=0.25, wpb=7718.8, bsz=120, num_updates=21080, lr=2.078e-05, gnorm=0.945, clip=0, loss_scale=64, train_wall=40, gb_free=29.8, wall=86338 2023-05-02 02:32:46 - progress_bar.py[line:274] - INFO: epoch 004: 3000 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.209, ntokens=7839.6, nsentences=120, sample_size=3908.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1920.8, ups=0.25, wpb=7839.6, bsz=120, num_updates=21090, lr=2.07747e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=41, gb_free=28.5, wall=86379 2023-05-02 02:33:26 - progress_bar.py[line:274] - INFO: epoch 004: 3010 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.221, ntokens=7889.7, nsentences=120, sample_size=3758.5, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1963.4, ups=0.25, wpb=7889.7, bsz=120, num_updates=21100, lr=2.07694e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=86419 2023-05-02 02:34:07 - progress_bar.py[line:274] - INFO: epoch 004: 3020 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7567.5, nsentences=120, sample_size=4316.9, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1875.5, ups=0.25, wpb=7567.5, bsz=120, num_updates=21110, lr=2.07642e-05, gnorm=0.894, clip=10, loss_scale=64, train_wall=40, gb_free=31, wall=86459 2023-05-02 02:34:47 - progress_bar.py[line:274] - INFO: epoch 004: 3030 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7761, nsentences=120, sample_size=4025, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1942.7, ups=0.25, wpb=7761, bsz=120, num_updates=21120, lr=2.07589e-05, gnorm=0.962, clip=50, loss_scale=64, train_wall=40, gb_free=28.9, wall=86499 2023-05-02 02:35:27 - progress_bar.py[line:274] - INFO: epoch 004: 3040 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=8081.3, nsentences=120, sample_size=3736.1, 
sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=2030.1, ups=0.25, wpb=8081.3, bsz=120, num_updates=21130, lr=2.07536e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=40, gb_free=31.3, wall=86539 2023-05-02 02:36:07 - progress_bar.py[line:274] - INFO: epoch 004: 3050 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.227, ntokens=7631.3, nsentences=120, sample_size=4362.2, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1911.1, ups=0.25, wpb=7631.3, bsz=120, num_updates=21140, lr=2.07483e-05, gnorm=0.911, clip=10, loss_scale=64, train_wall=40, gb_free=28.4, wall=86579 2023-05-02 02:36:46 - progress_bar.py[line:274] - INFO: epoch 004: 3060 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7876.6, nsentences=120, sample_size=4241.1, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1990.6, ups=0.25, wpb=7876.6, bsz=120, num_updates=21150, lr=2.0743e-05, gnorm=0.901, clip=10, loss_scale=64, train_wall=39, gb_free=29.1, wall=86619 2023-05-02 02:37:25 - progress_bar.py[line:274] - INFO: epoch 004: 3070 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7739.8, nsentences=120, sample_size=3912.1, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1976.3, ups=0.26, wpb=7739.8, bsz=120, num_updates=21160, lr=2.07377e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=86658 2023-05-02 02:38:05 - progress_bar.py[line:274] - INFO: epoch 004: 3080 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7623.8, nsentences=120, sample_size=3975.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1931.3, ups=0.25, wpb=7623.8, bsz=120, num_updates=21170, lr=2.07325e-05, gnorm=0.938, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=86697 2023-05-02 02:38:44 - progress_bar.py[line:274] - INFO: epoch 004: 3090 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.236, ntokens=7453.3, nsentences=120, sample_size=4103.7, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1880, ups=0.25, wpb=7453.3, bsz=120, 
num_updates=21180, lr=2.07272e-05, gnorm=0.944, clip=10, loss_scale=64, train_wall=40, gb_free=29.2, wall=86737 2023-05-02 02:39:24 - progress_bar.py[line:274] - INFO: epoch 004: 3100 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7660, nsentences=120, sample_size=3557.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1953.6, ups=0.26, wpb=7660, bsz=120, num_updates=21190, lr=2.07219e-05, gnorm=0.996, clip=50, loss_scale=64, train_wall=39, gb_free=30.6, wall=86776 2023-05-02 02:40:04 - progress_bar.py[line:274] - INFO: epoch 004: 3110 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7722.2, nsentences=120, sample_size=3920.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1917.1, ups=0.25, wpb=7722.2, bsz=120, num_updates=21200, lr=2.07166e-05, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=86816 2023-05-02 02:40:45 - progress_bar.py[line:274] - INFO: epoch 004: 3120 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7828.3, nsentences=120, sample_size=3793, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1915.5, ups=0.24, wpb=7828.3, bsz=120, num_updates=21210, lr=2.07113e-05, gnorm=0.96, clip=40, loss_scale=64, train_wall=41, gb_free=28.9, wall=86857 2023-05-02 02:41:25 - progress_bar.py[line:274] - INFO: epoch 004: 3130 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7538.5, nsentences=120, sample_size=4222.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1890.9, ups=0.25, wpb=7538.5, bsz=120, num_updates=21220, lr=2.0706e-05, gnorm=0.939, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=86897 2023-05-02 02:42:04 - progress_bar.py[line:274] - INFO: epoch 004: 3140 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7685.8, nsentences=120, sample_size=3892.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1940.7, ups=0.25, wpb=7685.8, bsz=120, num_updates=21230, lr=2.07008e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=40, gb_free=31, 
wall=86937 2023-05-02 02:42:44 - progress_bar.py[line:274] - INFO: epoch 004: 3150 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7633.5, nsentences=120, sample_size=4067.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1929.6, ups=0.25, wpb=7633.5, bsz=120, num_updates=21240, lr=2.06955e-05, gnorm=0.963, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=86976 2023-05-02 02:43:24 - progress_bar.py[line:274] - INFO: epoch 004: 3160 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7748.5, nsentences=120, sample_size=4290.5, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1927.6, ups=0.25, wpb=7748.5, bsz=120, num_updates=21250, lr=2.06902e-05, gnorm=0.911, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=87016 2023-05-02 02:44:03 - progress_bar.py[line:274] - INFO: epoch 004: 3170 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7607.2, nsentences=120, sample_size=4229.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1929.7, ups=0.25, wpb=7607.2, bsz=120, num_updates=21260, lr=2.06849e-05, gnorm=0.932, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=87056 2023-05-02 02:44:44 - progress_bar.py[line:274] - INFO: epoch 004: 3180 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=7680.9, nsentences=120, sample_size=3713.7, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1915.9, ups=0.25, wpb=7680.9, bsz=120, num_updates=21270, lr=2.06796e-05, gnorm=0.978, clip=20, loss_scale=64, train_wall=40, gb_free=31.3, wall=87096 2023-05-02 02:45:23 - progress_bar.py[line:274] - INFO: epoch 004: 3190 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7662.4, nsentences=120, sample_size=3853.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1945.5, ups=0.25, wpb=7662.4, bsz=120, num_updates=21280, lr=2.06744e-05, gnorm=0.953, clip=10, loss_scale=64, train_wall=39, gb_free=30.7, wall=87135 2023-05-02 02:46:03 - progress_bar.py[line:274] - INFO: epoch 004: 3200 / 6042 
loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=7625.4, nsentences=120, sample_size=4038, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1911.6, ups=0.25, wpb=7625.4, bsz=120, num_updates=21290, lr=2.06691e-05, gnorm=0.936, clip=30, loss_scale=128, train_wall=40, gb_free=30.3, wall=87175 2023-05-02 02:46:43 - progress_bar.py[line:274] - INFO: epoch 004: 3210 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7744.9, nsentences=120, sample_size=4124.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1931.2, ups=0.25, wpb=7744.9, bsz=120, num_updates=21300, lr=2.06638e-05, gnorm=0.949, clip=30, loss_scale=128, train_wall=40, gb_free=30.1, wall=87215 2023-05-02 02:47:23 - progress_bar.py[line:274] - INFO: epoch 004: 3220 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=7761.5, nsentences=120, sample_size=4319.3, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1926.7, ups=0.25, wpb=7761.5, bsz=120, num_updates=21310, lr=2.06585e-05, gnorm=0.931, clip=10, loss_scale=128, train_wall=40, gb_free=30.8, wall=87256 2023-05-02 02:47:35 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 02:48:07 - progress_bar.py[line:274] - INFO: epoch 004: 3231 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7662.5, nsentences=120, sample_size=4139.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1743.4, ups=0.23, wpb=7662.5, bsz=120, num_updates=21320, lr=2.06532e-05, gnorm=0.949, clip=10, loss_scale=64, train_wall=44, gb_free=29.5, wall=87300 2023-05-02 02:48:47 - progress_bar.py[line:274] - INFO: epoch 004: 3241 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7905.1, nsentences=120, sample_size=3980.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1965.3, ups=0.25, wpb=7905.1, bsz=120, num_updates=21330, lr=2.06479e-05, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=87340 2023-05-02 02:49:27 - 
progress_bar.py[line:274] - INFO: epoch 004: 3251 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7694.6, nsentences=120, sample_size=4032.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1944.8, ups=0.25, wpb=7694.6, bsz=120, num_updates=21340, lr=2.06427e-05, gnorm=0.932, clip=0, loss_scale=64, train_wall=39, gb_free=30.2, wall=87379 2023-05-02 02:50:07 - progress_bar.py[line:274] - INFO: epoch 004: 3261 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7662.5, nsentences=120, sample_size=4057.3, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1927.5, ups=0.25, wpb=7662.5, bsz=120, num_updates=21350, lr=2.06374e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=87419 2023-05-02 02:50:47 - progress_bar.py[line:274] - INFO: epoch 004: 3271 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7747.5, nsentences=120, sample_size=3860.3, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1922.1, ups=0.25, wpb=7747.5, bsz=120, num_updates=21360, lr=2.06321e-05, gnorm=0.95, clip=10, loss_scale=64, train_wall=40, gb_free=28.3, wall=87459 2023-05-02 02:51:27 - progress_bar.py[line:274] - INFO: epoch 004: 3281 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7790.3, nsentences=120, sample_size=4230.2, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1961.4, ups=0.25, wpb=7790.3, bsz=120, num_updates=21370, lr=2.06268e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=87499 2023-05-02 02:52:07 - progress_bar.py[line:274] - INFO: epoch 004: 3291 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7638.4, nsentences=120, sample_size=4199.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1876.6, ups=0.25, wpb=7638.4, bsz=120, num_updates=21380, lr=2.06215e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=41, gb_free=30.4, wall=87540 2023-05-02 02:52:48 - progress_bar.py[line:274] - INFO: epoch 004: 3301 / 6042 loss=2.429, loss_v1=0, loss_v2=0, 
nll_loss=1.179, ntokens=7589.7, nsentences=120, sample_size=4265.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1874.4, ups=0.25, wpb=7589.7, bsz=120, num_updates=21390, lr=2.06163e-05, gnorm=0.9, clip=10, loss_scale=64, train_wall=40, gb_free=31, wall=87580 2023-05-02 02:53:27 - progress_bar.py[line:274] - INFO: epoch 004: 3311 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7805.8, nsentences=120, sample_size=3968.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1986.3, ups=0.25, wpb=7805.8, bsz=120, num_updates=21400, lr=2.0611e-05, gnorm=0.938, clip=30, loss_scale=64, train_wall=39, gb_free=30.5, wall=87620 2023-05-02 02:54:07 - progress_bar.py[line:274] - INFO: epoch 004: 3321 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7651.1, nsentences=120, sample_size=3864.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1938.5, ups=0.25, wpb=7651.1, bsz=120, num_updates=21410, lr=2.06057e-05, gnorm=0.948, clip=30, loss_scale=64, train_wall=39, gb_free=30.7, wall=87659 2023-05-02 02:54:46 - progress_bar.py[line:274] - INFO: epoch 004: 3331 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7682.1, nsentences=120, sample_size=3975.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1957.2, ups=0.25, wpb=7682.1, bsz=120, num_updates=21420, lr=2.06004e-05, gnorm=0.953, clip=10, loss_scale=64, train_wall=39, gb_free=31.3, wall=87698 2023-05-02 02:55:26 - progress_bar.py[line:274] - INFO: epoch 004: 3341 / 6042 loss=2.484, loss_v1=0, loss_v2=0, nll_loss=1.245, ntokens=7989.6, nsentences=120, sample_size=4164.8, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=2015.1, ups=0.25, wpb=7989.6, bsz=120, num_updates=21430, lr=2.05951e-05, gnorm=0.894, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=87738 2023-05-02 02:56:05 - progress_bar.py[line:274] - INFO: epoch 004: 3351 / 6042 loss=2.47, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=7735.1, nsentences=120, sample_size=4259.1, sample_size_v1=0, 
sample_size_v2=0, ppl=2.34, wps=1944.8, ups=0.25, wpb=7735.1, bsz=120, num_updates=21440, lr=2.05898e-05, gnorm=0.929, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=87778 2023-05-02 02:56:46 - progress_bar.py[line:274] - INFO: epoch 004: 3361 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7762.5, nsentences=120, sample_size=4173.3, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1928.2, ups=0.25, wpb=7762.5, bsz=120, num_updates=21450, lr=2.05846e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=40, gb_free=29, wall=87818 2023-05-02 02:57:25 - progress_bar.py[line:274] - INFO: epoch 004: 3371 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7496.8, nsentences=120, sample_size=3855.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1894.7, ups=0.25, wpb=7496.8, bsz=120, num_updates=21460, lr=2.05793e-05, gnorm=0.982, clip=30, loss_scale=64, train_wall=39, gb_free=28.6, wall=87858 2023-05-02 02:58:05 - progress_bar.py[line:274] - INFO: epoch 004: 3381 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7654.6, nsentences=120, sample_size=3955.3, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1924.9, ups=0.25, wpb=7654.6, bsz=120, num_updates=21470, lr=2.0574e-05, gnorm=0.958, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=87897 2023-05-02 02:58:45 - progress_bar.py[line:274] - INFO: epoch 004: 3391 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7785.7, nsentences=120, sample_size=3938.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1940.8, ups=0.25, wpb=7785.7, bsz=120, num_updates=21480, lr=2.05687e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, wall=87938 2023-05-02 02:59:25 - progress_bar.py[line:274] - INFO: epoch 004: 3401 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.221, ntokens=7814.6, nsentences=120, sample_size=3830, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1948.9, ups=0.25, wpb=7814.6, bsz=120, num_updates=21490, 
lr=2.05634e-05, gnorm=0.969, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=87978 2023-05-02 03:00:05 - progress_bar.py[line:274] - INFO: epoch 004: 3411 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7682, nsentences=120, sample_size=3965.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1917.3, ups=0.25, wpb=7682, bsz=120, num_updates=21500, lr=2.05581e-05, gnorm=0.946, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=88018 2023-05-02 03:00:45 - progress_bar.py[line:274] - INFO: epoch 004: 3421 / 6042 loss=2.495, loss_v1=0, loss_v2=0, nll_loss=1.26, ntokens=7874.5, nsentences=120, sample_size=4104.5, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=1971.9, ups=0.25, wpb=7874.5, bsz=120, num_updates=21510, lr=2.05529e-05, gnorm=0.913, clip=0, loss_scale=64, train_wall=40, gb_free=31.3, wall=88058 2023-05-02 03:01:24 - progress_bar.py[line:274] - INFO: epoch 004: 3431 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7501.6, nsentences=120, sample_size=3971.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1925.1, ups=0.26, wpb=7501.6, bsz=120, num_updates=21520, lr=2.05476e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=39, gb_free=28.4, wall=88097 2023-05-02 03:02:04 - progress_bar.py[line:274] - INFO: epoch 004: 3441 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.209, ntokens=7787.5, nsentences=120, sample_size=3912.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1941, ups=0.25, wpb=7787.5, bsz=120, num_updates=21530, lr=2.05423e-05, gnorm=0.93, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=88137 2023-05-02 03:02:44 - progress_bar.py[line:274] - INFO: epoch 004: 3451 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7598.3, nsentences=120, sample_size=3661, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1912, ups=0.25, wpb=7598.3, bsz=120, num_updates=21540, lr=2.0537e-05, gnorm=1.012, clip=70, loss_scale=64, train_wall=40, gb_free=30.3, wall=88176 2023-05-02 
03:03:24 - progress_bar.py[line:274] - INFO: epoch 004: 3461 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7820, nsentences=120, sample_size=4185.2, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1958, ups=0.25, wpb=7820, bsz=120, num_updates=21550, lr=2.05317e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=88216 2023-05-02 03:04:05 - progress_bar.py[line:274] - INFO: epoch 004: 3471 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7793.1, nsentences=120, sample_size=4202.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1907, ups=0.24, wpb=7793.1, bsz=120, num_updates=21560, lr=2.05265e-05, gnorm=0.916, clip=20, loss_scale=64, train_wall=41, gb_free=31, wall=88257 2023-05-02 03:04:44 - progress_bar.py[line:274] - INFO: epoch 004: 3481 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7784.6, nsentences=120, sample_size=4070.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1967.7, ups=0.25, wpb=7784.6, bsz=120, num_updates=21570, lr=2.05212e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=88297 2023-05-02 03:05:24 - progress_bar.py[line:274] - INFO: epoch 004: 3491 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7778.5, nsentences=120, sample_size=4160.4, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1981.9, ups=0.25, wpb=7778.5, bsz=120, num_updates=21580, lr=2.05159e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=39, gb_free=27.4, wall=88336 2023-05-02 03:06:02 - progress_bar.py[line:274] - INFO: epoch 004: 3501 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7596.1, nsentences=120, sample_size=4078.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1969, ups=0.26, wpb=7596.1, bsz=120, num_updates=21590, lr=2.05106e-05, gnorm=0.944, clip=10, loss_scale=64, train_wall=39, gb_free=29.6, wall=88375 2023-05-02 03:06:43 - progress_bar.py[line:274] - INFO: epoch 004: 3511 / 6042 loss=2.46, loss_v1=0, loss_v2=0, 
nll_loss=1.205, ntokens=7787.1, nsentences=120, sample_size=3913.4, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1916.2, ups=0.25, wpb=7787.1, bsz=120, num_updates=21600, lr=2.05053e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=41, gb_free=29.7, wall=88415 2023-05-02 03:07:23 - progress_bar.py[line:274] - INFO: epoch 004: 3521 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7553.4, nsentences=120, sample_size=4090.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1883.2, ups=0.25, wpb=7553.4, bsz=120, num_updates=21610, lr=2.05e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=88455 2023-05-02 03:08:03 - progress_bar.py[line:274] - INFO: epoch 004: 3531 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7598.8, nsentences=120, sample_size=4111.2, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1901.7, ups=0.25, wpb=7598.8, bsz=120, num_updates=21620, lr=2.04948e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=88495 2023-05-02 03:08:43 - progress_bar.py[line:274] - INFO: epoch 004: 3541 / 6042 loss=2.48, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7855.7, nsentences=120, sample_size=3851.7, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1963.8, ups=0.25, wpb=7855.7, bsz=120, num_updates=21630, lr=2.04895e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=31.1, wall=88535 2023-05-02 03:09:22 - progress_bar.py[line:274] - INFO: epoch 004: 3551 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7867.4, nsentences=120, sample_size=4216.4, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1994.2, ups=0.25, wpb=7867.4, bsz=120, num_updates=21640, lr=2.04842e-05, gnorm=0.908, clip=0, loss_scale=64, train_wall=39, gb_free=29.6, wall=88575 2023-05-02 03:10:02 - progress_bar.py[line:274] - INFO: epoch 004: 3561 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7282, nsentences=120, sample_size=4368.4, sample_size_v1=0, 
sample_size_v2=0, ppl=2.31, wps=1834.9, ups=0.25, wpb=7282, bsz=120, num_updates=21650, lr=2.04789e-05, gnorm=0.915, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=88615 2023-05-02 03:10:41 - progress_bar.py[line:274] - INFO: epoch 004: 3571 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.209, ntokens=7715.9, nsentences=120, sample_size=4017.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1992.2, ups=0.26, wpb=7715.9, bsz=120, num_updates=21660, lr=2.04736e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=39, gb_free=28.7, wall=88653 2023-05-02 03:11:20 - progress_bar.py[line:274] - INFO: epoch 004: 3581 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=7794.5, nsentences=120, sample_size=4021.3, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1981.5, ups=0.25, wpb=7794.5, bsz=120, num_updates=21670, lr=2.04684e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=39, gb_free=30.8, wall=88693 2023-05-02 03:12:01 - progress_bar.py[line:274] - INFO: epoch 004: 3591 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7639.2, nsentences=120, sample_size=4534.8, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1870.2, ups=0.24, wpb=7639.2, bsz=120, num_updates=21680, lr=2.04631e-05, gnorm=0.906, clip=10, loss_scale=64, train_wall=41, gb_free=30.5, wall=88733 2023-05-02 03:12:41 - progress_bar.py[line:274] - INFO: epoch 004: 3601 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7618.1, nsentences=120, sample_size=4398.3, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1926, ups=0.25, wpb=7618.1, bsz=120, num_updates=21690, lr=2.04578e-05, gnorm=0.914, clip=10, loss_scale=64, train_wall=39, gb_free=28.4, wall=88773 2023-05-02 03:13:20 - progress_bar.py[line:274] - INFO: epoch 004: 3611 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7996, nsentences=120, sample_size=3986.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2017, ups=0.25, wpb=7996, bsz=120, num_updates=21700, 
lr=2.04525e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=28.7, wall=88813 2023-05-02 03:14:00 - progress_bar.py[line:274] - INFO: epoch 004: 3621 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7767.7, nsentences=120, sample_size=4206.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1962.3, ups=0.25, wpb=7767.7, bsz=120, num_updates=21710, lr=2.04472e-05, gnorm=0.93, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=88852 2023-05-02 03:14:40 - progress_bar.py[line:274] - INFO: epoch 004: 3631 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=7423.1, nsentences=120, sample_size=3903.3, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1850.7, ups=0.25, wpb=7423.1, bsz=120, num_updates=21720, lr=2.04419e-05, gnorm=0.956, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=88892 2023-05-02 03:15:20 - progress_bar.py[line:274] - INFO: epoch 004: 3641 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7936.3, nsentences=120, sample_size=4203.4, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1992.6, ups=0.25, wpb=7936.3, bsz=120, num_updates=21730, lr=2.04367e-05, gnorm=0.91, clip=0, loss_scale=64, train_wall=40, gb_free=29.7, wall=88932 2023-05-02 03:16:00 - progress_bar.py[line:274] - INFO: epoch 004: 3651 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7499.7, nsentences=120, sample_size=4240.1, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1855.8, ups=0.25, wpb=7499.7, bsz=120, num_updates=21740, lr=2.04314e-05, gnorm=0.912, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=88973 2023-05-02 03:16:40 - progress_bar.py[line:274] - INFO: epoch 004: 3661 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=7951.5, nsentences=120, sample_size=4057.9, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=2016.2, ups=0.25, wpb=7951.5, bsz=120, num_updates=21750, lr=2.04261e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=89012 
2023-05-02 03:17:19 - progress_bar.py[line:274] - INFO: epoch 004: 3671 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7572.9, nsentences=120, sample_size=3943.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1929.6, ups=0.25, wpb=7572.9, bsz=120, num_updates=21760, lr=2.04208e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=39, gb_free=29.7, wall=89051 2023-05-02 03:17:59 - progress_bar.py[line:274] - INFO: epoch 004: 3681 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=7497, nsentences=120, sample_size=4099.8, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1884.8, ups=0.25, wpb=7497, bsz=120, num_updates=21770, lr=2.04155e-05, gnorm=0.971, clip=30, loss_scale=64, train_wall=40, gb_free=28.9, wall=89091 2023-05-02 03:18:38 - progress_bar.py[line:274] - INFO: epoch 004: 3691 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7601.5, nsentences=120, sample_size=3702.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1926.8, ups=0.25, wpb=7601.5, bsz=120, num_updates=21780, lr=2.04102e-05, gnorm=0.988, clip=40, loss_scale=64, train_wall=39, gb_free=31.2, wall=89130 2023-05-02 03:19:18 - progress_bar.py[line:274] - INFO: epoch 004: 3701 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7718.7, nsentences=120, sample_size=4314.1, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1948.5, ups=0.25, wpb=7718.7, bsz=120, num_updates=21790, lr=2.0405e-05, gnorm=0.912, clip=10, loss_scale=64, train_wall=40, gb_free=29.2, wall=89170 2023-05-02 03:19:57 - progress_bar.py[line:274] - INFO: epoch 004: 3711 / 6042 loss=2.496, loss_v1=0, loss_v2=0, nll_loss=1.261, ntokens=8003.9, nsentences=120, sample_size=4017.7, sample_size_v1=0, sample_size_v2=0, ppl=2.4, wps=2038.9, ups=0.25, wpb=8003.9, bsz=120, num_updates=21800, lr=2.03997e-05, gnorm=0.962, clip=40, loss_scale=64, train_wall=39, gb_free=30.5, wall=89209 2023-05-02 03:20:37 - progress_bar.py[line:274] - INFO: epoch 004: 3721 / 6042 loss=2.486, 
loss_v1=0, loss_v2=0, nll_loss=1.24, ntokens=8058.3, nsentences=120, sample_size=3968.2, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1998.9, ups=0.25, wpb=8058.3, bsz=120, num_updates=21810, lr=2.03944e-05, gnorm=0.922, clip=20, loss_scale=64, train_wall=40, gb_free=27.4, wall=89250 2023-05-02 03:21:17 - progress_bar.py[line:274] - INFO: epoch 004: 3731 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7651.5, nsentences=120, sample_size=4211.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1915.7, ups=0.25, wpb=7651.5, bsz=120, num_updates=21820, lr=2.03891e-05, gnorm=0.922, clip=0, loss_scale=64, train_wall=40, gb_free=30.6, wall=89290 2023-05-02 03:21:57 - progress_bar.py[line:274] - INFO: epoch 004: 3741 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7745.2, nsentences=120, sample_size=3873.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1923.3, ups=0.25, wpb=7745.2, bsz=120, num_updates=21830, lr=2.03838e-05, gnorm=0.957, clip=30, loss_scale=128, train_wall=40, gb_free=27.2, wall=89330 2023-05-02 03:22:13 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 03:22:42 - progress_bar.py[line:274] - INFO: epoch 004: 3752 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7747.5, nsentences=120, sample_size=3801.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1752.5, ups=0.23, wpb=7747.5, bsz=120, num_updates=21840, lr=2.03786e-05, gnorm=0.954, clip=40, loss_scale=64, train_wall=44, gb_free=30.3, wall=89374 2023-05-02 03:23:22 - progress_bar.py[line:274] - INFO: epoch 004: 3762 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7679, nsentences=120, sample_size=4362.4, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1909.3, ups=0.25, wpb=7679, bsz=120, num_updates=21850, lr=2.03733e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=25.6, wall=89414 2023-05-02 03:24:01 - progress_bar.py[line:274] - INFO: 
epoch 004: 3772 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7571.2, nsentences=120, sample_size=4103, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1922.2, ups=0.25, wpb=7571.2, bsz=120, num_updates=21860, lr=2.0368e-05, gnorm=0.926, clip=10, loss_scale=64, train_wall=39, gb_free=29.5, wall=89454 2023-05-02 03:24:42 - progress_bar.py[line:274] - INFO: epoch 004: 3782 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7703.4, nsentences=120, sample_size=3911.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1888.2, ups=0.25, wpb=7703.4, bsz=120, num_updates=21870, lr=2.03627e-05, gnorm=0.977, clip=40, loss_scale=64, train_wall=41, gb_free=30.9, wall=89494 2023-05-02 03:25:21 - progress_bar.py[line:274] - INFO: epoch 004: 3792 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.222, ntokens=7600.9, nsentences=120, sample_size=4254.2, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1944.3, ups=0.26, wpb=7600.9, bsz=120, num_updates=21880, lr=2.03574e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=39, gb_free=28.7, wall=89534 2023-05-02 03:26:00 - progress_bar.py[line:274] - INFO: epoch 004: 3802 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7588, nsentences=120, sample_size=3894.9, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1927.8, ups=0.25, wpb=7588, bsz=120, num_updates=21890, lr=2.03521e-05, gnorm=0.967, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=89573 2023-05-02 03:26:41 - progress_bar.py[line:274] - INFO: epoch 004: 3812 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7845.8, nsentences=120, sample_size=4385.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1948.9, ups=0.25, wpb=7845.8, bsz=120, num_updates=21900, lr=2.03469e-05, gnorm=0.919, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=89613 2023-05-02 03:27:21 - progress_bar.py[line:274] - INFO: epoch 004: 3822 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7629.1, 
nsentences=120, sample_size=3766.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1898.4, ups=0.25, wpb=7629.1, bsz=120, num_updates=21910, lr=2.03416e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=89653 2023-05-02 03:28:00 - progress_bar.py[line:274] - INFO: epoch 004: 3832 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7598.9, nsentences=120, sample_size=3994.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1925.8, ups=0.25, wpb=7598.9, bsz=120, num_updates=21920, lr=2.03363e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=89693 2023-05-02 03:28:40 - progress_bar.py[line:274] - INFO: epoch 004: 3842 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7545.1, nsentences=120, sample_size=4256.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1903.5, ups=0.25, wpb=7545.1, bsz=120, num_updates=21930, lr=2.0331e-05, gnorm=0.92, clip=10, loss_scale=64, train_wall=40, gb_free=31.2, wall=89733 2023-05-02 03:29:20 - progress_bar.py[line:274] - INFO: epoch 004: 3852 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7693, nsentences=120, sample_size=3698, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1905.3, ups=0.25, wpb=7693, bsz=120, num_updates=21940, lr=2.03257e-05, gnorm=1.007, clip=40, loss_scale=64, train_wall=40, gb_free=29.1, wall=89773 2023-05-02 03:30:00 - progress_bar.py[line:274] - INFO: epoch 004: 3862 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7619.7, nsentences=120, sample_size=4346.1, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1905.3, ups=0.25, wpb=7619.7, bsz=120, num_updates=21950, lr=2.03205e-05, gnorm=0.901, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=89813 2023-05-02 03:30:40 - progress_bar.py[line:274] - INFO: epoch 004: 3872 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=7618.5, nsentences=120, sample_size=4282.9, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1920, ups=0.25, 
wpb=7618.5, bsz=120, num_updates=21960, lr=2.03152e-05, gnorm=0.916, clip=0, loss_scale=64, train_wall=40, gb_free=31, wall=89853 2023-05-02 03:31:20 - progress_bar.py[line:274] - INFO: epoch 004: 3882 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7863.8, nsentences=120, sample_size=4134.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1980, ups=0.25, wpb=7863.8, bsz=120, num_updates=21970, lr=2.03099e-05, gnorm=0.913, clip=10, loss_scale=64, train_wall=40, gb_free=31.8, wall=89892 2023-05-02 03:32:00 - progress_bar.py[line:274] - INFO: epoch 004: 3892 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7549, nsentences=120, sample_size=4007.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1901.8, ups=0.25, wpb=7549, bsz=120, num_updates=21980, lr=2.03046e-05, gnorm=0.972, clip=20, loss_scale=64, train_wall=40, gb_free=31.2, wall=89932 2023-05-02 03:32:39 - progress_bar.py[line:274] - INFO: epoch 004: 3902 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7992.5, nsentences=120, sample_size=4150.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2048.1, ups=0.26, wpb=7992.5, bsz=120, num_updates=21990, lr=2.02993e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=39, gb_free=30, wall=89971 2023-05-02 03:33:19 - progress_bar.py[line:274] - INFO: epoch 004: 3912 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7600.2, nsentences=120, sample_size=4142.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1886.5, ups=0.25, wpb=7600.2, bsz=120, num_updates=22000, lr=2.0294e-05, gnorm=0.938, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=90011 2023-05-02 03:33:19 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 03:33:20 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 03:33:20 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 03:33:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 03:33:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:37 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 03:33:37 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 03:33:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-02 03:33:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:49 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 03:33:49 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 03:33:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 03:33:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:33:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:33:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:34:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:34:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:34:00 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 03:34:00 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 03:34:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:34:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:34:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:34:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:34:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:34:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:34:05 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 03:34:05 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 03:34:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:34:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:34:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:34:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:34:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:34:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:34:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:34:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:34:09 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-02 03:34:09 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 03:34:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 03:34:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 03:34:10 - progress_bar.py[line:282] - INFO: epoch 004 | valid on 'valid' subset | loss 3.203 | loss_v1 0 | loss_v2 0 | nll_loss 2.038 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.11 | score 0.7529 | wps 3302.4 | wpb 3202.1 | bsz 39.4 | num_updates 22000 | best_score 0.7529 2023-05-02 03:34:10 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 4 @ 22000 updates 2023-05-02 03:34:10 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_22000.pt 2023-05-02 03:34:32 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_22000.pt 2023-05-02 03:35:13 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_22000.pt (epoch 4 @ 22000 updates, score 0.7529) (writing took 63.09673851588741 seconds) 2023-05-02 03:35:52 - progress_bar.py[line:274] - INFO: epoch 004: 3922 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7750.5, nsentences=120, sample_size=3967.1, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=505.8, ups=0.07, wpb=7750.5, bsz=120, num_updates=22010, lr=2.02888e-05, gnorm=0.982, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=90165 2023-05-02 03:36:32 - progress_bar.py[line:274] - INFO: epoch 004: 3932 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7569.4, nsentences=120, sample_size=4010.1, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1916.8, ups=0.25, wpb=7569.4, bsz=120, num_updates=22020, 
lr=2.02835e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=39, gb_free=30.7, wall=90204 2023-05-02 03:37:11 - progress_bar.py[line:274] - INFO: epoch 004: 3942 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=7694.6, nsentences=120, sample_size=4192.7, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1927, ups=0.25, wpb=7694.6, bsz=120, num_updates=22030, lr=2.02782e-05, gnorm=0.954, clip=0, loss_scale=64, train_wall=40, gb_free=30.6, wall=90244 2023-05-02 03:37:51 - progress_bar.py[line:274] - INFO: epoch 004: 3952 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=7416.5, nsentences=120, sample_size=4372.5, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1877.1, ups=0.25, wpb=7416.5, bsz=120, num_updates=22040, lr=2.02729e-05, gnorm=0.936, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=90283 2023-05-02 03:38:30 - progress_bar.py[line:274] - INFO: epoch 004: 3962 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7466.3, nsentences=120, sample_size=4137.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1914.8, ups=0.26, wpb=7466.3, bsz=120, num_updates=22050, lr=2.02676e-05, gnorm=0.952, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=90322 2023-05-02 03:39:09 - progress_bar.py[line:274] - INFO: epoch 004: 3972 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7348.1, nsentences=120, sample_size=3809.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1866, ups=0.25, wpb=7348.1, bsz=120, num_updates=22060, lr=2.02623e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=39, gb_free=27.1, wall=90362 2023-05-02 03:39:50 - progress_bar.py[line:274] - INFO: epoch 004: 3982 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7762.1, nsentences=120, sample_size=3922.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1914.9, ups=0.25, wpb=7762.1, bsz=120, num_updates=22070, lr=2.02571e-05, gnorm=0.958, clip=40, loss_scale=64, train_wall=40, gb_free=29.2, wall=90402 
2023-05-02 03:40:30 - progress_bar.py[line:274] - INFO: epoch 004: 3992 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.222, ntokens=7863.7, nsentences=120, sample_size=4278, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1945.8, ups=0.25, wpb=7863.7, bsz=120, num_updates=22080, lr=2.02518e-05, gnorm=0.948, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=90443 2023-05-02 03:41:10 - progress_bar.py[line:274] - INFO: epoch 004: 4002 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7442.8, nsentences=120, sample_size=4417.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1896.5, ups=0.25, wpb=7442.8, bsz=120, num_updates=22090, lr=2.02465e-05, gnorm=0.9, clip=0, loss_scale=64, train_wall=39, gb_free=29.3, wall=90482 2023-05-02 03:41:49 - progress_bar.py[line:274] - INFO: epoch 004: 4012 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7850.5, nsentences=120, sample_size=3694.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1983.8, ups=0.25, wpb=7850.5, bsz=120, num_updates=22100, lr=2.02412e-05, gnorm=1.004, clip=50, loss_scale=64, train_wall=39, gb_free=28.9, wall=90522 2023-05-02 03:42:30 - progress_bar.py[line:274] - INFO: epoch 004: 4022 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7739.3, nsentences=120, sample_size=4151.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1916.4, ups=0.25, wpb=7739.3, bsz=120, num_updates=22110, lr=2.02359e-05, gnorm=0.949, clip=30, loss_scale=64, train_wall=40, gb_free=28.7, wall=90562 2023-05-02 03:43:09 - progress_bar.py[line:274] - INFO: epoch 004: 4032 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7634.7, nsentences=120, sample_size=4170.9, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1943.7, ups=0.25, wpb=7634.7, bsz=120, num_updates=22120, lr=2.02307e-05, gnorm=0.953, clip=30, loss_scale=64, train_wall=39, gb_free=29.6, wall=90601 2023-05-02 03:43:49 - progress_bar.py[line:274] - INFO: epoch 004: 4042 / 6042 loss=2.472, 
loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7811.3, nsentences=120, sample_size=3909, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1937.5, ups=0.25, wpb=7811.3, bsz=120, num_updates=22130, lr=2.02254e-05, gnorm=0.993, clip=40, loss_scale=64, train_wall=40, gb_free=28.4, wall=90642 2023-05-02 03:44:29 - progress_bar.py[line:274] - INFO: epoch 004: 4052 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7592.9, nsentences=120, sample_size=3884.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1892.5, ups=0.25, wpb=7592.9, bsz=120, num_updates=22140, lr=2.02201e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=90682 2023-05-02 03:45:09 - progress_bar.py[line:274] - INFO: epoch 004: 4062 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7329.3, nsentences=120, sample_size=3890.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1852.2, ups=0.25, wpb=7329.3, bsz=120, num_updates=22150, lr=2.02148e-05, gnorm=0.991, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=90721 2023-05-02 03:45:49 - progress_bar.py[line:274] - INFO: epoch 004: 4072 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7958.9, nsentences=120, sample_size=3819.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2004.6, ups=0.25, wpb=7958.9, bsz=120, num_updates=22160, lr=2.02095e-05, gnorm=0.955, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=90761 2023-05-02 03:46:28 - progress_bar.py[line:274] - INFO: epoch 004: 4082 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.209, ntokens=7619.6, nsentences=120, sample_size=4285.9, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1940.7, ups=0.25, wpb=7619.6, bsz=120, num_updates=22170, lr=2.02042e-05, gnorm=0.964, clip=40, loss_scale=64, train_wall=39, gb_free=28.1, wall=90800 2023-05-02 03:47:08 - progress_bar.py[line:274] - INFO: epoch 004: 4092 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7923, nsentences=120, sample_size=3783.5, 
sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1972.8, ups=0.25, wpb=7923, bsz=120, num_updates=22180, lr=2.0199e-05, gnorm=1.001, clip=60, loss_scale=64, train_wall=40, gb_free=30, wall=90840 2023-05-02 03:47:48 - progress_bar.py[line:274] - INFO: epoch 004: 4102 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7848, nsentences=120, sample_size=3889.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1968.5, ups=0.25, wpb=7848, bsz=120, num_updates=22190, lr=2.01937e-05, gnorm=0.945, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=90880 2023-05-02 03:48:27 - progress_bar.py[line:274] - INFO: epoch 004: 4112 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7667.9, nsentences=120, sample_size=3973.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1935.8, ups=0.25, wpb=7667.9, bsz=120, num_updates=22200, lr=2.01884e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=90920 2023-05-02 03:49:08 - progress_bar.py[line:274] - INFO: epoch 004: 4122 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7552.5, nsentences=120, sample_size=4051.3, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1880, ups=0.25, wpb=7552.5, bsz=120, num_updates=22210, lr=2.01831e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=31, wall=90960 2023-05-02 03:49:48 - progress_bar.py[line:274] - INFO: epoch 004: 4132 / 6042 loss=2.478, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7892.6, nsentences=120, sample_size=4297, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1958.7, ups=0.25, wpb=7892.6, bsz=120, num_updates=22220, lr=2.01778e-05, gnorm=0.93, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=91000 2023-05-02 03:50:28 - progress_bar.py[line:274] - INFO: epoch 004: 4142 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=8000.1, nsentences=120, sample_size=3943.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1981.6, ups=0.25, wpb=8000.1, bsz=120, 
num_updates=22230, lr=2.01726e-05, gnorm=0.963, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=91041 2023-05-02 03:51:08 - progress_bar.py[line:274] - INFO: epoch 004: 4152 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7708.7, nsentences=120, sample_size=4099.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1951.3, ups=0.25, wpb=7708.7, bsz=120, num_updates=22240, lr=2.01673e-05, gnorm=0.94, clip=30, loss_scale=64, train_wall=39, gb_free=29.1, wall=91080 2023-05-02 03:51:47 - progress_bar.py[line:274] - INFO: epoch 004: 4162 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=7938.2, nsentences=120, sample_size=3855.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=2008.2, ups=0.25, wpb=7938.2, bsz=120, num_updates=22250, lr=2.0162e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=91120 2023-05-02 03:52:27 - progress_bar.py[line:274] - INFO: epoch 004: 4172 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7659.2, nsentences=120, sample_size=3979.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1920.6, ups=0.25, wpb=7659.2, bsz=120, num_updates=22260, lr=2.01567e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=40, gb_free=31, wall=91160 2023-05-02 03:53:07 - progress_bar.py[line:274] - INFO: epoch 004: 4182 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7697.2, nsentences=120, sample_size=4083, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1947.4, ups=0.25, wpb=7697.2, bsz=120, num_updates=22270, lr=2.01514e-05, gnorm=0.946, clip=30, loss_scale=64, train_wall=39, gb_free=28.5, wall=91199 2023-05-02 03:53:47 - progress_bar.py[line:274] - INFO: epoch 004: 4192 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.221, ntokens=7794.9, nsentences=120, sample_size=3956.3, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1943.7, ups=0.25, wpb=7794.9, bsz=120, num_updates=22280, lr=2.01461e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, 
gb_free=29.7, wall=91239 2023-05-02 03:54:26 - progress_bar.py[line:274] - INFO: epoch 004: 4202 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=7549.2, nsentences=120, sample_size=4124.4, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1920.7, ups=0.25, wpb=7549.2, bsz=120, num_updates=22290, lr=2.01409e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=39, gb_free=29.4, wall=91279 2023-05-02 03:55:05 - progress_bar.py[line:274] - INFO: epoch 004: 4212 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7736.3, nsentences=120, sample_size=3932, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1966.8, ups=0.25, wpb=7736.3, bsz=120, num_updates=22300, lr=2.01356e-05, gnorm=0.979, clip=40, loss_scale=64, train_wall=39, gb_free=31, wall=91318 2023-05-02 03:55:46 - progress_bar.py[line:274] - INFO: epoch 004: 4222 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.221, ntokens=7668.7, nsentences=120, sample_size=4161.1, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1887, ups=0.25, wpb=7668.7, bsz=120, num_updates=22310, lr=2.01303e-05, gnorm=0.93, clip=20, loss_scale=64, train_wall=41, gb_free=30.5, wall=91359 2023-05-02 03:56:26 - progress_bar.py[line:274] - INFO: epoch 004: 4232 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7447.9, nsentences=120, sample_size=3974.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1862.3, ups=0.25, wpb=7447.9, bsz=120, num_updates=22320, lr=2.0125e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=91399 2023-05-02 03:57:06 - progress_bar.py[line:274] - INFO: epoch 004: 4242 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7618.4, nsentences=120, sample_size=4159.1, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1931.4, ups=0.25, wpb=7618.4, bsz=120, num_updates=22330, lr=2.01197e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=39, gb_free=30.7, wall=91438 2023-05-02 03:57:45 - progress_bar.py[line:274] - INFO: epoch 004: 4252 / 
6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7670.1, nsentences=120, sample_size=4480.1, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1926.2, ups=0.25, wpb=7670.1, bsz=120, num_updates=22340, lr=2.01144e-05, gnorm=0.898, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=91478 2023-05-02 03:58:25 - progress_bar.py[line:274] - INFO: epoch 004: 4262 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7522.3, nsentences=120, sample_size=3919, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1917.2, ups=0.25, wpb=7522.3, bsz=120, num_updates=22350, lr=2.01092e-05, gnorm=0.968, clip=30, loss_scale=128, train_wall=39, gb_free=29.7, wall=91517 2023-05-02 03:59:04 - progress_bar.py[line:274] - INFO: epoch 004: 4272 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7649.6, nsentences=120, sample_size=3969.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1923.1, ups=0.25, wpb=7649.6, bsz=120, num_updates=22360, lr=2.01039e-05, gnorm=0.977, clip=50, loss_scale=128, train_wall=40, gb_free=30.5, wall=91557 2023-05-02 03:59:44 - progress_bar.py[line:274] - INFO: epoch 004: 4282 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.222, ntokens=7997, nsentences=120, sample_size=4230.3, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=2041.4, ups=0.26, wpb=7997, bsz=120, num_updates=22370, lr=2.00986e-05, gnorm=0.945, clip=30, loss_scale=128, train_wall=39, gb_free=30, wall=91596 2023-05-02 04:00:04 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 04:00:28 - progress_bar.py[line:274] - INFO: epoch 004: 4293 / 6042 loss=2.495, loss_v1=0, loss_v2=0, nll_loss=1.253, ntokens=8051.5, nsentences=120, sample_size=4013.5, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1829.1, ups=0.23, wpb=8051.5, bsz=120, num_updates=22380, lr=2.00933e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=44, gb_free=30.5, wall=91640 2023-05-02 04:01:08 - 
progress_bar.py[line:274] - INFO: epoch 004: 4303 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7916.9, nsentences=120, sample_size=3818, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1952.6, ups=0.25, wpb=7916.9, bsz=120, num_updates=22390, lr=2.0088e-05, gnorm=0.929, clip=20, loss_scale=64, train_wall=40, gb_free=28.1, wall=91681 2023-05-02 04:01:47 - progress_bar.py[line:274] - INFO: epoch 004: 4313 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7449.8, nsentences=120, sample_size=4024.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1914.7, ups=0.26, wpb=7449.8, bsz=120, num_updates=22400, lr=2.00828e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=39, gb_free=30.5, wall=91719 2023-05-02 04:02:27 - progress_bar.py[line:274] - INFO: epoch 004: 4323 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=8079.4, nsentences=120, sample_size=3776.6, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=2028.1, ups=0.25, wpb=8079.4, bsz=120, num_updates=22410, lr=2.00775e-05, gnorm=0.979, clip=50, loss_scale=64, train_wall=40, gb_free=28.1, wall=91759 2023-05-02 04:03:06 - progress_bar.py[line:274] - INFO: epoch 004: 4333 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.216, ntokens=7859.2, nsentences=120, sample_size=4063.1, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=2017.7, ups=0.26, wpb=7859.2, bsz=120, num_updates=22420, lr=2.00722e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=39, gb_free=30.7, wall=91798 2023-05-02 04:03:46 - progress_bar.py[line:274] - INFO: epoch 004: 4343 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7696.4, nsentences=120, sample_size=3963.9, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1897.7, ups=0.25, wpb=7696.4, bsz=120, num_updates=22430, lr=2.00669e-05, gnorm=0.959, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=91839 2023-05-02 04:04:26 - progress_bar.py[line:274] - INFO: epoch 004: 4353 / 6042 loss=2.44, loss_v1=0, loss_v2=0, 
nll_loss=1.193, ntokens=7470, nsentences=120, sample_size=4027.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1895.3, ups=0.25, wpb=7470, bsz=120, num_updates=22440, lr=2.00616e-05, gnorm=0.952, clip=10, loss_scale=64, train_wall=39, gb_free=29.2, wall=91878 2023-05-02 04:05:06 - progress_bar.py[line:274] - INFO: epoch 004: 4363 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7445.8, nsentences=120, sample_size=4401.2, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1861.8, ups=0.25, wpb=7445.8, bsz=120, num_updates=22450, lr=2.00563e-05, gnorm=0.959, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=91918 2023-05-02 04:05:45 - progress_bar.py[line:274] - INFO: epoch 004: 4373 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7494.4, nsentences=120, sample_size=3968.8, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1909, ups=0.25, wpb=7494.4, bsz=120, num_updates=22460, lr=2.00511e-05, gnorm=0.965, clip=40, loss_scale=64, train_wall=39, gb_free=29.2, wall=91958 2023-05-02 04:06:25 - progress_bar.py[line:274] - INFO: epoch 004: 4383 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7349, nsentences=120, sample_size=4141.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1857.3, ups=0.25, wpb=7349, bsz=120, num_updates=22470, lr=2.00458e-05, gnorm=0.912, clip=10, loss_scale=64, train_wall=39, gb_free=31.2, wall=91997 2023-05-02 04:07:04 - progress_bar.py[line:274] - INFO: epoch 004: 4393 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7669.6, nsentences=120, sample_size=3975.4, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1941.4, ups=0.25, wpb=7669.6, bsz=120, num_updates=22480, lr=2.00405e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=39, gb_free=28, wall=92037 2023-05-02 04:07:44 - progress_bar.py[line:274] - INFO: epoch 004: 4403 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7805.9, nsentences=120, sample_size=4038.7, sample_size_v1=0, sample_size_v2=0, 
ppl=2.28, wps=1957.3, ups=0.25, wpb=7805.9, bsz=120, num_updates=22490, lr=2.00352e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=31.2, wall=92076 2023-05-02 04:08:24 - progress_bar.py[line:274] - INFO: epoch 004: 4413 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=7760, nsentences=120, sample_size=3996.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1926.8, ups=0.25, wpb=7760, bsz=120, num_updates=22500, lr=2.00299e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=92117 2023-05-02 04:09:04 - progress_bar.py[line:274] - INFO: epoch 004: 4423 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7661.3, nsentences=120, sample_size=4162.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1906.6, ups=0.25, wpb=7661.3, bsz=120, num_updates=22510, lr=2.00247e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=92157 2023-05-02 04:09:44 - progress_bar.py[line:274] - INFO: epoch 004: 4433 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7792.9, nsentences=120, sample_size=4093.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1977.4, ups=0.25, wpb=7792.9, bsz=120, num_updates=22520, lr=2.00194e-05, gnorm=0.967, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=92196 2023-05-02 04:10:23 - progress_bar.py[line:274] - INFO: epoch 004: 4443 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7669.4, nsentences=120, sample_size=4184.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1938.4, ups=0.25, wpb=7669.4, bsz=120, num_updates=22530, lr=2.00141e-05, gnorm=0.941, clip=30, loss_scale=64, train_wall=39, gb_free=29.2, wall=92236 2023-05-02 04:11:03 - progress_bar.py[line:274] - INFO: epoch 004: 4453 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7511.3, nsentences=120, sample_size=3860.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1902.5, ups=0.25, wpb=7511.3, bsz=120, num_updates=22540, lr=2.00088e-05, 
gnorm=0.958, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=92275 2023-05-02 04:11:43 - progress_bar.py[line:274] - INFO: epoch 004: 4463 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7783, nsentences=120, sample_size=3978.1, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1949.4, ups=0.25, wpb=7783, bsz=120, num_updates=22550, lr=2.00035e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=29.1, wall=92315 2023-05-02 04:12:22 - progress_bar.py[line:274] - INFO: epoch 004: 4473 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.209, ntokens=7765.2, nsentences=120, sample_size=3699, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1982, ups=0.26, wpb=7765.2, bsz=120, num_updates=22560, lr=1.99982e-05, gnorm=0.973, clip=20, loss_scale=64, train_wall=39, gb_free=30, wall=92354 2023-05-02 04:13:03 - progress_bar.py[line:274] - INFO: epoch 004: 4483 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7966.9, nsentences=120, sample_size=3656.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1934.7, ups=0.24, wpb=7966.9, bsz=120, num_updates=22570, lr=1.9993e-05, gnorm=0.994, clip=50, loss_scale=64, train_wall=41, gb_free=24.7, wall=92396 2023-05-02 04:13:44 - progress_bar.py[line:274] - INFO: epoch 004: 4493 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.235, ntokens=7937.8, nsentences=120, sample_size=3988, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1960.7, ups=0.25, wpb=7937.8, bsz=120, num_updates=22580, lr=1.99877e-05, gnorm=0.947, clip=40, loss_scale=64, train_wall=40, gb_free=29, wall=92436 2023-05-02 04:14:24 - progress_bar.py[line:274] - INFO: epoch 004: 4503 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7760.4, nsentences=120, sample_size=4023.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1936.2, ups=0.25, wpb=7760.4, bsz=120, num_updates=22590, lr=1.99824e-05, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=92476 2023-05-02 04:15:04 - 
progress_bar.py[line:274] - INFO: epoch 004: 4513 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7734.4, nsentences=120, sample_size=4033.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1942.8, ups=0.25, wpb=7734.4, bsz=120, num_updates=22600, lr=1.99771e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=92516 2023-05-02 04:15:45 - progress_bar.py[line:274] - INFO: epoch 004: 4523 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7771.6, nsentences=120, sample_size=3847.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1899.1, ups=0.24, wpb=7771.6, bsz=120, num_updates=22610, lr=1.99718e-05, gnorm=0.98, clip=40, loss_scale=64, train_wall=41, gb_free=29.6, wall=92557 2023-05-02 04:16:24 - progress_bar.py[line:274] - INFO: epoch 004: 4533 / 6042 loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.236, ntokens=7549, nsentences=120, sample_size=4233.6, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1906.4, ups=0.25, wpb=7549, bsz=120, num_updates=22620, lr=1.99665e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=92597 2023-05-02 04:17:04 - progress_bar.py[line:274] - INFO: epoch 004: 4543 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7734.7, nsentences=120, sample_size=4376.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1955.3, ups=0.25, wpb=7734.7, bsz=120, num_updates=22630, lr=1.99613e-05, gnorm=0.91, clip=10, loss_scale=64, train_wall=39, gb_free=30.7, wall=92636 2023-05-02 04:17:43 - progress_bar.py[line:274] - INFO: epoch 004: 4553 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7914.8, nsentences=120, sample_size=3893.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=2034.2, ups=0.26, wpb=7914.8, bsz=120, num_updates=22640, lr=1.9956e-05, gnorm=0.968, clip=30, loss_scale=64, train_wall=39, gb_free=29.6, wall=92675 2023-05-02 04:18:22 - progress_bar.py[line:274] - INFO: epoch 004: 4563 / 6042 loss=2.444, loss_v1=0, loss_v2=0, 
nll_loss=1.196, ntokens=7492.8, nsentences=120, sample_size=3951.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1895.6, ups=0.25, wpb=7492.8, bsz=120, num_updates=22650, lr=1.99507e-05, gnorm=0.978, clip=40, loss_scale=64, train_wall=39, gb_free=30.2, wall=92715 2023-05-02 04:19:02 - progress_bar.py[line:274] - INFO: epoch 004: 4573 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7545.4, nsentences=120, sample_size=4132.2, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1880.7, ups=0.25, wpb=7545.4, bsz=120, num_updates=22660, lr=1.99454e-05, gnorm=0.923, clip=0, loss_scale=64, train_wall=40, gb_free=28.7, wall=92755 2023-05-02 04:19:42 - progress_bar.py[line:274] - INFO: epoch 004: 4583 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7523.2, nsentences=120, sample_size=3935.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1912.9, ups=0.25, wpb=7523.2, bsz=120, num_updates=22670, lr=1.99401e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=92794 2023-05-02 04:20:21 - progress_bar.py[line:274] - INFO: epoch 004: 4593 / 6042 loss=2.499, loss_v1=0, loss_v2=0, nll_loss=1.259, ntokens=7965.4, nsentences=120, sample_size=3914.6, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=2007.8, ups=0.25, wpb=7965.4, bsz=120, num_updates=22680, lr=1.99349e-05, gnorm=1.111, clip=40, loss_scale=64, train_wall=40, gb_free=30.4, wall=92834 2023-05-02 04:21:01 - progress_bar.py[line:274] - INFO: epoch 004: 4603 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=8100.4, nsentences=120, sample_size=4172.3, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=2026, ups=0.25, wpb=8100.4, bsz=120, num_updates=22690, lr=1.99296e-05, gnorm=0.93, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=92874 2023-05-02 04:21:42 - progress_bar.py[line:274] - INFO: epoch 004: 4613 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.234, ntokens=7913.1, nsentences=120, sample_size=4013.5, sample_size_v1=0, 
sample_size_v2=0, ppl=2.35, wps=1960.4, ups=0.25, wpb=7913.1, bsz=120, num_updates=22700, lr=1.99243e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, wall=92914 2023-05-02 04:22:22 - progress_bar.py[line:274] - INFO: epoch 004: 4623 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.242, ntokens=7835.6, nsentences=120, sample_size=4094.5, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1929.5, ups=0.25, wpb=7835.6, bsz=120, num_updates=22710, lr=1.9919e-05, gnorm=0.965, clip=20, loss_scale=64, train_wall=41, gb_free=29.4, wall=92955 2023-05-02 04:23:02 - progress_bar.py[line:274] - INFO: epoch 004: 4633 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7787.3, nsentences=120, sample_size=4101.3, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1936.1, ups=0.25, wpb=7787.3, bsz=120, num_updates=22720, lr=1.99137e-05, gnorm=0.914, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=92995 2023-05-02 04:23:42 - progress_bar.py[line:274] - INFO: epoch 004: 4643 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7721, nsentences=120, sample_size=4293.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1951.8, ups=0.25, wpb=7721, bsz=120, num_updates=22730, lr=1.99084e-05, gnorm=0.912, clip=10, loss_scale=64, train_wall=39, gb_free=30.5, wall=93034 2023-05-02 04:24:22 - progress_bar.py[line:274] - INFO: epoch 004: 4653 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7727.1, nsentences=120, sample_size=4061, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1952.7, ups=0.25, wpb=7727.1, bsz=120, num_updates=22740, lr=1.99032e-05, gnorm=0.964, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=93074 2023-05-02 04:25:01 - progress_bar.py[line:274] - INFO: epoch 004: 4663 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7161.7, nsentences=120, sample_size=4186.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1811.6, ups=0.25, wpb=7161.7, bsz=120, num_updates=22750, 
lr=1.98979e-05, gnorm=0.981, clip=60, loss_scale=64, train_wall=39, gb_free=30.9, wall=93114 2023-05-02 04:25:41 - progress_bar.py[line:274] - INFO: epoch 004: 4673 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7622.8, nsentences=120, sample_size=3814.9, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1895.6, ups=0.25, wpb=7622.8, bsz=120, num_updates=22760, lr=1.98926e-05, gnorm=0.981, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=93154 2023-05-02 04:26:22 - progress_bar.py[line:274] - INFO: epoch 004: 4683 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=8012.8, nsentences=120, sample_size=3908.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1979.8, ups=0.25, wpb=8012.8, bsz=120, num_updates=22770, lr=1.98873e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=93194 2023-05-02 04:27:03 - progress_bar.py[line:274] - INFO: epoch 004: 4693 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7631.6, nsentences=120, sample_size=4051.3, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1845.6, ups=0.24, wpb=7631.6, bsz=120, num_updates=22780, lr=1.9882e-05, gnorm=0.958, clip=20, loss_scale=64, train_wall=41, gb_free=30, wall=93236 2023-05-02 04:27:43 - progress_bar.py[line:274] - INFO: epoch 004: 4703 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7795.2, nsentences=120, sample_size=3867.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1947.3, ups=0.25, wpb=7795.2, bsz=120, num_updates=22790, lr=1.98767e-05, gnorm=0.939, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=93276 2023-05-02 04:28:23 - progress_bar.py[line:274] - INFO: epoch 004: 4713 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7873.8, nsentences=120, sample_size=4000.6, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1989.4, ups=0.25, wpb=7873.8, bsz=120, num_updates=22800, lr=1.98715e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=93315 
2023-05-02 04:29:02 - progress_bar.py[line:274] - INFO: epoch 004: 4723 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7888.1, nsentences=120, sample_size=4102.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1992.1, ups=0.25, wpb=7888.1, bsz=120, num_updates=22810, lr=1.98662e-05, gnorm=0.939, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=93355 2023-05-02 04:29:43 - progress_bar.py[line:274] - INFO: epoch 004: 4733 / 6042 loss=2.497, loss_v1=0, loss_v2=0, nll_loss=1.259, ntokens=8033.7, nsentences=120, sample_size=4177.2, sample_size_v1=0, sample_size_v2=0, ppl=2.39, wps=1995.7, ups=0.25, wpb=8033.7, bsz=120, num_updates=22820, lr=1.98609e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=93395 2023-05-02 04:30:22 - progress_bar.py[line:274] - INFO: epoch 004: 4743 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7723.7, nsentences=119.2, sample_size=4023.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1946.5, ups=0.25, wpb=7723.7, bsz=119.2, num_updates=22830, lr=1.98556e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=93435 2023-05-02 04:31:02 - progress_bar.py[line:274] - INFO: epoch 004: 4753 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7486.7, nsentences=120, sample_size=3978.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1892.9, ups=0.25, wpb=7486.7, bsz=120, num_updates=22840, lr=1.98503e-05, gnorm=0.972, clip=40, loss_scale=64, train_wall=39, gb_free=28.9, wall=93474 2023-05-02 04:31:41 - progress_bar.py[line:274] - INFO: epoch 004: 4763 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7601.8, nsentences=120, sample_size=4021.8, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1940.8, ups=0.26, wpb=7601.8, bsz=120, num_updates=22850, lr=1.98451e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=39, gb_free=29.7, wall=93513 2023-05-02 04:32:21 - progress_bar.py[line:274] - INFO: epoch 004: 4773 / 6042 
loss=2.473, loss_v1=0, loss_v2=0, nll_loss=1.228, ntokens=7651.6, nsentences=120, sample_size=4139.9, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1909.4, ups=0.25, wpb=7651.6, bsz=120, num_updates=22860, lr=1.98398e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=93554 2023-05-02 04:33:01 - progress_bar.py[line:274] - INFO: epoch 004: 4783 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7589.8, nsentences=120, sample_size=3869, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1902, ups=0.25, wpb=7589.8, bsz=120, num_updates=22870, lr=1.98345e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=93593 2023-05-02 04:33:41 - progress_bar.py[line:274] - INFO: epoch 004: 4793 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7721.5, nsentences=120, sample_size=4298.9, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1926.2, ups=0.25, wpb=7721.5, bsz=120, num_updates=22880, lr=1.98292e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=93634 2023-05-02 04:34:14 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 04:34:25 - progress_bar.py[line:274] - INFO: epoch 004: 4804 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7529.1, nsentences=120, sample_size=3916.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1705.9, ups=0.23, wpb=7529.1, bsz=120, num_updates=22890, lr=1.98239e-05, gnorm=1, clip=40, loss_scale=64, train_wall=44, gb_free=29.3, wall=93678 2023-05-02 04:35:05 - progress_bar.py[line:274] - INFO: epoch 004: 4814 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7672.5, nsentences=120, sample_size=4242.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1951.8, ups=0.25, wpb=7672.5, bsz=120, num_updates=22900, lr=1.98186e-05, gnorm=0.91, clip=10, loss_scale=64, train_wall=39, gb_free=26.9, wall=93717 2023-05-02 04:35:44 - progress_bar.py[line:274] - 
INFO: epoch 004: 4824 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7761.7, nsentences=120, sample_size=4300.1, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1954.7, ups=0.25, wpb=7761.7, bsz=120, num_updates=22910, lr=1.98134e-05, gnorm=0.914, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=93757 2023-05-02 04:36:24 - progress_bar.py[line:274] - INFO: epoch 004: 4834 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7749.4, nsentences=120, sample_size=3946.4, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1938.5, ups=0.25, wpb=7749.4, bsz=120, num_updates=22920, lr=1.98081e-05, gnorm=0.938, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=93797 2023-05-02 04:37:05 - progress_bar.py[line:274] - INFO: epoch 004: 4844 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=8054.7, nsentences=120, sample_size=3675.5, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1981.6, ups=0.25, wpb=8054.7, bsz=120, num_updates=22930, lr=1.98028e-05, gnorm=1.004, clip=60, loss_scale=64, train_wall=41, gb_free=29.4, wall=93837 2023-05-02 04:37:44 - progress_bar.py[line:274] - INFO: epoch 004: 4854 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7547, nsentences=120, sample_size=3930.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1920.4, ups=0.25, wpb=7547, bsz=120, num_updates=22940, lr=1.97975e-05, gnorm=0.959, clip=20, loss_scale=64, train_wall=39, gb_free=27.4, wall=93877 2023-05-02 04:38:24 - progress_bar.py[line:274] - INFO: epoch 004: 4864 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7470.9, nsentences=120, sample_size=4065.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1890.4, ups=0.25, wpb=7470.9, bsz=120, num_updates=22950, lr=1.97922e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=93916 2023-05-02 04:39:03 - progress_bar.py[line:274] - INFO: epoch 004: 4874 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7959.6, 
nsentences=120, sample_size=3908.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=2019.9, ups=0.25, wpb=7959.6, bsz=120, num_updates=22960, lr=1.9787e-05, gnorm=0.964, clip=40, loss_scale=64, train_wall=39, gb_free=29.2, wall=93956 2023-05-02 04:39:43 - progress_bar.py[line:274] - INFO: epoch 004: 4884 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7533.9, nsentences=120, sample_size=4154.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1908.3, ups=0.25, wpb=7533.9, bsz=120, num_updates=22970, lr=1.97817e-05, gnorm=0.951, clip=30, loss_scale=64, train_wall=39, gb_free=25.3, wall=93995 2023-05-02 04:40:22 - progress_bar.py[line:274] - INFO: epoch 004: 4894 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7636.6, nsentences=120, sample_size=4153.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1921.9, ups=0.25, wpb=7636.6, bsz=120, num_updates=22980, lr=1.97764e-05, gnorm=0.934, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=94035 2023-05-02 04:41:02 - progress_bar.py[line:274] - INFO: epoch 004: 4904 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7655.1, nsentences=120, sample_size=3925.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1946.7, ups=0.25, wpb=7655.1, bsz=120, num_updates=22990, lr=1.97711e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=39, gb_free=24.8, wall=94074 2023-05-02 04:41:41 - progress_bar.py[line:274] - INFO: epoch 004: 4914 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7663.3, nsentences=120, sample_size=4094.4, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1964.1, ups=0.26, wpb=7663.3, bsz=120, num_updates=23000, lr=1.97658e-05, gnorm=0.968, clip=60, loss_scale=64, train_wall=39, gb_free=29.7, wall=94113 2023-05-02 04:41:41 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 04:41:42 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-02 04:41:42 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 04:41:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:54 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-02 04:41:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:41:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:41:59 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 04:41:59 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 04:42:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:07 - task_invig.py[line:181] - INFO: example hyp(gen): 
region: 2023-05-02 04:42:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:11 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 04:42:11 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 04:42:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:19 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-02 04:42:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:22 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 04:42:22 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 04:42:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:27 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 04:42:27 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 04:42:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:31 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 04:42:31 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 04:42:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 04:42:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 04:42:32 - progress_bar.py[line:282] - INFO: epoch 004 | valid on 'valid' subset | loss 3.194 | loss_v1 0 | loss_v2 0 | nll_loss 2.028 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.08 | score 0.7515 | wps 3302.8 | wpb 3202.1 | bsz 39.4 | num_updates 23000 | best_score 0.7529 2023-05-02 04:42:32 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 4 @ 23000 updates 2023-05-02 04:42:32 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_23000.pt 2023-05-02 04:42:57 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_23000.pt 2023-05-02 04:43:23 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_23000.pt (epoch 4 @ 23000 updates, 
score 0.7515) (writing took 51.27530999481678 seconds) 2023-05-02 04:44:03 - progress_bar.py[line:274] - INFO: epoch 004: 4924 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7727.6, nsentences=120, sample_size=4067.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=543.7, ups=0.07, wpb=7727.6, bsz=120, num_updates=23010, lr=1.97605e-05, gnorm=0.959, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=94255 2023-05-02 04:44:43 - progress_bar.py[line:274] - INFO: epoch 004: 4934 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7933, nsentences=120, sample_size=4132.1, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1995, ups=0.25, wpb=7933, bsz=120, num_updates=23020, lr=1.97553e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=40, gb_free=27.5, wall=94295 2023-05-02 04:45:22 - progress_bar.py[line:274] - INFO: epoch 004: 4944 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7705.9, nsentences=120, sample_size=3890.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1928.4, ups=0.25, wpb=7705.9, bsz=120, num_updates=23030, lr=1.975e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=94335 2023-05-02 04:46:02 - progress_bar.py[line:274] - INFO: epoch 004: 4954 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7808.1, nsentences=120, sample_size=4049.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1964.3, ups=0.25, wpb=7808.1, bsz=120, num_updates=23040, lr=1.97447e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=94375 2023-05-02 04:46:42 - progress_bar.py[line:274] - INFO: epoch 004: 4964 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7716, nsentences=120, sample_size=4174.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1957.9, ups=0.25, wpb=7716, bsz=120, num_updates=23050, lr=1.97394e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=39, gb_free=29.1, wall=94414 2023-05-02 04:47:22 - progress_bar.py[line:274] 
- INFO: epoch 004: 4974 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7719, nsentences=120, sample_size=4021.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1923, ups=0.25, wpb=7719, bsz=120, num_updates=23060, lr=1.97341e-05, gnorm=0.964, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=94454 2023-05-02 04:48:01 - progress_bar.py[line:274] - INFO: epoch 004: 4984 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7818.8, nsentences=120, sample_size=4057.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1987.3, ups=0.25, wpb=7818.8, bsz=120, num_updates=23070, lr=1.97288e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=39, gb_free=30.5, wall=94494 2023-05-02 04:48:41 - progress_bar.py[line:274] - INFO: epoch 004: 4994 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.243, ntokens=8063.4, nsentences=120, sample_size=3807.1, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=2026, ups=0.25, wpb=8063.4, bsz=120, num_updates=23080, lr=1.97236e-05, gnorm=0.989, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=94533 2023-05-02 04:49:20 - progress_bar.py[line:274] - INFO: epoch 004: 5004 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.227, ntokens=7559.3, nsentences=120, sample_size=4307.3, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1916.3, ups=0.25, wpb=7559.3, bsz=120, num_updates=23090, lr=1.97183e-05, gnorm=0.898, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=94573 2023-05-02 04:50:00 - progress_bar.py[line:274] - INFO: epoch 004: 5014 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7741.5, nsentences=120, sample_size=4395.4, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1957.6, ups=0.25, wpb=7741.5, bsz=120, num_updates=23100, lr=1.9713e-05, gnorm=0.897, clip=0, loss_scale=64, train_wall=39, gb_free=30.4, wall=94612 2023-05-02 04:50:41 - progress_bar.py[line:274] - INFO: epoch 004: 5024 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7766.8, 
nsentences=120, sample_size=4066.1, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1904.9, ups=0.25, wpb=7766.8, bsz=120, num_updates=23110, lr=1.97077e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=41, gb_free=28.5, wall=94653 2023-05-02 04:51:20 - progress_bar.py[line:274] - INFO: epoch 004: 5034 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7765.6, nsentences=120, sample_size=4134.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1979.2, ups=0.25, wpb=7765.6, bsz=120, num_updates=23120, lr=1.97024e-05, gnorm=0.926, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=94692 2023-05-02 04:51:59 - progress_bar.py[line:274] - INFO: epoch 004: 5044 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=8055.8, nsentences=120, sample_size=4097.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2066.2, ups=0.26, wpb=8055.8, bsz=120, num_updates=23130, lr=1.96972e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=94731 2023-05-02 04:52:39 - progress_bar.py[line:274] - INFO: epoch 004: 5054 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7298.9, nsentences=120, sample_size=4056.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1832.1, ups=0.25, wpb=7298.9, bsz=120, num_updates=23140, lr=1.96919e-05, gnorm=0.973, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=94771 2023-05-02 04:53:19 - progress_bar.py[line:274] - INFO: epoch 004: 5064 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7345.1, nsentences=120, sample_size=4226.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1846.3, ups=0.25, wpb=7345.1, bsz=120, num_updates=23150, lr=1.96866e-05, gnorm=0.939, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=94811 2023-05-02 04:53:59 - progress_bar.py[line:274] - INFO: epoch 004: 5074 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7787, nsentences=120, sample_size=4320.2, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1932.8, 
ups=0.25, wpb=7787, bsz=120, num_updates=23160, lr=1.96813e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=40, gb_free=30.3, wall=94851 2023-05-02 04:54:39 - progress_bar.py[line:274] - INFO: epoch 004: 5084 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7746.3, nsentences=120, sample_size=4286.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1921.7, ups=0.25, wpb=7746.3, bsz=120, num_updates=23170, lr=1.9676e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=94892 2023-05-02 04:55:19 - progress_bar.py[line:274] - INFO: epoch 004: 5094 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7547.5, nsentences=120, sample_size=3913, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1875.9, ups=0.25, wpb=7547.5, bsz=120, num_updates=23180, lr=1.96707e-05, gnorm=0.938, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=94932 2023-05-02 04:55:59 - progress_bar.py[line:274] - INFO: epoch 004: 5104 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.218, ntokens=7863.4, nsentences=120, sample_size=3979, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1975.1, ups=0.25, wpb=7863.4, bsz=120, num_updates=23190, lr=1.96655e-05, gnorm=0.952, clip=30, loss_scale=64, train_wall=40, gb_free=29.2, wall=94972 2023-05-02 04:56:38 - progress_bar.py[line:274] - INFO: epoch 004: 5114 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7716.9, nsentences=120, sample_size=3974.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1971.7, ups=0.26, wpb=7716.9, bsz=120, num_updates=23200, lr=1.96602e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=39, gb_free=30.4, wall=95011 2023-05-02 04:57:18 - progress_bar.py[line:274] - INFO: epoch 004: 5124 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7671.3, nsentences=120, sample_size=4351.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1942.1, ups=0.25, wpb=7671.3, bsz=120, num_updates=23210, lr=1.96549e-05, gnorm=0.915, clip=0, 
loss_scale=64, train_wall=39, gb_free=30.3, wall=95050 2023-05-02 04:57:58 - progress_bar.py[line:274] - INFO: epoch 004: 5134 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7593.2, nsentences=120, sample_size=3944.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1884.7, ups=0.25, wpb=7593.2, bsz=120, num_updates=23220, lr=1.96496e-05, gnorm=0.985, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=95091 2023-05-02 04:58:38 - progress_bar.py[line:274] - INFO: epoch 004: 5144 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7724.6, nsentences=120, sample_size=3975.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1930, ups=0.25, wpb=7724.6, bsz=120, num_updates=23230, lr=1.96443e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=95131 2023-05-02 04:59:18 - progress_bar.py[line:274] - INFO: epoch 004: 5154 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7414.3, nsentences=120, sample_size=3997.7, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1866.2, ups=0.25, wpb=7414.3, bsz=120, num_updates=23240, lr=1.96391e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=95170 2023-05-02 04:59:58 - progress_bar.py[line:274] - INFO: epoch 004: 5164 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7705.1, nsentences=120, sample_size=4118.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1926.7, ups=0.25, wpb=7705.1, bsz=120, num_updates=23250, lr=1.96338e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=95210 2023-05-02 05:00:38 - progress_bar.py[line:274] - INFO: epoch 004: 5174 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.224, ntokens=7835.9, nsentences=120, sample_size=3904.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1966.6, ups=0.25, wpb=7835.9, bsz=120, num_updates=23260, lr=1.96285e-05, gnorm=0.931, clip=20, loss_scale=64, train_wall=40, gb_free=26.7, wall=95250 2023-05-02 05:01:18 - 
progress_bar.py[line:274] - INFO: epoch 004: 5184 / 6042 loss=2.489, loss_v1=0, loss_v2=0, nll_loss=1.251, ntokens=8222.5, nsentences=120, sample_size=4093.7, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=2030.6, ups=0.25, wpb=8222.5, bsz=120, num_updates=23270, lr=1.96232e-05, gnorm=0.928, clip=30, loss_scale=64, train_wall=40, gb_free=28.3, wall=95291 2023-05-02 05:01:57 - progress_bar.py[line:274] - INFO: epoch 004: 5194 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7637.6, nsentences=120, sample_size=4198.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1959.6, ups=0.26, wpb=7637.6, bsz=120, num_updates=23280, lr=1.96179e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=95330 2023-05-02 05:02:36 - progress_bar.py[line:274] - INFO: epoch 004: 5204 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.217, ntokens=8052.6, nsentences=120, sample_size=3820.7, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=2050.7, ups=0.25, wpb=8052.6, bsz=120, num_updates=23290, lr=1.96126e-05, gnorm=0.976, clip=40, loss_scale=64, train_wall=39, gb_free=30.3, wall=95369 2023-05-02 05:03:16 - progress_bar.py[line:274] - INFO: epoch 004: 5214 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7673.3, nsentences=120, sample_size=4199, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1927, ups=0.25, wpb=7673.3, bsz=120, num_updates=23300, lr=1.96074e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=31.4, wall=95409 2023-05-02 05:03:56 - progress_bar.py[line:274] - INFO: epoch 004: 5224 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.231, ntokens=7840.3, nsentences=120, sample_size=4035.2, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1956.4, ups=0.25, wpb=7840.3, bsz=120, num_updates=23310, lr=1.96021e-05, gnorm=0.945, clip=10, loss_scale=64, train_wall=40, gb_free=27.9, wall=95449 2023-05-02 05:04:37 - progress_bar.py[line:274] - INFO: epoch 004: 5234 / 6042 loss=2.473, loss_v1=0, loss_v2=0, 
nll_loss=1.232, ntokens=7687.8, nsentences=120, sample_size=3778.9, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1905.2, ups=0.25, wpb=7687.8, bsz=120, num_updates=23320, lr=1.95968e-05, gnorm=0.969, clip=20, loss_scale=64, train_wall=40, gb_free=29.2, wall=95489 2023-05-02 05:05:17 - progress_bar.py[line:274] - INFO: epoch 004: 5244 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.241, ntokens=7854.9, nsentences=120, sample_size=4414, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1930.5, ups=0.25, wpb=7854.9, bsz=120, num_updates=23330, lr=1.95915e-05, gnorm=0.916, clip=10, loss_scale=64, train_wall=41, gb_free=29.8, wall=95530 2023-05-02 05:05:58 - progress_bar.py[line:274] - INFO: epoch 004: 5254 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7805, nsentences=120, sample_size=4007.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1915.3, ups=0.25, wpb=7805, bsz=120, num_updates=23340, lr=1.95862e-05, gnorm=0.938, clip=20, loss_scale=64, train_wall=41, gb_free=26.5, wall=95571 2023-05-02 05:06:38 - progress_bar.py[line:274] - INFO: epoch 004: 5264 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7383.8, nsentences=120, sample_size=3814, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1877.3, ups=0.25, wpb=7383.8, bsz=120, num_updates=23350, lr=1.95809e-05, gnorm=0.999, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=95610 2023-05-02 05:07:17 - progress_bar.py[line:274] - INFO: epoch 004: 5274 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7705.7, nsentences=120, sample_size=4403.7, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1939.7, ups=0.25, wpb=7705.7, bsz=120, num_updates=23360, lr=1.95757e-05, gnorm=0.935, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=95650 2023-05-02 05:07:58 - progress_bar.py[line:274] - INFO: epoch 004: 5284 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7663.2, nsentences=120, sample_size=4018.1, sample_size_v1=0, 
sample_size_v2=0, ppl=2.32, wps=1899.7, ups=0.25, wpb=7663.2, bsz=120, num_updates=23370, lr=1.95704e-05, gnorm=0.949, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=95690 2023-05-02 05:08:37 - progress_bar.py[line:274] - INFO: epoch 004: 5294 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7587.9, nsentences=120, sample_size=3868, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1918.6, ups=0.25, wpb=7587.9, bsz=120, num_updates=23380, lr=1.95651e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=95730 2023-05-02 05:09:17 - progress_bar.py[line:274] - INFO: epoch 004: 5304 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7598.2, nsentences=120, sample_size=3870.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1926.5, ups=0.25, wpb=7598.2, bsz=120, num_updates=23390, lr=1.95598e-05, gnorm=0.949, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=95769 2023-05-02 05:09:57 - progress_bar.py[line:274] - INFO: epoch 004: 5314 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7892.2, nsentences=120, sample_size=3833.9, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1971.5, ups=0.25, wpb=7892.2, bsz=120, num_updates=23400, lr=1.95545e-05, gnorm=0.962, clip=40, loss_scale=128, train_wall=40, gb_free=29.9, wall=95809 2023-05-02 05:10:05 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 05:10:40 - progress_bar.py[line:274] - INFO: epoch 004: 5325 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7640.3, nsentences=120, sample_size=4293.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1751.7, ups=0.23, wpb=7640.3, bsz=120, num_updates=23410, lr=1.95493e-05, gnorm=0.929, clip=0, loss_scale=64, train_wall=44, gb_free=29.9, wall=95853 2023-05-02 05:11:20 - progress_bar.py[line:274] - INFO: epoch 004: 5335 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7761.9, nsentences=120, 
sample_size=3970.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1957.7, ups=0.25, wpb=7761.9, bsz=120, num_updates=23420, lr=1.9544e-05, gnorm=0.92, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=95892 2023-05-02 05:12:00 - progress_bar.py[line:274] - INFO: epoch 004: 5345 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7569, nsentences=120, sample_size=4454.1, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1905.2, ups=0.25, wpb=7569, bsz=120, num_updates=23430, lr=1.95387e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=95932 2023-05-02 05:12:39 - progress_bar.py[line:274] - INFO: epoch 004: 5355 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7670.1, nsentences=120, sample_size=3854.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1930.4, ups=0.25, wpb=7670.1, bsz=120, num_updates=23440, lr=1.95334e-05, gnorm=0.928, clip=0, loss_scale=64, train_wall=40, gb_free=30.1, wall=95972 2023-05-02 05:13:20 - progress_bar.py[line:274] - INFO: epoch 004: 5365 / 6042 loss=2.474, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=7937.1, nsentences=120, sample_size=4030.1, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1972.6, ups=0.25, wpb=7937.1, bsz=120, num_updates=23450, lr=1.95281e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=96012 2023-05-02 05:13:58 - progress_bar.py[line:274] - INFO: epoch 004: 5375 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7463.5, nsentences=120, sample_size=4023.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1926.2, ups=0.26, wpb=7463.5, bsz=120, num_updates=23460, lr=1.95228e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=39, gb_free=29.3, wall=96051 2023-05-02 05:14:39 - progress_bar.py[line:274] - INFO: epoch 004: 5385 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7806.8, nsentences=120, sample_size=4360.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1933.9, ups=0.25, 
wpb=7806.8, bsz=120, num_updates=23470, lr=1.95176e-05, gnorm=0.92, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=96091 2023-05-02 05:15:17 - progress_bar.py[line:274] - INFO: epoch 004: 5395 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7647.1, nsentences=120, sample_size=3728.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1973.5, ups=0.26, wpb=7647.1, bsz=120, num_updates=23480, lr=1.95123e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=96130 2023-05-02 05:15:56 - progress_bar.py[line:274] - INFO: epoch 004: 5405 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7579, nsentences=120, sample_size=3811.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1944.8, ups=0.26, wpb=7579, bsz=120, num_updates=23490, lr=1.9507e-05, gnorm=0.979, clip=60, loss_scale=64, train_wall=39, gb_free=30.4, wall=96169 2023-05-02 05:16:36 - progress_bar.py[line:274] - INFO: epoch 004: 5415 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7911.1, nsentences=120, sample_size=3799.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1977.7, ups=0.25, wpb=7911.1, bsz=120, num_updates=23500, lr=1.95017e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=96209 2023-05-02 05:17:17 - progress_bar.py[line:274] - INFO: epoch 004: 5425 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7625.5, nsentences=120, sample_size=4019.6, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1897.5, ups=0.25, wpb=7625.5, bsz=120, num_updates=23510, lr=1.94964e-05, gnorm=0.926, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=96249 2023-05-02 05:17:56 - progress_bar.py[line:274] - INFO: epoch 004: 5435 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.237, ntokens=7721.7, nsentences=120, sample_size=3913.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1938.1, ups=0.25, wpb=7721.7, bsz=120, num_updates=23520, lr=1.94912e-05, gnorm=0.963, clip=30, loss_scale=64, 
train_wall=40, gb_free=30, wall=96289 2023-05-02 05:18:36 - progress_bar.py[line:274] - INFO: epoch 004: 5445 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7526.3, nsentences=120, sample_size=3916.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1914.8, ups=0.25, wpb=7526.3, bsz=120, num_updates=23530, lr=1.94859e-05, gnorm=0.956, clip=20, loss_scale=64, train_wall=39, gb_free=30.7, wall=96328 2023-05-02 05:19:17 - progress_bar.py[line:274] - INFO: epoch 004: 5455 / 6042 loss=2.485, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=7758.7, nsentences=120, sample_size=4124.5, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1882.3, ups=0.24, wpb=7758.7, bsz=120, num_updates=23540, lr=1.94806e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=41, gb_free=30.5, wall=96369 2023-05-02 05:19:56 - progress_bar.py[line:274] - INFO: epoch 004: 5465 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7461.1, nsentences=120, sample_size=3848.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1920.3, ups=0.26, wpb=7461.1, bsz=120, num_updates=23550, lr=1.94753e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=39, gb_free=29.6, wall=96408 2023-05-02 05:20:36 - progress_bar.py[line:274] - INFO: epoch 004: 5475 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7686.4, nsentences=120, sample_size=4172.6, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1889.8, ups=0.25, wpb=7686.4, bsz=120, num_updates=23560, lr=1.947e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=41, gb_free=29.8, wall=96449 2023-05-02 05:21:17 - progress_bar.py[line:274] - INFO: epoch 004: 5485 / 6042 loss=2.466, loss_v1=0, loss_v2=0, nll_loss=1.225, ntokens=7700.1, nsentences=120, sample_size=4172.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1923.4, ups=0.25, wpb=7700.1, bsz=120, num_updates=23570, lr=1.94647e-05, gnorm=0.937, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=96489 2023-05-02 05:21:56 - progress_bar.py[line:274] - INFO: 
epoch 004: 5495 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7732.2, nsentences=120, sample_size=4028, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1961.2, ups=0.25, wpb=7732.2, bsz=120, num_updates=23580, lr=1.94595e-05, gnorm=0.94, clip=0, loss_scale=64, train_wall=39, gb_free=30.2, wall=96528 2023-05-02 05:22:35 - progress_bar.py[line:274] - INFO: epoch 004: 5505 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7677, nsentences=120, sample_size=4027.6, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1950.4, ups=0.25, wpb=7677, bsz=120, num_updates=23590, lr=1.94542e-05, gnorm=0.949, clip=10, loss_scale=64, train_wall=39, gb_free=30, wall=96568 2023-05-02 05:23:16 - progress_bar.py[line:274] - INFO: epoch 004: 5515 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7892.3, nsentences=120, sample_size=3946.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1943.8, ups=0.25, wpb=7892.3, bsz=120, num_updates=23600, lr=1.94489e-05, gnorm=0.96, clip=40, loss_scale=64, train_wall=41, gb_free=30.4, wall=96608 2023-05-02 05:23:56 - progress_bar.py[line:274] - INFO: epoch 004: 5525 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7594.1, nsentences=120, sample_size=3792.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1897.6, ups=0.25, wpb=7594.1, bsz=120, num_updates=23610, lr=1.94436e-05, gnorm=0.988, clip=50, loss_scale=64, train_wall=40, gb_free=29.4, wall=96648 2023-05-02 05:24:36 - progress_bar.py[line:274] - INFO: epoch 004: 5535 / 6042 loss=2.465, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7850.6, nsentences=120, sample_size=3963.4, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1984.8, ups=0.25, wpb=7850.6, bsz=120, num_updates=23620, lr=1.94383e-05, gnorm=0.956, clip=10, loss_scale=64, train_wall=39, gb_free=31.2, wall=96688 2023-05-02 05:25:15 - progress_bar.py[line:274] - INFO: epoch 004: 5545 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7545.5, 
nsentences=120, sample_size=4106.7, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1923.5, ups=0.25, wpb=7545.5, bsz=120, num_updates=23630, lr=1.9433e-05, gnorm=0.919, clip=0, loss_scale=64, train_wall=39, gb_free=29.7, wall=96727 2023-05-02 05:25:54 - progress_bar.py[line:274] - INFO: epoch 004: 5555 / 6042 loss=2.472, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7641.7, nsentences=120, sample_size=4048.4, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1953, ups=0.26, wpb=7641.7, bsz=120, num_updates=23640, lr=1.94278e-05, gnorm=0.944, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=96766 2023-05-02 05:26:34 - progress_bar.py[line:274] - INFO: epoch 004: 5565 / 6042 loss=2.481, loss_v1=0, loss_v2=0, nll_loss=1.244, ntokens=8041.4, nsentences=120, sample_size=3736.4, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1982.3, ups=0.25, wpb=8041.4, bsz=120, num_updates=23650, lr=1.94225e-05, gnorm=0.974, clip=50, loss_scale=64, train_wall=40, gb_free=28.4, wall=96807 2023-05-02 05:27:14 - progress_bar.py[line:274] - INFO: epoch 004: 5575 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7801.9, nsentences=120, sample_size=3939.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1964.5, ups=0.25, wpb=7801.9, bsz=120, num_updates=23660, lr=1.94172e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=96847 2023-05-02 05:27:54 - progress_bar.py[line:274] - INFO: epoch 004: 5585 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7909.3, nsentences=120, sample_size=4299.2, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1976.3, ups=0.25, wpb=7909.3, bsz=120, num_updates=23670, lr=1.94119e-05, gnorm=0.919, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=96887 2023-05-02 05:28:34 - progress_bar.py[line:274] - INFO: epoch 004: 5595 / 6042 loss=2.467, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=7917.9, nsentences=120, sample_size=3955, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1998.3, 
ups=0.25, wpb=7917.9, bsz=120, num_updates=23680, lr=1.94066e-05, gnorm=0.979, clip=30, loss_scale=64, train_wall=40, gb_free=27.2, wall=96926 2023-05-02 05:29:14 - progress_bar.py[line:274] - INFO: epoch 004: 5605 / 6042 loss=2.483, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7846.6, nsentences=120, sample_size=4115.1, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1934.7, ups=0.25, wpb=7846.6, bsz=120, num_updates=23690, lr=1.94014e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, wall=96967 2023-05-02 05:29:54 - progress_bar.py[line:274] - INFO: epoch 004: 5615 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7448.2, nsentences=120, sample_size=4375.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1871.7, ups=0.25, wpb=7448.2, bsz=120, num_updates=23700, lr=1.93961e-05, gnorm=0.919, clip=10, loss_scale=64, train_wall=40, gb_free=28.8, wall=97007 2023-05-02 05:30:35 - progress_bar.py[line:274] - INFO: epoch 004: 5625 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.223, ntokens=8039.9, nsentences=120, sample_size=4185.6, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1970.2, ups=0.25, wpb=8039.9, bsz=120, num_updates=23710, lr=1.93908e-05, gnorm=0.917, clip=20, loss_scale=64, train_wall=41, gb_free=30.7, wall=97047 2023-05-02 05:31:15 - progress_bar.py[line:274] - INFO: epoch 004: 5635 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7803.2, nsentences=120, sample_size=3806.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1946.3, ups=0.25, wpb=7803.2, bsz=120, num_updates=23720, lr=1.93855e-05, gnorm=0.98, clip=40, loss_scale=64, train_wall=40, gb_free=31.1, wall=97088 2023-05-02 05:31:55 - progress_bar.py[line:274] - INFO: epoch 004: 5645 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7582.5, nsentences=120, sample_size=4159.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1902.6, ups=0.25, wpb=7582.5, bsz=120, num_updates=23730, lr=1.93802e-05, gnorm=0.924, clip=10, 
loss_scale=64, train_wall=40, gb_free=29.7, wall=97127 2023-05-02 05:32:35 - progress_bar.py[line:274] - INFO: epoch 004: 5655 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7454, nsentences=120, sample_size=4451.7, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1878.4, ups=0.25, wpb=7454, bsz=120, num_updates=23740, lr=1.93749e-05, gnorm=0.926, clip=10, loss_scale=64, train_wall=40, gb_free=30.5, wall=97167 2023-05-02 05:33:06 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-02 05:33:18 - progress_bar.py[line:274] - INFO: epoch 004: 5666 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.226, ntokens=8028, nsentences=120, sample_size=4138.2, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1842.4, ups=0.23, wpb=8028, bsz=120, num_updates=23750, lr=1.93697e-05, gnorm=0.906, clip=10, loss_scale=32, train_wall=43, gb_free=29, wall=97211 2023-05-02 05:33:57 - progress_bar.py[line:274] - INFO: epoch 004: 5676 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7734.8, nsentences=120, sample_size=3881.1, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1971.3, ups=0.25, wpb=7734.8, bsz=120, num_updates=23760, lr=1.93644e-05, gnorm=0.945, clip=20, loss_scale=32, train_wall=39, gb_free=30.8, wall=97250 2023-05-02 05:34:36 - progress_bar.py[line:274] - INFO: epoch 004: 5686 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7537.2, nsentences=120, sample_size=4131.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1932.2, ups=0.26, wpb=7537.2, bsz=120, num_updates=23770, lr=1.93591e-05, gnorm=0.953, clip=40, loss_scale=32, train_wall=39, gb_free=29.6, wall=97289 2023-05-02 05:35:16 - progress_bar.py[line:274] - INFO: epoch 004: 5696 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7582.6, nsentences=120, sample_size=4013.8, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1901.5, ups=0.25, wpb=7582.6, bsz=120, num_updates=23780, 
lr=1.93538e-05, gnorm=0.944, clip=10, loss_scale=32, train_wall=40, gb_free=30, wall=97329 2023-05-02 05:35:56 - progress_bar.py[line:274] - INFO: epoch 004: 5706 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.252, ntokens=7628.7, nsentences=120, sample_size=3990.5, sample_size_v1=0, sample_size_v2=0, ppl=2.38, wps=1902, ups=0.25, wpb=7628.7, bsz=120, num_updates=23790, lr=1.93485e-05, gnorm=0.935, clip=10, loss_scale=32, train_wall=40, gb_free=30.5, wall=97369 2023-05-02 05:36:37 - progress_bar.py[line:274] - INFO: epoch 004: 5716 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7850.3, nsentences=120, sample_size=3852.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1940.5, ups=0.25, wpb=7850.3, bsz=120, num_updates=23800, lr=1.93433e-05, gnorm=0.915, clip=0, loss_scale=32, train_wall=40, gb_free=30.9, wall=97409 2023-05-02 05:37:16 - progress_bar.py[line:274] - INFO: epoch 004: 5726 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.236, ntokens=7801.1, nsentences=120, sample_size=3968.3, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1980.1, ups=0.25, wpb=7801.1, bsz=120, num_updates=23810, lr=1.9338e-05, gnorm=0.943, clip=10, loss_scale=32, train_wall=39, gb_free=28.6, wall=97449 2023-05-02 05:37:57 - progress_bar.py[line:274] - INFO: epoch 004: 5736 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7801.7, nsentences=120, sample_size=4144.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1935.1, ups=0.25, wpb=7801.7, bsz=120, num_updates=23820, lr=1.93327e-05, gnorm=0.94, clip=20, loss_scale=32, train_wall=40, gb_free=31.1, wall=97489 2023-05-02 05:38:37 - progress_bar.py[line:274] - INFO: epoch 004: 5746 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7617.7, nsentences=120, sample_size=4122.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1897.6, ups=0.25, wpb=7617.7, bsz=120, num_updates=23830, lr=1.93274e-05, gnorm=0.945, clip=20, loss_scale=32, train_wall=40, gb_free=26.6, wall=97529 
2023-05-02 05:39:16 - progress_bar.py[line:274] - INFO: epoch 004: 5756 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7444.5, nsentences=120, sample_size=3896.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1880.4, ups=0.25, wpb=7444.5, bsz=120, num_updates=23840, lr=1.93221e-05, gnorm=0.968, clip=40, loss_scale=32, train_wall=40, gb_free=30.3, wall=97569 2023-05-02 05:39:56 - progress_bar.py[line:274] - INFO: epoch 004: 5766 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7731.3, nsentences=120, sample_size=4149, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1944.7, ups=0.25, wpb=7731.3, bsz=120, num_updates=23850, lr=1.93168e-05, gnorm=0.973, clip=20, loss_scale=32, train_wall=40, gb_free=30.3, wall=97609 2023-05-02 05:40:36 - progress_bar.py[line:274] - INFO: epoch 004: 5776 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7866.4, nsentences=120, sample_size=4043.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1977.9, ups=0.25, wpb=7866.4, bsz=120, num_updates=23860, lr=1.93116e-05, gnorm=0.963, clip=40, loss_scale=32, train_wall=40, gb_free=30.9, wall=97648 2023-05-02 05:41:15 - progress_bar.py[line:274] - INFO: epoch 004: 5786 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7685.8, nsentences=120, sample_size=4075.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1968.1, ups=0.26, wpb=7685.8, bsz=120, num_updates=23870, lr=1.93063e-05, gnorm=0.941, clip=20, loss_scale=32, train_wall=39, gb_free=26.9, wall=97687 2023-05-02 05:41:55 - progress_bar.py[line:274] - INFO: epoch 004: 5796 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7757.4, nsentences=120, sample_size=3954.4, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1925.6, ups=0.25, wpb=7757.4, bsz=120, num_updates=23880, lr=1.9301e-05, gnorm=0.952, clip=40, loss_scale=32, train_wall=40, gb_free=29.9, wall=97728 2023-05-02 05:42:35 - progress_bar.py[line:274] - INFO: epoch 004: 5806 / 6042 loss=2.455, 
loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7568.7, nsentences=120, sample_size=4004.2, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1889.8, ups=0.25, wpb=7568.7, bsz=120, num_updates=23890, lr=1.92957e-05, gnorm=0.953, clip=10, loss_scale=32, train_wall=40, gb_free=29.9, wall=97768 2023-05-02 05:43:15 - progress_bar.py[line:274] - INFO: epoch 004: 5816 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7782.7, nsentences=120, sample_size=3899.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1959.5, ups=0.25, wpb=7782.7, bsz=120, num_updates=23900, lr=1.92904e-05, gnorm=0.956, clip=20, loss_scale=32, train_wall=40, gb_free=27.4, wall=97807 2023-05-02 05:43:54 - progress_bar.py[line:274] - INFO: epoch 004: 5826 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7702.5, nsentences=120, sample_size=4034.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1961.4, ups=0.25, wpb=7702.5, bsz=120, num_updates=23910, lr=1.92851e-05, gnorm=0.961, clip=30, loss_scale=32, train_wall=39, gb_free=29.1, wall=97847 2023-05-02 05:44:34 - progress_bar.py[line:274] - INFO: epoch 004: 5836 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7711.6, nsentences=120, sample_size=4069.3, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1952.2, ups=0.25, wpb=7711.6, bsz=120, num_updates=23920, lr=1.92799e-05, gnorm=0.945, clip=20, loss_scale=32, train_wall=39, gb_free=30.2, wall=97886 2023-05-02 05:45:14 - progress_bar.py[line:274] - INFO: epoch 004: 5846 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7828, nsentences=120, sample_size=3925, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1947.7, ups=0.25, wpb=7828, bsz=120, num_updates=23930, lr=1.92746e-05, gnorm=0.973, clip=40, loss_scale=32, train_wall=40, gb_free=31.3, wall=97926 2023-05-02 05:45:54 - progress_bar.py[line:274] - INFO: epoch 004: 5856 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7529.3, nsentences=120, sample_size=3984.3, 
sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1898.2, ups=0.25, wpb=7529.3, bsz=120, num_updates=23940, lr=1.92693e-05, gnorm=0.948, clip=10, loss_scale=32, train_wall=40, gb_free=31.2, wall=97966 2023-05-02 05:46:34 - progress_bar.py[line:274] - INFO: epoch 004: 5866 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7610.8, nsentences=120, sample_size=4001.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1887.7, ups=0.25, wpb=7610.8, bsz=120, num_updates=23950, lr=1.9264e-05, gnorm=0.946, clip=20, loss_scale=32, train_wall=40, gb_free=30.5, wall=98006 2023-05-02 05:47:14 - progress_bar.py[line:274] - INFO: epoch 004: 5876 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7743.4, nsentences=120, sample_size=4157.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1917.9, ups=0.25, wpb=7743.4, bsz=120, num_updates=23960, lr=1.92587e-05, gnorm=0.941, clip=0, loss_scale=32, train_wall=40, gb_free=29.7, wall=98047 2023-05-02 05:47:53 - progress_bar.py[line:274] - INFO: epoch 004: 5886 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7698.8, nsentences=120, sample_size=3681, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1968.3, ups=0.26, wpb=7698.8, bsz=120, num_updates=23970, lr=1.92535e-05, gnorm=0.968, clip=20, loss_scale=32, train_wall=39, gb_free=29.9, wall=98086 2023-05-02 05:48:34 - progress_bar.py[line:274] - INFO: epoch 004: 5896 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7621.1, nsentences=120, sample_size=4063.2, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1867.7, ups=0.25, wpb=7621.1, bsz=120, num_updates=23980, lr=1.92482e-05, gnorm=0.961, clip=20, loss_scale=32, train_wall=41, gb_free=30.8, wall=98127 2023-05-02 05:49:14 - progress_bar.py[line:274] - INFO: epoch 004: 5906 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7509.2, nsentences=120, sample_size=4236.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1895.6, ups=0.25, wpb=7509.2, bsz=120, 
num_updates=23990, lr=1.92429e-05, gnorm=0.925, clip=10, loss_scale=32, train_wall=40, gb_free=30.4, wall=98166 2023-05-02 05:49:54 - progress_bar.py[line:274] - INFO: epoch 004: 5916 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7562.3, nsentences=120, sample_size=4216.6, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1904.5, ups=0.25, wpb=7562.3, bsz=120, num_updates=24000, lr=1.92376e-05, gnorm=0.916, clip=0, loss_scale=32, train_wall=40, gb_free=29.8, wall=98206 2023-05-02 05:49:54 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 05:49:56 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 05:49:56 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 05:49:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:49:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:49:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:49:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:49:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:49:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:04 - task_invig.py[line:182] - INFO: 
example ref(gt): region: 2023-05-02 05:50:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:12 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 05:50:12 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 05:50:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:24 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 05:50:24 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 05:50:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:36 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 05:50:36 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 05:50:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:40 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 05:50:40 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 05:50:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:44 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 05:50:44 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 05:50:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 05:50:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 05:50:45 - progress_bar.py[line:282] - INFO: epoch 004 | valid on 'valid' subset | loss 3.204 | loss_v1 0 | loss_v2 0 | nll_loss 2.035 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.1 | score 0.7524 | wps 3311.3 | wpb 3202.1 | bsz 39.4 | num_updates 24000 | best_score 0.7529 2023-05-02 05:50:45 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 4 @ 24000 updates 2023-05-02 05:50:45 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_24000.pt 2023-05-02 05:51:10 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_24000.pt 2023-05-02 05:51:36 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_4_24000.pt (epoch 4 @ 24000 updates, score 0.7524) (writing took 51.53920812509023 seconds) 2023-05-02 05:52:16 - progress_bar.py[line:274] - INFO: epoch 004: 5926 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7571.2, nsentences=120, sample_size=3964.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=530.1, ups=0.07, wpb=7571.2, bsz=120, num_updates=24010, lr=1.92323e-05, gnorm=0.97, clip=20, loss_scale=32, train_wall=39, gb_free=29.5, wall=98349 2023-05-02 05:52:56 - progress_bar.py[line:274] - INFO: epoch 004: 5936 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7516.5, nsentences=120, sample_size=3917.2, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1880.2, ups=0.25, wpb=7516.5, bsz=120, num_updates=24020, lr=1.9227e-05, gnorm=0.97, clip=30, loss_scale=32, train_wall=40, gb_free=30.4, 
wall=98389 2023-05-02 05:53:37 - progress_bar.py[line:274] - INFO: epoch 004: 5946 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7388.4, nsentences=120, sample_size=4095.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1837.7, ups=0.25, wpb=7388.4, bsz=120, num_updates=24030, lr=1.92218e-05, gnorm=0.964, clip=30, loss_scale=32, train_wall=40, gb_free=30.5, wall=98429 2023-05-02 05:54:16 - progress_bar.py[line:274] - INFO: epoch 004: 5956 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=8183.8, nsentences=120, sample_size=3855.4, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=2056.5, ups=0.25, wpb=8183.8, bsz=120, num_updates=24040, lr=1.92165e-05, gnorm=0.929, clip=10, loss_scale=32, train_wall=40, gb_free=30.9, wall=98469 2023-05-02 05:54:56 - progress_bar.py[line:274] - INFO: epoch 004: 5966 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7624, nsentences=120, sample_size=4086.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1941.4, ups=0.25, wpb=7624, bsz=120, num_updates=24050, lr=1.92112e-05, gnorm=0.941, clip=10, loss_scale=32, train_wall=39, gb_free=31.4, wall=98508 2023-05-02 05:55:35 - progress_bar.py[line:274] - INFO: epoch 004: 5976 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.221, ntokens=7671.9, nsentences=120, sample_size=3881.4, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1935.1, ups=0.25, wpb=7671.9, bsz=120, num_updates=24060, lr=1.92059e-05, gnorm=0.972, clip=30, loss_scale=32, train_wall=40, gb_free=27.2, wall=98548 2023-05-02 05:56:15 - progress_bar.py[line:274] - INFO: epoch 004: 5986 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7627.6, nsentences=120, sample_size=4220.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1899.8, ups=0.25, wpb=7627.6, bsz=120, num_updates=24070, lr=1.92006e-05, gnorm=0.942, clip=40, loss_scale=32, train_wall=40, gb_free=29.9, wall=98588 2023-05-02 05:56:55 - progress_bar.py[line:274] - INFO: epoch 004: 5996 / 6042 
loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7816.3, nsentences=120, sample_size=3866, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1963.5, ups=0.25, wpb=7816.3, bsz=120, num_updates=24080, lr=1.91954e-05, gnorm=0.969, clip=40, loss_scale=32, train_wall=40, gb_free=28.1, wall=98628 2023-05-02 05:57:35 - progress_bar.py[line:274] - INFO: epoch 004: 6006 / 6042 loss=2.471, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7699.7, nsentences=120, sample_size=3982.1, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1935.2, ups=0.25, wpb=7699.7, bsz=120, num_updates=24090, lr=1.91901e-05, gnorm=0.969, clip=30, loss_scale=32, train_wall=40, gb_free=29.6, wall=98667 2023-05-02 05:58:15 - progress_bar.py[line:274] - INFO: epoch 004: 6016 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7866.5, nsentences=120, sample_size=4144.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1975.8, ups=0.25, wpb=7866.5, bsz=120, num_updates=24100, lr=1.91848e-05, gnorm=0.968, clip=30, loss_scale=32, train_wall=40, gb_free=30.8, wall=98707 2023-05-02 05:58:54 - progress_bar.py[line:274] - INFO: epoch 004: 6026 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7723.8, nsentences=120, sample_size=4210.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1966.7, ups=0.25, wpb=7723.8, bsz=120, num_updates=24110, lr=1.91795e-05, gnorm=0.956, clip=30, loss_scale=32, train_wall=39, gb_free=30.3, wall=98747 2023-05-02 05:59:33 - progress_bar.py[line:274] - INFO: epoch 004: 6036 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7729.1, nsentences=120, sample_size=4249, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1979, ups=0.26, wpb=7729.1, bsz=120, num_updates=24120, lr=1.91742e-05, gnorm=0.937, clip=10, loss_scale=32, train_wall=39, gb_free=30.1, wall=98786 2023-05-02 05:59:56 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 05:59:58 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-02 05:59:58 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 06:00:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:10 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-02 06:00:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:15 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 06:00:15 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 06:00:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:23 - task_invig.py[line:181] - INFO: example hyp(gen): 
region: 2023-05-02 06:00:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:27 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 06:00:27 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 06:00:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:34 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-02 06:00:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:38 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 06:00:38 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 06:00:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:42 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 06:00:42 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 06:00:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:47 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 06:00:47 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 06:00:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 06:00:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 06:00:47 - progress_bar.py[line:282] - INFO: epoch 004 | valid on 'valid' subset | loss 3.19 | loss_v1 0 | loss_v2 0 | nll_loss 2.021 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.06 | score 0.7598 | wps 3315.1 | wpb 3202.1 | bsz 39.4 | num_updates 24126 | best_score 0.7598 2023-05-02 06:00:47 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 4 @ 24126 updates 2023-05-02 06:00:47 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_best.pt 2023-05-02 06:01:13 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_best.pt 2023-05-02 06:01:39 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_best.pt (epoch 4 @ 24126 updates, score 
0.7598) (writing took 51.79239982389845 seconds) 2023-05-02 06:01:40 - train.py[line:332] - INFO: end of epoch 4 (average epoch stats below) 2023-05-02 06:01:40 - progress_bar.py[line:282] - INFO: epoch 004 | loss 2.446 | loss_v1 0 | loss_v2 0 | nll_loss 1.198 | ntokens 7720.95 | nsentences 119.992 | sample_size 4043.75 | sample_size_v1 0 | sample_size_v2 0 | ppl 2.29 | wps 1880 | ups 0.24 | wpb 7720.9 | bsz 120 | num_updates 24126 | lr 1.91711e-05 | gnorm 0.947 | clip 21.5 | loss_scale 32 | train_wall 24018 | gb_free 31.5 | wall 98912 2023-05-02 06:01:40 - trainer.py[line:639] - INFO: loading train data for epoch 5 2023-05-02 06:01:40 - dialog_dataset.py[line:647] - INFO: loading invig-train from /mnt/bn/hri-lq/datasets/hf-cache/invig 2023-05-02 06:01:40 - dialog_dataset.py[line:647] - INFO: loading guesswhat-train from /mnt/bn/hri-lq/datasets/hf-cache/guesswhat 2023-05-02 06:01:42 - dialog_dataset.py[line:647] - INFO: loading visdial-train from /mnt/bn/hri-lq/datasets/hf-cache/visdial 2023-05-02 06:01:43 - dialog_dataset.py[line:647] - INFO: loading refcoco-train from /mnt/bn/hri-lq/datasets/hf-cache/refcoco 2023-05-02 06:01:44 - dialog_dataset.py[line:647] - INFO: loading refcocog-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocog 2023-05-02 06:01:44 - dialog_dataset.py[line:647] - INFO: loading refcocoplus-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocoplus 2023-05-02 06:01:45 - dialog_dataset.py[line:647] - INFO: loading cc_sbu_align-train from /mnt/bn/hri-lq/datasets/hf-cache/cc_sbu_align 2023-05-02 06:01:45 - dialog_dataset.py[line:647] - INFO: loading llava_instruct_150k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_instruct_150k 2023-05-02 06:01:46 - dialog_dataset.py[line:647] - INFO: loading llava_conversation_58k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_conversation_58k 2023-05-02 06:01:46 - dialog_dataset.py[line:647] - INFO: loading llava_complex_reasoning_77k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_complex_reasoning_77k 
2023-05-02 06:01:46 - dialog_dataset.py[line:647] - INFO: loading llava_detail_23k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_detail_23k 2023-05-02 06:01:46 - dialog_dataset.py[line:647] - INFO: loading openimages-train from /mnt/bn/hri-lq/datasets/hf-cache/openimages_v1.2 2023-05-02 06:01:47 - dialog_dataset.py[line:671] - INFO: load train data: 18 (90624/724992 samples) dataset(s) 2023-05-02 06:01:47 - dialog_dataset.py[line:672] - INFO: Tasks: invig_question(17652), invig_answer(17652), invig_grounding(17652), guesswhat_question(68653), guesswhat_answer(68653), guesswhat_grounding(68653), visdial_question(103447), visdial_answer(103447), visdial_caption(20689), refcoco_grounding(9523), refcoco_grounding(9920), refcoco_grounding(9494), cc_sbu_align_caption(3439), llava_instruct_150k(90372), llava_conversation_58k(46965), llava_complex_reasoning_77k(44353), llava_detail_23k(12471), openimages_detection(11957) 2023-05-02 06:01:47 - trainer.py[line:703] - INFO: begin training epoch 5 2023-05-02 06:01:47 - train.py[line:305] - INFO: Start iterating over samples 2023-05-02 06:02:03 - progress_bar.py[line:274] - INFO: epoch 005: 4 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7308.1, nsentences=116, sample_size=3382.3, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=489, ups=0.07, wpb=7308.1, bsz=116, num_updates=24130, lr=1.91689e-05, gnorm=0.999, clip=50, loss_scale=32, train_wall=38, gb_free=30.3, wall=98935 2023-05-02 06:02:42 - progress_bar.py[line:274] - INFO: epoch 005: 14 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7607.3, nsentences=120, sample_size=4009.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1928.3, ups=0.25, wpb=7607.3, bsz=120, num_updates=24140, lr=1.91637e-05, gnorm=0.927, clip=10, loss_scale=32, train_wall=39, gb_free=29.5, wall=98975 2023-05-02 06:03:22 - progress_bar.py[line:274] - INFO: epoch 005: 24 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7629.2, nsentences=120, 
sample_size=4178.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1915.4, ups=0.25, wpb=7629.2, bsz=120, num_updates=24150, lr=1.91584e-05, gnorm=0.932, clip=20, loss_scale=32, train_wall=40, gb_free=29.6, wall=99014 2023-05-02 06:04:01 - progress_bar.py[line:274] - INFO: epoch 005: 34 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7729.2, nsentences=120, sample_size=3963.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1963.6, ups=0.25, wpb=7729.2, bsz=120, num_updates=24160, lr=1.91531e-05, gnorm=0.942, clip=20, loss_scale=32, train_wall=39, gb_free=28.6, wall=99054 2023-05-02 06:04:41 - progress_bar.py[line:274] - INFO: epoch 005: 44 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7849.2, nsentences=120, sample_size=3726.1, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1960.9, ups=0.25, wpb=7849.2, bsz=120, num_updates=24170, lr=1.91478e-05, gnorm=0.99, clip=50, loss_scale=32, train_wall=40, gb_free=30.2, wall=99094 2023-05-02 06:05:22 - progress_bar.py[line:274] - INFO: epoch 005: 54 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7876.3, nsentences=120, sample_size=4084.4, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1927.9, ups=0.24, wpb=7876.3, bsz=120, num_updates=24180, lr=1.91425e-05, gnorm=0.935, clip=0, loss_scale=32, train_wall=41, gb_free=29, wall=99135 2023-05-02 06:06:02 - progress_bar.py[line:274] - INFO: epoch 005: 64 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7982, nsentences=120, sample_size=3665.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2017.9, ups=0.25, wpb=7982, bsz=120, num_updates=24190, lr=1.91372e-05, gnorm=0.974, clip=30, loss_scale=32, train_wall=39, gb_free=29.6, wall=99174 2023-05-02 06:06:41 - progress_bar.py[line:274] - INFO: epoch 005: 74 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7430.3, nsentences=120, sample_size=3869.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1869.7, ups=0.25, wpb=7430.3, bsz=120, 
num_updates=24200, lr=1.9132e-05, gnorm=0.954, clip=20, loss_scale=32, train_wall=40, gb_free=31.1, wall=99214 2023-05-02 06:07:21 - progress_bar.py[line:274] - INFO: epoch 005: 84 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7601.7, nsentences=120, sample_size=4437.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1917.6, ups=0.25, wpb=7601.7, bsz=120, num_updates=24210, lr=1.91267e-05, gnorm=0.927, clip=20, loss_scale=32, train_wall=40, gb_free=31, wall=99254 2023-05-02 06:08:00 - progress_bar.py[line:274] - INFO: epoch 005: 94 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7579.3, nsentences=120, sample_size=3922.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1957.5, ups=0.26, wpb=7579.3, bsz=120, num_updates=24220, lr=1.91214e-05, gnorm=0.963, clip=20, loss_scale=32, train_wall=39, gb_free=29.8, wall=99292 2023-05-02 06:08:39 - progress_bar.py[line:274] - INFO: epoch 005: 104 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7796.6, nsentences=120, sample_size=4057.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1964.2, ups=0.25, wpb=7796.6, bsz=120, num_updates=24230, lr=1.91161e-05, gnorm=0.967, clip=30, loss_scale=32, train_wall=40, gb_free=29.7, wall=99332 2023-05-02 06:09:20 - progress_bar.py[line:274] - INFO: epoch 005: 114 / 6042 loss=2.468, loss_v1=0, loss_v2=0, nll_loss=1.232, ntokens=8192.2, nsentences=120, sample_size=3878.1, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=2035.3, ups=0.25, wpb=8192.2, bsz=120, num_updates=24240, lr=1.91108e-05, gnorm=0.975, clip=40, loss_scale=32, train_wall=40, gb_free=29, wall=99372 2023-05-02 06:10:00 - progress_bar.py[line:274] - INFO: epoch 005: 124 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7695.6, nsentences=120, sample_size=4033.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1913.8, ups=0.25, wpb=7695.6, bsz=120, num_updates=24250, lr=1.91056e-05, gnorm=0.944, clip=10, loss_scale=32, train_wall=40, gb_free=30.2, 
wall=99412 2023-05-02 06:10:39 - progress_bar.py[line:274] - INFO: epoch 005: 134 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7638.2, nsentences=120, sample_size=4008.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1941.5, ups=0.25, wpb=7638.2, bsz=120, num_updates=24260, lr=1.91003e-05, gnorm=0.955, clip=40, loss_scale=64, train_wall=39, gb_free=30.5, wall=99452 2023-05-02 06:11:19 - progress_bar.py[line:274] - INFO: epoch 005: 144 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7696.1, nsentences=120, sample_size=3887.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1946.7, ups=0.25, wpb=7696.1, bsz=120, num_updates=24270, lr=1.9095e-05, gnorm=0.958, clip=50, loss_scale=64, train_wall=39, gb_free=28.1, wall=99491 2023-05-02 06:11:58 - progress_bar.py[line:274] - INFO: epoch 005: 154 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7570.4, nsentences=120, sample_size=3726.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1927.9, ups=0.25, wpb=7570.4, bsz=120, num_updates=24280, lr=1.90897e-05, gnorm=0.984, clip=50, loss_scale=64, train_wall=39, gb_free=30.1, wall=99531 2023-05-02 06:12:37 - progress_bar.py[line:274] - INFO: epoch 005: 164 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7853, nsentences=120, sample_size=3963.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1993.5, ups=0.25, wpb=7853, bsz=120, num_updates=24290, lr=1.90844e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=99570 2023-05-02 06:13:17 - progress_bar.py[line:274] - INFO: epoch 005: 174 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7637.9, nsentences=120, sample_size=3992.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1914.6, ups=0.25, wpb=7637.9, bsz=120, num_updates=24300, lr=1.90791e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=31.2, wall=99610 2023-05-02 06:13:58 - progress_bar.py[line:274] - INFO: epoch 005: 184 / 6042 loss=2.441, 
loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7847.4, nsentences=120, sample_size=3951.1, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1952.4, ups=0.25, wpb=7847.4, bsz=120, num_updates=24310, lr=1.90739e-05, gnorm=0.955, clip=30, loss_scale=64, train_wall=40, gb_free=28.9, wall=99650 2023-05-02 06:14:37 - progress_bar.py[line:274] - INFO: epoch 005: 194 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7878.6, nsentences=120, sample_size=3810.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1996.8, ups=0.25, wpb=7878.6, bsz=120, num_updates=24320, lr=1.90686e-05, gnorm=0.945, clip=10, loss_scale=64, train_wall=39, gb_free=28.4, wall=99689 2023-05-02 06:15:16 - progress_bar.py[line:274] - INFO: epoch 005: 204 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7845.8, nsentences=120, sample_size=4000.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1993.6, ups=0.25, wpb=7845.8, bsz=120, num_updates=24330, lr=1.90633e-05, gnorm=0.937, clip=30, loss_scale=64, train_wall=39, gb_free=31.1, wall=99729 2023-05-02 06:15:56 - progress_bar.py[line:274] - INFO: epoch 005: 214 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7906.9, nsentences=120, sample_size=3727.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=2016.9, ups=0.26, wpb=7906.9, bsz=120, num_updates=24340, lr=1.9058e-05, gnorm=0.991, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=99768 2023-05-02 06:16:35 - progress_bar.py[line:274] - INFO: epoch 005: 224 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7702.3, nsentences=120, sample_size=4007.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1961.9, ups=0.25, wpb=7702.3, bsz=120, num_updates=24350, lr=1.90527e-05, gnorm=0.963, clip=20, loss_scale=64, train_wall=39, gb_free=27, wall=99807 2023-05-02 06:17:15 - progress_bar.py[line:274] - INFO: epoch 005: 234 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7680.7, nsentences=120, sample_size=4164, 
sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1911.5, ups=0.25, wpb=7680.7, bsz=120, num_updates=24360, lr=1.90475e-05, gnorm=0.954, clip=30, loss_scale=64, train_wall=40, gb_free=28.4, wall=99847 2023-05-02 06:17:55 - progress_bar.py[line:274] - INFO: epoch 005: 244 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7939.8, nsentences=120, sample_size=3890.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2004.1, ups=0.25, wpb=7939.8, bsz=120, num_updates=24370, lr=1.90422e-05, gnorm=0.985, clip=30, loss_scale=64, train_wall=40, gb_free=27.2, wall=99887 2023-05-02 06:18:34 - progress_bar.py[line:274] - INFO: epoch 005: 254 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7742.7, nsentences=120, sample_size=3680.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1953.8, ups=0.25, wpb=7742.7, bsz=120, num_updates=24380, lr=1.90369e-05, gnorm=0.975, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=99927 2023-05-02 06:19:14 - progress_bar.py[line:274] - INFO: epoch 005: 264 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7668.8, nsentences=120, sample_size=4261.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1919.2, ups=0.25, wpb=7668.8, bsz=120, num_updates=24390, lr=1.90316e-05, gnorm=0.955, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=99967 2023-05-02 06:19:55 - progress_bar.py[line:274] - INFO: epoch 005: 274 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7709.3, nsentences=120, sample_size=4047.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1913.3, ups=0.25, wpb=7709.3, bsz=120, num_updates=24400, lr=1.90263e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=100007 2023-05-02 06:20:34 - progress_bar.py[line:274] - INFO: epoch 005: 284 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7890.3, nsentences=120, sample_size=3896.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1983.1, ups=0.25, wpb=7890.3, bsz=120, 
num_updates=24410, lr=1.9021e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=26, wall=100047 2023-05-02 06:21:14 - progress_bar.py[line:274] - INFO: epoch 005: 294 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7612.1, nsentences=120, sample_size=3984.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1914, ups=0.25, wpb=7612.1, bsz=120, num_updates=24420, lr=1.90158e-05, gnorm=0.951, clip=10, loss_scale=64, train_wall=40, gb_free=29.2, wall=100087 2023-05-02 06:21:54 - progress_bar.py[line:274] - INFO: epoch 005: 304 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7685.6, nsentences=120, sample_size=3717.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1928.5, ups=0.25, wpb=7685.6, bsz=120, num_updates=24430, lr=1.90105e-05, gnorm=1.016, clip=70, loss_scale=64, train_wall=40, gb_free=30.5, wall=100126 2023-05-02 06:22:34 - progress_bar.py[line:274] - INFO: epoch 005: 314 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7717.5, nsentences=120, sample_size=3830.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1906.1, ups=0.25, wpb=7717.5, bsz=120, num_updates=24440, lr=1.90052e-05, gnorm=0.979, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=100167 2023-05-02 06:23:14 - progress_bar.py[line:274] - INFO: epoch 005: 324 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7624.1, nsentences=120, sample_size=4194.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1922.5, ups=0.25, wpb=7624.1, bsz=120, num_updates=24450, lr=1.89999e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=100207 2023-05-02 06:23:54 - progress_bar.py[line:274] - INFO: epoch 005: 334 / 6042 loss=2.458, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7603.7, nsentences=120, sample_size=4166.6, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1924.9, ups=0.25, wpb=7603.7, bsz=120, num_updates=24460, lr=1.89946e-05, gnorm=0.959, clip=40, loss_scale=64, train_wall=39, gb_free=30.5, 
wall=100246 2023-05-02 06:24:32 - progress_bar.py[line:274] - INFO: epoch 005: 344 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7603.8, nsentences=120, sample_size=4143.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1959.4, ups=0.26, wpb=7603.8, bsz=120, num_updates=24470, lr=1.89893e-05, gnorm=0.921, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=100285 2023-05-02 06:25:13 - progress_bar.py[line:274] - INFO: epoch 005: 354 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7767.8, nsentences=120, sample_size=4247.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1936.2, ups=0.25, wpb=7767.8, bsz=120, num_updates=24480, lr=1.89841e-05, gnorm=0.923, clip=0, loss_scale=64, train_wall=40, gb_free=28.8, wall=100325 2023-05-02 06:25:52 - progress_bar.py[line:274] - INFO: epoch 005: 364 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7676.6, nsentences=120, sample_size=3775, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1949.2, ups=0.25, wpb=7676.6, bsz=120, num_updates=24490, lr=1.89788e-05, gnorm=0.98, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=100364 2023-05-02 06:26:31 - progress_bar.py[line:274] - INFO: epoch 005: 374 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7506, nsentences=120, sample_size=4167.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1896.8, ups=0.25, wpb=7506, bsz=120, num_updates=24500, lr=1.89735e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=39, gb_free=30, wall=100404 2023-05-02 06:27:11 - progress_bar.py[line:274] - INFO: epoch 005: 384 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7287.3, nsentences=120, sample_size=4073.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1860.5, ups=0.26, wpb=7287.3, bsz=120, num_updates=24510, lr=1.89682e-05, gnorm=0.954, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=100443 2023-05-02 06:27:51 - progress_bar.py[line:274] - INFO: epoch 005: 394 / 6042 loss=2.453, 
loss_v1=0, loss_v2=0, nll_loss=1.208, ntokens=7646.9, nsentences=120, sample_size=4028.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1916.3, ups=0.25, wpb=7646.9, bsz=120, num_updates=24520, lr=1.89629e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=100483 2023-05-02 06:28:31 - progress_bar.py[line:274] - INFO: epoch 005: 404 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7833.4, nsentences=120, sample_size=4029, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1947.2, ups=0.25, wpb=7833.4, bsz=120, num_updates=24530, lr=1.89577e-05, gnorm=0.953, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=100523 2023-05-02 06:29:11 - progress_bar.py[line:274] - INFO: epoch 005: 414 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7666.8, nsentences=120, sample_size=4203.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1911.8, ups=0.25, wpb=7666.8, bsz=120, num_updates=24540, lr=1.89524e-05, gnorm=0.925, clip=0, loss_scale=64, train_wall=40, gb_free=30.5, wall=100563 2023-05-02 06:29:51 - progress_bar.py[line:274] - INFO: epoch 005: 424 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7720.6, nsentences=120, sample_size=4053.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1935.3, ups=0.25, wpb=7720.6, bsz=120, num_updates=24550, lr=1.89471e-05, gnorm=0.943, clip=30, loss_scale=64, train_wall=40, gb_free=25.8, wall=100603 2023-05-02 06:30:30 - progress_bar.py[line:274] - INFO: epoch 005: 434 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7800.7, nsentences=120, sample_size=3957.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1984.9, ups=0.25, wpb=7800.7, bsz=120, num_updates=24560, lr=1.89418e-05, gnorm=0.979, clip=50, loss_scale=64, train_wall=39, gb_free=29.9, wall=100643 2023-05-02 06:31:10 - progress_bar.py[line:274] - INFO: epoch 005: 444 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=8001.5, nsentences=120, sample_size=4159.5, 
sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1999.9, ups=0.25, wpb=8001.5, bsz=120, num_updates=24570, lr=1.89365e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=40, gb_free=28.5, wall=100683 2023-05-02 06:31:57 - progress_bar.py[line:274] - INFO: epoch 005: 454 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7880.1, nsentences=120, sample_size=3726.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1669, ups=0.21, wpb=7880.1, bsz=120, num_updates=24580, lr=1.89312e-05, gnorm=0.99, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=100730 2023-05-02 06:32:36 - progress_bar.py[line:274] - INFO: epoch 005: 464 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7379.7, nsentences=120, sample_size=4355, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1905.1, ups=0.26, wpb=7379.7, bsz=120, num_updates=24590, lr=1.8926e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=39, gb_free=30.7, wall=100769 2023-05-02 06:33:16 - progress_bar.py[line:274] - INFO: epoch 005: 474 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7979.3, nsentences=120, sample_size=4045.1, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1997.1, ups=0.25, wpb=7979.3, bsz=120, num_updates=24600, lr=1.89207e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=40, gb_free=31.1, wall=100808 2023-05-02 06:33:55 - progress_bar.py[line:274] - INFO: epoch 005: 484 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7760.8, nsentences=120, sample_size=3963.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2000.1, ups=0.26, wpb=7760.8, bsz=120, num_updates=24610, lr=1.89154e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=39, gb_free=29.6, wall=100847 2023-05-02 06:34:35 - progress_bar.py[line:274] - INFO: epoch 005: 494 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7803.4, nsentences=120, sample_size=3971.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1923.1, ups=0.25, wpb=7803.4, bsz=120, 
num_updates=24620, lr=1.89101e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=41, gb_free=26.8, wall=100888 2023-05-02 06:35:16 - progress_bar.py[line:274] - INFO: epoch 005: 504 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7740.1, nsentences=120, sample_size=4022.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1927.3, ups=0.25, wpb=7740.1, bsz=120, num_updates=24630, lr=1.89048e-05, gnorm=0.948, clip=0, loss_scale=64, train_wall=40, gb_free=28.6, wall=100928 2023-05-02 06:35:56 - progress_bar.py[line:274] - INFO: epoch 005: 514 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7834.8, nsentences=120, sample_size=4055.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1946, ups=0.25, wpb=7834.8, bsz=120, num_updates=24640, lr=1.88996e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=100968 2023-05-02 06:36:35 - progress_bar.py[line:274] - INFO: epoch 005: 524 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7654.1, nsentences=120, sample_size=4061.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1936.9, ups=0.25, wpb=7654.1, bsz=120, num_updates=24650, lr=1.88943e-05, gnorm=0.948, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=101008 2023-05-02 06:37:15 - progress_bar.py[line:274] - INFO: epoch 005: 534 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7680.7, nsentences=120, sample_size=4313.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1935.6, ups=0.25, wpb=7680.7, bsz=120, num_updates=24660, lr=1.8889e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=101047 2023-05-02 06:37:55 - progress_bar.py[line:274] - INFO: epoch 005: 544 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7860, nsentences=120, sample_size=4119.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1950.9, ups=0.25, wpb=7860, bsz=120, num_updates=24670, lr=1.88837e-05, gnorm=0.932, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, 
wall=101088 2023-05-02 06:38:35 - progress_bar.py[line:274] - INFO: epoch 005: 554 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7704.1, nsentences=120, sample_size=3819.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1929.6, ups=0.25, wpb=7704.1, bsz=120, num_updates=24680, lr=1.88784e-05, gnorm=0.965, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=101128 2023-05-02 06:39:15 - progress_bar.py[line:274] - INFO: epoch 005: 564 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7795.8, nsentences=120, sample_size=3910.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1956.4, ups=0.25, wpb=7795.8, bsz=120, num_updates=24690, lr=1.88731e-05, gnorm=0.985, clip=30, loss_scale=64, train_wall=40, gb_free=28, wall=101168 2023-05-02 06:39:55 - progress_bar.py[line:274] - INFO: epoch 005: 574 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7864, nsentences=120, sample_size=3876.2, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1956.2, ups=0.25, wpb=7864, bsz=120, num_updates=24700, lr=1.88679e-05, gnorm=0.948, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=101208 2023-05-02 06:40:36 - progress_bar.py[line:274] - INFO: epoch 005: 584 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7782.2, nsentences=120, sample_size=4044.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1932, ups=0.25, wpb=7782.2, bsz=120, num_updates=24710, lr=1.88626e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=101248 2023-05-02 06:41:15 - progress_bar.py[line:274] - INFO: epoch 005: 594 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7819.3, nsentences=120, sample_size=3752.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1965.3, ups=0.25, wpb=7819.3, bsz=120, num_updates=24720, lr=1.88573e-05, gnorm=0.959, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=101288 2023-05-02 06:41:56 - progress_bar.py[line:274] - INFO: epoch 005: 604 / 6042 loss=2.44, 
loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=8058.4, nsentences=120, sample_size=3896.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1997.7, ups=0.25, wpb=8058.4, bsz=120, num_updates=24730, lr=1.8852e-05, gnorm=0.952, clip=30, loss_scale=64, train_wall=40, gb_free=26.1, wall=101328 2023-05-02 06:42:36 - progress_bar.py[line:274] - INFO: epoch 005: 614 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7865.1, nsentences=120, sample_size=4017.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1969.7, ups=0.25, wpb=7865.1, bsz=120, num_updates=24740, lr=1.88467e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=101368 2023-05-02 06:43:16 - progress_bar.py[line:274] - INFO: epoch 005: 624 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7777.8, nsentences=120, sample_size=3949, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1924.5, ups=0.25, wpb=7777.8, bsz=120, num_updates=24750, lr=1.88414e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=101409 2023-05-02 06:43:56 - progress_bar.py[line:274] - INFO: epoch 005: 634 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7630.9, nsentences=120, sample_size=4111.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1894.6, ups=0.25, wpb=7630.9, bsz=120, num_updates=24760, lr=1.88362e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=101449 2023-05-02 06:44:36 - progress_bar.py[line:274] - INFO: epoch 005: 644 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7548, nsentences=120, sample_size=4172.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1901.7, ups=0.25, wpb=7548, bsz=120, num_updates=24770, lr=1.88309e-05, gnorm=0.958, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=101488 2023-05-02 06:45:16 - progress_bar.py[line:274] - INFO: epoch 005: 654 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7790.7, nsentences=120, sample_size=3871.6, 
sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1962.5, ups=0.25, wpb=7790.7, bsz=120, num_updates=24780, lr=1.88256e-05, gnorm=0.989, clip=40, loss_scale=128, train_wall=40, gb_free=30.3, wall=101528 2023-05-02 06:45:24 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 06:46:00 - progress_bar.py[line:274] - INFO: epoch 005: 665 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7814, nsentences=120, sample_size=4079.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1767.2, ups=0.23, wpb=7814, bsz=120, num_updates=24790, lr=1.88203e-05, gnorm=0.968, clip=20, loss_scale=64, train_wall=44, gb_free=29.7, wall=101572 2023-05-02 06:46:39 - progress_bar.py[line:274] - INFO: epoch 005: 675 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7452.3, nsentences=120, sample_size=3913.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1901.2, ups=0.26, wpb=7452.3, bsz=120, num_updates=24800, lr=1.8815e-05, gnorm=0.948, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=101612 2023-05-02 06:47:18 - progress_bar.py[line:274] - INFO: epoch 005: 685 / 6042 loss=2.49, loss_v1=0, loss_v2=0, nll_loss=1.247, ntokens=7500.1, nsentences=120, sample_size=3921.2, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1905.3, ups=0.25, wpb=7500.1, bsz=120, num_updates=24810, lr=1.88098e-05, gnorm=0.984, clip=40, loss_scale=64, train_wall=39, gb_free=30.4, wall=101651 2023-05-02 06:47:58 - progress_bar.py[line:274] - INFO: epoch 005: 695 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7412, nsentences=120, sample_size=4010.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1863, ups=0.25, wpb=7412, bsz=120, num_updates=24820, lr=1.88045e-05, gnorm=0.963, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=101691 2023-05-02 06:48:48 - progress_bar.py[line:274] - INFO: epoch 005: 705 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7787.1, 
nsentences=120, sample_size=4268.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1574.1, ups=0.2, wpb=7787.1, bsz=120, num_updates=24830, lr=1.87992e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=29.1, wall=101740 2023-05-02 06:49:27 - progress_bar.py[line:274] - INFO: epoch 005: 715 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7822.9, nsentences=120, sample_size=4131.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1974.9, ups=0.25, wpb=7822.9, bsz=120, num_updates=24840, lr=1.87939e-05, gnorm=0.948, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=101780 2023-05-02 06:50:07 - progress_bar.py[line:274] - INFO: epoch 005: 725 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.229, ntokens=7756, nsentences=120, sample_size=3757.7, sample_size_v1=0, sample_size_v2=0, ppl=2.34, wps=1973.4, ups=0.25, wpb=7756, bsz=120, num_updates=24850, lr=1.87886e-05, gnorm=1.006, clip=50, loss_scale=64, train_wall=39, gb_free=30.4, wall=101819 2023-05-02 06:50:47 - progress_bar.py[line:274] - INFO: epoch 005: 735 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7788.4, nsentences=120, sample_size=3935, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1955.1, ups=0.25, wpb=7788.4, bsz=120, num_updates=24860, lr=1.87833e-05, gnorm=0.961, clip=10, loss_scale=64, train_wall=40, gb_free=23.7, wall=101859 2023-05-02 06:51:26 - progress_bar.py[line:274] - INFO: epoch 005: 745 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7317.3, nsentences=120, sample_size=4002.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1846.5, ups=0.25, wpb=7317.3, bsz=120, num_updates=24870, lr=1.87781e-05, gnorm=0.954, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=101899 2023-05-02 06:52:07 - progress_bar.py[line:274] - INFO: epoch 005: 755 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.217, ntokens=7869.4, nsentences=120, sample_size=3884.8, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1924.9, 
ups=0.24, wpb=7869.4, bsz=120, num_updates=24880, lr=1.87728e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=41, gb_free=29.8, wall=101939 2023-05-02 06:52:47 - progress_bar.py[line:274] - INFO: epoch 005: 765 / 6042 loss=2.482, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7884.3, nsentences=120, sample_size=4028.8, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1958, ups=0.25, wpb=7884.3, bsz=120, num_updates=24890, lr=1.87675e-05, gnorm=1.001, clip=40, loss_scale=64, train_wall=40, gb_free=29.2, wall=101980 2023-05-02 06:53:27 - progress_bar.py[line:274] - INFO: epoch 005: 775 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7431.8, nsentences=120, sample_size=4114.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1880.8, ups=0.25, wpb=7431.8, bsz=120, num_updates=24900, lr=1.87622e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=39, gb_free=27.1, wall=102019 2023-05-02 06:54:06 - progress_bar.py[line:274] - INFO: epoch 005: 785 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.218, ntokens=7640.4, nsentences=120, sample_size=4169.4, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1939.1, ups=0.25, wpb=7640.4, bsz=120, num_updates=24910, lr=1.87569e-05, gnorm=0.965, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=102059 2023-05-02 06:54:46 - progress_bar.py[line:274] - INFO: epoch 005: 795 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7648, nsentences=120, sample_size=4023.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1898.4, ups=0.25, wpb=7648, bsz=120, num_updates=24920, lr=1.87517e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=102099 2023-05-02 06:55:26 - progress_bar.py[line:274] - INFO: epoch 005: 805 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7588.9, nsentences=120, sample_size=3975.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1903.6, ups=0.25, wpb=7588.9, bsz=120, num_updates=24930, lr=1.87464e-05, gnorm=0.956, clip=30, 
loss_scale=64, train_wall=40, gb_free=29, wall=102139 2023-05-02 06:56:07 - progress_bar.py[line:274] - INFO: epoch 005: 815 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7883.6, nsentences=120, sample_size=3934.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1963.4, ups=0.25, wpb=7883.6, bsz=120, num_updates=24940, lr=1.87411e-05, gnorm=0.997, clip=40, loss_scale=64, train_wall=40, gb_free=29.3, wall=102179 2023-05-02 06:56:46 - progress_bar.py[line:274] - INFO: epoch 005: 825 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7616.4, nsentences=120, sample_size=4102.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1918.8, ups=0.25, wpb=7616.4, bsz=120, num_updates=24950, lr=1.87358e-05, gnorm=0.937, clip=10, loss_scale=64, train_wall=40, gb_free=30.7, wall=102219 2023-05-02 06:57:26 - progress_bar.py[line:274] - INFO: epoch 005: 835 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7754.2, nsentences=120, sample_size=3885.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1952.8, ups=0.25, wpb=7754.2, bsz=120, num_updates=24960, lr=1.87305e-05, gnorm=0.992, clip=30, loss_scale=64, train_wall=40, gb_free=28.7, wall=102258 2023-05-02 06:58:05 - progress_bar.py[line:274] - INFO: epoch 005: 845 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7504.6, nsentences=120, sample_size=4060, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1901.7, ups=0.25, wpb=7504.6, bsz=120, num_updates=24970, lr=1.87252e-05, gnorm=0.937, clip=10, loss_scale=64, train_wall=39, gb_free=30.6, wall=102298 2023-05-02 06:58:46 - progress_bar.py[line:274] - INFO: epoch 005: 855 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7953, nsentences=120, sample_size=4032.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1981, ups=0.25, wpb=7953, bsz=120, num_updates=24980, lr=1.872e-05, gnorm=0.925, clip=0, loss_scale=64, train_wall=40, gb_free=30.9, wall=102338 2023-05-02 06:59:26 - 
progress_bar.py[line:274] - INFO: epoch 005: 865 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7841.9, nsentences=120, sample_size=4082, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1959.3, ups=0.25, wpb=7841.9, bsz=120, num_updates=24990, lr=1.87147e-05, gnorm=0.932, clip=10, loss_scale=64, train_wall=40, gb_free=29.2, wall=102378 2023-05-02 07:00:05 - progress_bar.py[line:274] - INFO: epoch 005: 875 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7648.8, nsentences=120, sample_size=3794.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1931.8, ups=0.25, wpb=7648.8, bsz=120, num_updates=25000, lr=1.87094e-05, gnorm=0.992, clip=30, loss_scale=64, train_wall=40, gb_free=27.7, wall=102418 2023-05-02 07:00:05 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 07:00:07 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 07:00:07 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 07:00:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:14 - task_invig.py[line:181] - 
INFO: example hyp(gen): region: 2023-05-02 07:00:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:24 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 07:00:24 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 07:00:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:36 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 07:00:36 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 07:00:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:47 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 07:00:47 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 07:00:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:51 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 07:00:51 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 07:00:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 07:00:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 07:00:56 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 07:00:56 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region:
2023-05-02 07:00:56 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-02 07:00:56 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-02 07:00:56 - progress_bar.py[line:282] - INFO: epoch 005 | valid on 'valid' subset | loss 3.217 | loss_v1 0 | loss_v2 0 | nll_loss 2.051 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.14 | score 0.7485 | wps 3315.4 | wpb 3202.1 | bsz 39.4 | num_updates 25000 | best_score 0.7598
2023-05-02 07:00:57 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 5 @ 25000 updates
2023-05-02 07:00:57 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_25000.pt
2023-05-02 07:01:22 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_25000.pt
2023-05-02 07:01:36 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_25000.pt (epoch 5 @ 25000 updates, score 0.7485) (writing took 39.237522694980726 seconds)
2023-05-02 07:02:15 - progress_bar.py[line:274] - INFO: epoch 005: 885 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7643.1, nsentences=120, sample_size=3780.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=588.2, ups=0.08, wpb=7643.1, bsz=120, num_updates=25010, lr=1.87041e-05, gnorm=0.96, clip=30, loss_scale=64, train_wall=39, gb_free=29.5, wall=102548
2023-05-02 07:02:55 - progress_bar.py[line:274] - INFO: epoch 005: 895 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7579.5, nsentences=120, sample_size=4460.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1883.9, ups=0.25, wpb=7579.5, bsz=120, num_updates=25020, lr=1.86988e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=29.6,
wall=102588 2023-05-02 07:03:36 - progress_bar.py[line:274] - INFO: epoch 005: 905 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7903.9, nsentences=120, sample_size=3733.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1955.5, ups=0.25, wpb=7903.9, bsz=120, num_updates=25030, lr=1.86935e-05, gnorm=0.981, clip=40, loss_scale=64, train_wall=40, gb_free=29, wall=102628 2023-05-02 07:04:16 - progress_bar.py[line:274] - INFO: epoch 005: 915 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7615.8, nsentences=120, sample_size=4213.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1910.4, ups=0.25, wpb=7615.8, bsz=120, num_updates=25040, lr=1.86883e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=102668 2023-05-02 07:04:56 - progress_bar.py[line:274] - INFO: epoch 005: 925 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7590.5, nsentences=120, sample_size=4423.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1882.5, ups=0.25, wpb=7590.5, bsz=120, num_updates=25050, lr=1.8683e-05, gnorm=0.94, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=102708 2023-05-02 07:05:36 - progress_bar.py[line:274] - INFO: epoch 005: 935 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7949.6, nsentences=120, sample_size=4080.5, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1995.6, ups=0.25, wpb=7949.6, bsz=120, num_updates=25060, lr=1.86777e-05, gnorm=0.952, clip=30, loss_scale=64, train_wall=40, gb_free=30.9, wall=102748 2023-05-02 07:06:16 - progress_bar.py[line:274] - INFO: epoch 005: 945 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7712.1, nsentences=120, sample_size=4357, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1922.9, ups=0.25, wpb=7712.1, bsz=120, num_updates=25070, lr=1.86724e-05, gnorm=0.914, clip=10, loss_scale=64, train_wall=40, gb_free=29.1, wall=102788 2023-05-02 07:06:56 - progress_bar.py[line:274] - INFO: epoch 005: 955 / 6042 
loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7742.8, nsentences=120, sample_size=4164, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1921.8, ups=0.25, wpb=7742.8, bsz=120, num_updates=25080, lr=1.86671e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=102829 2023-05-02 07:07:36 - progress_bar.py[line:274] - INFO: epoch 005: 965 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7862.4, nsentences=120, sample_size=3993.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1962.2, ups=0.25, wpb=7862.4, bsz=120, num_updates=25090, lr=1.86619e-05, gnorm=0.968, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=102869 2023-05-02 07:08:15 - progress_bar.py[line:274] - INFO: epoch 005: 975 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7918.9, nsentences=120, sample_size=4181.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2017.6, ups=0.25, wpb=7918.9, bsz=120, num_updates=25100, lr=1.86566e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=39, gb_free=30.6, wall=102908 2023-05-02 07:08:55 - progress_bar.py[line:274] - INFO: epoch 005: 985 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7763.2, nsentences=120, sample_size=4027.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1970.1, ups=0.25, wpb=7763.2, bsz=120, num_updates=25110, lr=1.86513e-05, gnorm=0.957, clip=40, loss_scale=64, train_wall=39, gb_free=23.6, wall=102947 2023-05-02 07:09:34 - progress_bar.py[line:274] - INFO: epoch 005: 995 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7817.2, nsentences=120, sample_size=4229.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1982.3, ups=0.25, wpb=7817.2, bsz=120, num_updates=25120, lr=1.8646e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=39, gb_free=30.4, wall=102987 2023-05-02 07:10:15 - progress_bar.py[line:274] - INFO: epoch 005: 1005 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7698.1, nsentences=120, 
sample_size=4168.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1905.3, ups=0.25, wpb=7698.1, bsz=120, num_updates=25130, lr=1.86407e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=103027 2023-05-02 07:10:55 - progress_bar.py[line:274] - INFO: epoch 005: 1015 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7607.2, nsentences=120, sample_size=4063.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1890.9, ups=0.25, wpb=7607.2, bsz=120, num_updates=25140, lr=1.86354e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=103067 2023-05-02 07:11:35 - progress_bar.py[line:274] - INFO: epoch 005: 1025 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7843.8, nsentences=120, sample_size=4251.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1952.6, ups=0.25, wpb=7843.8, bsz=120, num_updates=25150, lr=1.86302e-05, gnorm=0.938, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=103108 2023-05-02 07:12:15 - progress_bar.py[line:274] - INFO: epoch 005: 1035 / 6042 loss=2.46, loss_v1=0, loss_v2=0, nll_loss=1.211, ntokens=7882.9, nsentences=120, sample_size=4197.9, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1958.1, ups=0.25, wpb=7882.9, bsz=120, num_updates=25160, lr=1.86249e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=103148 2023-05-02 07:12:54 - progress_bar.py[line:274] - INFO: epoch 005: 1045 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7630.5, nsentences=120, sample_size=3724.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1953.4, ups=0.26, wpb=7630.5, bsz=120, num_updates=25170, lr=1.86196e-05, gnorm=0.985, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=103187 2023-05-02 07:13:34 - progress_bar.py[line:274] - INFO: epoch 005: 1055 / 6042 loss=2.462, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=7761, nsentences=120, sample_size=4042.3, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1942.8, ups=0.25, 
wpb=7761, bsz=120, num_updates=25180, lr=1.86143e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=40, gb_free=31.3, wall=103227 2023-05-02 07:14:14 - progress_bar.py[line:274] - INFO: epoch 005: 1065 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7786.5, nsentences=120, sample_size=4082.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1959.3, ups=0.25, wpb=7786.5, bsz=120, num_updates=25190, lr=1.8609e-05, gnorm=0.944, clip=0, loss_scale=64, train_wall=40, gb_free=31.1, wall=103267 2023-05-02 07:14:54 - progress_bar.py[line:274] - INFO: epoch 005: 1075 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7889.4, nsentences=120, sample_size=4031.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1973.9, ups=0.25, wpb=7889.4, bsz=120, num_updates=25200, lr=1.86038e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=40, gb_free=29.3, wall=103307 2023-05-02 07:15:35 - progress_bar.py[line:274] - INFO: epoch 005: 1085 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7896.1, nsentences=120, sample_size=4111.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1955.4, ups=0.25, wpb=7896.1, bsz=120, num_updates=25210, lr=1.85985e-05, gnorm=0.967, clip=20, loss_scale=64, train_wall=40, gb_free=29.2, wall=103347 2023-05-02 07:16:15 - progress_bar.py[line:274] - INFO: epoch 005: 1095 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7976.1, nsentences=120, sample_size=3871, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1982.3, ups=0.25, wpb=7976.1, bsz=120, num_updates=25220, lr=1.85932e-05, gnorm=0.958, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=103387 2023-05-02 07:16:55 - progress_bar.py[line:274] - INFO: epoch 005: 1105 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7784.6, nsentences=120, sample_size=3931.9, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1930.8, ups=0.25, wpb=7784.6, bsz=120, num_updates=25230, lr=1.85879e-05, gnorm=0.978, clip=50, 
loss_scale=64, train_wall=40, gb_free=30.3, wall=103428
2023-05-02 07:17:28 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0
2023-05-02 07:17:40 - progress_bar.py[line:274] - INFO: epoch 005: 1116 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=8097.7, nsentences=120, sample_size=3848.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1810.9, ups=0.22, wpb=8097.7, bsz=120, num_updates=25240, lr=1.85826e-05, gnorm=0.965, clip=30, loss_scale=32, train_wall=45, gb_free=29.7, wall=103472
2023-05-02 07:18:20 - progress_bar.py[line:274] - INFO: epoch 005: 1126 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7498, nsentences=120, sample_size=4160.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1849.9, ups=0.25, wpb=7498, bsz=120, num_updates=25250, lr=1.85773e-05, gnorm=0.947, clip=30, loss_scale=32, train_wall=40, gb_free=29.9, wall=103513
2023-05-02 07:19:01 - progress_bar.py[line:274] - INFO: epoch 005: 1136 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7794.4, nsentences=120, sample_size=4181.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1898.3, ups=0.24, wpb=7794.4, bsz=120, num_updates=25260, lr=1.85721e-05, gnorm=0.946, clip=20, loss_scale=32, train_wall=41, gb_free=30.2, wall=103554
2023-05-02 07:19:41 - progress_bar.py[line:274] - INFO: epoch 005: 1146 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7883.4, nsentences=120, sample_size=4130.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1995.6, ups=0.25, wpb=7883.4, bsz=120, num_updates=25270, lr=1.85668e-05, gnorm=0.963, clip=30, loss_scale=32, train_wall=39, gb_free=30.2, wall=103593
2023-05-02 07:20:20 - progress_bar.py[line:274] - INFO: epoch 005: 1156 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7582.2, nsentences=120, sample_size=4057.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1928, ups=0.25, wpb=7582.2, bsz=120,
num_updates=25280, lr=1.85615e-05, gnorm=0.971, clip=40, loss_scale=32, train_wall=39, gb_free=30, wall=103633 2023-05-02 07:21:00 - progress_bar.py[line:274] - INFO: epoch 005: 1166 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7701.1, nsentences=120, sample_size=4170.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1947.8, ups=0.25, wpb=7701.1, bsz=120, num_updates=25290, lr=1.85562e-05, gnorm=0.934, clip=10, loss_scale=32, train_wall=39, gb_free=28.7, wall=103672 2023-05-02 07:21:40 - progress_bar.py[line:274] - INFO: epoch 005: 1176 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7890.2, nsentences=120, sample_size=3990.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1968.6, ups=0.25, wpb=7890.2, bsz=120, num_updates=25300, lr=1.85509e-05, gnorm=0.94, clip=10, loss_scale=32, train_wall=40, gb_free=29, wall=103712 2023-05-02 07:22:20 - progress_bar.py[line:274] - INFO: epoch 005: 1186 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7714.6, nsentences=120, sample_size=3938.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1918.6, ups=0.25, wpb=7714.6, bsz=120, num_updates=25310, lr=1.85456e-05, gnorm=0.974, clip=30, loss_scale=32, train_wall=40, gb_free=29.1, wall=103753 2023-05-02 07:23:00 - progress_bar.py[line:274] - INFO: epoch 005: 1196 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7440.5, nsentences=120, sample_size=4355.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1876.6, ups=0.25, wpb=7440.5, bsz=120, num_updates=25320, lr=1.85404e-05, gnorm=0.904, clip=0, loss_scale=32, train_wall=40, gb_free=28.4, wall=103792 2023-05-02 07:23:40 - progress_bar.py[line:274] - INFO: epoch 005: 1206 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7673.7, nsentences=120, sample_size=4065.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1902, ups=0.25, wpb=7673.7, bsz=120, num_updates=25330, lr=1.85351e-05, gnorm=0.989, clip=40, loss_scale=32, train_wall=40, 
gb_free=30.2, wall=103833 2023-05-02 07:24:20 - progress_bar.py[line:274] - INFO: epoch 005: 1216 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7812.4, nsentences=120, sample_size=3899.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1952.6, ups=0.25, wpb=7812.4, bsz=120, num_updates=25340, lr=1.85298e-05, gnorm=0.998, clip=30, loss_scale=32, train_wall=40, gb_free=29.9, wall=103873 2023-05-02 07:24:59 - progress_bar.py[line:274] - INFO: epoch 005: 1226 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7757, nsentences=120, sample_size=4145.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1987.8, ups=0.26, wpb=7757, bsz=120, num_updates=25350, lr=1.85245e-05, gnorm=0.954, clip=20, loss_scale=32, train_wall=39, gb_free=29.9, wall=103912 2023-05-02 07:25:39 - progress_bar.py[line:274] - INFO: epoch 005: 1236 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7770.2, nsentences=120, sample_size=3787.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1942, ups=0.25, wpb=7770.2, bsz=120, num_updates=25360, lr=1.85192e-05, gnorm=0.976, clip=30, loss_scale=32, train_wall=40, gb_free=30.5, wall=103952 2023-05-02 07:26:19 - progress_bar.py[line:274] - INFO: epoch 005: 1246 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7713.8, nsentences=120, sample_size=4095.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1947.8, ups=0.25, wpb=7713.8, bsz=120, num_updates=25370, lr=1.8514e-05, gnorm=0.969, clip=50, loss_scale=32, train_wall=40, gb_free=30.1, wall=103991 2023-05-02 07:26:59 - progress_bar.py[line:274] - INFO: epoch 005: 1256 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7695.3, nsentences=120, sample_size=3910.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1900.7, ups=0.25, wpb=7695.3, bsz=120, num_updates=25380, lr=1.85087e-05, gnorm=0.985, clip=40, loss_scale=32, train_wall=40, gb_free=30.4, wall=104032 2023-05-02 07:27:39 - progress_bar.py[line:274] - INFO: epoch 005: 
1266 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7796, nsentences=120, sample_size=3976.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1945.4, ups=0.25, wpb=7796, bsz=120, num_updates=25390, lr=1.85034e-05, gnorm=0.96, clip=20, loss_scale=32, train_wall=40, gb_free=30.6, wall=104072 2023-05-02 07:28:19 - progress_bar.py[line:274] - INFO: epoch 005: 1276 / 6042 loss=2.461, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=8067.4, nsentences=120, sample_size=4210.3, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=2017.6, ups=0.25, wpb=8067.4, bsz=120, num_updates=25400, lr=1.84981e-05, gnorm=0.917, clip=10, loss_scale=32, train_wall=40, gb_free=29.9, wall=104112 2023-05-02 07:28:59 - progress_bar.py[line:274] - INFO: epoch 005: 1286 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7399.1, nsentences=120, sample_size=4083.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1875.3, ups=0.25, wpb=7399.1, bsz=120, num_updates=25410, lr=1.84928e-05, gnorm=0.96, clip=30, loss_scale=32, train_wall=39, gb_free=30.7, wall=104151 2023-05-02 07:29:39 - progress_bar.py[line:274] - INFO: epoch 005: 1296 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7764.6, nsentences=120, sample_size=3839.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1924, ups=0.25, wpb=7764.6, bsz=120, num_updates=25420, lr=1.84875e-05, gnorm=0.958, clip=40, loss_scale=32, train_wall=40, gb_free=31.5, wall=104192 2023-05-02 07:30:20 - progress_bar.py[line:274] - INFO: epoch 005: 1306 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7876.7, nsentences=120, sample_size=3883.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1939.5, ups=0.25, wpb=7876.7, bsz=120, num_updates=25430, lr=1.84823e-05, gnorm=0.955, clip=10, loss_scale=32, train_wall=41, gb_free=30, wall=104232 2023-05-02 07:30:59 - progress_bar.py[line:274] - INFO: epoch 005: 1316 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7494.3, nsentences=120, 
sample_size=4020.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1903.9, ups=0.25, wpb=7494.3, bsz=120, num_updates=25440, lr=1.8477e-05, gnorm=0.961, clip=20, loss_scale=32, train_wall=39, gb_free=29, wall=104272 2023-05-02 07:31:39 - progress_bar.py[line:274] - INFO: epoch 005: 1326 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7476.6, nsentences=120, sample_size=4129.4, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1853.8, ups=0.25, wpb=7476.6, bsz=120, num_updates=25450, lr=1.84717e-05, gnorm=0.933, clip=10, loss_scale=32, train_wall=40, gb_free=31.1, wall=104312 2023-05-02 07:32:20 - progress_bar.py[line:274] - INFO: epoch 005: 1336 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=8347.3, nsentences=120, sample_size=4196.5, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=2057.6, ups=0.25, wpb=8347.3, bsz=120, num_updates=25460, lr=1.84664e-05, gnorm=0.93, clip=10, loss_scale=32, train_wall=40, gb_free=29.8, wall=104352 2023-05-02 07:33:00 - progress_bar.py[line:274] - INFO: epoch 005: 1346 / 6042 loss=2.463, loss_v1=0, loss_v2=0, nll_loss=1.217, ntokens=7991.9, nsentences=120, sample_size=3826.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=2009.6, ups=0.25, wpb=7991.9, bsz=120, num_updates=25470, lr=1.84611e-05, gnorm=0.962, clip=10, loss_scale=32, train_wall=40, gb_free=29.4, wall=104392 2023-05-02 07:33:39 - progress_bar.py[line:274] - INFO: epoch 005: 1356 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7777.8, nsentences=120, sample_size=3972.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1960.6, ups=0.25, wpb=7777.8, bsz=120, num_updates=25480, lr=1.84558e-05, gnorm=0.945, clip=10, loss_scale=32, train_wall=40, gb_free=29, wall=104432 2023-05-02 07:34:19 - progress_bar.py[line:274] - INFO: epoch 005: 1366 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7773.8, nsentences=120, sample_size=4118.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1955.6, ups=0.25, 
wpb=7773.8, bsz=120, num_updates=25490, lr=1.84506e-05, gnorm=0.949, clip=10, loss_scale=32, train_wall=40, gb_free=28.4, wall=104472 2023-05-02 07:34:59 - progress_bar.py[line:274] - INFO: epoch 005: 1376 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=8056.2, nsentences=120, sample_size=3704.4, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2011.1, ups=0.25, wpb=8056.2, bsz=120, num_updates=25500, lr=1.84453e-05, gnorm=0.98, clip=40, loss_scale=32, train_wall=40, gb_free=29.7, wall=104512 2023-05-02 07:35:39 - progress_bar.py[line:274] - INFO: epoch 005: 1386 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7879.8, nsentences=120, sample_size=4186.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1989.2, ups=0.25, wpb=7879.8, bsz=120, num_updates=25510, lr=1.844e-05, gnorm=0.941, clip=20, loss_scale=32, train_wall=40, gb_free=30, wall=104551 2023-05-02 07:36:19 - progress_bar.py[line:274] - INFO: epoch 005: 1396 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7752.8, nsentences=120, sample_size=3761.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1946.8, ups=0.25, wpb=7752.8, bsz=120, num_updates=25520, lr=1.84347e-05, gnorm=0.978, clip=30, loss_scale=32, train_wall=40, gb_free=30.1, wall=104591 2023-05-02 07:36:59 - progress_bar.py[line:274] - INFO: epoch 005: 1406 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7578.1, nsentences=120, sample_size=4274.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1888.8, ups=0.25, wpb=7578.1, bsz=120, num_updates=25530, lr=1.84294e-05, gnorm=0.911, clip=0, loss_scale=32, train_wall=40, gb_free=30.1, wall=104631 2023-05-02 07:37:38 - progress_bar.py[line:274] - INFO: epoch 005: 1416 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7680.6, nsentences=120, sample_size=4219.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1970.3, ups=0.26, wpb=7680.6, bsz=120, num_updates=25540, lr=1.84242e-05, gnorm=0.969, clip=30, loss_scale=32, 
train_wall=39, gb_free=31.3, wall=104670 2023-05-02 07:38:18 - progress_bar.py[line:274] - INFO: epoch 005: 1426 / 6042 loss=2.477, loss_v1=0, loss_v2=0, nll_loss=1.236, ntokens=7831.8, nsentences=120, sample_size=4017.4, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=1959.3, ups=0.25, wpb=7831.8, bsz=120, num_updates=25550, lr=1.84189e-05, gnorm=0.973, clip=30, loss_scale=32, train_wall=40, gb_free=29.8, wall=104710 2023-05-02 07:38:58 - progress_bar.py[line:274] - INFO: epoch 005: 1436 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=8288.7, nsentences=120, sample_size=4067.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2081.8, ups=0.25, wpb=8288.7, bsz=120, num_updates=25560, lr=1.84136e-05, gnorm=0.946, clip=20, loss_scale=32, train_wall=40, gb_free=29.7, wall=104750 2023-05-02 07:39:37 - progress_bar.py[line:274] - INFO: epoch 005: 1446 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7543.7, nsentences=120, sample_size=4191.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1903.4, ups=0.25, wpb=7543.7, bsz=120, num_updates=25570, lr=1.84083e-05, gnorm=0.926, clip=10, loss_scale=32, train_wall=40, gb_free=31.4, wall=104790 2023-05-02 07:40:17 - progress_bar.py[line:274] - INFO: epoch 005: 1456 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7326.2, nsentences=120, sample_size=4227.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1816.4, ups=0.25, wpb=7326.2, bsz=120, num_updates=25580, lr=1.8403e-05, gnorm=0.908, clip=0, loss_scale=32, train_wall=40, gb_free=29.3, wall=104830 2023-05-02 07:40:57 - progress_bar.py[line:274] - INFO: epoch 005: 1466 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7641.3, nsentences=120, sample_size=4135.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1932.4, ups=0.25, wpb=7641.3, bsz=120, num_updates=25590, lr=1.83977e-05, gnorm=0.934, clip=20, loss_scale=32, train_wall=39, gb_free=30.3, wall=104870 2023-05-02 07:41:37 - 
progress_bar.py[line:274] - INFO: epoch 005: 1476 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=8017.7, nsentences=120, sample_size=3930.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2009.8, ups=0.25, wpb=8017.7, bsz=120, num_updates=25600, lr=1.83925e-05, gnorm=0.939, clip=0, loss_scale=32, train_wall=40, gb_free=30.7, wall=104909 2023-05-02 07:42:17 - progress_bar.py[line:274] - INFO: epoch 005: 1486 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7700.3, nsentences=120, sample_size=4234.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1931.1, ups=0.25, wpb=7700.3, bsz=120, num_updates=25610, lr=1.83872e-05, gnorm=0.954, clip=20, loss_scale=32, train_wall=40, gb_free=30.5, wall=104949 2023-05-02 07:42:57 - progress_bar.py[line:274] - INFO: epoch 005: 1496 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7971.2, nsentences=120, sample_size=3846.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1968.7, ups=0.25, wpb=7971.2, bsz=120, num_updates=25620, lr=1.83819e-05, gnorm=0.932, clip=10, loss_scale=32, train_wall=40, gb_free=25.9, wall=104990 2023-05-02 07:43:37 - progress_bar.py[line:274] - INFO: epoch 005: 1506 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7574.7, nsentences=120, sample_size=4107, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1917.4, ups=0.25, wpb=7574.7, bsz=120, num_updates=25630, lr=1.83766e-05, gnorm=0.928, clip=10, loss_scale=32, train_wall=39, gb_free=30.8, wall=105029 2023-05-02 07:44:16 - progress_bar.py[line:274] - INFO: epoch 005: 1516 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.066, ntokens=7263, nsentences=120, sample_size=4059.5, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1851.4, ups=0.25, wpb=7263, bsz=120, num_updates=25640, lr=1.83713e-05, gnorm=0.953, clip=30, loss_scale=32, train_wall=39, gb_free=25.8, wall=105069 2023-05-02 07:44:56 - progress_bar.py[line:274] - INFO: epoch 005: 1526 / 6042 loss=2.428, loss_v1=0, loss_v2=0, 
nll_loss=1.173, ntokens=7809, nsentences=120, sample_size=4115.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1962.4, ups=0.25, wpb=7809, bsz=120, num_updates=25650, lr=1.83661e-05, gnorm=0.96, clip=20, loss_scale=32, train_wall=40, gb_free=30.3, wall=105108 2023-05-02 07:45:36 - progress_bar.py[line:274] - INFO: epoch 005: 1536 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7938.4, nsentences=120, sample_size=3963.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1983.7, ups=0.25, wpb=7938.4, bsz=120, num_updates=25660, lr=1.83608e-05, gnorm=0.961, clip=20, loss_scale=32, train_wall=40, gb_free=30.5, wall=105148 2023-05-02 07:46:16 - progress_bar.py[line:274] - INFO: epoch 005: 1546 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.074, ntokens=7743.9, nsentences=120, sample_size=3912.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1939.9, ups=0.25, wpb=7743.9, bsz=120, num_updates=25670, lr=1.83555e-05, gnorm=0.957, clip=30, loss_scale=32, train_wall=40, gb_free=30.2, wall=105188 2023-05-02 07:46:56 - progress_bar.py[line:274] - INFO: epoch 005: 1556 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7524.2, nsentences=120, sample_size=4191.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1876, ups=0.25, wpb=7524.2, bsz=120, num_updates=25680, lr=1.83502e-05, gnorm=0.964, clip=20, loss_scale=32, train_wall=40, gb_free=30.1, wall=105228 2023-05-02 07:47:36 - progress_bar.py[line:274] - INFO: epoch 005: 1566 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7524.9, nsentences=120, sample_size=3978.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1867.2, ups=0.25, wpb=7524.9, bsz=120, num_updates=25690, lr=1.83449e-05, gnorm=0.968, clip=30, loss_scale=32, train_wall=40, gb_free=27.7, wall=105269 2023-05-02 07:48:16 - progress_bar.py[line:274] - INFO: epoch 005: 1576 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7642.5, nsentences=120, sample_size=4105.6, sample_size_v1=0, 
sample_size_v2=0, ppl=2.19, wps=1925.2, ups=0.25, wpb=7642.5, bsz=120, num_updates=25700, lr=1.83396e-05, gnorm=0.946, clip=10, loss_scale=32, train_wall=40, gb_free=31.2, wall=105308 2023-05-02 07:48:56 - progress_bar.py[line:274] - INFO: epoch 005: 1586 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7894.4, nsentences=120, sample_size=3978.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1967.3, ups=0.25, wpb=7894.4, bsz=120, num_updates=25710, lr=1.83344e-05, gnorm=0.967, clip=30, loss_scale=32, train_wall=40, gb_free=23.6, wall=105348 2023-05-02 07:49:35 - progress_bar.py[line:274] - INFO: epoch 005: 1596 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7707, nsentences=120, sample_size=4182.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1961.4, ups=0.25, wpb=7707, bsz=120, num_updates=25720, lr=1.83291e-05, gnorm=0.941, clip=20, loss_scale=32, train_wall=39, gb_free=28.6, wall=105388 2023-05-02 07:50:15 - progress_bar.py[line:274] - INFO: epoch 005: 1606 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7632.5, nsentences=120, sample_size=4118.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1917.3, ups=0.25, wpb=7632.5, bsz=120, num_updates=25730, lr=1.83238e-05, gnorm=0.929, clip=10, loss_scale=32, train_wall=40, gb_free=30.8, wall=105428 2023-05-02 07:50:55 - progress_bar.py[line:274] - INFO: epoch 005: 1616 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7968.7, nsentences=120, sample_size=4192.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2003.5, ups=0.25, wpb=7968.7, bsz=120, num_updates=25740, lr=1.83185e-05, gnorm=0.931, clip=20, loss_scale=32, train_wall=40, gb_free=30.5, wall=105467 2023-05-02 07:51:35 - progress_bar.py[line:274] - INFO: epoch 005: 1626 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7586, nsentences=120, sample_size=4167.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1892.7, ups=0.25, wpb=7586, bsz=120, num_updates=25750, 
lr=1.83132e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=105507 2023-05-02 07:52:15 - progress_bar.py[line:274] - INFO: epoch 005: 1636 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7496.9, nsentences=120, sample_size=3962.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1887.4, ups=0.25, wpb=7496.9, bsz=120, num_updates=25760, lr=1.83079e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=105547 2023-05-02 07:52:54 - progress_bar.py[line:274] - INFO: epoch 005: 1646 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7631.5, nsentences=120, sample_size=4069.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1927, ups=0.25, wpb=7631.5, bsz=120, num_updates=25770, lr=1.83027e-05, gnorm=0.959, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=105587 2023-05-02 07:53:34 - progress_bar.py[line:274] - INFO: epoch 005: 1656 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7502.3, nsentences=120, sample_size=4273.2, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1883.9, ups=0.25, wpb=7502.3, bsz=120, num_updates=25780, lr=1.82974e-05, gnorm=0.918, clip=0, loss_scale=64, train_wall=40, gb_free=29.1, wall=105627 2023-05-02 07:54:14 - progress_bar.py[line:274] - INFO: epoch 005: 1666 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7719.5, nsentences=120, sample_size=3905.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1956.9, ups=0.25, wpb=7719.5, bsz=120, num_updates=25790, lr=1.82921e-05, gnorm=0.951, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=105666 2023-05-02 07:54:53 - progress_bar.py[line:274] - INFO: epoch 005: 1676 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7823, nsentences=120, sample_size=3908.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1963.3, ups=0.25, wpb=7823, bsz=120, num_updates=25800, lr=1.82868e-05, gnorm=0.949, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, 
wall=105706 2023-05-02 07:55:33 - progress_bar.py[line:274] - INFO: epoch 005: 1686 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7685.4, nsentences=120, sample_size=3687.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1948, ups=0.25, wpb=7685.4, bsz=120, num_updates=25810, lr=1.82815e-05, gnorm=1.013, clip=40, loss_scale=64, train_wall=39, gb_free=30.4, wall=105745 2023-05-02 07:56:12 - progress_bar.py[line:274] - INFO: epoch 005: 1696 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7865.1, nsentences=120, sample_size=3896.2, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=2000.3, ups=0.25, wpb=7865.1, bsz=120, num_updates=25820, lr=1.82763e-05, gnorm=1.001, clip=50, loss_scale=64, train_wall=39, gb_free=26.6, wall=105785 2023-05-02 07:56:52 - progress_bar.py[line:274] - INFO: epoch 005: 1706 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7562.1, nsentences=120, sample_size=4206.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1906.6, ups=0.25, wpb=7562.1, bsz=120, num_updates=25830, lr=1.8271e-05, gnorm=0.915, clip=0, loss_scale=64, train_wall=40, gb_free=29.7, wall=105824 2023-05-02 07:57:31 - progress_bar.py[line:274] - INFO: epoch 005: 1716 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7690.9, nsentences=120, sample_size=4081.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1980.1, ups=0.26, wpb=7690.9, bsz=120, num_updates=25840, lr=1.82657e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=39, gb_free=31.4, wall=105863 2023-05-02 07:58:10 - progress_bar.py[line:274] - INFO: epoch 005: 1726 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7759.9, nsentences=120, sample_size=4123.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1966.6, ups=0.25, wpb=7759.9, bsz=120, num_updates=25850, lr=1.82604e-05, gnorm=0.964, clip=20, loss_scale=64, train_wall=39, gb_free=28.4, wall=105903 2023-05-02 07:58:50 - progress_bar.py[line:274] - INFO: epoch 005: 1736 / 6042 
loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7614.5, nsentences=120, sample_size=3947.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1933, ups=0.25, wpb=7614.5, bsz=120, num_updates=25860, lr=1.82551e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=105942 2023-05-02 07:59:29 - progress_bar.py[line:274] - INFO: epoch 005: 1746 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7568.7, nsentences=120, sample_size=4011.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1902.2, ups=0.25, wpb=7568.7, bsz=120, num_updates=25870, lr=1.82498e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=28.8, wall=105982 2023-05-02 08:00:09 - progress_bar.py[line:274] - INFO: epoch 005: 1756 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7818.2, nsentences=120, sample_size=3888.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1971.7, ups=0.25, wpb=7818.2, bsz=120, num_updates=25880, lr=1.82446e-05, gnorm=1.017, clip=50, loss_scale=64, train_wall=40, gb_free=30.6, wall=106021 2023-05-02 08:00:49 - progress_bar.py[line:274] - INFO: epoch 005: 1766 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7979, nsentences=120, sample_size=3712.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2004.4, ups=0.25, wpb=7979, bsz=120, num_updates=25890, lr=1.82393e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=40, gb_free=27.9, wall=106061 2023-05-02 08:01:29 - progress_bar.py[line:274] - INFO: epoch 005: 1776 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7810.3, nsentences=120, sample_size=3927.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1934.1, ups=0.25, wpb=7810.3, bsz=120, num_updates=25900, lr=1.8234e-05, gnorm=0.952, clip=10, loss_scale=64, train_wall=40, gb_free=29.2, wall=106102 2023-05-02 08:02:09 - progress_bar.py[line:274] - INFO: epoch 005: 1786 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7694.9, nsentences=120, 
sample_size=4468.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1927.8, ups=0.25, wpb=7694.9, bsz=120, num_updates=25910, lr=1.82287e-05, gnorm=0.916, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=106142 2023-05-02 08:02:49 - progress_bar.py[line:274] - INFO: epoch 005: 1796 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7704, nsentences=120, sample_size=4073.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1919.9, ups=0.25, wpb=7704, bsz=120, num_updates=25920, lr=1.82234e-05, gnorm=0.925, clip=0, loss_scale=64, train_wall=40, gb_free=30.3, wall=106182 2023-05-02 08:03:29 - progress_bar.py[line:274] - INFO: epoch 005: 1806 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7577.3, nsentences=120, sample_size=4259.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1884.6, ups=0.25, wpb=7577.3, bsz=120, num_updates=25930, lr=1.82182e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=106222 2023-05-02 08:04:10 - progress_bar.py[line:274] - INFO: epoch 005: 1816 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7759.3, nsentences=120, sample_size=3904.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1933.6, ups=0.25, wpb=7759.3, bsz=120, num_updates=25940, lr=1.82129e-05, gnorm=0.963, clip=20, loss_scale=64, train_wall=40, gb_free=28.7, wall=106262 2023-05-02 08:04:49 - progress_bar.py[line:274] - INFO: epoch 005: 1826 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7637.6, nsentences=120, sample_size=3931.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1943.1, ups=0.25, wpb=7637.6, bsz=120, num_updates=25950, lr=1.82076e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=39, gb_free=29.7, wall=106301 2023-05-02 08:05:30 - progress_bar.py[line:274] - INFO: epoch 005: 1836 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7816.8, nsentences=120, sample_size=3993.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1910.1, ups=0.24, 
wpb=7816.8, bsz=120, num_updates=25960, lr=1.82023e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=41, gb_free=29.5, wall=106342 2023-05-02 08:06:10 - progress_bar.py[line:274] - INFO: epoch 005: 1846 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7672.1, nsentences=120, sample_size=3997.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1895.9, ups=0.25, wpb=7672.1, bsz=120, num_updates=25970, lr=1.8197e-05, gnorm=0.969, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=106383 2023-05-02 08:06:50 - progress_bar.py[line:274] - INFO: epoch 005: 1856 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7447, nsentences=120, sample_size=4275.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1882.7, ups=0.25, wpb=7447, bsz=120, num_updates=25980, lr=1.81917e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=106422 2023-05-02 08:07:30 - progress_bar.py[line:274] - INFO: epoch 005: 1866 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7439.4, nsentences=120, sample_size=4084, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1875.5, ups=0.25, wpb=7439.4, bsz=120, num_updates=25990, lr=1.81865e-05, gnorm=0.932, clip=10, loss_scale=64, train_wall=40, gb_free=29.3, wall=106462 2023-05-02 08:08:10 - progress_bar.py[line:274] - INFO: epoch 005: 1876 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=8106.3, nsentences=120, sample_size=3861.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1996.6, ups=0.25, wpb=8106.3, bsz=120, num_updates=26000, lr=1.81812e-05, gnorm=0.932, clip=20, loss_scale=64, train_wall=41, gb_free=29.1, wall=106503 2023-05-02 08:08:10 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 08:08:12 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 08:08:12 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 08:08:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 08:08:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:28 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 08:08:28 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 08:08:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-02 08:08:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:40 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 08:08:40 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 08:08:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 08:08:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:52 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 08:08:52 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 08:08:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:56 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 08:08:56 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 08:08:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:08:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:08:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:09:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:09:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:09:01 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-02 08:09:01 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 08:09:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 08:09:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 08:09:01 - progress_bar.py[line:282] - INFO: epoch 005 | valid on 'valid' subset | loss 3.225 | loss_v1 0 | loss_v2 0 | nll_loss 2.06 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.17 | score 0.7446 | wps 3288.1 | wpb 3202.1 | bsz 39.4 | num_updates 26000 | best_score 0.7598 2023-05-02 08:09:01 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 5 @ 26000 updates 2023-05-02 08:09:01 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_26000.pt 2023-05-02 08:09:26 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_26000.pt 2023-05-02 08:09:39 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_26000.pt (epoch 5 @ 26000 updates, score 0.7446) (writing took 38.25912877591327 seconds) 2023-05-02 08:10:18 - progress_bar.py[line:274] - INFO: epoch 005: 1886 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7389, nsentences=120, sample_size=4222.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=576.1, ups=0.08, wpb=7389, bsz=120, num_updates=26010, lr=1.81759e-05, gnorm=0.958, clip=30, loss_scale=64, train_wall=39, gb_free=28, wall=106631 2023-05-02 08:10:57 - progress_bar.py[line:274] - INFO: epoch 005: 1896 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7791.1, nsentences=120, sample_size=4294.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1997.8, ups=0.26, wpb=7791.1, bsz=120, num_updates=26020, 
lr=1.81706e-05, gnorm=0.919, clip=10, loss_scale=64, train_wall=39, gb_free=30, wall=106670 2023-05-02 08:11:37 - progress_bar.py[line:274] - INFO: epoch 005: 1906 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7965.2, nsentences=120, sample_size=3811.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1997.2, ups=0.25, wpb=7965.2, bsz=120, num_updates=26030, lr=1.81653e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=106710 2023-05-02 08:12:17 - progress_bar.py[line:274] - INFO: epoch 005: 1916 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7985.5, nsentences=120, sample_size=4000.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2017.6, ups=0.25, wpb=7985.5, bsz=120, num_updates=26040, lr=1.816e-05, gnorm=0.971, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=106749 2023-05-02 08:12:57 - progress_bar.py[line:274] - INFO: epoch 005: 1926 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7858.3, nsentences=120, sample_size=3916.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1964.6, ups=0.25, wpb=7858.3, bsz=120, num_updates=26050, lr=1.81548e-05, gnorm=0.951, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=106789 2023-05-02 08:13:37 - progress_bar.py[line:274] - INFO: epoch 005: 1936 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7711.4, nsentences=120, sample_size=4187.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1941.1, ups=0.25, wpb=7711.4, bsz=120, num_updates=26060, lr=1.81495e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=40, gb_free=28.2, wall=106829 2023-05-02 08:14:16 - progress_bar.py[line:274] - INFO: epoch 005: 1946 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7655.9, nsentences=120, sample_size=4078.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1921.6, ups=0.25, wpb=7655.9, bsz=120, num_updates=26070, lr=1.81442e-05, gnorm=0.943, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, 
wall=106869 2023-05-02 08:14:56 - progress_bar.py[line:274] - INFO: epoch 005: 1956 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7451.4, nsentences=120, sample_size=3993.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1905.1, ups=0.26, wpb=7451.4, bsz=120, num_updates=26080, lr=1.81389e-05, gnorm=0.982, clip=30, loss_scale=64, train_wall=39, gb_free=31.1, wall=106908 2023-05-02 08:15:34 - progress_bar.py[line:274] - INFO: epoch 005: 1966 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7784.8, nsentences=120, sample_size=3953.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2001.9, ups=0.26, wpb=7784.8, bsz=120, num_updates=26090, lr=1.81336e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=39, gb_free=26.7, wall=106947 2023-05-02 08:16:14 - progress_bar.py[line:274] - INFO: epoch 005: 1976 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7723.7, nsentences=120, sample_size=4210.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1931.8, ups=0.25, wpb=7723.7, bsz=120, num_updates=26100, lr=1.81284e-05, gnorm=0.999, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=106987 2023-05-02 08:16:54 - progress_bar.py[line:274] - INFO: epoch 005: 1986 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7690.5, nsentences=120, sample_size=3932.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1930.5, ups=0.25, wpb=7690.5, bsz=120, num_updates=26110, lr=1.81231e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=107027 2023-05-02 08:17:34 - progress_bar.py[line:274] - INFO: epoch 005: 1996 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7790.9, nsentences=120, sample_size=3488, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1966.6, ups=0.25, wpb=7790.9, bsz=120, num_updates=26120, lr=1.81178e-05, gnorm=1.033, clip=70, loss_scale=64, train_wall=40, gb_free=30.8, wall=107066 2023-05-02 08:18:13 - progress_bar.py[line:274] - INFO: epoch 005: 2006 / 6042 
loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7547.7, nsentences=120, sample_size=3863.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1950.3, ups=0.26, wpb=7547.7, bsz=120, num_updates=26130, lr=1.81125e-05, gnorm=1.003, clip=50, loss_scale=64, train_wall=39, gb_free=30.3, wall=107105 2023-05-02 08:18:53 - progress_bar.py[line:274] - INFO: epoch 005: 2016 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7592.4, nsentences=120, sample_size=4296.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1886.1, ups=0.25, wpb=7592.4, bsz=120, num_updates=26140, lr=1.81072e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=31.1, wall=107145 2023-05-02 08:19:33 - progress_bar.py[line:274] - INFO: epoch 005: 2026 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7948.1, nsentences=120, sample_size=3874.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1984.1, ups=0.25, wpb=7948.1, bsz=120, num_updates=26150, lr=1.81019e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=107185 2023-05-02 08:20:12 - progress_bar.py[line:274] - INFO: epoch 005: 2036 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7669.1, nsentences=120, sample_size=4330.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1972.1, ups=0.26, wpb=7669.1, bsz=120, num_updates=26160, lr=1.80967e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=107224 2023-05-02 08:20:52 - progress_bar.py[line:274] - INFO: epoch 005: 2046 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7963, nsentences=120, sample_size=3927.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1973.9, ups=0.25, wpb=7963, bsz=120, num_updates=26170, lr=1.80914e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=29.3, wall=107265 2023-05-02 08:21:32 - progress_bar.py[line:274] - INFO: epoch 005: 2056 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7800.1, nsentences=120, 
sample_size=4098.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1975.9, ups=0.25, wpb=7800.1, bsz=120, num_updates=26180, lr=1.80861e-05, gnorm=0.992, clip=50, loss_scale=64, train_wall=39, gb_free=30.4, wall=107304 2023-05-02 08:22:12 - progress_bar.py[line:274] - INFO: epoch 005: 2066 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7654, nsentences=120, sample_size=4022.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1910.8, ups=0.25, wpb=7654, bsz=120, num_updates=26190, lr=1.80808e-05, gnorm=0.99, clip=50, loss_scale=64, train_wall=40, gb_free=30.7, wall=107344 2023-05-02 08:22:51 - progress_bar.py[line:274] - INFO: epoch 005: 2076 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7502.2, nsentences=120, sample_size=4057.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1892.9, ups=0.25, wpb=7502.2, bsz=120, num_updates=26200, lr=1.80755e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=107384 2023-05-02 08:23:30 - progress_bar.py[line:274] - INFO: epoch 005: 2086 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7753.3, nsentences=120, sample_size=3974.4, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1993.2, ups=0.26, wpb=7753.3, bsz=120, num_updates=26210, lr=1.80703e-05, gnorm=0.98, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=107423 2023-05-02 08:24:10 - progress_bar.py[line:274] - INFO: epoch 005: 2096 / 6042 loss=2.455, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7862.3, nsentences=120, sample_size=3909.5, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1989.8, ups=0.25, wpb=7862.3, bsz=120, num_updates=26220, lr=1.8065e-05, gnorm=0.99, clip=40, loss_scale=64, train_wall=39, gb_free=29.2, wall=107462 2023-05-02 08:24:49 - progress_bar.py[line:274] - INFO: epoch 005: 2106 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7739, nsentences=120, sample_size=4204.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1943, ups=0.25, wpb=7739, 
bsz=120, num_updates=26230, lr=1.80597e-05, gnorm=0.943, clip=0, loss_scale=64, train_wall=40, gb_free=30.4, wall=107502 2023-05-02 08:25:29 - progress_bar.py[line:274] - INFO: epoch 005: 2116 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7727.1, nsentences=120, sample_size=4155.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1965.6, ups=0.25, wpb=7727.1, bsz=120, num_updates=26240, lr=1.80544e-05, gnorm=0.956, clip=10, loss_scale=64, train_wall=39, gb_free=29.9, wall=107541 2023-05-02 08:26:08 - progress_bar.py[line:274] - INFO: epoch 005: 2126 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7514.4, nsentences=120, sample_size=4092.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1938.9, ups=0.26, wpb=7514.4, bsz=120, num_updates=26250, lr=1.80491e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=39, gb_free=28, wall=107580 2023-05-02 08:26:47 - progress_bar.py[line:274] - INFO: epoch 005: 2136 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7745.2, nsentences=120, sample_size=3936.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1942.3, ups=0.25, wpb=7745.2, bsz=120, num_updates=26260, lr=1.80438e-05, gnorm=0.966, clip=40, loss_scale=64, train_wall=40, gb_free=31.1, wall=107620 2023-05-02 08:27:27 - progress_bar.py[line:274] - INFO: epoch 005: 2146 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7782.6, nsentences=120, sample_size=3980.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1968.5, ups=0.25, wpb=7782.6, bsz=120, num_updates=26270, lr=1.80386e-05, gnorm=0.975, clip=30, loss_scale=128, train_wall=39, gb_free=30.2, wall=107659 2023-05-02 08:27:47 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 08:28:11 - progress_bar.py[line:274] - INFO: epoch 005: 2157 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7655.1, nsentences=120, sample_size=4080.1, sample_size_v1=0, 
sample_size_v2=0, ppl=2.24, wps=1726.3, ups=0.23, wpb=7655.1, bsz=120, num_updates=26280, lr=1.80333e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=44, gb_free=29.9, wall=107704 2023-05-02 08:28:51 - progress_bar.py[line:274] - INFO: epoch 005: 2167 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7568.5, nsentences=120, sample_size=3985.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1927.6, ups=0.25, wpb=7568.5, bsz=120, num_updates=26290, lr=1.8028e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=39, gb_free=29, wall=107743 2023-05-02 08:29:30 - progress_bar.py[line:274] - INFO: epoch 005: 2177 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7881.4, nsentences=120, sample_size=3898.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1988.7, ups=0.25, wpb=7881.4, bsz=120, num_updates=26300, lr=1.80227e-05, gnorm=0.974, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=107783 2023-05-02 08:30:10 - progress_bar.py[line:274] - INFO: epoch 005: 2187 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7660.5, nsentences=120, sample_size=4146.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1917.8, ups=0.25, wpb=7660.5, bsz=120, num_updates=26310, lr=1.80174e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=107823 2023-05-02 08:30:50 - progress_bar.py[line:274] - INFO: epoch 005: 2197 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7529.3, nsentences=120, sample_size=3864.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1910.1, ups=0.25, wpb=7529.3, bsz=120, num_updates=26320, lr=1.80121e-05, gnorm=0.981, clip=50, loss_scale=64, train_wall=39, gb_free=30.7, wall=107862 2023-05-02 08:31:30 - progress_bar.py[line:274] - INFO: epoch 005: 2207 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7449.4, nsentences=120, sample_size=4183.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1861.5, ups=0.25, wpb=7449.4, bsz=120, num_updates=26330, 
lr=1.80069e-05, gnorm=0.932, clip=20, loss_scale=64, train_wall=40, gb_free=28.3, wall=107902 2023-05-02 08:32:09 - progress_bar.py[line:274] - INFO: epoch 005: 2217 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7940.1, nsentences=120, sample_size=4151.2, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=2002.3, ups=0.25, wpb=7940.1, bsz=120, num_updates=26340, lr=1.80016e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=40, gb_free=29.2, wall=107942 2023-05-02 08:32:49 - progress_bar.py[line:274] - INFO: epoch 005: 2227 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7946.8, nsentences=120, sample_size=4389.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2000.9, ups=0.25, wpb=7946.8, bsz=120, num_updates=26350, lr=1.79963e-05, gnorm=0.948, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=107981 2023-05-02 08:33:28 - progress_bar.py[line:274] - INFO: epoch 005: 2237 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7530.9, nsentences=120, sample_size=4127.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1908.3, ups=0.25, wpb=7530.9, bsz=120, num_updates=26360, lr=1.7991e-05, gnorm=0.935, clip=20, loss_scale=64, train_wall=39, gb_free=26.3, wall=108021 2023-05-02 08:34:08 - progress_bar.py[line:274] - INFO: epoch 005: 2247 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7610.5, nsentences=120, sample_size=4016.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1923.7, ups=0.25, wpb=7610.5, bsz=120, num_updates=26370, lr=1.79857e-05, gnorm=0.945, clip=30, loss_scale=64, train_wall=39, gb_free=29.7, wall=108060 2023-05-02 08:34:48 - progress_bar.py[line:274] - INFO: epoch 005: 2257 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7869.3, nsentences=120, sample_size=3925.9, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1960.3, ups=0.25, wpb=7869.3, bsz=120, num_updates=26380, lr=1.79805e-05, gnorm=0.96, clip=30, loss_scale=64, train_wall=40, gb_free=30.9, 
wall=108101 2023-05-02 08:35:28 - progress_bar.py[line:274] - INFO: epoch 005: 2267 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7734.5, nsentences=120, sample_size=4069.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1961.4, ups=0.25, wpb=7734.5, bsz=120, num_updates=26390, lr=1.79752e-05, gnorm=0.983, clip=60, loss_scale=64, train_wall=39, gb_free=30, wall=108140 2023-05-02 08:36:08 - progress_bar.py[line:274] - INFO: epoch 005: 2277 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=8007.6, nsentences=120, sample_size=4050.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1992.8, ups=0.25, wpb=8007.6, bsz=120, num_updates=26400, lr=1.79699e-05, gnorm=0.949, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=108180 2023-05-02 08:36:48 - progress_bar.py[line:274] - INFO: epoch 005: 2287 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7822, nsentences=120, sample_size=4131.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1935.8, ups=0.25, wpb=7822, bsz=120, num_updates=26410, lr=1.79646e-05, gnorm=0.937, clip=0, loss_scale=64, train_wall=40, gb_free=29.4, wall=108221 2023-05-02 08:37:27 - progress_bar.py[line:274] - INFO: epoch 005: 2297 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.066, ntokens=7703.4, nsentences=120, sample_size=3890.3, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1979.3, ups=0.26, wpb=7703.4, bsz=120, num_updates=26420, lr=1.79593e-05, gnorm=0.966, clip=20, loss_scale=64, train_wall=39, gb_free=30.5, wall=108260 2023-05-02 08:38:07 - progress_bar.py[line:274] - INFO: epoch 005: 2307 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7779.3, nsentences=120, sample_size=4086.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1954, ups=0.25, wpb=7779.3, bsz=120, num_updates=26430, lr=1.7954e-05, gnorm=0.965, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=108299 2023-05-02 08:38:46 - progress_bar.py[line:274] - INFO: epoch 005: 2317 / 6042 
loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7600.6, nsentences=120, sample_size=3921.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1931.9, ups=0.25, wpb=7600.6, bsz=120, num_updates=26440, lr=1.79488e-05, gnorm=0.995, clip=40, loss_scale=64, train_wall=39, gb_free=29.6, wall=108339 2023-05-02 08:39:26 - progress_bar.py[line:274] - INFO: epoch 005: 2327 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7812.4, nsentences=120, sample_size=4055.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1950.9, ups=0.25, wpb=7812.4, bsz=120, num_updates=26450, lr=1.79435e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=108379 2023-05-02 08:40:06 - progress_bar.py[line:274] - INFO: epoch 005: 2337 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7760.5, nsentences=120, sample_size=4107.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1949.4, ups=0.25, wpb=7760.5, bsz=120, num_updates=26460, lr=1.79382e-05, gnorm=0.95, clip=10, loss_scale=64, train_wall=40, gb_free=28.7, wall=108419 2023-05-02 08:40:45 - progress_bar.py[line:274] - INFO: epoch 005: 2347 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7809.3, nsentences=120, sample_size=3991.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1986.4, ups=0.25, wpb=7809.3, bsz=120, num_updates=26470, lr=1.79329e-05, gnorm=0.965, clip=40, loss_scale=64, train_wall=39, gb_free=30.9, wall=108458 2023-05-02 08:41:26 - progress_bar.py[line:274] - INFO: epoch 005: 2357 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7897, nsentences=120, sample_size=4180.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1929.7, ups=0.24, wpb=7897, bsz=120, num_updates=26480, lr=1.79276e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=41, gb_free=29.8, wall=108499 2023-05-02 08:42:07 - progress_bar.py[line:274] - INFO: epoch 005: 2367 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7786.5, nsentences=120, 
sample_size=4215.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1939.8, ups=0.25, wpb=7786.5, bsz=120, num_updates=26490, lr=1.79224e-05, gnorm=0.932, clip=0, loss_scale=64, train_wall=40, gb_free=30, wall=108539 2023-05-02 08:42:46 - progress_bar.py[line:274] - INFO: epoch 005: 2377 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7470, nsentences=120, sample_size=4176.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1882.3, ups=0.25, wpb=7470, bsz=120, num_updates=26500, lr=1.79171e-05, gnorm=0.927, clip=0, loss_scale=64, train_wall=40, gb_free=30.2, wall=108579 2023-05-02 08:43:25 - progress_bar.py[line:274] - INFO: epoch 005: 2387 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7644.4, nsentences=120, sample_size=4122, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1947.1, ups=0.25, wpb=7644.4, bsz=120, num_updates=26510, lr=1.79118e-05, gnorm=0.93, clip=0, loss_scale=64, train_wall=39, gb_free=28.3, wall=108618 2023-05-02 08:44:05 - progress_bar.py[line:274] - INFO: epoch 005: 2397 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.07, ntokens=7473.8, nsentences=120, sample_size=4027, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1888.1, ups=0.25, wpb=7473.8, bsz=120, num_updates=26520, lr=1.79065e-05, gnorm=0.97, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=108658 2023-05-02 08:44:45 - progress_bar.py[line:274] - INFO: epoch 005: 2407 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7906.4, nsentences=120, sample_size=4372.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1986.7, ups=0.25, wpb=7906.4, bsz=120, num_updates=26530, lr=1.79012e-05, gnorm=0.96, clip=10, loss_scale=64, train_wall=40, gb_free=28.2, wall=108697 2023-05-02 08:45:24 - progress_bar.py[line:274] - INFO: epoch 005: 2417 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=8026.4, nsentences=120, sample_size=3977.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2043.4, ups=0.25, wpb=8026.4, 
bsz=120, num_updates=26540, lr=1.78959e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=108737 2023-05-02 08:46:04 - progress_bar.py[line:274] - INFO: epoch 005: 2427 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7549, nsentences=120, sample_size=3988.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1911.5, ups=0.25, wpb=7549, bsz=120, num_updates=26550, lr=1.78907e-05, gnorm=0.932, clip=0, loss_scale=64, train_wall=39, gb_free=30, wall=108776 2023-05-02 08:46:43 - progress_bar.py[line:274] - INFO: epoch 005: 2437 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7739.7, nsentences=120, sample_size=3925.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1952.2, ups=0.25, wpb=7739.7, bsz=120, num_updates=26560, lr=1.78854e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=108816 2023-05-02 08:47:24 - progress_bar.py[line:274] - INFO: epoch 005: 2447 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7726.5, nsentences=120, sample_size=3737.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1895.9, ups=0.25, wpb=7726.5, bsz=120, num_updates=26570, lr=1.78801e-05, gnorm=1.002, clip=60, loss_scale=64, train_wall=41, gb_free=29.5, wall=108856 2023-05-02 08:48:03 - progress_bar.py[line:274] - INFO: epoch 005: 2457 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7696.6, nsentences=120, sample_size=4221.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1981.8, ups=0.26, wpb=7696.6, bsz=120, num_updates=26580, lr=1.78748e-05, gnorm=0.94, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=108895 2023-05-02 08:48:43 - progress_bar.py[line:274] - INFO: epoch 005: 2467 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7953.6, nsentences=120, sample_size=3974.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1989.4, ups=0.25, wpb=7953.6, bsz=120, num_updates=26590, lr=1.78695e-05, gnorm=0.951, clip=40, loss_scale=64, train_wall=40, 
gb_free=26.8, wall=108935 2023-05-02 08:49:22 - progress_bar.py[line:274] - INFO: epoch 005: 2477 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=8027.2, nsentences=120, sample_size=3874.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2065.5, ups=0.26, wpb=8027.2, bsz=120, num_updates=26600, lr=1.78642e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=39, gb_free=29.9, wall=108974 2023-05-02 08:50:02 - progress_bar.py[line:274] - INFO: epoch 005: 2487 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7767.8, nsentences=120, sample_size=4316.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1947.4, ups=0.25, wpb=7767.8, bsz=120, num_updates=26610, lr=1.7859e-05, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=109014 2023-05-02 08:50:41 - progress_bar.py[line:274] - INFO: epoch 005: 2497 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7435.6, nsentences=120, sample_size=4294.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1887.2, ups=0.25, wpb=7435.6, bsz=120, num_updates=26620, lr=1.78537e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=109053 2023-05-02 08:51:21 - progress_bar.py[line:274] - INFO: epoch 005: 2507 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7938.9, nsentences=120, sample_size=4262.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1971.5, ups=0.25, wpb=7938.9, bsz=120, num_updates=26630, lr=1.78484e-05, gnorm=0.924, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=109094 2023-05-02 08:52:01 - progress_bar.py[line:274] - INFO: epoch 005: 2517 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=8112.9, nsentences=120, sample_size=3961.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2017.4, ups=0.25, wpb=8112.9, bsz=120, num_updates=26640, lr=1.78431e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=109134 2023-05-02 08:52:41 - progress_bar.py[line:274] - INFO: 
epoch 005: 2527 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7995.3, nsentences=120, sample_size=3995.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2011.7, ups=0.25, wpb=7995.3, bsz=120, num_updates=26650, lr=1.78378e-05, gnorm=0.949, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=109174 2023-05-02 08:53:21 - progress_bar.py[line:274] - INFO: epoch 005: 2537 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7478.3, nsentences=120, sample_size=4603.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1882.8, ups=0.25, wpb=7478.3, bsz=120, num_updates=26660, lr=1.78326e-05, gnorm=0.903, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=109213 2023-05-02 08:54:00 - progress_bar.py[line:274] - INFO: epoch 005: 2547 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7682, nsentences=120, sample_size=4166, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1945.8, ups=0.25, wpb=7682, bsz=120, num_updates=26670, lr=1.78273e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=109253 2023-05-02 08:54:40 - progress_bar.py[line:274] - INFO: epoch 005: 2557 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7811, nsentences=120, sample_size=4088.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1983.4, ups=0.25, wpb=7811, bsz=120, num_updates=26680, lr=1.7822e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=109292 2023-05-02 08:55:20 - progress_bar.py[line:274] - INFO: epoch 005: 2567 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7836.5, nsentences=120, sample_size=3933.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1972, ups=0.25, wpb=7836.5, bsz=120, num_updates=26690, lr=1.78167e-05, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=109332 2023-05-02 08:55:59 - progress_bar.py[line:274] - INFO: epoch 005: 2577 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7667.6, 
nsentences=120, sample_size=3857.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1952.1, ups=0.25, wpb=7667.6, bsz=120, num_updates=26700, lr=1.78114e-05, gnorm=0.985, clip=50, loss_scale=64, train_wall=39, gb_free=28, wall=109371 2023-05-02 08:56:39 - progress_bar.py[line:274] - INFO: epoch 005: 2587 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7703.2, nsentences=120, sample_size=4111.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1940.9, ups=0.25, wpb=7703.2, bsz=120, num_updates=26710, lr=1.78061e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=109411 2023-05-02 08:57:19 - progress_bar.py[line:274] - INFO: epoch 005: 2597 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7746.8, nsentences=120, sample_size=3844.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1926.7, ups=0.25, wpb=7746.8, bsz=120, num_updates=26720, lr=1.78009e-05, gnorm=0.951, clip=20, loss_scale=64, train_wall=40, gb_free=29.2, wall=109451 2023-05-02 08:57:58 - progress_bar.py[line:274] - INFO: epoch 005: 2607 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7681, nsentences=120, sample_size=4278.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1947.2, ups=0.25, wpb=7681, bsz=120, num_updates=26730, lr=1.77956e-05, gnorm=0.95, clip=0, loss_scale=64, train_wall=39, gb_free=31.4, wall=109491 2023-05-02 08:58:37 - progress_bar.py[line:274] - INFO: epoch 005: 2617 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7696.3, nsentences=120, sample_size=3890.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1974.3, ups=0.26, wpb=7696.3, bsz=120, num_updates=26740, lr=1.77903e-05, gnorm=0.969, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=109530 2023-05-02 08:59:17 - progress_bar.py[line:274] - INFO: epoch 005: 2627 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7709.9, nsentences=120, sample_size=4314.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1938.8, 
ups=0.25, wpb=7709.9, bsz=120, num_updates=26750, lr=1.7785e-05, gnorm=0.968, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=109569 2023-05-02 08:59:57 - progress_bar.py[line:274] - INFO: epoch 005: 2637 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7701.4, nsentences=120, sample_size=4179.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1916.2, ups=0.25, wpb=7701.4, bsz=120, num_updates=26760, lr=1.77797e-05, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=109610 2023-05-02 09:00:37 - progress_bar.py[line:274] - INFO: epoch 005: 2647 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7591.9, nsentences=120, sample_size=3901.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1881.8, ups=0.25, wpb=7591.9, bsz=120, num_updates=26770, lr=1.77745e-05, gnorm=0.983, clip=30, loss_scale=64, train_wall=40, gb_free=31.1, wall=109650 2023-05-02 09:01:16 - progress_bar.py[line:274] - INFO: epoch 005: 2657 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7770.9, nsentences=120, sample_size=3993.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1992.3, ups=0.26, wpb=7770.9, bsz=120, num_updates=26780, lr=1.77692e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=39, gb_free=30.7, wall=109689 2023-05-02 09:01:56 - progress_bar.py[line:274] - INFO: epoch 005: 2667 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7701.3, nsentences=120, sample_size=3601.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1960, ups=0.25, wpb=7701.3, bsz=120, num_updates=26790, lr=1.77639e-05, gnorm=1.015, clip=70, loss_scale=128, train_wall=39, gb_free=29.2, wall=109728 2023-05-02 09:02:08 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 09:02:39 - progress_bar.py[line:274] - INFO: epoch 005: 2678 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7561, nsentences=120, sample_size=4091.7, 
sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1734.9, ups=0.23, wpb=7561, bsz=120, num_updates=26800, lr=1.77586e-05, gnorm=1.029, clip=60, loss_scale=64, train_wall=44, gb_free=30.9, wall=109772 2023-05-02 09:03:19 - progress_bar.py[line:274] - INFO: epoch 005: 2688 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7892.4, nsentences=120, sample_size=3752.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1995.3, ups=0.25, wpb=7892.4, bsz=120, num_updates=26810, lr=1.77533e-05, gnorm=1.025, clip=70, loss_scale=64, train_wall=39, gb_free=31.3, wall=109811 2023-05-02 09:03:59 - progress_bar.py[line:274] - INFO: epoch 005: 2698 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7978.9, nsentences=120, sample_size=4154.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1982.2, ups=0.25, wpb=7978.9, bsz=120, num_updates=26820, lr=1.7748e-05, gnorm=0.93, clip=20, loss_scale=64, train_wall=40, gb_free=23.6, wall=109852 2023-05-02 09:04:39 - progress_bar.py[line:274] - INFO: epoch 005: 2708 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7638.2, nsentences=120, sample_size=3991.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1910, ups=0.25, wpb=7638.2, bsz=120, num_updates=26830, lr=1.77428e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=109892 2023-05-02 09:05:18 - progress_bar.py[line:274] - INFO: epoch 005: 2718 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7898.9, nsentences=120, sample_size=3960.2, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=2028.2, ups=0.26, wpb=7898.9, bsz=120, num_updates=26840, lr=1.77375e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=109931 2023-05-02 09:05:58 - progress_bar.py[line:274] - INFO: epoch 005: 2728 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7934.6, nsentences=120, sample_size=4114.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2000.4, ups=0.25, wpb=7934.6, bsz=120, 
num_updates=26850, lr=1.77322e-05, gnorm=0.978, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=109970 2023-05-02 09:06:38 - progress_bar.py[line:274] - INFO: epoch 005: 2738 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7825.6, nsentences=120, sample_size=3824.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1969.2, ups=0.25, wpb=7825.6, bsz=120, num_updates=26860, lr=1.77269e-05, gnorm=0.966, clip=40, loss_scale=64, train_wall=40, gb_free=29.4, wall=110010 2023-05-02 09:07:17 - progress_bar.py[line:274] - INFO: epoch 005: 2748 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7759, nsentences=120, sample_size=4142, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1985.8, ups=0.26, wpb=7759, bsz=120, num_updates=26870, lr=1.77216e-05, gnorm=0.954, clip=30, loss_scale=64, train_wall=39, gb_free=29.4, wall=110049 2023-05-02 09:07:57 - progress_bar.py[line:274] - INFO: epoch 005: 2758 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.063, ntokens=7518.6, nsentences=120, sample_size=4022.5, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1844, ups=0.25, wpb=7518.6, bsz=120, num_updates=26880, lr=1.77163e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=41, gb_free=29.9, wall=110090 2023-05-02 09:08:37 - progress_bar.py[line:274] - INFO: epoch 005: 2768 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7517.6, nsentences=120, sample_size=4150.1, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1915.3, ups=0.25, wpb=7517.6, bsz=120, num_updates=26890, lr=1.77111e-05, gnorm=0.962, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=110129 2023-05-02 09:09:16 - progress_bar.py[line:274] - INFO: epoch 005: 2778 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7752.1, nsentences=120, sample_size=4069.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1951.2, ups=0.25, wpb=7752.1, bsz=120, num_updates=26900, lr=1.77058e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, 
gb_free=29.7, wall=110169 2023-05-02 09:09:56 - progress_bar.py[line:274] - INFO: epoch 005: 2788 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7590, nsentences=120, sample_size=3974.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1927.6, ups=0.25, wpb=7590, bsz=120, num_updates=26910, lr=1.77005e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=39, gb_free=30.4, wall=110208 2023-05-02 09:10:36 - progress_bar.py[line:274] - INFO: epoch 005: 2798 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7916.6, nsentences=120, sample_size=4037.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1989, ups=0.25, wpb=7916.6, bsz=120, num_updates=26920, lr=1.76952e-05, gnorm=0.921, clip=0, loss_scale=64, train_wall=40, gb_free=30.1, wall=110248 2023-05-02 09:11:15 - progress_bar.py[line:274] - INFO: epoch 005: 2808 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7643.2, nsentences=120, sample_size=4310.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1925.8, ups=0.25, wpb=7643.2, bsz=120, num_updates=26930, lr=1.76899e-05, gnorm=0.922, clip=0, loss_scale=64, train_wall=40, gb_free=26.4, wall=110288 2023-05-02 09:11:55 - progress_bar.py[line:274] - INFO: epoch 005: 2818 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7556, nsentences=120, sample_size=4253.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1917.6, ups=0.25, wpb=7556, bsz=120, num_updates=26940, lr=1.76847e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=39, gb_free=28.3, wall=110327 2023-05-02 09:12:35 - progress_bar.py[line:274] - INFO: epoch 005: 2828 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=8007.4, nsentences=120, sample_size=3904.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1997.7, ups=0.25, wpb=8007.4, bsz=120, num_updates=26950, lr=1.76794e-05, gnorm=0.955, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=110367 2023-05-02 09:12:50 - trainer.py[line:922] - INFO: NOTE: gradient 
overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-02 09:13:18 - progress_bar.py[line:274] - INFO: epoch 005: 2839 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7543.6, nsentences=120, sample_size=4441.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1725.5, ups=0.23, wpb=7543.6, bsz=120, num_updates=26960, lr=1.76741e-05, gnorm=0.956, clip=20, loss_scale=32, train_wall=44, gb_free=29.2, wall=110411 2023-05-02 09:13:58 - progress_bar.py[line:274] - INFO: epoch 005: 2849 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7588.9, nsentences=120, sample_size=3984, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1898.6, ups=0.25, wpb=7588.9, bsz=120, num_updates=26970, lr=1.76688e-05, gnorm=1.003, clip=50, loss_scale=32, train_wall=40, gb_free=28.7, wall=110451 2023-05-02 09:14:39 - progress_bar.py[line:274] - INFO: epoch 005: 2859 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7435.8, nsentences=120, sample_size=4105.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1832.9, ups=0.25, wpb=7435.8, bsz=120, num_updates=26980, lr=1.76635e-05, gnorm=0.981, clip=30, loss_scale=32, train_wall=40, gb_free=29.5, wall=110491 2023-05-02 09:15:19 - progress_bar.py[line:274] - INFO: epoch 005: 2869 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7590.3, nsentences=120, sample_size=4132.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1904.4, ups=0.25, wpb=7590.3, bsz=120, num_updates=26990, lr=1.76582e-05, gnorm=0.968, clip=20, loss_scale=32, train_wall=40, gb_free=30.1, wall=110531 2023-05-02 09:16:00 - progress_bar.py[line:274] - INFO: epoch 005: 2879 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7844.1, nsentences=120, sample_size=3845.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1919.2, ups=0.24, wpb=7844.1, bsz=120, num_updates=27000, lr=1.7653e-05, gnorm=0.998, clip=50, loss_scale=32, train_wall=41, gb_free=30.9, wall=110572 2023-05-02 09:16:00 - 
train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 09:16:02 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 09:16:02 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 09:16:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 
09:16:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:18 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 09:16:18 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 09:16:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 
09:16:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:30 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 09:16:30 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 09:16:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 
09:16:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:42 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 09:16:42 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 09:16:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:46 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 09:16:46 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 09:16:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:51 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 09:16:51 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 09:16:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 09:16:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 09:16:51 - progress_bar.py[line:282] - INFO: epoch 005 | valid on 'valid' subset | loss 3.215 | loss_v1 0 | loss_v2 0 | nll_loss 2.046 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.13 | score 0.7544 | wps 3309.4 | wpb 3202.1 | bsz 39.4 | num_updates 27000 | best_score 0.7598 2023-05-02 09:16:51 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 5 @ 27000 updates 2023-05-02 09:16:51 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_27000.pt 2023-05-02 09:17:15 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_27000.pt 2023-05-02 09:17:44 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_27000.pt (epoch 5 @ 27000 updates, 
score 0.7544) (writing took 52.575995091116056 seconds) 2023-05-02 09:18:23 - progress_bar.py[line:274] - INFO: epoch 005: 2889 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7763.1, nsentences=120, sample_size=4066.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=542.3, ups=0.07, wpb=7763.1, bsz=120, num_updates=27010, lr=1.76477e-05, gnorm=0.95, clip=20, loss_scale=32, train_wall=39, gb_free=30.5, wall=110715 2023-05-02 09:19:03 - progress_bar.py[line:274] - INFO: epoch 005: 2899 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7565.6, nsentences=120, sample_size=4370.7, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1872.8, ups=0.25, wpb=7565.6, bsz=120, num_updates=27020, lr=1.76424e-05, gnorm=0.957, clip=20, loss_scale=32, train_wall=40, gb_free=29.7, wall=110756 2023-05-02 09:19:43 - progress_bar.py[line:274] - INFO: epoch 005: 2909 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7777.8, nsentences=120, sample_size=4220.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1962.5, ups=0.25, wpb=7777.8, bsz=120, num_updates=27030, lr=1.76371e-05, gnorm=0.94, clip=10, loss_scale=32, train_wall=40, gb_free=30.6, wall=110795 2023-05-02 09:20:23 - progress_bar.py[line:274] - INFO: epoch 005: 2919 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7631, nsentences=120, sample_size=3823.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1900.9, ups=0.25, wpb=7631, bsz=120, num_updates=27040, lr=1.76318e-05, gnorm=0.989, clip=50, loss_scale=32, train_wall=40, gb_free=29.7, wall=110835 2023-05-02 09:21:04 - progress_bar.py[line:274] - INFO: epoch 005: 2929 / 6042 loss=2.464, loss_v1=0, loss_v2=0, nll_loss=1.221, ntokens=7707.9, nsentences=120, sample_size=4035, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1890.4, ups=0.25, wpb=7707.9, bsz=120, num_updates=27050, lr=1.76266e-05, gnorm=0.924, clip=10, loss_scale=32, train_wall=41, gb_free=29.6, wall=110876 2023-05-02 09:21:43 - 
progress_bar.py[line:274] - INFO: epoch 005: 2939 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7406.3, nsentences=120, sample_size=4081.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1894, ups=0.26, wpb=7406.3, bsz=120, num_updates=27060, lr=1.76213e-05, gnorm=0.951, clip=40, loss_scale=32, train_wall=39, gb_free=30, wall=110915 2023-05-02 09:22:23 - progress_bar.py[line:274] - INFO: epoch 005: 2949 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7688.3, nsentences=120, sample_size=4281.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1920, ups=0.25, wpb=7688.3, bsz=120, num_updates=27070, lr=1.7616e-05, gnorm=0.919, clip=20, loss_scale=32, train_wall=40, gb_free=29.6, wall=110955 2023-05-02 09:23:03 - progress_bar.py[line:274] - INFO: epoch 005: 2959 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7617.8, nsentences=120, sample_size=3877.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1908.7, ups=0.25, wpb=7617.8, bsz=120, num_updates=27080, lr=1.76107e-05, gnorm=0.999, clip=60, loss_scale=32, train_wall=40, gb_free=29.5, wall=110995 2023-05-02 09:23:44 - progress_bar.py[line:274] - INFO: epoch 005: 2969 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7740.4, nsentences=120, sample_size=4158.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1893.3, ups=0.24, wpb=7740.4, bsz=120, num_updates=27090, lr=1.76054e-05, gnorm=0.955, clip=30, loss_scale=32, train_wall=41, gb_free=30.6, wall=111036 2023-05-02 09:24:23 - progress_bar.py[line:274] - INFO: epoch 005: 2979 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7742.9, nsentences=120, sample_size=4070, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1965.1, ups=0.25, wpb=7742.9, bsz=120, num_updates=27100, lr=1.76001e-05, gnorm=0.949, clip=20, loss_scale=32, train_wall=39, gb_free=29.1, wall=111076 2023-05-02 09:25:03 - progress_bar.py[line:274] - INFO: epoch 005: 2989 / 6042 loss=2.431, loss_v1=0, loss_v2=0, 
nll_loss=1.179, ntokens=7473.4, nsentences=120, sample_size=3925.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1852.2, ups=0.25, wpb=7473.4, bsz=120, num_updates=27110, lr=1.75949e-05, gnorm=1.007, clip=40, loss_scale=32, train_wall=40, gb_free=26, wall=111116 2023-05-02 09:25:43 - progress_bar.py[line:274] - INFO: epoch 005: 2999 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7902.4, nsentences=120, sample_size=4079.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1998.9, ups=0.25, wpb=7902.4, bsz=120, num_updates=27120, lr=1.75896e-05, gnorm=0.945, clip=30, loss_scale=32, train_wall=39, gb_free=29.9, wall=111155 2023-05-02 09:26:23 - progress_bar.py[line:274] - INFO: epoch 005: 3009 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7394.6, nsentences=120, sample_size=4155.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1853.8, ups=0.25, wpb=7394.6, bsz=120, num_updates=27130, lr=1.75843e-05, gnorm=0.955, clip=20, loss_scale=32, train_wall=40, gb_free=28.6, wall=111195 2023-05-02 09:27:02 - progress_bar.py[line:274] - INFO: epoch 005: 3019 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7549.5, nsentences=120, sample_size=4256.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1913.2, ups=0.25, wpb=7549.5, bsz=120, num_updates=27140, lr=1.7579e-05, gnorm=0.943, clip=20, loss_scale=32, train_wall=39, gb_free=29.1, wall=111235 2023-05-02 09:27:43 - progress_bar.py[line:274] - INFO: epoch 005: 3029 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7759.8, nsentences=120, sample_size=4056.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1924.4, ups=0.25, wpb=7759.8, bsz=120, num_updates=27150, lr=1.75737e-05, gnorm=0.95, clip=30, loss_scale=32, train_wall=40, gb_free=29.7, wall=111275 2023-05-02 09:28:23 - progress_bar.py[line:274] - INFO: epoch 005: 3039 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7915.1, nsentences=120, sample_size=4073.5, sample_size_v1=0, 
sample_size_v2=0, ppl=2.26, wps=1947.2, ups=0.25, wpb=7915.1, bsz=120, num_updates=27160, lr=1.75684e-05, gnorm=0.928, clip=0, loss_scale=32, train_wall=41, gb_free=30.9, wall=111316 2023-05-02 09:29:04 - progress_bar.py[line:274] - INFO: epoch 005: 3049 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7780.9, nsentences=120, sample_size=3885.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1905.3, ups=0.24, wpb=7780.9, bsz=120, num_updates=27170, lr=1.75632e-05, gnorm=0.979, clip=50, loss_scale=32, train_wall=41, gb_free=29.7, wall=111357 2023-05-02 09:29:44 - progress_bar.py[line:274] - INFO: epoch 005: 3059 / 6042 loss=2.469, loss_v1=0, loss_v2=0, nll_loss=1.219, ntokens=7780.4, nsentences=120, sample_size=4207.1, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1943.5, ups=0.25, wpb=7780.4, bsz=120, num_updates=27180, lr=1.75579e-05, gnorm=0.965, clip=50, loss_scale=32, train_wall=40, gb_free=29.4, wall=111397 2023-05-02 09:30:24 - progress_bar.py[line:274] - INFO: epoch 005: 3069 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7677.8, nsentences=120, sample_size=3931.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1909.2, ups=0.25, wpb=7677.8, bsz=120, num_updates=27190, lr=1.75526e-05, gnorm=0.94, clip=10, loss_scale=32, train_wall=40, gb_free=29.5, wall=111437 2023-05-02 09:31:04 - progress_bar.py[line:274] - INFO: epoch 005: 3079 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7638, nsentences=120, sample_size=4113.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1921.6, ups=0.25, wpb=7638, bsz=120, num_updates=27200, lr=1.75473e-05, gnorm=0.953, clip=30, loss_scale=32, train_wall=40, gb_free=27.2, wall=111477 2023-05-02 09:31:45 - progress_bar.py[line:274] - INFO: epoch 005: 3089 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7749.4, nsentences=120, sample_size=4112.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1904.6, ups=0.25, wpb=7749.4, bsz=120, num_updates=27210, 
lr=1.7542e-05, gnorm=0.961, clip=30, loss_scale=32, train_wall=41, gb_free=28.8, wall=111517 2023-05-02 09:32:24 - progress_bar.py[line:274] - INFO: epoch 005: 3099 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7594.2, nsentences=120, sample_size=3981.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1933.2, ups=0.25, wpb=7594.2, bsz=120, num_updates=27220, lr=1.75368e-05, gnorm=0.968, clip=40, loss_scale=32, train_wall=39, gb_free=30.9, wall=111557 2023-05-02 09:33:04 - progress_bar.py[line:274] - INFO: epoch 005: 3109 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7602.5, nsentences=120, sample_size=3932.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1917.7, ups=0.25, wpb=7602.5, bsz=120, num_updates=27230, lr=1.75315e-05, gnorm=1.011, clip=60, loss_scale=32, train_wall=40, gb_free=29.4, wall=111596 2023-05-02 09:33:44 - progress_bar.py[line:274] - INFO: epoch 005: 3119 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7820, nsentences=120, sample_size=4114.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1929.1, ups=0.25, wpb=7820, bsz=120, num_updates=27240, lr=1.75262e-05, gnorm=0.944, clip=30, loss_scale=32, train_wall=40, gb_free=30.6, wall=111637 2023-05-02 09:34:25 - progress_bar.py[line:274] - INFO: epoch 005: 3129 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7951.1, nsentences=120, sample_size=4125.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1973.5, ups=0.25, wpb=7951.1, bsz=120, num_updates=27250, lr=1.75209e-05, gnorm=0.944, clip=20, loss_scale=32, train_wall=40, gb_free=30, wall=111677 2023-05-02 09:35:04 - progress_bar.py[line:274] - INFO: epoch 005: 3139 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7520.8, nsentences=120, sample_size=4105.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1887.8, ups=0.25, wpb=7520.8, bsz=120, num_updates=27260, lr=1.75156e-05, gnorm=0.946, clip=10, loss_scale=32, train_wall=40, gb_free=29.7, 
wall=111717 2023-05-02 09:35:45 - progress_bar.py[line:274] - INFO: epoch 005: 3149 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7835.3, nsentences=120, sample_size=4158, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1929.6, ups=0.25, wpb=7835.3, bsz=120, num_updates=27270, lr=1.75103e-05, gnorm=0.955, clip=30, loss_scale=32, train_wall=41, gb_free=29.1, wall=111758 2023-05-02 09:36:25 - progress_bar.py[line:274] - INFO: epoch 005: 3159 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7826.5, nsentences=120, sample_size=4148.2, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1968.4, ups=0.25, wpb=7826.5, bsz=120, num_updates=27280, lr=1.75051e-05, gnorm=0.952, clip=30, loss_scale=32, train_wall=40, gb_free=29, wall=111797 2023-05-02 09:37:04 - progress_bar.py[line:274] - INFO: epoch 005: 3169 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7651.5, nsentences=120, sample_size=3908.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1935.9, ups=0.25, wpb=7651.5, bsz=120, num_updates=27290, lr=1.74998e-05, gnorm=0.999, clip=50, loss_scale=32, train_wall=39, gb_free=30.1, wall=111837 2023-05-02 09:37:45 - progress_bar.py[line:274] - INFO: epoch 005: 3179 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7754.8, nsentences=120, sample_size=3909.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1919.8, ups=0.25, wpb=7754.8, bsz=120, num_updates=27300, lr=1.74945e-05, gnorm=0.98, clip=60, loss_scale=32, train_wall=40, gb_free=29.7, wall=111877 2023-05-02 09:38:24 - progress_bar.py[line:274] - INFO: epoch 005: 3189 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7641.8, nsentences=120, sample_size=4154.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1968.2, ups=0.26, wpb=7641.8, bsz=120, num_updates=27310, lr=1.74892e-05, gnorm=0.946, clip=20, loss_scale=32, train_wall=39, gb_free=23.6, wall=111916 2023-05-02 09:39:04 - progress_bar.py[line:274] - INFO: epoch 005: 3199 / 6042 
loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7743.2, nsentences=120, sample_size=4197.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1937.9, ups=0.25, wpb=7743.2, bsz=120, num_updates=27320, lr=1.74839e-05, gnorm=0.959, clip=20, loss_scale=32, train_wall=40, gb_free=28.4, wall=111956 2023-05-02 09:39:43 - progress_bar.py[line:274] - INFO: epoch 005: 3209 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7393, nsentences=120, sample_size=4182.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1870.1, ups=0.25, wpb=7393, bsz=120, num_updates=27330, lr=1.74787e-05, gnorm=0.939, clip=10, loss_scale=32, train_wall=39, gb_free=30.6, wall=111996 2023-05-02 09:40:23 - progress_bar.py[line:274] - INFO: epoch 005: 3219 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7855.3, nsentences=120, sample_size=4401, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1962.7, ups=0.25, wpb=7855.3, bsz=120, num_updates=27340, lr=1.74734e-05, gnorm=0.938, clip=20, loss_scale=32, train_wall=40, gb_free=28.5, wall=112036 2023-05-02 09:41:03 - progress_bar.py[line:274] - INFO: epoch 005: 3229 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7790.8, nsentences=120, sample_size=3840.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1961.4, ups=0.25, wpb=7790.8, bsz=120, num_updates=27350, lr=1.74681e-05, gnorm=0.999, clip=50, loss_scale=32, train_wall=40, gb_free=29.1, wall=112075 2023-05-02 09:41:44 - progress_bar.py[line:274] - INFO: epoch 005: 3239 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7682.1, nsentences=120, sample_size=4024.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1884.4, ups=0.25, wpb=7682.1, bsz=120, num_updates=27360, lr=1.74628e-05, gnorm=0.96, clip=20, loss_scale=32, train_wall=41, gb_free=29.7, wall=112116 2023-05-02 09:42:24 - progress_bar.py[line:274] - INFO: epoch 005: 3249 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7857.1, nsentences=120, 
sample_size=4019.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1943.7, ups=0.25, wpb=7857.1, bsz=120, num_updates=27370, lr=1.74575e-05, gnorm=0.981, clip=40, loss_scale=32, train_wall=40, gb_free=25.3, wall=112156 2023-05-02 09:43:04 - progress_bar.py[line:274] - INFO: epoch 005: 3259 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7586, nsentences=120, sample_size=4041.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1907.3, ups=0.25, wpb=7586, bsz=120, num_updates=27380, lr=1.74522e-05, gnorm=0.978, clip=30, loss_scale=32, train_wall=40, gb_free=28.7, wall=112196 2023-05-02 09:43:44 - progress_bar.py[line:274] - INFO: epoch 005: 3269 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7373.2, nsentences=120, sample_size=3967.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1855.6, ups=0.25, wpb=7373.2, bsz=120, num_updates=27390, lr=1.7447e-05, gnorm=0.961, clip=20, loss_scale=32, train_wall=40, gb_free=29.8, wall=112236 2023-05-02 09:44:23 - progress_bar.py[line:274] - INFO: epoch 005: 3279 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7831.9, nsentences=120, sample_size=3776.8, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1964.3, ups=0.25, wpb=7831.9, bsz=120, num_updates=27400, lr=1.74417e-05, gnorm=0.982, clip=40, loss_scale=32, train_wall=40, gb_free=29.7, wall=112276 2023-05-02 09:45:03 - progress_bar.py[line:274] - INFO: epoch 005: 3289 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7600.6, nsentences=120, sample_size=4039.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1914.1, ups=0.25, wpb=7600.6, bsz=120, num_updates=27410, lr=1.74364e-05, gnorm=0.977, clip=40, loss_scale=32, train_wall=40, gb_free=30.8, wall=112316 2023-05-02 09:45:43 - progress_bar.py[line:274] - INFO: epoch 005: 3299 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7681.9, nsentences=120, sample_size=4603.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1920.9, ups=0.25, 
wpb=7681.9, bsz=120, num_updates=27420, lr=1.74311e-05, gnorm=0.897, clip=0, loss_scale=32, train_wall=40, gb_free=30.8, wall=112356 2023-05-02 09:46:23 - progress_bar.py[line:274] - INFO: epoch 005: 3309 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7866.5, nsentences=120, sample_size=4325.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1950.8, ups=0.25, wpb=7866.5, bsz=120, num_updates=27430, lr=1.74258e-05, gnorm=0.905, clip=10, loss_scale=32, train_wall=40, gb_free=31.3, wall=112396 2023-05-02 09:47:03 - progress_bar.py[line:274] - INFO: epoch 005: 3319 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7766.8, nsentences=120, sample_size=4121.1, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1972.8, ups=0.25, wpb=7766.8, bsz=120, num_updates=27440, lr=1.74205e-05, gnorm=0.939, clip=10, loss_scale=32, train_wall=39, gb_free=30.8, wall=112435 2023-05-02 09:47:43 - progress_bar.py[line:274] - INFO: epoch 005: 3329 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7521.1, nsentences=120, sample_size=4022.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1893.6, ups=0.25, wpb=7521.1, bsz=120, num_updates=27450, lr=1.74153e-05, gnorm=0.956, clip=10, loss_scale=32, train_wall=40, gb_free=29.9, wall=112475 2023-05-02 09:48:23 - progress_bar.py[line:274] - INFO: epoch 005: 3339 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7891.4, nsentences=120, sample_size=4012.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1971.2, ups=0.25, wpb=7891.4, bsz=120, num_updates=27460, lr=1.741e-05, gnorm=0.944, clip=30, loss_scale=32, train_wall=40, gb_free=27.4, wall=112515 2023-05-02 09:49:02 - progress_bar.py[line:274] - INFO: epoch 005: 3349 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7795.4, nsentences=120, sample_size=3964.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1975.9, ups=0.25, wpb=7795.4, bsz=120, num_updates=27470, lr=1.74047e-05, gnorm=0.96, clip=20, 
loss_scale=64, train_wall=39, gb_free=30.4, wall=112554 2023-05-02 09:49:42 - progress_bar.py[line:274] - INFO: epoch 005: 3359 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7848.3, nsentences=120, sample_size=3866, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1982.4, ups=0.25, wpb=7848.3, bsz=120, num_updates=27480, lr=1.73994e-05, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=31.4, wall=112594 2023-05-02 09:50:21 - progress_bar.py[line:274] - INFO: epoch 005: 3369 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7916.5, nsentences=120, sample_size=3956.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1987.9, ups=0.25, wpb=7916.5, bsz=120, num_updates=27490, lr=1.73941e-05, gnorm=0.978, clip=40, loss_scale=64, train_wall=40, gb_free=31, wall=112634 2023-05-02 09:51:01 - progress_bar.py[line:274] - INFO: epoch 005: 3379 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7511.7, nsentences=120, sample_size=4209.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1916.1, ups=0.26, wpb=7511.7, bsz=120, num_updates=27500, lr=1.73889e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=112673 2023-05-02 09:51:41 - progress_bar.py[line:274] - INFO: epoch 005: 3389 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7600.3, nsentences=120, sample_size=3934.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1905.5, ups=0.25, wpb=7600.3, bsz=120, num_updates=27510, lr=1.73836e-05, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=112713 2023-05-02 09:52:20 - progress_bar.py[line:274] - INFO: epoch 005: 3399 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7563.2, nsentences=120, sample_size=4214.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1908.2, ups=0.25, wpb=7563.2, bsz=120, num_updates=27520, lr=1.73783e-05, gnorm=0.953, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=112753 2023-05-02 09:53:00 - 
progress_bar.py[line:274] - INFO: epoch 005: 3409 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7687.7, nsentences=120, sample_size=3801.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1947.6, ups=0.25, wpb=7687.7, bsz=120, num_updates=27530, lr=1.7373e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=112792 2023-05-02 09:53:39 - progress_bar.py[line:274] - INFO: epoch 005: 3419 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7820.6, nsentences=120, sample_size=3751.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1967.4, ups=0.25, wpb=7820.6, bsz=120, num_updates=27540, lr=1.73677e-05, gnorm=1.008, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=112832 2023-05-02 09:54:19 - progress_bar.py[line:274] - INFO: epoch 005: 3429 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7456.9, nsentences=120, sample_size=3922.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1876.7, ups=0.25, wpb=7456.9, bsz=120, num_updates=27550, lr=1.73624e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=112872 2023-05-02 09:54:58 - progress_bar.py[line:274] - INFO: epoch 005: 3439 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7636.1, nsentences=120, sample_size=4193.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1948.3, ups=0.26, wpb=7636.1, bsz=120, num_updates=27560, lr=1.73572e-05, gnorm=0.929, clip=10, loss_scale=64, train_wall=39, gb_free=29.3, wall=112911 2023-05-02 09:55:37 - progress_bar.py[line:274] - INFO: epoch 005: 3449 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7808.1, nsentences=120, sample_size=3882.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1996.5, ups=0.26, wpb=7808.1, bsz=120, num_updates=27570, lr=1.73519e-05, gnorm=0.945, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=112950 2023-05-02 09:56:17 - progress_bar.py[line:274] - INFO: epoch 005: 3459 / 6042 loss=2.412, loss_v1=0, 
loss_v2=0, nll_loss=1.162, ntokens=7943.3, nsentences=120, sample_size=4045.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1989.6, ups=0.25, wpb=7943.3, bsz=120, num_updates=27580, lr=1.73466e-05, gnorm=0.959, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=112990 2023-05-02 09:56:57 - progress_bar.py[line:274] - INFO: epoch 005: 3469 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7627.2, nsentences=120, sample_size=4139.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1924, ups=0.25, wpb=7627.2, bsz=120, num_updates=27590, lr=1.73413e-05, gnorm=0.941, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=113029 2023-05-02 09:57:37 - progress_bar.py[line:274] - INFO: epoch 005: 3479 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=8034.3, nsentences=120, sample_size=3835.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1991.4, ups=0.25, wpb=8034.3, bsz=120, num_updates=27600, lr=1.7336e-05, gnorm=0.943, clip=30, loss_scale=64, train_wall=40, gb_free=24.5, wall=113070 2023-05-02 09:58:17 - progress_bar.py[line:274] - INFO: epoch 005: 3489 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=7733.8, nsentences=120, sample_size=4401.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1949.8, ups=0.25, wpb=7733.8, bsz=120, num_updates=27610, lr=1.73308e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=113109 2023-05-02 09:58:57 - progress_bar.py[line:274] - INFO: epoch 005: 3499 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7768.8, nsentences=120, sample_size=4336.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1922.8, ups=0.25, wpb=7768.8, bsz=120, num_updates=27620, lr=1.73255e-05, gnorm=0.929, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=113150 2023-05-02 09:59:37 - progress_bar.py[line:274] - INFO: epoch 005: 3509 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7775.7, nsentences=120, sample_size=4122.9, 
sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1965.4, ups=0.25, wpb=7775.7, bsz=120, num_updates=27630, lr=1.73202e-05, gnorm=0.91, clip=0, loss_scale=64, train_wall=39, gb_free=30.2, wall=113189 2023-05-02 10:00:17 - progress_bar.py[line:274] - INFO: epoch 005: 3519 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7798.4, nsentences=120, sample_size=3782.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1966, ups=0.25, wpb=7798.4, bsz=120, num_updates=27640, lr=1.73149e-05, gnorm=0.967, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=113229 2023-05-02 10:00:56 - progress_bar.py[line:274] - INFO: epoch 005: 3529 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7739.6, nsentences=120, sample_size=3972, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1966.9, ups=0.25, wpb=7739.6, bsz=120, num_updates=27650, lr=1.73096e-05, gnorm=0.955, clip=10, loss_scale=64, train_wall=39, gb_free=29.9, wall=113268 2023-05-02 10:01:35 - progress_bar.py[line:274] - INFO: epoch 005: 3539 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7322.7, nsentences=120, sample_size=3968.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1862.9, ups=0.25, wpb=7322.7, bsz=120, num_updates=27660, lr=1.73043e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=39, gb_free=29.6, wall=113308 2023-05-02 10:02:16 - progress_bar.py[line:274] - INFO: epoch 005: 3549 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7663.2, nsentences=120, sample_size=4250.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1890, ups=0.25, wpb=7663.2, bsz=120, num_updates=27670, lr=1.72991e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=30.5, wall=113348 2023-05-02 10:02:57 - progress_bar.py[line:274] - INFO: epoch 005: 3559 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7889, nsentences=120, sample_size=3922.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1936.6, ups=0.25, wpb=7889, bsz=120, 
num_updates=27680, lr=1.72938e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=41, gb_free=30.9, wall=113389 2023-05-02 10:03:36 - progress_bar.py[line:274] - INFO: epoch 005: 3569 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7665.3, nsentences=120, sample_size=4111.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1929.8, ups=0.25, wpb=7665.3, bsz=120, num_updates=27690, lr=1.72885e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=27.9, wall=113429 2023-05-02 10:04:16 - progress_bar.py[line:274] - INFO: epoch 005: 3579 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7693.6, nsentences=120, sample_size=4043.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1942.1, ups=0.25, wpb=7693.6, bsz=120, num_updates=27700, lr=1.72832e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=113468 2023-05-02 10:04:55 - progress_bar.py[line:274] - INFO: epoch 005: 3589 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7621.1, nsentences=120, sample_size=3998.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1926.8, ups=0.25, wpb=7621.1, bsz=120, num_updates=27710, lr=1.72779e-05, gnorm=0.971, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=113508 2023-05-02 10:05:35 - progress_bar.py[line:274] - INFO: epoch 005: 3599 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7497.6, nsentences=120, sample_size=4412.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1893, ups=0.25, wpb=7497.6, bsz=120, num_updates=27720, lr=1.72726e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=113548 2023-05-02 10:06:15 - progress_bar.py[line:274] - INFO: epoch 005: 3609 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.07, ntokens=7530.9, nsentences=120, sample_size=4361, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1899.1, ups=0.25, wpb=7530.9, bsz=120, num_updates=27730, lr=1.72674e-05, gnorm=0.914, clip=10, loss_scale=64, train_wall=40, 
gb_free=29.5, wall=113587 2023-05-02 10:06:56 - progress_bar.py[line:274] - INFO: epoch 005: 3619 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7896.9, nsentences=120, sample_size=4385.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1933.3, ups=0.24, wpb=7896.9, bsz=120, num_updates=27740, lr=1.72621e-05, gnorm=0.903, clip=20, loss_scale=64, train_wall=41, gb_free=30, wall=113628 2023-05-02 10:07:36 - progress_bar.py[line:274] - INFO: epoch 005: 3629 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7715.5, nsentences=120, sample_size=3928.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1893.9, ups=0.25, wpb=7715.5, bsz=120, num_updates=27750, lr=1.72568e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=41, gb_free=30.5, wall=113669 2023-05-02 10:08:17 - progress_bar.py[line:274] - INFO: epoch 005: 3639 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7602, nsentences=120, sample_size=3961.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1890.9, ups=0.25, wpb=7602, bsz=120, num_updates=27760, lr=1.72515e-05, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=113709 2023-05-02 10:08:56 - progress_bar.py[line:274] - INFO: epoch 005: 3649 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7559.9, nsentences=120, sample_size=3763.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1925.8, ups=0.25, wpb=7559.9, bsz=120, num_updates=27770, lr=1.72462e-05, gnorm=0.997, clip=50, loss_scale=64, train_wall=39, gb_free=30.5, wall=113748 2023-05-02 10:09:36 - progress_bar.py[line:274] - INFO: epoch 005: 3659 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7643.4, nsentences=120, sample_size=4365.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1884.3, ups=0.25, wpb=7643.4, bsz=120, num_updates=27780, lr=1.7241e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=113789 2023-05-02 10:10:18 - progress_bar.py[line:274] - INFO: epoch 005: 
3669 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7862.7, nsentences=120, sample_size=4377.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1887.5, ups=0.24, wpb=7862.7, bsz=120, num_updates=27790, lr=1.72357e-05, gnorm=0.911, clip=0, loss_scale=64, train_wall=42, gb_free=29, wall=113830 2023-05-02 10:10:57 - progress_bar.py[line:274] - INFO: epoch 005: 3679 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7128.4, nsentences=120, sample_size=4121.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1824.3, ups=0.26, wpb=7128.4, bsz=120, num_updates=27800, lr=1.72304e-05, gnorm=0.955, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=113870 2023-05-02 10:11:37 - progress_bar.py[line:274] - INFO: epoch 005: 3689 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7442.8, nsentences=120, sample_size=4168.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1866.6, ups=0.25, wpb=7442.8, bsz=120, num_updates=27810, lr=1.72251e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=113909 2023-05-02 10:12:17 - progress_bar.py[line:274] - INFO: epoch 005: 3699 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7329.3, nsentences=120, sample_size=4198.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1838.2, ups=0.25, wpb=7329.3, bsz=120, num_updates=27820, lr=1.72198e-05, gnorm=0.936, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=113949 2023-05-02 10:12:57 - progress_bar.py[line:274] - INFO: epoch 005: 3709 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7873.5, nsentences=120, sample_size=3824.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1961.3, ups=0.25, wpb=7873.5, bsz=120, num_updates=27830, lr=1.72145e-05, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=113989 2023-05-02 10:13:38 - progress_bar.py[line:274] - INFO: epoch 005: 3719 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=8082.3, 
nsentences=120, sample_size=4193.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1985.7, ups=0.25, wpb=8082.3, bsz=120, num_updates=27840, lr=1.72093e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=41, gb_free=30.4, wall=114030 2023-05-02 10:14:17 - progress_bar.py[line:274] - INFO: epoch 005: 3729 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7417.4, nsentences=120, sample_size=3905.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1864.5, ups=0.25, wpb=7417.4, bsz=120, num_updates=27850, lr=1.7204e-05, gnorm=1.007, clip=60, loss_scale=64, train_wall=40, gb_free=29.6, wall=114070 2023-05-02 10:14:57 - progress_bar.py[line:274] - INFO: epoch 005: 3739 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7818.3, nsentences=120, sample_size=3697, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1992.1, ups=0.25, wpb=7818.3, bsz=120, num_updates=27860, lr=1.71987e-05, gnorm=1.011, clip=50, loss_scale=64, train_wall=39, gb_free=29.7, wall=114109 2023-05-02 10:15:37 - progress_bar.py[line:274] - INFO: epoch 005: 3749 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7642.4, nsentences=120, sample_size=4045.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1912.4, ups=0.25, wpb=7642.4, bsz=120, num_updates=27870, lr=1.71934e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=30.9, wall=114149 2023-05-02 10:16:17 - progress_bar.py[line:274] - INFO: epoch 005: 3759 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7956, nsentences=120, sample_size=4285.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1955.7, ups=0.25, wpb=7956, bsz=120, num_updates=27880, lr=1.71881e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=41, gb_free=29.5, wall=114190 2023-05-02 10:16:58 - progress_bar.py[line:274] - INFO: epoch 005: 3769 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7773.5, nsentences=120, sample_size=4189, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1904.6, 
ups=0.25, wpb=7773.5, bsz=120, num_updates=27890, lr=1.71829e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=41, gb_free=28.2, wall=114231 2023-05-02 10:17:38 - progress_bar.py[line:274] - INFO: epoch 005: 3779 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7923.3, nsentences=120, sample_size=3908, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1991.3, ups=0.25, wpb=7923.3, bsz=120, num_updates=27900, lr=1.71776e-05, gnorm=0.97, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=114270 2023-05-02 10:18:17 - progress_bar.py[line:274] - INFO: epoch 005: 3789 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7610.5, nsentences=120, sample_size=4323, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1933.5, ups=0.25, wpb=7610.5, bsz=120, num_updates=27910, lr=1.71723e-05, gnorm=0.922, clip=0, loss_scale=64, train_wall=39, gb_free=30.1, wall=114310 2023-05-02 10:18:57 - progress_bar.py[line:274] - INFO: epoch 005: 3799 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7851.9, nsentences=120, sample_size=3860.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1985.1, ups=0.25, wpb=7851.9, bsz=120, num_updates=27920, lr=1.7167e-05, gnorm=0.954, clip=10, loss_scale=64, train_wall=39, gb_free=30, wall=114349 2023-05-02 10:19:36 - progress_bar.py[line:274] - INFO: epoch 005: 3809 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7658.8, nsentences=120, sample_size=4277.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1936.9, ups=0.25, wpb=7658.8, bsz=120, num_updates=27930, lr=1.71617e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=39, gb_free=29.2, wall=114389 2023-05-02 10:20:16 - progress_bar.py[line:274] - INFO: epoch 005: 3819 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7756.4, nsentences=120, sample_size=4280.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1937.3, ups=0.25, wpb=7756.4, bsz=120, num_updates=27940, lr=1.71564e-05, gnorm=0.948, clip=10, 
loss_scale=64, train_wall=40, gb_free=30.4, wall=114429 2023-05-02 10:20:56 - progress_bar.py[line:274] - INFO: epoch 005: 3829 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7755.1, nsentences=120, sample_size=4147.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1956.2, ups=0.25, wpb=7755.1, bsz=120, num_updates=27950, lr=1.71512e-05, gnorm=0.937, clip=10, loss_scale=64, train_wall=40, gb_free=31.4, wall=114469 2023-05-02 10:21:36 - progress_bar.py[line:274] - INFO: epoch 005: 3839 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7489.7, nsentences=120, sample_size=3899.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1894.4, ups=0.25, wpb=7489.7, bsz=120, num_updates=27960, lr=1.71459e-05, gnorm=0.962, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=114508 2023-05-02 10:22:16 - progress_bar.py[line:274] - INFO: epoch 005: 3849 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7659.2, nsentences=120, sample_size=4161.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1878, ups=0.25, wpb=7659.2, bsz=120, num_updates=27970, lr=1.71406e-05, gnorm=0.962, clip=40, loss_scale=64, train_wall=41, gb_free=30, wall=114549 2023-05-02 10:22:56 - progress_bar.py[line:274] - INFO: epoch 005: 3859 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7548.9, nsentences=120, sample_size=3975.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1903.2, ups=0.25, wpb=7548.9, bsz=120, num_updates=27980, lr=1.71353e-05, gnorm=0.947, clip=0, loss_scale=128, train_wall=40, gb_free=29.8, wall=114589 2023-05-02 10:23:31 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 10:23:39 - progress_bar.py[line:274] - INFO: epoch 005: 3870 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7416.2, nsentences=120, sample_size=3811.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1718.5, ups=0.23, wpb=7416.2, bsz=120, 
num_updates=27990, lr=1.713e-05, gnorm=0.983, clip=40, loss_scale=64, train_wall=43, gb_free=27.3, wall=114632 2023-05-02 10:24:20 - progress_bar.py[line:274] - INFO: epoch 005: 3880 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7992.5, nsentences=120, sample_size=3776.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1978.8, ups=0.25, wpb=7992.5, bsz=120, num_updates=28000, lr=1.71247e-05, gnorm=1.011, clip=80, loss_scale=64, train_wall=40, gb_free=27.5, wall=114672 2023-05-02 10:24:20 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 10:24:21 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 10:24:21 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 10:24:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:30 - task_invig.py[line:182] - INFO: 
example ref(gt): region: 2023-05-02 10:24:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:38 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 10:24:38 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 10:24:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:50 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 10:24:50 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 10:24:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:24:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:24:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:25:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:25:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:25:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:25:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:25:02 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 10:25:02 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 10:25:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:25:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:25:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:25:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:25:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:25:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:25:06 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 10:25:06 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 10:25:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:25:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:25:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:25:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:25:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:25:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:25:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:25:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:25:10 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 10:25:10 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 10:25:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 10:25:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 10:25:11 - progress_bar.py[line:282] - INFO: epoch 005 | valid on 'valid' subset | loss 3.23 | loss_v1 0 | loss_v2 0 | nll_loss 2.068 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.19 | score 0.7505 | wps 3302.4 | wpb 3202.1 | bsz 39.4 | num_updates 28000 | best_score 0.7598 2023-05-02 10:25:11 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 5 @ 28000 updates 2023-05-02 10:25:11 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_28000.pt 2023-05-02 10:25:35 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_28000.pt 2023-05-02 10:25:49 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_28000.pt (epoch 5 @ 28000 updates, score 0.7505) (writing took 38.20193452714011 seconds) 2023-05-02 10:26:29 - progress_bar.py[line:274] - INFO: epoch 005: 3890 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7740.8, nsentences=120, sample_size=3876.3, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=598, ups=0.08, wpb=7740.8, bsz=120, num_updates=28010, lr=1.71195e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=114802 2023-05-02 10:27:09 - progress_bar.py[line:274] - INFO: epoch 005: 3900 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=8008.4, nsentences=120, sample_size=4276.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1989.6, ups=0.25, wpb=8008.4, bsz=120, num_updates=28020, lr=1.71142e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=40, gb_free=28.6, 
wall=114842 2023-05-02 10:27:50 - progress_bar.py[line:274] - INFO: epoch 005: 3910 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7566, nsentences=120, sample_size=4036.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1878.4, ups=0.25, wpb=7566, bsz=120, num_updates=28030, lr=1.71089e-05, gnorm=0.988, clip=50, loss_scale=64, train_wall=40, gb_free=28.9, wall=114882 2023-05-02 10:28:29 - progress_bar.py[line:274] - INFO: epoch 005: 3920 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=8040.5, nsentences=120, sample_size=3973.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=2023.7, ups=0.25, wpb=8040.5, bsz=120, num_updates=28040, lr=1.71036e-05, gnorm=0.994, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=114922 2023-05-02 10:29:10 - progress_bar.py[line:274] - INFO: epoch 005: 3930 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7656.4, nsentences=120, sample_size=4084.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1905.8, ups=0.25, wpb=7656.4, bsz=120, num_updates=28050, lr=1.70983e-05, gnorm=0.963, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=114962 2023-05-02 10:29:49 - progress_bar.py[line:274] - INFO: epoch 005: 3940 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7600.2, nsentences=120, sample_size=4193.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1923.3, ups=0.25, wpb=7600.2, bsz=120, num_updates=28060, lr=1.70931e-05, gnorm=0.947, clip=10, loss_scale=64, train_wall=39, gb_free=29.9, wall=115002 2023-05-02 10:30:28 - progress_bar.py[line:274] - INFO: epoch 005: 3950 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7525.5, nsentences=120, sample_size=4239.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1911.5, ups=0.25, wpb=7525.5, bsz=120, num_updates=28070, lr=1.70878e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=39, gb_free=29.4, wall=115041 2023-05-02 10:31:09 - progress_bar.py[line:274] - INFO: epoch 005: 3960 / 6042 
loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7621.8, nsentences=120, sample_size=4232, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1880.4, ups=0.25, wpb=7621.8, bsz=120, num_updates=28080, lr=1.70825e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=115081 2023-05-02 10:31:49 - progress_bar.py[line:274] - INFO: epoch 005: 3970 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7745.8, nsentences=120, sample_size=4032.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1951.7, ups=0.25, wpb=7745.8, bsz=120, num_updates=28090, lr=1.70772e-05, gnorm=0.967, clip=20, loss_scale=64, train_wall=40, gb_free=24.2, wall=115121 2023-05-02 10:32:29 - progress_bar.py[line:274] - INFO: epoch 005: 3980 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7624.1, nsentences=120, sample_size=4127.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1905.7, ups=0.25, wpb=7624.1, bsz=120, num_updates=28100, lr=1.70719e-05, gnorm=0.921, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=115161 2023-05-02 10:33:08 - progress_bar.py[line:274] - INFO: epoch 005: 3990 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7664.4, nsentences=120, sample_size=4224.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1939.2, ups=0.25, wpb=7664.4, bsz=120, num_updates=28110, lr=1.70666e-05, gnorm=0.951, clip=40, loss_scale=64, train_wall=39, gb_free=30.5, wall=115201 2023-05-02 10:33:49 - progress_bar.py[line:274] - INFO: epoch 005: 4000 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.214, ntokens=7812.3, nsentences=120, sample_size=4239.4, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1934.4, ups=0.25, wpb=7812.3, bsz=120, num_updates=28120, lr=1.70614e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=115241 2023-05-02 10:34:29 - progress_bar.py[line:274] - INFO: epoch 005: 4010 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7964.7, nsentences=120, 
sample_size=3921, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1979.5, ups=0.25, wpb=7964.7, bsz=120, num_updates=28130, lr=1.70561e-05, gnorm=0.974, clip=40, loss_scale=64, train_wall=40, gb_free=29.2, wall=115281 2023-05-02 10:35:08 - progress_bar.py[line:274] - INFO: epoch 005: 4020 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7530.9, nsentences=120, sample_size=4093.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1921.4, ups=0.26, wpb=7530.9, bsz=120, num_updates=28140, lr=1.70508e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=115320 2023-05-02 10:35:48 - progress_bar.py[line:274] - INFO: epoch 005: 4030 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7786.9, nsentences=120, sample_size=4085.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1943.8, ups=0.25, wpb=7786.9, bsz=120, num_updates=28150, lr=1.70455e-05, gnorm=0.954, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=115361 2023-05-02 10:36:28 - progress_bar.py[line:274] - INFO: epoch 005: 4040 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7866.1, nsentences=120, sample_size=4124.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1967.1, ups=0.25, wpb=7866.1, bsz=120, num_updates=28160, lr=1.70402e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=115401 2023-05-02 10:37:07 - progress_bar.py[line:274] - INFO: epoch 005: 4050 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7871.7, nsentences=120, sample_size=3771.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2011.7, ups=0.26, wpb=7871.7, bsz=120, num_updates=28170, lr=1.7035e-05, gnorm=0.999, clip=60, loss_scale=64, train_wall=39, gb_free=27.5, wall=115440 2023-05-02 10:37:47 - progress_bar.py[line:274] - INFO: epoch 005: 4060 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7513.7, nsentences=120, sample_size=3947.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1908.5, ups=0.25, 
wpb=7513.7, bsz=120, num_updates=28180, lr=1.70297e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=39, gb_free=26.9, wall=115479 2023-05-02 10:38:26 - progress_bar.py[line:274] - INFO: epoch 005: 4070 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7550, nsentences=120, sample_size=4181.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1922.1, ups=0.25, wpb=7550, bsz=120, num_updates=28190, lr=1.70244e-05, gnorm=0.975, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=115518 2023-05-02 10:39:06 - progress_bar.py[line:274] - INFO: epoch 005: 4080 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7754.5, nsentences=120, sample_size=3900.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1930.8, ups=0.25, wpb=7754.5, bsz=120, num_updates=28200, lr=1.70191e-05, gnorm=0.964, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=115558 2023-05-02 10:39:45 - progress_bar.py[line:274] - INFO: epoch 005: 4090 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7591.4, nsentences=120, sample_size=4342.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1928.4, ups=0.25, wpb=7591.4, bsz=120, num_updates=28210, lr=1.70138e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=39, gb_free=28, wall=115598 2023-05-02 10:40:25 - progress_bar.py[line:274] - INFO: epoch 005: 4100 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7670.6, nsentences=120, sample_size=4011.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1931.4, ups=0.25, wpb=7670.6, bsz=120, num_updates=28220, lr=1.70085e-05, gnorm=0.966, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=115638 2023-05-02 10:41:05 - progress_bar.py[line:274] - INFO: epoch 005: 4110 / 6042 loss=2.448, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=8015.5, nsentences=120, sample_size=3840.6, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2020.1, ups=0.25, wpb=8015.5, bsz=120, num_updates=28230, lr=1.70033e-05, gnorm=1.04, clip=50, loss_scale=64, 
train_wall=40, gb_free=29.7, wall=115677 2023-05-02 10:41:44 - progress_bar.py[line:274] - INFO: epoch 005: 4120 / 6042 loss=2.48, loss_v1=0, loss_v2=0, nll_loss=1.239, ntokens=7955.3, nsentences=120, sample_size=4194.7, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=2019.3, ups=0.25, wpb=7955.3, bsz=120, num_updates=28240, lr=1.6998e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=115717 2023-05-02 10:42:24 - progress_bar.py[line:274] - INFO: epoch 005: 4130 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7670.3, nsentences=120, sample_size=4097.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1947.9, ups=0.25, wpb=7670.3, bsz=120, num_updates=28250, lr=1.69927e-05, gnorm=0.971, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=115756 2023-05-02 10:43:04 - progress_bar.py[line:274] - INFO: epoch 005: 4140 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=8129, nsentences=120, sample_size=3987.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=2004.5, ups=0.25, wpb=8129, bsz=120, num_updates=28260, lr=1.69874e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=115797 2023-05-02 10:43:44 - progress_bar.py[line:274] - INFO: epoch 005: 4150 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=8113.3, nsentences=120, sample_size=3843.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2035.1, ups=0.25, wpb=8113.3, bsz=120, num_updates=28270, lr=1.69821e-05, gnorm=0.962, clip=40, loss_scale=64, train_wall=40, gb_free=31.2, wall=115836 2023-05-02 10:44:24 - progress_bar.py[line:274] - INFO: epoch 005: 4160 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7743.4, nsentences=120, sample_size=4525.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1928.3, ups=0.25, wpb=7743.4, bsz=120, num_updates=28280, lr=1.69768e-05, gnorm=0.913, clip=0, loss_scale=64, train_wall=40, gb_free=26.2, wall=115877 2023-05-02 10:45:04 - progress_bar.py[line:274] - 
INFO: epoch 005: 4170 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7580.1, nsentences=120, sample_size=3914.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1894.5, ups=0.25, wpb=7580.1, bsz=120, num_updates=28290, lr=1.69716e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=40, gb_free=26.2, wall=115917 2023-05-02 10:45:45 - progress_bar.py[line:274] - INFO: epoch 005: 4180 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7661.7, nsentences=120, sample_size=4368.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1888.3, ups=0.25, wpb=7661.7, bsz=120, num_updates=28300, lr=1.69663e-05, gnorm=0.923, clip=20, loss_scale=64, train_wall=41, gb_free=29.6, wall=115957 2023-05-02 10:46:25 - progress_bar.py[line:274] - INFO: epoch 005: 4190 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7746.9, nsentences=120, sample_size=4257.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1936.6, ups=0.25, wpb=7746.9, bsz=120, num_updates=28310, lr=1.6961e-05, gnorm=0.937, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=115997 2023-05-02 10:47:05 - progress_bar.py[line:274] - INFO: epoch 005: 4200 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.207, ntokens=7579.9, nsentences=120, sample_size=4214.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1898.7, ups=0.25, wpb=7579.9, bsz=120, num_updates=28320, lr=1.69557e-05, gnorm=0.976, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=116037 2023-05-02 10:47:44 - progress_bar.py[line:274] - INFO: epoch 005: 4210 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7628.6, nsentences=120, sample_size=4012.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1933.2, ups=0.25, wpb=7628.6, bsz=120, num_updates=28330, lr=1.69504e-05, gnorm=0.963, clip=20, loss_scale=64, train_wall=39, gb_free=28.7, wall=116077 2023-05-02 10:48:24 - progress_bar.py[line:274] - INFO: epoch 005: 4220 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.144, 
ntokens=7785.6, nsentences=120, sample_size=4116.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1927.6, ups=0.25, wpb=7785.6, bsz=120, num_updates=28340, lr=1.69452e-05, gnorm=0.946, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=116117 2023-05-02 10:49:05 - progress_bar.py[line:274] - INFO: epoch 005: 4230 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7804.9, nsentences=120, sample_size=3947, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1946.6, ups=0.25, wpb=7804.9, bsz=120, num_updates=28350, lr=1.69399e-05, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=116157 2023-05-02 10:49:44 - progress_bar.py[line:274] - INFO: epoch 005: 4240 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7808.3, nsentences=120, sample_size=4194.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1975.2, ups=0.25, wpb=7808.3, bsz=120, num_updates=28360, lr=1.69346e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=39, gb_free=28.3, wall=116197 2023-05-02 10:50:24 - progress_bar.py[line:274] - INFO: epoch 005: 4250 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=8075.8, nsentences=120, sample_size=3826.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=2030, ups=0.25, wpb=8075.8, bsz=120, num_updates=28370, lr=1.69293e-05, gnorm=0.96, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=116236 2023-05-02 10:51:05 - progress_bar.py[line:274] - INFO: epoch 005: 4260 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7793.6, nsentences=120, sample_size=3906.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1914.4, ups=0.25, wpb=7793.6, bsz=120, num_updates=28380, lr=1.6924e-05, gnorm=0.967, clip=40, loss_scale=64, train_wall=41, gb_free=29.4, wall=116277 2023-05-02 10:51:44 - progress_bar.py[line:274] - INFO: epoch 005: 4270 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7716, nsentences=120, sample_size=3802.3, sample_size_v1=0, sample_size_v2=0, 
ppl=2.26, wps=1961.6, ups=0.25, wpb=7716, bsz=120, num_updates=28390, lr=1.69187e-05, gnorm=0.998, clip=50, loss_scale=64, train_wall=39, gb_free=25.2, wall=116316 2023-05-02 10:52:24 - progress_bar.py[line:274] - INFO: epoch 005: 4280 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7949.6, nsentences=120, sample_size=4255.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2004.6, ups=0.25, wpb=7949.6, bsz=120, num_updates=28400, lr=1.69135e-05, gnorm=0.944, clip=10, loss_scale=64, train_wall=40, gb_free=28.6, wall=116356 2023-05-02 10:53:04 - progress_bar.py[line:274] - INFO: epoch 005: 4290 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7634.1, nsentences=120, sample_size=4191.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1913.1, ups=0.25, wpb=7634.1, bsz=120, num_updates=28410, lr=1.69082e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=40, gb_free=30.7, wall=116396 2023-05-02 10:53:44 - progress_bar.py[line:274] - INFO: epoch 005: 4300 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7620.2, nsentences=120, sample_size=4093, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1899.3, ups=0.25, wpb=7620.2, bsz=120, num_updates=28420, lr=1.69029e-05, gnorm=0.974, clip=20, loss_scale=64, train_wall=40, gb_free=28.8, wall=116436 2023-05-02 10:54:25 - progress_bar.py[line:274] - INFO: epoch 005: 4310 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7680.7, nsentences=120, sample_size=3935.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1874.2, ups=0.24, wpb=7680.7, bsz=120, num_updates=28430, lr=1.68976e-05, gnorm=0.963, clip=20, loss_scale=64, train_wall=41, gb_free=30.5, wall=116477 2023-05-02 10:55:04 - progress_bar.py[line:274] - INFO: epoch 005: 4320 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7706.8, nsentences=120, sample_size=4179.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1943, ups=0.25, wpb=7706.8, bsz=120, num_updates=28440, lr=1.68923e-05, 
gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=116517 2023-05-02 10:55:44 - progress_bar.py[line:274] - INFO: epoch 005: 4330 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7643.5, nsentences=120, sample_size=4081.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1923, ups=0.25, wpb=7643.5, bsz=120, num_updates=28450, lr=1.6887e-05, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=116557 2023-05-02 10:56:24 - progress_bar.py[line:274] - INFO: epoch 005: 4340 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7724.4, nsentences=120, sample_size=3935.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1927.2, ups=0.25, wpb=7724.4, bsz=120, num_updates=28460, lr=1.68818e-05, gnorm=0.972, clip=40, loss_scale=64, train_wall=40, gb_free=27.5, wall=116597 2023-05-02 10:57:04 - progress_bar.py[line:274] - INFO: epoch 005: 4350 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7309.1, nsentences=120, sample_size=4107.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1834.6, ups=0.25, wpb=7309.1, bsz=120, num_updates=28470, lr=1.68765e-05, gnorm=0.976, clip=50, loss_scale=64, train_wall=40, gb_free=30.6, wall=116636 2023-05-02 10:57:43 - progress_bar.py[line:274] - INFO: epoch 005: 4360 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7825, nsentences=120, sample_size=3868.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1999.8, ups=0.26, wpb=7825, bsz=120, num_updates=28480, lr=1.68712e-05, gnorm=0.972, clip=40, loss_scale=64, train_wall=39, gb_free=30.9, wall=116676 2023-05-02 10:58:23 - progress_bar.py[line:274] - INFO: epoch 005: 4370 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7594.5, nsentences=120, sample_size=4082.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1882.9, ups=0.25, wpb=7594.5, bsz=120, num_updates=28490, lr=1.68659e-05, gnorm=0.956, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=116716 2023-05-02 
10:59:03 - progress_bar.py[line:274] - INFO: epoch 005: 4380 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7414.1, nsentences=120, sample_size=3944.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1858.2, ups=0.25, wpb=7414.1, bsz=120, num_updates=28500, lr=1.68606e-05, gnorm=0.982, clip=50, loss_scale=128, train_wall=40, gb_free=30.5, wall=116756 2023-05-02 10:59:36 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 10:59:48 - progress_bar.py[line:274] - INFO: epoch 005: 4391 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7930.5, nsentences=120, sample_size=3746.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1795.1, ups=0.23, wpb=7930.5, bsz=120, num_updates=28510, lr=1.68554e-05, gnorm=1.038, clip=60, loss_scale=64, train_wall=44, gb_free=29.7, wall=116800 2023-05-02 11:00:28 - progress_bar.py[line:274] - INFO: epoch 005: 4401 / 6042 loss=2.452, loss_v1=0, loss_v2=0, nll_loss=1.209, ntokens=7644.3, nsentences=120, sample_size=4299.3, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1892.6, ups=0.25, wpb=7644.3, bsz=120, num_updates=28520, lr=1.68501e-05, gnorm=1, clip=40, loss_scale=64, train_wall=40, gb_free=29.2, wall=116840 2023-05-02 11:01:08 - progress_bar.py[line:274] - INFO: epoch 005: 4411 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7799.9, nsentences=120, sample_size=4127.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1939.4, ups=0.25, wpb=7799.9, bsz=120, num_updates=28530, lr=1.68448e-05, gnorm=0.971, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=116881 2023-05-02 11:01:48 - progress_bar.py[line:274] - INFO: epoch 005: 4421 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7497.9, nsentences=120, sample_size=4029.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1885.6, ups=0.25, wpb=7497.9, bsz=120, num_updates=28540, lr=1.68395e-05, gnorm=0.983, clip=40, loss_scale=64, 
train_wall=40, gb_free=30, wall=116920 2023-05-02 11:02:28 - progress_bar.py[line:274] - INFO: epoch 005: 4431 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7858.9, nsentences=120, sample_size=4220.5, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1951.7, ups=0.25, wpb=7858.9, bsz=120, num_updates=28550, lr=1.68342e-05, gnorm=0.958, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=116961 2023-05-02 11:03:08 - progress_bar.py[line:274] - INFO: epoch 005: 4441 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7846.7, nsentences=120, sample_size=4088.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1983.5, ups=0.25, wpb=7846.7, bsz=120, num_updates=28560, lr=1.68289e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=117000 2023-05-02 11:03:47 - progress_bar.py[line:274] - INFO: epoch 005: 4451 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7634.7, nsentences=120, sample_size=4191.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1943.5, ups=0.25, wpb=7634.7, bsz=120, num_updates=28570, lr=1.68237e-05, gnorm=0.978, clip=60, loss_scale=64, train_wall=39, gb_free=29.5, wall=117039 2023-05-02 11:04:27 - progress_bar.py[line:274] - INFO: epoch 005: 4461 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7864.2, nsentences=120, sample_size=3941.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1968.9, ups=0.25, wpb=7864.2, bsz=120, num_updates=28580, lr=1.68184e-05, gnorm=0.948, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=117079 2023-05-02 11:05:07 - progress_bar.py[line:274] - INFO: epoch 005: 4471 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7616.4, nsentences=120, sample_size=3985.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1920.5, ups=0.25, wpb=7616.4, bsz=120, num_updates=28590, lr=1.68131e-05, gnorm=0.971, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=117119 2023-05-02 11:05:46 - 
progress_bar.py[line:274] - INFO: epoch 005: 4481 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7874.3, nsentences=120, sample_size=4085.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1979.9, ups=0.25, wpb=7874.3, bsz=120, num_updates=28600, lr=1.68078e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=117159 2023-05-02 11:06:26 - progress_bar.py[line:274] - INFO: epoch 005: 4491 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7539.3, nsentences=120, sample_size=4093.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1895.5, ups=0.25, wpb=7539.3, bsz=120, num_updates=28610, lr=1.68025e-05, gnorm=0.959, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=117199 2023-05-02 11:07:06 - progress_bar.py[line:274] - INFO: epoch 005: 4501 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7592.2, nsentences=120, sample_size=3879, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1893.5, ups=0.25, wpb=7592.2, bsz=120, num_updates=28620, lr=1.67973e-05, gnorm=1.013, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=117239 2023-05-02 11:07:46 - progress_bar.py[line:274] - INFO: epoch 005: 4511 / 6042 loss=2.478, loss_v1=0, loss_v2=0, nll_loss=1.238, ntokens=7974.4, nsentences=120, sample_size=3999.9, sample_size_v1=0, sample_size_v2=0, ppl=2.36, wps=2015.9, ups=0.25, wpb=7974.4, bsz=120, num_updates=28630, lr=1.6792e-05, gnorm=0.988, clip=50, loss_scale=64, train_wall=39, gb_free=30.4, wall=117278 2023-05-02 11:08:25 - progress_bar.py[line:274] - INFO: epoch 005: 4521 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7859.5, nsentences=120, sample_size=4099.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1985.9, ups=0.25, wpb=7859.5, bsz=120, num_updates=28640, lr=1.67867e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=29.1, wall=117318 2023-05-02 11:09:05 - progress_bar.py[line:274] - INFO: epoch 005: 4531 / 6042 loss=2.393, loss_v1=0, 
loss_v2=0, nll_loss=1.137, ntokens=7389.8, nsentences=120, sample_size=4039.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1860.9, ups=0.25, wpb=7389.8, bsz=120, num_updates=28650, lr=1.67814e-05, gnorm=0.99, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=117358 2023-05-02 11:09:45 - progress_bar.py[line:274] - INFO: epoch 005: 4541 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7675.1, nsentences=120, sample_size=3861.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1932.3, ups=0.25, wpb=7675.1, bsz=120, num_updates=28660, lr=1.67761e-05, gnorm=0.969, clip=20, loss_scale=64, train_wall=40, gb_free=26.7, wall=117397 2023-05-02 11:10:25 - progress_bar.py[line:274] - INFO: epoch 005: 4551 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7963.9, nsentences=120, sample_size=4187.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1988.9, ups=0.25, wpb=7963.9, bsz=120, num_updates=28670, lr=1.67708e-05, gnorm=0.934, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=117437 2023-05-02 11:11:04 - progress_bar.py[line:274] - INFO: epoch 005: 4561 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7545, nsentences=120, sample_size=4106.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1920.8, ups=0.25, wpb=7545, bsz=120, num_updates=28680, lr=1.67656e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=39, gb_free=31.4, wall=117477 2023-05-02 11:11:44 - progress_bar.py[line:274] - INFO: epoch 005: 4571 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7578.6, nsentences=120, sample_size=3920.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1898.8, ups=0.25, wpb=7578.6, bsz=120, num_updates=28690, lr=1.67603e-05, gnorm=0.968, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=117517 2023-05-02 11:12:24 - progress_bar.py[line:274] - INFO: epoch 005: 4581 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7693.9, nsentences=120, sample_size=4058.8, 
sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1930.4, ups=0.25, wpb=7693.9, bsz=120, num_updates=28700, lr=1.6755e-05, gnorm=0.949, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=117556 2023-05-02 11:13:04 - progress_bar.py[line:274] - INFO: epoch 005: 4591 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7796.4, nsentences=120, sample_size=4181.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1944.3, ups=0.25, wpb=7796.4, bsz=120, num_updates=28710, lr=1.67497e-05, gnorm=0.956, clip=10, loss_scale=64, train_wall=40, gb_free=29.5, wall=117596 2023-05-02 11:13:43 - progress_bar.py[line:274] - INFO: epoch 005: 4601 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=8046.8, nsentences=120, sample_size=4036.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=2077.1, ups=0.26, wpb=8046.8, bsz=120, num_updates=28720, lr=1.67444e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=117635 2023-05-02 11:14:23 - progress_bar.py[line:274] - INFO: epoch 005: 4611 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7629.3, nsentences=120, sample_size=4331.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1890.7, ups=0.25, wpb=7629.3, bsz=120, num_updates=28730, lr=1.67391e-05, gnorm=0.92, clip=0, loss_scale=64, train_wall=40, gb_free=30.6, wall=117676 2023-05-02 11:15:03 - progress_bar.py[line:274] - INFO: epoch 005: 4621 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7869, nsentences=120, sample_size=3944.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1960.7, ups=0.25, wpb=7869, bsz=120, num_updates=28740, lr=1.67339e-05, gnorm=0.981, clip=40, loss_scale=64, train_wall=40, gb_free=25.9, wall=117716 2023-05-02 11:15:44 - progress_bar.py[line:274] - INFO: epoch 005: 4631 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7697.1, nsentences=120, sample_size=4305.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1906.2, ups=0.25, wpb=7697.1, bsz=120, 
num_updates=28750, lr=1.67286e-05, gnorm=0.925, clip=10, loss_scale=64, train_wall=40, gb_free=27.2, wall=117756 2023-05-02 11:16:24 - progress_bar.py[line:274] - INFO: epoch 005: 4641 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7786.1, nsentences=120, sample_size=4139.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1952.4, ups=0.25, wpb=7786.1, bsz=120, num_updates=28760, lr=1.67233e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=117796 2023-05-02 11:17:03 - progress_bar.py[line:274] - INFO: epoch 005: 4651 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7587.9, nsentences=120, sample_size=4169.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1921.3, ups=0.25, wpb=7587.9, bsz=120, num_updates=28770, lr=1.6718e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=39, gb_free=29, wall=117835 2023-05-02 11:17:43 - progress_bar.py[line:274] - INFO: epoch 005: 4661 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7817, nsentences=120, sample_size=3895.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1945.6, ups=0.25, wpb=7817, bsz=120, num_updates=28780, lr=1.67127e-05, gnorm=0.968, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=117876 2023-05-02 11:18:23 - progress_bar.py[line:274] - INFO: epoch 005: 4671 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7591.8, nsentences=120, sample_size=4129.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1907.4, ups=0.25, wpb=7591.8, bsz=120, num_updates=28790, lr=1.67075e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=117915 2023-05-02 11:19:02 - progress_bar.py[line:274] - INFO: epoch 005: 4681 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7686.4, nsentences=120, sample_size=3931.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1950.5, ups=0.25, wpb=7686.4, bsz=120, num_updates=28800, lr=1.67022e-05, gnorm=0.952, clip=30, loss_scale=64, train_wall=39, 
gb_free=29.9, wall=117955 2023-05-02 11:19:42 - progress_bar.py[line:274] - INFO: epoch 005: 4691 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7800.9, nsentences=120, sample_size=3989, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1977.5, ups=0.25, wpb=7800.9, bsz=120, num_updates=28810, lr=1.66969e-05, gnorm=0.951, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=117994 2023-05-02 11:20:22 - progress_bar.py[line:274] - INFO: epoch 005: 4701 / 6042 loss=2.457, loss_v1=0, loss_v2=0, nll_loss=1.218, ntokens=7776.8, nsentences=120, sample_size=3876.2, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1923.7, ups=0.25, wpb=7776.8, bsz=120, num_updates=28820, lr=1.66916e-05, gnorm=0.976, clip=20, loss_scale=64, train_wall=40, gb_free=28.8, wall=118035 2023-05-02 11:21:02 - progress_bar.py[line:274] - INFO: epoch 005: 4711 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7885.3, nsentences=120, sample_size=4046.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1991.8, ups=0.25, wpb=7885.3, bsz=120, num_updates=28830, lr=1.66863e-05, gnorm=0.935, clip=0, loss_scale=64, train_wall=40, gb_free=28.6, wall=118074 2023-05-02 11:21:42 - progress_bar.py[line:274] - INFO: epoch 005: 4721 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7765, nsentences=120, sample_size=4021.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1946.1, ups=0.25, wpb=7765, bsz=120, num_updates=28840, lr=1.6681e-05, gnorm=0.965, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=118114 2023-05-02 11:22:21 - progress_bar.py[line:274] - INFO: epoch 005: 4731 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7749.6, nsentences=120, sample_size=3922, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1952.1, ups=0.25, wpb=7749.6, bsz=120, num_updates=28850, lr=1.66758e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=118154 2023-05-02 11:23:02 - progress_bar.py[line:274] - INFO: epoch 005: 
4741 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7562.9, nsentences=120, sample_size=4137, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1875.3, ups=0.25, wpb=7562.9, bsz=120, num_updates=28860, lr=1.66705e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=118194 2023-05-02 11:23:42 - progress_bar.py[line:274] - INFO: epoch 005: 4751 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7686.4, nsentences=120, sample_size=4067.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1891.4, ups=0.25, wpb=7686.4, bsz=120, num_updates=28870, lr=1.66652e-05, gnorm=0.968, clip=40, loss_scale=64, train_wall=41, gb_free=30.4, wall=118235 2023-05-02 11:24:22 - progress_bar.py[line:274] - INFO: epoch 005: 4761 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7774.4, nsentences=120, sample_size=4250.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1966, ups=0.25, wpb=7774.4, bsz=120, num_updates=28880, lr=1.66599e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=39, gb_free=28.2, wall=118274 2023-05-02 11:25:01 - progress_bar.py[line:274] - INFO: epoch 005: 4771 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=8193.8, nsentences=120, sample_size=3741.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=2086.7, ups=0.25, wpb=8193.8, bsz=120, num_updates=28890, lr=1.66546e-05, gnorm=1.002, clip=60, loss_scale=64, train_wall=39, gb_free=31, wall=118314 2023-05-02 11:25:41 - progress_bar.py[line:274] - INFO: epoch 005: 4781 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7529.5, nsentences=120, sample_size=3670.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1902.1, ups=0.25, wpb=7529.5, bsz=120, num_updates=28900, lr=1.66494e-05, gnorm=0.939, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=118353 2023-05-02 11:26:21 - progress_bar.py[line:274] - INFO: epoch 005: 4791 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7843.4, 
nsentences=120, sample_size=3884.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1935, ups=0.25, wpb=7843.4, bsz=120, num_updates=28910, lr=1.66441e-05, gnorm=0.969, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=118394 2023-05-02 11:27:01 - progress_bar.py[line:274] - INFO: epoch 005: 4801 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7716.8, nsentences=120, sample_size=3976, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1932.4, ups=0.25, wpb=7716.8, bsz=120, num_updates=28920, lr=1.66388e-05, gnorm=0.98, clip=40, loss_scale=64, train_wall=40, gb_free=25.7, wall=118434 2023-05-02 11:27:09 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-02 11:27:45 - progress_bar.py[line:274] - INFO: epoch 005: 4812 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7604, nsentences=120, sample_size=4092.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1724.7, ups=0.23, wpb=7604, bsz=120, num_updates=28930, lr=1.66335e-05, gnorm=0.997, clip=50, loss_scale=32, train_wall=44, gb_free=30.9, wall=118478 2023-05-02 11:28:25 - progress_bar.py[line:274] - INFO: epoch 005: 4822 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7738, nsentences=120, sample_size=4105.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1936.5, ups=0.25, wpb=7738, bsz=120, num_updates=28940, lr=1.66282e-05, gnorm=0.939, clip=10, loss_scale=32, train_wall=40, gb_free=29.8, wall=118518 2023-05-02 11:29:05 - progress_bar.py[line:274] - INFO: epoch 005: 4832 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7832.2, nsentences=120, sample_size=4007.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1977.8, ups=0.25, wpb=7832.2, bsz=120, num_updates=28950, lr=1.66229e-05, gnorm=0.958, clip=20, loss_scale=32, train_wall=40, gb_free=29.8, wall=118557 2023-05-02 11:29:45 - progress_bar.py[line:274] - INFO: epoch 005: 4842 / 6042 loss=2.421, loss_v1=0, loss_v2=0, 
nll_loss=1.172, ntokens=8062.5, nsentences=120, sample_size=3849.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2018.9, ups=0.25, wpb=8062.5, bsz=120, num_updates=28960, lr=1.66177e-05, gnorm=0.962, clip=20, loss_scale=32, train_wall=40, gb_free=29.7, wall=118597 2023-05-02 11:30:24 - progress_bar.py[line:274] - INFO: epoch 005: 4852 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7823.3, nsentences=120, sample_size=3972.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2009, ups=0.26, wpb=7823.3, bsz=120, num_updates=28970, lr=1.66124e-05, gnorm=0.946, clip=10, loss_scale=32, train_wall=39, gb_free=26.2, wall=118636 2023-05-02 11:31:03 - progress_bar.py[line:274] - INFO: epoch 005: 4862 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.217, ntokens=7741.8, nsentences=120, sample_size=3992.2, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1953.9, ups=0.25, wpb=7741.8, bsz=120, num_updates=28980, lr=1.66071e-05, gnorm=0.96, clip=30, loss_scale=32, train_wall=40, gb_free=29.8, wall=118676 2023-05-02 11:31:43 - progress_bar.py[line:274] - INFO: epoch 005: 4872 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.209, ntokens=7801.5, nsentences=120, sample_size=4213.6, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1966.8, ups=0.25, wpb=7801.5, bsz=120, num_updates=28990, lr=1.66018e-05, gnorm=0.936, clip=0, loss_scale=32, train_wall=40, gb_free=30.6, wall=118716 2023-05-02 11:32:23 - progress_bar.py[line:274] - INFO: epoch 005: 4882 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7697.7, nsentences=120, sample_size=3769.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1930.1, ups=0.25, wpb=7697.7, bsz=120, num_updates=29000, lr=1.65965e-05, gnorm=0.998, clip=50, loss_scale=32, train_wall=40, gb_free=30.8, wall=118755 2023-05-02 11:32:23 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 11:32:24 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-02 11:32:24 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 11:32:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:36 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-02 11:32:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:41 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 11:32:41 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 11:32:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:49 - task_invig.py[line:181] - INFO: example hyp(gen): 
region: 2023-05-02 11:32:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:53 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 11:32:53 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 11:32:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:32:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:32:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:33:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:33:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:33:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:33:01 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-02 11:33:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:33:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:33:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:33:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:33:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:33:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:33:04 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 11:33:04 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 11:33:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:33:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:33:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:33:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:33:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:33:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:33:09 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 11:33:09 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 11:33:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:33:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:33:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:33:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:33:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:33:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:33:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:33:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:33:13 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 11:33:13 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 11:33:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 11:33:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 11:33:14 - progress_bar.py[line:282] - INFO: epoch 005 | valid on 'valid' subset | loss 3.192 | loss_v1 0 | loss_v2 0 | nll_loss 2.026 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.07 | score 0.7578 | wps 3300.7 | wpb 3202.1 | bsz 39.4 | num_updates 29000 | best_score 0.7598 2023-05-02 11:33:14 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 5 @ 29000 updates 2023-05-02 11:33:14 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_29000.pt 2023-05-02 11:33:38 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_29000.pt 2023-05-02 11:34:05 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_29000.pt (epoch 5 @ 29000 updates, 
score 0.7578) (writing took 51.63988700089976 seconds) 2023-05-02 11:34:45 - progress_bar.py[line:274] - INFO: epoch 005: 4892 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7623.5, nsentences=120, sample_size=4136.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=536.3, ups=0.07, wpb=7623.5, bsz=120, num_updates=29010, lr=1.65912e-05, gnorm=0.924, clip=10, loss_scale=32, train_wall=39, gb_free=29.9, wall=118898 2023-05-02 11:35:25 - progress_bar.py[line:274] - INFO: epoch 005: 4902 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7564.9, nsentences=120, sample_size=3894.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1897.5, ups=0.25, wpb=7564.9, bsz=120, num_updates=29020, lr=1.6586e-05, gnorm=0.99, clip=50, loss_scale=32, train_wall=40, gb_free=30.2, wall=118938 2023-05-02 11:36:05 - progress_bar.py[line:274] - INFO: epoch 005: 4912 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7527.4, nsentences=120, sample_size=4132.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1873.6, ups=0.25, wpb=7527.4, bsz=120, num_updates=29030, lr=1.65807e-05, gnorm=0.93, clip=30, loss_scale=32, train_wall=40, gb_free=27.5, wall=118978 2023-05-02 11:36:45 - progress_bar.py[line:274] - INFO: epoch 005: 4922 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7506.4, nsentences=120, sample_size=4057.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1887.6, ups=0.25, wpb=7506.4, bsz=120, num_updates=29040, lr=1.65754e-05, gnorm=0.967, clip=20, loss_scale=32, train_wall=40, gb_free=26.1, wall=119017 2023-05-02 11:37:24 - progress_bar.py[line:274] - INFO: epoch 005: 4932 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7968.8, nsentences=120, sample_size=3767.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=2016.8, ups=0.25, wpb=7968.8, bsz=120, num_updates=29050, lr=1.65701e-05, gnorm=0.971, clip=30, loss_scale=32, train_wall=39, gb_free=29.7, wall=119057 2023-05-02 11:38:05 - 
progress_bar.py[line:274] - INFO: epoch 005: 4942 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=8048.6, nsentences=120, sample_size=3960.7, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2000.7, ups=0.25, wpb=8048.6, bsz=120, num_updates=29060, lr=1.65648e-05, gnorm=0.956, clip=30, loss_scale=32, train_wall=40, gb_free=29.8, wall=119097 2023-05-02 11:38:44 - progress_bar.py[line:274] - INFO: epoch 005: 4952 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7942.8, nsentences=120, sample_size=3990.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2007, ups=0.25, wpb=7942.8, bsz=120, num_updates=29070, lr=1.65596e-05, gnorm=0.957, clip=20, loss_scale=32, train_wall=40, gb_free=29.2, wall=119137 2023-05-02 11:39:24 - progress_bar.py[line:274] - INFO: epoch 005: 4962 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7849.6, nsentences=120, sample_size=4159.1, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1954.1, ups=0.25, wpb=7849.6, bsz=120, num_updates=29080, lr=1.65543e-05, gnorm=0.936, clip=10, loss_scale=32, train_wall=40, gb_free=29.8, wall=119177 2023-05-02 11:40:04 - progress_bar.py[line:274] - INFO: epoch 005: 4972 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7855.7, nsentences=120, sample_size=4271.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2009, ups=0.26, wpb=7855.7, bsz=120, num_updates=29090, lr=1.6549e-05, gnorm=0.931, clip=20, loss_scale=32, train_wall=39, gb_free=30.1, wall=119216 2023-05-02 11:40:43 - progress_bar.py[line:274] - INFO: epoch 005: 4982 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7821.4, nsentences=120, sample_size=3560.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1989.9, ups=0.25, wpb=7821.4, bsz=120, num_updates=29100, lr=1.65437e-05, gnorm=0.977, clip=40, loss_scale=32, train_wall=39, gb_free=30.7, wall=119255 2023-05-02 11:41:23 - progress_bar.py[line:274] - INFO: epoch 005: 4992 / 6042 loss=2.41, loss_v1=0, loss_v2=0, 
nll_loss=1.156, ntokens=7923.3, nsentences=120, sample_size=3840.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1978.1, ups=0.25, wpb=7923.3, bsz=120, num_updates=29110, lr=1.65384e-05, gnorm=0.982, clip=30, loss_scale=32, train_wall=40, gb_free=23.6, wall=119295 2023-05-02 11:42:03 - progress_bar.py[line:274] - INFO: epoch 005: 5002 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7607.4, nsentences=120, sample_size=4003.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1912.5, ups=0.25, wpb=7607.4, bsz=120, num_updates=29120, lr=1.65331e-05, gnorm=0.968, clip=30, loss_scale=32, train_wall=40, gb_free=29, wall=119335 2023-05-02 11:42:43 - progress_bar.py[line:274] - INFO: epoch 005: 5012 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7802.7, nsentences=120, sample_size=4280.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1956.4, ups=0.25, wpb=7802.7, bsz=120, num_updates=29130, lr=1.65279e-05, gnorm=0.938, clip=20, loss_scale=32, train_wall=40, gb_free=29.8, wall=119375 2023-05-02 11:43:23 - progress_bar.py[line:274] - INFO: epoch 005: 5022 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7843.6, nsentences=120, sample_size=4242.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1951.8, ups=0.25, wpb=7843.6, bsz=120, num_updates=29140, lr=1.65226e-05, gnorm=0.912, clip=20, loss_scale=32, train_wall=40, gb_free=31.3, wall=119415 2023-05-02 11:44:03 - progress_bar.py[line:274] - INFO: epoch 005: 5032 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7968.6, nsentences=120, sample_size=3690.8, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1994.8, ups=0.25, wpb=7968.6, bsz=120, num_updates=29150, lr=1.65173e-05, gnorm=0.981, clip=30, loss_scale=32, train_wall=40, gb_free=29.8, wall=119455 2023-05-02 11:44:43 - progress_bar.py[line:274] - INFO: epoch 005: 5042 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=8038, nsentences=120, sample_size=4217.8, sample_size_v1=0, 
sample_size_v2=0, ppl=2.28, wps=2007.5, ups=0.25, wpb=8038, bsz=120, num_updates=29160, lr=1.6512e-05, gnorm=0.963, clip=30, loss_scale=32, train_wall=40, gb_free=29.9, wall=119495 2023-05-02 11:45:23 - progress_bar.py[line:274] - INFO: epoch 005: 5052 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7597.8, nsentences=120, sample_size=4203.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1904.7, ups=0.25, wpb=7597.8, bsz=120, num_updates=29170, lr=1.65067e-05, gnorm=0.958, clip=10, loss_scale=32, train_wall=40, gb_free=30.1, wall=119535 2023-05-02 11:46:02 - progress_bar.py[line:274] - INFO: epoch 005: 5062 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7545.5, nsentences=120, sample_size=3889.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1902.7, ups=0.25, wpb=7545.5, bsz=120, num_updates=29180, lr=1.65015e-05, gnorm=0.935, clip=10, loss_scale=32, train_wall=40, gb_free=31, wall=119575 2023-05-02 11:46:42 - progress_bar.py[line:274] - INFO: epoch 005: 5072 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7747.1, nsentences=120, sample_size=4149.8, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1941.7, ups=0.25, wpb=7747.1, bsz=120, num_updates=29190, lr=1.64962e-05, gnorm=0.941, clip=10, loss_scale=32, train_wall=40, gb_free=31.2, wall=119615 2023-05-02 11:47:23 - progress_bar.py[line:274] - INFO: epoch 005: 5082 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7924.3, nsentences=120, sample_size=4164.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1967.8, ups=0.25, wpb=7924.3, bsz=120, num_updates=29200, lr=1.64909e-05, gnorm=0.944, clip=30, loss_scale=32, train_wall=40, gb_free=30.4, wall=119655 2023-05-02 11:48:02 - progress_bar.py[line:274] - INFO: epoch 005: 5092 / 6042 loss=2.449, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7789.4, nsentences=120, sample_size=3995.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1982.9, ups=0.25, wpb=7789.4, bsz=120, num_updates=29210, 
lr=1.64856e-05, gnorm=0.969, clip=20, loss_scale=32, train_wall=39, gb_free=30.8, wall=119694 2023-05-02 11:48:42 - progress_bar.py[line:274] - INFO: epoch 005: 5102 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7653.2, nsentences=120, sample_size=4001, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1906.8, ups=0.25, wpb=7653.2, bsz=120, num_updates=29220, lr=1.64803e-05, gnorm=0.982, clip=30, loss_scale=32, train_wall=40, gb_free=28.6, wall=119734 2023-05-02 11:49:22 - progress_bar.py[line:274] - INFO: epoch 005: 5112 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7689.7, nsentences=120, sample_size=4089.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1942.3, ups=0.25, wpb=7689.7, bsz=120, num_updates=29230, lr=1.6475e-05, gnorm=0.963, clip=50, loss_scale=32, train_wall=40, gb_free=29, wall=119774 2023-05-02 11:50:01 - progress_bar.py[line:274] - INFO: epoch 005: 5122 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=8032.2, nsentences=120, sample_size=3752.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2025.5, ups=0.25, wpb=8032.2, bsz=120, num_updates=29240, lr=1.64698e-05, gnorm=0.946, clip=20, loss_scale=32, train_wall=40, gb_free=29.9, wall=119814 2023-05-02 11:50:41 - progress_bar.py[line:274] - INFO: epoch 005: 5132 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7609, nsentences=120, sample_size=3907.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1910, ups=0.25, wpb=7609, bsz=120, num_updates=29250, lr=1.64645e-05, gnorm=0.983, clip=30, loss_scale=32, train_wall=40, gb_free=30, wall=119853 2023-05-02 11:51:21 - progress_bar.py[line:274] - INFO: epoch 005: 5142 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7618.1, nsentences=120, sample_size=3767.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1929.2, ups=0.25, wpb=7618.1, bsz=120, num_updates=29260, lr=1.64592e-05, gnorm=0.994, clip=60, loss_scale=32, train_wall=39, gb_free=30.5, wall=119893 
2023-05-02 11:52:01 - progress_bar.py[line:274] - INFO: epoch 005: 5152 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7639.6, nsentences=120, sample_size=3711.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1900.9, ups=0.25, wpb=7639.6, bsz=120, num_updates=29270, lr=1.64539e-05, gnorm=1.016, clip=70, loss_scale=32, train_wall=40, gb_free=30.8, wall=119933 2023-05-02 11:52:40 - progress_bar.py[line:274] - INFO: epoch 005: 5162 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7917.8, nsentences=120, sample_size=3968.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2004.3, ups=0.25, wpb=7917.8, bsz=120, num_updates=29280, lr=1.64486e-05, gnorm=0.971, clip=40, loss_scale=32, train_wall=39, gb_free=30.2, wall=119973 2023-05-02 11:53:20 - progress_bar.py[line:274] - INFO: epoch 005: 5172 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7535.3, nsentences=120, sample_size=4094.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1900.9, ups=0.25, wpb=7535.3, bsz=120, num_updates=29290, lr=1.64433e-05, gnorm=0.954, clip=20, loss_scale=32, train_wall=40, gb_free=29.3, wall=120012 2023-05-02 11:54:00 - progress_bar.py[line:274] - INFO: epoch 005: 5182 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=8139.9, nsentences=120, sample_size=4203.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=2052.2, ups=0.25, wpb=8139.9, bsz=120, num_updates=29300, lr=1.64381e-05, gnorm=0.943, clip=20, loss_scale=32, train_wall=40, gb_free=28.8, wall=120052 2023-05-02 11:54:39 - progress_bar.py[line:274] - INFO: epoch 005: 5192 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7776.3, nsentences=120, sample_size=3975.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1946, ups=0.25, wpb=7776.3, bsz=120, num_updates=29310, lr=1.64328e-05, gnorm=0.974, clip=30, loss_scale=32, train_wall=40, gb_free=29.5, wall=120092 2023-05-02 11:55:20 - progress_bar.py[line:274] - INFO: epoch 005: 5202 / 6042 
loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7894.3, nsentences=120, sample_size=4111.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1925.5, ups=0.24, wpb=7894.3, bsz=120, num_updates=29320, lr=1.64275e-05, gnorm=0.934, clip=30, loss_scale=32, train_wall=41, gb_free=29.5, wall=120133 2023-05-02 11:55:59 - progress_bar.py[line:274] - INFO: epoch 005: 5212 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7674.7, nsentences=120, sample_size=3667.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1984, ups=0.26, wpb=7674.7, bsz=120, num_updates=29330, lr=1.64222e-05, gnorm=0.995, clip=50, loss_scale=32, train_wall=39, gb_free=31.2, wall=120172 2023-05-02 11:56:39 - progress_bar.py[line:274] - INFO: epoch 005: 5222 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7700.1, nsentences=120, sample_size=4111.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1950.2, ups=0.25, wpb=7700.1, bsz=120, num_updates=29340, lr=1.64169e-05, gnorm=0.97, clip=30, loss_scale=32, train_wall=39, gb_free=30.3, wall=120211 2023-05-02 11:57:18 - progress_bar.py[line:274] - INFO: epoch 005: 5232 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7838.1, nsentences=120, sample_size=4024.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1970.3, ups=0.25, wpb=7838.1, bsz=120, num_updates=29350, lr=1.64117e-05, gnorm=0.956, clip=10, loss_scale=32, train_wall=40, gb_free=25.7, wall=120251 2023-05-02 11:57:58 - progress_bar.py[line:274] - INFO: epoch 005: 5242 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7695.2, nsentences=120, sample_size=3795.5, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1953.5, ups=0.25, wpb=7695.2, bsz=120, num_updates=29360, lr=1.64064e-05, gnorm=0.976, clip=20, loss_scale=32, train_wall=39, gb_free=30.3, wall=120290 2023-05-02 11:58:39 - progress_bar.py[line:274] - INFO: epoch 005: 5252 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7936.1, nsentences=120, 
sample_size=3894, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1945.8, ups=0.25, wpb=7936.1, bsz=120, num_updates=29370, lr=1.64011e-05, gnorm=0.997, clip=40, loss_scale=32, train_wall=41, gb_free=30, wall=120331 2023-05-02 11:59:18 - progress_bar.py[line:274] - INFO: epoch 005: 5262 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7296.6, nsentences=120, sample_size=4213.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1831.9, ups=0.25, wpb=7296.6, bsz=120, num_updates=29380, lr=1.63958e-05, gnorm=0.977, clip=40, loss_scale=32, train_wall=40, gb_free=25.1, wall=120371 2023-05-02 11:59:58 - progress_bar.py[line:274] - INFO: epoch 005: 5272 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7776.2, nsentences=120, sample_size=4161, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1954.4, ups=0.25, wpb=7776.2, bsz=120, num_updates=29390, lr=1.63905e-05, gnorm=0.929, clip=10, loss_scale=32, train_wall=40, gb_free=30.9, wall=120411 2023-05-02 12:00:38 - progress_bar.py[line:274] - INFO: epoch 005: 5282 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7599, nsentences=120, sample_size=3975, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1912.1, ups=0.25, wpb=7599, bsz=120, num_updates=29400, lr=1.63852e-05, gnorm=0.973, clip=30, loss_scale=32, train_wall=40, gb_free=31.4, wall=120450 2023-05-02 12:01:18 - progress_bar.py[line:274] - INFO: epoch 005: 5292 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7656.6, nsentences=120, sample_size=4091.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1926.6, ups=0.25, wpb=7656.6, bsz=120, num_updates=29410, lr=1.638e-05, gnorm=0.955, clip=30, loss_scale=32, train_wall=40, gb_free=30.5, wall=120490 2023-05-02 12:01:57 - progress_bar.py[line:274] - INFO: epoch 005: 5302 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7693.9, nsentences=120, sample_size=4192.2, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1942.5, ups=0.25, wpb=7693.9, 
bsz=120, num_updates=29420, lr=1.63747e-05, gnorm=0.941, clip=10, loss_scale=32, train_wall=40, gb_free=31.3, wall=120530 2023-05-02 12:02:37 - progress_bar.py[line:274] - INFO: epoch 005: 5312 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7726.7, nsentences=120, sample_size=4036.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1945.1, ups=0.25, wpb=7726.7, bsz=120, num_updates=29430, lr=1.63694e-05, gnorm=0.953, clip=10, loss_scale=32, train_wall=40, gb_free=29.4, wall=120570 2023-05-02 12:03:16 - progress_bar.py[line:274] - INFO: epoch 005: 5322 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7658.5, nsentences=120, sample_size=4044.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1957.7, ups=0.26, wpb=7658.5, bsz=120, num_updates=29440, lr=1.63641e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=39, gb_free=30.4, wall=120609 2023-05-02 12:03:56 - progress_bar.py[line:274] - INFO: epoch 005: 5332 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7846.2, nsentences=120, sample_size=4114.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1950.9, ups=0.25, wpb=7846.2, bsz=120, num_updates=29450, lr=1.63588e-05, gnorm=0.944, clip=0, loss_scale=64, train_wall=40, gb_free=30.3, wall=120649 2023-05-02 12:04:37 - progress_bar.py[line:274] - INFO: epoch 005: 5342 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7831.8, nsentences=120, sample_size=3991.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1920.5, ups=0.25, wpb=7831.8, bsz=120, num_updates=29460, lr=1.63536e-05, gnorm=0.953, clip=10, loss_scale=64, train_wall=41, gb_free=29.7, wall=120690 2023-05-02 12:05:18 - progress_bar.py[line:274] - INFO: epoch 005: 5352 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7524.2, nsentences=120, sample_size=3979.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1866, ups=0.25, wpb=7524.2, bsz=120, num_updates=29470, lr=1.63483e-05, gnorm=0.937, clip=10, loss_scale=64, 
train_wall=40, gb_free=30.8, wall=120730 2023-05-02 12:05:58 - progress_bar.py[line:274] - INFO: epoch 005: 5362 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7802, nsentences=120, sample_size=3968.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1933.3, ups=0.25, wpb=7802, bsz=120, num_updates=29480, lr=1.6343e-05, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=29.3, wall=120770 2023-05-02 12:06:38 - progress_bar.py[line:274] - INFO: epoch 005: 5372 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7791, nsentences=120, sample_size=3985.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1961.8, ups=0.25, wpb=7791, bsz=120, num_updates=29490, lr=1.63377e-05, gnorm=0.929, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=120810 2023-05-02 12:07:17 - progress_bar.py[line:274] - INFO: epoch 005: 5382 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7715.9, nsentences=120, sample_size=3985, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1936.7, ups=0.25, wpb=7715.9, bsz=120, num_updates=29500, lr=1.63324e-05, gnorm=0.963, clip=50, loss_scale=64, train_wall=40, gb_free=29.8, wall=120850 2023-05-02 12:07:57 - progress_bar.py[line:274] - INFO: epoch 005: 5392 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7478.7, nsentences=120, sample_size=3826.3, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1904.4, ups=0.25, wpb=7478.7, bsz=120, num_updates=29510, lr=1.63271e-05, gnorm=0.989, clip=40, loss_scale=64, train_wall=39, gb_free=29, wall=120889 2023-05-02 12:08:36 - progress_bar.py[line:274] - INFO: epoch 005: 5402 / 6042 loss=2.475, loss_v1=0, loss_v2=0, nll_loss=1.23, ntokens=7686.7, nsentences=120, sample_size=4208.4, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1946.8, ups=0.25, wpb=7686.7, bsz=120, num_updates=29520, lr=1.63219e-05, gnorm=0.973, clip=40, loss_scale=64, train_wall=39, gb_free=28, wall=120929 2023-05-02 12:09:16 - progress_bar.py[line:274] - INFO: epoch 
005: 5412 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.196, ntokens=7734.9, nsentences=120, sample_size=3810.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1937.6, ups=0.25, wpb=7734.9, bsz=120, num_updates=29530, lr=1.63166e-05, gnorm=0.932, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=120969 2023-05-02 12:09:56 - progress_bar.py[line:274] - INFO: epoch 005: 5422 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7571.7, nsentences=120, sample_size=4240.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1879.8, ups=0.25, wpb=7571.7, bsz=120, num_updates=29540, lr=1.63113e-05, gnorm=0.924, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=121009 2023-05-02 12:10:36 - progress_bar.py[line:274] - INFO: epoch 005: 5432 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7653, nsentences=120, sample_size=4112.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1922.5, ups=0.25, wpb=7653, bsz=120, num_updates=29550, lr=1.6306e-05, gnorm=0.906, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=121049 2023-05-02 12:11:16 - progress_bar.py[line:274] - INFO: epoch 005: 5442 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7548.4, nsentences=120, sample_size=4109.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1885.6, ups=0.25, wpb=7548.4, bsz=120, num_updates=29560, lr=1.63007e-05, gnorm=0.913, clip=10, loss_scale=64, train_wall=40, gb_free=30.7, wall=121089 2023-05-02 12:11:56 - progress_bar.py[line:274] - INFO: epoch 005: 5452 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7894.3, nsentences=120, sample_size=4112.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1971.9, ups=0.25, wpb=7894.3, bsz=120, num_updates=29570, lr=1.62954e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=121129 2023-05-02 12:12:37 - progress_bar.py[line:274] - INFO: epoch 005: 5462 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7898.4, 
nsentences=120, sample_size=4278.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1934.6, ups=0.24, wpb=7898.4, bsz=120, num_updates=29580, lr=1.62902e-05, gnorm=0.918, clip=0, loss_scale=64, train_wall=41, gb_free=27.8, wall=121170 2023-05-02 12:13:17 - progress_bar.py[line:274] - INFO: epoch 005: 5472 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7899.4, nsentences=120, sample_size=4097.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2000, ups=0.25, wpb=7899.4, bsz=120, num_updates=29590, lr=1.62849e-05, gnorm=0.945, clip=30, loss_scale=64, train_wall=39, gb_free=23.6, wall=121209 2023-05-02 12:13:57 - progress_bar.py[line:274] - INFO: epoch 005: 5482 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7715.3, nsentences=120, sample_size=4155.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1927.6, ups=0.25, wpb=7715.3, bsz=120, num_updates=29600, lr=1.62796e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=121249 2023-05-02 12:14:36 - progress_bar.py[line:274] - INFO: epoch 005: 5492 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7818.1, nsentences=120, sample_size=4011.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1979.1, ups=0.25, wpb=7818.1, bsz=120, num_updates=29610, lr=1.62743e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=121289 2023-05-02 12:15:15 - progress_bar.py[line:274] - INFO: epoch 005: 5502 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7715.6, nsentences=120, sample_size=4068.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1979, ups=0.26, wpb=7715.6, bsz=120, num_updates=29620, lr=1.6269e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=121328 2023-05-02 12:15:55 - progress_bar.py[line:274] - INFO: epoch 005: 5512 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7873.7, nsentences=120, sample_size=4017, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1991.7, 
ups=0.25, wpb=7873.7, bsz=120, num_updates=29630, lr=1.62638e-05, gnorm=0.96, clip=20, loss_scale=64, train_wall=39, gb_free=30.9, wall=121367 2023-05-02 12:16:34 - progress_bar.py[line:274] - INFO: epoch 005: 5522 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7906.4, nsentences=120, sample_size=3805.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1991.9, ups=0.25, wpb=7906.4, bsz=120, num_updates=29640, lr=1.62585e-05, gnorm=0.949, clip=0, loss_scale=64, train_wall=40, gb_free=26.7, wall=121407 2023-05-02 12:17:14 - progress_bar.py[line:274] - INFO: epoch 005: 5532 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7572.8, nsentences=120, sample_size=4104.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1887.9, ups=0.25, wpb=7572.8, bsz=120, num_updates=29650, lr=1.62532e-05, gnorm=0.959, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=121447 2023-05-02 12:17:55 - progress_bar.py[line:274] - INFO: epoch 005: 5542 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7955.9, nsentences=120, sample_size=3932.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1969.3, ups=0.25, wpb=7955.9, bsz=120, num_updates=29660, lr=1.62479e-05, gnorm=0.958, clip=0, loss_scale=64, train_wall=40, gb_free=29, wall=121487 2023-05-02 12:18:35 - progress_bar.py[line:274] - INFO: epoch 005: 5552 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7733.6, nsentences=120, sample_size=3985.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1926.6, ups=0.25, wpb=7733.6, bsz=120, num_updates=29670, lr=1.62426e-05, gnorm=0.97, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=121527 2023-05-02 12:19:15 - progress_bar.py[line:274] - INFO: epoch 005: 5562 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=8089.2, nsentences=120, sample_size=3895.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2014.6, ups=0.25, wpb=8089.2, bsz=120, num_updates=29680, lr=1.62373e-05, gnorm=0.97, clip=30, 
loss_scale=64, train_wall=40, gb_free=31.1, wall=121568 2023-05-02 12:19:55 - progress_bar.py[line:274] - INFO: epoch 005: 5572 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=8000, nsentences=120, sample_size=3994.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2017, ups=0.25, wpb=8000, bsz=120, num_updates=29690, lr=1.62321e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=31.4, wall=121607 2023-05-02 12:20:34 - progress_bar.py[line:274] - INFO: epoch 005: 5582 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7923.7, nsentences=120, sample_size=3988.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2026.3, ups=0.26, wpb=7923.7, bsz=120, num_updates=29700, lr=1.62268e-05, gnorm=0.983, clip=40, loss_scale=64, train_wall=39, gb_free=29.2, wall=121646 2023-05-02 12:21:14 - progress_bar.py[line:274] - INFO: epoch 005: 5592 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7997.3, nsentences=120, sample_size=3906.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1971.3, ups=0.25, wpb=7997.3, bsz=120, num_updates=29710, lr=1.62215e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=29.3, wall=121687 2023-05-02 12:21:54 - progress_bar.py[line:274] - INFO: epoch 005: 5602 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7884.5, nsentences=120, sample_size=4017.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1979.3, ups=0.25, wpb=7884.5, bsz=120, num_updates=29720, lr=1.62162e-05, gnorm=0.947, clip=10, loss_scale=64, train_wall=40, gb_free=30.5, wall=121727 2023-05-02 12:22:34 - progress_bar.py[line:274] - INFO: epoch 005: 5612 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7645.5, nsentences=120, sample_size=4045.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1908.7, ups=0.25, wpb=7645.5, bsz=120, num_updates=29730, lr=1.62109e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=40, gb_free=29.1, wall=121767 2023-05-02 12:23:14 - 
progress_bar.py[line:274] - INFO: epoch 005: 5622 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7939.5, nsentences=120, sample_size=3767.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1994.2, ups=0.25, wpb=7939.5, bsz=120, num_updates=29740, lr=1.62057e-05, gnorm=1.006, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=121807 2023-05-02 12:23:54 - progress_bar.py[line:274] - INFO: epoch 005: 5632 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7591.7, nsentences=120, sample_size=4101.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1901.2, ups=0.25, wpb=7591.7, bsz=120, num_updates=29750, lr=1.62004e-05, gnorm=0.928, clip=10, loss_scale=64, train_wall=40, gb_free=28.3, wall=121847 2023-05-02 12:24:34 - progress_bar.py[line:274] - INFO: epoch 005: 5642 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7811.9, nsentences=120, sample_size=4240.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1949.1, ups=0.25, wpb=7811.9, bsz=120, num_updates=29760, lr=1.61951e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=28.7, wall=121887 2023-05-02 12:25:14 - progress_bar.py[line:274] - INFO: epoch 005: 5652 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7593.3, nsentences=120, sample_size=4047.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1925, ups=0.25, wpb=7593.3, bsz=120, num_updates=29770, lr=1.61898e-05, gnorm=0.964, clip=20, loss_scale=64, train_wall=39, gb_free=28.5, wall=121926 2023-05-02 12:25:53 - progress_bar.py[line:274] - INFO: epoch 005: 5662 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7786.9, nsentences=120, sample_size=3909.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1966.5, ups=0.25, wpb=7786.9, bsz=120, num_updates=29780, lr=1.61845e-05, gnorm=0.956, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=121966 2023-05-02 12:26:34 - progress_bar.py[line:274] - INFO: epoch 005: 5672 / 6042 loss=2.409, loss_v1=0, 
loss_v2=0, nll_loss=1.154, ntokens=8009.2, nsentences=119.2, sample_size=3819.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1979.2, ups=0.25, wpb=8009.2, bsz=119.2, num_updates=29790, lr=1.61792e-05, gnorm=0.935, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=122006 2023-05-02 12:27:13 - progress_bar.py[line:274] - INFO: epoch 005: 5682 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7624.1, nsentences=120, sample_size=3934.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1940.7, ups=0.25, wpb=7624.1, bsz=120, num_updates=29800, lr=1.6174e-05, gnorm=0.983, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=122045 2023-05-02 12:27:53 - progress_bar.py[line:274] - INFO: epoch 005: 5692 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7714, nsentences=120, sample_size=4261.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1906.8, ups=0.25, wpb=7714, bsz=120, num_updates=29810, lr=1.61687e-05, gnorm=0.929, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=122086 2023-05-02 12:28:33 - progress_bar.py[line:274] - INFO: epoch 005: 5702 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7301.1, nsentences=120, sample_size=4099.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1854.2, ups=0.25, wpb=7301.1, bsz=120, num_updates=29820, lr=1.61634e-05, gnorm=0.954, clip=50, loss_scale=64, train_wall=39, gb_free=28.8, wall=122125 2023-05-02 12:29:12 - progress_bar.py[line:274] - INFO: epoch 005: 5712 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7627, nsentences=120, sample_size=4109.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1933.7, ups=0.25, wpb=7627, bsz=120, num_updates=29830, lr=1.61581e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=122165 2023-05-02 12:29:52 - progress_bar.py[line:274] - INFO: epoch 005: 5722 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7910, nsentences=120, sample_size=3894.9, 
sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2007.6, ups=0.25, wpb=7910, bsz=120, num_updates=29840, lr=1.61528e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=39, gb_free=27.6, wall=122204 2023-05-02 12:30:32 - progress_bar.py[line:274] - INFO: epoch 005: 5732 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7910, nsentences=120, sample_size=3897.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1942.6, ups=0.25, wpb=7910, bsz=120, num_updates=29850, lr=1.61475e-05, gnorm=0.976, clip=50, loss_scale=64, train_wall=41, gb_free=24.9, wall=122245 2023-05-02 12:31:12 - progress_bar.py[line:274] - INFO: epoch 005: 5742 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7757, nsentences=120, sample_size=4300.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1960.3, ups=0.25, wpb=7757, bsz=120, num_updates=29860, lr=1.61423e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=39, gb_free=29, wall=122284 2023-05-02 12:31:52 - progress_bar.py[line:274] - INFO: epoch 005: 5752 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7512, nsentences=120, sample_size=4166.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1894.8, ups=0.25, wpb=7512, bsz=120, num_updates=29870, lr=1.6137e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=122324 2023-05-02 12:32:31 - progress_bar.py[line:274] - INFO: epoch 005: 5762 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7532.2, nsentences=120, sample_size=4141, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1898, ups=0.25, wpb=7532.2, bsz=120, num_updates=29880, lr=1.61317e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=122364 2023-05-02 12:33:11 - progress_bar.py[line:274] - INFO: epoch 005: 5772 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7893.5, nsentences=120, sample_size=3913.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1969, ups=0.25, wpb=7893.5, bsz=120, num_updates=29890, 
lr=1.61264e-05, gnorm=0.953, clip=0, loss_scale=64, train_wall=40, gb_free=29, wall=122404 2023-05-02 12:33:51 - progress_bar.py[line:274] - INFO: epoch 005: 5782 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7659.4, nsentences=120, sample_size=4105.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1938.2, ups=0.25, wpb=7659.4, bsz=120, num_updates=29900, lr=1.61211e-05, gnorm=0.968, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=122443 2023-05-02 12:34:30 - progress_bar.py[line:274] - INFO: epoch 005: 5792 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7665.8, nsentences=120, sample_size=3732, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1961.7, ups=0.26, wpb=7665.8, bsz=120, num_updates=29910, lr=1.61159e-05, gnorm=0.985, clip=40, loss_scale=64, train_wall=39, gb_free=29.8, wall=122482 2023-05-02 12:35:10 - progress_bar.py[line:274] - INFO: epoch 005: 5802 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7629.9, nsentences=120, sample_size=4044.9, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1920.1, ups=0.25, wpb=7629.9, bsz=120, num_updates=29920, lr=1.61106e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=122522 2023-05-02 12:35:50 - progress_bar.py[line:274] - INFO: epoch 005: 5812 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7753.4, nsentences=120, sample_size=4214.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1929.9, ups=0.25, wpb=7753.4, bsz=120, num_updates=29930, lr=1.61053e-05, gnorm=0.932, clip=10, loss_scale=64, train_wall=40, gb_free=31.5, wall=122562 2023-05-02 12:36:29 - progress_bar.py[line:274] - INFO: epoch 005: 5822 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7778.1, nsentences=120, sample_size=4096.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1978.7, ups=0.25, wpb=7778.1, bsz=120, num_updates=29940, lr=1.61e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=39, gb_free=27.7, wall=122602 
2023-05-02 12:37:09 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 12:37:13 - progress_bar.py[line:274] - INFO: epoch 005: 5833 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7985.8, nsentences=120, sample_size=3637.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1842.1, ups=0.23, wpb=7985.8, bsz=120, num_updates=29950, lr=1.60947e-05, gnorm=0.988, clip=40, loss_scale=64, train_wall=43, gb_free=30.7, wall=122645 2023-05-02 12:37:52 - progress_bar.py[line:274] - INFO: epoch 005: 5843 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.215, ntokens=7877.5, nsentences=120, sample_size=4158.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1988.2, ups=0.25, wpb=7877.5, bsz=120, num_updates=29960, lr=1.60894e-05, gnorm=1.006, clip=50, loss_scale=64, train_wall=40, gb_free=30.5, wall=122685 2023-05-02 12:38:32 - progress_bar.py[line:274] - INFO: epoch 005: 5853 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7705.6, nsentences=120, sample_size=3962.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1926.6, ups=0.25, wpb=7705.6, bsz=120, num_updates=29970, lr=1.60842e-05, gnorm=0.992, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=122725 2023-05-02 12:39:11 - progress_bar.py[line:274] - INFO: epoch 005: 5863 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7463.9, nsentences=120, sample_size=3970.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1918, ups=0.26, wpb=7463.9, bsz=120, num_updates=29980, lr=1.60789e-05, gnorm=1.053, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=122764 2023-05-02 12:39:52 - progress_bar.py[line:274] - INFO: epoch 005: 5873 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7994.7, nsentences=120, sample_size=3894.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1975.7, ups=0.25, wpb=7994.7, bsz=120, num_updates=29990, lr=1.60736e-05, gnorm=0.989, clip=50, loss_scale=64, 
train_wall=40, gb_free=30.8, wall=122804 2023-05-02 12:40:32 - progress_bar.py[line:274] - INFO: epoch 005: 5883 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7672.3, nsentences=120, sample_size=3917.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1913.9, ups=0.25, wpb=7672.3, bsz=120, num_updates=30000, lr=1.60683e-05, gnorm=0.989, clip=40, loss_scale=64, train_wall=40, gb_free=29, wall=122844 2023-05-02 12:40:32 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 12:40:34 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 12:40:34 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 12:40:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:43 - task_invig.py[line:181] - 
INFO: example hyp(gen): region: 2023-05-02 12:40:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:51 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 12:40:51 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 12:40:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:40:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:40:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:03 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 12:41:03 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 12:41:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:14 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 12:41:14 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 12:41:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:18 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 12:41:18 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 12:41:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:23 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 12:41:23 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 12:41:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:41:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:41:23 - progress_bar.py[line:282] - INFO: epoch 005 | valid on 'valid' subset | loss 3.201 | loss_v1 0 | loss_v2 0 | nll_loss 2.035 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.1 | score 0.7627 | wps 3303.2 | wpb 3202.1 | bsz 39.4 | num_updates 30000 | best_score 0.7627 2023-05-02 12:41:23 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 5 @ 30000 updates 2023-05-02 12:41:23 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_30000.pt 2023-05-02 12:41:49 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_30000.pt 2023-05-02 12:42:29 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_5_30000.pt (epoch 5 @ 30000 updates, score 0.7627) (writing took 65.80319508793764 seconds) 2023-05-02 12:43:10 - progress_bar.py[line:274] - INFO: epoch 005: 5893 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7610.4, nsentences=120, sample_size=4102.8, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=480.6, ups=0.06, wpb=7610.4, bsz=120, num_updates=30010, lr=1.6063e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=123002 2023-05-02 12:43:49 - progress_bar.py[line:274] - INFO: epoch 005: 5903 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7378.9, nsentences=120, sample_size=4004.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1874, ups=0.25, wpb=7378.9, bsz=120, num_updates=30020, lr=1.60578e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=39, gb_free=28.7, 
wall=123042 2023-05-02 12:44:29 - progress_bar.py[line:274] - INFO: epoch 005: 5913 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7531, nsentences=120, sample_size=3913.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1904, ups=0.25, wpb=7531, bsz=120, num_updates=30030, lr=1.60525e-05, gnorm=0.971, clip=30, loss_scale=64, train_wall=39, gb_free=30.5, wall=123081 2023-05-02 12:45:09 - progress_bar.py[line:274] - INFO: epoch 005: 5923 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7824.9, nsentences=120, sample_size=3889.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1936.2, ups=0.25, wpb=7824.9, bsz=120, num_updates=30040, lr=1.60472e-05, gnorm=0.99, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=123122 2023-05-02 12:45:49 - progress_bar.py[line:274] - INFO: epoch 005: 5933 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7700.2, nsentences=120, sample_size=3769, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1931.3, ups=0.25, wpb=7700.2, bsz=120, num_updates=30050, lr=1.60419e-05, gnorm=1.008, clip=40, loss_scale=64, train_wall=40, gb_free=31, wall=123162 2023-05-02 12:46:29 - progress_bar.py[line:274] - INFO: epoch 005: 5943 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7794.9, nsentences=120, sample_size=3660.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1939.7, ups=0.25, wpb=7794.9, bsz=120, num_updates=30060, lr=1.60366e-05, gnorm=0.962, clip=40, loss_scale=64, train_wall=40, gb_free=28.5, wall=123202 2023-05-02 12:47:09 - progress_bar.py[line:274] - INFO: epoch 005: 5953 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7784.3, nsentences=120, sample_size=4194.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1974.7, ups=0.25, wpb=7784.3, bsz=120, num_updates=30070, lr=1.60313e-05, gnorm=0.936, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=123241 2023-05-02 12:47:48 - progress_bar.py[line:274] - INFO: epoch 005: 5963 / 6042 
loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7221.5, nsentences=120, sample_size=4051.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1840.9, ups=0.25, wpb=7221.5, bsz=120, num_updates=30080, lr=1.60261e-05, gnorm=0.99, clip=50, loss_scale=64, train_wall=39, gb_free=30.7, wall=123281 2023-05-02 12:48:28 - progress_bar.py[line:274] - INFO: epoch 005: 5973 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7403.3, nsentences=120, sample_size=4287.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1837.9, ups=0.25, wpb=7403.3, bsz=120, num_updates=30090, lr=1.60208e-05, gnorm=0.961, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=123321 2023-05-02 12:49:08 - progress_bar.py[line:274] - INFO: epoch 005: 5983 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7614, nsentences=120, sample_size=3873.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1917.1, ups=0.25, wpb=7614, bsz=120, num_updates=30100, lr=1.60155e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=123361 2023-05-02 12:49:48 - progress_bar.py[line:274] - INFO: epoch 005: 5993 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7833, nsentences=120, sample_size=4204.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1984.8, ups=0.25, wpb=7833, bsz=120, num_updates=30110, lr=1.60102e-05, gnorm=0.953, clip=10, loss_scale=64, train_wall=39, gb_free=29.9, wall=123400 2023-05-02 12:50:27 - progress_bar.py[line:274] - INFO: epoch 005: 6003 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7695.3, nsentences=120, sample_size=4172.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1937.5, ups=0.25, wpb=7695.3, bsz=120, num_updates=30120, lr=1.60049e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=123440 2023-05-02 12:51:07 - progress_bar.py[line:274] - INFO: epoch 005: 6013 / 6042 loss=2.459, loss_v1=0, loss_v2=0, nll_loss=1.212, ntokens=7880.8, nsentences=120, 
sample_size=4376.8, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1964.4, ups=0.25, wpb=7880.8, bsz=120, num_updates=30130, lr=1.59996e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=123480 2023-05-02 12:51:48 - progress_bar.py[line:274] - INFO: epoch 005: 6023 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7883.5, nsentences=120, sample_size=4205.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1963.9, ups=0.25, wpb=7883.5, bsz=120, num_updates=30140, lr=1.59944e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=123520 2023-05-02 12:52:27 - progress_bar.py[line:274] - INFO: epoch 005: 6033 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7494.9, nsentences=120, sample_size=4105.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1908.8, ups=0.25, wpb=7494.9, bsz=120, num_updates=30150, lr=1.59891e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=39, gb_free=29.7, wall=123559 2023-05-02 12:53:01 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 12:53:03 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 12:53:03 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 12:53:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 12:53:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:20 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 12:53:20 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 12:53:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-02 12:53:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:32 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 12:53:32 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 12:53:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 12:53:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:43 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 12:53:43 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 12:53:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:48 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 12:53:48 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 12:53:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:52 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-02 12:53:52 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 12:53:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 12:53:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 12:53:53 - progress_bar.py[line:282] - INFO: epoch 005 | valid on 'valid' subset | loss 3.223 | loss_v1 0 | loss_v2 0 | nll_loss 2.057 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.16 | score 0.751 | wps 3289.9 | wpb 3202.1 | bsz 39.4 | num_updates 30159 | best_score 0.7627 2023-05-02 12:53:53 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 5 @ 30159 updates 2023-05-02 12:53:53 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-02 12:54:18 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-02 12:54:19 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt (epoch 5 @ 30159 updates, score 0.751) (writing took 26.072542655048892 seconds) 2023-05-02 12:54:19 - train.py[line:332] - INFO: end of epoch 5 (average epoch stats below) 2023-05-02 12:54:19 - progress_bar.py[line:282] - INFO: epoch 005 | loss 2.409 | loss_v1 0 | loss_v2 0 | nll_loss 1.156 | ntokens 7728.26 | nsentences 119.992 | sample_size 4041.64 | sample_size_v1 0 | sample_size_v2 0 | ppl 2.23 | wps 1883.2 | ups 0.24 | wpb 7728.3 | bsz 120 | num_updates 30159 | lr 1.59843e-05 | gnorm 0.958 | clip 24.7 | loss_scale 64 | train_wall 24017 | gb_free 30 | wall 123671 2023-05-02 12:54:19 - trainer.py[line:639] - INFO: loading train data for epoch 6 2023-05-02 12:54:19 - dialog_dataset.py[line:647] - INFO: loading invig-train from 
/mnt/bn/hri-lq/datasets/hf-cache/invig 2023-05-02 12:54:19 - dialog_dataset.py[line:647] - INFO: loading guesswhat-train from /mnt/bn/hri-lq/datasets/hf-cache/guesswhat 2023-05-02 12:54:21 - dialog_dataset.py[line:647] - INFO: loading visdial-train from /mnt/bn/hri-lq/datasets/hf-cache/visdial 2023-05-02 12:54:22 - dialog_dataset.py[line:647] - INFO: loading refcoco-train from /mnt/bn/hri-lq/datasets/hf-cache/refcoco 2023-05-02 12:54:23 - dialog_dataset.py[line:647] - INFO: loading refcocog-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocog 2023-05-02 12:54:23 - dialog_dataset.py[line:647] - INFO: loading refcocoplus-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocoplus 2023-05-02 12:54:23 - dialog_dataset.py[line:647] - INFO: loading cc_sbu_align-train from /mnt/bn/hri-lq/datasets/hf-cache/cc_sbu_align 2023-05-02 12:54:23 - dialog_dataset.py[line:647] - INFO: loading llava_instruct_150k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_instruct_150k 2023-05-02 12:54:24 - dialog_dataset.py[line:647] - INFO: loading llava_conversation_58k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_conversation_58k 2023-05-02 12:54:24 - dialog_dataset.py[line:647] - INFO: loading llava_complex_reasoning_77k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_complex_reasoning_77k 2023-05-02 12:54:25 - dialog_dataset.py[line:647] - INFO: loading llava_detail_23k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_detail_23k 2023-05-02 12:54:25 - dialog_dataset.py[line:647] - INFO: loading openimages-train from /mnt/bn/hri-lq/datasets/hf-cache/openimages_v1.2 2023-05-02 12:54:25 - dialog_dataset.py[line:671] - INFO: load train data: 18 (90624/724992 samples) dataset(s) 2023-05-02 12:54:25 - dialog_dataset.py[line:672] - INFO: Tasks: invig_question(17652), invig_answer(17652), invig_grounding(17652), guesswhat_question(68653), guesswhat_answer(68653), guesswhat_grounding(68653), visdial_question(103447), visdial_answer(103447), visdial_caption(20689), refcoco_grounding(9523), 
refcoco_grounding(9920), refcoco_grounding(9494), cc_sbu_align_caption(3439), llava_instruct_150k(90372), llava_conversation_58k(46965), llava_complex_reasoning_77k(44353), llava_detail_23k(12471), openimages_detection(11957) 2023-05-02 12:54:26 - trainer.py[line:703] - INFO: begin training epoch 6 2023-05-02 12:54:26 - train.py[line:305] - INFO: Start iterating over samples 2023-05-02 12:54:30 - progress_bar.py[line:274] - INFO: epoch 006: 1 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7282.8, nsentences=116, sample_size=3825, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=591.6, ups=0.08, wpb=7282.8, bsz=116, num_updates=30160, lr=1.59838e-05, gnorm=1.004, clip=60, loss_scale=64, train_wall=38, gb_free=29.9, wall=123682 2023-05-02 12:55:10 - progress_bar.py[line:274] - INFO: epoch 006: 11 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7651.6, nsentences=120, sample_size=4163.9, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1923.6, ups=0.25, wpb=7651.6, bsz=120, num_updates=30170, lr=1.59785e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=123722 2023-05-02 12:55:50 - progress_bar.py[line:274] - INFO: epoch 006: 21 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.187, ntokens=7965.3, nsentences=120, sample_size=3768.2, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1991.8, ups=0.25, wpb=7965.3, bsz=120, num_updates=30180, lr=1.59732e-05, gnorm=0.984, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=123762 2023-05-02 12:56:30 - progress_bar.py[line:274] - INFO: epoch 006: 31 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7556.8, nsentences=120, sample_size=4080.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1856.5, ups=0.25, wpb=7556.8, bsz=120, num_updates=30190, lr=1.5968e-05, gnorm=0.931, clip=10, loss_scale=64, train_wall=41, gb_free=30, wall=123803 2023-05-02 12:57:10 - progress_bar.py[line:274] - INFO: epoch 006: 41 / 6042 loss=2.412, loss_v1=0, 
loss_v2=0, nll_loss=1.161, ntokens=7687.2, nsentences=120, sample_size=3826.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1936.3, ups=0.25, wpb=7687.2, bsz=120, num_updates=30200, lr=1.59627e-05, gnorm=1.01, clip=60, loss_scale=64, train_wall=40, gb_free=30, wall=123843 2023-05-02 12:57:50 - progress_bar.py[line:274] - INFO: epoch 006: 51 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7554.6, nsentences=120, sample_size=3941.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1907.3, ups=0.25, wpb=7554.6, bsz=120, num_updates=30210, lr=1.59574e-05, gnorm=0.998, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=123882 2023-05-02 12:58:29 - progress_bar.py[line:274] - INFO: epoch 006: 61 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7820.9, nsentences=120, sample_size=3988.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1990.7, ups=0.25, wpb=7820.9, bsz=120, num_updates=30220, lr=1.59521e-05, gnorm=0.97, clip=40, loss_scale=64, train_wall=39, gb_free=29.9, wall=123921 2023-05-02 12:59:09 - progress_bar.py[line:274] - INFO: epoch 006: 71 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7726.6, nsentences=120, sample_size=4195.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1931.5, ups=0.25, wpb=7726.6, bsz=120, num_updates=30230, lr=1.59468e-05, gnorm=0.941, clip=0, loss_scale=64, train_wall=40, gb_free=30.5, wall=123961 2023-05-02 12:59:49 - progress_bar.py[line:274] - INFO: epoch 006: 81 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7859.3, nsentences=120, sample_size=3699, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1983, ups=0.25, wpb=7859.3, bsz=120, num_updates=30240, lr=1.59415e-05, gnorm=0.989, clip=60, loss_scale=64, train_wall=40, gb_free=30.2, wall=124001 2023-05-02 13:00:29 - progress_bar.py[line:274] - INFO: epoch 006: 91 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7785.2, nsentences=120, sample_size=3911.9, sample_size_v1=0, 
sample_size_v2=0, ppl=2.16, wps=1928.1, ups=0.25, wpb=7785.2, bsz=120, num_updates=30250, lr=1.59363e-05, gnorm=0.932, clip=30, loss_scale=64, train_wall=40, gb_free=25.4, wall=124041 2023-05-02 13:01:09 - progress_bar.py[line:274] - INFO: epoch 006: 101 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7679.5, nsentences=120, sample_size=4215.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1916, ups=0.25, wpb=7679.5, bsz=120, num_updates=30260, lr=1.5931e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=124082 2023-05-02 13:01:49 - progress_bar.py[line:274] - INFO: epoch 006: 111 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7799.8, nsentences=120, sample_size=3670.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1975.2, ups=0.25, wpb=7799.8, bsz=120, num_updates=30270, lr=1.59257e-05, gnorm=0.993, clip=40, loss_scale=64, train_wall=39, gb_free=30.5, wall=124121 2023-05-02 13:02:30 - progress_bar.py[line:274] - INFO: epoch 006: 121 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7832.1, nsentences=120, sample_size=4118.5, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1913.3, ups=0.24, wpb=7832.1, bsz=120, num_updates=30280, lr=1.59204e-05, gnorm=0.948, clip=30, loss_scale=64, train_wall=41, gb_free=29.5, wall=124162 2023-05-02 13:03:10 - progress_bar.py[line:274] - INFO: epoch 006: 131 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7824.7, nsentences=120, sample_size=4046, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1946.9, ups=0.25, wpb=7824.7, bsz=120, num_updates=30290, lr=1.59151e-05, gnorm=0.953, clip=10, loss_scale=64, train_wall=40, gb_free=26.3, wall=124202 2023-05-02 13:03:49 - progress_bar.py[line:274] - INFO: epoch 006: 141 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.064, ntokens=7551.9, nsentences=120, sample_size=3814.3, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1912, ups=0.25, wpb=7551.9, bsz=120, num_updates=30300, 
lr=1.59099e-05, gnorm=0.98, clip=40, loss_scale=64, train_wall=39, gb_free=31.5, wall=124242 2023-05-02 13:04:28 - progress_bar.py[line:274] - INFO: epoch 006: 151 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7634.1, nsentences=120, sample_size=3955.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1943.1, ups=0.25, wpb=7634.1, bsz=120, num_updates=30310, lr=1.59046e-05, gnorm=0.982, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=124281 2023-05-02 13:05:09 - progress_bar.py[line:274] - INFO: epoch 006: 161 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7771.9, nsentences=120, sample_size=4567.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1924.8, ups=0.25, wpb=7771.9, bsz=120, num_updates=30320, lr=1.58993e-05, gnorm=0.911, clip=0, loss_scale=64, train_wall=40, gb_free=30.6, wall=124321 2023-05-02 13:05:48 - progress_bar.py[line:274] - INFO: epoch 006: 171 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7640.3, nsentences=120, sample_size=3929.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1950.1, ups=0.26, wpb=7640.3, bsz=120, num_updates=30330, lr=1.5894e-05, gnorm=0.994, clip=20, loss_scale=64, train_wall=39, gb_free=27.2, wall=124361 2023-05-02 13:06:27 - progress_bar.py[line:274] - INFO: epoch 006: 181 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7789.5, nsentences=120, sample_size=3734, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1986.9, ups=0.26, wpb=7789.5, bsz=120, num_updates=30340, lr=1.58887e-05, gnorm=0.977, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=124400 2023-05-02 13:07:07 - progress_bar.py[line:274] - INFO: epoch 006: 191 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7540.3, nsentences=120, sample_size=3962.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1893.2, ups=0.25, wpb=7540.3, bsz=120, num_updates=30350, lr=1.58834e-05, gnorm=0.974, clip=20, loss_scale=64, train_wall=40, gb_free=28.8, wall=124440 
2023-05-02 13:07:46 - progress_bar.py[line:274] - INFO: epoch 006: 201 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7858.6, nsentences=120, sample_size=4016.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=2000.6, ups=0.25, wpb=7858.6, bsz=120, num_updates=30360, lr=1.58782e-05, gnorm=0.966, clip=40, loss_scale=64, train_wall=39, gb_free=29.1, wall=124479 2023-05-02 13:08:26 - progress_bar.py[line:274] - INFO: epoch 006: 211 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7768.2, nsentences=120, sample_size=4053.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1964, ups=0.25, wpb=7768.2, bsz=120, num_updates=30370, lr=1.58729e-05, gnorm=0.966, clip=20, loss_scale=64, train_wall=39, gb_free=26.6, wall=124518 2023-05-02 13:09:05 - progress_bar.py[line:274] - INFO: epoch 006: 221 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7860.3, nsentences=120, sample_size=3840.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1993.9, ups=0.25, wpb=7860.3, bsz=120, num_updates=30380, lr=1.58676e-05, gnorm=0.993, clip=30, loss_scale=64, train_wall=39, gb_free=31.3, wall=124558 2023-05-02 13:09:46 - progress_bar.py[line:274] - INFO: epoch 006: 231 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7686.4, nsentences=120, sample_size=4171, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1905.5, ups=0.25, wpb=7686.4, bsz=120, num_updates=30390, lr=1.58623e-05, gnorm=0.944, clip=10, loss_scale=64, train_wall=40, gb_free=29.1, wall=124598 2023-05-02 13:10:25 - progress_bar.py[line:274] - INFO: epoch 006: 241 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7799.5, nsentences=119.2, sample_size=3871.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1977.6, ups=0.25, wpb=7799.5, bsz=119.2, num_updates=30400, lr=1.5857e-05, gnorm=0.971, clip=20, loss_scale=64, train_wall=39, gb_free=29.3, wall=124638 2023-05-02 13:11:05 - progress_bar.py[line:274] - INFO: epoch 006: 251 / 6042 loss=2.351, 
loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7651.5, nsentences=120, sample_size=4089.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1918.3, ups=0.25, wpb=7651.5, bsz=120, num_updates=30410, lr=1.58517e-05, gnorm=0.971, clip=30, loss_scale=64, train_wall=40, gb_free=31.1, wall=124677 2023-05-02 13:11:44 - progress_bar.py[line:274] - INFO: epoch 006: 261 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7702, nsentences=120, sample_size=3903.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1956.4, ups=0.25, wpb=7702, bsz=120, num_updates=30420, lr=1.58465e-05, gnorm=0.968, clip=20, loss_scale=64, train_wall=39, gb_free=30.5, wall=124717 2023-05-02 13:12:24 - progress_bar.py[line:274] - INFO: epoch 006: 271 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7665.8, nsentences=120, sample_size=4117.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1918.7, ups=0.25, wpb=7665.8, bsz=120, num_updates=30430, lr=1.58412e-05, gnorm=0.968, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=124757 2023-05-02 13:13:05 - progress_bar.py[line:274] - INFO: epoch 006: 281 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7682.8, nsentences=120, sample_size=4288.8, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1904.1, ups=0.25, wpb=7682.8, bsz=120, num_updates=30440, lr=1.58359e-05, gnorm=0.958, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=124797 2023-05-02 13:13:45 - progress_bar.py[line:274] - INFO: epoch 006: 291 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7670.1, nsentences=120, sample_size=4304.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1921, ups=0.25, wpb=7670.1, bsz=120, num_updates=30450, lr=1.58306e-05, gnorm=0.947, clip=10, loss_scale=64, train_wall=40, gb_free=28.7, wall=124837 2023-05-02 13:14:24 - progress_bar.py[line:274] - INFO: epoch 006: 301 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7767, nsentences=120, sample_size=4164.3, 
sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1949.1, ups=0.25, wpb=7767, bsz=120, num_updates=30460, lr=1.58253e-05, gnorm=0.958, clip=10, loss_scale=64, train_wall=40, gb_free=28.4, wall=124877 2023-05-02 13:14:52 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 13:15:08 - progress_bar.py[line:274] - INFO: epoch 006: 312 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7694.2, nsentences=120, sample_size=3851.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1750.7, ups=0.23, wpb=7694.2, bsz=120, num_updates=30470, lr=1.58201e-05, gnorm=1.003, clip=30, loss_scale=64, train_wall=44, gb_free=30.5, wall=124921 2023-05-02 13:15:48 - progress_bar.py[line:274] - INFO: epoch 006: 322 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7536.2, nsentences=120, sample_size=4155.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1923.9, ups=0.26, wpb=7536.2, bsz=120, num_updates=30480, lr=1.58148e-05, gnorm=1.018, clip=50, loss_scale=64, train_wall=39, gb_free=30.8, wall=124960 2023-05-02 13:16:27 - progress_bar.py[line:274] - INFO: epoch 006: 332 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7804.5, nsentences=120, sample_size=3971.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1985, ups=0.25, wpb=7804.5, bsz=120, num_updates=30490, lr=1.58095e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=39, gb_free=31.1, wall=124999 2023-05-02 13:17:06 - progress_bar.py[line:274] - INFO: epoch 006: 342 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7696.2, nsentences=120, sample_size=3816.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1958.4, ups=0.25, wpb=7696.2, bsz=120, num_updates=30500, lr=1.58042e-05, gnorm=0.977, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=125039 2023-05-02 13:17:46 - progress_bar.py[line:274] - INFO: epoch 006: 352 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7799.3, 
nsentences=120, sample_size=3960.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1936.9, ups=0.25, wpb=7799.3, bsz=120, num_updates=30510, lr=1.57989e-05, gnorm=0.992, clip=60, loss_scale=64, train_wall=40, gb_free=30.2, wall=125079 2023-05-02 13:18:27 - progress_bar.py[line:274] - INFO: epoch 006: 362 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7607.1, nsentences=120, sample_size=4028.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1888.3, ups=0.25, wpb=7607.1, bsz=120, num_updates=30520, lr=1.57936e-05, gnorm=1.003, clip=60, loss_scale=64, train_wall=40, gb_free=25.5, wall=125119 2023-05-02 13:19:06 - progress_bar.py[line:274] - INFO: epoch 006: 372 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7767.7, nsentences=120, sample_size=3935.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1970.1, ups=0.25, wpb=7767.7, bsz=120, num_updates=30530, lr=1.57884e-05, gnorm=1.003, clip=50, loss_scale=64, train_wall=39, gb_free=29.3, wall=125159 2023-05-02 13:19:46 - progress_bar.py[line:274] - INFO: epoch 006: 382 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7900.2, nsentences=120, sample_size=3872.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1969.5, ups=0.25, wpb=7900.2, bsz=120, num_updates=30540, lr=1.57831e-05, gnorm=0.999, clip=40, loss_scale=64, train_wall=40, gb_free=26.2, wall=125199 2023-05-02 13:20:26 - progress_bar.py[line:274] - INFO: epoch 006: 392 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7605.1, nsentences=120, sample_size=3898, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1906.2, ups=0.25, wpb=7605.1, bsz=120, num_updates=30550, lr=1.57778e-05, gnorm=0.996, clip=50, loss_scale=64, train_wall=40, gb_free=29.3, wall=125239 2023-05-02 13:21:06 - progress_bar.py[line:274] - INFO: epoch 006: 402 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7843.5, nsentences=120, sample_size=4076.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1955.9, 
ups=0.25, wpb=7843.5, bsz=120, num_updates=30560, lr=1.57725e-05, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=125279 2023-05-02 13:21:47 - progress_bar.py[line:274] - INFO: epoch 006: 412 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7794.4, nsentences=120, sample_size=4007.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1938.6, ups=0.25, wpb=7794.4, bsz=120, num_updates=30570, lr=1.57672e-05, gnorm=0.95, clip=10, loss_scale=64, train_wall=40, gb_free=30.7, wall=125319 2023-05-02 13:22:26 - progress_bar.py[line:274] - INFO: epoch 006: 422 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7703.5, nsentences=120, sample_size=4203.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1938.4, ups=0.25, wpb=7703.5, bsz=120, num_updates=30580, lr=1.5762e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=125359 2023-05-02 13:23:07 - progress_bar.py[line:274] - INFO: epoch 006: 432 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7719.7, nsentences=120, sample_size=4070, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1918.2, ups=0.25, wpb=7719.7, bsz=120, num_updates=30590, lr=1.57567e-05, gnorm=0.985, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=125399 2023-05-02 13:23:47 - progress_bar.py[line:274] - INFO: epoch 006: 442 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7853.3, nsentences=120, sample_size=3871.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1923.3, ups=0.24, wpb=7853.3, bsz=120, num_updates=30600, lr=1.57514e-05, gnorm=1.017, clip=50, loss_scale=64, train_wall=41, gb_free=30.6, wall=125440 2023-05-02 13:24:27 - progress_bar.py[line:274] - INFO: epoch 006: 452 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7921.1, nsentences=120, sample_size=3754.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2001.6, ups=0.25, wpb=7921.1, bsz=120, num_updates=30610, lr=1.57461e-05, gnorm=0.98, clip=40, 
loss_scale=64, train_wall=40, gb_free=29.3, wall=125479 2023-05-02 13:25:07 - progress_bar.py[line:274] - INFO: epoch 006: 462 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7485.9, nsentences=120, sample_size=4134.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1881.3, ups=0.25, wpb=7485.9, bsz=120, num_updates=30620, lr=1.57408e-05, gnorm=0.994, clip=60, loss_scale=64, train_wall=40, gb_free=30, wall=125519 2023-05-02 13:25:46 - progress_bar.py[line:274] - INFO: epoch 006: 472 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7819.9, nsentences=120, sample_size=3846.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1989.3, ups=0.25, wpb=7819.9, bsz=120, num_updates=30630, lr=1.57355e-05, gnorm=1.009, clip=60, loss_scale=64, train_wall=39, gb_free=28, wall=125558 2023-05-02 13:26:25 - progress_bar.py[line:274] - INFO: epoch 006: 482 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7757.1, nsentences=120, sample_size=4343.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1980.1, ups=0.26, wpb=7757.1, bsz=120, num_updates=30640, lr=1.57303e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=39, gb_free=29.5, wall=125598 2023-05-02 13:27:05 - progress_bar.py[line:274] - INFO: epoch 006: 492 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7585.6, nsentences=120, sample_size=4065.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1921.2, ups=0.25, wpb=7585.6, bsz=120, num_updates=30650, lr=1.5725e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=39, gb_free=28.6, wall=125637 2023-05-02 13:27:44 - progress_bar.py[line:274] - INFO: epoch 006: 502 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7753.7, nsentences=120, sample_size=3800.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1949.2, ups=0.25, wpb=7753.7, bsz=120, num_updates=30660, lr=1.57197e-05, gnorm=1.026, clip=50, loss_scale=64, train_wall=40, gb_free=30.8, wall=125677 2023-05-02 13:28:24 - 
progress_bar.py[line:274] - INFO: epoch 006: 512 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7831.7, nsentences=120, sample_size=3980.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1956.5, ups=0.25, wpb=7831.7, bsz=120, num_updates=30670, lr=1.57144e-05, gnorm=0.977, clip=30, loss_scale=64, train_wall=40, gb_free=31.1, wall=125717 2023-05-02 13:29:04 - progress_bar.py[line:274] - INFO: epoch 006: 522 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7459.6, nsentences=120, sample_size=3911, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1902.4, ups=0.26, wpb=7459.6, bsz=120, num_updates=30680, lr=1.57091e-05, gnorm=1.017, clip=40, loss_scale=64, train_wall=39, gb_free=28.6, wall=125756 2023-05-02 13:29:43 - progress_bar.py[line:274] - INFO: epoch 006: 532 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7804.6, nsentences=120, sample_size=4272.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2005.2, ups=0.26, wpb=7804.6, bsz=120, num_updates=30690, lr=1.57038e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=39, gb_free=28.9, wall=125795 2023-05-02 13:30:22 - progress_bar.py[line:274] - INFO: epoch 006: 542 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7602, nsentences=120, sample_size=4160.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1929.6, ups=0.25, wpb=7602, bsz=120, num_updates=30700, lr=1.56986e-05, gnorm=0.962, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=125834 2023-05-02 13:31:03 - progress_bar.py[line:274] - INFO: epoch 006: 552 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7630.8, nsentences=120, sample_size=4152.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1884.8, ups=0.25, wpb=7630.8, bsz=120, num_updates=30710, lr=1.56933e-05, gnorm=0.934, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=125875 2023-05-02 13:31:42 - progress_bar.py[line:274] - INFO: epoch 006: 562 / 6042 loss=2.413, loss_v1=0, loss_v2=0, 
nll_loss=1.165, ntokens=7765.1, nsentences=120, sample_size=3921.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1974.4, ups=0.25, wpb=7765.1, bsz=120, num_updates=30720, lr=1.5688e-05, gnorm=0.999, clip=60, loss_scale=64, train_wall=39, gb_free=30.4, wall=125914 2023-05-02 13:32:22 - progress_bar.py[line:274] - INFO: epoch 006: 572 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7657.2, nsentences=120, sample_size=3989.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1916.9, ups=0.25, wpb=7657.2, bsz=120, num_updates=30730, lr=1.56827e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=125954 2023-05-02 13:33:02 - progress_bar.py[line:274] - INFO: epoch 006: 582 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7685.7, nsentences=120, sample_size=4318.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1915.5, ups=0.25, wpb=7685.7, bsz=120, num_updates=30740, lr=1.56774e-05, gnorm=0.948, clip=20, loss_scale=64, train_wall=40, gb_free=31.2, wall=125994 2023-05-02 13:33:41 - progress_bar.py[line:274] - INFO: epoch 006: 592 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7720.1, nsentences=120, sample_size=4016.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1953.9, ups=0.25, wpb=7720.1, bsz=120, num_updates=30750, lr=1.56722e-05, gnorm=0.972, clip=40, loss_scale=64, train_wall=39, gb_free=29.6, wall=126034 2023-05-02 13:34:21 - progress_bar.py[line:274] - INFO: epoch 006: 602 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7778.2, nsentences=120, sample_size=3824.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1944.4, ups=0.25, wpb=7778.2, bsz=120, num_updates=30760, lr=1.56669e-05, gnorm=1.025, clip=70, loss_scale=64, train_wall=40, gb_free=29.9, wall=126074 2023-05-02 13:35:01 - progress_bar.py[line:274] - INFO: epoch 006: 612 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7792.3, nsentences=120, sample_size=4006.9, sample_size_v1=0, 
sample_size_v2=0, ppl=2.25, wps=1983.2, ups=0.25, wpb=7792.3, bsz=120, num_updates=30770, lr=1.56616e-05, gnorm=1.1, clip=40, loss_scale=64, train_wall=39, gb_free=30.3, wall=126113 2023-05-02 13:35:42 - progress_bar.py[line:274] - INFO: epoch 006: 622 / 6042 loss=2.456, loss_v1=0, loss_v2=0, nll_loss=1.22, ntokens=7804.1, nsentences=120, sample_size=3738.8, sample_size_v1=0, sample_size_v2=0, ppl=2.33, wps=1891.3, ups=0.24, wpb=7804.1, bsz=120, num_updates=30780, lr=1.56563e-05, gnorm=0.963, clip=20, loss_scale=64, train_wall=41, gb_free=31.5, wall=126154 2023-05-02 13:36:22 - progress_bar.py[line:274] - INFO: epoch 006: 632 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7850.6, nsentences=120, sample_size=3644.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1964, ups=0.25, wpb=7850.6, bsz=120, num_updates=30790, lr=1.5651e-05, gnorm=0.988, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=126194 2023-05-02 13:37:02 - progress_bar.py[line:274] - INFO: epoch 006: 642 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7781.7, nsentences=120, sample_size=4045.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1951.7, ups=0.25, wpb=7781.7, bsz=120, num_updates=30800, lr=1.56457e-05, gnorm=0.979, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=126234 2023-05-02 13:37:41 - progress_bar.py[line:274] - INFO: epoch 006: 652 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7702.8, nsentences=120, sample_size=3970.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1942.8, ups=0.25, wpb=7702.8, bsz=120, num_updates=30810, lr=1.56405e-05, gnorm=0.994, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=126274 2023-05-02 13:38:21 - progress_bar.py[line:274] - INFO: epoch 006: 662 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7758.7, nsentences=120, sample_size=4193.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1958.3, ups=0.25, wpb=7758.7, bsz=120, num_updates=30820, 
lr=1.56352e-05, gnorm=0.944, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=126314 2023-05-02 13:39:01 - progress_bar.py[line:274] - INFO: epoch 006: 672 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7606.7, nsentences=120, sample_size=4200, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1891.4, ups=0.25, wpb=7606.7, bsz=120, num_updates=30830, lr=1.56299e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=126354 2023-05-02 13:39:41 - progress_bar.py[line:274] - INFO: epoch 006: 682 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7246.2, nsentences=120, sample_size=4183.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1841, ups=0.25, wpb=7246.2, bsz=120, num_updates=30840, lr=1.56246e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=39, gb_free=30, wall=126393 2023-05-02 13:40:21 - progress_bar.py[line:274] - INFO: epoch 006: 692 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7604.8, nsentences=120, sample_size=3806.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1901.5, ups=0.25, wpb=7604.8, bsz=120, num_updates=30850, lr=1.56193e-05, gnorm=0.997, clip=50, loss_scale=64, train_wall=40, gb_free=30.5, wall=126433 2023-05-02 13:41:01 - progress_bar.py[line:274] - INFO: epoch 006: 702 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7800.2, nsentences=120, sample_size=3866.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1948.7, ups=0.25, wpb=7800.2, bsz=120, num_updates=30860, lr=1.56141e-05, gnorm=0.986, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=126473 2023-05-02 13:41:41 - progress_bar.py[line:274] - INFO: epoch 006: 712 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7724.2, nsentences=120, sample_size=3957.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1898.5, ups=0.25, wpb=7724.2, bsz=120, num_updates=30870, lr=1.56088e-05, gnorm=0.977, clip=30, loss_scale=64, train_wall=41, gb_free=30.8, wall=126514 
2023-05-02 13:42:21 - progress_bar.py[line:274] - INFO: epoch 006: 722 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7455.7, nsentences=120, sample_size=4053.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1888.1, ups=0.25, wpb=7455.7, bsz=120, num_updates=30880, lr=1.56035e-05, gnorm=0.948, clip=30, loss_scale=64, train_wall=39, gb_free=29.9, wall=126553 2023-05-02 13:43:01 - progress_bar.py[line:274] - INFO: epoch 006: 732 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7785.5, nsentences=120, sample_size=3994.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1933.2, ups=0.25, wpb=7785.5, bsz=120, num_updates=30890, lr=1.55982e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=29.4, wall=126594 2023-05-02 13:43:41 - progress_bar.py[line:274] - INFO: epoch 006: 742 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7500.3, nsentences=120, sample_size=3982, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1879.3, ups=0.25, wpb=7500.3, bsz=120, num_updates=30900, lr=1.55929e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=126634 2023-05-02 13:44:21 - progress_bar.py[line:274] - INFO: epoch 006: 752 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7833.6, nsentences=120, sample_size=4090.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1944.7, ups=0.25, wpb=7833.6, bsz=120, num_updates=30910, lr=1.55876e-05, gnorm=0.962, clip=40, loss_scale=64, train_wall=40, gb_free=27.6, wall=126674 2023-05-02 13:45:01 - progress_bar.py[line:274] - INFO: epoch 006: 762 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7375, nsentences=120, sample_size=3968.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1882.5, ups=0.26, wpb=7375, bsz=120, num_updates=30920, lr=1.55824e-05, gnorm=0.975, clip=20, loss_scale=64, train_wall=39, gb_free=31.2, wall=126713 2023-05-02 13:45:40 - progress_bar.py[line:274] - INFO: epoch 006: 772 / 6042 loss=2.42, loss_v1=0, 
loss_v2=0, nll_loss=1.168, ntokens=7794.4, nsentences=120, sample_size=4199.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1965.6, ups=0.25, wpb=7794.4, bsz=120, num_updates=30930, lr=1.55771e-05, gnorm=0.969, clip=20, loss_scale=64, train_wall=40, gb_free=29, wall=126753 2023-05-02 13:46:20 - progress_bar.py[line:274] - INFO: epoch 006: 782 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7776.3, nsentences=120, sample_size=4092.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1951.1, ups=0.25, wpb=7776.3, bsz=120, num_updates=30940, lr=1.55718e-05, gnorm=0.987, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=126793 2023-05-02 13:46:59 - progress_bar.py[line:274] - INFO: epoch 006: 792 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7634.2, nsentences=120, sample_size=4034.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1955, ups=0.26, wpb=7634.2, bsz=120, num_updates=30950, lr=1.55665e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=39, gb_free=30.4, wall=126832 2023-05-02 13:47:39 - progress_bar.py[line:274] - INFO: epoch 006: 802 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7467.3, nsentences=120, sample_size=4184.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1892.4, ups=0.25, wpb=7467.3, bsz=120, num_updates=30960, lr=1.55612e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=39, gb_free=29.7, wall=126871 2023-05-02 13:48:19 - progress_bar.py[line:274] - INFO: epoch 006: 812 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7769.6, nsentences=120, sample_size=4152.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1938.3, ups=0.25, wpb=7769.6, bsz=120, num_updates=30970, lr=1.55559e-05, gnorm=0.974, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=126911 2023-05-02 13:48:59 - progress_bar.py[line:274] - INFO: epoch 006: 822 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7437.5, nsentences=120, sample_size=3869.9, 
sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1861.7, ups=0.25, wpb=7437.5, bsz=120, num_updates=30980, lr=1.55507e-05, gnorm=0.983, clip=40, loss_scale=128, train_wall=40, gb_free=30.6, wall=126951 2023-05-02 13:49:39 - progress_bar.py[line:274] - INFO: epoch 006: 832 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7867.4, nsentences=120, sample_size=4126.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1954.1, ups=0.25, wpb=7867.4, bsz=120, num_updates=30990, lr=1.55454e-05, gnorm=0.949, clip=20, loss_scale=128, train_wall=40, gb_free=30.1, wall=126991 2023-05-02 13:50:15 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 13:50:23 - progress_bar.py[line:274] - INFO: epoch 006: 843 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7605.5, nsentences=120, sample_size=4021.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1733.4, ups=0.23, wpb=7605.5, bsz=120, num_updates=31000, lr=1.55401e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=44, gb_free=29.9, wall=127035 2023-05-02 13:50:23 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 13:50:25 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 13:50:25 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 13:50:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 13:50:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:41 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 13:50:41 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 13:50:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-02 13:50:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:53 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 13:50:53 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 13:50:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:50:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:50:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:51:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:51:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 13:51:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:51:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:51:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:05 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 13:51:05 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 13:51:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:51:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:51:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:51:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:09 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 13:51:09 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 13:51:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:51:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:51:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:51:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:51:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:14 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-02 13:51:14 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 13:51:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 13:51:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 13:51:14 - progress_bar.py[line:282] - INFO: epoch 006 | valid on 'valid' subset | loss 3.235 | loss_v1 0 | loss_v2 0 | nll_loss 2.07 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.2 | score 0.7495 | wps 3300.4 | wpb 3202.1 | bsz 39.4 | num_updates 31000 | best_score 0.7627 2023-05-02 13:51:14 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 6 @ 31000 updates 2023-05-02 13:51:14 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_31000.pt 2023-05-02 13:51:39 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_31000.pt 2023-05-02 13:51:53 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_31000.pt (epoch 6 @ 31000 updates, score 0.7495) (writing took 39.32874794001691 seconds) 2023-05-02 13:52:33 - progress_bar.py[line:274] - INFO: epoch 006: 853 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7767.4, nsentences=120, sample_size=4008.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=597.2, ups=0.08, wpb=7767.4, bsz=120, num_updates=31010, lr=1.55348e-05, gnorm=0.947, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=127165 2023-05-02 13:53:12 - progress_bar.py[line:274] - INFO: epoch 006: 863 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7573.1, nsentences=120, sample_size=4020.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1930, ups=0.25, wpb=7573.1, bsz=120, num_updates=31020, 
lr=1.55295e-05, gnorm=0.956, clip=30, loss_scale=64, train_wall=39, gb_free=30.5, wall=127204 2023-05-02 13:53:52 - progress_bar.py[line:274] - INFO: epoch 006: 873 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7680.6, nsentences=120, sample_size=4317.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1926.2, ups=0.25, wpb=7680.6, bsz=120, num_updates=31030, lr=1.55243e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=28.6, wall=127244 2023-05-02 13:54:31 - progress_bar.py[line:274] - INFO: epoch 006: 883 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7649.6, nsentences=120, sample_size=3941.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1964.3, ups=0.26, wpb=7649.6, bsz=120, num_updates=31040, lr=1.5519e-05, gnorm=0.98, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=127283 2023-05-02 13:55:11 - progress_bar.py[line:274] - INFO: epoch 006: 893 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7649.7, nsentences=120, sample_size=4345.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1927.5, ups=0.25, wpb=7649.7, bsz=120, num_updates=31050, lr=1.55137e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=127323 2023-05-02 13:55:46 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-02 13:55:54 - progress_bar.py[line:274] - INFO: epoch 006: 904 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7467.1, nsentences=120, sample_size=4183.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1726.1, ups=0.23, wpb=7467.1, bsz=120, num_updates=31060, lr=1.55084e-05, gnorm=0.966, clip=20, loss_scale=32, train_wall=43, gb_free=30.8, wall=127366 2023-05-02 13:56:34 - progress_bar.py[line:274] - INFO: epoch 006: 914 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7801.1, nsentences=120, sample_size=3964.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1930.2, ups=0.25, 
wpb=7801.1, bsz=120, num_updates=31070, lr=1.55031e-05, gnorm=0.954, clip=20, loss_scale=32, train_wall=40, gb_free=29.6, wall=127407 2023-05-02 13:57:14 - progress_bar.py[line:274] - INFO: epoch 006: 924 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7571.9, nsentences=120, sample_size=4064.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1910.9, ups=0.25, wpb=7571.9, bsz=120, num_updates=31080, lr=1.54978e-05, gnorm=0.971, clip=30, loss_scale=32, train_wall=40, gb_free=29.5, wall=127446 2023-05-02 13:57:53 - progress_bar.py[line:274] - INFO: epoch 006: 934 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7464.1, nsentences=120, sample_size=4339.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1884.6, ups=0.25, wpb=7464.1, bsz=120, num_updates=31090, lr=1.54926e-05, gnorm=0.943, clip=20, loss_scale=32, train_wall=40, gb_free=29.2, wall=127486 2023-05-02 13:58:33 - progress_bar.py[line:274] - INFO: epoch 006: 944 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7467.4, nsentences=120, sample_size=4247.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1873.2, ups=0.25, wpb=7467.4, bsz=120, num_updates=31100, lr=1.54873e-05, gnorm=0.952, clip=10, loss_scale=32, train_wall=40, gb_free=29, wall=127526 2023-05-02 13:59:13 - progress_bar.py[line:274] - INFO: epoch 006: 954 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7554.9, nsentences=120, sample_size=3920.4, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1881.6, ups=0.25, wpb=7554.9, bsz=120, num_updates=31110, lr=1.5482e-05, gnorm=0.983, clip=30, loss_scale=32, train_wall=40, gb_free=30.3, wall=127566 2023-05-02 13:59:54 - progress_bar.py[line:274] - INFO: epoch 006: 964 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7741, nsentences=120, sample_size=4073.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1921, ups=0.25, wpb=7741, bsz=120, num_updates=31120, lr=1.54767e-05, gnorm=0.968, clip=30, loss_scale=32, 
train_wall=40, gb_free=30.6, wall=127606 2023-05-02 14:00:34 - progress_bar.py[line:274] - INFO: epoch 006: 974 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7950.1, nsentences=120, sample_size=4040.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1997.1, ups=0.25, wpb=7950.1, bsz=120, num_updates=31130, lr=1.54714e-05, gnorm=0.98, clip=30, loss_scale=32, train_wall=40, gb_free=29.4, wall=127646 2023-05-02 14:01:14 - progress_bar.py[line:274] - INFO: epoch 006: 984 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7782.2, nsentences=120, sample_size=4322.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1941.8, ups=0.25, wpb=7782.2, bsz=120, num_updates=31140, lr=1.54662e-05, gnorm=0.929, clip=10, loss_scale=32, train_wall=40, gb_free=29.7, wall=127686 2023-05-02 14:01:53 - progress_bar.py[line:274] - INFO: epoch 006: 994 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7599.5, nsentences=120, sample_size=4022.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1918.6, ups=0.25, wpb=7599.5, bsz=120, num_updates=31150, lr=1.54609e-05, gnorm=0.949, clip=10, loss_scale=32, train_wall=40, gb_free=30.2, wall=127726 2023-05-02 14:02:33 - progress_bar.py[line:274] - INFO: epoch 006: 1004 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7814.1, nsentences=120, sample_size=4272.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1955.8, ups=0.25, wpb=7814.1, bsz=120, num_updates=31160, lr=1.54556e-05, gnorm=0.936, clip=20, loss_scale=32, train_wall=40, gb_free=28.9, wall=127766 2023-05-02 14:03:13 - progress_bar.py[line:274] - INFO: epoch 006: 1014 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7815.6, nsentences=120, sample_size=3644.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1956.6, ups=0.25, wpb=7815.6, bsz=120, num_updates=31170, lr=1.54503e-05, gnorm=0.984, clip=30, loss_scale=32, train_wall=40, gb_free=29.4, wall=127806 2023-05-02 14:03:53 - progress_bar.py[line:274] 
- INFO: epoch 006: 1024 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7818.1, nsentences=120, sample_size=4046.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1964.9, ups=0.25, wpb=7818.1, bsz=120, num_updates=31180, lr=1.5445e-05, gnorm=0.964, clip=30, loss_scale=32, train_wall=40, gb_free=29.2, wall=127845 2023-05-02 14:04:33 - progress_bar.py[line:274] - INFO: epoch 006: 1034 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=8066.1, nsentences=120, sample_size=4011.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2006, ups=0.25, wpb=8066.1, bsz=120, num_updates=31190, lr=1.54397e-05, gnorm=0.968, clip=40, loss_scale=32, train_wall=40, gb_free=29.6, wall=127886 2023-05-02 14:05:13 - progress_bar.py[line:274] - INFO: epoch 006: 1044 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7873, nsentences=120, sample_size=3979.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1982.3, ups=0.25, wpb=7873, bsz=120, num_updates=31200, lr=1.54345e-05, gnorm=1.004, clip=40, loss_scale=32, train_wall=40, gb_free=29.2, wall=127925 2023-05-02 14:05:53 - progress_bar.py[line:274] - INFO: epoch 006: 1054 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7670.3, nsentences=120, sample_size=4014.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1928.9, ups=0.25, wpb=7670.3, bsz=120, num_updates=31210, lr=1.54292e-05, gnorm=0.977, clip=30, loss_scale=32, train_wall=40, gb_free=31, wall=127965 2023-05-02 14:06:32 - progress_bar.py[line:274] - INFO: epoch 006: 1064 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7892.9, nsentences=120, sample_size=4238.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1986.9, ups=0.25, wpb=7892.9, bsz=120, num_updates=31220, lr=1.54239e-05, gnorm=0.936, clip=20, loss_scale=32, train_wall=40, gb_free=29.7, wall=128005 2023-05-02 14:07:12 - progress_bar.py[line:274] - INFO: epoch 006: 1074 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7764.8, 
nsentences=120, sample_size=4088.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1946.7, ups=0.25, wpb=7764.8, bsz=120, num_updates=31230, lr=1.54186e-05, gnorm=0.958, clip=40, loss_scale=32, train_wall=40, gb_free=30, wall=128045 2023-05-02 14:07:52 - progress_bar.py[line:274] - INFO: epoch 006: 1084 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7594.4, nsentences=120, sample_size=4417, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1908.7, ups=0.25, wpb=7594.4, bsz=120, num_updates=31240, lr=1.54133e-05, gnorm=0.92, clip=10, loss_scale=32, train_wall=40, gb_free=29.9, wall=128085 2023-05-02 14:08:33 - progress_bar.py[line:274] - INFO: epoch 006: 1094 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7861.2, nsentences=120, sample_size=3930.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1932.3, ups=0.25, wpb=7861.2, bsz=120, num_updates=31250, lr=1.5408e-05, gnorm=1.013, clip=40, loss_scale=32, train_wall=41, gb_free=30, wall=128125 2023-05-02 14:09:12 - progress_bar.py[line:274] - INFO: epoch 006: 1104 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7803.9, nsentences=120, sample_size=3945.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1985.7, ups=0.25, wpb=7803.9, bsz=120, num_updates=31260, lr=1.54028e-05, gnorm=0.951, clip=10, loss_scale=32, train_wall=39, gb_free=30, wall=128165 2023-05-02 14:09:52 - progress_bar.py[line:274] - INFO: epoch 006: 1114 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7997, nsentences=120, sample_size=3836.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2012.8, ups=0.25, wpb=7997, bsz=120, num_updates=31270, lr=1.53975e-05, gnorm=0.976, clip=30, loss_scale=32, train_wall=40, gb_free=29.6, wall=128204 2023-05-02 14:10:31 - progress_bar.py[line:274] - INFO: epoch 006: 1124 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7708, nsentences=120, sample_size=4176.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1952.6, 
ups=0.25, wpb=7708, bsz=120, num_updates=31280, lr=1.53922e-05, gnorm=0.96, clip=20, loss_scale=32, train_wall=39, gb_free=32.1, wall=128244 2023-05-02 14:11:11 - progress_bar.py[line:274] - INFO: epoch 006: 1134 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7483.9, nsentences=120, sample_size=4069.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1900.9, ups=0.25, wpb=7483.9, bsz=120, num_updates=31290, lr=1.53869e-05, gnorm=0.955, clip=20, loss_scale=32, train_wall=39, gb_free=29.8, wall=128283 2023-05-02 14:11:52 - progress_bar.py[line:274] - INFO: epoch 006: 1144 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7866.6, nsentences=120, sample_size=4278.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1909.2, ups=0.24, wpb=7866.6, bsz=120, num_updates=31300, lr=1.53816e-05, gnorm=0.913, clip=20, loss_scale=32, train_wall=41, gb_free=31.2, wall=128324 2023-05-02 14:12:31 - progress_bar.py[line:274] - INFO: epoch 006: 1154 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7854.2, nsentences=120, sample_size=3786.5, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1993.1, ups=0.25, wpb=7854.2, bsz=120, num_updates=31310, lr=1.53764e-05, gnorm=1.014, clip=70, loss_scale=32, train_wall=39, gb_free=31.3, wall=128364 2023-05-02 14:13:11 - progress_bar.py[line:274] - INFO: epoch 006: 1164 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7856.2, nsentences=120, sample_size=4435.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1962.6, ups=0.25, wpb=7856.2, bsz=120, num_updates=31320, lr=1.53711e-05, gnorm=0.923, clip=10, loss_scale=32, train_wall=40, gb_free=30.2, wall=128404 2023-05-02 14:13:51 - progress_bar.py[line:274] - INFO: epoch 006: 1174 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7596.9, nsentences=120, sample_size=4213, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1921.4, ups=0.25, wpb=7596.9, bsz=120, num_updates=31330, lr=1.53658e-05, gnorm=0.96, clip=30, 
loss_scale=32, train_wall=39, gb_free=29.9, wall=128443 2023-05-02 14:14:30 - progress_bar.py[line:274] - INFO: epoch 006: 1184 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=8072.3, nsentences=120, sample_size=3850.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2050.6, ups=0.25, wpb=8072.3, bsz=120, num_updates=31340, lr=1.53605e-05, gnorm=0.981, clip=40, loss_scale=32, train_wall=39, gb_free=28.5, wall=128483 2023-05-02 14:15:09 - progress_bar.py[line:274] - INFO: epoch 006: 1194 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7413.8, nsentences=120, sample_size=4005.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1897.1, ups=0.26, wpb=7413.8, bsz=120, num_updates=31350, lr=1.53552e-05, gnorm=0.965, clip=30, loss_scale=32, train_wall=39, gb_free=30.6, wall=128522 2023-05-02 14:15:49 - progress_bar.py[line:274] - INFO: epoch 006: 1204 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7993.3, nsentences=120, sample_size=3641.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2000.2, ups=0.25, wpb=7993.3, bsz=120, num_updates=31360, lr=1.53499e-05, gnorm=1.021, clip=70, loss_scale=32, train_wall=40, gb_free=30.2, wall=128562 2023-05-02 14:16:29 - progress_bar.py[line:274] - INFO: epoch 006: 1214 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7754.1, nsentences=120, sample_size=4284.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1952.9, ups=0.25, wpb=7754.1, bsz=120, num_updates=31370, lr=1.53447e-05, gnorm=0.959, clip=10, loss_scale=32, train_wall=40, gb_free=29.2, wall=128601 2023-05-02 14:17:09 - progress_bar.py[line:274] - INFO: epoch 006: 1224 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7962.4, nsentences=120, sample_size=3814.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1999.9, ups=0.25, wpb=7962.4, bsz=120, num_updates=31380, lr=1.53394e-05, gnorm=0.973, clip=40, loss_scale=32, train_wall=40, gb_free=30.4, wall=128641 2023-05-02 14:17:49 - 
progress_bar.py[line:274] - INFO: epoch 006: 1234 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7567.4, nsentences=120, sample_size=3882.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1883.6, ups=0.25, wpb=7567.4, bsz=120, num_updates=31390, lr=1.53341e-05, gnorm=0.982, clip=30, loss_scale=32, train_wall=40, gb_free=30.8, wall=128681 2023-05-02 14:18:28 - progress_bar.py[line:274] - INFO: epoch 006: 1244 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7864.3, nsentences=120, sample_size=3827.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1998.9, ups=0.25, wpb=7864.3, bsz=120, num_updates=31400, lr=1.53288e-05, gnorm=0.972, clip=40, loss_scale=32, train_wall=39, gb_free=31.3, wall=128721 2023-05-02 14:19:07 - progress_bar.py[line:274] - INFO: epoch 006: 1254 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7755.9, nsentences=120, sample_size=3679.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1983.6, ups=0.26, wpb=7755.9, bsz=120, num_updates=31410, lr=1.53235e-05, gnorm=0.982, clip=40, loss_scale=32, train_wall=39, gb_free=30.8, wall=128760 2023-05-02 14:19:47 - progress_bar.py[line:274] - INFO: epoch 006: 1264 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=8033.5, nsentences=120, sample_size=3906.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2027, ups=0.25, wpb=8033.5, bsz=120, num_updates=31420, lr=1.53182e-05, gnorm=0.992, clip=60, loss_scale=32, train_wall=40, gb_free=29.2, wall=128799 2023-05-02 14:20:27 - progress_bar.py[line:274] - INFO: epoch 006: 1274 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7859.7, nsentences=120, sample_size=4029.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1960.8, ups=0.25, wpb=7859.7, bsz=120, num_updates=31430, lr=1.5313e-05, gnorm=0.964, clip=20, loss_scale=32, train_wall=40, gb_free=30.2, wall=128840 2023-05-02 14:21:08 - progress_bar.py[line:274] - INFO: epoch 006: 1284 / 6042 loss=2.394, loss_v1=0, loss_v2=0, 
nll_loss=1.139, ntokens=7587.5, nsentences=120, sample_size=4151.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1863.3, ups=0.25, wpb=7587.5, bsz=120, num_updates=31440, lr=1.53077e-05, gnorm=0.965, clip=40, loss_scale=32, train_wall=41, gb_free=30.3, wall=128880 2023-05-02 14:21:48 - progress_bar.py[line:274] - INFO: epoch 006: 1294 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7642.7, nsentences=120, sample_size=3764.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1923.5, ups=0.25, wpb=7642.7, bsz=120, num_updates=31450, lr=1.53024e-05, gnorm=0.992, clip=50, loss_scale=32, train_wall=40, gb_free=30.7, wall=128920 2023-05-02 14:22:28 - progress_bar.py[line:274] - INFO: epoch 006: 1304 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7753.1, nsentences=120, sample_size=4218.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1923.7, ups=0.25, wpb=7753.1, bsz=120, num_updates=31460, lr=1.52971e-05, gnorm=0.913, clip=10, loss_scale=32, train_wall=40, gb_free=31.2, wall=128960 2023-05-02 14:23:08 - progress_bar.py[line:274] - INFO: epoch 006: 1314 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7630.7, nsentences=120, sample_size=3918.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1917.7, ups=0.25, wpb=7630.7, bsz=120, num_updates=31470, lr=1.52918e-05, gnorm=0.98, clip=30, loss_scale=32, train_wall=40, gb_free=29.9, wall=129000 2023-05-02 14:23:48 - progress_bar.py[line:274] - INFO: epoch 006: 1324 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7769, nsentences=120, sample_size=3867.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1906.7, ups=0.25, wpb=7769, bsz=120, num_updates=31480, lr=1.52866e-05, gnorm=0.971, clip=30, loss_scale=32, train_wall=41, gb_free=29.2, wall=129041 2023-05-02 14:24:28 - progress_bar.py[line:274] - INFO: epoch 006: 1334 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7496.2, nsentences=120, sample_size=4307.5, sample_size_v1=0, 
sample_size_v2=0, ppl=2.23, wps=1879.3, ups=0.25, wpb=7496.2, bsz=120, num_updates=31490, lr=1.52813e-05, gnorm=0.941, clip=20, loss_scale=32, train_wall=40, gb_free=31.6, wall=129081 2023-05-02 14:25:08 - progress_bar.py[line:274] - INFO: epoch 006: 1344 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=8070.8, nsentences=120, sample_size=4094.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2047.5, ups=0.25, wpb=8070.8, bsz=120, num_updates=31500, lr=1.5276e-05, gnorm=0.95, clip=20, loss_scale=32, train_wall=39, gb_free=30.2, wall=129120 2023-05-02 14:25:48 - progress_bar.py[line:274] - INFO: epoch 006: 1354 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7402.4, nsentences=120, sample_size=4241.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1832.2, ups=0.25, wpb=7402.4, bsz=120, num_updates=31510, lr=1.52707e-05, gnorm=0.953, clip=20, loss_scale=32, train_wall=40, gb_free=30.3, wall=129161 2023-05-02 14:26:28 - progress_bar.py[line:274] - INFO: epoch 006: 1364 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7622, nsentences=120, sample_size=4289.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1909.8, ups=0.25, wpb=7622, bsz=120, num_updates=31520, lr=1.52654e-05, gnorm=0.937, clip=10, loss_scale=32, train_wall=40, gb_free=30, wall=129200 2023-05-02 14:27:09 - progress_bar.py[line:274] - INFO: epoch 006: 1374 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7966.3, nsentences=120, sample_size=4475.8, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1946.9, ups=0.24, wpb=7966.3, bsz=120, num_updates=31530, lr=1.52601e-05, gnorm=0.936, clip=10, loss_scale=32, train_wall=41, gb_free=28.6, wall=129241 2023-05-02 14:27:48 - progress_bar.py[line:274] - INFO: epoch 006: 1384 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7636.2, nsentences=120, sample_size=4089.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1943.1, ups=0.25, wpb=7636.2, bsz=120, num_updates=31540, 
lr=1.52549e-05, gnorm=0.949, clip=20, loss_scale=32, train_wall=39, gb_free=31.3, wall=129281 2023-05-02 14:28:28 - progress_bar.py[line:274] - INFO: epoch 006: 1394 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7580.4, nsentences=120, sample_size=3986.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1911.6, ups=0.25, wpb=7580.4, bsz=120, num_updates=31550, lr=1.52496e-05, gnorm=0.991, clip=40, loss_scale=32, train_wall=40, gb_free=29.9, wall=129320 2023-05-02 14:29:08 - progress_bar.py[line:274] - INFO: epoch 006: 1404 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7782.1, nsentences=120, sample_size=4017.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1938.8, ups=0.25, wpb=7782.1, bsz=120, num_updates=31560, lr=1.52443e-05, gnorm=0.929, clip=20, loss_scale=32, train_wall=40, gb_free=30.6, wall=129360 2023-05-02 14:29:47 - progress_bar.py[line:274] - INFO: epoch 006: 1414 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7990.7, nsentences=120, sample_size=4141.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2032.2, ups=0.25, wpb=7990.7, bsz=120, num_updates=31570, lr=1.5239e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=129400 2023-05-02 14:30:27 - progress_bar.py[line:274] - INFO: epoch 006: 1424 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7903.1, nsentences=120, sample_size=4223.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1977.8, ups=0.25, wpb=7903.1, bsz=120, num_updates=31580, lr=1.52337e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=129440 2023-05-02 14:31:08 - progress_bar.py[line:274] - INFO: epoch 006: 1434 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7902.7, nsentences=120, sample_size=4014.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1931.1, ups=0.24, wpb=7902.7, bsz=120, num_updates=31590, lr=1.52285e-05, gnorm=0.965, clip=50, loss_scale=64, train_wall=41, gb_free=30.9, 
wall=129481 2023-05-02 14:31:47 - progress_bar.py[line:274] - INFO: epoch 006: 1444 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7509.7, nsentences=120, sample_size=3937.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1921.6, ups=0.26, wpb=7509.7, bsz=120, num_updates=31600, lr=1.52232e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=39, gb_free=29.6, wall=129520 2023-05-02 14:32:26 - progress_bar.py[line:274] - INFO: epoch 006: 1454 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7627, nsentences=120, sample_size=4070.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1955.6, ups=0.26, wpb=7627, bsz=120, num_updates=31610, lr=1.52179e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=39, gb_free=28.9, wall=129559 2023-05-02 14:33:06 - progress_bar.py[line:274] - INFO: epoch 006: 1464 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7910.3, nsentences=120, sample_size=4063.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1974.4, ups=0.25, wpb=7910.3, bsz=120, num_updates=31620, lr=1.52126e-05, gnorm=0.966, clip=40, loss_scale=64, train_wall=40, gb_free=27.8, wall=129599 2023-05-02 14:33:47 - progress_bar.py[line:274] - INFO: epoch 006: 1474 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7936, nsentences=120, sample_size=3982.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1965.7, ups=0.25, wpb=7936, bsz=120, num_updates=31630, lr=1.52073e-05, gnorm=0.968, clip=40, loss_scale=64, train_wall=40, gb_free=29.4, wall=129639 2023-05-02 14:34:27 - progress_bar.py[line:274] - INFO: epoch 006: 1484 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=8048.3, nsentences=120, sample_size=4104.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2008.1, ups=0.25, wpb=8048.3, bsz=120, num_updates=31640, lr=1.5202e-05, gnorm=0.96, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=129679 2023-05-02 14:35:07 - progress_bar.py[line:274] - INFO: epoch 006: 1494 / 6042 
loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7511.9, nsentences=120, sample_size=3835.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1872.5, ups=0.25, wpb=7511.9, bsz=120, num_updates=31650, lr=1.51968e-05, gnorm=0.984, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=129719 2023-05-02 14:35:47 - progress_bar.py[line:274] - INFO: epoch 006: 1504 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7258.9, nsentences=120, sample_size=4289.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1828.5, ups=0.25, wpb=7258.9, bsz=120, num_updates=31660, lr=1.51915e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=129759 2023-05-02 14:36:27 - progress_bar.py[line:274] - INFO: epoch 006: 1514 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7847.7, nsentences=120, sample_size=3756.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1940.8, ups=0.25, wpb=7847.7, bsz=120, num_updates=31670, lr=1.51862e-05, gnorm=0.979, clip=30, loss_scale=64, train_wall=40, gb_free=29.1, wall=129800 2023-05-02 14:37:06 - progress_bar.py[line:274] - INFO: epoch 006: 1524 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7418.2, nsentences=120, sample_size=4250.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1889.6, ups=0.25, wpb=7418.2, bsz=120, num_updates=31680, lr=1.51809e-05, gnorm=0.972, clip=40, loss_scale=64, train_wall=39, gb_free=29.9, wall=129839 2023-05-02 14:37:47 - progress_bar.py[line:274] - INFO: epoch 006: 1534 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7874.8, nsentences=120, sample_size=4027, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1955.2, ups=0.25, wpb=7874.8, bsz=120, num_updates=31690, lr=1.51756e-05, gnorm=0.988, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=129879 2023-05-02 14:38:26 - progress_bar.py[line:274] - INFO: epoch 006: 1544 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7976.2, nsentences=120, 
sample_size=3684.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2015.6, ups=0.25, wpb=7976.2, bsz=120, num_updates=31700, lr=1.51703e-05, gnorm=1, clip=60, loss_scale=64, train_wall=40, gb_free=30.2, wall=129919 2023-05-02 14:39:06 - progress_bar.py[line:274] - INFO: epoch 006: 1554 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7685, nsentences=120, sample_size=3903.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1937.1, ups=0.25, wpb=7685, bsz=120, num_updates=31710, lr=1.51651e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=129958 2023-05-02 14:39:46 - progress_bar.py[line:274] - INFO: epoch 006: 1564 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7704.7, nsentences=120, sample_size=4004.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1918.7, ups=0.25, wpb=7704.7, bsz=120, num_updates=31720, lr=1.51598e-05, gnorm=0.98, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=129999 2023-05-02 14:40:27 - progress_bar.py[line:274] - INFO: epoch 006: 1574 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7839.3, nsentences=120, sample_size=4119.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1910.1, ups=0.24, wpb=7839.3, bsz=120, num_updates=31730, lr=1.51545e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=41, gb_free=25.4, wall=130040 2023-05-02 14:41:07 - progress_bar.py[line:274] - INFO: epoch 006: 1584 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7800.9, nsentences=120, sample_size=3911.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1972.8, ups=0.25, wpb=7800.9, bsz=120, num_updates=31740, lr=1.51492e-05, gnorm=0.992, clip=40, loss_scale=64, train_wall=39, gb_free=30.5, wall=130079 2023-05-02 14:41:46 - progress_bar.py[line:274] - INFO: epoch 006: 1594 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7837, nsentences=120, sample_size=4262.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1971.3, ups=0.25, wpb=7837, 
bsz=120, num_updates=31750, lr=1.51439e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=29.4, wall=130119 2023-05-02 14:42:27 - progress_bar.py[line:274] - INFO: epoch 006: 1604 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7579.3, nsentences=120, sample_size=3833.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1867, ups=0.25, wpb=7579.3, bsz=120, num_updates=31760, lr=1.51387e-05, gnorm=0.997, clip=60, loss_scale=64, train_wall=41, gb_free=31.3, wall=130159 2023-05-02 14:43:07 - progress_bar.py[line:274] - INFO: epoch 006: 1614 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7989.4, nsentences=120, sample_size=4093, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2014.8, ups=0.25, wpb=7989.4, bsz=120, num_updates=31770, lr=1.51334e-05, gnorm=0.948, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=130199 2023-05-02 14:43:46 - progress_bar.py[line:274] - INFO: epoch 006: 1624 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7718.3, nsentences=120, sample_size=4169, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1948.5, ups=0.25, wpb=7718.3, bsz=120, num_updates=31780, lr=1.51281e-05, gnorm=0.977, clip=30, loss_scale=64, train_wall=40, gb_free=28.7, wall=130239 2023-05-02 14:44:27 - progress_bar.py[line:274] - INFO: epoch 006: 1634 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.21, ntokens=7565.9, nsentences=120, sample_size=4340.7, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1861.3, ups=0.25, wpb=7565.9, bsz=120, num_updates=31790, lr=1.51228e-05, gnorm=0.936, clip=0, loss_scale=64, train_wall=41, gb_free=29.5, wall=130279 2023-05-02 14:45:07 - progress_bar.py[line:274] - INFO: epoch 006: 1644 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7691.6, nsentences=120, sample_size=4092.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1938.9, ups=0.25, wpb=7691.6, bsz=120, num_updates=31800, lr=1.51175e-05, gnorm=0.971, clip=30, loss_scale=64, 
train_wall=40, gb_free=29.6, wall=130319 2023-05-02 14:45:47 - progress_bar.py[line:274] - INFO: epoch 006: 1654 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7793.7, nsentences=120, sample_size=4047.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1945.7, ups=0.25, wpb=7793.7, bsz=120, num_updates=31810, lr=1.51122e-05, gnorm=0.985, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=130359 2023-05-02 14:46:26 - progress_bar.py[line:274] - INFO: epoch 006: 1664 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7490, nsentences=120, sample_size=4177.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1894.5, ups=0.25, wpb=7490, bsz=120, num_updates=31820, lr=1.5107e-05, gnorm=0.96, clip=40, loss_scale=64, train_wall=39, gb_free=28.3, wall=130399 2023-05-02 14:47:06 - progress_bar.py[line:274] - INFO: epoch 006: 1674 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7926.9, nsentences=120, sample_size=3935.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1990.3, ups=0.25, wpb=7926.9, bsz=120, num_updates=31830, lr=1.51017e-05, gnorm=0.98, clip=50, loss_scale=64, train_wall=40, gb_free=30.7, wall=130438 2023-05-02 14:47:45 - progress_bar.py[line:274] - INFO: epoch 006: 1684 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7541.7, nsentences=120, sample_size=3867.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1912.7, ups=0.25, wpb=7541.7, bsz=120, num_updates=31840, lr=1.50964e-05, gnorm=0.97, clip=30, loss_scale=64, train_wall=39, gb_free=29.3, wall=130478 2023-05-02 14:48:25 - progress_bar.py[line:274] - INFO: epoch 006: 1694 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7779.9, nsentences=120, sample_size=4125.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1975.7, ups=0.25, wpb=7779.9, bsz=120, num_updates=31850, lr=1.50911e-05, gnorm=0.971, clip=30, loss_scale=64, train_wall=39, gb_free=31.1, wall=130517 2023-05-02 14:49:04 - progress_bar.py[line:274] - 
INFO: epoch 006: 1704 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7724.3, nsentences=120, sample_size=4251.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1951.3, ups=0.25, wpb=7724.3, bsz=120, num_updates=31860, lr=1.50858e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=130557 2023-05-02 14:49:44 - progress_bar.py[line:274] - INFO: epoch 006: 1714 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7969.9, nsentences=120, sample_size=3940.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2016.9, ups=0.25, wpb=7969.9, bsz=120, num_updates=31870, lr=1.50806e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=39, gb_free=29.3, wall=130596 2023-05-02 14:50:24 - progress_bar.py[line:274] - INFO: epoch 006: 1724 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7524.5, nsentences=120, sample_size=4045.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1895.3, ups=0.25, wpb=7524.5, bsz=120, num_updates=31880, lr=1.50753e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=130636 2023-05-02 14:51:03 - progress_bar.py[line:274] - INFO: epoch 006: 1734 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7895.8, nsentences=120, sample_size=3980.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1984.5, ups=0.25, wpb=7895.8, bsz=120, num_updates=31890, lr=1.507e-05, gnorm=0.938, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=130676 2023-05-02 14:51:43 - progress_bar.py[line:274] - INFO: epoch 006: 1744 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7469, nsentences=120, sample_size=4336.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1872.3, ups=0.25, wpb=7469, bsz=120, num_updates=31900, lr=1.50647e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=130716 2023-05-02 14:52:24 - progress_bar.py[line:274] - INFO: epoch 006: 1754 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.189, 
ntokens=8148.7, nsentences=120, sample_size=4190.7, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=2010.6, ups=0.25, wpb=8148.7, bsz=120, num_updates=31910, lr=1.50594e-05, gnorm=0.934, clip=0, loss_scale=64, train_wall=40, gb_free=30.1, wall=130756 2023-05-02 14:53:03 - progress_bar.py[line:274] - INFO: epoch 006: 1764 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7416.9, nsentences=120, sample_size=3914.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1892.2, ups=0.26, wpb=7416.9, bsz=120, num_updates=31920, lr=1.50541e-05, gnorm=1.006, clip=50, loss_scale=64, train_wall=39, gb_free=30.2, wall=130795 2023-05-02 14:53:43 - progress_bar.py[line:274] - INFO: epoch 006: 1774 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7598.2, nsentences=120, sample_size=4430.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1919.1, ups=0.25, wpb=7598.2, bsz=120, num_updates=31930, lr=1.50489e-05, gnorm=0.925, clip=0, loss_scale=64, train_wall=40, gb_free=30.3, wall=130835 2023-05-02 14:54:24 - progress_bar.py[line:274] - INFO: epoch 006: 1784 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7841.2, nsentences=120, sample_size=4097.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1915.7, ups=0.24, wpb=7841.2, bsz=120, num_updates=31940, lr=1.50436e-05, gnorm=0.948, clip=20, loss_scale=64, train_wall=41, gb_free=29.7, wall=130876 2023-05-02 14:55:03 - progress_bar.py[line:274] - INFO: epoch 006: 1794 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7486.5, nsentences=120, sample_size=4012.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1900.7, ups=0.25, wpb=7486.5, bsz=120, num_updates=31950, lr=1.50383e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=39, gb_free=30, wall=130915 2023-05-02 14:55:43 - progress_bar.py[line:274] - INFO: epoch 006: 1804 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7692.1, nsentences=120, sample_size=4208.5, sample_size_v1=0, sample_size_v2=0, 
ppl=2.23, wps=1910.4, ups=0.25, wpb=7692.1, bsz=120, num_updates=31960, lr=1.5033e-05, gnorm=0.945, clip=10, loss_scale=64, train_wall=40, gb_free=28.5, wall=130956 2023-05-02 14:56:23 - progress_bar.py[line:274] - INFO: epoch 006: 1814 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7667.1, nsentences=120, sample_size=4008, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1905.4, ups=0.25, wpb=7667.1, bsz=120, num_updates=31970, lr=1.50277e-05, gnorm=0.988, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=130996 2023-05-02 14:57:03 - progress_bar.py[line:274] - INFO: epoch 006: 1824 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7551.7, nsentences=120, sample_size=4330.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1931.9, ups=0.26, wpb=7551.7, bsz=120, num_updates=31980, lr=1.50224e-05, gnorm=0.958, clip=20, loss_scale=64, train_wall=39, gb_free=29.7, wall=131035 2023-05-02 14:57:42 - progress_bar.py[line:274] - INFO: epoch 006: 1834 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7673.9, nsentences=120, sample_size=4204.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1936.7, ups=0.25, wpb=7673.9, bsz=120, num_updates=31990, lr=1.50172e-05, gnorm=0.948, clip=20, loss_scale=64, train_wall=40, gb_free=27.3, wall=131075 2023-05-02 14:58:22 - progress_bar.py[line:274] - INFO: epoch 006: 1844 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7472.7, nsentences=120, sample_size=4280.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1859.9, ups=0.25, wpb=7472.7, bsz=120, num_updates=32000, lr=1.50119e-05, gnorm=0.919, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=131115 2023-05-02 14:58:22 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 14:58:24 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 14:58:24 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 14:58:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 14:58:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:41 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 14:58:41 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 14:58:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-02 14:58:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:53 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 14:58:53 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 14:58:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:58:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:58:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:59:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:59:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 14:59:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:59:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:59:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:04 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 14:59:04 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 14:59:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:59:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:59:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:59:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:09 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 14:59:09 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 14:59:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:59:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:59:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:59:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:59:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:13 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-02 14:59:13 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 14:59:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 14:59:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 14:59:14 - progress_bar.py[line:282] - INFO: epoch 006 | valid on 'valid' subset | loss 3.222 | loss_v1 0 | loss_v2 0 | nll_loss 2.055 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.16 | score 0.7554 | wps 3306.4 | wpb 3202.1 | bsz 39.4 | num_updates 32000 | best_score 0.7627 2023-05-02 14:59:14 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 6 @ 32000 updates 2023-05-02 14:59:14 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_32000.pt 2023-05-02 14:59:38 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_32000.pt 2023-05-02 14:59:52 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_32000.pt (epoch 6 @ 32000 updates, score 0.7554) (writing took 38.298060736153275 seconds) 2023-05-02 15:00:32 - progress_bar.py[line:274] - INFO: epoch 006: 1854 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7817.6, nsentences=120, sample_size=4202, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=604, ups=0.08, wpb=7817.6, bsz=120, num_updates=32010, lr=1.50066e-05, gnorm=0.931, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=131244 2023-05-02 15:01:11 - progress_bar.py[line:274] - INFO: epoch 006: 1864 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7756.3, nsentences=120, sample_size=3734.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1955.4, ups=0.25, wpb=7756.3, bsz=120, num_updates=32020, 
lr=1.50013e-05, gnorm=1.02, clip=60, loss_scale=64, train_wall=40, gb_free=29.6, wall=131284 2023-05-02 15:01:51 - progress_bar.py[line:274] - INFO: epoch 006: 1874 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7613.7, nsentences=120, sample_size=3825.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1946.9, ups=0.26, wpb=7613.7, bsz=120, num_updates=32030, lr=1.4996e-05, gnorm=1.001, clip=40, loss_scale=64, train_wall=39, gb_free=29.5, wall=131323 2023-05-02 15:02:30 - progress_bar.py[line:274] - INFO: epoch 006: 1884 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7597.2, nsentences=120, sample_size=4221.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1938.2, ups=0.26, wpb=7597.2, bsz=120, num_updates=32040, lr=1.49908e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=39, gb_free=23.6, wall=131362 2023-05-02 15:03:09 - progress_bar.py[line:274] - INFO: epoch 006: 1894 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7705.7, nsentences=120, sample_size=4039.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1960.8, ups=0.25, wpb=7705.7, bsz=120, num_updates=32050, lr=1.49855e-05, gnorm=0.96, clip=30, loss_scale=64, train_wall=39, gb_free=28.5, wall=131402 2023-05-02 15:03:48 - progress_bar.py[line:274] - INFO: epoch 006: 1904 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7666.7, nsentences=120, sample_size=4124.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1946.2, ups=0.25, wpb=7666.7, bsz=120, num_updates=32060, lr=1.49802e-05, gnorm=0.962, clip=10, loss_scale=64, train_wall=39, gb_free=30.3, wall=131441 2023-05-02 15:04:28 - progress_bar.py[line:274] - INFO: epoch 006: 1914 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7953.8, nsentences=120, sample_size=4118.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2005.5, ups=0.25, wpb=7953.8, bsz=120, num_updates=32070, lr=1.49749e-05, gnorm=0.921, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, 
wall=131481 2023-05-02 15:05:09 - progress_bar.py[line:274] - INFO: epoch 006: 1924 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=8017.2, nsentences=120, sample_size=4105.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1974.1, ups=0.25, wpb=8017.2, bsz=120, num_updates=32080, lr=1.49696e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=41, gb_free=31.1, wall=131521 2023-05-02 15:05:44 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 15:05:52 - progress_bar.py[line:274] - INFO: epoch 006: 1935 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7910.1, nsentences=120, sample_size=4029.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1816, ups=0.23, wpb=7910.1, bsz=120, num_updates=32090, lr=1.49643e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=43, gb_free=24.8, wall=131565 2023-05-02 15:06:32 - progress_bar.py[line:274] - INFO: epoch 006: 1945 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7777, nsentences=120, sample_size=4323.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1953.9, ups=0.25, wpb=7777, bsz=120, num_updates=32100, lr=1.49591e-05, gnorm=0.937, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=131605 2023-05-02 15:07:12 - progress_bar.py[line:274] - INFO: epoch 006: 1955 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7867, nsentences=120, sample_size=4192.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1969.2, ups=0.25, wpb=7867, bsz=120, num_updates=32110, lr=1.49538e-05, gnorm=0.96, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=131644 2023-05-02 15:07:53 - progress_bar.py[line:274] - INFO: epoch 006: 1965 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7681.6, nsentences=120, sample_size=4076.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1875.9, ups=0.24, wpb=7681.6, bsz=120, num_updates=32120, lr=1.49485e-05, gnorm=0.96, clip=20, 
loss_scale=64, train_wall=41, gb_free=29.8, wall=131685 2023-05-02 15:08:33 - progress_bar.py[line:274] - INFO: epoch 006: 1975 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7935.1, nsentences=120, sample_size=3966.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1974.6, ups=0.25, wpb=7935.1, bsz=120, num_updates=32130, lr=1.49432e-05, gnorm=0.941, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=131726 2023-05-02 15:09:13 - progress_bar.py[line:274] - INFO: epoch 006: 1985 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7777.6, nsentences=120, sample_size=4069.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1929.9, ups=0.25, wpb=7777.6, bsz=120, num_updates=32140, lr=1.49379e-05, gnorm=0.971, clip=40, loss_scale=64, train_wall=40, gb_free=30.3, wall=131766 2023-05-02 15:09:53 - progress_bar.py[line:274] - INFO: epoch 006: 1995 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7576.9, nsentences=120, sample_size=3746.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1901.4, ups=0.25, wpb=7576.9, bsz=120, num_updates=32150, lr=1.49327e-05, gnorm=1.006, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=131806 2023-05-02 15:10:34 - progress_bar.py[line:274] - INFO: epoch 006: 2005 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=8129.7, nsentences=120, sample_size=3667.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=2009, ups=0.25, wpb=8129.7, bsz=120, num_updates=32160, lr=1.49274e-05, gnorm=1.006, clip=60, loss_scale=64, train_wall=40, gb_free=31.3, wall=131846 2023-05-02 15:11:13 - progress_bar.py[line:274] - INFO: epoch 006: 2015 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7461.8, nsentences=120, sample_size=4043, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1878.9, ups=0.25, wpb=7461.8, bsz=120, num_updates=32170, lr=1.49221e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=131886 2023-05-02 15:11:53 - 
progress_bar.py[line:274] - INFO: epoch 006: 2025 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7846.7, nsentences=120, sample_size=4145.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1965.2, ups=0.25, wpb=7846.7, bsz=120, num_updates=32180, lr=1.49168e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=28.6, wall=131926 2023-05-02 15:12:34 - progress_bar.py[line:274] - INFO: epoch 006: 2035 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=8110.1, nsentences=120, sample_size=3994.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2017, ups=0.25, wpb=8110.1, bsz=120, num_updates=32190, lr=1.49115e-05, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=29.5, wall=131966 2023-05-02 15:13:13 - progress_bar.py[line:274] - INFO: epoch 006: 2045 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7452.6, nsentences=120, sample_size=4234.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1874.7, ups=0.25, wpb=7452.6, bsz=120, num_updates=32200, lr=1.49062e-05, gnorm=0.997, clip=40, loss_scale=64, train_wall=40, gb_free=30.4, wall=132006 2023-05-02 15:13:53 - progress_bar.py[line:274] - INFO: epoch 006: 2055 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7835, nsentences=120, sample_size=3822.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1987.1, ups=0.25, wpb=7835, bsz=120, num_updates=32210, lr=1.4901e-05, gnorm=0.993, clip=50, loss_scale=64, train_wall=39, gb_free=31.3, wall=132045 2023-05-02 15:14:32 - progress_bar.py[line:274] - INFO: epoch 006: 2065 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7809.4, nsentences=120, sample_size=3811, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1973.2, ups=0.25, wpb=7809.4, bsz=120, num_updates=32220, lr=1.48957e-05, gnorm=1.002, clip=50, loss_scale=64, train_wall=40, gb_free=27.9, wall=132085 2023-05-02 15:15:12 - progress_bar.py[line:274] - INFO: epoch 006: 2075 / 6042 loss=2.39, loss_v1=0, loss_v2=0, 
nll_loss=1.132, ntokens=7713.2, nsentences=120, sample_size=4137.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1928.9, ups=0.25, wpb=7713.2, bsz=120, num_updates=32230, lr=1.48904e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=132125 2023-05-02 15:15:51 - progress_bar.py[line:274] - INFO: epoch 006: 2085 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7629.4, nsentences=120, sample_size=3650.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1964.3, ups=0.26, wpb=7629.4, bsz=120, num_updates=32240, lr=1.48851e-05, gnorm=0.976, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=132164 2023-05-02 15:16:31 - progress_bar.py[line:274] - INFO: epoch 006: 2095 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7810.2, nsentences=120, sample_size=4080.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1952.3, ups=0.25, wpb=7810.2, bsz=120, num_updates=32250, lr=1.48798e-05, gnorm=0.98, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=132204 2023-05-02 15:17:12 - progress_bar.py[line:274] - INFO: epoch 006: 2105 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7733.9, nsentences=120, sample_size=4406, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1905.1, ups=0.25, wpb=7733.9, bsz=120, num_updates=32260, lr=1.48745e-05, gnorm=0.908, clip=10, loss_scale=64, train_wall=41, gb_free=30.6, wall=132244 2023-05-02 15:17:52 - progress_bar.py[line:274] - INFO: epoch 006: 2115 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7964.5, nsentences=120, sample_size=4061.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1992, ups=0.25, wpb=7964.5, bsz=120, num_updates=32270, lr=1.48693e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=132284 2023-05-02 15:18:32 - progress_bar.py[line:274] - INFO: epoch 006: 2125 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7689.4, nsentences=120, sample_size=3967.6, sample_size_v1=0, 
sample_size_v2=0, ppl=2.26, wps=1917.9, ups=0.25, wpb=7689.4, bsz=120, num_updates=32280, lr=1.4864e-05, gnorm=0.968, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=132324 2023-05-02 15:19:12 - progress_bar.py[line:274] - INFO: epoch 006: 2135 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7479, nsentences=120, sample_size=4076.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1872.9, ups=0.25, wpb=7479, bsz=120, num_updates=32290, lr=1.48587e-05, gnorm=0.959, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=132364 2023-05-02 15:19:52 - progress_bar.py[line:274] - INFO: epoch 006: 2145 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7838.7, nsentences=120, sample_size=4211.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1936.3, ups=0.25, wpb=7838.7, bsz=120, num_updates=32300, lr=1.48534e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, gb_free=29.1, wall=132405 2023-05-02 15:20:31 - progress_bar.py[line:274] - INFO: epoch 006: 2155 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7578.3, nsentences=120, sample_size=4190.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1937.8, ups=0.26, wpb=7578.3, bsz=120, num_updates=32310, lr=1.48481e-05, gnorm=0.978, clip=30, loss_scale=64, train_wall=39, gb_free=29, wall=132444 2023-05-02 15:21:11 - progress_bar.py[line:274] - INFO: epoch 006: 2165 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7657.6, nsentences=120, sample_size=4147, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1958, ups=0.26, wpb=7657.6, bsz=120, num_updates=32320, lr=1.48429e-05, gnorm=0.982, clip=40, loss_scale=64, train_wall=39, gb_free=29.9, wall=132483 2023-05-02 15:21:51 - progress_bar.py[line:274] - INFO: epoch 006: 2175 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.213, ntokens=8000.8, nsentences=120, sample_size=4020.4, sample_size_v1=0, sample_size_v2=0, ppl=2.32, wps=1996.7, ups=0.25, wpb=8000.8, bsz=120, num_updates=32330, 
lr=1.48376e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=132523 2023-05-02 15:22:30 - progress_bar.py[line:274] - INFO: epoch 006: 2185 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7554.2, nsentences=120, sample_size=4188.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1901.4, ups=0.25, wpb=7554.2, bsz=120, num_updates=32340, lr=1.48323e-05, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=28.1, wall=132563 2023-05-02 15:23:10 - progress_bar.py[line:274] - INFO: epoch 006: 2195 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7534.3, nsentences=120, sample_size=4013.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1904, ups=0.25, wpb=7534.3, bsz=120, num_updates=32350, lr=1.4827e-05, gnorm=0.991, clip=50, loss_scale=64, train_wall=39, gb_free=29.7, wall=132602 2023-05-02 15:23:49 - progress_bar.py[line:274] - INFO: epoch 006: 2205 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7340.7, nsentences=120, sample_size=4134.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1881, ups=0.26, wpb=7340.7, bsz=120, num_updates=32360, lr=1.48217e-05, gnorm=0.961, clip=10, loss_scale=64, train_wall=39, gb_free=30.3, wall=132641 2023-05-02 15:24:28 - progress_bar.py[line:274] - INFO: epoch 006: 2215 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7645.7, nsentences=120, sample_size=3925.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1955.6, ups=0.26, wpb=7645.7, bsz=120, num_updates=32370, lr=1.48164e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=132681 2023-05-02 15:25:08 - progress_bar.py[line:274] - INFO: epoch 006: 2225 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7927.1, nsentences=120, sample_size=4098.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1973.7, ups=0.25, wpb=7927.1, bsz=120, num_updates=32380, lr=1.48112e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=29.2, wall=132721 
2023-05-02 15:25:48 - progress_bar.py[line:274] - INFO: epoch 006: 2235 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7833.2, nsentences=120, sample_size=4070.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1971.7, ups=0.25, wpb=7833.2, bsz=120, num_updates=32390, lr=1.48059e-05, gnorm=0.972, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=132760 2023-05-02 15:26:28 - progress_bar.py[line:274] - INFO: epoch 006: 2245 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7478.4, nsentences=120, sample_size=4308.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1870.2, ups=0.25, wpb=7478.4, bsz=120, num_updates=32400, lr=1.48006e-05, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, gb_free=29.3, wall=132800 2023-05-02 15:27:08 - progress_bar.py[line:274] - INFO: epoch 006: 2255 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7673.2, nsentences=120, sample_size=4041.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1904.4, ups=0.25, wpb=7673.2, bsz=120, num_updates=32410, lr=1.47953e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=132841 2023-05-02 15:27:49 - progress_bar.py[line:274] - INFO: epoch 006: 2265 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7777.9, nsentences=120, sample_size=3752.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1919.3, ups=0.25, wpb=7777.9, bsz=120, num_updates=32420, lr=1.479e-05, gnorm=1.008, clip=40, loss_scale=64, train_wall=40, gb_free=30.7, wall=132881 2023-05-02 15:28:28 - progress_bar.py[line:274] - INFO: epoch 006: 2275 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7577.4, nsentences=120, sample_size=4176.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1914.2, ups=0.25, wpb=7577.4, bsz=120, num_updates=32430, lr=1.47848e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=132921 2023-05-02 15:29:09 - progress_bar.py[line:274] - INFO: epoch 006: 2285 / 6042 
loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7846.2, nsentences=120, sample_size=4259.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1947.4, ups=0.25, wpb=7846.2, bsz=120, num_updates=32440, lr=1.47795e-05, gnorm=0.926, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=132961 2023-05-02 15:29:48 - progress_bar.py[line:274] - INFO: epoch 006: 2295 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=8057, nsentences=120, sample_size=3925.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2034.9, ups=0.25, wpb=8057, bsz=120, num_updates=32450, lr=1.47742e-05, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=133001 2023-05-02 15:30:29 - progress_bar.py[line:274] - INFO: epoch 006: 2305 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7736.2, nsentences=120, sample_size=4089.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1898.4, ups=0.25, wpb=7736.2, bsz=120, num_updates=32460, lr=1.47689e-05, gnorm=0.958, clip=40, loss_scale=64, train_wall=41, gb_free=29.1, wall=133041 2023-05-02 15:31:09 - progress_bar.py[line:274] - INFO: epoch 006: 2315 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7589.7, nsentences=120, sample_size=3765.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1915.3, ups=0.25, wpb=7589.7, bsz=120, num_updates=32470, lr=1.47636e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=40, gb_free=31.1, wall=133081 2023-05-02 15:31:49 - progress_bar.py[line:274] - INFO: epoch 006: 2325 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7739.7, nsentences=120, sample_size=4227, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1931.3, ups=0.25, wpb=7739.7, bsz=120, num_updates=32480, lr=1.47583e-05, gnorm=0.939, clip=20, loss_scale=64, train_wall=40, gb_free=27.7, wall=133121 2023-05-02 15:32:27 - progress_bar.py[line:274] - INFO: epoch 006: 2335 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7666.4, nsentences=120, 
sample_size=4064.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1990.8, ups=0.26, wpb=7666.4, bsz=120, num_updates=32490, lr=1.47531e-05, gnorm=0.992, clip=40, loss_scale=64, train_wall=38, gb_free=27.6, wall=133160 2023-05-02 15:33:06 - progress_bar.py[line:274] - INFO: epoch 006: 2345 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7854.7, nsentences=120, sample_size=3538.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2005.8, ups=0.26, wpb=7854.7, bsz=120, num_updates=32500, lr=1.47478e-05, gnorm=1.016, clip=60, loss_scale=64, train_wall=39, gb_free=29.9, wall=133199 2023-05-02 15:33:47 - progress_bar.py[line:274] - INFO: epoch 006: 2355 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7980.4, nsentences=120, sample_size=4061.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1979.1, ups=0.25, wpb=7980.4, bsz=120, num_updates=32510, lr=1.47425e-05, gnorm=0.945, clip=10, loss_scale=64, train_wall=40, gb_free=28.3, wall=133239 2023-05-02 15:34:27 - progress_bar.py[line:274] - INFO: epoch 006: 2365 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7742.9, nsentences=120, sample_size=3859.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1917.7, ups=0.25, wpb=7742.9, bsz=120, num_updates=32520, lr=1.47372e-05, gnorm=1.024, clip=60, loss_scale=64, train_wall=40, gb_free=29.5, wall=133280 2023-05-02 15:35:06 - progress_bar.py[line:274] - INFO: epoch 006: 2375 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7433.7, nsentences=120, sample_size=3902.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1896.2, ups=0.26, wpb=7433.7, bsz=120, num_updates=32530, lr=1.47319e-05, gnorm=0.996, clip=50, loss_scale=64, train_wall=39, gb_free=29.6, wall=133319 2023-05-02 15:35:46 - progress_bar.py[line:274] - INFO: epoch 006: 2385 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7799.1, nsentences=120, sample_size=4014.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1961.8, 
ups=0.25, wpb=7799.1, bsz=120, num_updates=32540, lr=1.47266e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=133358 2023-05-02 15:36:25 - progress_bar.py[line:274] - INFO: epoch 006: 2395 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7547, nsentences=120, sample_size=4178.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1917.7, ups=0.25, wpb=7547, bsz=120, num_updates=32550, lr=1.47214e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=39, gb_free=29.4, wall=133398 2023-05-02 15:37:05 - progress_bar.py[line:274] - INFO: epoch 006: 2405 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7669.9, nsentences=120, sample_size=3842.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1936, ups=0.25, wpb=7669.9, bsz=120, num_updates=32560, lr=1.47161e-05, gnorm=1.01, clip=60, loss_scale=64, train_wall=40, gb_free=29.6, wall=133437 2023-05-02 15:37:45 - progress_bar.py[line:274] - INFO: epoch 006: 2415 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7550.5, nsentences=120, sample_size=4122.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1869.3, ups=0.25, wpb=7550.5, bsz=120, num_updates=32570, lr=1.47108e-05, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=133478 2023-05-02 15:38:25 - progress_bar.py[line:274] - INFO: epoch 006: 2425 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7613.2, nsentences=120, sample_size=3723.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1909.5, ups=0.25, wpb=7613.2, bsz=120, num_updates=32580, lr=1.47055e-05, gnorm=0.989, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=133518 2023-05-02 15:39:05 - progress_bar.py[line:274] - INFO: epoch 006: 2435 / 6042 loss=2.437, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=8253.1, nsentences=120, sample_size=4272.7, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2084.9, ups=0.25, wpb=8253.1, bsz=120, num_updates=32590, lr=1.47002e-05, gnorm=0.931, clip=0, 
loss_scale=64, train_wall=40, gb_free=29.5, wall=133557 2023-05-02 15:39:45 - progress_bar.py[line:274] - INFO: epoch 006: 2445 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=8136.1, nsentences=120, sample_size=4088.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2028.7, ups=0.25, wpb=8136.1, bsz=120, num_updates=32600, lr=1.4695e-05, gnorm=0.948, clip=20, loss_scale=128, train_wall=40, gb_free=29.9, wall=133597 2023-05-02 15:40:25 - progress_bar.py[line:274] - INFO: epoch 006: 2455 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7687.6, nsentences=120, sample_size=3997.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1896.9, ups=0.25, wpb=7687.6, bsz=120, num_updates=32610, lr=1.46897e-05, gnorm=0.986, clip=50, loss_scale=128, train_wall=40, gb_free=30.2, wall=133638 2023-05-02 15:41:04 - progress_bar.py[line:274] - INFO: epoch 006: 2465 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7711.8, nsentences=120, sample_size=4280.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1980.3, ups=0.26, wpb=7711.8, bsz=120, num_updates=32620, lr=1.46844e-05, gnorm=0.94, clip=20, loss_scale=128, train_wall=39, gb_free=29.5, wall=133677 2023-05-02 15:41:09 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 15:41:49 - progress_bar.py[line:274] - INFO: epoch 006: 2476 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7979.6, nsentences=120, sample_size=3907.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1806.9, ups=0.23, wpb=7979.6, bsz=120, num_updates=32630, lr=1.46791e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=44, gb_free=30.3, wall=133721 2023-05-02 15:42:28 - progress_bar.py[line:274] - INFO: epoch 006: 2486 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=8089.4, nsentences=120, sample_size=4136.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2034.3, ups=0.25, wpb=8089.4, bsz=120, 
num_updates=32640, lr=1.46738e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=40, gb_free=28.8, wall=133761 2023-05-02 15:43:08 - progress_bar.py[line:274] - INFO: epoch 006: 2496 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7456.6, nsentences=120, sample_size=3911.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1881.1, ups=0.25, wpb=7456.6, bsz=120, num_updates=32650, lr=1.46685e-05, gnorm=1, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=133800 2023-05-02 15:43:48 - progress_bar.py[line:274] - INFO: epoch 006: 2506 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7682.6, nsentences=120, sample_size=3944.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1935.1, ups=0.25, wpb=7682.6, bsz=120, num_updates=32660, lr=1.46633e-05, gnorm=1.033, clip=60, loss_scale=64, train_wall=40, gb_free=29.6, wall=133840 2023-05-02 15:44:27 - progress_bar.py[line:274] - INFO: epoch 006: 2516 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7771.6, nsentences=120, sample_size=4038.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1958.4, ups=0.25, wpb=7771.6, bsz=120, num_updates=32670, lr=1.4658e-05, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=133880 2023-05-02 15:45:07 - progress_bar.py[line:274] - INFO: epoch 006: 2526 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7858.3, nsentences=120, sample_size=4083.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1984.5, ups=0.25, wpb=7858.3, bsz=120, num_updates=32680, lr=1.46527e-05, gnorm=0.949, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=133919 2023-05-02 15:45:47 - progress_bar.py[line:274] - INFO: epoch 006: 2536 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7943.1, nsentences=120, sample_size=3832.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1988.4, ups=0.25, wpb=7943.1, bsz=120, num_updates=32690, lr=1.46474e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, 
gb_free=31.6, wall=133959 2023-05-02 15:46:26 - progress_bar.py[line:274] - INFO: epoch 006: 2546 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7579.1, nsentences=120, sample_size=3852.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1920.3, ups=0.25, wpb=7579.1, bsz=120, num_updates=32700, lr=1.46421e-05, gnorm=1.008, clip=60, loss_scale=64, train_wall=39, gb_free=30.7, wall=133999 2023-05-02 15:47:06 - progress_bar.py[line:274] - INFO: epoch 006: 2556 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7927.1, nsentences=120, sample_size=4019.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1989.2, ups=0.25, wpb=7927.1, bsz=120, num_updates=32710, lr=1.46369e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=134039 2023-05-02 15:47:46 - progress_bar.py[line:274] - INFO: epoch 006: 2566 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7506.6, nsentences=120, sample_size=4095.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1893.8, ups=0.25, wpb=7506.6, bsz=120, num_updates=32720, lr=1.46316e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=134078 2023-05-02 15:48:26 - progress_bar.py[line:274] - INFO: epoch 006: 2576 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7828.3, nsentences=120, sample_size=3856.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1964.9, ups=0.25, wpb=7828.3, bsz=120, num_updates=32730, lr=1.46263e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=134118 2023-05-02 15:49:05 - progress_bar.py[line:274] - INFO: epoch 006: 2586 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7910.9, nsentences=120, sample_size=3722.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2002, ups=0.25, wpb=7910.9, bsz=120, num_updates=32740, lr=1.4621e-05, gnorm=1.038, clip=50, loss_scale=64, train_wall=39, gb_free=30.6, wall=134158 2023-05-02 15:49:45 - progress_bar.py[line:274] - INFO: epoch 
006: 2596 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7614, nsentences=120, sample_size=4006.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1926.9, ups=0.25, wpb=7614, bsz=120, num_updates=32750, lr=1.46157e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=39, gb_free=30.6, wall=134197 2023-05-02 15:50:25 - progress_bar.py[line:274] - INFO: epoch 006: 2606 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7752.2, nsentences=120, sample_size=3683.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1938, ups=0.25, wpb=7752.2, bsz=120, num_updates=32760, lr=1.46104e-05, gnorm=1.012, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=134237 2023-05-02 15:50:49 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-02 15:51:09 - progress_bar.py[line:274] - INFO: epoch 006: 2617 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7670.8, nsentences=120, sample_size=3753.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1745, ups=0.23, wpb=7670.8, bsz=120, num_updates=32770, lr=1.46052e-05, gnorm=1.005, clip=60, loss_scale=32, train_wall=44, gb_free=29.6, wall=134281 2023-05-02 15:51:48 - progress_bar.py[line:274] - INFO: epoch 006: 2627 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7650.3, nsentences=120, sample_size=3822.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1926.4, ups=0.25, wpb=7650.3, bsz=120, num_updates=32780, lr=1.45999e-05, gnorm=1, clip=50, loss_scale=32, train_wall=40, gb_free=31.1, wall=134321 2023-05-02 15:52:28 - progress_bar.py[line:274] - INFO: epoch 006: 2637 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7309, nsentences=120, sample_size=4055.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1844.3, ups=0.25, wpb=7309, bsz=120, num_updates=32790, lr=1.45946e-05, gnorm=0.989, clip=40, loss_scale=32, train_wall=40, gb_free=30.1, wall=134361 2023-05-02 15:53:07 - 
progress_bar.py[line:274] - INFO: epoch 006: 2647 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7662.9, nsentences=120, sample_size=4138.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1956.5, ups=0.26, wpb=7662.9, bsz=120, num_updates=32800, lr=1.45893e-05, gnorm=0.951, clip=20, loss_scale=32, train_wall=39, gb_free=31, wall=134400 2023-05-02 15:53:46 - progress_bar.py[line:274] - INFO: epoch 006: 2657 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7513.6, nsentences=120, sample_size=3791.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1928.9, ups=0.26, wpb=7513.6, bsz=120, num_updates=32810, lr=1.4584e-05, gnorm=0.988, clip=40, loss_scale=32, train_wall=39, gb_free=31.1, wall=134439 2023-05-02 15:54:26 - progress_bar.py[line:274] - INFO: epoch 006: 2667 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7891.3, nsentences=120, sample_size=3962.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1964.5, ups=0.25, wpb=7891.3, bsz=120, num_updates=32820, lr=1.45787e-05, gnorm=0.971, clip=30, loss_scale=32, train_wall=40, gb_free=29.8, wall=134479 2023-05-02 15:55:06 - progress_bar.py[line:274] - INFO: epoch 006: 2677 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7656.6, nsentences=120, sample_size=4178.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1954.7, ups=0.26, wpb=7656.6, bsz=120, num_updates=32830, lr=1.45735e-05, gnorm=0.962, clip=40, loss_scale=32, train_wall=39, gb_free=28.5, wall=134518 2023-05-02 15:55:46 - progress_bar.py[line:274] - INFO: epoch 006: 2687 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7632.8, nsentences=120, sample_size=4239.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1897.4, ups=0.25, wpb=7632.8, bsz=120, num_updates=32840, lr=1.45682e-05, gnorm=0.948, clip=30, loss_scale=32, train_wall=40, gb_free=29.7, wall=134558 2023-05-02 15:56:26 - progress_bar.py[line:274] - INFO: epoch 006: 2697 / 6042 loss=2.411, loss_v1=0, 
loss_v2=0, nll_loss=1.16, ntokens=7937.6, nsentences=120, sample_size=3847.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1964.5, ups=0.25, wpb=7937.6, bsz=120, num_updates=32850, lr=1.45629e-05, gnorm=0.993, clip=60, loss_scale=32, train_wall=40, gb_free=30.3, wall=134599 2023-05-02 15:57:06 - progress_bar.py[line:274] - INFO: epoch 006: 2707 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7709.3, nsentences=120, sample_size=4254.9, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1944, ups=0.25, wpb=7709.3, bsz=120, num_updates=32860, lr=1.45576e-05, gnorm=0.924, clip=10, loss_scale=32, train_wall=40, gb_free=28.6, wall=134638 2023-05-02 15:57:46 - progress_bar.py[line:274] - INFO: epoch 006: 2717 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7951.9, nsentences=120, sample_size=3812.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2004.7, ups=0.25, wpb=7951.9, bsz=120, num_updates=32870, lr=1.45523e-05, gnorm=0.963, clip=30, loss_scale=32, train_wall=40, gb_free=28.9, wall=134678 2023-05-02 15:58:26 - progress_bar.py[line:274] - INFO: epoch 006: 2727 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.195, ntokens=7958.2, nsentences=120, sample_size=3779.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1966.5, ups=0.25, wpb=7958.2, bsz=120, num_updates=32880, lr=1.45471e-05, gnorm=0.995, clip=50, loss_scale=32, train_wall=40, gb_free=29.2, wall=134718 2023-05-02 15:59:06 - progress_bar.py[line:274] - INFO: epoch 006: 2737 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7467.6, nsentences=120, sample_size=3831.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1872.8, ups=0.25, wpb=7467.6, bsz=120, num_updates=32890, lr=1.45418e-05, gnorm=0.992, clip=40, loss_scale=32, train_wall=40, gb_free=29.7, wall=134758 2023-05-02 15:59:46 - progress_bar.py[line:274] - INFO: epoch 006: 2747 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7854.8, nsentences=120, sample_size=3990.3, 
sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1978, ups=0.25, wpb=7854.8, bsz=120, num_updates=32900, lr=1.45365e-05, gnorm=0.951, clip=10, loss_scale=32, train_wall=40, gb_free=29.5, wall=134798 2023-05-02 16:00:25 - progress_bar.py[line:274] - INFO: epoch 006: 2757 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7427.2, nsentences=120, sample_size=3938.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1882.3, ups=0.25, wpb=7427.2, bsz=120, num_updates=32910, lr=1.45312e-05, gnorm=0.986, clip=30, loss_scale=32, train_wall=39, gb_free=30.3, wall=134837 2023-05-02 16:01:05 - progress_bar.py[line:274] - INFO: epoch 006: 2767 / 6042 loss=2.454, loss_v1=0, loss_v2=0, nll_loss=1.206, ntokens=7976.1, nsentences=120, sample_size=4091.2, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1985.7, ups=0.25, wpb=7976.1, bsz=120, num_updates=32920, lr=1.45259e-05, gnorm=0.958, clip=30, loss_scale=32, train_wall=40, gb_free=30.6, wall=134878 2023-05-02 16:01:45 - progress_bar.py[line:274] - INFO: epoch 006: 2777 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7549.6, nsentences=120, sample_size=4019.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1902.2, ups=0.25, wpb=7549.6, bsz=120, num_updates=32930, lr=1.45206e-05, gnorm=0.994, clip=40, loss_scale=32, train_wall=40, gb_free=28.3, wall=134917 2023-05-02 16:02:25 - progress_bar.py[line:274] - INFO: epoch 006: 2787 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7741.5, nsentences=120, sample_size=4031.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1939.7, ups=0.25, wpb=7741.5, bsz=120, num_updates=32940, lr=1.45154e-05, gnorm=0.966, clip=30, loss_scale=32, train_wall=40, gb_free=30, wall=134957 2023-05-02 16:03:05 - progress_bar.py[line:274] - INFO: epoch 006: 2797 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7988.1, nsentences=120, sample_size=4077.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1988.7, ups=0.25, wpb=7988.1, bsz=120, 
num_updates=32950, lr=1.45101e-05, gnorm=0.976, clip=20, loss_scale=32, train_wall=40, gb_free=29.3, wall=134997 2023-05-02 16:03:45 - progress_bar.py[line:274] - INFO: epoch 006: 2807 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7416.3, nsentences=120, sample_size=4211.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1861.7, ups=0.25, wpb=7416.3, bsz=120, num_updates=32960, lr=1.45048e-05, gnorm=0.95, clip=20, loss_scale=32, train_wall=40, gb_free=29.7, wall=135037 2023-05-02 16:04:25 - progress_bar.py[line:274] - INFO: epoch 006: 2817 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7847.4, nsentences=120, sample_size=3945, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1948.4, ups=0.25, wpb=7847.4, bsz=120, num_updates=32970, lr=1.44995e-05, gnorm=0.964, clip=30, loss_scale=32, train_wall=40, gb_free=31, wall=135078 2023-05-02 16:05:05 - progress_bar.py[line:274] - INFO: epoch 006: 2827 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7805.6, nsentences=120, sample_size=3973.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1945.9, ups=0.25, wpb=7805.6, bsz=120, num_updates=32980, lr=1.44942e-05, gnorm=0.984, clip=30, loss_scale=32, train_wall=40, gb_free=29.1, wall=135118 2023-05-02 16:05:46 - progress_bar.py[line:274] - INFO: epoch 006: 2837 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7953, nsentences=120, sample_size=4181.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1966.8, ups=0.25, wpb=7953, bsz=120, num_updates=32990, lr=1.4489e-05, gnorm=0.973, clip=30, loss_scale=32, train_wall=40, gb_free=30.1, wall=135158 2023-05-02 16:06:25 - progress_bar.py[line:274] - INFO: epoch 006: 2847 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7601, nsentences=120, sample_size=4139.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1938, ups=0.25, wpb=7601, bsz=120, num_updates=33000, lr=1.44837e-05, gnorm=0.967, clip=40, loss_scale=32, train_wall=39, gb_free=29.2, 
wall=135197 2023-05-02 16:06:25 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 16:06:27 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 16:06:27 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 16:06:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:38 - task_invig.py[line:182] - INFO: example 
ref(gt): region: 2023-05-02 16:06:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:44 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 16:06:44 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 16:06:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:51 - task_invig.py[line:181] - INFO: example 
hyp(gen): region: 2023-05-02 16:06:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:56 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 16:06:56 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 16:06:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:06:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:06:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:02 - task_invig.py[line:182] - INFO: example 
ref(gt): region: 2023-05-02 16:07:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:07 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 16:07:07 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 16:07:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:11 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 16:07:11 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 16:07:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:16 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 16:07:16 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 16:07:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 16:07:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 16:07:16 - progress_bar.py[line:282] - INFO: epoch 006 | valid on 'valid' subset | loss 3.218 | loss_v1 0 | loss_v2 0 | nll_loss 2.052 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.15 | score 0.751 | wps 3296.4 | wpb 3202.1 | bsz 39.4 | num_updates 33000 | best_score 0.7627 2023-05-02 16:07:16 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 6 @ 33000 updates 2023-05-02 16:07:16 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_33000.pt 2023-05-02 16:07:41 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_33000.pt 2023-05-02 16:07:54 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_33000.pt (epoch 6 @ 33000 updates, 
score 0.751) (writing took 37.99327437300235 seconds) 2023-05-02 16:08:33 - progress_bar.py[line:274] - INFO: epoch 006: 2857 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7545.6, nsentences=120, sample_size=4195.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=588.1, ups=0.08, wpb=7545.6, bsz=120, num_updates=33010, lr=1.44784e-05, gnorm=0.962, clip=30, loss_scale=32, train_wall=39, gb_free=27.9, wall=135326 2023-05-02 16:09:13 - progress_bar.py[line:274] - INFO: epoch 006: 2867 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7623.7, nsentences=120, sample_size=4018.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1911.4, ups=0.25, wpb=7623.7, bsz=120, num_updates=33020, lr=1.44731e-05, gnorm=0.956, clip=20, loss_scale=32, train_wall=40, gb_free=30.7, wall=135366 2023-05-02 16:09:53 - progress_bar.py[line:274] - INFO: epoch 006: 2877 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=8090.2, nsentences=120, sample_size=3713.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2031.8, ups=0.25, wpb=8090.2, bsz=120, num_updates=33030, lr=1.44678e-05, gnorm=0.986, clip=30, loss_scale=32, train_wall=40, gb_free=30.4, wall=135405 2023-05-02 16:10:33 - progress_bar.py[line:274] - INFO: epoch 006: 2887 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7336.9, nsentences=120, sample_size=4132.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1848.8, ups=0.25, wpb=7336.9, bsz=120, num_updates=33040, lr=1.44625e-05, gnorm=0.964, clip=20, loss_scale=32, train_wall=40, gb_free=30, wall=135445 2023-05-02 16:11:13 - progress_bar.py[line:274] - INFO: epoch 006: 2897 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7755.3, nsentences=120, sample_size=3999.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1918.6, ups=0.25, wpb=7755.3, bsz=120, num_updates=33050, lr=1.44573e-05, gnorm=0.974, clip=20, loss_scale=32, train_wall=40, gb_free=29.6, wall=135485 2023-05-02 16:11:53 - 
progress_bar.py[line:274] - INFO: epoch 006: 2907 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7578.3, nsentences=120, sample_size=3959, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1902.9, ups=0.25, wpb=7578.3, bsz=120, num_updates=33060, lr=1.4452e-05, gnorm=0.957, clip=20, loss_scale=32, train_wall=40, gb_free=30.2, wall=135525 2023-05-02 16:12:32 - progress_bar.py[line:274] - INFO: epoch 006: 2917 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7675.9, nsentences=120, sample_size=4030.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1956.5, ups=0.25, wpb=7675.9, bsz=120, num_updates=33070, lr=1.44467e-05, gnorm=0.955, clip=30, loss_scale=32, train_wall=39, gb_free=30.2, wall=135565 2023-05-02 16:13:12 - progress_bar.py[line:274] - INFO: epoch 006: 2927 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7668.3, nsentences=120, sample_size=4153.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1929.3, ups=0.25, wpb=7668.3, bsz=120, num_updates=33080, lr=1.44414e-05, gnorm=0.954, clip=20, loss_scale=32, train_wall=40, gb_free=30.4, wall=135604 2023-05-02 16:13:51 - progress_bar.py[line:274] - INFO: epoch 006: 2937 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7785.9, nsentences=120, sample_size=4377.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1967.7, ups=0.25, wpb=7785.9, bsz=120, num_updates=33090, lr=1.44361e-05, gnorm=0.927, clip=0, loss_scale=32, train_wall=39, gb_free=29.5, wall=135644 2023-05-02 16:14:33 - progress_bar.py[line:274] - INFO: epoch 006: 2947 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=8063.3, nsentences=120, sample_size=4102.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1958.9, ups=0.24, wpb=8063.3, bsz=120, num_updates=33100, lr=1.44308e-05, gnorm=0.95, clip=20, loss_scale=32, train_wall=41, gb_free=30, wall=135685 2023-05-02 16:15:12 - progress_bar.py[line:274] - INFO: epoch 006: 2957 / 6042 loss=2.404, loss_v1=0, loss_v2=0, 
nll_loss=1.155, ntokens=7656, nsentences=120, sample_size=4334.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1932.1, ups=0.25, wpb=7656, bsz=120, num_updates=33110, lr=1.44256e-05, gnorm=0.923, clip=20, loss_scale=32, train_wall=40, gb_free=29.7, wall=135725 2023-05-02 16:15:52 - progress_bar.py[line:274] - INFO: epoch 006: 2967 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7579.3, nsentences=120, sample_size=4195.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1917.6, ups=0.25, wpb=7579.3, bsz=120, num_updates=33120, lr=1.44203e-05, gnorm=0.941, clip=20, loss_scale=32, train_wall=39, gb_free=30.1, wall=135764 2023-05-02 16:16:31 - progress_bar.py[line:274] - INFO: epoch 006: 2977 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7590.6, nsentences=120, sample_size=4122.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1920.7, ups=0.25, wpb=7590.6, bsz=120, num_updates=33130, lr=1.4415e-05, gnorm=0.952, clip=0, loss_scale=32, train_wall=39, gb_free=27.1, wall=135804 2023-05-02 16:17:11 - progress_bar.py[line:274] - INFO: epoch 006: 2987 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7445.3, nsentences=120, sample_size=4137, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1862.6, ups=0.25, wpb=7445.3, bsz=120, num_updates=33140, lr=1.44097e-05, gnorm=1.013, clip=50, loss_scale=32, train_wall=40, gb_free=31.1, wall=135844 2023-05-02 16:17:51 - progress_bar.py[line:274] - INFO: epoch 006: 2997 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7895.4, nsentences=120, sample_size=3972.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1964.4, ups=0.25, wpb=7895.4, bsz=120, num_updates=33150, lr=1.44044e-05, gnorm=0.968, clip=20, loss_scale=32, train_wall=40, gb_free=29.8, wall=135884 2023-05-02 16:18:31 - progress_bar.py[line:274] - INFO: epoch 006: 3007 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7518.3, nsentences=120, sample_size=4474.9, sample_size_v1=0, 
sample_size_v2=0, ppl=2.22, wps=1882.1, ups=0.25, wpb=7518.3, bsz=120, num_updates=33160, lr=1.43992e-05, gnorm=0.912, clip=10, loss_scale=32, train_wall=40, gb_free=29.2, wall=135924 2023-05-02 16:19:12 - progress_bar.py[line:274] - INFO: epoch 006: 3017 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7533.5, nsentences=120, sample_size=4459.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1870.2, ups=0.25, wpb=7533.5, bsz=120, num_updates=33170, lr=1.43939e-05, gnorm=0.925, clip=10, loss_scale=32, train_wall=40, gb_free=28.5, wall=135964 2023-05-02 16:19:50 - progress_bar.py[line:274] - INFO: epoch 006: 3027 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7486.5, nsentences=120, sample_size=4213.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1930, ups=0.26, wpb=7486.5, bsz=120, num_updates=33180, lr=1.43886e-05, gnorm=0.946, clip=30, loss_scale=32, train_wall=39, gb_free=30.9, wall=136003 2023-05-02 16:20:31 - progress_bar.py[line:274] - INFO: epoch 006: 3037 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=8229, nsentences=120, sample_size=4099.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2033, ups=0.25, wpb=8229, bsz=120, num_updates=33190, lr=1.43833e-05, gnorm=0.942, clip=30, loss_scale=32, train_wall=40, gb_free=29.6, wall=136043 2023-05-02 16:21:11 - progress_bar.py[line:274] - INFO: epoch 006: 3047 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7628.9, nsentences=120, sample_size=3742.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1915.6, ups=0.25, wpb=7628.9, bsz=120, num_updates=33200, lr=1.4378e-05, gnorm=1.019, clip=60, loss_scale=32, train_wall=40, gb_free=27.9, wall=136083 2023-05-02 16:21:50 - progress_bar.py[line:274] - INFO: epoch 006: 3057 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=7817.8, nsentences=120, sample_size=4071, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1997.1, ups=0.26, wpb=7817.8, bsz=120, num_updates=33210, 
lr=1.43727e-05, gnorm=0.975, clip=30, loss_scale=32, train_wall=39, gb_free=30.6, wall=136122 2023-05-02 16:22:31 - progress_bar.py[line:274] - INFO: epoch 006: 3067 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7798, nsentences=120, sample_size=4108.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1904.4, ups=0.24, wpb=7798, bsz=120, num_updates=33220, lr=1.43675e-05, gnorm=0.971, clip=30, loss_scale=32, train_wall=41, gb_free=29.9, wall=136163 2023-05-02 16:23:11 - progress_bar.py[line:274] - INFO: epoch 006: 3077 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7769.6, nsentences=120, sample_size=4039.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1933.3, ups=0.25, wpb=7769.6, bsz=120, num_updates=33230, lr=1.43622e-05, gnorm=0.968, clip=30, loss_scale=32, train_wall=40, gb_free=28.4, wall=136203 2023-05-02 16:23:50 - progress_bar.py[line:274] - INFO: epoch 006: 3087 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7665.3, nsentences=120, sample_size=4218, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1966.9, ups=0.26, wpb=7665.3, bsz=120, num_updates=33240, lr=1.43569e-05, gnorm=0.946, clip=20, loss_scale=32, train_wall=39, gb_free=29.2, wall=136242 2023-05-02 16:24:29 - progress_bar.py[line:274] - INFO: epoch 006: 3097 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7525.2, nsentences=120, sample_size=3852.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1920.1, ups=0.26, wpb=7525.2, bsz=120, num_updates=33250, lr=1.43516e-05, gnorm=0.993, clip=30, loss_scale=32, train_wall=39, gb_free=31.3, wall=136282 2023-05-02 16:25:09 - progress_bar.py[line:274] - INFO: epoch 006: 3107 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7873.5, nsentences=120, sample_size=3749.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1973.7, ups=0.25, wpb=7873.5, bsz=120, num_updates=33260, lr=1.43463e-05, gnorm=0.989, clip=40, loss_scale=32, train_wall=40, gb_free=28.5, 
wall=136321 2023-05-02 16:25:48 - progress_bar.py[line:274] - INFO: epoch 006: 3117 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7724, nsentences=120, sample_size=4098.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1958.1, ups=0.25, wpb=7724, bsz=120, num_updates=33270, lr=1.43411e-05, gnorm=0.944, clip=30, loss_scale=32, train_wall=39, gb_free=29.8, wall=136361 2023-05-02 16:26:28 - progress_bar.py[line:274] - INFO: epoch 006: 3127 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7511.4, nsentences=120, sample_size=3741.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1897.9, ups=0.25, wpb=7511.4, bsz=120, num_updates=33280, lr=1.43358e-05, gnorm=1.035, clip=70, loss_scale=64, train_wall=40, gb_free=29.9, wall=136401 2023-05-02 16:27:07 - progress_bar.py[line:274] - INFO: epoch 006: 3137 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7530.1, nsentences=120, sample_size=4198.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1927.9, ups=0.26, wpb=7530.1, bsz=120, num_updates=33290, lr=1.43305e-05, gnorm=0.965, clip=10, loss_scale=64, train_wall=39, gb_free=30.4, wall=136440 2023-05-02 16:27:47 - progress_bar.py[line:274] - INFO: epoch 006: 3147 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7626.7, nsentences=120, sample_size=3933.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1920.7, ups=0.25, wpb=7626.7, bsz=120, num_updates=33300, lr=1.43252e-05, gnorm=0.979, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=136479 2023-05-02 16:28:27 - progress_bar.py[line:274] - INFO: epoch 006: 3157 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7899.1, nsentences=120, sample_size=4078.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1969.1, ups=0.25, wpb=7899.1, bsz=120, num_updates=33310, lr=1.43199e-05, gnorm=0.955, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=136519 2023-05-02 16:29:07 - progress_bar.py[line:274] - INFO: epoch 006: 3167 / 6042 
loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7860.5, nsentences=120, sample_size=3832.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1980.7, ups=0.25, wpb=7860.5, bsz=120, num_updates=33320, lr=1.43146e-05, gnorm=0.993, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=136559 2023-05-02 16:29:46 - progress_bar.py[line:274] - INFO: epoch 006: 3177 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7625.3, nsentences=120, sample_size=3601.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1915.4, ups=0.25, wpb=7625.3, bsz=120, num_updates=33330, lr=1.43094e-05, gnorm=1.03, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=136599 2023-05-02 16:30:26 - progress_bar.py[line:274] - INFO: epoch 006: 3187 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7725.1, nsentences=120, sample_size=3951.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1955.2, ups=0.25, wpb=7725.1, bsz=120, num_updates=33340, lr=1.43041e-05, gnorm=0.968, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=136638 2023-05-02 16:31:06 - progress_bar.py[line:274] - INFO: epoch 006: 3197 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7866.1, nsentences=120, sample_size=4129.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1957.5, ups=0.25, wpb=7866.1, bsz=120, num_updates=33350, lr=1.42988e-05, gnorm=0.963, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=136679 2023-05-02 16:31:45 - progress_bar.py[line:274] - INFO: epoch 006: 3207 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7751.6, nsentences=120, sample_size=3990.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1971.7, ups=0.25, wpb=7751.6, bsz=120, num_updates=33360, lr=1.42935e-05, gnorm=0.98, clip=50, loss_scale=64, train_wall=39, gb_free=29.5, wall=136718 2023-05-02 16:32:25 - progress_bar.py[line:274] - INFO: epoch 006: 3217 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7721.8, nsentences=120, 
sample_size=4269.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1943.9, ups=0.25, wpb=7721.8, bsz=120, num_updates=33370, lr=1.42882e-05, gnorm=0.939, clip=0, loss_scale=64, train_wall=40, gb_free=29.4, wall=136758 2023-05-02 16:33:05 - progress_bar.py[line:274] - INFO: epoch 006: 3227 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=8106.8, nsentences=120, sample_size=3940.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2048.1, ups=0.25, wpb=8106.8, bsz=120, num_updates=33380, lr=1.42829e-05, gnorm=0.976, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=136797 2023-05-02 16:33:44 - progress_bar.py[line:274] - INFO: epoch 006: 3237 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7801.4, nsentences=120, sample_size=4107.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1986.8, ups=0.25, wpb=7801.4, bsz=120, num_updates=33390, lr=1.42777e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=39, gb_free=30.8, wall=136836 2023-05-02 16:34:24 - progress_bar.py[line:274] - INFO: epoch 006: 3247 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7748.8, nsentences=120, sample_size=4024.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1938.7, ups=0.25, wpb=7748.8, bsz=120, num_updates=33400, lr=1.42724e-05, gnorm=0.952, clip=30, loss_scale=64, train_wall=40, gb_free=30.8, wall=136876 2023-05-02 16:35:03 - progress_bar.py[line:274] - INFO: epoch 006: 3257 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7491.7, nsentences=120, sample_size=4197.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1909.7, ups=0.25, wpb=7491.7, bsz=120, num_updates=33410, lr=1.42671e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=39, gb_free=29.2, wall=136916 2023-05-02 16:35:44 - progress_bar.py[line:274] - INFO: epoch 006: 3267 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7667.7, nsentences=120, sample_size=4111.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1894.1, ups=0.25, 
wpb=7667.7, bsz=120, num_updates=33420, lr=1.42618e-05, gnorm=0.953, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=136956 2023-05-02 16:36:23 - progress_bar.py[line:274] - INFO: epoch 006: 3277 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7910.4, nsentences=120, sample_size=3872.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2007.5, ups=0.25, wpb=7910.4, bsz=120, num_updates=33430, lr=1.42565e-05, gnorm=0.982, clip=40, loss_scale=64, train_wall=39, gb_free=30.9, wall=136996 2023-05-02 16:37:02 - progress_bar.py[line:274] - INFO: epoch 006: 3287 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7702.2, nsentences=120, sample_size=3564.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1964, ups=0.25, wpb=7702.2, bsz=120, num_updates=33440, lr=1.42513e-05, gnorm=1.015, clip=60, loss_scale=64, train_wall=39, gb_free=28.9, wall=137035 2023-05-02 16:37:43 - progress_bar.py[line:274] - INFO: epoch 006: 3297 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7793.1, nsentences=120, sample_size=4104.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1915, ups=0.25, wpb=7793.1, bsz=120, num_updates=33450, lr=1.4246e-05, gnorm=0.948, clip=30, loss_scale=64, train_wall=41, gb_free=30.6, wall=137076 2023-05-02 16:38:22 - progress_bar.py[line:274] - INFO: epoch 006: 3307 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7524.6, nsentences=120, sample_size=3946.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1917.7, ups=0.25, wpb=7524.6, bsz=120, num_updates=33460, lr=1.42407e-05, gnorm=0.967, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=137115 2023-05-02 16:39:02 - progress_bar.py[line:274] - INFO: epoch 006: 3317 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7537, nsentences=120, sample_size=4009.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1902.9, ups=0.25, wpb=7537, bsz=120, num_updates=33470, lr=1.42354e-05, gnorm=0.98, clip=30, loss_scale=64, 
train_wall=40, gb_free=30, wall=137154 2023-05-02 16:39:42 - progress_bar.py[line:274] - INFO: epoch 006: 3327 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7468.8, nsentences=120, sample_size=3863.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1855, ups=0.25, wpb=7468.8, bsz=120, num_updates=33480, lr=1.42301e-05, gnorm=0.996, clip=50, loss_scale=64, train_wall=40, gb_free=30.4, wall=137195 2023-05-02 16:40:22 - progress_bar.py[line:274] - INFO: epoch 006: 3337 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7667.3, nsentences=120, sample_size=4171.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1912.5, ups=0.25, wpb=7667.3, bsz=120, num_updates=33490, lr=1.42248e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=137235 2023-05-02 16:41:03 - progress_bar.py[line:274] - INFO: epoch 006: 3347 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7933.9, nsentences=120, sample_size=4234.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1961, ups=0.25, wpb=7933.9, bsz=120, num_updates=33500, lr=1.42196e-05, gnorm=0.918, clip=0, loss_scale=64, train_wall=40, gb_free=29, wall=137275 2023-05-02 16:41:43 - progress_bar.py[line:274] - INFO: epoch 006: 3357 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7501.8, nsentences=120, sample_size=3870.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1881.6, ups=0.25, wpb=7501.8, bsz=120, num_updates=33510, lr=1.42143e-05, gnorm=0.981, clip=40, loss_scale=64, train_wall=40, gb_free=30.7, wall=137315 2023-05-02 16:42:22 - progress_bar.py[line:274] - INFO: epoch 006: 3367 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7549.5, nsentences=120, sample_size=4422.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1905.1, ups=0.25, wpb=7549.5, bsz=120, num_updates=33520, lr=1.4209e-05, gnorm=0.954, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=137355 2023-05-02 16:43:02 - progress_bar.py[line:274] - INFO: 
epoch 006: 3377 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7675.9, nsentences=120, sample_size=4276.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1927.5, ups=0.25, wpb=7675.9, bsz=120, num_updates=33530, lr=1.42037e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=40, gb_free=28.8, wall=137394 2023-05-02 16:43:41 - progress_bar.py[line:274] - INFO: epoch 006: 3387 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7729.3, nsentences=120, sample_size=4208.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1975.3, ups=0.26, wpb=7729.3, bsz=120, num_updates=33540, lr=1.41984e-05, gnorm=0.933, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=137434 2023-05-02 16:44:21 - progress_bar.py[line:274] - INFO: epoch 006: 3397 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7469.4, nsentences=120, sample_size=3939.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1881.2, ups=0.25, wpb=7469.4, bsz=120, num_updates=33550, lr=1.41932e-05, gnorm=1.006, clip=50, loss_scale=64, train_wall=40, gb_free=30.7, wall=137473 2023-05-02 16:45:01 - progress_bar.py[line:274] - INFO: epoch 006: 3407 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7676.7, nsentences=120, sample_size=4132.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1923.3, ups=0.25, wpb=7676.7, bsz=120, num_updates=33560, lr=1.41879e-05, gnorm=0.971, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=137513 2023-05-02 16:45:40 - progress_bar.py[line:274] - INFO: epoch 006: 3417 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7713, nsentences=120, sample_size=4466.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1971.2, ups=0.26, wpb=7713, bsz=120, num_updates=33570, lr=1.41826e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=39, gb_free=27.9, wall=137552 2023-05-02 16:46:20 - progress_bar.py[line:274] - INFO: epoch 006: 3427 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7605.5, 
nsentences=120, sample_size=4038.7, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1913.7, ups=0.25, wpb=7605.5, bsz=120, num_updates=33580, lr=1.41773e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=137592 2023-05-02 16:47:00 - progress_bar.py[line:274] - INFO: epoch 006: 3437 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7730.9, nsentences=120, sample_size=4123.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1926, ups=0.25, wpb=7730.9, bsz=120, num_updates=33590, lr=1.4172e-05, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=137632 2023-05-02 16:47:39 - progress_bar.py[line:274] - INFO: epoch 006: 3447 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7913.8, nsentences=120, sample_size=4053.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1993.4, ups=0.25, wpb=7913.8, bsz=120, num_updates=33600, lr=1.41667e-05, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=137672 2023-05-02 16:48:20 - progress_bar.py[line:274] - INFO: epoch 006: 3457 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7487.5, nsentences=120, sample_size=3959.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1856.7, ups=0.25, wpb=7487.5, bsz=120, num_updates=33610, lr=1.41615e-05, gnorm=0.985, clip=30, loss_scale=64, train_wall=40, gb_free=31.1, wall=137712 2023-05-02 16:49:00 - progress_bar.py[line:274] - INFO: epoch 006: 3467 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7787, nsentences=120, sample_size=3814, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1960.7, ups=0.25, wpb=7787, bsz=120, num_updates=33620, lr=1.41562e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=137752 2023-05-02 16:49:39 - progress_bar.py[line:274] - INFO: epoch 006: 3477 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7887.5, nsentences=120, sample_size=4026.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1976, 
ups=0.25, wpb=7887.5, bsz=120, num_updates=33630, lr=1.41509e-05, gnorm=0.94, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=137792 2023-05-02 16:50:19 - progress_bar.py[line:274] - INFO: epoch 006: 3487 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7555.4, nsentences=120, sample_size=3965.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1908.6, ups=0.25, wpb=7555.4, bsz=120, num_updates=33640, lr=1.41456e-05, gnorm=0.994, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=137832 2023-05-02 16:50:58 - progress_bar.py[line:274] - INFO: epoch 006: 3497 / 6042 loss=2.445, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7942.9, nsentences=120, sample_size=4130.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2022.8, ups=0.25, wpb=7942.9, bsz=120, num_updates=33650, lr=1.41403e-05, gnorm=0.954, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=137871 2023-05-02 16:51:38 - progress_bar.py[line:274] - INFO: epoch 006: 3507 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7840.5, nsentences=120, sample_size=3951.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1973.5, ups=0.25, wpb=7840.5, bsz=120, num_updates=33660, lr=1.4135e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=29, wall=137911 2023-05-02 16:52:18 - progress_bar.py[line:274] - INFO: epoch 006: 3517 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7771.7, nsentences=120, sample_size=3973.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1968.6, ups=0.25, wpb=7771.7, bsz=120, num_updates=33670, lr=1.41298e-05, gnorm=0.98, clip=50, loss_scale=64, train_wall=39, gb_free=30.8, wall=137950 2023-05-02 16:52:58 - progress_bar.py[line:274] - INFO: epoch 006: 3527 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7568.3, nsentences=120, sample_size=4097.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1870, ups=0.25, wpb=7568.3, bsz=120, num_updates=33680, lr=1.41245e-05, gnorm=0.954, clip=30, 
loss_scale=64, train_wall=40, gb_free=30.5, wall=137990 2023-05-02 16:53:38 - progress_bar.py[line:274] - INFO: epoch 006: 3537 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7587.1, nsentences=120, sample_size=3710.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1905.2, ups=0.25, wpb=7587.1, bsz=120, num_updates=33690, lr=1.41192e-05, gnorm=1.012, clip=60, loss_scale=64, train_wall=40, gb_free=30.9, wall=138030 2023-05-02 16:54:17 - progress_bar.py[line:274] - INFO: epoch 006: 3547 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7936.8, nsentences=120, sample_size=3864.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2038.8, ups=0.26, wpb=7936.8, bsz=120, num_updates=33700, lr=1.41139e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=39, gb_free=29.6, wall=138069 2023-05-02 16:54:56 - progress_bar.py[line:274] - INFO: epoch 006: 3557 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7731.7, nsentences=120, sample_size=3937.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1972.2, ups=0.26, wpb=7731.7, bsz=120, num_updates=33710, lr=1.41086e-05, gnorm=0.967, clip=30, loss_scale=64, train_wall=39, gb_free=29.9, wall=138108 2023-05-02 16:55:36 - progress_bar.py[line:274] - INFO: epoch 006: 3567 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7722.4, nsentences=120, sample_size=3993.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1910.5, ups=0.25, wpb=7722.4, bsz=120, num_updates=33720, lr=1.41034e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=30.8, wall=138149 2023-05-02 16:56:17 - progress_bar.py[line:274] - INFO: epoch 006: 3577 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7938.7, nsentences=120, sample_size=3916.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1962, ups=0.25, wpb=7938.7, bsz=120, num_updates=33730, lr=1.40981e-05, gnorm=0.968, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=138189 2023-05-02 16:56:57 - 
progress_bar.py[line:274] - INFO: epoch 006: 3587 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7834.4, nsentences=120, sample_size=3827.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1928.3, ups=0.25, wpb=7834.4, bsz=120, num_updates=33740, lr=1.40928e-05, gnorm=0.977, clip=40, loss_scale=64, train_wall=41, gb_free=29.9, wall=138230 2023-05-02 16:57:37 - progress_bar.py[line:274] - INFO: epoch 006: 3597 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=8149, nsentences=120, sample_size=4106.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2041.6, ups=0.25, wpb=8149, bsz=120, num_updates=33750, lr=1.40875e-05, gnorm=0.942, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=138270 2023-05-02 16:58:17 - progress_bar.py[line:274] - INFO: epoch 006: 3607 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7810.2, nsentences=120, sample_size=3760.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1986.1, ups=0.25, wpb=7810.2, bsz=120, num_updates=33760, lr=1.40822e-05, gnorm=0.993, clip=40, loss_scale=64, train_wall=39, gb_free=29.2, wall=138309 2023-05-02 16:58:57 - progress_bar.py[line:274] - INFO: epoch 006: 3617 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7868.7, nsentences=120, sample_size=3886.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1964.1, ups=0.25, wpb=7868.7, bsz=120, num_updates=33770, lr=1.40769e-05, gnorm=0.975, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=138349 2023-05-02 16:59:36 - progress_bar.py[line:274] - INFO: epoch 006: 3627 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7733.6, nsentences=120, sample_size=3950.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1956.6, ups=0.25, wpb=7733.6, bsz=120, num_updates=33780, lr=1.40717e-05, gnorm=0.975, clip=40, loss_scale=64, train_wall=39, gb_free=29.6, wall=138389 2023-05-02 17:00:16 - progress_bar.py[line:274] - INFO: epoch 006: 3637 / 6042 loss=2.422, loss_v1=0, loss_v2=0, 
nll_loss=1.166, ntokens=7847.8, nsentences=120, sample_size=3900.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1995.5, ups=0.25, wpb=7847.8, bsz=120, num_updates=33790, lr=1.40664e-05, gnorm=0.971, clip=40, loss_scale=128, train_wall=39, gb_free=29.8, wall=138428 2023-05-02 17:00:56 - progress_bar.py[line:274] - INFO: epoch 006: 3647 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7501.9, nsentences=120, sample_size=4089.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1872.9, ups=0.25, wpb=7501.9, bsz=120, num_updates=33800, lr=1.40611e-05, gnorm=0.979, clip=40, loss_scale=128, train_wall=40, gb_free=29.6, wall=138468 2023-05-02 17:01:36 - progress_bar.py[line:274] - INFO: epoch 006: 3657 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7530.7, nsentences=120, sample_size=4007.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1890.7, ups=0.25, wpb=7530.7, bsz=120, num_updates=33810, lr=1.40558e-05, gnorm=0.973, clip=40, loss_scale=128, train_wall=40, gb_free=31.1, wall=138508 2023-05-02 17:02:15 - progress_bar.py[line:274] - INFO: epoch 006: 3667 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7853, nsentences=120, sample_size=3928.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1980.6, ups=0.25, wpb=7853, bsz=120, num_updates=33820, lr=1.40505e-05, gnorm=1.003, clip=50, loss_scale=128, train_wall=40, gb_free=29.1, wall=138548 2023-05-02 17:02:55 - progress_bar.py[line:274] - INFO: epoch 006: 3677 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7485.6, nsentences=120, sample_size=3969.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1889.2, ups=0.25, wpb=7485.6, bsz=120, num_updates=33830, lr=1.40453e-05, gnorm=1, clip=50, loss_scale=128, train_wall=40, gb_free=29.3, wall=138587 2023-05-02 17:03:35 - progress_bar.py[line:274] - INFO: epoch 006: 3687 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7845.7, nsentences=120, sample_size=4126.2, sample_size_v1=0, 
sample_size_v2=0, ppl=2.25, wps=1971.4, ups=0.25, wpb=7845.7, bsz=120, num_updates=33840, lr=1.404e-05, gnorm=0.952, clip=40, loss_scale=128, train_wall=40, gb_free=30.6, wall=138627 2023-05-02 17:03:50 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 17:04:17 - progress_bar.py[line:274] - INFO: epoch 006: 3698 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7112.5, nsentences=120, sample_size=3915.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1661.4, ups=0.23, wpb=7112.5, bsz=120, num_updates=33850, lr=1.40347e-05, gnorm=0.998, clip=40, loss_scale=64, train_wall=43, gb_free=31.2, wall=138670 2023-05-02 17:04:58 - progress_bar.py[line:274] - INFO: epoch 006: 3708 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7870.8, nsentences=120, sample_size=3956.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1962.2, ups=0.25, wpb=7870.8, bsz=120, num_updates=33860, lr=1.40294e-05, gnorm=0.978, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=138710 2023-05-02 17:05:38 - progress_bar.py[line:274] - INFO: epoch 006: 3718 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7731.5, nsentences=120, sample_size=3793, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1907.4, ups=0.25, wpb=7731.5, bsz=120, num_updates=33870, lr=1.40241e-05, gnorm=1.039, clip=50, loss_scale=64, train_wall=40, gb_free=30.3, wall=138751 2023-05-02 17:06:17 - progress_bar.py[line:274] - INFO: epoch 006: 3728 / 6042 loss=2.43, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7793, nsentences=120, sample_size=3892, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1987.2, ups=0.26, wpb=7793, bsz=120, num_updates=33880, lr=1.40188e-05, gnorm=0.964, clip=30, loss_scale=64, train_wall=39, gb_free=27.5, wall=138790 2023-05-02 17:06:56 - progress_bar.py[line:274] - INFO: epoch 006: 3738 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7600.9, nsentences=120, 
sample_size=4224.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1942.3, ups=0.26, wpb=7600.9, bsz=120, num_updates=33890, lr=1.40136e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=39, gb_free=28.9, wall=138829 2023-05-02 17:07:37 - progress_bar.py[line:274] - INFO: epoch 006: 3748 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7599.6, nsentences=120, sample_size=4125.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1891.3, ups=0.25, wpb=7599.6, bsz=120, num_updates=33900, lr=1.40083e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=138869 2023-05-02 17:08:16 - progress_bar.py[line:274] - INFO: epoch 006: 3758 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7352.6, nsentences=120, sample_size=4168.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1849.4, ups=0.25, wpb=7352.6, bsz=120, num_updates=33910, lr=1.4003e-05, gnorm=0.992, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=138909 2023-05-02 17:08:56 - progress_bar.py[line:274] - INFO: epoch 006: 3768 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7891.6, nsentences=120, sample_size=3778.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1983.1, ups=0.25, wpb=7891.6, bsz=120, num_updates=33920, lr=1.39977e-05, gnorm=0.979, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=138949 2023-05-02 17:09:37 - progress_bar.py[line:274] - INFO: epoch 006: 3778 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7763.7, nsentences=120, sample_size=4262.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1920.9, ups=0.25, wpb=7763.7, bsz=120, num_updates=33930, lr=1.39924e-05, gnorm=0.948, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=138989 2023-05-02 17:10:17 - progress_bar.py[line:274] - INFO: epoch 006: 3788 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7711.1, nsentences=120, sample_size=3957.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1916.9, ups=0.25, 
wpb=7711.1, bsz=120, num_updates=33940, lr=1.39871e-05, gnorm=0.99, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=139029 2023-05-02 17:10:57 - progress_bar.py[line:274] - INFO: epoch 006: 3798 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7581.9, nsentences=120, sample_size=4211.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1909.1, ups=0.25, wpb=7581.9, bsz=120, num_updates=33950, lr=1.39819e-05, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, wall=139069 2023-05-02 17:11:36 - progress_bar.py[line:274] - INFO: epoch 006: 3808 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7721.1, nsentences=120, sample_size=3999.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1937.9, ups=0.25, wpb=7721.1, bsz=120, num_updates=33960, lr=1.39766e-05, gnorm=0.975, clip=50, loss_scale=64, train_wall=40, gb_free=28.6, wall=139109 2023-05-02 17:12:16 - progress_bar.py[line:274] - INFO: epoch 006: 3818 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7618, nsentences=120, sample_size=3989.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1919.4, ups=0.25, wpb=7618, bsz=120, num_updates=33970, lr=1.39713e-05, gnorm=0.982, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=139149 2023-05-02 17:12:56 - progress_bar.py[line:274] - INFO: epoch 006: 3828 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7844.2, nsentences=120, sample_size=4014.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1950.2, ups=0.25, wpb=7844.2, bsz=120, num_updates=33980, lr=1.3966e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=29.3, wall=139189 2023-05-02 17:13:36 - progress_bar.py[line:274] - INFO: epoch 006: 3838 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7513.2, nsentences=120, sample_size=3799.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1902.3, ups=0.25, wpb=7513.2, bsz=120, num_updates=33990, lr=1.39607e-05, gnorm=1.017, clip=60, loss_scale=64, 
train_wall=39, gb_free=30.3, wall=139228 2023-05-02 17:14:17 - progress_bar.py[line:274] - INFO: epoch 006: 3848 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7709, nsentences=120, sample_size=4439, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1892.2, ups=0.25, wpb=7709, bsz=120, num_updates=34000, lr=1.39555e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=41, gb_free=29.5, wall=139269 2023-05-02 17:14:17 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 17:14:19 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 17:14:19 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 17:14:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:28 - task_invig.py[line:181] - INFO: 
example hyp(gen): region: 2023-05-02 17:14:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:36 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 17:14:36 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 17:14:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:48 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 17:14:48 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 17:14:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:14:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:14:59 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 17:14:59 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 17:15:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:15:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:15:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:15:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:15:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:15:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:15:03 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 17:15:03 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 17:15:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:15:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:15:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:15:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:15:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:15:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:15:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:15:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:15:08 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 17:15:08 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 17:15:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 17:15:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 17:15:08 - progress_bar.py[line:282] - INFO: epoch 006 | valid on 'valid' subset | loss 3.21 | loss_v1 0 | loss_v2 0 | nll_loss 2.042 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.12 | score 0.7539 | wps 3303.3 | wpb 3202.1 | bsz 39.4 | num_updates 34000 | best_score 0.7627 2023-05-02 17:15:08 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 6 @ 34000 updates 2023-05-02 17:15:08 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_34000.pt 2023-05-02 17:15:33 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_34000.pt 2023-05-02 17:15:49 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_34000.pt (epoch 6 @ 34000 updates, score 0.7539) (writing took 40.397892168024555 seconds) 2023-05-02 17:16:28 - progress_bar.py[line:274] - INFO: epoch 006: 3858 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7719.2, nsentences=120, sample_size=4032.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=585.3, ups=0.08, wpb=7719.2, bsz=120, num_updates=34010, lr=1.39502e-05, gnorm=0.956, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=139401 2023-05-02 17:17:08 - progress_bar.py[line:274] - INFO: epoch 006: 3868 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7633.6, nsentences=120, sample_size=4131.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1931.4, ups=0.25, wpb=7633.6, bsz=120, num_updates=34020, lr=1.39449e-05, gnorm=0.955, clip=40, loss_scale=64, train_wall=39, gb_free=29.1, 
wall=139440 2023-05-02 17:17:47 - progress_bar.py[line:274] - INFO: epoch 006: 3878 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7848.3, nsentences=120, sample_size=3699.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1995.4, ups=0.25, wpb=7848.3, bsz=120, num_updates=34030, lr=1.39396e-05, gnorm=1.003, clip=50, loss_scale=64, train_wall=39, gb_free=26.2, wall=139480 2023-05-02 17:18:27 - progress_bar.py[line:274] - INFO: epoch 006: 3888 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7566.3, nsentences=120, sample_size=4006.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1891, ups=0.25, wpb=7566.3, bsz=120, num_updates=34040, lr=1.39343e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=139520 2023-05-02 17:19:07 - progress_bar.py[line:274] - INFO: epoch 006: 3898 / 6042 loss=2.446, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7983.7, nsentences=120, sample_size=3963.2, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1997.3, ups=0.25, wpb=7983.7, bsz=120, num_updates=34050, lr=1.3929e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=139560 2023-05-02 17:19:47 - progress_bar.py[line:274] - INFO: epoch 006: 3908 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7867, nsentences=120, sample_size=3859.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1992.9, ups=0.25, wpb=7867, bsz=120, num_updates=34060, lr=1.39238e-05, gnorm=0.995, clip=50, loss_scale=64, train_wall=39, gb_free=28.2, wall=139599 2023-05-02 17:20:26 - progress_bar.py[line:274] - INFO: epoch 006: 3918 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7565.9, nsentences=120, sample_size=4066.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1930.1, ups=0.26, wpb=7565.9, bsz=120, num_updates=34070, lr=1.39185e-05, gnorm=0.96, clip=20, loss_scale=64, train_wall=39, gb_free=30.5, wall=139638 2023-05-02 17:21:06 - progress_bar.py[line:274] - INFO: epoch 006: 3928 / 6042 
loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7739.6, nsentences=120, sample_size=4102.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1935.6, ups=0.25, wpb=7739.6, bsz=120, num_updates=34080, lr=1.39132e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=139678 2023-05-02 17:21:46 - progress_bar.py[line:274] - INFO: epoch 006: 3938 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7796.4, nsentences=120, sample_size=4311.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1953.7, ups=0.25, wpb=7796.4, bsz=120, num_updates=34090, lr=1.39079e-05, gnorm=0.921, clip=10, loss_scale=64, train_wall=40, gb_free=27.9, wall=139718 2023-05-02 17:22:25 - progress_bar.py[line:274] - INFO: epoch 006: 3948 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7520.9, nsentences=120, sample_size=3837.5, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1935.8, ups=0.26, wpb=7520.9, bsz=120, num_updates=34100, lr=1.39026e-05, gnorm=1.008, clip=70, loss_scale=64, train_wall=39, gb_free=30.5, wall=139757 2023-05-02 17:23:04 - progress_bar.py[line:274] - INFO: epoch 006: 3958 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7905.2, nsentences=120, sample_size=3998.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2034.7, ups=0.26, wpb=7905.2, bsz=120, num_updates=34110, lr=1.38974e-05, gnorm=0.969, clip=20, loss_scale=64, train_wall=39, gb_free=30.7, wall=139796 2023-05-02 17:23:44 - progress_bar.py[line:274] - INFO: epoch 006: 3968 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7577.1, nsentences=120, sample_size=4086.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1878.3, ups=0.25, wpb=7577.1, bsz=120, num_updates=34120, lr=1.38921e-05, gnorm=0.958, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=139836 2023-05-02 17:24:24 - progress_bar.py[line:274] - INFO: epoch 006: 3978 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.197, ntokens=7744.4, nsentences=120, 
sample_size=4097.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1924.9, ups=0.25, wpb=7744.4, bsz=120, num_updates=34130, lr=1.38868e-05, gnorm=0.965, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=139877 2023-05-02 17:25:03 - progress_bar.py[line:274] - INFO: epoch 006: 3988 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7544.3, nsentences=120, sample_size=3874.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1916.4, ups=0.25, wpb=7544.3, bsz=120, num_updates=34140, lr=1.38815e-05, gnorm=1.003, clip=40, loss_scale=64, train_wall=39, gb_free=29.5, wall=139916 2023-05-02 17:25:44 - progress_bar.py[line:274] - INFO: epoch 006: 3998 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7832.1, nsentences=120, sample_size=4019.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1948.4, ups=0.25, wpb=7832.1, bsz=120, num_updates=34150, lr=1.38762e-05, gnorm=0.985, clip=50, loss_scale=64, train_wall=40, gb_free=28.7, wall=139956 2023-05-02 17:26:24 - progress_bar.py[line:274] - INFO: epoch 006: 4008 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7867.4, nsentences=120, sample_size=3979.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1968.6, ups=0.25, wpb=7867.4, bsz=120, num_updates=34160, lr=1.38709e-05, gnorm=0.978, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=139996 2023-05-02 17:27:03 - progress_bar.py[line:274] - INFO: epoch 006: 4018 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7467.5, nsentences=120, sample_size=4206.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1885.3, ups=0.25, wpb=7467.5, bsz=120, num_updates=34170, lr=1.38657e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=29.2, wall=140036 2023-05-02 17:27:43 - progress_bar.py[line:274] - INFO: epoch 006: 4028 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7479.6, nsentences=120, sample_size=4301.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1870.9, 
ups=0.25, wpb=7479.6, bsz=120, num_updates=34180, lr=1.38604e-05, gnorm=0.949, clip=30, loss_scale=64, train_wall=40, gb_free=28.2, wall=140076 2023-05-02 17:28:24 - progress_bar.py[line:274] - INFO: epoch 006: 4038 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.175, ntokens=8159.5, nsentences=120, sample_size=3738.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2012.2, ups=0.25, wpb=8159.5, bsz=120, num_updates=34190, lr=1.38551e-05, gnorm=1, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=140116 2023-05-02 17:29:03 - progress_bar.py[line:274] - INFO: epoch 006: 4048 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7587.8, nsentences=120, sample_size=4317.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1913.2, ups=0.25, wpb=7587.8, bsz=120, num_updates=34200, lr=1.38498e-05, gnorm=0.939, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=140156 2023-05-02 17:29:43 - progress_bar.py[line:274] - INFO: epoch 006: 4058 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7767.2, nsentences=120, sample_size=3955.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1970.2, ups=0.25, wpb=7767.2, bsz=120, num_updates=34210, lr=1.38445e-05, gnorm=1, clip=40, loss_scale=64, train_wall=39, gb_free=30.4, wall=140195 2023-05-02 17:30:22 - progress_bar.py[line:274] - INFO: epoch 006: 4068 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7723.6, nsentences=120, sample_size=4210, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1961, ups=0.25, wpb=7723.6, bsz=120, num_updates=34220, lr=1.38392e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=140235 2023-05-02 17:31:03 - progress_bar.py[line:274] - INFO: epoch 006: 4078 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7730.2, nsentences=120, sample_size=3798.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1915.8, ups=0.25, wpb=7730.2, bsz=120, num_updates=34230, lr=1.3834e-05, gnorm=0.973, clip=10, 
loss_scale=64, train_wall=40, gb_free=30, wall=140275 2023-05-02 17:31:42 - progress_bar.py[line:274] - INFO: epoch 006: 4088 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7536.8, nsentences=120, sample_size=3995.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1906.6, ups=0.25, wpb=7536.8, bsz=120, num_updates=34240, lr=1.38287e-05, gnorm=0.975, clip=20, loss_scale=64, train_wall=39, gb_free=30.4, wall=140315 2023-05-02 17:32:22 - progress_bar.py[line:274] - INFO: epoch 006: 4098 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7773.1, nsentences=120, sample_size=4259, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1956.9, ups=0.25, wpb=7773.1, bsz=120, num_updates=34250, lr=1.38234e-05, gnorm=0.966, clip=50, loss_scale=64, train_wall=40, gb_free=30.6, wall=140354 2023-05-02 17:33:02 - progress_bar.py[line:274] - INFO: epoch 006: 4108 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7795.3, nsentences=120, sample_size=4211.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1935.9, ups=0.25, wpb=7795.3, bsz=120, num_updates=34260, lr=1.38181e-05, gnorm=0.949, clip=40, loss_scale=64, train_wall=40, gb_free=31, wall=140395 2023-05-02 17:33:42 - progress_bar.py[line:274] - INFO: epoch 006: 4118 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7571.3, nsentences=120, sample_size=4114.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1919.8, ups=0.25, wpb=7571.3, bsz=120, num_updates=34270, lr=1.38128e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=39, gb_free=28.6, wall=140434 2023-05-02 17:34:21 - progress_bar.py[line:274] - INFO: epoch 006: 4128 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7973.6, nsentences=120, sample_size=4050.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1995.8, ups=0.25, wpb=7973.6, bsz=120, num_updates=34280, lr=1.38076e-05, gnorm=0.985, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=140474 2023-05-02 17:35:01 - 
progress_bar.py[line:274] - INFO: epoch 006: 4138 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7533.9, nsentences=120, sample_size=4296.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1897.8, ups=0.25, wpb=7533.9, bsz=120, num_updates=34290, lr=1.38023e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=27.8, wall=140514 2023-05-02 17:35:41 - progress_bar.py[line:274] - INFO: epoch 006: 4148 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=8086.4, nsentences=120, sample_size=4164.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2035.4, ups=0.25, wpb=8086.4, bsz=120, num_updates=34300, lr=1.3797e-05, gnorm=0.966, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=140553 2023-05-02 17:36:20 - progress_bar.py[line:274] - INFO: epoch 006: 4158 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7850.9, nsentences=120, sample_size=3803.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1992, ups=0.25, wpb=7850.9, bsz=120, num_updates=34310, lr=1.37917e-05, gnorm=0.975, clip=40, loss_scale=64, train_wall=39, gb_free=31.2, wall=140593 2023-05-02 17:37:00 - progress_bar.py[line:274] - INFO: epoch 006: 4168 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7798.9, nsentences=120, sample_size=4120.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1961.3, ups=0.25, wpb=7798.9, bsz=120, num_updates=34320, lr=1.37864e-05, gnorm=0.968, clip=40, loss_scale=64, train_wall=40, gb_free=28.2, wall=140633 2023-05-02 17:37:40 - progress_bar.py[line:274] - INFO: epoch 006: 4178 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7595.7, nsentences=120, sample_size=4165.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1906.7, ups=0.25, wpb=7595.7, bsz=120, num_updates=34330, lr=1.37811e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=27.7, wall=140672 2023-05-02 17:38:19 - progress_bar.py[line:274] - INFO: epoch 006: 4188 / 6042 loss=2.383, loss_v1=0, 
loss_v2=0, nll_loss=1.133, ntokens=7902.2, nsentences=120, sample_size=3889.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2003.8, ups=0.25, wpb=7902.2, bsz=120, num_updates=34340, lr=1.37759e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=39, gb_free=30.5, wall=140712 2023-05-02 17:39:00 - progress_bar.py[line:274] - INFO: epoch 006: 4198 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7742.2, nsentences=120, sample_size=4162.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1924.4, ups=0.25, wpb=7742.2, bsz=120, num_updates=34350, lr=1.37706e-05, gnorm=0.96, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=140752 2023-05-02 17:39:40 - progress_bar.py[line:274] - INFO: epoch 006: 4208 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7798.6, nsentences=120, sample_size=3730.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1932.7, ups=0.25, wpb=7798.6, bsz=120, num_updates=34360, lr=1.37653e-05, gnorm=1.019, clip=80, loss_scale=128, train_wall=40, gb_free=28.6, wall=140792 2023-05-02 17:40:21 - progress_bar.py[line:274] - INFO: epoch 006: 4218 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=8092, nsentences=120, sample_size=3909.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1992.9, ups=0.25, wpb=8092, bsz=120, num_updates=34370, lr=1.376e-05, gnorm=0.969, clip=20, loss_scale=128, train_wall=41, gb_free=29.8, wall=140833 2023-05-02 17:41:00 - progress_bar.py[line:274] - INFO: epoch 006: 4228 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7892.7, nsentences=120, sample_size=4133, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1991.6, ups=0.25, wpb=7892.7, bsz=120, num_updates=34380, lr=1.37547e-05, gnorm=0.955, clip=20, loss_scale=128, train_wall=40, gb_free=28.1, wall=140873 2023-05-02 17:41:40 - progress_bar.py[line:274] - INFO: epoch 006: 4238 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=7617.9, nsentences=120, sample_size=3901.1, 
sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1926.9, ups=0.25, wpb=7617.9, bsz=120, num_updates=34390, lr=1.37494e-05, gnorm=0.962, clip=30, loss_scale=128, train_wall=39, gb_free=29.3, wall=140912 2023-05-02 17:41:44 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 17:42:23 - progress_bar.py[line:274] - INFO: epoch 006: 4249 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7795.3, nsentences=120, sample_size=4123.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1794.7, ups=0.23, wpb=7795.3, bsz=120, num_updates=34400, lr=1.37442e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=43, gb_free=30.6, wall=140956 2023-05-02 17:43:02 - progress_bar.py[line:274] - INFO: epoch 006: 4259 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7578.6, nsentences=120, sample_size=3701.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1929.8, ups=0.25, wpb=7578.6, bsz=120, num_updates=34410, lr=1.37389e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=140995 2023-05-02 17:43:42 - progress_bar.py[line:274] - INFO: epoch 006: 4269 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7621.8, nsentences=120, sample_size=3901.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1948.5, ups=0.26, wpb=7621.8, bsz=120, num_updates=34420, lr=1.37336e-05, gnorm=0.946, clip=30, loss_scale=64, train_wall=39, gb_free=30.9, wall=141034 2023-05-02 17:44:21 - progress_bar.py[line:274] - INFO: epoch 006: 4279 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7788.1, nsentences=120, sample_size=4040, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1963.1, ups=0.25, wpb=7788.1, bsz=120, num_updates=34430, lr=1.37283e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=141074 2023-05-02 17:45:01 - progress_bar.py[line:274] - INFO: epoch 006: 4289 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.19, 
ntokens=7785.5, nsentences=120, sample_size=4197.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1943.3, ups=0.25, wpb=7785.5, bsz=120, num_updates=34440, lr=1.3723e-05, gnorm=0.958, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=141114 2023-05-02 17:45:42 - progress_bar.py[line:274] - INFO: epoch 006: 4299 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7828.6, nsentences=120, sample_size=4389.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1940.3, ups=0.25, wpb=7828.6, bsz=120, num_updates=34450, lr=1.37178e-05, gnorm=0.925, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=141154 2023-05-02 17:46:20 - progress_bar.py[line:274] - INFO: epoch 006: 4309 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7231.1, nsentences=120, sample_size=4022.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1863, ups=0.26, wpb=7231.1, bsz=120, num_updates=34460, lr=1.37125e-05, gnorm=0.966, clip=40, loss_scale=64, train_wall=39, gb_free=29.2, wall=141193 2023-05-02 17:47:01 - progress_bar.py[line:274] - INFO: epoch 006: 4319 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7907, nsentences=120, sample_size=3881.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1974, ups=0.25, wpb=7907, bsz=120, num_updates=34470, lr=1.37072e-05, gnorm=0.952, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=141233 2023-05-02 17:47:40 - progress_bar.py[line:274] - INFO: epoch 006: 4329 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7667.9, nsentences=120, sample_size=4046.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1919.3, ups=0.25, wpb=7667.9, bsz=120, num_updates=34480, lr=1.37019e-05, gnorm=0.981, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=141273 2023-05-02 17:48:20 - progress_bar.py[line:274] - INFO: epoch 006: 4339 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7645.2, nsentences=120, sample_size=3979.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, 
wps=1913.7, ups=0.25, wpb=7645.2, bsz=120, num_updates=34490, lr=1.36966e-05, gnorm=0.989, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=141313 2023-05-02 17:49:00 - progress_bar.py[line:274] - INFO: epoch 006: 4349 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7599.6, nsentences=120, sample_size=3928.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1926.4, ups=0.25, wpb=7599.6, bsz=120, num_updates=34500, lr=1.36913e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=39, gb_free=31.2, wall=141352 2023-05-02 17:49:39 - progress_bar.py[line:274] - INFO: epoch 006: 4359 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7612.8, nsentences=120, sample_size=4260.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1939.1, ups=0.25, wpb=7612.8, bsz=120, num_updates=34510, lr=1.36861e-05, gnorm=0.96, clip=30, loss_scale=64, train_wall=39, gb_free=29.1, wall=141392 2023-05-02 17:50:20 - progress_bar.py[line:274] - INFO: epoch 006: 4369 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7929.2, nsentences=120, sample_size=4032.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1952.2, ups=0.25, wpb=7929.2, bsz=120, num_updates=34520, lr=1.36808e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=41, gb_free=29.9, wall=141432 2023-05-02 17:50:59 - progress_bar.py[line:274] - INFO: epoch 006: 4379 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7597, nsentences=120, sample_size=3909.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1948.7, ups=0.26, wpb=7597, bsz=120, num_updates=34530, lr=1.36755e-05, gnorm=0.964, clip=30, loss_scale=64, train_wall=39, gb_free=29.3, wall=141471 2023-05-02 17:51:39 - progress_bar.py[line:274] - INFO: epoch 006: 4389 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7715.5, nsentences=120, sample_size=3959.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1939.8, ups=0.25, wpb=7715.5, bsz=120, num_updates=34540, lr=1.36702e-05, gnorm=0.97, 
clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=141511 2023-05-02 17:52:18 - progress_bar.py[line:274] - INFO: epoch 006: 4399 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=8011, nsentences=120, sample_size=4338.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2010.9, ups=0.25, wpb=8011, bsz=120, num_updates=34550, lr=1.36649e-05, gnorm=0.93, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=141551 2023-05-02 17:52:59 - progress_bar.py[line:274] - INFO: epoch 006: 4409 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7908.4, nsentences=120, sample_size=4338.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1963.3, ups=0.25, wpb=7908.4, bsz=120, num_updates=34560, lr=1.36597e-05, gnorm=0.934, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=141591 2023-05-02 17:53:38 - progress_bar.py[line:274] - INFO: epoch 006: 4419 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7870.2, nsentences=120, sample_size=4190.6, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=2005.6, ups=0.25, wpb=7870.2, bsz=120, num_updates=34570, lr=1.36544e-05, gnorm=0.958, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=141630 2023-05-02 17:54:18 - progress_bar.py[line:274] - INFO: epoch 006: 4429 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7866.9, nsentences=120, sample_size=4019, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1952.8, ups=0.25, wpb=7866.9, bsz=120, num_updates=34580, lr=1.36491e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=141671 2023-05-02 17:54:59 - progress_bar.py[line:274] - INFO: epoch 006: 4439 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7843.9, nsentences=120, sample_size=4361.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1934.7, ups=0.25, wpb=7843.9, bsz=120, num_updates=34590, lr=1.36438e-05, gnorm=0.918, clip=10, loss_scale=64, train_wall=40, gb_free=26, wall=141711 2023-05-02 17:55:39 - 
progress_bar.py[line:274] - INFO: epoch 006: 4449 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.18, ntokens=7626.8, nsentences=120, sample_size=4014.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1911.8, ups=0.25, wpb=7626.8, bsz=120, num_updates=34600, lr=1.36385e-05, gnorm=0.996, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=141751 2023-05-02 17:56:18 - progress_bar.py[line:274] - INFO: epoch 006: 4459 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7627.2, nsentences=120, sample_size=3734.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1957.3, ups=0.26, wpb=7627.2, bsz=120, num_updates=34610, lr=1.36332e-05, gnorm=1.01, clip=50, loss_scale=64, train_wall=39, gb_free=30.5, wall=141790 2023-05-02 17:56:57 - progress_bar.py[line:274] - INFO: epoch 006: 4469 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7947.3, nsentences=120, sample_size=4070.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2009.1, ups=0.25, wpb=7947.3, bsz=120, num_updates=34620, lr=1.3628e-05, gnorm=0.953, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=141830 2023-05-02 17:57:36 - progress_bar.py[line:274] - INFO: epoch 006: 4479 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7931.4, nsentences=120, sample_size=3707.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2022.5, ups=0.26, wpb=7931.4, bsz=120, num_updates=34630, lr=1.36227e-05, gnorm=0.985, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=141869 2023-05-02 17:58:16 - progress_bar.py[line:274] - INFO: epoch 006: 4489 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=8016.6, nsentences=120, sample_size=3872.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2040.2, ups=0.25, wpb=8016.6, bsz=120, num_updates=34640, lr=1.36174e-05, gnorm=0.969, clip=20, loss_scale=64, train_wall=39, gb_free=29, wall=141908 2023-05-02 17:58:56 - progress_bar.py[line:274] - INFO: epoch 006: 4499 / 6042 loss=2.419, loss_v1=0, loss_v2=0, 
nll_loss=1.169, ntokens=7657.9, nsentences=120, sample_size=3860, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1905.4, ups=0.25, wpb=7657.9, bsz=120, num_updates=34650, lr=1.36121e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=40, gb_free=30.7, wall=141948 2023-05-02 17:59:36 - progress_bar.py[line:274] - INFO: epoch 006: 4509 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7619.7, nsentences=120, sample_size=4054.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1878.8, ups=0.25, wpb=7619.7, bsz=120, num_updates=34660, lr=1.36068e-05, gnorm=0.974, clip=50, loss_scale=64, train_wall=40, gb_free=31, wall=141989 2023-05-02 18:00:17 - progress_bar.py[line:274] - INFO: epoch 006: 4519 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7826.8, nsentences=120, sample_size=3991, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1949.7, ups=0.25, wpb=7826.8, bsz=120, num_updates=34670, lr=1.36015e-05, gnorm=0.986, clip=50, loss_scale=64, train_wall=40, gb_free=30.8, wall=142029 2023-05-02 18:00:56 - progress_bar.py[line:274] - INFO: epoch 006: 4529 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7676.5, nsentences=120, sample_size=4162.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1926.9, ups=0.25, wpb=7676.5, bsz=120, num_updates=34680, lr=1.35963e-05, gnorm=0.947, clip=40, loss_scale=64, train_wall=40, gb_free=30.4, wall=142069 2023-05-02 18:01:36 - progress_bar.py[line:274] - INFO: epoch 006: 4539 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7626, nsentences=120, sample_size=4386.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1940.1, ups=0.25, wpb=7626, bsz=120, num_updates=34690, lr=1.3591e-05, gnorm=0.905, clip=10, loss_scale=64, train_wall=39, gb_free=28.1, wall=142108 2023-05-02 18:02:15 - progress_bar.py[line:274] - INFO: epoch 006: 4549 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7820.9, nsentences=120, sample_size=4228.4, sample_size_v1=0, 
sample_size_v2=0, ppl=2.24, wps=2014.6, ups=0.26, wpb=7820.9, bsz=120, num_updates=34700, lr=1.35857e-05, gnorm=0.966, clip=30, loss_scale=64, train_wall=39, gb_free=29.5, wall=142147 2023-05-02 18:02:55 - progress_bar.py[line:274] - INFO: epoch 006: 4559 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7749.6, nsentences=120, sample_size=3972.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1912.9, ups=0.25, wpb=7749.6, bsz=120, num_updates=34710, lr=1.35804e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=27.2, wall=142187 2023-05-02 18:03:35 - progress_bar.py[line:274] - INFO: epoch 006: 4569 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7753.1, nsentences=120, sample_size=3934.3, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1919, ups=0.25, wpb=7753.1, bsz=120, num_updates=34720, lr=1.35751e-05, gnorm=0.989, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=142228 2023-05-02 18:04:16 - progress_bar.py[line:274] - INFO: epoch 006: 4579 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=8063.6, nsentences=120, sample_size=3941.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1997.7, ups=0.25, wpb=8063.6, bsz=120, num_updates=34730, lr=1.35699e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=142268 2023-05-02 18:04:55 - progress_bar.py[line:274] - INFO: epoch 006: 4589 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7714.8, nsentences=120, sample_size=4235.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1948.3, ups=0.25, wpb=7714.8, bsz=120, num_updates=34740, lr=1.35646e-05, gnorm=0.953, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=142308 2023-05-02 18:05:35 - progress_bar.py[line:274] - INFO: epoch 006: 4599 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=8030.1, nsentences=120, sample_size=4075.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2034, ups=0.25, wpb=8030.1, bsz=120, num_updates=34750, 
lr=1.35593e-05, gnorm=0.959, clip=20, loss_scale=64, train_wall=39, gb_free=29.7, wall=142347 2023-05-02 18:06:15 - progress_bar.py[line:274] - INFO: epoch 006: 4609 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7986.9, nsentences=120, sample_size=4074.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2014.9, ups=0.25, wpb=7986.9, bsz=120, num_updates=34760, lr=1.3554e-05, gnorm=0.948, clip=30, loss_scale=64, train_wall=40, gb_free=29.1, wall=142387 2023-05-02 18:06:54 - progress_bar.py[line:274] - INFO: epoch 006: 4619 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7826.3, nsentences=120, sample_size=4026.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1978.2, ups=0.25, wpb=7826.3, bsz=120, num_updates=34770, lr=1.35487e-05, gnorm=0.966, clip=40, loss_scale=64, train_wall=39, gb_free=27.2, wall=142427 2023-05-02 18:07:34 - progress_bar.py[line:274] - INFO: epoch 006: 4629 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7662.8, nsentences=120, sample_size=3804.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1920.2, ups=0.25, wpb=7662.8, bsz=120, num_updates=34780, lr=1.35434e-05, gnorm=0.979, clip=50, loss_scale=64, train_wall=40, gb_free=29.1, wall=142466 2023-05-02 18:08:14 - progress_bar.py[line:274] - INFO: epoch 006: 4639 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7965.9, nsentences=120, sample_size=4103.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1993.7, ups=0.25, wpb=7965.9, bsz=120, num_updates=34790, lr=1.35382e-05, gnorm=0.955, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=142506 2023-05-02 18:08:54 - progress_bar.py[line:274] - INFO: epoch 006: 4649 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7881.4, nsentences=120, sample_size=4021.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1986, ups=0.25, wpb=7881.4, bsz=120, num_updates=34800, lr=1.35329e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=31, 
wall=142546 2023-05-02 18:09:34 - progress_bar.py[line:274] - INFO: epoch 006: 4659 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7685.8, nsentences=120, sample_size=3930.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1922.5, ups=0.25, wpb=7685.8, bsz=120, num_updates=34810, lr=1.35276e-05, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=142586 2023-05-02 18:10:14 - progress_bar.py[line:274] - INFO: epoch 006: 4669 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7572.8, nsentences=120, sample_size=3725.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1897.4, ups=0.25, wpb=7572.8, bsz=120, num_updates=34820, lr=1.35223e-05, gnorm=1.027, clip=70, loss_scale=64, train_wall=40, gb_free=30, wall=142626 2023-05-02 18:10:54 - progress_bar.py[line:274] - INFO: epoch 006: 4679 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7842, nsentences=120, sample_size=4061.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1957.2, ups=0.25, wpb=7842, bsz=120, num_updates=34830, lr=1.3517e-05, gnorm=0.94, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=142666 2023-05-02 18:11:34 - progress_bar.py[line:274] - INFO: epoch 006: 4689 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7750.4, nsentences=120, sample_size=4320.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1926.4, ups=0.25, wpb=7750.4, bsz=120, num_updates=34840, lr=1.35118e-05, gnorm=0.91, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=142706 2023-05-02 18:12:14 - progress_bar.py[line:274] - INFO: epoch 006: 4699 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7567.2, nsentences=120, sample_size=3941, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1892.7, ups=0.25, wpb=7567.2, bsz=120, num_updates=34850, lr=1.35065e-05, gnorm=0.986, clip=40, loss_scale=64, train_wall=40, gb_free=28.8, wall=142746 2023-05-02 18:12:54 - progress_bar.py[line:274] - INFO: epoch 006: 4709 / 6042 
loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7941.2, nsentences=120, sample_size=3640.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1979, ups=0.25, wpb=7941.2, bsz=120, num_updates=34860, lr=1.35012e-05, gnorm=0.979, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=142786 2023-05-02 18:13:33 - progress_bar.py[line:274] - INFO: epoch 006: 4719 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7645.8, nsentences=120, sample_size=3864.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1945.3, ups=0.25, wpb=7645.8, bsz=120, num_updates=34870, lr=1.34959e-05, gnorm=1.018, clip=60, loss_scale=64, train_wall=39, gb_free=29.9, wall=142826 2023-05-02 18:14:13 - progress_bar.py[line:274] - INFO: epoch 006: 4729 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7846.8, nsentences=120, sample_size=4068, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1988.2, ups=0.25, wpb=7846.8, bsz=120, num_updates=34880, lr=1.34906e-05, gnorm=0.967, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=142865 2023-05-02 18:14:52 - progress_bar.py[line:274] - INFO: epoch 006: 4739 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7789.8, nsentences=120, sample_size=4089.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1962.3, ups=0.25, wpb=7789.8, bsz=120, num_updates=34890, lr=1.34853e-05, gnorm=0.985, clip=50, loss_scale=64, train_wall=40, gb_free=29.6, wall=142905 2023-05-02 18:15:32 - progress_bar.py[line:274] - INFO: epoch 006: 4749 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7578.6, nsentences=120, sample_size=4144.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1901.6, ups=0.25, wpb=7578.6, bsz=120, num_updates=34900, lr=1.34801e-05, gnorm=0.995, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=142945 2023-05-02 18:16:13 - progress_bar.py[line:274] - INFO: epoch 006: 4759 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7918.9, nsentences=120, 
sample_size=4229.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1965.2, ups=0.25, wpb=7918.9, bsz=120, num_updates=34910, lr=1.34748e-05, gnorm=0.96, clip=30, loss_scale=128, train_wall=40, gb_free=29.8, wall=142985 2023-05-02 18:16:52 - progress_bar.py[line:274] - INFO: epoch 006: 4769 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7500.6, nsentences=120, sample_size=4106.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1890.7, ups=0.25, wpb=7500.6, bsz=120, num_updates=34920, lr=1.34695e-05, gnorm=0.96, clip=30, loss_scale=128, train_wall=40, gb_free=28.8, wall=143025 2023-05-02 18:16:56 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 18:17:37 - progress_bar.py[line:274] - INFO: epoch 006: 4780 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7861.4, nsentences=120, sample_size=3922.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1765.3, ups=0.22, wpb=7861.4, bsz=120, num_updates=34930, lr=1.34642e-05, gnorm=0.967, clip=20, loss_scale=64, train_wall=44, gb_free=29.8, wall=143069 2023-05-02 18:18:16 - progress_bar.py[line:274] - INFO: epoch 006: 4790 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7435.1, nsentences=120, sample_size=4129.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1877.9, ups=0.25, wpb=7435.1, bsz=120, num_updates=34940, lr=1.34589e-05, gnorm=0.932, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=143109 2023-05-02 18:18:56 - progress_bar.py[line:274] - INFO: epoch 006: 4800 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7797.2, nsentences=120, sample_size=4001.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1970, ups=0.25, wpb=7797.2, bsz=120, num_updates=34950, lr=1.34536e-05, gnorm=0.982, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=143148 2023-05-02 18:19:36 - progress_bar.py[line:274] - INFO: epoch 006: 4810 / 6042 loss=2.431, loss_v1=0, loss_v2=0, 
nll_loss=1.18, ntokens=7552.6, nsentences=120, sample_size=4181.2, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1870, ups=0.25, wpb=7552.6, bsz=120, num_updates=34960, lr=1.34484e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=143189 2023-05-02 18:20:16 - progress_bar.py[line:274] - INFO: epoch 006: 4820 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.183, ntokens=7664, nsentences=120, sample_size=3997.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1954.1, ups=0.25, wpb=7664, bsz=120, num_updates=34970, lr=1.34431e-05, gnorm=0.977, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=143228 2023-05-02 18:20:55 - progress_bar.py[line:274] - INFO: epoch 006: 4830 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7771.9, nsentences=120, sample_size=4239.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1967.9, ups=0.25, wpb=7771.9, bsz=120, num_updates=34980, lr=1.34378e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=143268 2023-05-02 18:21:35 - progress_bar.py[line:274] - INFO: epoch 006: 4840 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7883.2, nsentences=120, sample_size=4126.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1950, ups=0.25, wpb=7883.2, bsz=120, num_updates=34990, lr=1.34325e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=143308 2023-05-02 18:22:15 - progress_bar.py[line:274] - INFO: epoch 006: 4850 / 6042 loss=2.453, loss_v1=0, loss_v2=0, nll_loss=1.203, ntokens=8026.8, nsentences=120, sample_size=4038, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2029.1, ups=0.25, wpb=8026.8, bsz=120, num_updates=35000, lr=1.34272e-05, gnorm=0.977, clip=50, loss_scale=64, train_wall=39, gb_free=30.8, wall=143348 2023-05-02 18:22:15 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 18:22:18 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-02 18:22:18 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 18:22:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:30 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-02 18:22:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:34 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 18:22:34 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 18:22:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:43 - task_invig.py[line:181] - INFO: example hyp(gen): 
region: 2023-05-02 18:22:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:46 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 18:22:46 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 18:22:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:54 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-02 18:22:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:22:58 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 18:22:58 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 18:22:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:22:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:23:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:23:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:23:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:23:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:23:02 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 18:23:02 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 18:23:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:23:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:23:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:23:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:23:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:23:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:23:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:23:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:23:07 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 18:23:07 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 18:23:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 18:23:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 18:23:07 - progress_bar.py[line:282] - INFO: epoch 006 | valid on 'valid' subset | loss 3.213 | loss_v1 0 | loss_v2 0 | nll_loss 2.047 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.13 | score 0.7534 | wps 3309.1 | wpb 3202.1 | bsz 39.4 | num_updates 35000 | best_score 0.7627 2023-05-02 18:23:07 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 6 @ 35000 updates 2023-05-02 18:23:07 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_35000.pt 2023-05-02 18:23:32 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_35000.pt 2023-05-02 18:23:46 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_35000.pt (epoch 6 @ 35000 updates, 
score 0.7534) (writing took 39.22962540201843 seconds) 2023-05-02 18:24:25 - progress_bar.py[line:274] - INFO: epoch 006: 4860 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7360.3, nsentences=120, sample_size=3810.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=565.3, ups=0.08, wpb=7360.3, bsz=120, num_updates=35010, lr=1.3422e-05, gnorm=1.002, clip=50, loss_scale=64, train_wall=39, gb_free=29.7, wall=143478 2023-05-02 18:25:05 - progress_bar.py[line:274] - INFO: epoch 006: 4870 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7778.3, nsentences=120, sample_size=4163.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1937.4, ups=0.25, wpb=7778.3, bsz=120, num_updates=35020, lr=1.34167e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=143518 2023-05-02 18:25:45 - progress_bar.py[line:274] - INFO: epoch 006: 4880 / 6042 loss=2.439, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7690, nsentences=120, sample_size=3960.8, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1930.1, ups=0.25, wpb=7690, bsz=120, num_updates=35030, lr=1.34114e-05, gnorm=1.005, clip=70, loss_scale=64, train_wall=40, gb_free=29.7, wall=143558 2023-05-02 18:26:24 - progress_bar.py[line:274] - INFO: epoch 006: 4890 / 6042 loss=2.45, loss_v1=0, loss_v2=0, nll_loss=1.204, ntokens=7816.2, nsentences=120, sample_size=3568.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1997, ups=0.26, wpb=7816.2, bsz=120, num_updates=35040, lr=1.34061e-05, gnorm=1.028, clip=60, loss_scale=64, train_wall=39, gb_free=31.1, wall=143597 2023-05-02 18:27:05 - progress_bar.py[line:274] - INFO: epoch 006: 4900 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7681.4, nsentences=120, sample_size=4018, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1884.7, ups=0.25, wpb=7681.4, bsz=120, num_updates=35050, lr=1.34008e-05, gnorm=0.975, clip=20, loss_scale=64, train_wall=41, gb_free=29.4, wall=143638 2023-05-02 18:27:46 - 
progress_bar.py[line:274] - INFO: epoch 006: 4910 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7651.4, nsentences=120, sample_size=4193.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1890.6, ups=0.25, wpb=7651.4, bsz=120, num_updates=35060, lr=1.33955e-05, gnorm=0.935, clip=0, loss_scale=64, train_wall=40, gb_free=30.6, wall=143678 2023-05-02 18:28:25 - progress_bar.py[line:274] - INFO: epoch 006: 4920 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7836.2, nsentences=120, sample_size=4247.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1969.8, ups=0.25, wpb=7836.2, bsz=120, num_updates=35070, lr=1.33903e-05, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=25.8, wall=143718 2023-05-02 18:29:06 - progress_bar.py[line:274] - INFO: epoch 006: 4930 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.189, ntokens=7983.9, nsentences=120, sample_size=4028.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1982.7, ups=0.25, wpb=7983.9, bsz=120, num_updates=35080, lr=1.3385e-05, gnorm=0.979, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=143758 2023-05-02 18:29:46 - progress_bar.py[line:274] - INFO: epoch 006: 4940 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7835.4, nsentences=120, sample_size=4009.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1947.1, ups=0.25, wpb=7835.4, bsz=120, num_updates=35090, lr=1.33797e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=143798 2023-05-02 18:30:25 - progress_bar.py[line:274] - INFO: epoch 006: 4950 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7711.4, nsentences=120, sample_size=4416.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1947.8, ups=0.25, wpb=7711.4, bsz=120, num_updates=35100, lr=1.33744e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=143838 2023-05-02 18:31:05 - progress_bar.py[line:274] - INFO: epoch 006: 4960 / 6042 loss=2.407, loss_v1=0, 
loss_v2=0, nll_loss=1.159, ntokens=7768.5, nsentences=120, sample_size=3852.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1963.9, ups=0.25, wpb=7768.5, bsz=120, num_updates=35110, lr=1.33691e-05, gnorm=0.996, clip=40, loss_scale=64, train_wall=39, gb_free=29.6, wall=143878 2023-05-02 18:31:45 - progress_bar.py[line:274] - INFO: epoch 006: 4970 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7814.7, nsentences=120, sample_size=3844.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1979.6, ups=0.25, wpb=7814.7, bsz=120, num_updates=35120, lr=1.33639e-05, gnorm=1.007, clip=60, loss_scale=64, train_wall=39, gb_free=29.2, wall=143917 2023-05-02 18:32:25 - progress_bar.py[line:274] - INFO: epoch 006: 4980 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7632.2, nsentences=120, sample_size=4041.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1895.6, ups=0.25, wpb=7632.2, bsz=120, num_updates=35130, lr=1.33586e-05, gnorm=0.958, clip=20, loss_scale=64, train_wall=40, gb_free=31.6, wall=143957 2023-05-02 18:33:04 - progress_bar.py[line:274] - INFO: epoch 006: 4990 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7764.8, nsentences=120, sample_size=3886.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1961.2, ups=0.25, wpb=7764.8, bsz=120, num_updates=35140, lr=1.33533e-05, gnorm=0.996, clip=50, loss_scale=64, train_wall=40, gb_free=30.4, wall=143997 2023-05-02 18:33:44 - progress_bar.py[line:274] - INFO: epoch 006: 5000 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7689.8, nsentences=120, sample_size=4102.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1927.7, ups=0.25, wpb=7689.8, bsz=120, num_updates=35150, lr=1.3348e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=144037 2023-05-02 18:34:25 - progress_bar.py[line:274] - INFO: epoch 006: 5010 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7654.8, nsentences=120, sample_size=4401.4, 
sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1876.6, ups=0.25, wpb=7654.8, bsz=120, num_updates=35160, lr=1.33427e-05, gnorm=0.917, clip=20, loss_scale=64, train_wall=41, gb_free=29, wall=144078 2023-05-02 18:35:05 - progress_bar.py[line:274] - INFO: epoch 006: 5020 / 6042 loss=2.444, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=7706.7, nsentences=120, sample_size=4133.1, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1922.4, ups=0.25, wpb=7706.7, bsz=120, num_updates=35170, lr=1.33374e-05, gnorm=0.986, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=144118 2023-05-02 18:35:46 - progress_bar.py[line:274] - INFO: epoch 006: 5030 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7787.3, nsentences=120, sample_size=4014.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1928.4, ups=0.25, wpb=7787.3, bsz=120, num_updates=35180, lr=1.33322e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=144158 2023-05-02 18:36:25 - progress_bar.py[line:274] - INFO: epoch 006: 5040 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=8099.5, nsentences=120, sample_size=4229.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2041.5, ups=0.25, wpb=8099.5, bsz=120, num_updates=35190, lr=1.33269e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=144198 2023-05-02 18:37:05 - progress_bar.py[line:274] - INFO: epoch 006: 5050 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7586.2, nsentences=120, sample_size=3606.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1905.3, ups=0.25, wpb=7586.2, bsz=120, num_updates=35200, lr=1.33216e-05, gnorm=1.013, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=144238 2023-05-02 18:37:46 - progress_bar.py[line:274] - INFO: epoch 006: 5060 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7365.1, nsentences=120, sample_size=3816.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1814.6, ups=0.25, wpb=7365.1, bsz=120, 
num_updates=35210, lr=1.33163e-05, gnorm=0.998, clip=60, loss_scale=64, train_wall=41, gb_free=29.3, wall=144278 2023-05-02 18:38:24 - progress_bar.py[line:274] - INFO: epoch 006: 5070 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7626.3, nsentences=120, sample_size=4121, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1985.7, ups=0.26, wpb=7626.3, bsz=120, num_updates=35220, lr=1.3311e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=38, gb_free=29.4, wall=144317 2023-05-02 18:39:05 - progress_bar.py[line:274] - INFO: epoch 006: 5080 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7805.2, nsentences=120, sample_size=4034.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1925.7, ups=0.25, wpb=7805.2, bsz=120, num_updates=35230, lr=1.33057e-05, gnorm=0.95, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=144357 2023-05-02 18:39:44 - progress_bar.py[line:274] - INFO: epoch 006: 5090 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7628.2, nsentences=120, sample_size=4037.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1923.6, ups=0.25, wpb=7628.2, bsz=120, num_updates=35240, lr=1.33005e-05, gnorm=0.982, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=144397 2023-05-02 18:40:24 - progress_bar.py[line:274] - INFO: epoch 006: 5100 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7568.8, nsentences=120, sample_size=4050.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1902.5, ups=0.25, wpb=7568.8, bsz=120, num_updates=35250, lr=1.32952e-05, gnorm=0.974, clip=20, loss_scale=64, train_wall=40, gb_free=29.2, wall=144436 2023-05-02 18:41:04 - progress_bar.py[line:274] - INFO: epoch 006: 5110 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.191, ntokens=7692.8, nsentences=120, sample_size=3839.3, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1915.8, ups=0.25, wpb=7692.8, bsz=120, num_updates=35260, lr=1.32899e-05, gnorm=0.963, clip=40, loss_scale=64, train_wall=40, 
gb_free=28.3, wall=144477 2023-05-02 18:41:43 - progress_bar.py[line:274] - INFO: epoch 006: 5120 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7891.7, nsentences=120, sample_size=4002.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2012.1, ups=0.25, wpb=7891.7, bsz=120, num_updates=35270, lr=1.32846e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=144516 2023-05-02 18:42:23 - progress_bar.py[line:274] - INFO: epoch 006: 5130 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7656, nsentences=120, sample_size=4008.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1920.4, ups=0.25, wpb=7656, bsz=120, num_updates=35280, lr=1.32793e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=144556 2023-05-02 18:43:03 - progress_bar.py[line:274] - INFO: epoch 006: 5140 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7700.6, nsentences=120, sample_size=3998.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1914.5, ups=0.25, wpb=7700.6, bsz=120, num_updates=35290, lr=1.32741e-05, gnorm=0.946, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=144596 2023-05-02 18:43:44 - progress_bar.py[line:274] - INFO: epoch 006: 5150 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7896.6, nsentences=120, sample_size=4148.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1955.4, ups=0.25, wpb=7896.6, bsz=120, num_updates=35300, lr=1.32688e-05, gnorm=0.955, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=144636 2023-05-02 18:44:24 - progress_bar.py[line:274] - INFO: epoch 006: 5160 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7741.4, nsentences=120, sample_size=4198.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1950.9, ups=0.25, wpb=7741.4, bsz=120, num_updates=35310, lr=1.32635e-05, gnorm=0.987, clip=30, loss_scale=64, train_wall=40, gb_free=28.8, wall=144676 2023-05-02 18:45:03 - progress_bar.py[line:274] - INFO: epoch 006: 
5170 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7431.8, nsentences=120, sample_size=4258.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1864, ups=0.25, wpb=7431.8, bsz=120, num_updates=35320, lr=1.32582e-05, gnorm=0.926, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=144716 2023-05-02 18:45:44 - progress_bar.py[line:274] - INFO: epoch 006: 5180 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7896.9, nsentences=120, sample_size=3962.9, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1965.1, ups=0.25, wpb=7896.9, bsz=120, num_updates=35330, lr=1.32529e-05, gnorm=0.986, clip=50, loss_scale=64, train_wall=40, gb_free=31.2, wall=144756 2023-05-02 18:46:24 - progress_bar.py[line:274] - INFO: epoch 006: 5190 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7989.9, nsentences=120, sample_size=4148.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1955, ups=0.24, wpb=7989.9, bsz=120, num_updates=35340, lr=1.32476e-05, gnorm=0.94, clip=30, loss_scale=64, train_wall=41, gb_free=30.1, wall=144797 2023-05-02 18:47:03 - progress_bar.py[line:274] - INFO: epoch 006: 5200 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7826.4, nsentences=120, sample_size=3785, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2026.5, ups=0.26, wpb=7826.4, bsz=120, num_updates=35350, lr=1.32424e-05, gnorm=1.023, clip=60, loss_scale=64, train_wall=39, gb_free=29.6, wall=144836 2023-05-02 18:47:42 - progress_bar.py[line:274] - INFO: epoch 006: 5210 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7758.9, nsentences=120, sample_size=4206.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1972.1, ups=0.25, wpb=7758.9, bsz=120, num_updates=35360, lr=1.32371e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=39, gb_free=27.4, wall=144875 2023-05-02 18:48:23 - progress_bar.py[line:274] - INFO: epoch 006: 5220 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7872.2, 
nsentences=120, sample_size=3969.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1954.5, ups=0.25, wpb=7872.2, bsz=120, num_updates=35370, lr=1.32318e-05, gnorm=0.981, clip=40, loss_scale=64, train_wall=40, gb_free=28.8, wall=144915 2023-05-02 18:49:03 - progress_bar.py[line:274] - INFO: epoch 006: 5230 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7559, nsentences=120, sample_size=4030.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1898, ups=0.25, wpb=7559, bsz=120, num_updates=35380, lr=1.32265e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=144955 2023-05-02 18:49:42 - progress_bar.py[line:274] - INFO: epoch 006: 5240 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7698.2, nsentences=120, sample_size=4233.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1931.1, ups=0.25, wpb=7698.2, bsz=120, num_updates=35390, lr=1.32212e-05, gnorm=0.923, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=144995 2023-05-02 18:50:23 - progress_bar.py[line:274] - INFO: epoch 006: 5250 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=8073.6, nsentences=120, sample_size=3928.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1996.4, ups=0.25, wpb=8073.6, bsz=120, num_updates=35400, lr=1.3216e-05, gnorm=0.993, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=145035 2023-05-02 18:51:03 - progress_bar.py[line:274] - INFO: epoch 006: 5260 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7983.2, nsentences=120, sample_size=3989.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1973.6, ups=0.25, wpb=7983.2, bsz=120, num_updates=35410, lr=1.32107e-05, gnorm=0.971, clip=40, loss_scale=64, train_wall=40, gb_free=28.3, wall=145076 2023-05-02 18:51:42 - progress_bar.py[line:274] - INFO: epoch 006: 5270 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7736.4, nsentences=120, sample_size=4242.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1975.6, 
ups=0.26, wpb=7736.4, bsz=120, num_updates=35420, lr=1.32054e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=39, gb_free=29.5, wall=145115 2023-05-02 18:52:22 - progress_bar.py[line:274] - INFO: epoch 006: 5280 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7774.8, nsentences=120, sample_size=4002.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1961.2, ups=0.25, wpb=7774.8, bsz=120, num_updates=35430, lr=1.32001e-05, gnorm=0.957, clip=10, loss_scale=64, train_wall=40, gb_free=30, wall=145155 2023-05-02 18:53:01 - progress_bar.py[line:274] - INFO: epoch 006: 5290 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7550.4, nsentences=120, sample_size=3874.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1939, ups=0.26, wpb=7550.4, bsz=120, num_updates=35440, lr=1.31948e-05, gnorm=1.001, clip=40, loss_scale=128, train_wall=39, gb_free=29.2, wall=145194 2023-05-02 18:53:21 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 18:53:45 - progress_bar.py[line:274] - INFO: epoch 006: 5301 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7761.8, nsentences=120, sample_size=3732.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1776.3, ups=0.23, wpb=7761.8, bsz=120, num_updates=35450, lr=1.31895e-05, gnorm=1.004, clip=50, loss_scale=64, train_wall=44, gb_free=30, wall=145237 2023-05-02 18:54:24 - progress_bar.py[line:274] - INFO: epoch 006: 5311 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7422.8, nsentences=120, sample_size=4428.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1879.9, ups=0.25, wpb=7422.8, bsz=120, num_updates=35460, lr=1.31843e-05, gnorm=0.939, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=145277 2023-05-02 18:55:04 - progress_bar.py[line:274] - INFO: epoch 006: 5321 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7927.7, nsentences=120, sample_size=3993, sample_size_v1=0, 
sample_size_v2=0, ppl=2.22, wps=2012.5, ups=0.25, wpb=7927.7, bsz=120, num_updates=35470, lr=1.3179e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=39, gb_free=31.1, wall=145316 2023-05-02 18:55:44 - progress_bar.py[line:274] - INFO: epoch 006: 5331 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7897.1, nsentences=120, sample_size=4437.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1939.9, ups=0.25, wpb=7897.1, bsz=120, num_updates=35480, lr=1.31737e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=41, gb_free=28.8, wall=145357 2023-05-02 18:56:24 - progress_bar.py[line:274] - INFO: epoch 006: 5341 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7741.6, nsentences=120, sample_size=3976.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1930.2, ups=0.25, wpb=7741.6, bsz=120, num_updates=35490, lr=1.31684e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=145397 2023-05-02 18:57:05 - progress_bar.py[line:274] - INFO: epoch 006: 5351 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7535.8, nsentences=120, sample_size=4151.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1880.3, ups=0.25, wpb=7535.8, bsz=120, num_updates=35500, lr=1.31631e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=145437 2023-05-02 18:57:44 - progress_bar.py[line:274] - INFO: epoch 006: 5361 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7585.9, nsentences=120, sample_size=3895.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1902.4, ups=0.25, wpb=7585.9, bsz=120, num_updates=35510, lr=1.31578e-05, gnorm=0.991, clip=50, loss_scale=64, train_wall=40, gb_free=26.8, wall=145477 2023-05-02 18:58:24 - progress_bar.py[line:274] - INFO: epoch 006: 5371 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7699.6, nsentences=120, sample_size=4096, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1935.8, ups=0.25, wpb=7699.6, bsz=120, num_updates=35520, 
lr=1.31526e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=145517 2023-05-02 18:59:04 - progress_bar.py[line:274] - INFO: epoch 006: 5381 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7868.6, nsentences=120, sample_size=3989, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1973.2, ups=0.25, wpb=7868.6, bsz=120, num_updates=35530, lr=1.31473e-05, gnorm=0.977, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=145557 2023-05-02 18:59:44 - progress_bar.py[line:274] - INFO: epoch 006: 5391 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7417.2, nsentences=120, sample_size=4203.6, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1864.5, ups=0.25, wpb=7417.2, bsz=120, num_updates=35540, lr=1.3142e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=145596 2023-05-02 19:00:23 - progress_bar.py[line:274] - INFO: epoch 006: 5401 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7551.4, nsentences=120, sample_size=4002.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1912.8, ups=0.25, wpb=7551.4, bsz=120, num_updates=35550, lr=1.31367e-05, gnorm=0.951, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=145636 2023-05-02 19:01:03 - progress_bar.py[line:274] - INFO: epoch 006: 5411 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7930.6, nsentences=120, sample_size=3983.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2003, ups=0.25, wpb=7930.6, bsz=120, num_updates=35560, lr=1.31314e-05, gnorm=0.939, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=145675 2023-05-02 19:01:43 - progress_bar.py[line:274] - INFO: epoch 006: 5421 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7636.2, nsentences=120, sample_size=4119.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1891.9, ups=0.25, wpb=7636.2, bsz=120, num_updates=35570, lr=1.31262e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=27.6, 
wall=145716 2023-05-02 19:02:24 - progress_bar.py[line:274] - INFO: epoch 006: 5431 / 6042 loss=2.443, loss_v1=0, loss_v2=0, nll_loss=1.2, ntokens=7976.5, nsentences=120, sample_size=4216.3, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1981.2, ups=0.25, wpb=7976.5, bsz=120, num_updates=35580, lr=1.31209e-05, gnorm=0.95, clip=10, loss_scale=64, train_wall=40, gb_free=27.1, wall=145756 2023-05-02 19:03:03 - progress_bar.py[line:274] - INFO: epoch 006: 5441 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7555.6, nsentences=120, sample_size=4427.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1904.7, ups=0.25, wpb=7555.6, bsz=120, num_updates=35590, lr=1.31156e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=40, gb_free=25.5, wall=145796 2023-05-02 19:03:43 - progress_bar.py[line:274] - INFO: epoch 006: 5451 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7724, nsentences=120, sample_size=4066.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1919.8, ups=0.25, wpb=7724, bsz=120, num_updates=35600, lr=1.31103e-05, gnorm=0.96, clip=20, loss_scale=64, train_wall=40, gb_free=29, wall=145836 2023-05-02 19:04:24 - progress_bar.py[line:274] - INFO: epoch 006: 5461 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7786.1, nsentences=120, sample_size=4324.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1936.9, ups=0.25, wpb=7786.1, bsz=120, num_updates=35610, lr=1.3105e-05, gnorm=0.948, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=145876 2023-05-02 19:05:03 - progress_bar.py[line:274] - INFO: epoch 006: 5471 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7647.7, nsentences=120, sample_size=3788.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1946.4, ups=0.25, wpb=7647.7, bsz=120, num_updates=35620, lr=1.30997e-05, gnorm=1.03, clip=50, loss_scale=64, train_wall=39, gb_free=29.3, wall=145915 2023-05-02 19:05:42 - progress_bar.py[line:274] - INFO: epoch 006: 5481 / 6042 
loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7942.8, nsentences=120, sample_size=3954.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2033.7, ups=0.26, wpb=7942.8, bsz=120, num_updates=35630, lr=1.30945e-05, gnorm=0.992, clip=40, loss_scale=64, train_wall=39, gb_free=29, wall=145954 2023-05-02 19:06:22 - progress_bar.py[line:274] - INFO: epoch 006: 5491 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7826.1, nsentences=120, sample_size=4057.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1949.8, ups=0.25, wpb=7826.1, bsz=120, num_updates=35640, lr=1.30892e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=145995 2023-05-02 19:07:02 - progress_bar.py[line:274] - INFO: epoch 006: 5501 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7854.2, nsentences=120, sample_size=3884, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1974.4, ups=0.25, wpb=7854.2, bsz=120, num_updates=35650, lr=1.30839e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=26.4, wall=146034 2023-05-02 19:07:41 - progress_bar.py[line:274] - INFO: epoch 006: 5511 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7699.7, nsentences=120, sample_size=4207.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1958.2, ups=0.25, wpb=7699.7, bsz=120, num_updates=35660, lr=1.30786e-05, gnorm=0.956, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=146074 2023-05-02 19:08:21 - progress_bar.py[line:274] - INFO: epoch 006: 5521 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7887.5, nsentences=120, sample_size=3804.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2001.3, ups=0.25, wpb=7887.5, bsz=120, num_updates=35670, lr=1.30733e-05, gnorm=0.985, clip=50, loss_scale=64, train_wall=39, gb_free=29.4, wall=146113 2023-05-02 19:09:00 - progress_bar.py[line:274] - INFO: epoch 006: 5531 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7300, nsentences=120, 
sample_size=4339.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1845.2, ups=0.25, wpb=7300, bsz=120, num_updates=35680, lr=1.30681e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=39, gb_free=25.9, wall=146153 2023-05-02 19:09:40 - progress_bar.py[line:274] - INFO: epoch 006: 5541 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7726.5, nsentences=120, sample_size=4099.7, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1923.9, ups=0.25, wpb=7726.5, bsz=120, num_updates=35690, lr=1.30628e-05, gnorm=0.962, clip=40, loss_scale=64, train_wall=40, gb_free=30.4, wall=146193 2023-05-02 19:10:20 - progress_bar.py[line:274] - INFO: epoch 006: 5551 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7553.4, nsentences=120, sample_size=3944.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1916.8, ups=0.25, wpb=7553.4, bsz=120, num_updates=35700, lr=1.30575e-05, gnorm=1.032, clip=60, loss_scale=64, train_wall=39, gb_free=29.7, wall=146232 2023-05-02 19:10:59 - progress_bar.py[line:274] - INFO: epoch 006: 5561 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7878.8, nsentences=120, sample_size=4123.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2007.6, ups=0.25, wpb=7878.8, bsz=120, num_updates=35710, lr=1.30522e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=39, gb_free=31.2, wall=146271 2023-05-02 19:11:39 - progress_bar.py[line:274] - INFO: epoch 006: 5571 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7936.5, nsentences=120, sample_size=4365.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1988.6, ups=0.25, wpb=7936.5, bsz=120, num_updates=35720, lr=1.30469e-05, gnorm=0.918, clip=0, loss_scale=64, train_wall=40, gb_free=30.7, wall=146311 2023-05-02 19:12:18 - progress_bar.py[line:274] - INFO: epoch 006: 5581 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7511.5, nsentences=120, sample_size=3902.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1916.6, ups=0.26, 
wpb=7511.5, bsz=120, num_updates=35730, lr=1.30416e-05, gnorm=0.973, clip=20, loss_scale=64, train_wall=39, gb_free=31.1, wall=146351 2023-05-02 19:12:59 - progress_bar.py[line:274] - INFO: epoch 006: 5591 / 6042 loss=2.438, loss_v1=0, loss_v2=0, nll_loss=1.202, ntokens=8014.6, nsentences=120, sample_size=4160.8, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=1957.8, ups=0.24, wpb=8014.6, bsz=120, num_updates=35740, lr=1.30364e-05, gnorm=0.964, clip=40, loss_scale=64, train_wall=41, gb_free=28.7, wall=146392 2023-05-02 19:13:39 - progress_bar.py[line:274] - INFO: epoch 006: 5601 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7286, nsentences=120, sample_size=4178.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1842.3, ups=0.25, wpb=7286, bsz=120, num_updates=35750, lr=1.30311e-05, gnorm=0.973, clip=20, loss_scale=64, train_wall=39, gb_free=29.7, wall=146431 2023-05-02 19:14:20 - progress_bar.py[line:274] - INFO: epoch 006: 5611 / 6042 loss=2.487, loss_v1=0, loss_v2=0, nll_loss=1.246, ntokens=8105.8, nsentences=120, sample_size=4002.1, sample_size_v1=0, sample_size_v2=0, ppl=2.37, wps=1975.6, ups=0.24, wpb=8105.8, bsz=120, num_updates=35760, lr=1.30258e-05, gnorm=0.975, clip=30, loss_scale=64, train_wall=41, gb_free=31.5, wall=146472 2023-05-02 19:14:59 - progress_bar.py[line:274] - INFO: epoch 006: 5621 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7761.2, nsentences=120, sample_size=4020.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1955, ups=0.25, wpb=7761.2, bsz=120, num_updates=35770, lr=1.30205e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=40, gb_free=28.4, wall=146512 2023-05-02 19:15:39 - progress_bar.py[line:274] - INFO: epoch 006: 5631 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7652, nsentences=120, sample_size=3910.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1910.4, ups=0.25, wpb=7652, bsz=120, num_updates=35780, lr=1.30152e-05, gnorm=0.955, clip=20, loss_scale=64, 
train_wall=40, gb_free=30.3, wall=146552 2023-05-02 19:16:19 - progress_bar.py[line:274] - INFO: epoch 006: 5641 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7449.6, nsentences=120, sample_size=4060.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1889.3, ups=0.25, wpb=7449.6, bsz=120, num_updates=35790, lr=1.30099e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=39, gb_free=29.2, wall=146591 2023-05-02 19:16:59 - progress_bar.py[line:274] - INFO: epoch 006: 5651 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7607.9, nsentences=120, sample_size=4356, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1883.8, ups=0.25, wpb=7607.9, bsz=120, num_updates=35800, lr=1.30047e-05, gnorm=0.928, clip=10, loss_scale=64, train_wall=40, gb_free=25.4, wall=146632 2023-05-02 19:17:40 - progress_bar.py[line:274] - INFO: epoch 006: 5661 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7779.9, nsentences=120, sample_size=3793.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1915.3, ups=0.25, wpb=7779.9, bsz=120, num_updates=35810, lr=1.29994e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=41, gb_free=30.8, wall=146672 2023-05-02 19:18:19 - progress_bar.py[line:274] - INFO: epoch 006: 5671 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7546.5, nsentences=120, sample_size=4030.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1906.2, ups=0.25, wpb=7546.5, bsz=120, num_updates=35820, lr=1.29941e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=146712 2023-05-02 19:18:59 - progress_bar.py[line:274] - INFO: epoch 006: 5681 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7918.3, nsentences=120, sample_size=4039.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1995.9, ups=0.25, wpb=7918.3, bsz=120, num_updates=35830, lr=1.29888e-05, gnorm=0.971, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=146752 2023-05-02 19:19:39 - progress_bar.py[line:274] - 
INFO: epoch 006: 5691 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7667.2, nsentences=120, sample_size=3994.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1941.3, ups=0.25, wpb=7667.2, bsz=120, num_updates=35840, lr=1.29835e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=146791 2023-05-02 19:20:18 - progress_bar.py[line:274] - INFO: epoch 006: 5701 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7454.3, nsentences=120, sample_size=4134.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1899.4, ups=0.25, wpb=7454.3, bsz=120, num_updates=35850, lr=1.29783e-05, gnorm=0.958, clip=10, loss_scale=64, train_wall=39, gb_free=29.9, wall=146830 2023-05-02 19:20:57 - progress_bar.py[line:274] - INFO: epoch 006: 5711 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7724.1, nsentences=120, sample_size=3949, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1949.8, ups=0.25, wpb=7724.1, bsz=120, num_updates=35860, lr=1.2973e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=146870 2023-05-02 19:21:37 - progress_bar.py[line:274] - INFO: epoch 006: 5721 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7701.5, nsentences=120, sample_size=3857.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1953.2, ups=0.25, wpb=7701.5, bsz=120, num_updates=35870, lr=1.29677e-05, gnorm=0.971, clip=20, loss_scale=64, train_wall=39, gb_free=29.6, wall=146909 2023-05-02 19:22:16 - progress_bar.py[line:274] - INFO: epoch 006: 5731 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7773.4, nsentences=120, sample_size=3872.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2004.7, ups=0.26, wpb=7773.4, bsz=120, num_updates=35880, lr=1.29624e-05, gnorm=0.971, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=146948 2023-05-02 19:22:55 - progress_bar.py[line:274] - INFO: epoch 006: 5741 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.178, 
ntokens=7914.6, nsentences=120, sample_size=3988.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1998.4, ups=0.25, wpb=7914.6, bsz=120, num_updates=35890, lr=1.29571e-05, gnorm=0.971, clip=40, loss_scale=64, train_wall=40, gb_free=31.2, wall=146988 2023-05-02 19:23:35 - progress_bar.py[line:274] - INFO: epoch 006: 5751 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7686.2, nsentences=120, sample_size=4195.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1926.8, ups=0.25, wpb=7686.2, bsz=120, num_updates=35900, lr=1.29518e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=147028 2023-05-02 19:24:16 - progress_bar.py[line:274] - INFO: epoch 006: 5761 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7663.3, nsentences=120, sample_size=3988.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1894.3, ups=0.25, wpb=7663.3, bsz=120, num_updates=35910, lr=1.29466e-05, gnorm=0.973, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=147068 2023-05-02 19:24:55 - progress_bar.py[line:274] - INFO: epoch 006: 5771 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7821.5, nsentences=120, sample_size=3596.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1979.7, ups=0.25, wpb=7821.5, bsz=120, num_updates=35920, lr=1.29413e-05, gnorm=0.992, clip=50, loss_scale=64, train_wall=39, gb_free=30.6, wall=147108 2023-05-02 19:25:36 - progress_bar.py[line:274] - INFO: epoch 006: 5781 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7936.2, nsentences=120, sample_size=3800.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1962, ups=0.25, wpb=7936.2, bsz=120, num_updates=35930, lr=1.2936e-05, gnorm=0.971, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=147148 2023-05-02 19:26:15 - progress_bar.py[line:274] - INFO: epoch 006: 5791 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7388.9, nsentences=120, sample_size=4062.9, sample_size_v1=0, sample_size_v2=0, 
ppl=2.13, wps=1859, ups=0.25, wpb=7388.9, bsz=120, num_updates=35940, lr=1.29307e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=31, wall=147188 2023-05-02 19:26:55 - progress_bar.py[line:274] - INFO: epoch 006: 5801 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7855.3, nsentences=120, sample_size=4160, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1995.5, ups=0.25, wpb=7855.3, bsz=120, num_updates=35950, lr=1.29254e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=39, gb_free=26.7, wall=147227 2023-05-02 19:27:35 - progress_bar.py[line:274] - INFO: epoch 006: 5811 / 6042 loss=2.476, loss_v1=0, loss_v2=0, nll_loss=1.233, ntokens=8082.9, nsentences=120, sample_size=4022.8, sample_size_v1=0, sample_size_v2=0, ppl=2.35, wps=1985.5, ups=0.25, wpb=8082.9, bsz=120, num_updates=35960, lr=1.29202e-05, gnorm=0.937, clip=20, loss_scale=128, train_wall=41, gb_free=29.3, wall=147268 2023-05-02 19:28:16 - progress_bar.py[line:274] - INFO: epoch 006: 5821 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7579.6, nsentences=120, sample_size=4206.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1878.1, ups=0.25, wpb=7579.6, bsz=120, num_updates=35970, lr=1.29149e-05, gnorm=0.946, clip=10, loss_scale=128, train_wall=40, gb_free=23.6, wall=147308 2023-05-02 19:28:55 - progress_bar.py[line:274] - INFO: epoch 006: 5831 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7422.9, nsentences=120, sample_size=3990.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1873.3, ups=0.25, wpb=7422.9, bsz=120, num_updates=35980, lr=1.29096e-05, gnorm=0.95, clip=20, loss_scale=128, train_wall=40, gb_free=30.4, wall=147348 2023-05-02 19:29:35 - progress_bar.py[line:274] - INFO: epoch 006: 5841 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=7858.3, nsentences=120, sample_size=4107.1, sample_size_v1=0, sample_size_v2=0, ppl=2.31, wps=1997.2, ups=0.25, wpb=7858.3, bsz=120, num_updates=35990, lr=1.29043e-05, 
gnorm=0.933, clip=0, loss_scale=128, train_wall=39, gb_free=29.7, wall=147387 2023-05-02 19:30:14 - progress_bar.py[line:274] - INFO: epoch 006: 5851 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7616.2, nsentences=120, sample_size=3754, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1938.9, ups=0.25, wpb=7616.2, bsz=120, num_updates=36000, lr=1.2899e-05, gnorm=1.034, clip=60, loss_scale=128, train_wall=39, gb_free=28.3, wall=147427 2023-05-02 19:30:14 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 19:30:16 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 19:30:16 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 19:30:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 
19:30:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:33 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 19:30:33 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 19:30:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:45 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 19:30:45 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 19:30:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:56 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 19:30:56 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 19:30:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:30:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:30:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:31:00 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 19:31:00 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 19:31:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:31:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:31:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:31:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:31:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:31:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:31:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:31:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:31:05 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 19:31:05 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 19:31:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:31:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:31:05 - progress_bar.py[line:282] - INFO: epoch 006 | valid on 'valid' subset | loss 3.218 | loss_v1 0 | loss_v2 0 | nll_loss 2.051 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.14 | score 0.7568 | wps 3305.9 | wpb 3202.1 | bsz 39.4 | num_updates 36000 | best_score 0.7627 2023-05-02 19:31:05 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 6 @ 36000 updates 2023-05-02 19:31:05 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_36000.pt 2023-05-02 19:31:29 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_36000.pt 2023-05-02 19:31:43 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_6_36000.pt (epoch 6 @ 36000 updates, score 0.7568) (writing took 38.032116781920195 seconds) 2023-05-02 19:32:03 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 19:32:27 - progress_bar.py[line:274] - INFO: epoch 006: 5862 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7715, nsentences=120, sample_size=4042.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=579.9, ups=0.08, wpb=7715, bsz=120, num_updates=36010, lr=1.28937e-05, gnorm=0.97, clip=30, loss_scale=64, train_wall=44, gb_free=28, wall=147560 2023-05-02 19:33:07 - progress_bar.py[line:274] - INFO: epoch 006: 5872 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7723.2, nsentences=120, sample_size=3970.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1916.7, 
ups=0.25, wpb=7723.2, bsz=120, num_updates=36020, lr=1.28885e-05, gnorm=0.988, clip=60, loss_scale=64, train_wall=40, gb_free=31.3, wall=147600 2023-05-02 19:33:47 - progress_bar.py[line:274] - INFO: epoch 006: 5882 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7671.1, nsentences=120, sample_size=3933.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1926.9, ups=0.25, wpb=7671.1, bsz=120, num_updates=36030, lr=1.28832e-05, gnorm=0.978, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=147640 2023-05-02 19:34:27 - progress_bar.py[line:274] - INFO: epoch 006: 5892 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7588, nsentences=120, sample_size=3980.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1900.2, ups=0.25, wpb=7588, bsz=120, num_updates=36040, lr=1.28779e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=40, gb_free=28.3, wall=147680 2023-05-02 19:35:07 - progress_bar.py[line:274] - INFO: epoch 006: 5902 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7493, nsentences=120, sample_size=3584.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1898.8, ups=0.25, wpb=7493, bsz=120, num_updates=36050, lr=1.28726e-05, gnorm=1.02, clip=70, loss_scale=64, train_wall=39, gb_free=30.5, wall=147719 2023-05-02 19:35:45 - progress_bar.py[line:274] - INFO: epoch 006: 5912 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7502.1, nsentences=120, sample_size=4032.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1934.1, ups=0.26, wpb=7502.1, bsz=120, num_updates=36060, lr=1.28673e-05, gnorm=0.966, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=147758 2023-05-02 19:36:25 - progress_bar.py[line:274] - INFO: epoch 006: 5922 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7614.8, nsentences=120, sample_size=4009.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1933.4, ups=0.25, wpb=7614.8, bsz=120, num_updates=36070, lr=1.2862e-05, gnorm=0.987, clip=40, 
loss_scale=64, train_wall=39, gb_free=30.9, wall=147797 2023-05-02 19:37:05 - progress_bar.py[line:274] - INFO: epoch 006: 5932 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7514.1, nsentences=120, sample_size=4019.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1875.6, ups=0.25, wpb=7514.1, bsz=120, num_updates=36080, lr=1.28568e-05, gnorm=0.984, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=147837 2023-05-02 19:37:44 - progress_bar.py[line:274] - INFO: epoch 006: 5942 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7792.3, nsentences=120, sample_size=4421.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1968.5, ups=0.25, wpb=7792.3, bsz=120, num_updates=36090, lr=1.28515e-05, gnorm=0.913, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=147877 2023-05-02 19:38:25 - progress_bar.py[line:274] - INFO: epoch 006: 5952 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7681, nsentences=120, sample_size=4358.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1882.3, ups=0.25, wpb=7681, bsz=120, num_updates=36100, lr=1.28462e-05, gnorm=0.93, clip=20, loss_scale=64, train_wall=41, gb_free=30.7, wall=147918 2023-05-02 19:39:05 - progress_bar.py[line:274] - INFO: epoch 006: 5962 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7564.1, nsentences=120, sample_size=4092.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1905.2, ups=0.25, wpb=7564.1, bsz=120, num_updates=36110, lr=1.28409e-05, gnorm=0.969, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=147957 2023-05-02 19:39:44 - progress_bar.py[line:274] - INFO: epoch 006: 5972 / 6042 loss=2.431, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7625.9, nsentences=120, sample_size=4091, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1949.2, ups=0.26, wpb=7625.9, bsz=120, num_updates=36120, lr=1.28356e-05, gnorm=0.994, clip=40, loss_scale=64, train_wall=39, gb_free=29.9, wall=147997 2023-05-02 19:40:23 - 
progress_bar.py[line:274] - INFO: epoch 006: 5982 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7557.2, nsentences=120, sample_size=4013.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1944.5, ups=0.26, wpb=7557.2, bsz=120, num_updates=36130, lr=1.28304e-05, gnorm=0.973, clip=50, loss_scale=64, train_wall=39, gb_free=29.4, wall=148035 2023-05-02 19:41:03 - progress_bar.py[line:274] - INFO: epoch 006: 5992 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7931.8, nsentences=120, sample_size=4068.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1969.3, ups=0.25, wpb=7931.8, bsz=120, num_updates=36140, lr=1.28251e-05, gnorm=0.942, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=148076 2023-05-02 19:41:43 - progress_bar.py[line:274] - INFO: epoch 006: 6002 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7742.6, nsentences=120, sample_size=3909.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1955.1, ups=0.25, wpb=7742.6, bsz=120, num_updates=36150, lr=1.28198e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=148115 2023-05-02 19:42:23 - progress_bar.py[line:274] - INFO: epoch 006: 6012 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=8013.7, nsentences=120, sample_size=3991.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2000, ups=0.25, wpb=8013.7, bsz=120, num_updates=36160, lr=1.28145e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=148155 2023-05-02 19:43:03 - progress_bar.py[line:274] - INFO: epoch 006: 6022 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=8124.4, nsentences=120, sample_size=4128.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2029.1, ups=0.25, wpb=8124.4, bsz=120, num_updates=36170, lr=1.28092e-05, gnorm=0.974, clip=50, loss_scale=64, train_wall=40, gb_free=30.5, wall=148195 2023-05-02 19:43:43 - progress_bar.py[line:274] - INFO: epoch 006: 6032 / 6042 loss=2.394, loss_v1=0, loss_v2=0, 
nll_loss=1.141, ntokens=7644.5, nsentences=120, sample_size=4160, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1923.2, ups=0.25, wpb=7644.5, bsz=120, num_updates=36180, lr=1.28039e-05, gnorm=0.932, clip=10, loss_scale=64, train_wall=40, gb_free=29, wall=148235 2023-05-02 19:44:21 - progress_bar.py[line:274] - INFO: epoch 006: 6042 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7150.6, nsentences=116, sample_size=3942.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1880.5, ups=0.26, wpb=7150.6, bsz=116, num_updates=36190, lr=1.27987e-05, gnorm=0.971, clip=20, loss_scale=64, train_wall=38, gb_free=28.6, wall=148273 2023-05-02 19:44:21 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 19:44:23 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 19:44:23 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 19:44:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:30 - task_invig.py[line:182] - INFO: example 
ref(gt): region: 2023-05-02 19:44:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:39 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 19:44:39 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 19:44:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:51 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 19:44:51 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 19:44:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:44:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:44:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:45:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:45:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:45:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:45:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:45:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:45:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:45:03 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 19:45:03 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 19:45:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:45:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:45:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:45:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:45:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:45:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:45:07 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 19:45:07 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 19:45:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:45:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:45:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:45:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:45:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:45:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:45:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:45:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:45:12 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 19:45:12 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 19:45:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 19:45:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 19:45:12 - progress_bar.py[line:282] - INFO: epoch 006 | valid on 'valid' subset | loss 3.224 | loss_v1 0 | loss_v2 0 | nll_loss 2.058 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.17 | score 0.7554 | wps 3294.7 | wpb 3202.1 | bsz 39.4 | num_updates 36190 | best_score 0.7627 2023-05-02 19:45:12 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 6 @ 36190 updates 2023-05-02 19:45:12 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-02 19:45:38 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-02 19:45:39 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt (epoch 6 @ 36190 updates, score 0.7554) (writing took 26.57640655292198 seconds) 2023-05-02 19:45:39 - train.py[line:332] - INFO: end of epoch 6 (average epoch stats below) 2023-05-02 19:45:39 - progress_bar.py[line:282] - INFO: epoch 006 | loss 2.404 | loss_v1 0 | loss_v2 0 | nll_loss 1.152 | ntokens 7726.84 | nsentences 119.992 | sample_size 4036.54 | sample_size_v1 0 | sample_size_v2 0 | ppl 2.22 | wps 1888.2 | ups 0.24 | wpb 7726.8 | bsz 120 | num_updates 36190 | lr 1.27987e-05 | gnorm 0.967 | clip 30.1 | loss_scale 64 | train_wall 24007 | gb_free 28.6 | wall 148351 2023-05-02 19:45:39 - trainer.py[line:639] - INFO: loading train data for epoch 7 2023-05-02 19:45:39 - dialog_dataset.py[line:647] - INFO: loading invig-train from /mnt/bn/hri-lq/datasets/hf-cache/invig 2023-05-02 19:45:39 - dialog_dataset.py[line:647] - INFO: 
loading guesswhat-train from /mnt/bn/hri-lq/datasets/hf-cache/guesswhat 2023-05-02 19:45:41 - dialog_dataset.py[line:647] - INFO: loading visdial-train from /mnt/bn/hri-lq/datasets/hf-cache/visdial 2023-05-02 19:45:42 - dialog_dataset.py[line:647] - INFO: loading refcoco-train from /mnt/bn/hri-lq/datasets/hf-cache/refcoco 2023-05-02 19:45:42 - dialog_dataset.py[line:647] - INFO: loading refcocog-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocog 2023-05-02 19:45:43 - dialog_dataset.py[line:647] - INFO: loading refcocoplus-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocoplus 2023-05-02 19:45:43 - dialog_dataset.py[line:647] - INFO: loading cc_sbu_align-train from /mnt/bn/hri-lq/datasets/hf-cache/cc_sbu_align 2023-05-02 19:45:43 - dialog_dataset.py[line:647] - INFO: loading llava_instruct_150k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_instruct_150k 2023-05-02 19:45:44 - dialog_dataset.py[line:647] - INFO: loading llava_conversation_58k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_conversation_58k 2023-05-02 19:45:44 - dialog_dataset.py[line:647] - INFO: loading llava_complex_reasoning_77k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_complex_reasoning_77k 2023-05-02 19:45:45 - dialog_dataset.py[line:647] - INFO: loading llava_detail_23k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_detail_23k 2023-05-02 19:45:45 - dialog_dataset.py[line:647] - INFO: loading openimages-train from /mnt/bn/hri-lq/datasets/hf-cache/openimages_v1.2 2023-05-02 19:45:45 - dialog_dataset.py[line:671] - INFO: load train data: 18 (90624/724992 samples) dataset(s) 2023-05-02 19:45:45 - dialog_dataset.py[line:672] - INFO: Tasks: invig_question(17652), invig_answer(17652), invig_grounding(17652), guesswhat_question(68653), guesswhat_answer(68653), guesswhat_grounding(68653), visdial_question(103447), visdial_answer(103447), visdial_caption(20689), refcoco_grounding(9523), refcoco_grounding(9920), refcoco_grounding(9494), cc_sbu_align_caption(3439), 
llava_instruct_150k(90372), llava_conversation_58k(46965), llava_complex_reasoning_77k(44353), llava_detail_23k(12471), openimages_detection(11957) 2023-05-02 19:45:45 - trainer.py[line:703] - INFO: begin training epoch 7 2023-05-02 19:45:45 - train.py[line:305] - INFO: Start iterating over samples 2023-05-02 19:46:25 - progress_bar.py[line:274] - INFO: epoch 007: 10 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7811.4, nsentences=120, sample_size=3994.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=630.1, ups=0.08, wpb=7811.4, bsz=120, num_updates=36200, lr=1.27934e-05, gnorm=0.966, clip=30, loss_scale=64, train_wall=39, gb_free=31, wall=148397 2023-05-02 19:47:04 - progress_bar.py[line:274] - INFO: epoch 007: 20 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7685.2, nsentences=120, sample_size=3757.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1971.7, ups=0.26, wpb=7685.2, bsz=120, num_updates=36210, lr=1.27881e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=39, gb_free=30.7, wall=148436 2023-05-02 19:47:44 - progress_bar.py[line:274] - INFO: epoch 007: 30 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7552.1, nsentences=120, sample_size=4186.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1884, ups=0.25, wpb=7552.1, bsz=120, num_updates=36220, lr=1.27828e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=27.1, wall=148476 2023-05-02 19:48:23 - progress_bar.py[line:274] - INFO: epoch 007: 40 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7757.7, nsentences=120, sample_size=4040.5, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1965.9, ups=0.25, wpb=7757.7, bsz=120, num_updates=36230, lr=1.27775e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=148516 2023-05-02 19:49:02 - progress_bar.py[line:274] - INFO: epoch 007: 50 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7695.7, nsentences=120, sample_size=3967, 
sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1963.6, ups=0.26, wpb=7695.7, bsz=120, num_updates=36240, lr=1.27723e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=39, gb_free=29.1, wall=148555 2023-05-02 19:49:41 - progress_bar.py[line:274] - INFO: epoch 007: 60 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7756.4, nsentences=120, sample_size=3904, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1986.1, ups=0.26, wpb=7756.4, bsz=120, num_updates=36250, lr=1.2767e-05, gnorm=0.99, clip=40, loss_scale=64, train_wall=39, gb_free=30.6, wall=148594 2023-05-02 19:50:21 - progress_bar.py[line:274] - INFO: epoch 007: 70 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7897.8, nsentences=120, sample_size=4047.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1972.2, ups=0.25, wpb=7897.8, bsz=120, num_updates=36260, lr=1.27617e-05, gnorm=0.996, clip=40, loss_scale=64, train_wall=40, gb_free=30.3, wall=148634 2023-05-02 19:51:01 - progress_bar.py[line:274] - INFO: epoch 007: 80 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7783, nsentences=120, sample_size=3783.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1950.4, ups=0.25, wpb=7783, bsz=120, num_updates=36270, lr=1.27564e-05, gnorm=0.963, clip=40, loss_scale=64, train_wall=40, gb_free=28.1, wall=148674 2023-05-02 19:51:41 - progress_bar.py[line:274] - INFO: epoch 007: 90 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7806.7, nsentences=120, sample_size=3888.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1968.1, ups=0.25, wpb=7806.7, bsz=120, num_updates=36280, lr=1.27511e-05, gnorm=0.975, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=148714 2023-05-02 19:52:22 - progress_bar.py[line:274] - INFO: epoch 007: 100 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7783.9, nsentences=120, sample_size=4099.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1908.6, ups=0.25, wpb=7783.9, bsz=120, 
num_updates=36290, lr=1.27458e-05, gnorm=0.967, clip=30, loss_scale=64, train_wall=41, gb_free=31.1, wall=148754 2023-05-02 19:53:03 - progress_bar.py[line:274] - INFO: epoch 007: 110 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7992.6, nsentences=120, sample_size=3964.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1961.9, ups=0.25, wpb=7992.6, bsz=120, num_updates=36300, lr=1.27406e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=41, gb_free=29, wall=148795 2023-05-02 19:53:42 - progress_bar.py[line:274] - INFO: epoch 007: 120 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7998.6, nsentences=120, sample_size=4040.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2021.6, ups=0.25, wpb=7998.6, bsz=120, num_updates=36310, lr=1.27353e-05, gnorm=0.956, clip=10, loss_scale=64, train_wall=39, gb_free=27.2, wall=148835 2023-05-02 19:54:22 - progress_bar.py[line:274] - INFO: epoch 007: 130 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7536.3, nsentences=120, sample_size=3879.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1907.6, ups=0.25, wpb=7536.3, bsz=120, num_updates=36320, lr=1.273e-05, gnorm=0.993, clip=60, loss_scale=64, train_wall=39, gb_free=30.2, wall=148874 2023-05-02 19:55:02 - progress_bar.py[line:274] - INFO: epoch 007: 140 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7806.1, nsentences=120, sample_size=3771.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1934.7, ups=0.25, wpb=7806.1, bsz=120, num_updates=36330, lr=1.27247e-05, gnorm=1.006, clip=60, loss_scale=64, train_wall=40, gb_free=30, wall=148914 2023-05-02 19:55:42 - progress_bar.py[line:274] - INFO: epoch 007: 150 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7690.9, nsentences=120, sample_size=4080.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1916.3, ups=0.25, wpb=7690.9, bsz=120, num_updates=36340, lr=1.27194e-05, gnorm=0.942, clip=30, loss_scale=64, train_wall=40, 
gb_free=30.1, wall=148955 2023-05-02 19:56:23 - progress_bar.py[line:274] - INFO: epoch 007: 160 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7942.2, nsentences=120, sample_size=3979.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1951.7, ups=0.25, wpb=7942.2, bsz=120, num_updates=36350, lr=1.27141e-05, gnorm=0.983, clip=40, loss_scale=64, train_wall=41, gb_free=31.1, wall=148995 2023-05-02 19:57:02 - progress_bar.py[line:274] - INFO: epoch 007: 170 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7693.8, nsentences=120, sample_size=4329.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1949.7, ups=0.25, wpb=7693.8, bsz=120, num_updates=36360, lr=1.27089e-05, gnorm=0.935, clip=30, loss_scale=64, train_wall=39, gb_free=28.9, wall=149035 2023-05-02 19:57:42 - progress_bar.py[line:274] - INFO: epoch 007: 180 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=8043.4, nsentences=120, sample_size=3626.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2012.6, ups=0.25, wpb=8043.4, bsz=120, num_updates=36370, lr=1.27036e-05, gnorm=0.968, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=149075 2023-05-02 19:58:22 - progress_bar.py[line:274] - INFO: epoch 007: 190 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7639.9, nsentences=120, sample_size=3975.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1910.4, ups=0.25, wpb=7639.9, bsz=120, num_updates=36380, lr=1.26983e-05, gnorm=1.004, clip=60, loss_scale=64, train_wall=40, gb_free=30, wall=149115 2023-05-02 19:59:02 - progress_bar.py[line:274] - INFO: epoch 007: 200 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7739.2, nsentences=120, sample_size=4093.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1934.9, ups=0.25, wpb=7739.2, bsz=120, num_updates=36390, lr=1.2693e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=40, gb_free=29.2, wall=149155 2023-05-02 19:59:42 - progress_bar.py[line:274] - INFO: epoch 007: 
210 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=8017.3, nsentences=120, sample_size=4007.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2001.7, ups=0.25, wpb=8017.3, bsz=120, num_updates=36400, lr=1.26877e-05, gnorm=0.934, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=149195 2023-05-02 20:00:23 - progress_bar.py[line:274] - INFO: epoch 007: 220 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7742.2, nsentences=120, sample_size=4337.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1925.9, ups=0.25, wpb=7742.2, bsz=120, num_updates=36410, lr=1.26825e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=149235 2023-05-02 20:01:03 - progress_bar.py[line:274] - INFO: epoch 007: 230 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7783.3, nsentences=120, sample_size=4252.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1941.5, ups=0.25, wpb=7783.3, bsz=120, num_updates=36420, lr=1.26772e-05, gnorm=0.977, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=149275 2023-05-02 20:01:42 - progress_bar.py[line:274] - INFO: epoch 007: 240 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7908.5, nsentences=120, sample_size=3994.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2021, ups=0.26, wpb=7908.5, bsz=120, num_updates=36430, lr=1.26719e-05, gnorm=0.958, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=149314 2023-05-02 20:02:22 - progress_bar.py[line:274] - INFO: epoch 007: 250 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7727.5, nsentences=120, sample_size=4348.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1940.6, ups=0.25, wpb=7727.5, bsz=120, num_updates=36440, lr=1.26666e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=40, gb_free=31.1, wall=149354 2023-05-02 20:03:01 - progress_bar.py[line:274] - INFO: epoch 007: 260 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7904.5, nsentences=120, 
sample_size=3998.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1986.6, ups=0.25, wpb=7904.5, bsz=120, num_updates=36450, lr=1.26613e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=149394 2023-05-02 20:03:41 - progress_bar.py[line:274] - INFO: epoch 007: 270 / 6042 loss=2.326, loss_v1=0, loss_v2=0, nll_loss=1.063, ntokens=7394.5, nsentences=120, sample_size=4237.1, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1876, ups=0.25, wpb=7394.5, bsz=120, num_updates=36460, lr=1.2656e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=39, gb_free=29.4, wall=149433 2023-05-02 20:04:21 - progress_bar.py[line:274] - INFO: epoch 007: 280 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7914.2, nsentences=120, sample_size=3825.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1966.3, ups=0.25, wpb=7914.2, bsz=120, num_updates=36470, lr=1.26508e-05, gnorm=1.022, clip=70, loss_scale=64, train_wall=40, gb_free=30.7, wall=149473 2023-05-02 20:05:00 - progress_bar.py[line:274] - INFO: epoch 007: 290 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7742.2, nsentences=120, sample_size=4092.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1993.5, ups=0.26, wpb=7742.2, bsz=120, num_updates=36480, lr=1.26455e-05, gnorm=0.971, clip=40, loss_scale=64, train_wall=39, gb_free=31.6, wall=149512 2023-05-02 20:05:40 - progress_bar.py[line:274] - INFO: epoch 007: 300 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7628.7, nsentences=120, sample_size=3767.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1922.7, ups=0.25, wpb=7628.7, bsz=120, num_updates=36490, lr=1.26402e-05, gnorm=1.007, clip=60, loss_scale=64, train_wall=40, gb_free=29.9, wall=149552 2023-05-02 20:06:19 - progress_bar.py[line:274] - INFO: epoch 007: 310 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7403.3, nsentences=120, sample_size=3929.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1876.5, ups=0.25, 
wpb=7403.3, bsz=120, num_updates=36500, lr=1.26349e-05, gnorm=0.98, clip=40, loss_scale=64, train_wall=39, gb_free=28.8, wall=149591 2023-05-02 20:06:59 - progress_bar.py[line:274] - INFO: epoch 007: 320 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7637, nsentences=120, sample_size=3831, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1899.7, ups=0.25, wpb=7637, bsz=120, num_updates=36510, lr=1.26296e-05, gnorm=1.001, clip=50, loss_scale=64, train_wall=40, gb_free=30.4, wall=149632 2023-05-02 20:07:36 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 20:07:43 - progress_bar.py[line:274] - INFO: epoch 007: 331 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7547.5, nsentences=120, sample_size=4042.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1704.2, ups=0.23, wpb=7547.5, bsz=120, num_updates=36520, lr=1.26244e-05, gnorm=0.936, clip=0, loss_scale=64, train_wall=44, gb_free=29.6, wall=149676 2023-05-02 20:08:23 - progress_bar.py[line:274] - INFO: epoch 007: 341 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7643.3, nsentences=120, sample_size=3773.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1930.2, ups=0.25, wpb=7643.3, bsz=120, num_updates=36530, lr=1.26191e-05, gnorm=1.002, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=149716 2023-05-02 20:09:02 - progress_bar.py[line:274] - INFO: epoch 007: 351 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7715.2, nsentences=120, sample_size=3679.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1969.6, ups=0.26, wpb=7715.2, bsz=120, num_updates=36540, lr=1.26138e-05, gnorm=1.047, clip=60, loss_scale=64, train_wall=39, gb_free=30.2, wall=149755 2023-05-02 20:09:42 - progress_bar.py[line:274] - INFO: epoch 007: 361 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7636, nsentences=120, sample_size=4208.3, sample_size_v1=0, sample_size_v2=0, 
ppl=2.15, wps=1927.5, ups=0.25, wpb=7636, bsz=120, num_updates=36550, lr=1.26085e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=29.2, wall=149794 2023-05-02 20:10:22 - progress_bar.py[line:274] - INFO: epoch 007: 371 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7672.4, nsentences=120, sample_size=3934.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1901.7, ups=0.25, wpb=7672.4, bsz=120, num_updates=36560, lr=1.26032e-05, gnorm=0.984, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=149835 2023-05-02 20:11:02 - progress_bar.py[line:274] - INFO: epoch 007: 381 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7933, nsentences=120, sample_size=3792.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2004.1, ups=0.25, wpb=7933, bsz=120, num_updates=36570, lr=1.25979e-05, gnorm=1.008, clip=60, loss_scale=64, train_wall=40, gb_free=31, wall=149874 2023-05-02 20:11:42 - progress_bar.py[line:274] - INFO: epoch 007: 391 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7845.1, nsentences=120, sample_size=4024.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1964.5, ups=0.25, wpb=7845.1, bsz=120, num_updates=36580, lr=1.25927e-05, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=149914 2023-05-02 20:12:22 - progress_bar.py[line:274] - INFO: epoch 007: 401 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7652.3, nsentences=120, sample_size=4087.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1914.3, ups=0.25, wpb=7652.3, bsz=120, num_updates=36590, lr=1.25874e-05, gnorm=0.968, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=149954 2023-05-02 20:13:03 - progress_bar.py[line:274] - INFO: epoch 007: 411 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7584.2, nsentences=120, sample_size=4108.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1859.3, ups=0.25, wpb=7584.2, bsz=120, num_updates=36600, lr=1.25821e-05, 
gnorm=0.986, clip=40, loss_scale=64, train_wall=41, gb_free=30.3, wall=149995 2023-05-02 20:13:43 - progress_bar.py[line:274] - INFO: epoch 007: 421 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7938.2, nsentences=120, sample_size=4078.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1968.4, ups=0.25, wpb=7938.2, bsz=120, num_updates=36610, lr=1.25768e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=150035 2023-05-02 20:14:22 - progress_bar.py[line:274] - INFO: epoch 007: 431 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7567.8, nsentences=120, sample_size=3601.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1914.4, ups=0.25, wpb=7567.8, bsz=120, num_updates=36620, lr=1.25715e-05, gnorm=1.019, clip=50, loss_scale=64, train_wall=39, gb_free=30, wall=150075 2023-05-02 20:15:03 - progress_bar.py[line:274] - INFO: epoch 007: 441 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=8072.9, nsentences=120, sample_size=4230.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1995, ups=0.25, wpb=8072.9, bsz=120, num_updates=36630, lr=1.25662e-05, gnorm=0.905, clip=0, loss_scale=64, train_wall=40, gb_free=29.4, wall=150115 2023-05-02 20:15:43 - progress_bar.py[line:274] - INFO: epoch 007: 451 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7863.2, nsentences=120, sample_size=4115.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1956.7, ups=0.25, wpb=7863.2, bsz=120, num_updates=36640, lr=1.2561e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=40, gb_free=31, wall=150155 2023-05-02 20:16:23 - progress_bar.py[line:274] - INFO: epoch 007: 461 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7718.4, nsentences=120, sample_size=4058.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1930.3, ups=0.25, wpb=7718.4, bsz=120, num_updates=36650, lr=1.25557e-05, gnorm=0.95, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=150195 2023-05-02 20:17:03 
- progress_bar.py[line:274] - INFO: epoch 007: 471 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=8003.7, nsentences=120, sample_size=4068.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1997.9, ups=0.25, wpb=8003.7, bsz=120, num_updates=36660, lr=1.25504e-05, gnorm=0.958, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=150236 2023-05-02 20:17:43 - progress_bar.py[line:274] - INFO: epoch 007: 481 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7851.4, nsentences=120, sample_size=4169.8, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1981, ups=0.25, wpb=7851.4, bsz=120, num_updates=36670, lr=1.25451e-05, gnorm=0.975, clip=30, loss_scale=64, train_wall=40, gb_free=25.6, wall=150275 2023-05-02 20:18:22 - progress_bar.py[line:274] - INFO: epoch 007: 491 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7746.3, nsentences=120, sample_size=3790, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1977.4, ups=0.26, wpb=7746.3, bsz=120, num_updates=36680, lr=1.25398e-05, gnorm=1.002, clip=40, loss_scale=64, train_wall=39, gb_free=30.3, wall=150314 2023-05-02 20:19:01 - progress_bar.py[line:274] - INFO: epoch 007: 501 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7644.4, nsentences=120, sample_size=3869.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1941.2, ups=0.25, wpb=7644.4, bsz=120, num_updates=36690, lr=1.25346e-05, gnorm=1.02, clip=60, loss_scale=64, train_wall=39, gb_free=29.6, wall=150354 2023-05-02 20:19:41 - progress_bar.py[line:274] - INFO: epoch 007: 511 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7730.5, nsentences=120, sample_size=4058.3, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1947.9, ups=0.25, wpb=7730.5, bsz=120, num_updates=36700, lr=1.25293e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=150393 2023-05-02 20:20:21 - progress_bar.py[line:274] - INFO: epoch 007: 521 / 6042 loss=2.409, loss_v1=0, loss_v2=0, 
nll_loss=1.158, ntokens=7891.3, nsentences=120, sample_size=4017.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1983.8, ups=0.25, wpb=7891.3, bsz=120, num_updates=36710, lr=1.2524e-05, gnorm=0.977, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=150433 2023-05-02 20:21:00 - progress_bar.py[line:274] - INFO: epoch 007: 531 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7861.7, nsentences=120, sample_size=3639, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1986.9, ups=0.25, wpb=7861.7, bsz=120, num_updates=36720, lr=1.25187e-05, gnorm=1.014, clip=60, loss_scale=64, train_wall=39, gb_free=29.6, wall=150473 2023-05-02 20:21:40 - progress_bar.py[line:274] - INFO: epoch 007: 541 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7312.3, nsentences=120, sample_size=4150.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1845.4, ups=0.25, wpb=7312.3, bsz=120, num_updates=36730, lr=1.25134e-05, gnorm=0.955, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=150512 2023-05-02 20:22:20 - progress_bar.py[line:274] - INFO: epoch 007: 551 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7860.3, nsentences=120, sample_size=4136.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1957.2, ups=0.25, wpb=7860.3, bsz=120, num_updates=36740, lr=1.25081e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=31.3, wall=150553 2023-05-02 20:23:00 - progress_bar.py[line:274] - INFO: epoch 007: 561 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7752, nsentences=120, sample_size=3877.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1928.8, ups=0.25, wpb=7752, bsz=120, num_updates=36750, lr=1.25029e-05, gnorm=1.013, clip=60, loss_scale=64, train_wall=40, gb_free=29.8, wall=150593 2023-05-02 20:23:40 - progress_bar.py[line:274] - INFO: epoch 007: 571 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7749, nsentences=120, sample_size=4306.1, sample_size_v1=0, 
sample_size_v2=0, ppl=2.18, wps=1968.2, ups=0.25, wpb=7749, bsz=120, num_updates=36760, lr=1.24976e-05, gnorm=0.93, clip=20, loss_scale=64, train_wall=39, gb_free=29.7, wall=150632 2023-05-02 20:24:19 - progress_bar.py[line:274] - INFO: epoch 007: 581 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7767, nsentences=120, sample_size=3984.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1964.2, ups=0.25, wpb=7767, bsz=120, num_updates=36770, lr=1.24923e-05, gnorm=0.97, clip=40, loss_scale=64, train_wall=39, gb_free=29.9, wall=150672 2023-05-02 20:24:59 - progress_bar.py[line:274] - INFO: epoch 007: 591 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7570.8, nsentences=120, sample_size=4143.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1904.9, ups=0.25, wpb=7570.8, bsz=120, num_updates=36780, lr=1.2487e-05, gnorm=0.998, clip=40, loss_scale=64, train_wall=40, gb_free=28.2, wall=150711 2023-05-02 20:25:39 - progress_bar.py[line:274] - INFO: epoch 007: 601 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=8002.1, nsentences=120, sample_size=4066.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1994.7, ups=0.25, wpb=8002.1, bsz=120, num_updates=36790, lr=1.24817e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=150752 2023-05-02 20:26:19 - progress_bar.py[line:274] - INFO: epoch 007: 611 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7835.2, nsentences=120, sample_size=4218.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1958, ups=0.25, wpb=7835.2, bsz=120, num_updates=36800, lr=1.24765e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=40, gb_free=31.5, wall=150792 2023-05-02 20:26:59 - progress_bar.py[line:274] - INFO: epoch 007: 621 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7395.6, nsentences=120, sample_size=4136.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1842.2, ups=0.25, wpb=7395.6, bsz=120, num_updates=36810, 
lr=1.24712e-05, gnorm=0.997, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=150832 2023-05-02 20:27:39 - progress_bar.py[line:274] - INFO: epoch 007: 631 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7558.4, nsentences=120, sample_size=4091.9, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1914.4, ups=0.25, wpb=7558.4, bsz=120, num_updates=36820, lr=1.24659e-05, gnorm=0.93, clip=0, loss_scale=64, train_wall=39, gb_free=29.5, wall=150871 2023-05-02 20:28:18 - progress_bar.py[line:274] - INFO: epoch 007: 641 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7731.2, nsentences=120, sample_size=3839.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1958.6, ups=0.25, wpb=7731.2, bsz=120, num_updates=36830, lr=1.24606e-05, gnorm=0.991, clip=40, loss_scale=64, train_wall=39, gb_free=30.9, wall=150911 2023-05-02 20:28:58 - progress_bar.py[line:274] - INFO: epoch 007: 651 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7854.4, nsentences=120, sample_size=4166.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1962.6, ups=0.25, wpb=7854.4, bsz=120, num_updates=36840, lr=1.24553e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=150951 2023-05-02 20:29:38 - progress_bar.py[line:274] - INFO: epoch 007: 661 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7694.5, nsentences=120, sample_size=4307.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1922.4, ups=0.25, wpb=7694.5, bsz=120, num_updates=36850, lr=1.245e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=28.9, wall=150991 2023-05-02 20:30:18 - progress_bar.py[line:274] - INFO: epoch 007: 671 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7686, nsentences=120, sample_size=4143.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1916.1, ups=0.25, wpb=7686, bsz=120, num_updates=36860, lr=1.24448e-05, gnorm=0.924, clip=0, loss_scale=64, train_wall=40, gb_free=30.7, wall=151031 
2023-05-02 20:30:59 - progress_bar.py[line:274] - INFO: epoch 007: 681 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7595.8, nsentences=120, sample_size=3982.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1888.4, ups=0.25, wpb=7595.8, bsz=120, num_updates=36870, lr=1.24395e-05, gnorm=0.966, clip=40, loss_scale=64, train_wall=40, gb_free=30.3, wall=151071 2023-05-02 20:31:38 - progress_bar.py[line:274] - INFO: epoch 007: 691 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7862.4, nsentences=120, sample_size=3897.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1969.3, ups=0.25, wpb=7862.4, bsz=120, num_updates=36880, lr=1.24342e-05, gnorm=0.988, clip=40, loss_scale=64, train_wall=40, gb_free=27.8, wall=151111 2023-05-02 20:32:19 - progress_bar.py[line:274] - INFO: epoch 007: 701 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7738.6, nsentences=120, sample_size=4182.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1900.1, ups=0.25, wpb=7738.6, bsz=120, num_updates=36890, lr=1.24289e-05, gnorm=0.955, clip=30, loss_scale=64, train_wall=41, gb_free=29.5, wall=151152 2023-05-02 20:32:59 - progress_bar.py[line:274] - INFO: epoch 007: 711 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7956, nsentences=120, sample_size=3956.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1986.4, ups=0.25, wpb=7956, bsz=120, num_updates=36900, lr=1.24236e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=151192 2023-05-02 20:33:39 - progress_bar.py[line:274] - INFO: epoch 007: 721 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7374.7, nsentences=120, sample_size=4214.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1839.9, ups=0.25, wpb=7374.7, bsz=120, num_updates=36910, lr=1.24183e-05, gnorm=0.976, clip=50, loss_scale=64, train_wall=40, gb_free=30.4, wall=151232 2023-05-02 20:34:19 - progress_bar.py[line:274] - INFO: epoch 007: 731 / 6042 loss=2.361, 
loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7478.5, nsentences=120, sample_size=4259.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1876.5, ups=0.25, wpb=7478.5, bsz=120, num_updates=36920, lr=1.24131e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=151272 2023-05-02 20:34:58 - progress_bar.py[line:274] - INFO: epoch 007: 741 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7410.6, nsentences=120, sample_size=3878.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1896.5, ups=0.26, wpb=7410.6, bsz=120, num_updates=36930, lr=1.24078e-05, gnorm=0.981, clip=50, loss_scale=64, train_wall=39, gb_free=29.8, wall=151311 2023-05-02 20:35:38 - progress_bar.py[line:274] - INFO: epoch 007: 751 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7890.9, nsentences=120, sample_size=3985.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1988.5, ups=0.25, wpb=7890.9, bsz=120, num_updates=36940, lr=1.24025e-05, gnorm=0.99, clip=50, loss_scale=64, train_wall=40, gb_free=31, wall=151350 2023-05-02 20:36:18 - progress_bar.py[line:274] - INFO: epoch 007: 761 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7789.8, nsentences=120, sample_size=4120.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1954.2, ups=0.25, wpb=7789.8, bsz=120, num_updates=36950, lr=1.23972e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, wall=151390 2023-05-02 20:36:58 - progress_bar.py[line:274] - INFO: epoch 007: 771 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7740.5, nsentences=120, sample_size=4285.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1945.3, ups=0.25, wpb=7740.5, bsz=120, num_updates=36960, lr=1.23919e-05, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=25.1, wall=151430 2023-05-02 20:37:37 - progress_bar.py[line:274] - INFO: epoch 007: 781 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7624.6, nsentences=120, sample_size=3977.6, 
sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1941.3, ups=0.25, wpb=7624.6, bsz=120, num_updates=36970, lr=1.23867e-05, gnorm=0.975, clip=50, loss_scale=64, train_wall=39, gb_free=29.3, wall=151469 2023-05-02 20:38:17 - progress_bar.py[line:274] - INFO: epoch 007: 791 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7616.5, nsentences=120, sample_size=4143.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1900.9, ups=0.25, wpb=7616.5, bsz=120, num_updates=36980, lr=1.23814e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=40, gb_free=29.6, wall=151509 2023-05-02 20:38:56 - progress_bar.py[line:274] - INFO: epoch 007: 801 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7776.2, nsentences=120, sample_size=3709.9, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1980.9, ups=0.25, wpb=7776.2, bsz=120, num_updates=36990, lr=1.23761e-05, gnorm=1.014, clip=50, loss_scale=64, train_wall=39, gb_free=30.4, wall=151549 2023-05-02 20:39:36 - progress_bar.py[line:274] - INFO: epoch 007: 811 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7624.2, nsentences=120, sample_size=4247.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1935, ups=0.25, wpb=7624.2, bsz=120, num_updates=37000, lr=1.23708e-05, gnorm=0.955, clip=20, loss_scale=64, train_wall=39, gb_free=30, wall=151588 2023-05-02 20:39:36 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 20:39:38 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 20:39:38 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 20:39:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 20:39:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:55 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 20:39:55 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 20:39:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:39:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:39:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-02 20:40:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:07 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 20:40:07 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 20:40:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-02 20:40:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:18 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 20:40:18 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 20:40:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:22 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 20:40:22 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 20:40:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:27 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-02 20:40:27 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 20:40:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 20:40:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 20:40:27 - progress_bar.py[line:282] - INFO: epoch 007 | valid on 'valid' subset | loss 3.239 | loss_v1 0 | loss_v2 0 | nll_loss 2.075 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.21 | score 0.7598 | wps 3307.9 | wpb 3202.1 | bsz 39.4 | num_updates 37000 | best_score 0.7627 2023-05-02 20:40:27 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 7 @ 37000 updates 2023-05-02 20:40:27 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_37000.pt 2023-05-02 20:40:53 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_37000.pt 2023-05-02 20:41:07 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_37000.pt (epoch 7 @ 37000 updates, score 0.7598) (writing took 39.78311283607036 seconds) 2023-05-02 20:41:47 - progress_bar.py[line:274] - INFO: epoch 007: 821 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7713.4, nsentences=120, sample_size=3925.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=588.3, ups=0.08, wpb=7713.4, bsz=120, num_updates=37010, lr=1.23655e-05, gnorm=0.966, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=151719 2023-05-02 20:42:27 - progress_bar.py[line:274] - INFO: epoch 007: 831 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7678.2, nsentences=120, sample_size=4214.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1921.3, ups=0.25, wpb=7678.2, bsz=120, num_updates=37020, 
lr=1.23602e-05, gnorm=0.956, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=151759 2023-05-02 20:43:06 - progress_bar.py[line:274] - INFO: epoch 007: 841 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7489.7, nsentences=120, sample_size=3971.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1893.6, ups=0.25, wpb=7489.7, bsz=120, num_updates=37030, lr=1.2355e-05, gnorm=0.968, clip=20, loss_scale=128, train_wall=39, gb_free=29.9, wall=151799 2023-05-02 20:43:46 - progress_bar.py[line:274] - INFO: epoch 007: 851 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7594.8, nsentences=120, sample_size=4091.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1918.4, ups=0.25, wpb=7594.8, bsz=120, num_updates=37040, lr=1.23497e-05, gnorm=0.965, clip=30, loss_scale=128, train_wall=40, gb_free=30.6, wall=151838 2023-05-02 20:44:26 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 20:44:31 - progress_bar.py[line:274] - INFO: epoch 007: 862 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7822.9, nsentences=120, sample_size=4128.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1752.7, ups=0.22, wpb=7822.9, bsz=120, num_updates=37050, lr=1.23444e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=45, gb_free=29.9, wall=151883 2023-05-02 20:45:10 - progress_bar.py[line:274] - INFO: epoch 007: 872 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7695.5, nsentences=120, sample_size=3830.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1947.4, ups=0.25, wpb=7695.5, bsz=120, num_updates=37060, lr=1.23391e-05, gnorm=0.977, clip=30, loss_scale=64, train_wall=39, gb_free=30.9, wall=151922 2023-05-02 20:45:50 - progress_bar.py[line:274] - INFO: epoch 007: 882 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7568.4, nsentences=120, sample_size=4181.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1900.7, 
ups=0.25, wpb=7568.4, bsz=120, num_updates=37070, lr=1.23338e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=25.8, wall=151962 2023-05-02 20:46:30 - progress_bar.py[line:274] - INFO: epoch 007: 892 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7882.9, nsentences=120, sample_size=4068.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1978.1, ups=0.25, wpb=7882.9, bsz=120, num_updates=37080, lr=1.23286e-05, gnorm=0.948, clip=30, loss_scale=64, train_wall=40, gb_free=28.4, wall=152002 2023-05-02 20:47:10 - progress_bar.py[line:274] - INFO: epoch 007: 902 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7464.7, nsentences=120, sample_size=4190.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1859.9, ups=0.25, wpb=7464.7, bsz=120, num_updates=37090, lr=1.23233e-05, gnorm=0.972, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=152042 2023-05-02 20:47:49 - progress_bar.py[line:274] - INFO: epoch 007: 912 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7657.4, nsentences=120, sample_size=4387.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1946.6, ups=0.25, wpb=7657.4, bsz=120, num_updates=37100, lr=1.2318e-05, gnorm=0.922, clip=20, loss_scale=64, train_wall=39, gb_free=30.4, wall=152082 2023-05-02 20:48:29 - progress_bar.py[line:274] - INFO: epoch 007: 922 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7369.2, nsentences=120, sample_size=4184.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1867.4, ups=0.25, wpb=7369.2, bsz=120, num_updates=37110, lr=1.23127e-05, gnorm=0.961, clip=20, loss_scale=64, train_wall=39, gb_free=29.7, wall=152121 2023-05-02 20:49:09 - progress_bar.py[line:274] - INFO: epoch 007: 932 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7832.5, nsentences=120, sample_size=4237.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1928.6, ups=0.25, wpb=7832.5, bsz=120, num_updates=37120, lr=1.23074e-05, gnorm=0.963, clip=10, 
loss_scale=64, train_wall=41, gb_free=30.1, wall=152162 2023-05-02 20:49:49 - progress_bar.py[line:274] - INFO: epoch 007: 942 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7648.1, nsentences=120, sample_size=3967.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1917, ups=0.25, wpb=7648.1, bsz=120, num_updates=37130, lr=1.23021e-05, gnorm=0.964, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=152202 2023-05-02 20:50:29 - progress_bar.py[line:274] - INFO: epoch 007: 952 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7805.4, nsentences=120, sample_size=4200.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1954.8, ups=0.25, wpb=7805.4, bsz=120, num_updates=37140, lr=1.22969e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=40, gb_free=28.7, wall=152242 2023-05-02 20:51:09 - progress_bar.py[line:274] - INFO: epoch 007: 962 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7802.3, nsentences=120, sample_size=4316.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1952.2, ups=0.25, wpb=7802.3, bsz=120, num_updates=37150, lr=1.22916e-05, gnorm=0.949, clip=30, loss_scale=64, train_wall=40, gb_free=30.8, wall=152282 2023-05-02 20:51:49 - progress_bar.py[line:274] - INFO: epoch 007: 972 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7778.9, nsentences=120, sample_size=4245.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1946.5, ups=0.25, wpb=7778.9, bsz=120, num_updates=37160, lr=1.22863e-05, gnorm=0.967, clip=20, loss_scale=64, train_wall=40, gb_free=28.5, wall=152321 2023-05-02 20:52:29 - progress_bar.py[line:274] - INFO: epoch 007: 982 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=8109.5, nsentences=120, sample_size=4365.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2003.9, ups=0.25, wpb=8109.5, bsz=120, num_updates=37170, lr=1.2281e-05, gnorm=0.926, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=152362 2023-05-02 20:53:10 - 
progress_bar.py[line:274] - INFO: epoch 007: 992 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7841.8, nsentences=120, sample_size=4299.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1948.7, ups=0.25, wpb=7841.8, bsz=120, num_updates=37180, lr=1.22757e-05, gnorm=0.95, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=152402 2023-05-02 20:53:49 - progress_bar.py[line:274] - INFO: epoch 007: 1002 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7684.2, nsentences=120, sample_size=4149.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1936.9, ups=0.25, wpb=7684.2, bsz=120, num_updates=37190, lr=1.22704e-05, gnorm=0.964, clip=30, loss_scale=64, train_wall=40, gb_free=29.2, wall=152442 2023-05-02 20:54:29 - progress_bar.py[line:274] - INFO: epoch 007: 1012 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7812.9, nsentences=120, sample_size=4075.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1959.6, ups=0.25, wpb=7812.9, bsz=120, num_updates=37200, lr=1.22652e-05, gnorm=0.98, clip=30, loss_scale=64, train_wall=40, gb_free=31.4, wall=152482 2023-05-02 20:55:10 - progress_bar.py[line:274] - INFO: epoch 007: 1022 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7709.3, nsentences=120, sample_size=3984.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1905.4, ups=0.25, wpb=7709.3, bsz=120, num_updates=37210, lr=1.22599e-05, gnorm=0.988, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=152522 2023-05-02 20:55:49 - progress_bar.py[line:274] - INFO: epoch 007: 1032 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7765.9, nsentences=120, sample_size=4215.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1953.3, ups=0.25, wpb=7765.9, bsz=120, num_updates=37220, lr=1.22546e-05, gnorm=0.951, clip=0, loss_scale=64, train_wall=40, gb_free=31.2, wall=152562 2023-05-02 20:56:29 - progress_bar.py[line:274] - INFO: epoch 007: 1042 / 6042 loss=2.422, loss_v1=0, loss_v2=0, 
nll_loss=1.173, ntokens=7937.4, nsentences=120, sample_size=4019.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2023.2, ups=0.25, wpb=7937.4, bsz=120, num_updates=37230, lr=1.22493e-05, gnorm=0.982, clip=50, loss_scale=64, train_wall=39, gb_free=29.5, wall=152601 2023-05-02 20:57:08 - progress_bar.py[line:274] - INFO: epoch 007: 1052 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7634, nsentences=120, sample_size=4182.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1924.5, ups=0.25, wpb=7634, bsz=120, num_updates=37240, lr=1.2244e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=152641 2023-05-02 20:57:49 - progress_bar.py[line:274] - INFO: epoch 007: 1062 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7780.4, nsentences=120, sample_size=4199.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1920.5, ups=0.25, wpb=7780.4, bsz=120, num_updates=37250, lr=1.22388e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=152681 2023-05-02 20:58:29 - progress_bar.py[line:274] - INFO: epoch 007: 1072 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7955.6, nsentences=120, sample_size=4055.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1993.1, ups=0.25, wpb=7955.6, bsz=120, num_updates=37260, lr=1.22335e-05, gnorm=0.948, clip=0, loss_scale=64, train_wall=40, gb_free=29.4, wall=152721 2023-05-02 20:59:09 - progress_bar.py[line:274] - INFO: epoch 007: 1082 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7888.6, nsentences=120, sample_size=3786.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1970.9, ups=0.25, wpb=7888.6, bsz=120, num_updates=37270, lr=1.22282e-05, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=152761 2023-05-02 20:59:50 - progress_bar.py[line:274] - INFO: epoch 007: 1092 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=8305.9, nsentences=120, sample_size=3945.8, sample_size_v1=0, 
sample_size_v2=0, ppl=2.17, wps=2029.8, ups=0.24, wpb=8305.9, bsz=120, num_updates=37280, lr=1.22229e-05, gnorm=0.999, clip=60, loss_scale=64, train_wall=41, gb_free=29.4, wall=152802 2023-05-02 21:00:30 - progress_bar.py[line:274] - INFO: epoch 007: 1102 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7791.6, nsentences=120, sample_size=4002.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1933.5, ups=0.25, wpb=7791.6, bsz=120, num_updates=37290, lr=1.22176e-05, gnorm=0.968, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=152843 2023-05-02 21:01:10 - progress_bar.py[line:274] - INFO: epoch 007: 1112 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7594, nsentences=120, sample_size=4057.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1919.9, ups=0.25, wpb=7594, bsz=120, num_updates=37300, lr=1.22123e-05, gnorm=0.984, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=152882 2023-05-02 21:01:50 - progress_bar.py[line:274] - INFO: epoch 007: 1122 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7964.5, nsentences=120, sample_size=3744.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1990.3, ups=0.25, wpb=7964.5, bsz=120, num_updates=37310, lr=1.22071e-05, gnorm=0.99, clip=50, loss_scale=64, train_wall=40, gb_free=29.8, wall=152922 2023-05-02 21:02:30 - progress_bar.py[line:274] - INFO: epoch 007: 1132 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7891, nsentences=120, sample_size=4174.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1973.9, ups=0.25, wpb=7891, bsz=120, num_updates=37320, lr=1.22018e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=30.8, wall=152962 2023-05-02 21:03:10 - progress_bar.py[line:274] - INFO: epoch 007: 1142 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=8124.6, nsentences=120, sample_size=4006.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2016.2, ups=0.25, wpb=8124.6, bsz=120, num_updates=37330, 
lr=1.21965e-05, gnorm=0.959, clip=40, loss_scale=64, train_wall=40, gb_free=29.1, wall=153002 2023-05-02 21:03:50 - progress_bar.py[line:274] - INFO: epoch 007: 1152 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7761.8, nsentences=120, sample_size=4098.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1952.5, ups=0.25, wpb=7761.8, bsz=120, num_updates=37340, lr=1.21912e-05, gnorm=0.927, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=153042 2023-05-02 21:04:29 - progress_bar.py[line:274] - INFO: epoch 007: 1162 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7897.5, nsentences=120, sample_size=3877.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2018, ups=0.26, wpb=7897.5, bsz=120, num_updates=37350, lr=1.21859e-05, gnorm=1.012, clip=50, loss_scale=64, train_wall=39, gb_free=30.7, wall=153081 2023-05-02 21:05:08 - progress_bar.py[line:274] - INFO: epoch 007: 1172 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7894.7, nsentences=120, sample_size=4014, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2022.9, ups=0.26, wpb=7894.7, bsz=120, num_updates=37360, lr=1.21806e-05, gnorm=0.978, clip=20, loss_scale=64, train_wall=39, gb_free=29.6, wall=153120 2023-05-02 21:05:48 - progress_bar.py[line:274] - INFO: epoch 007: 1182 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7921.4, nsentences=120, sample_size=4360.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1988.1, ups=0.25, wpb=7921.4, bsz=120, num_updates=37370, lr=1.21754e-05, gnorm=0.969, clip=40, loss_scale=64, train_wall=40, gb_free=28.4, wall=153160 2023-05-02 21:06:27 - progress_bar.py[line:274] - INFO: epoch 007: 1192 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7496, nsentences=120, sample_size=4266, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1892.6, ups=0.25, wpb=7496, bsz=120, num_updates=37380, lr=1.21701e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=153200 
2023-05-02 21:07:06 - progress_bar.py[line:274] - INFO: epoch 007: 1202 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7769.2, nsentences=120, sample_size=4041.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1990.1, ups=0.26, wpb=7769.2, bsz=120, num_updates=37390, lr=1.21648e-05, gnorm=0.972, clip=40, loss_scale=64, train_wall=39, gb_free=30.9, wall=153239 2023-05-02 21:07:47 - progress_bar.py[line:274] - INFO: epoch 007: 1212 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7755, nsentences=120, sample_size=4282.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1911.9, ups=0.25, wpb=7755, bsz=120, num_updates=37400, lr=1.21595e-05, gnorm=0.915, clip=10, loss_scale=64, train_wall=40, gb_free=27.6, wall=153279 2023-05-02 21:08:28 - progress_bar.py[line:274] - INFO: epoch 007: 1222 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7952.4, nsentences=120, sample_size=4374.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1950.3, ups=0.25, wpb=7952.4, bsz=120, num_updates=37410, lr=1.21542e-05, gnorm=0.921, clip=20, loss_scale=64, train_wall=41, gb_free=28.6, wall=153320 2023-05-02 21:09:08 - progress_bar.py[line:274] - INFO: epoch 007: 1232 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7792.7, nsentences=120, sample_size=3816.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1924.5, ups=0.25, wpb=7792.7, bsz=120, num_updates=37420, lr=1.2149e-05, gnorm=0.988, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=153361 2023-05-02 21:09:48 - progress_bar.py[line:274] - INFO: epoch 007: 1242 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=8069.7, nsentences=120, sample_size=3934.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2034.6, ups=0.25, wpb=8069.7, bsz=120, num_updates=37430, lr=1.21437e-05, gnorm=0.96, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=153400 2023-05-02 21:10:28 - progress_bar.py[line:274] - INFO: epoch 007: 1252 / 6042 loss=2.376, 
loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7977.4, nsentences=120, sample_size=4372.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1998.2, ups=0.25, wpb=7977.4, bsz=120, num_updates=37440, lr=1.21384e-05, gnorm=0.923, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=153440 2023-05-02 21:11:07 - progress_bar.py[line:274] - INFO: epoch 007: 1262 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7514.6, nsentences=120, sample_size=4271, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1901, ups=0.25, wpb=7514.6, bsz=120, num_updates=37450, lr=1.21331e-05, gnorm=0.978, clip=40, loss_scale=64, train_wall=39, gb_free=30.3, wall=153480 2023-05-02 21:11:47 - progress_bar.py[line:274] - INFO: epoch 007: 1272 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7788.7, nsentences=120, sample_size=4129.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1965.3, ups=0.25, wpb=7788.7, bsz=120, num_updates=37460, lr=1.21278e-05, gnorm=0.96, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=153519 2023-05-02 21:12:27 - progress_bar.py[line:274] - INFO: epoch 007: 1282 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7928.1, nsentences=120, sample_size=4128.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1971.2, ups=0.25, wpb=7928.1, bsz=120, num_updates=37470, lr=1.21225e-05, gnorm=0.925, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=153560 2023-05-02 21:13:07 - progress_bar.py[line:274] - INFO: epoch 007: 1292 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7576.3, nsentences=120, sample_size=4247.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1896.8, ups=0.25, wpb=7576.3, bsz=120, num_updates=37480, lr=1.21173e-05, gnorm=0.978, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=153600 2023-05-02 21:13:47 - progress_bar.py[line:274] - INFO: epoch 007: 1302 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7935, nsentences=120, sample_size=3944.1, 
sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1980.1, ups=0.25, wpb=7935, bsz=120, num_updates=37490, lr=1.2112e-05, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=29.1, wall=153640 2023-05-02 21:14:27 - progress_bar.py[line:274] - INFO: epoch 007: 1312 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7750.5, nsentences=120, sample_size=4064.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1926.1, ups=0.25, wpb=7750.5, bsz=120, num_updates=37500, lr=1.21067e-05, gnorm=0.974, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=153680 2023-05-02 21:15:07 - progress_bar.py[line:274] - INFO: epoch 007: 1322 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7821.5, nsentences=120, sample_size=3837.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1955.1, ups=0.25, wpb=7821.5, bsz=120, num_updates=37510, lr=1.21014e-05, gnorm=0.976, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=153720 2023-05-02 21:15:47 - progress_bar.py[line:274] - INFO: epoch 007: 1332 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7696.4, nsentences=120, sample_size=3861.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1937.7, ups=0.25, wpb=7696.4, bsz=120, num_updates=37520, lr=1.20961e-05, gnorm=0.987, clip=30, loss_scale=64, train_wall=40, gb_free=23.5, wall=153760 2023-05-02 21:16:27 - progress_bar.py[line:274] - INFO: epoch 007: 1342 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=8227.6, nsentences=120, sample_size=3986.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2049.3, ups=0.25, wpb=8227.6, bsz=120, num_updates=37530, lr=1.20909e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=153800 2023-05-02 21:17:07 - progress_bar.py[line:274] - INFO: epoch 007: 1352 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7670, nsentences=120, sample_size=3923.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1949.4, ups=0.25, wpb=7670, bsz=120, 
num_updates=37540, lr=1.20856e-05, gnorm=0.958, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=153839 2023-05-02 21:17:46 - progress_bar.py[line:274] - INFO: epoch 007: 1362 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7456.2, nsentences=120, sample_size=3694.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1890.1, ups=0.25, wpb=7456.2, bsz=120, num_updates=37550, lr=1.20803e-05, gnorm=1.012, clip=50, loss_scale=64, train_wall=39, gb_free=29.3, wall=153879 2023-05-02 21:18:25 - progress_bar.py[line:274] - INFO: epoch 007: 1372 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7894.7, nsentences=120, sample_size=3745.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2014.1, ups=0.26, wpb=7894.7, bsz=120, num_updates=37560, lr=1.2075e-05, gnorm=0.999, clip=50, loss_scale=64, train_wall=39, gb_free=30.4, wall=153918 2023-05-02 21:19:06 - progress_bar.py[line:274] - INFO: epoch 007: 1382 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7654.5, nsentences=120, sample_size=4396.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1883, ups=0.25, wpb=7654.5, bsz=120, num_updates=37570, lr=1.20697e-05, gnorm=0.932, clip=20, loss_scale=128, train_wall=41, gb_free=29.6, wall=153958 2023-05-02 21:19:46 - progress_bar.py[line:274] - INFO: epoch 007: 1392 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7986, nsentences=120, sample_size=4002.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2010.4, ups=0.25, wpb=7986, bsz=120, num_updates=37580, lr=1.20644e-05, gnorm=0.981, clip=40, loss_scale=128, train_wall=40, gb_free=29.7, wall=153998 2023-05-02 21:20:26 - progress_bar.py[line:274] - INFO: epoch 007: 1402 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7586.5, nsentences=120, sample_size=3887.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1901.5, ups=0.25, wpb=7586.5, bsz=120, num_updates=37590, lr=1.20592e-05, gnorm=0.999, clip=50, loss_scale=128, train_wall=40, 
gb_free=29.8, wall=154038 2023-05-02 21:20:45 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 21:21:09 - progress_bar.py[line:274] - INFO: epoch 007: 1413 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7488.5, nsentences=120, sample_size=4046.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1709.4, ups=0.23, wpb=7488.5, bsz=120, num_updates=37600, lr=1.20539e-05, gnorm=0.968, clip=40, loss_scale=64, train_wall=44, gb_free=30.6, wall=154082 2023-05-02 21:21:49 - progress_bar.py[line:274] - INFO: epoch 007: 1423 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7898.9, nsentences=120, sample_size=3664.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1975.1, ups=0.25, wpb=7898.9, bsz=120, num_updates=37610, lr=1.20486e-05, gnorm=1.006, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=154122 2023-05-02 21:22:30 - progress_bar.py[line:274] - INFO: epoch 007: 1433 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=8125.8, nsentences=120, sample_size=3980.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2014.3, ups=0.25, wpb=8125.8, bsz=120, num_updates=37620, lr=1.20433e-05, gnorm=0.96, clip=10, loss_scale=64, train_wall=40, gb_free=30.5, wall=154162 2023-05-02 21:23:09 - progress_bar.py[line:274] - INFO: epoch 007: 1443 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7624, nsentences=120, sample_size=3812.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1922.1, ups=0.25, wpb=7624, bsz=120, num_updates=37630, lr=1.2038e-05, gnorm=1.003, clip=50, loss_scale=64, train_wall=40, gb_free=28.9, wall=154202 2023-05-02 21:23:48 - progress_bar.py[line:274] - INFO: epoch 007: 1453 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7506.8, nsentences=120, sample_size=4143.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1929.8, ups=0.26, wpb=7506.8, bsz=120, num_updates=37640, lr=1.20327e-05, gnorm=0.955, 
clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=154241 2023-05-02 21:24:28 - progress_bar.py[line:274] - INFO: epoch 007: 1463 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=8202.3, nsentences=120, sample_size=4103.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2060.5, ups=0.25, wpb=8202.3, bsz=120, num_updates=37650, lr=1.20275e-05, gnorm=0.95, clip=10, loss_scale=64, train_wall=40, gb_free=23.6, wall=154281 2023-05-02 21:25:08 - progress_bar.py[line:274] - INFO: epoch 007: 1473 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7936.2, nsentences=120, sample_size=4181.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2004.2, ups=0.25, wpb=7936.2, bsz=120, num_updates=37660, lr=1.20222e-05, gnorm=0.961, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=154320 2023-05-02 21:25:48 - progress_bar.py[line:274] - INFO: epoch 007: 1483 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7804, nsentences=120, sample_size=3910.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1955.5, ups=0.25, wpb=7804, bsz=120, num_updates=37670, lr=1.20169e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=154360 2023-05-02 21:26:27 - progress_bar.py[line:274] - INFO: epoch 007: 1493 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7534.5, nsentences=119.2, sample_size=4437.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1896.2, ups=0.25, wpb=7534.5, bsz=119.2, num_updates=37680, lr=1.20116e-05, gnorm=0.917, clip=0, loss_scale=64, train_wall=40, gb_free=28, wall=154400 2023-05-02 21:27:08 - progress_bar.py[line:274] - INFO: epoch 007: 1503 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=8068.1, nsentences=120, sample_size=3854.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2005.8, ups=0.25, wpb=8068.1, bsz=120, num_updates=37690, lr=1.20063e-05, gnorm=0.988, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=154440 2023-05-02 21:27:47 - 
progress_bar.py[line:274] - INFO: epoch 007: 1513 / 6042 loss=2.433, loss_v1=0, loss_v2=0, nll_loss=1.181, ntokens=7750, nsentences=120, sample_size=4324, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1976.9, ups=0.26, wpb=7750, bsz=120, num_updates=37700, lr=1.20011e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=154479 2023-05-02 21:28:26 - progress_bar.py[line:274] - INFO: epoch 007: 1523 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7320.5, nsentences=120, sample_size=4224.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1858.2, ups=0.25, wpb=7320.5, bsz=120, num_updates=37710, lr=1.19958e-05, gnorm=0.944, clip=0, loss_scale=64, train_wall=39, gb_free=29.8, wall=154519 2023-05-02 21:29:07 - progress_bar.py[line:274] - INFO: epoch 007: 1533 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7693.2, nsentences=120, sample_size=4211.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1895.4, ups=0.25, wpb=7693.2, bsz=120, num_updates=37720, lr=1.19905e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=41, gb_free=31.7, wall=154559 2023-05-02 21:29:46 - progress_bar.py[line:274] - INFO: epoch 007: 1543 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7941.4, nsentences=120, sample_size=4004.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2002.7, ups=0.25, wpb=7941.4, bsz=120, num_updates=37730, lr=1.19852e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=154599 2023-05-02 21:30:26 - progress_bar.py[line:274] - INFO: epoch 007: 1553 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7714.6, nsentences=120, sample_size=3887.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1931.9, ups=0.25, wpb=7714.6, bsz=120, num_updates=37740, lr=1.19799e-05, gnorm=0.981, clip=50, loss_scale=64, train_wall=40, gb_free=29.8, wall=154639 2023-05-02 21:31:06 - progress_bar.py[line:274] - INFO: epoch 007: 1563 / 6042 loss=2.367, loss_v1=0, loss_v2=0, 
nll_loss=1.104, ntokens=7494.8, nsentences=120, sample_size=3994.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1893.5, ups=0.25, wpb=7494.8, bsz=120, num_updates=37750, lr=1.19746e-05, gnorm=0.966, clip=40, loss_scale=64, train_wall=40, gb_free=27.8, wall=154678 2023-05-02 21:31:46 - progress_bar.py[line:274] - INFO: epoch 007: 1573 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7582.5, nsentences=120, sample_size=4174.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1885.9, ups=0.25, wpb=7582.5, bsz=120, num_updates=37760, lr=1.19694e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=154719 2023-05-02 21:32:27 - progress_bar.py[line:274] - INFO: epoch 007: 1583 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7775.2, nsentences=120, sample_size=4056.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1917, ups=0.25, wpb=7775.2, bsz=120, num_updates=37770, lr=1.19641e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=154759 2023-05-02 21:33:08 - progress_bar.py[line:274] - INFO: epoch 007: 1593 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7750.1, nsentences=120, sample_size=4232.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1897.4, ups=0.24, wpb=7750.1, bsz=120, num_updates=37780, lr=1.19588e-05, gnorm=0.927, clip=10, loss_scale=64, train_wall=41, gb_free=30.3, wall=154800 2023-05-02 21:33:47 - progress_bar.py[line:274] - INFO: epoch 007: 1603 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7593.4, nsentences=120, sample_size=3694.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1924.4, ups=0.25, wpb=7593.4, bsz=120, num_updates=37790, lr=1.19535e-05, gnorm=1.01, clip=60, loss_scale=64, train_wall=39, gb_free=30.9, wall=154839 2023-05-02 21:34:28 - progress_bar.py[line:274] - INFO: epoch 007: 1613 / 6042 loss=2.451, loss_v1=0, loss_v2=0, nll_loss=1.205, ntokens=8034.1, nsentences=120, sample_size=3955.9, sample_size_v1=0, 
sample_size_v2=0, ppl=2.3, wps=1973.3, ups=0.25, wpb=8034.1, bsz=120, num_updates=37800, lr=1.19482e-05, gnorm=0.966, clip=50, loss_scale=64, train_wall=41, gb_free=29.4, wall=154880 2023-05-02 21:35:07 - progress_bar.py[line:274] - INFO: epoch 007: 1623 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7897.9, nsentences=120, sample_size=4297.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2007.7, ups=0.25, wpb=7897.9, bsz=120, num_updates=37810, lr=1.1943e-05, gnorm=0.936, clip=20, loss_scale=64, train_wall=39, gb_free=31.4, wall=154919 2023-05-02 21:35:47 - progress_bar.py[line:274] - INFO: epoch 007: 1633 / 6042 loss=2.434, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7682.1, nsentences=120, sample_size=4034.9, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1934.3, ups=0.25, wpb=7682.1, bsz=120, num_updates=37820, lr=1.19377e-05, gnorm=0.975, clip=30, loss_scale=64, train_wall=40, gb_free=26.4, wall=154959 2023-05-02 21:36:26 - progress_bar.py[line:274] - INFO: epoch 007: 1643 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=8005.1, nsentences=120, sample_size=4113.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2015.1, ups=0.25, wpb=8005.1, bsz=120, num_updates=37830, lr=1.19324e-05, gnorm=0.959, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=154999 2023-05-02 21:37:06 - progress_bar.py[line:274] - INFO: epoch 007: 1653 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7704.7, nsentences=120, sample_size=4235.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1941.2, ups=0.25, wpb=7704.7, bsz=120, num_updates=37840, lr=1.19271e-05, gnorm=0.941, clip=10, loss_scale=64, train_wall=40, gb_free=29.4, wall=155039 2023-05-02 21:37:45 - progress_bar.py[line:274] - INFO: epoch 007: 1663 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7260.3, nsentences=120, sample_size=4024.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1851.2, ups=0.25, wpb=7260.3, bsz=120, 
num_updates=37850, lr=1.19218e-05, gnorm=0.951, clip=20, loss_scale=64, train_wall=39, gb_free=29.2, wall=155078 2023-05-02 21:38:25 - progress_bar.py[line:274] - INFO: epoch 007: 1673 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7824, nsentences=120, sample_size=3975.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1984.7, ups=0.25, wpb=7824, bsz=120, num_updates=37860, lr=1.19165e-05, gnorm=0.979, clip=40, loss_scale=64, train_wall=39, gb_free=30.3, wall=155117 2023-05-02 21:39:05 - progress_bar.py[line:274] - INFO: epoch 007: 1683 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7508.7, nsentences=120, sample_size=4328.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1873.5, ups=0.25, wpb=7508.7, bsz=120, num_updates=37870, lr=1.19113e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=155157 2023-05-02 21:39:44 - progress_bar.py[line:274] - INFO: epoch 007: 1693 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7730.5, nsentences=120, sample_size=3867.4, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1964.8, ups=0.25, wpb=7730.5, bsz=120, num_updates=37880, lr=1.1906e-05, gnorm=0.972, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=155197 2023-05-02 21:40:25 - progress_bar.py[line:274] - INFO: epoch 007: 1703 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7815.4, nsentences=120, sample_size=4235.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1930.8, ups=0.25, wpb=7815.4, bsz=120, num_updates=37890, lr=1.19007e-05, gnorm=0.948, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=155237 2023-05-02 21:41:05 - progress_bar.py[line:274] - INFO: epoch 007: 1713 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7775.1, nsentences=120, sample_size=3808.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1935.3, ups=0.25, wpb=7775.1, bsz=120, num_updates=37900, lr=1.18954e-05, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, 
gb_free=30.3, wall=155277 2023-05-02 21:41:45 - progress_bar.py[line:274] - INFO: epoch 007: 1723 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7638.2, nsentences=120, sample_size=4285.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1896.2, ups=0.25, wpb=7638.2, bsz=120, num_updates=37910, lr=1.18901e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=155318 2023-05-02 21:42:26 - progress_bar.py[line:274] - INFO: epoch 007: 1733 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7824.6, nsentences=120, sample_size=3768.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1916.9, ups=0.24, wpb=7824.6, bsz=120, num_updates=37920, lr=1.18848e-05, gnorm=0.972, clip=40, loss_scale=64, train_wall=41, gb_free=30.4, wall=155358 2023-05-02 21:43:05 - progress_bar.py[line:274] - INFO: epoch 007: 1743 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7674.7, nsentences=120, sample_size=3909.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1943.2, ups=0.25, wpb=7674.7, bsz=120, num_updates=37930, lr=1.18796e-05, gnorm=0.982, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=155398 2023-05-02 21:43:45 - progress_bar.py[line:274] - INFO: epoch 007: 1753 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7812.5, nsentences=120, sample_size=4031.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1962.7, ups=0.25, wpb=7812.5, bsz=120, num_updates=37940, lr=1.18743e-05, gnorm=0.944, clip=10, loss_scale=64, train_wall=40, gb_free=28.6, wall=155438 2023-05-02 21:44:25 - progress_bar.py[line:274] - INFO: epoch 007: 1763 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7674.3, nsentences=120, sample_size=3916.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1941.3, ups=0.25, wpb=7674.3, bsz=120, num_updates=37950, lr=1.1869e-05, gnorm=0.987, clip=50, loss_scale=64, train_wall=39, gb_free=30.5, wall=155477 2023-05-02 21:45:04 - progress_bar.py[line:274] - INFO: epoch 
007: 1773 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7756.2, nsentences=120, sample_size=3924.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1969.9, ups=0.25, wpb=7756.2, bsz=120, num_updates=37960, lr=1.18637e-05, gnorm=0.993, clip=50, loss_scale=64, train_wall=39, gb_free=27.6, wall=155517 2023-05-02 21:45:44 - progress_bar.py[line:274] - INFO: epoch 007: 1783 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7724.6, nsentences=120, sample_size=4102, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1949, ups=0.25, wpb=7724.6, bsz=120, num_updates=37970, lr=1.18584e-05, gnorm=0.972, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=155556 2023-05-02 21:46:24 - progress_bar.py[line:274] - INFO: epoch 007: 1793 / 6042 loss=2.447, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7820.7, nsentences=120, sample_size=4049.3, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1945.7, ups=0.25, wpb=7820.7, bsz=120, num_updates=37980, lr=1.18532e-05, gnorm=0.933, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=155596 2023-05-02 21:47:05 - progress_bar.py[line:274] - INFO: epoch 007: 1803 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7582.6, nsentences=120, sample_size=4080.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1862.8, ups=0.25, wpb=7582.6, bsz=120, num_updates=37990, lr=1.18479e-05, gnorm=0.963, clip=30, loss_scale=64, train_wall=41, gb_free=29.5, wall=155637 2023-05-02 21:47:45 - progress_bar.py[line:274] - INFO: epoch 007: 1813 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7404.9, nsentences=120, sample_size=3877.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1854, ups=0.25, wpb=7404.9, bsz=120, num_updates=38000, lr=1.18426e-05, gnorm=0.989, clip=40, loss_scale=64, train_wall=40, gb_free=29.1, wall=155677 2023-05-02 21:47:45 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 21:47:47 - task_invig.py[line:181] - INFO: example 
hyp(gen): sure. region: 2023-05-02 21:47:47 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 21:47:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:47:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:47:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:47:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:47:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:47:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:47:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:47:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:47:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:47:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:47:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:47:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:47:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:47:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:47:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:47:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:47:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:47:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:47:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:47:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:47:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:47:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:47:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:47:59 - task_invig.py[line:182] - INFO: example 
ref(gt): region: 2023-05-02 21:48:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:04 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 21:48:04 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 21:48:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:12 - task_invig.py[line:181] - INFO: example 
hyp(gen): region: 2023-05-02 21:48:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:16 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 21:48:16 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 21:48:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:23 - task_invig.py[line:182] - INFO: example 
ref(gt): region: 2023-05-02 21:48:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:27 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 21:48:27 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 21:48:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:31 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 21:48:31 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 21:48:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:36 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 21:48:36 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 21:48:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 21:48:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 21:48:36 - progress_bar.py[line:282] - INFO: epoch 007 | valid on 'valid' subset | loss 3.239 | loss_v1 0 | loss_v2 0 | nll_loss 2.072 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.21 | score 0.7563 | wps 3284 | wpb 3202.1 | bsz 39.4 | num_updates 38000 | best_score 0.7627 2023-05-02 21:48:36 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 7 @ 38000 updates 2023-05-02 21:48:36 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_38000.pt 2023-05-02 21:49:02 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_38000.pt 2023-05-02 21:49:16 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_38000.pt (epoch 7 @ 38000 updates, 
score 0.7563) (writing took 39.54747355496511 seconds) 2023-05-02 21:49:55 - progress_bar.py[line:274] - INFO: epoch 007: 1823 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7524.7, nsentences=120, sample_size=3998.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=577.6, ups=0.08, wpb=7524.7, bsz=120, num_updates=38010, lr=1.18373e-05, gnorm=0.989, clip=50, loss_scale=64, train_wall=39, gb_free=31.2, wall=155807 2023-05-02 21:50:34 - progress_bar.py[line:274] - INFO: epoch 007: 1833 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7826, nsentences=120, sample_size=4346.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1979.9, ups=0.25, wpb=7826, bsz=120, num_updates=38020, lr=1.1832e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=39, gb_free=29.2, wall=155847 2023-05-02 21:51:14 - progress_bar.py[line:274] - INFO: epoch 007: 1843 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7507.4, nsentences=120, sample_size=4159.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1881, ups=0.25, wpb=7507.4, bsz=120, num_updates=38030, lr=1.18267e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=155887 2023-05-02 21:51:54 - progress_bar.py[line:274] - INFO: epoch 007: 1853 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7694, nsentences=120, sample_size=4159.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1963.6, ups=0.26, wpb=7694, bsz=120, num_updates=38040, lr=1.18215e-05, gnorm=0.912, clip=10, loss_scale=64, train_wall=39, gb_free=29.6, wall=155926 2023-05-02 21:52:33 - progress_bar.py[line:274] - INFO: epoch 007: 1863 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7634.6, nsentences=120, sample_size=4075.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1915.5, ups=0.25, wpb=7634.6, bsz=120, num_updates=38050, lr=1.18162e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=155966 2023-05-02 21:53:13 - 
progress_bar.py[line:274] - INFO: epoch 007: 1873 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7497.6, nsentences=120, sample_size=4042.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1907.7, ups=0.25, wpb=7497.6, bsz=120, num_updates=38060, lr=1.18109e-05, gnorm=0.978, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=156005 2023-05-02 21:53:53 - progress_bar.py[line:274] - INFO: epoch 007: 1883 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7543.6, nsentences=120, sample_size=4116.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1870.8, ups=0.25, wpb=7543.6, bsz=120, num_updates=38070, lr=1.18056e-05, gnorm=0.97, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=156046 2023-05-02 21:54:33 - progress_bar.py[line:274] - INFO: epoch 007: 1893 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=8004.7, nsentences=120, sample_size=4225.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2028.7, ups=0.25, wpb=8004.7, bsz=120, num_updates=38080, lr=1.18003e-05, gnorm=0.918, clip=0, loss_scale=64, train_wall=39, gb_free=30.4, wall=156085 2023-05-02 21:55:12 - progress_bar.py[line:274] - INFO: epoch 007: 1903 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7650.2, nsentences=120, sample_size=4088.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1945.2, ups=0.25, wpb=7650.2, bsz=120, num_updates=38090, lr=1.17951e-05, gnorm=0.98, clip=10, loss_scale=64, train_wall=39, gb_free=30, wall=156124 2023-05-02 21:55:51 - progress_bar.py[line:274] - INFO: epoch 007: 1913 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7624, nsentences=120, sample_size=4119.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1927.7, ups=0.25, wpb=7624, bsz=120, num_updates=38100, lr=1.17898e-05, gnorm=0.962, clip=20, loss_scale=64, train_wall=39, gb_free=30, wall=156164 2023-05-02 21:56:32 - progress_bar.py[line:274] - INFO: epoch 007: 1923 / 6042 loss=2.386, loss_v1=0, loss_v2=0, 
nll_loss=1.127, ntokens=7950.3, nsentences=120, sample_size=4010, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1979, ups=0.25, wpb=7950.3, bsz=120, num_updates=38110, lr=1.17845e-05, gnorm=0.975, clip=20, loss_scale=128, train_wall=40, gb_free=29.4, wall=156204 2023-05-02 21:57:11 - progress_bar.py[line:274] - INFO: epoch 007: 1933 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7675.3, nsentences=120, sample_size=4170.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1947.5, ups=0.25, wpb=7675.3, bsz=120, num_updates=38120, lr=1.17792e-05, gnorm=0.937, clip=10, loss_scale=128, train_wall=39, gb_free=31, wall=156243 2023-05-02 21:57:51 - progress_bar.py[line:274] - INFO: epoch 007: 1943 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7671.5, nsentences=120, sample_size=3806.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1908.4, ups=0.25, wpb=7671.5, bsz=120, num_updates=38130, lr=1.17739e-05, gnorm=1.017, clip=30, loss_scale=128, train_wall=40, gb_free=30.3, wall=156284 2023-05-02 21:58:30 - progress_bar.py[line:274] - INFO: epoch 007: 1953 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7458.2, nsentences=120, sample_size=4213.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1900.1, ups=0.25, wpb=7458.2, bsz=120, num_updates=38140, lr=1.17686e-05, gnorm=0.946, clip=20, loss_scale=128, train_wall=39, gb_free=31, wall=156323 2023-05-02 21:59:10 - progress_bar.py[line:274] - INFO: epoch 007: 1963 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7624.6, nsentences=120, sample_size=3989.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1935.7, ups=0.25, wpb=7624.6, bsz=120, num_updates=38150, lr=1.17634e-05, gnorm=0.976, clip=40, loss_scale=128, train_wall=39, gb_free=29.2, wall=156362 2023-05-02 21:59:49 - progress_bar.py[line:274] - INFO: epoch 007: 1973 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7474.8, nsentences=120, sample_size=3814.4, sample_size_v1=0, 
sample_size_v2=0, ppl=2.2, wps=1898, ups=0.25, wpb=7474.8, bsz=120, num_updates=38160, lr=1.17581e-05, gnorm=1.001, clip=30, loss_scale=128, train_wall=39, gb_free=30.6, wall=156402 2023-05-02 22:00:29 - progress_bar.py[line:274] - INFO: epoch 007: 1983 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7649, nsentences=120, sample_size=3916.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1914.8, ups=0.25, wpb=7649, bsz=120, num_updates=38170, lr=1.17528e-05, gnorm=0.987, clip=40, loss_scale=128, train_wall=40, gb_free=30.2, wall=156442 2023-05-02 22:01:09 - progress_bar.py[line:274] - INFO: epoch 007: 1993 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7598.2, nsentences=120, sample_size=3854.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1896.1, ups=0.25, wpb=7598.2, bsz=120, num_updates=38180, lr=1.17475e-05, gnorm=1, clip=30, loss_scale=128, train_wall=40, gb_free=30.3, wall=156482 2023-05-02 22:01:46 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 22:01:54 - progress_bar.py[line:274] - INFO: epoch 007: 2004 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7603, nsentences=120, sample_size=3946.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1711.9, ups=0.23, wpb=7603, bsz=120, num_updates=38190, lr=1.17422e-05, gnorm=0.963, clip=40, loss_scale=64, train_wall=44, gb_free=30.2, wall=156526 2023-05-02 22:02:34 - progress_bar.py[line:274] - INFO: epoch 007: 2014 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7696.3, nsentences=120, sample_size=4072.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1908.6, ups=0.25, wpb=7696.3, bsz=120, num_updates=38200, lr=1.17369e-05, gnorm=0.948, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=156566 2023-05-02 22:03:15 - progress_bar.py[line:274] - INFO: epoch 007: 2024 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7719.6, nsentences=120, 
sample_size=4439.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1902.7, ups=0.25, wpb=7719.6, bsz=120, num_updates=38210, lr=1.17317e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=29.3, wall=156607 2023-05-02 22:03:55 - progress_bar.py[line:274] - INFO: epoch 007: 2034 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=8064.4, nsentences=120, sample_size=3938.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1986.3, ups=0.25, wpb=8064.4, bsz=120, num_updates=38220, lr=1.17264e-05, gnorm=0.983, clip=30, loss_scale=64, train_wall=41, gb_free=30.8, wall=156648 2023-05-02 22:04:35 - progress_bar.py[line:274] - INFO: epoch 007: 2044 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7903.7, nsentences=120, sample_size=3997.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1967.1, ups=0.25, wpb=7903.7, bsz=120, num_updates=38230, lr=1.17211e-05, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=156688 2023-05-02 22:05:14 - progress_bar.py[line:274] - INFO: epoch 007: 2054 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7951.3, nsentences=120, sample_size=3793.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2040.4, ups=0.26, wpb=7951.3, bsz=120, num_updates=38240, lr=1.17158e-05, gnorm=0.986, clip=20, loss_scale=64, train_wall=39, gb_free=30.8, wall=156727 2023-05-02 22:05:55 - progress_bar.py[line:274] - INFO: epoch 007: 2064 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7617.1, nsentences=120, sample_size=4093.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1891.1, ups=0.25, wpb=7617.1, bsz=120, num_updates=38250, lr=1.17105e-05, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=156767 2023-05-02 22:06:35 - progress_bar.py[line:274] - INFO: epoch 007: 2074 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7709.6, nsentences=120, sample_size=4088.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1924.7, ups=0.25, 
wpb=7709.6, bsz=120, num_updates=38260, lr=1.17053e-05, gnorm=0.968, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=156807 2023-05-02 22:07:14 - progress_bar.py[line:274] - INFO: epoch 007: 2084 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7841.3, nsentences=120, sample_size=4059.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2005.9, ups=0.26, wpb=7841.3, bsz=120, num_updates=38270, lr=1.17e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=39, gb_free=28.4, wall=156846 2023-05-02 22:07:54 - progress_bar.py[line:274] - INFO: epoch 007: 2094 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7598.6, nsentences=120, sample_size=3920.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1895, ups=0.25, wpb=7598.6, bsz=120, num_updates=38280, lr=1.16947e-05, gnorm=1.012, clip=60, loss_scale=64, train_wall=40, gb_free=27.7, wall=156886 2023-05-02 22:08:34 - progress_bar.py[line:274] - INFO: epoch 007: 2104 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7721, nsentences=120, sample_size=4020.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1922.2, ups=0.25, wpb=7721, bsz=120, num_updates=38290, lr=1.16894e-05, gnorm=1.005, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=156926 2023-05-02 22:09:14 - progress_bar.py[line:274] - INFO: epoch 007: 2114 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=8019.9, nsentences=120, sample_size=3750.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2017.9, ups=0.25, wpb=8019.9, bsz=120, num_updates=38300, lr=1.16841e-05, gnorm=0.998, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=156966 2023-05-02 22:09:54 - progress_bar.py[line:274] - INFO: epoch 007: 2124 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7925.8, nsentences=120, sample_size=3987.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1945.4, ups=0.25, wpb=7925.8, bsz=120, num_updates=38310, lr=1.16788e-05, gnorm=0.973, clip=30, loss_scale=64, 
train_wall=41, gb_free=27.3, wall=157007 2023-05-02 22:10:35 - progress_bar.py[line:274] - INFO: epoch 007: 2134 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7766.1, nsentences=120, sample_size=3795.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1931.8, ups=0.25, wpb=7766.1, bsz=120, num_updates=38320, lr=1.16736e-05, gnorm=1.025, clip=50, loss_scale=64, train_wall=40, gb_free=28, wall=157047 2023-05-02 22:11:15 - progress_bar.py[line:274] - INFO: epoch 007: 2144 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7658.8, nsentences=120, sample_size=4132, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1907.4, ups=0.25, wpb=7658.8, bsz=120, num_updates=38330, lr=1.16683e-05, gnorm=0.974, clip=50, loss_scale=64, train_wall=40, gb_free=31.6, wall=157087 2023-05-02 22:11:54 - progress_bar.py[line:274] - INFO: epoch 007: 2154 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7721, nsentences=120, sample_size=3880.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1980.1, ups=0.26, wpb=7721, bsz=120, num_updates=38340, lr=1.1663e-05, gnorm=0.977, clip=50, loss_scale=64, train_wall=39, gb_free=29.6, wall=157126 2023-05-02 22:12:34 - progress_bar.py[line:274] - INFO: epoch 007: 2164 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7644.3, nsentences=120, sample_size=4090.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1892, ups=0.25, wpb=7644.3, bsz=120, num_updates=38350, lr=1.16577e-05, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=157167 2023-05-02 22:13:13 - progress_bar.py[line:274] - INFO: epoch 007: 2174 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7640.4, nsentences=120, sample_size=4018.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1954.1, ups=0.26, wpb=7640.4, bsz=120, num_updates=38360, lr=1.16524e-05, gnorm=0.97, clip=20, loss_scale=64, train_wall=39, gb_free=27.6, wall=157206 2023-05-02 22:13:53 - progress_bar.py[line:274] - INFO: 
epoch 007: 2184 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=8027, nsentences=120, sample_size=3804.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2010.1, ups=0.25, wpb=8027, bsz=120, num_updates=38370, lr=1.16472e-05, gnorm=0.991, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=157246 2023-05-02 22:14:32 - progress_bar.py[line:274] - INFO: epoch 007: 2194 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7530.5, nsentences=120, sample_size=4171, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1921.6, ups=0.26, wpb=7530.5, bsz=120, num_updates=38380, lr=1.16419e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=157285 2023-05-02 22:15:13 - progress_bar.py[line:274] - INFO: epoch 007: 2204 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7716.6, nsentences=120, sample_size=3952.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1920.7, ups=0.25, wpb=7716.6, bsz=120, num_updates=38390, lr=1.16366e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=157325 2023-05-02 22:15:52 - progress_bar.py[line:274] - INFO: epoch 007: 2214 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7506.1, nsentences=120, sample_size=3970, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1899.3, ups=0.25, wpb=7506.1, bsz=120, num_updates=38400, lr=1.16313e-05, gnorm=0.976, clip=20, loss_scale=64, train_wall=39, gb_free=30, wall=157365 2023-05-02 22:16:31 - progress_bar.py[line:274] - INFO: epoch 007: 2224 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7837.9, nsentences=120, sample_size=3957.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1993.9, ups=0.25, wpb=7837.9, bsz=120, num_updates=38410, lr=1.1626e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=39, gb_free=28.9, wall=157404 2023-05-02 22:17:11 - progress_bar.py[line:274] - INFO: epoch 007: 2234 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.198, ntokens=7613.7, 
nsentences=120, sample_size=4316.6, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1911.7, ups=0.25, wpb=7613.7, bsz=120, num_updates=38420, lr=1.16207e-05, gnorm=0.94, clip=20, loss_scale=64, train_wall=40, gb_free=31.4, wall=157444 2023-05-02 22:17:51 - progress_bar.py[line:274] - INFO: epoch 007: 2244 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7381.1, nsentences=120, sample_size=4225.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1839.4, ups=0.25, wpb=7381.1, bsz=120, num_updates=38430, lr=1.16155e-05, gnorm=0.94, clip=0, loss_scale=64, train_wall=40, gb_free=29.9, wall=157484 2023-05-02 22:18:31 - progress_bar.py[line:274] - INFO: epoch 007: 2254 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7679.4, nsentences=120, sample_size=4048.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1917.6, ups=0.25, wpb=7679.4, bsz=120, num_updates=38440, lr=1.16102e-05, gnorm=0.984, clip=40, loss_scale=64, train_wall=40, gb_free=29.4, wall=157524 2023-05-02 22:19:11 - progress_bar.py[line:274] - INFO: epoch 007: 2264 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7909.7, nsentences=120, sample_size=3932, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1979, ups=0.25, wpb=7909.7, bsz=120, num_updates=38450, lr=1.16049e-05, gnorm=0.965, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=157564 2023-05-02 22:19:51 - progress_bar.py[line:274] - INFO: epoch 007: 2274 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7903, nsentences=120, sample_size=4017.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2008.5, ups=0.25, wpb=7903, bsz=120, num_updates=38460, lr=1.15996e-05, gnorm=0.958, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=157603 2023-05-02 22:20:32 - progress_bar.py[line:274] - INFO: epoch 007: 2284 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7750.8, nsentences=120, sample_size=4013.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1877.6, 
ups=0.24, wpb=7750.8, bsz=120, num_updates=38470, lr=1.15943e-05, gnorm=0.95, clip=40, loss_scale=64, train_wall=41, gb_free=28.5, wall=157645 2023-05-02 22:21:12 - progress_bar.py[line:274] - INFO: epoch 007: 2294 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7551.9, nsentences=120, sample_size=4073.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1887.3, ups=0.25, wpb=7551.9, bsz=120, num_updates=38480, lr=1.1589e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, gb_free=27.8, wall=157685 2023-05-02 22:21:51 - progress_bar.py[line:274] - INFO: epoch 007: 2304 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7619.9, nsentences=120, sample_size=4088.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1944.8, ups=0.26, wpb=7619.9, bsz=120, num_updates=38490, lr=1.15838e-05, gnorm=0.971, clip=50, loss_scale=64, train_wall=39, gb_free=30.6, wall=157724 2023-05-02 22:22:31 - progress_bar.py[line:274] - INFO: epoch 007: 2314 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7572, nsentences=120, sample_size=3808.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1909, ups=0.25, wpb=7572, bsz=120, num_updates=38500, lr=1.15785e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=29, wall=157763 2023-05-02 22:23:10 - progress_bar.py[line:274] - INFO: epoch 007: 2324 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7621.5, nsentences=120, sample_size=3750.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1952.1, ups=0.26, wpb=7621.5, bsz=120, num_updates=38510, lr=1.15732e-05, gnorm=1.004, clip=40, loss_scale=64, train_wall=39, gb_free=29.3, wall=157802 2023-05-02 22:23:50 - progress_bar.py[line:274] - INFO: epoch 007: 2334 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7677.2, nsentences=120, sample_size=4167.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1899.8, ups=0.25, wpb=7677.2, bsz=120, num_updates=38520, lr=1.15679e-05, gnorm=0.949, clip=30, 
loss_scale=64, train_wall=40, gb_free=29.9, wall=157843 2023-05-02 22:24:30 - progress_bar.py[line:274] - INFO: epoch 007: 2344 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7963.6, nsentences=120, sample_size=3766.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1996.3, ups=0.25, wpb=7963.6, bsz=120, num_updates=38530, lr=1.15626e-05, gnorm=0.984, clip=20, loss_scale=64, train_wall=40, gb_free=25.1, wall=157883 2023-05-02 22:25:10 - progress_bar.py[line:274] - INFO: epoch 007: 2354 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7661.4, nsentences=120, sample_size=3935.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1919.6, ups=0.25, wpb=7661.4, bsz=120, num_updates=38540, lr=1.15574e-05, gnorm=0.98, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=157923 2023-05-02 22:25:50 - progress_bar.py[line:274] - INFO: epoch 007: 2364 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=8018.3, nsentences=120, sample_size=3931.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2033.2, ups=0.25, wpb=8018.3, bsz=120, num_updates=38550, lr=1.15521e-05, gnorm=0.988, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=157962 2023-05-02 22:26:30 - progress_bar.py[line:274] - INFO: epoch 007: 2374 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7646.6, nsentences=120, sample_size=3889.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1915.2, ups=0.25, wpb=7646.6, bsz=120, num_updates=38560, lr=1.15468e-05, gnorm=0.97, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=158002 2023-05-02 22:27:09 - progress_bar.py[line:274] - INFO: epoch 007: 2384 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7739.3, nsentences=120, sample_size=4064.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1948.4, ups=0.25, wpb=7739.3, bsz=120, num_updates=38570, lr=1.15415e-05, gnorm=0.946, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=158042 2023-05-02 22:27:50 - 
progress_bar.py[line:274] - INFO: epoch 007: 2394 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7799.2, nsentences=120, sample_size=4161.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1919.2, ups=0.25, wpb=7799.2, bsz=120, num_updates=38580, lr=1.15362e-05, gnorm=0.959, clip=50, loss_scale=64, train_wall=41, gb_free=29.7, wall=158082 2023-05-02 22:28:30 - progress_bar.py[line:274] - INFO: epoch 007: 2404 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7880.9, nsentences=120, sample_size=3809.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1946.6, ups=0.25, wpb=7880.9, bsz=120, num_updates=38590, lr=1.15309e-05, gnorm=0.993, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=158123 2023-05-02 22:29:10 - progress_bar.py[line:274] - INFO: epoch 007: 2414 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7774.7, nsentences=120, sample_size=4243, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1977.2, ups=0.25, wpb=7774.7, bsz=120, num_updates=38600, lr=1.15257e-05, gnorm=0.963, clip=40, loss_scale=64, train_wall=39, gb_free=30.7, wall=158162 2023-05-02 22:29:49 - progress_bar.py[line:274] - INFO: epoch 007: 2424 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7640.7, nsentences=120, sample_size=3824.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1929.6, ups=0.25, wpb=7640.7, bsz=120, num_updates=38610, lr=1.15204e-05, gnorm=0.992, clip=50, loss_scale=64, train_wall=40, gb_free=31, wall=158202 2023-05-02 22:30:29 - progress_bar.py[line:274] - INFO: epoch 007: 2434 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7660.5, nsentences=120, sample_size=3975.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1909.6, ups=0.25, wpb=7660.5, bsz=120, num_updates=38620, lr=1.15151e-05, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=158242 2023-05-02 22:31:10 - progress_bar.py[line:274] - INFO: epoch 007: 2444 / 6042 loss=2.415, loss_v1=0, loss_v2=0, 
nll_loss=1.169, ntokens=8373.2, nsentences=120, sample_size=3986.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2055.8, ups=0.25, wpb=8373.2, bsz=120, num_updates=38630, lr=1.15098e-05, gnorm=0.96, clip=20, loss_scale=64, train_wall=41, gb_free=28.6, wall=158283 2023-05-02 22:31:50 - progress_bar.py[line:274] - INFO: epoch 007: 2454 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7823.5, nsentences=120, sample_size=4341.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1952.2, ups=0.25, wpb=7823.5, bsz=120, num_updates=38640, lr=1.15045e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, gb_free=31, wall=158323 2023-05-02 22:32:31 - progress_bar.py[line:274] - INFO: epoch 007: 2464 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.19, ntokens=7939.3, nsentences=120, sample_size=4017.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1965, ups=0.25, wpb=7939.3, bsz=120, num_updates=38650, lr=1.14993e-05, gnorm=0.986, clip=50, loss_scale=64, train_wall=40, gb_free=29.5, wall=158363 2023-05-02 22:33:11 - progress_bar.py[line:274] - INFO: epoch 007: 2474 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7637.4, nsentences=120, sample_size=3854.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1913.8, ups=0.25, wpb=7637.4, bsz=120, num_updates=38660, lr=1.1494e-05, gnorm=0.967, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=158403 2023-05-02 22:33:51 - progress_bar.py[line:274] - INFO: epoch 007: 2484 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7942.9, nsentences=120, sample_size=4143.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1973.7, ups=0.25, wpb=7942.9, bsz=120, num_updates=38670, lr=1.14887e-05, gnorm=0.951, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=158443 2023-05-02 22:34:30 - progress_bar.py[line:274] - INFO: epoch 007: 2494 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.066, ntokens=7315.7, nsentences=120, sample_size=4179.7, sample_size_v1=0, 
sample_size_v2=0, ppl=2.09, wps=1867.9, ups=0.26, wpb=7315.7, bsz=120, num_updates=38680, lr=1.14834e-05, gnorm=1.008, clip=60, loss_scale=64, train_wall=39, gb_free=29.7, wall=158482 2023-05-02 22:35:10 - progress_bar.py[line:274] - INFO: epoch 007: 2504 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7797.8, nsentences=120, sample_size=4169.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1951.1, ups=0.25, wpb=7797.8, bsz=120, num_updates=38690, lr=1.14781e-05, gnorm=0.978, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=158522 2023-05-02 22:35:50 - progress_bar.py[line:274] - INFO: epoch 007: 2514 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7798.9, nsentences=120, sample_size=3954.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1963.4, ups=0.25, wpb=7798.9, bsz=120, num_updates=38700, lr=1.14728e-05, gnorm=0.994, clip=40, loss_scale=128, train_wall=40, gb_free=28.5, wall=158562 2023-05-02 22:36:30 - progress_bar.py[line:274] - INFO: epoch 007: 2524 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=8169.7, nsentences=120, sample_size=4357.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2035, ups=0.25, wpb=8169.7, bsz=120, num_updates=38710, lr=1.14676e-05, gnorm=0.94, clip=10, loss_scale=128, train_wall=40, gb_free=28.4, wall=158602 2023-05-02 22:37:09 - progress_bar.py[line:274] - INFO: epoch 007: 2534 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7824.7, nsentences=120, sample_size=4236.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1979, ups=0.25, wpb=7824.7, bsz=120, num_updates=38720, lr=1.14623e-05, gnorm=0.964, clip=30, loss_scale=128, train_wall=39, gb_free=29.8, wall=158642 2023-05-02 22:37:13 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 22:37:53 - progress_bar.py[line:274] - INFO: epoch 007: 2545 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7682.1, 
nsentences=120, sample_size=4007.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1781.1, ups=0.23, wpb=7682.1, bsz=120, num_updates=38730, lr=1.1457e-05, gnorm=0.989, clip=30, loss_scale=64, train_wall=43, gb_free=28.4, wall=158685 2023-05-02 22:38:33 - progress_bar.py[line:274] - INFO: epoch 007: 2555 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7948.7, nsentences=120, sample_size=3562.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1951.7, ups=0.25, wpb=7948.7, bsz=120, num_updates=38740, lr=1.14517e-05, gnorm=0.989, clip=60, loss_scale=64, train_wall=41, gb_free=30.4, wall=158726 2023-05-02 22:39:13 - progress_bar.py[line:274] - INFO: epoch 007: 2565 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7501.6, nsentences=120, sample_size=4156, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1880.8, ups=0.25, wpb=7501.6, bsz=120, num_updates=38750, lr=1.14464e-05, gnorm=0.977, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=158766 2023-05-02 22:39:53 - progress_bar.py[line:274] - INFO: epoch 007: 2575 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7716.5, nsentences=120, sample_size=3903.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1957.8, ups=0.25, wpb=7716.5, bsz=120, num_updates=38760, lr=1.14411e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=39, gb_free=31.6, wall=158805 2023-05-02 22:40:32 - progress_bar.py[line:274] - INFO: epoch 007: 2585 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7899.9, nsentences=120, sample_size=3889.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1977.5, ups=0.25, wpb=7899.9, bsz=120, num_updates=38770, lr=1.14359e-05, gnorm=0.997, clip=50, loss_scale=64, train_wall=40, gb_free=31.3, wall=158845 2023-05-02 22:41:13 - progress_bar.py[line:274] - INFO: epoch 007: 2595 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7751.7, nsentences=120, sample_size=3670.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, 
wps=1914.3, ups=0.25, wpb=7751.7, bsz=120, num_updates=38780, lr=1.14306e-05, gnorm=0.991, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=158885 2023-05-02 22:41:53 - progress_bar.py[line:274] - INFO: epoch 007: 2605 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7749, nsentences=120, sample_size=4120.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1941.4, ups=0.25, wpb=7749, bsz=120, num_updates=38790, lr=1.14253e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=158925 2023-05-02 22:42:34 - progress_bar.py[line:274] - INFO: epoch 007: 2615 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7705.6, nsentences=120, sample_size=3885.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1884.4, ups=0.24, wpb=7705.6, bsz=120, num_updates=38800, lr=1.142e-05, gnorm=0.984, clip=30, loss_scale=64, train_wall=41, gb_free=28.6, wall=158966 2023-05-02 22:43:14 - progress_bar.py[line:274] - INFO: epoch 007: 2625 / 6042 loss=2.413, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7812.6, nsentences=120, sample_size=3824.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1925.2, ups=0.25, wpb=7812.6, bsz=120, num_updates=38810, lr=1.14147e-05, gnorm=0.979, clip=40, loss_scale=64, train_wall=41, gb_free=31, wall=159007 2023-05-02 22:43:54 - progress_bar.py[line:274] - INFO: epoch 007: 2635 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7593.9, nsentences=120, sample_size=3942.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1928.3, ups=0.25, wpb=7593.9, bsz=120, num_updates=38820, lr=1.14095e-05, gnorm=0.982, clip=30, loss_scale=64, train_wall=39, gb_free=27.9, wall=159046 2023-05-02 22:44:34 - progress_bar.py[line:274] - INFO: epoch 007: 2645 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7498.7, nsentences=120, sample_size=4209.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1884.4, ups=0.25, wpb=7498.7, bsz=120, num_updates=38830, lr=1.14042e-05, gnorm=0.971, 
clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=159086 2023-05-02 22:45:13 - progress_bar.py[line:274] - INFO: epoch 007: 2655 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7713.1, nsentences=120, sample_size=4115.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1971, ups=0.26, wpb=7713.1, bsz=120, num_updates=38840, lr=1.13989e-05, gnorm=0.936, clip=10, loss_scale=64, train_wall=39, gb_free=31, wall=159125 2023-05-02 22:45:53 - progress_bar.py[line:274] - INFO: epoch 007: 2665 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7927.7, nsentences=120, sample_size=3895.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1977.6, ups=0.25, wpb=7927.7, bsz=120, num_updates=38850, lr=1.13936e-05, gnorm=0.989, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=159165 2023-05-02 22:46:32 - progress_bar.py[line:274] - INFO: epoch 007: 2675 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7461.7, nsentences=120, sample_size=3898.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1903.3, ups=0.26, wpb=7461.7, bsz=120, num_updates=38860, lr=1.13883e-05, gnorm=1.009, clip=30, loss_scale=64, train_wall=39, gb_free=29.6, wall=159204 2023-05-02 22:47:11 - progress_bar.py[line:274] - INFO: epoch 007: 2685 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7892.6, nsentences=120, sample_size=4162.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2010.2, ups=0.25, wpb=7892.6, bsz=120, num_updates=38870, lr=1.1383e-05, gnorm=0.977, clip=30, loss_scale=64, train_wall=39, gb_free=29.4, wall=159244 2023-05-02 22:47:52 - progress_bar.py[line:274] - INFO: epoch 007: 2695 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7892.6, nsentences=120, sample_size=4452.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1947.8, ups=0.25, wpb=7892.6, bsz=120, num_updates=38880, lr=1.13778e-05, gnorm=0.918, clip=0, loss_scale=64, train_wall=40, gb_free=30.3, wall=159284 2023-05-02 22:48:32 - 
progress_bar.py[line:274] - INFO: epoch 007: 2705 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=7861.6, nsentences=120, sample_size=3856.6, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1978.4, ups=0.25, wpb=7861.6, bsz=120, num_updates=38890, lr=1.13725e-05, gnorm=0.992, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=159324 2023-05-02 22:49:12 - progress_bar.py[line:274] - INFO: epoch 007: 2715 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7706, nsentences=120, sample_size=4146.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1919.1, ups=0.25, wpb=7706, bsz=120, num_updates=38900, lr=1.13672e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=40, gb_free=30.7, wall=159364 2023-05-02 22:49:51 - progress_bar.py[line:274] - INFO: epoch 007: 2725 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7813.8, nsentences=120, sample_size=3968.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1986.9, ups=0.25, wpb=7813.8, bsz=120, num_updates=38910, lr=1.13619e-05, gnorm=0.985, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=159403 2023-05-02 22:50:31 - progress_bar.py[line:274] - INFO: epoch 007: 2735 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7418.8, nsentences=120, sample_size=4130.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1866.7, ups=0.25, wpb=7418.8, bsz=120, num_updates=38920, lr=1.13566e-05, gnorm=0.981, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=159443 2023-05-02 22:51:10 - progress_bar.py[line:274] - INFO: epoch 007: 2745 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7806.9, nsentences=120, sample_size=4150.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1988.7, ups=0.25, wpb=7806.9, bsz=120, num_updates=38930, lr=1.13514e-05, gnorm=0.972, clip=20, loss_scale=64, train_wall=39, gb_free=31.1, wall=159482 2023-05-02 22:51:49 - progress_bar.py[line:274] - INFO: epoch 007: 2755 / 6042 loss=2.377, loss_v1=0, loss_v2=0, 
nll_loss=1.121, ntokens=7564, nsentences=120, sample_size=4208.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1920.2, ups=0.25, wpb=7564, bsz=120, num_updates=38940, lr=1.13461e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=39, gb_free=30.9, wall=159522 2023-05-02 22:52:30 - progress_bar.py[line:274] - INFO: epoch 007: 2765 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7608.3, nsentences=120, sample_size=3906.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1886, ups=0.25, wpb=7608.3, bsz=120, num_updates=38950, lr=1.13408e-05, gnorm=0.985, clip=60, loss_scale=64, train_wall=40, gb_free=30.5, wall=159562 2023-05-02 22:53:10 - progress_bar.py[line:274] - INFO: epoch 007: 2775 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7863.4, nsentences=120, sample_size=3876.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1967, ups=0.25, wpb=7863.4, bsz=120, num_updates=38960, lr=1.13355e-05, gnorm=0.996, clip=50, loss_scale=64, train_wall=40, gb_free=31, wall=159602 2023-05-02 22:53:49 - progress_bar.py[line:274] - INFO: epoch 007: 2785 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7739.5, nsentences=120, sample_size=4015.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1960.6, ups=0.25, wpb=7739.5, bsz=120, num_updates=38970, lr=1.13302e-05, gnorm=0.979, clip=30, loss_scale=64, train_wall=39, gb_free=29.1, wall=159642 2023-05-02 22:54:28 - progress_bar.py[line:274] - INFO: epoch 007: 2795 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7676.7, nsentences=120, sample_size=4085.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1959.9, ups=0.26, wpb=7676.7, bsz=120, num_updates=38980, lr=1.13249e-05, gnorm=0.976, clip=40, loss_scale=64, train_wall=39, gb_free=30.5, wall=159681 2023-05-02 22:55:09 - progress_bar.py[line:274] - INFO: epoch 007: 2805 / 6042 loss=2.427, loss_v1=0, loss_v2=0, nll_loss=1.185, ntokens=7657.4, nsentences=120, sample_size=4221.3, sample_size_v1=0, 
sample_size_v2=0, ppl=2.27, wps=1887.6, ups=0.25, wpb=7657.4, bsz=120, num_updates=38990, lr=1.13197e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=159721 2023-05-02 22:55:49 - progress_bar.py[line:274] - INFO: epoch 007: 2815 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7585, nsentences=120, sample_size=4286.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1896.2, ups=0.25, wpb=7585, bsz=120, num_updates=39000, lr=1.13144e-05, gnorm=0.938, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=159761 2023-05-02 22:55:49 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-02 22:55:51 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 22:55:51 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 22:55:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:55:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:55:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:55:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:55:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:55:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:55:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:55:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:55:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:55:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:55:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:55:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:55:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:55:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:55:59 - task_invig.py[line:181] - INFO: example hyp(gen): 
region: 2023-05-02 22:55:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:08 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 22:56:08 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 22:56:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:20 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 22:56:20 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 22:56:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:31 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 22:56:31 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 22:56:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:35 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-02 22:56:35 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-02 22:56:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:40 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-02 22:56:40 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-02 22:56:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-02 22:56:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-02 22:56:41 - progress_bar.py[line:282] - INFO: epoch 007 | valid on 'valid' subset | loss 3.251 | loss_v1 0 | loss_v2 0 | nll_loss 2.086 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.25 | score 0.7539 | wps 3274.6 | wpb 3202.1 | bsz 39.4 | num_updates 39000 | best_score 0.7627 2023-05-02 22:56:41 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 7 @ 39000 updates 2023-05-02 22:56:41 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_39000.pt 2023-05-02 22:57:05 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_39000.pt 2023-05-02 22:57:19 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_39000.pt (epoch 7 @ 39000 updates, score 0.7539) (writing took 38.38670942489989 seconds) 2023-05-02 22:57:58 - progress_bar.py[line:274] - INFO: epoch 007: 2825 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7603.8, nsentences=120, sample_size=4051.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=588.5, ups=0.08, wpb=7603.8, bsz=120, num_updates=39010, lr=1.13091e-05, gnorm=0.967, clip=10, loss_scale=64, train_wall=39, gb_free=29.8, wall=159891 2023-05-02 22:58:38 - progress_bar.py[line:274] - INFO: epoch 007: 2835 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7855.4, nsentences=120, sample_size=3950.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1976, ups=0.25, wpb=7855.4, bsz=120, num_updates=39020, lr=1.13038e-05, gnorm=0.988, clip=50, loss_scale=64, train_wall=40, gb_free=27.6, 
wall=159930 2023-05-02 22:59:19 - progress_bar.py[line:274] - INFO: epoch 007: 2845 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7625.1, nsentences=120, sample_size=4033, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1877.5, ups=0.25, wpb=7625.1, bsz=120, num_updates=39030, lr=1.12985e-05, gnorm=0.957, clip=10, loss_scale=64, train_wall=41, gb_free=29.6, wall=159971 2023-05-02 22:59:58 - progress_bar.py[line:274] - INFO: epoch 007: 2855 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7377.1, nsentences=120, sample_size=4068.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1860.8, ups=0.25, wpb=7377.1, bsz=120, num_updates=39040, lr=1.12932e-05, gnorm=0.974, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=160011 2023-05-02 23:00:38 - progress_bar.py[line:274] - INFO: epoch 007: 2865 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7864.7, nsentences=120, sample_size=3923.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1987.3, ups=0.25, wpb=7864.7, bsz=120, num_updates=39050, lr=1.1288e-05, gnorm=0.958, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=160050 2023-05-02 23:01:17 - progress_bar.py[line:274] - INFO: epoch 007: 2875 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7697, nsentences=120, sample_size=4029.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1953.4, ups=0.25, wpb=7697, bsz=120, num_updates=39060, lr=1.12827e-05, gnorm=0.967, clip=50, loss_scale=64, train_wall=39, gb_free=30.9, wall=160090 2023-05-02 23:01:56 - progress_bar.py[line:274] - INFO: epoch 007: 2885 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7481.5, nsentences=120, sample_size=3942.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1904.6, ups=0.25, wpb=7481.5, bsz=120, num_updates=39070, lr=1.12774e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=39, gb_free=28.1, wall=160129 2023-05-02 23:02:37 - progress_bar.py[line:274] - INFO: epoch 007: 2895 / 6042 
loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7707.6, nsentences=120, sample_size=4082.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1902.9, ups=0.25, wpb=7707.6, bsz=120, num_updates=39080, lr=1.12721e-05, gnorm=0.98, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=160169 2023-05-02 23:03:16 - progress_bar.py[line:274] - INFO: epoch 007: 2905 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7585, nsentences=120, sample_size=4057.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1921.7, ups=0.25, wpb=7585, bsz=120, num_updates=39090, lr=1.12668e-05, gnorm=0.98, clip=40, loss_scale=64, train_wall=39, gb_free=30.2, wall=160209 2023-05-02 23:03:55 - progress_bar.py[line:274] - INFO: epoch 007: 2915 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7660.4, nsentences=120, sample_size=3847, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1967.6, ups=0.26, wpb=7660.4, bsz=120, num_updates=39100, lr=1.12616e-05, gnorm=0.977, clip=50, loss_scale=64, train_wall=39, gb_free=30, wall=160248 2023-05-02 23:04:35 - progress_bar.py[line:274] - INFO: epoch 007: 2925 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7391.8, nsentences=120, sample_size=4105.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1879.6, ups=0.25, wpb=7391.8, bsz=120, num_updates=39110, lr=1.12563e-05, gnorm=0.977, clip=30, loss_scale=64, train_wall=39, gb_free=30.6, wall=160287 2023-05-02 23:05:14 - progress_bar.py[line:274] - INFO: epoch 007: 2935 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7681.9, nsentences=120, sample_size=4127.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1946.4, ups=0.25, wpb=7681.9, bsz=120, num_updates=39120, lr=1.1251e-05, gnorm=0.978, clip=30, loss_scale=64, train_wall=39, gb_free=28.6, wall=160327 2023-05-02 23:05:55 - progress_bar.py[line:274] - INFO: epoch 007: 2945 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7792, nsentences=120, 
sample_size=4239.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1919.9, ups=0.25, wpb=7792, bsz=120, num_updates=39130, lr=1.12457e-05, gnorm=0.934, clip=20, loss_scale=64, train_wall=41, gb_free=28.7, wall=160367 2023-05-02 23:06:35 - progress_bar.py[line:274] - INFO: epoch 007: 2955 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7821.6, nsentences=120, sample_size=3882.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1933.4, ups=0.25, wpb=7821.6, bsz=120, num_updates=39140, lr=1.12404e-05, gnorm=0.991, clip=40, loss_scale=64, train_wall=40, gb_free=29.3, wall=160408 2023-05-02 23:07:14 - progress_bar.py[line:274] - INFO: epoch 007: 2965 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7274.5, nsentences=120, sample_size=4090.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1860.3, ups=0.26, wpb=7274.5, bsz=120, num_updates=39150, lr=1.12351e-05, gnorm=0.979, clip=40, loss_scale=64, train_wall=39, gb_free=29.9, wall=160447 2023-05-02 23:07:54 - progress_bar.py[line:274] - INFO: epoch 007: 2975 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7663.6, nsentences=120, sample_size=4065.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1937.5, ups=0.25, wpb=7663.6, bsz=120, num_updates=39160, lr=1.12299e-05, gnorm=0.958, clip=30, loss_scale=64, train_wall=39, gb_free=29, wall=160486 2023-05-02 23:08:34 - progress_bar.py[line:274] - INFO: epoch 007: 2985 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=7793.4, nsentences=120, sample_size=3909.4, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1962.7, ups=0.25, wpb=7793.4, bsz=120, num_updates=39170, lr=1.12246e-05, gnorm=0.978, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=160526 2023-05-02 23:09:13 - progress_bar.py[line:274] - INFO: epoch 007: 2995 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7984.8, nsentences=120, sample_size=3898.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2016.4, ups=0.25, 
wpb=7984.8, bsz=120, num_updates=39180, lr=1.12193e-05, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=160566 2023-05-02 23:09:54 - progress_bar.py[line:274] - INFO: epoch 007: 3005 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7638.7, nsentences=120, sample_size=4266, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1886.5, ups=0.25, wpb=7638.7, bsz=120, num_updates=39190, lr=1.1214e-05, gnorm=0.939, clip=0, loss_scale=64, train_wall=40, gb_free=31, wall=160606 2023-05-02 23:10:33 - progress_bar.py[line:274] - INFO: epoch 007: 3015 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.188, ntokens=7921.3, nsentences=120, sample_size=4006.1, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1997.8, ups=0.25, wpb=7921.3, bsz=120, num_updates=39200, lr=1.12087e-05, gnorm=0.982, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=160646 2023-05-02 23:11:14 - progress_bar.py[line:274] - INFO: epoch 007: 3025 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7584.6, nsentences=120, sample_size=4124.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1880.3, ups=0.25, wpb=7584.6, bsz=120, num_updates=39210, lr=1.12035e-05, gnorm=0.945, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=160686 2023-05-02 23:11:53 - progress_bar.py[line:274] - INFO: epoch 007: 3035 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7920.7, nsentences=120, sample_size=3750.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1998.6, ups=0.25, wpb=7920.7, bsz=120, num_updates=39220, lr=1.11982e-05, gnorm=0.998, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=160726 2023-05-02 23:12:33 - progress_bar.py[line:274] - INFO: epoch 007: 3045 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7652, nsentences=120, sample_size=3772.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1927.3, ups=0.25, wpb=7652, bsz=120, num_updates=39230, lr=1.11929e-05, gnorm=1.01, clip=40, loss_scale=64, 
train_wall=40, gb_free=30.7, wall=160765 2023-05-02 23:13:12 - progress_bar.py[line:274] - INFO: epoch 007: 3055 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7886.1, nsentences=120, sample_size=3541.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2007.5, ups=0.25, wpb=7886.1, bsz=120, num_updates=39240, lr=1.11876e-05, gnorm=1.027, clip=70, loss_scale=128, train_wall=39, gb_free=30.8, wall=160805 2023-05-02 23:13:53 - progress_bar.py[line:274] - INFO: epoch 007: 3065 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7575.8, nsentences=120, sample_size=3876, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1858.7, ups=0.25, wpb=7575.8, bsz=120, num_updates=39250, lr=1.11823e-05, gnorm=0.987, clip=40, loss_scale=128, train_wall=41, gb_free=30.1, wall=160845 2023-05-02 23:14:33 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-02 23:14:36 - progress_bar.py[line:274] - INFO: epoch 007: 3076 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7706.9, nsentences=120, sample_size=3818.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1777.7, ups=0.23, wpb=7706.9, bsz=120, num_updates=39260, lr=1.1177e-05, gnorm=0.997, clip=60, loss_scale=64, train_wall=43, gb_free=30, wall=160889 2023-05-02 23:15:17 - progress_bar.py[line:274] - INFO: epoch 007: 3086 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7713.1, nsentences=120, sample_size=4028, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1903.5, ups=0.25, wpb=7713.1, bsz=120, num_updates=39270, lr=1.11718e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=29.1, wall=160929 2023-05-02 23:15:56 - progress_bar.py[line:274] - INFO: epoch 007: 3096 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7592, nsentences=120, sample_size=3737.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1917.4, ups=0.25, wpb=7592, bsz=120, num_updates=39280, lr=1.11665e-05, 
gnorm=1.04, clip=60, loss_scale=64, train_wall=40, gb_free=30, wall=160969 2023-05-02 23:16:36 - progress_bar.py[line:274] - INFO: epoch 007: 3106 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7545.8, nsentences=120, sample_size=4267, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1903, ups=0.25, wpb=7545.8, bsz=120, num_updates=39290, lr=1.11612e-05, gnorm=0.987, clip=40, loss_scale=64, train_wall=40, gb_free=28.7, wall=161009 2023-05-02 23:17:15 - progress_bar.py[line:274] - INFO: epoch 007: 3116 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7722.3, nsentences=120, sample_size=3852.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1975.6, ups=0.26, wpb=7722.3, bsz=120, num_updates=39300, lr=1.11559e-05, gnorm=0.975, clip=40, loss_scale=64, train_wall=39, gb_free=29.4, wall=161048 2023-05-02 23:17:55 - progress_bar.py[line:274] - INFO: epoch 007: 3126 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7562.6, nsentences=120, sample_size=4165.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1902.9, ups=0.25, wpb=7562.6, bsz=120, num_updates=39310, lr=1.11506e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=161087 2023-05-02 23:18:35 - progress_bar.py[line:274] - INFO: epoch 007: 3136 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7271.2, nsentences=120, sample_size=4328.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1832.4, ups=0.25, wpb=7271.2, bsz=120, num_updates=39320, lr=1.11453e-05, gnorm=0.948, clip=10, loss_scale=64, train_wall=40, gb_free=28.5, wall=161127 2023-05-02 23:19:15 - progress_bar.py[line:274] - INFO: epoch 007: 3146 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7677.1, nsentences=120, sample_size=4075.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1914, ups=0.25, wpb=7677.1, bsz=120, num_updates=39330, lr=1.11401e-05, gnorm=0.968, clip=10, loss_scale=64, train_wall=40, gb_free=30.7, wall=161167 2023-05-02 
23:19:54 - progress_bar.py[line:274] - INFO: epoch 007: 3156 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7704.5, nsentences=120, sample_size=3964.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1965.3, ups=0.26, wpb=7704.5, bsz=120, num_updates=39340, lr=1.11348e-05, gnorm=0.973, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=161206 2023-05-02 23:20:34 - progress_bar.py[line:274] - INFO: epoch 007: 3166 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7605.4, nsentences=120, sample_size=4223.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1920.4, ups=0.25, wpb=7605.4, bsz=120, num_updates=39350, lr=1.11295e-05, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=161246 2023-05-02 23:21:13 - progress_bar.py[line:274] - INFO: epoch 007: 3176 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7714.4, nsentences=120, sample_size=3971.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1955, ups=0.25, wpb=7714.4, bsz=120, num_updates=39360, lr=1.11242e-05, gnorm=0.984, clip=30, loss_scale=64, train_wall=39, gb_free=28.1, wall=161285 2023-05-02 23:21:52 - progress_bar.py[line:274] - INFO: epoch 007: 3186 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7585.7, nsentences=120, sample_size=4007.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1922.9, ups=0.25, wpb=7585.7, bsz=120, num_updates=39370, lr=1.11189e-05, gnorm=0.98, clip=30, loss_scale=64, train_wall=39, gb_free=31, wall=161325 2023-05-02 23:22:32 - progress_bar.py[line:274] - INFO: epoch 007: 3196 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7720.6, nsentences=120, sample_size=3891, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1951.2, ups=0.25, wpb=7720.6, bsz=120, num_updates=39380, lr=1.11137e-05, gnorm=0.999, clip=40, loss_scale=64, train_wall=39, gb_free=30.3, wall=161365 2023-05-02 23:23:12 - progress_bar.py[line:274] - INFO: epoch 007: 3206 / 6042 loss=2.392, loss_v1=0, 
loss_v2=0, nll_loss=1.144, ntokens=7735.4, nsentences=120, sample_size=3594.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1933.2, ups=0.25, wpb=7735.4, bsz=120, num_updates=39390, lr=1.11084e-05, gnorm=1.02, clip=70, loss_scale=64, train_wall=40, gb_free=29, wall=161405 2023-05-02 23:23:52 - progress_bar.py[line:274] - INFO: epoch 007: 3216 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7659.2, nsentences=120, sample_size=4383.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1921.7, ups=0.25, wpb=7659.2, bsz=120, num_updates=39400, lr=1.11031e-05, gnorm=0.926, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=161444 2023-05-02 23:24:31 - progress_bar.py[line:274] - INFO: epoch 007: 3226 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7701.1, nsentences=120, sample_size=4284.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1973, ups=0.26, wpb=7701.1, bsz=120, num_updates=39410, lr=1.10978e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=39, gb_free=30.4, wall=161483 2023-05-02 23:25:11 - progress_bar.py[line:274] - INFO: epoch 007: 3236 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7850.8, nsentences=120, sample_size=4153.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1981.2, ups=0.25, wpb=7850.8, bsz=120, num_updates=39420, lr=1.10925e-05, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=26, wall=161523 2023-05-02 23:25:51 - progress_bar.py[line:274] - INFO: epoch 007: 3246 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7796.1, nsentences=120, sample_size=3936.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1945, ups=0.25, wpb=7796.1, bsz=120, num_updates=39430, lr=1.10872e-05, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=31.5, wall=161563 2023-05-02 23:26:30 - progress_bar.py[line:274] - INFO: epoch 007: 3256 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7911.3, nsentences=120, sample_size=4111.4, 
sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1991, ups=0.25, wpb=7911.3, bsz=120, num_updates=39440, lr=1.1082e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=40, gb_free=28, wall=161603 2023-05-02 23:27:10 - progress_bar.py[line:274] - INFO: epoch 007: 3266 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7405.7, nsentences=120, sample_size=4293.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1859.9, ups=0.25, wpb=7405.7, bsz=120, num_updates=39450, lr=1.10767e-05, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=161643 2023-05-02 23:27:50 - progress_bar.py[line:274] - INFO: epoch 007: 3276 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7936.8, nsentences=120, sample_size=3962.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2013.2, ups=0.25, wpb=7936.8, bsz=120, num_updates=39460, lr=1.10714e-05, gnorm=0.983, clip=50, loss_scale=64, train_wall=39, gb_free=30.6, wall=161682 2023-05-02 23:28:30 - progress_bar.py[line:274] - INFO: epoch 007: 3286 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7640.3, nsentences=120, sample_size=3786.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1890.6, ups=0.25, wpb=7640.3, bsz=120, num_updates=39470, lr=1.10661e-05, gnorm=0.994, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=161723 2023-05-02 23:29:09 - progress_bar.py[line:274] - INFO: epoch 007: 3296 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7636.1, nsentences=120, sample_size=4034.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1950.7, ups=0.26, wpb=7636.1, bsz=120, num_updates=39480, lr=1.10608e-05, gnorm=0.973, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=161762 2023-05-02 23:29:49 - progress_bar.py[line:274] - INFO: epoch 007: 3306 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7733.6, nsentences=120, sample_size=3974.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1922.6, ups=0.25, wpb=7733.6, bsz=120, 
num_updates=39490, lr=1.10556e-05, gnorm=0.952, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=161802 2023-05-02 23:30:30 - progress_bar.py[line:274] - INFO: epoch 007: 3316 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7688.3, nsentences=120, sample_size=3970.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1905.9, ups=0.25, wpb=7688.3, bsz=120, num_updates=39500, lr=1.10503e-05, gnorm=0.987, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=161842 2023-05-02 23:31:09 - progress_bar.py[line:274] - INFO: epoch 007: 3326 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7800.5, nsentences=120, sample_size=3937.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1991, ups=0.26, wpb=7800.5, bsz=120, num_updates=39510, lr=1.1045e-05, gnorm=0.991, clip=40, loss_scale=64, train_wall=39, gb_free=28.3, wall=161881 2023-05-02 23:31:49 - progress_bar.py[line:274] - INFO: epoch 007: 3336 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7681.2, nsentences=120, sample_size=4085.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1909.9, ups=0.25, wpb=7681.2, bsz=120, num_updates=39520, lr=1.10397e-05, gnorm=0.971, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=161922 2023-05-02 23:32:29 - progress_bar.py[line:274] - INFO: epoch 007: 3346 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7695.1, nsentences=120, sample_size=3876.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1925.1, ups=0.25, wpb=7695.1, bsz=120, num_updates=39530, lr=1.10344e-05, gnorm=0.995, clip=50, loss_scale=64, train_wall=40, gb_free=28, wall=161962 2023-05-02 23:33:10 - progress_bar.py[line:274] - INFO: epoch 007: 3356 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7821.6, nsentences=120, sample_size=4344.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1914.2, ups=0.24, wpb=7821.6, bsz=120, num_updates=39540, lr=1.10291e-05, gnorm=0.935, clip=10, loss_scale=64, train_wall=41, 
gb_free=29.5, wall=162002 2023-05-02 23:33:50 - progress_bar.py[line:274] - INFO: epoch 007: 3366 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7969.3, nsentences=120, sample_size=3823.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1996.2, ups=0.25, wpb=7969.3, bsz=120, num_updates=39550, lr=1.10239e-05, gnorm=0.991, clip=40, loss_scale=64, train_wall=40, gb_free=31.2, wall=162042 2023-05-02 23:34:30 - progress_bar.py[line:274] - INFO: epoch 007: 3376 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7701.1, nsentences=120, sample_size=4103.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1922.7, ups=0.25, wpb=7701.1, bsz=120, num_updates=39560, lr=1.10186e-05, gnorm=0.948, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=162082 2023-05-02 23:35:10 - progress_bar.py[line:274] - INFO: epoch 007: 3386 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7330.9, nsentences=120, sample_size=4081, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1843.5, ups=0.25, wpb=7330.9, bsz=120, num_updates=39570, lr=1.10133e-05, gnorm=0.98, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=162122 2023-05-02 23:35:49 - progress_bar.py[line:274] - INFO: epoch 007: 3396 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7655.4, nsentences=120, sample_size=3928.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1928.9, ups=0.25, wpb=7655.4, bsz=120, num_updates=39580, lr=1.1008e-05, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=162162 2023-05-02 23:36:30 - progress_bar.py[line:274] - INFO: epoch 007: 3406 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7624, nsentences=120, sample_size=4063.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1880.1, ups=0.25, wpb=7624, bsz=120, num_updates=39590, lr=1.10027e-05, gnorm=0.98, clip=40, loss_scale=64, train_wall=40, gb_free=29.1, wall=162202 2023-05-02 23:37:10 - progress_bar.py[line:274] - INFO: epoch 007: 
3416 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7818.5, nsentences=120, sample_size=3903.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1968.6, ups=0.25, wpb=7818.5, bsz=120, num_updates=39600, lr=1.09974e-05, gnorm=0.988, clip=50, loss_scale=64, train_wall=40, gb_free=31.3, wall=162242 2023-05-02 23:37:50 - progress_bar.py[line:274] - INFO: epoch 007: 3426 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7491, nsentences=120, sample_size=4311.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1848.1, ups=0.25, wpb=7491, bsz=120, num_updates=39610, lr=1.09922e-05, gnorm=0.934, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=162283 2023-05-02 23:38:30 - progress_bar.py[line:274] - INFO: epoch 007: 3436 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7904.5, nsentences=120, sample_size=3901.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1999.6, ups=0.25, wpb=7904.5, bsz=120, num_updates=39620, lr=1.09869e-05, gnorm=0.981, clip=40, loss_scale=64, train_wall=39, gb_free=30.8, wall=162322 2023-05-02 23:39:10 - progress_bar.py[line:274] - INFO: epoch 007: 3446 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7630.4, nsentences=120, sample_size=3975.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1897.3, ups=0.25, wpb=7630.4, bsz=120, num_updates=39630, lr=1.09816e-05, gnorm=0.99, clip=60, loss_scale=64, train_wall=40, gb_free=29.7, wall=162362 2023-05-02 23:39:50 - progress_bar.py[line:274] - INFO: epoch 007: 3456 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=8192.4, nsentences=120, sample_size=4060.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2052.5, ups=0.25, wpb=8192.4, bsz=120, num_updates=39640, lr=1.09763e-05, gnorm=0.951, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=162402 2023-05-02 23:40:31 - progress_bar.py[line:274] - INFO: epoch 007: 3466 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.193, ntokens=7982.2, 
nsentences=120, sample_size=4378.5, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=1927.1, ups=0.24, wpb=7982.2, bsz=120, num_updates=39650, lr=1.0971e-05, gnorm=0.93, clip=20, loss_scale=64, train_wall=41, gb_free=29.8, wall=162444 2023-05-02 23:41:10 - progress_bar.py[line:274] - INFO: epoch 007: 3476 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7510.2, nsentences=120, sample_size=3982.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1925.1, ups=0.26, wpb=7510.2, bsz=120, num_updates=39660, lr=1.09658e-05, gnorm=1.005, clip=60, loss_scale=64, train_wall=39, gb_free=31.2, wall=162483 2023-05-02 23:41:49 - progress_bar.py[line:274] - INFO: epoch 007: 3486 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7696.4, nsentences=120, sample_size=3624.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1972, ups=0.26, wpb=7696.4, bsz=120, num_updates=39670, lr=1.09605e-05, gnorm=0.987, clip=40, loss_scale=64, train_wall=39, gb_free=29.9, wall=162522 2023-05-02 23:42:29 - progress_bar.py[line:274] - INFO: epoch 007: 3496 / 6042 loss=2.44, loss_v1=0, loss_v2=0, nll_loss=1.194, ntokens=7874, nsentences=120, sample_size=3922.4, sample_size_v1=0, sample_size_v2=0, ppl=2.29, wps=2003.5, ups=0.25, wpb=7874, bsz=120, num_updates=39680, lr=1.09552e-05, gnorm=0.98, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=162561 2023-05-02 23:43:09 - progress_bar.py[line:274] - INFO: epoch 007: 3506 / 6042 loss=2.426, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=8051, nsentences=120, sample_size=3775.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1987.8, ups=0.25, wpb=8051, bsz=120, num_updates=39690, lr=1.09499e-05, gnorm=1.014, clip=50, loss_scale=64, train_wall=40, gb_free=28, wall=162602 2023-05-02 23:43:49 - progress_bar.py[line:274] - INFO: epoch 007: 3516 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7814.9, nsentences=120, sample_size=4115.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1949.9, 
ups=0.25, wpb=7814.9, bsz=120, num_updates=39700, lr=1.09446e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=162642 2023-05-02 23:44:21 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-02 23:44:33 - progress_bar.py[line:274] - INFO: epoch 007: 3527 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7708.2, nsentences=120, sample_size=3969.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1771.4, ups=0.23, wpb=7708.2, bsz=120, num_updates=39710, lr=1.09393e-05, gnorm=0.975, clip=30, loss_scale=32, train_wall=43, gb_free=31, wall=162685 2023-05-02 23:45:13 - progress_bar.py[line:274] - INFO: epoch 007: 3537 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7519.3, nsentences=120, sample_size=4100.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1875.7, ups=0.25, wpb=7519.3, bsz=120, num_updates=39720, lr=1.09341e-05, gnorm=0.964, clip=30, loss_scale=32, train_wall=40, gb_free=29.5, wall=162725 2023-05-02 23:45:54 - progress_bar.py[line:274] - INFO: epoch 007: 3547 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=8253.9, nsentences=120, sample_size=4028.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2017.1, ups=0.24, wpb=8253.9, bsz=120, num_updates=39730, lr=1.09288e-05, gnorm=0.956, clip=30, loss_scale=32, train_wall=41, gb_free=30, wall=162766 2023-05-02 23:46:34 - progress_bar.py[line:274] - INFO: epoch 007: 3557 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7600.2, nsentences=120, sample_size=3762.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1911.5, ups=0.25, wpb=7600.2, bsz=120, num_updates=39740, lr=1.09235e-05, gnorm=1.02, clip=60, loss_scale=32, train_wall=40, gb_free=29.8, wall=162806 2023-05-02 23:47:13 - progress_bar.py[line:274] - INFO: epoch 007: 3567 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7673.6, nsentences=120, sample_size=4144, sample_size_v1=0, 
sample_size_v2=0, ppl=2.2, wps=1929.8, ups=0.25, wpb=7673.6, bsz=120, num_updates=39750, lr=1.09182e-05, gnorm=0.946, clip=10, loss_scale=32, train_wall=40, gb_free=30.8, wall=162846 2023-05-02 23:47:53 - progress_bar.py[line:274] - INFO: epoch 007: 3577 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7632.7, nsentences=120, sample_size=4118.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1939.3, ups=0.25, wpb=7632.7, bsz=120, num_updates=39760, lr=1.09129e-05, gnorm=0.946, clip=20, loss_scale=32, train_wall=39, gb_free=29.9, wall=162885 2023-05-02 23:48:33 - progress_bar.py[line:274] - INFO: epoch 007: 3587 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7536.5, nsentences=120, sample_size=4289, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1867.3, ups=0.25, wpb=7536.5, bsz=120, num_updates=39770, lr=1.09077e-05, gnorm=0.922, clip=0, loss_scale=32, train_wall=40, gb_free=30.1, wall=162926 2023-05-02 23:49:13 - progress_bar.py[line:274] - INFO: epoch 007: 3597 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7486.7, nsentences=120, sample_size=4103.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1885.8, ups=0.25, wpb=7486.7, bsz=120, num_updates=39780, lr=1.09024e-05, gnorm=0.984, clip=30, loss_scale=32, train_wall=40, gb_free=30, wall=162965 2023-05-02 23:49:52 - progress_bar.py[line:274] - INFO: epoch 007: 3607 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7560.3, nsentences=120, sample_size=3823.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1912, ups=0.25, wpb=7560.3, bsz=120, num_updates=39790, lr=1.08971e-05, gnorm=1.004, clip=60, loss_scale=32, train_wall=39, gb_free=29.4, wall=163005 2023-05-02 23:50:32 - progress_bar.py[line:274] - INFO: epoch 007: 3617 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7766.2, nsentences=120, sample_size=4318.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1949.8, ups=0.25, wpb=7766.2, bsz=120, num_updates=39800, 
lr=1.08918e-05, gnorm=0.955, clip=40, loss_scale=32, train_wall=40, gb_free=30, wall=163045 2023-05-02 23:51:12 - progress_bar.py[line:274] - INFO: epoch 007: 3627 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7567.4, nsentences=120, sample_size=4236.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1907.2, ups=0.25, wpb=7567.4, bsz=120, num_updates=39810, lr=1.08865e-05, gnorm=0.924, clip=10, loss_scale=32, train_wall=40, gb_free=31.7, wall=163084 2023-05-02 23:51:51 - progress_bar.py[line:274] - INFO: epoch 007: 3637 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7745.8, nsentences=120, sample_size=4130.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1958.1, ups=0.25, wpb=7745.8, bsz=120, num_updates=39820, lr=1.08812e-05, gnorm=0.98, clip=60, loss_scale=32, train_wall=39, gb_free=29.7, wall=163124 2023-05-02 23:52:31 - progress_bar.py[line:274] - INFO: epoch 007: 3647 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7538, nsentences=120, sample_size=4253.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1911.7, ups=0.25, wpb=7538, bsz=120, num_updates=39830, lr=1.0876e-05, gnorm=0.965, clip=30, loss_scale=32, train_wall=39, gb_free=30.2, wall=163163 2023-05-02 23:53:11 - progress_bar.py[line:274] - INFO: epoch 007: 3657 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7750, nsentences=120, sample_size=4228.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1922.7, ups=0.25, wpb=7750, bsz=120, num_updates=39840, lr=1.08707e-05, gnorm=0.945, clip=10, loss_scale=32, train_wall=40, gb_free=30.4, wall=163204 2023-05-02 23:53:51 - progress_bar.py[line:274] - INFO: epoch 007: 3667 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=8010.6, nsentences=120, sample_size=3870.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2015.7, ups=0.25, wpb=8010.6, bsz=120, num_updates=39850, lr=1.08654e-05, gnorm=0.993, clip=50, loss_scale=32, train_wall=40, gb_free=28.8, wall=163243 
2023-05-02 23:54:31 - progress_bar.py[line:274] - INFO: epoch 007: 3677 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7356.2, nsentences=120, sample_size=4328.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1832.8, ups=0.25, wpb=7356.2, bsz=120, num_updates=39860, lr=1.08601e-05, gnorm=0.938, clip=20, loss_scale=32, train_wall=40, gb_free=29.5, wall=163283 2023-05-02 23:55:11 - progress_bar.py[line:274] - INFO: epoch 007: 3687 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7587.9, nsentences=120, sample_size=3849.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1913.3, ups=0.25, wpb=7587.9, bsz=120, num_updates=39870, lr=1.08548e-05, gnorm=0.983, clip=40, loss_scale=32, train_wall=40, gb_free=26.5, wall=163323 2023-05-02 23:55:50 - progress_bar.py[line:274] - INFO: epoch 007: 3697 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7410.2, nsentences=120, sample_size=4029.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1870.7, ups=0.25, wpb=7410.2, bsz=120, num_updates=39880, lr=1.08495e-05, gnorm=0.946, clip=10, loss_scale=32, train_wall=40, gb_free=29.7, wall=163363 2023-05-02 23:56:30 - progress_bar.py[line:274] - INFO: epoch 007: 3707 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7628.2, nsentences=120, sample_size=4012.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1930.3, ups=0.25, wpb=7628.2, bsz=120, num_updates=39890, lr=1.08443e-05, gnorm=1.005, clip=40, loss_scale=32, train_wall=39, gb_free=30.7, wall=163402 2023-05-02 23:57:09 - progress_bar.py[line:274] - INFO: epoch 007: 3717 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7905, nsentences=120, sample_size=3783.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2002.3, ups=0.25, wpb=7905, bsz=120, num_updates=39900, lr=1.0839e-05, gnorm=0.98, clip=50, loss_scale=32, train_wall=39, gb_free=29.7, wall=163442 2023-05-02 23:57:50 - progress_bar.py[line:274] - INFO: epoch 007: 3727 / 6042 loss=2.415, 
loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7841.8, nsentences=120, sample_size=3952.9, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1929.4, ups=0.25, wpb=7841.8, bsz=120, num_updates=39910, lr=1.08337e-05, gnorm=0.969, clip=40, loss_scale=32, train_wall=41, gb_free=30.1, wall=163482 2023-05-02 23:58:30 - progress_bar.py[line:274] - INFO: epoch 007: 3737 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7758.9, nsentences=120, sample_size=3726.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1953, ups=0.25, wpb=7758.9, bsz=120, num_updates=39920, lr=1.08284e-05, gnorm=1.01, clip=60, loss_scale=32, train_wall=40, gb_free=29.7, wall=163522 2023-05-02 23:59:10 - progress_bar.py[line:274] - INFO: epoch 007: 3747 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7702.4, nsentences=120, sample_size=4213.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1907.9, ups=0.25, wpb=7702.4, bsz=120, num_updates=39930, lr=1.08231e-05, gnorm=0.953, clip=30, loss_scale=32, train_wall=40, gb_free=29, wall=163562 2023-05-02 23:59:50 - progress_bar.py[line:274] - INFO: epoch 007: 3757 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7852.7, nsentences=120, sample_size=3943.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1940.7, ups=0.25, wpb=7852.7, bsz=120, num_updates=39940, lr=1.08179e-05, gnorm=0.957, clip=30, loss_scale=32, train_wall=40, gb_free=30.7, wall=163603 2023-05-03 00:00:31 - progress_bar.py[line:274] - INFO: epoch 007: 3767 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=8175.3, nsentences=120, sample_size=3914.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2026.1, ups=0.25, wpb=8175.3, bsz=120, num_updates=39950, lr=1.08126e-05, gnorm=0.964, clip=30, loss_scale=32, train_wall=40, gb_free=29.9, wall=163643 2023-05-03 00:01:11 - progress_bar.py[line:274] - INFO: epoch 007: 3777 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7623.4, nsentences=120, sample_size=3978.7, 
sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1910, ups=0.25, wpb=7623.4, bsz=120, num_updates=39960, lr=1.08073e-05, gnorm=0.992, clip=50, loss_scale=32, train_wall=40, gb_free=30.7, wall=163683 2023-05-03 00:01:51 - progress_bar.py[line:274] - INFO: epoch 007: 3787 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7611.7, nsentences=120, sample_size=3793.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1908.7, ups=0.25, wpb=7611.7, bsz=120, num_updates=39970, lr=1.0802e-05, gnorm=1.011, clip=50, loss_scale=32, train_wall=40, gb_free=27.4, wall=163723 2023-05-03 00:02:30 - progress_bar.py[line:274] - INFO: epoch 007: 3797 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7736.9, nsentences=120, sample_size=4011.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1948.4, ups=0.25, wpb=7736.9, bsz=120, num_updates=39980, lr=1.07967e-05, gnorm=0.973, clip=30, loss_scale=32, train_wall=40, gb_free=29.5, wall=163763 2023-05-03 00:03:10 - progress_bar.py[line:274] - INFO: epoch 007: 3807 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7526.5, nsentences=120, sample_size=3918.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1886.8, ups=0.25, wpb=7526.5, bsz=120, num_updates=39990, lr=1.07914e-05, gnorm=1.004, clip=40, loss_scale=32, train_wall=40, gb_free=29, wall=163803 2023-05-03 00:03:50 - progress_bar.py[line:274] - INFO: epoch 007: 3817 / 6042 loss=2.441, loss_v1=0, loss_v2=0, nll_loss=1.201, ntokens=7987.4, nsentences=120, sample_size=3883, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2003.8, ups=0.25, wpb=7987.4, bsz=120, num_updates=40000, lr=1.07862e-05, gnorm=0.977, clip=40, loss_scale=32, train_wall=40, gb_free=30.8, wall=163843 2023-05-03 00:03:50 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 00:03:52 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 00:03:52 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 00:03:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:03:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:03:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:03:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:03:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:03:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:03:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:03:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:03:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:03:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:03:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:03:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:03:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:03:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 00:04:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:09 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 00:04:09 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 00:04:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-03 00:04:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:21 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 00:04:21 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 00:04:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 00:04:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:32 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 00:04:32 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 00:04:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:36 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 00:04:36 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 00:04:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:41 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 00:04:41 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 00:04:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 00:04:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 00:04:42 - progress_bar.py[line:282] - INFO: epoch 007 | valid on 'valid' subset | loss 3.24 | loss_v1 0 | loss_v2 0 | nll_loss 2.075 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.21 | score 0.75 | wps 3299.8 | wpb 3202.1 | bsz 39.4 | num_updates 40000 | best_score 0.7627 2023-05-03 00:04:42 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 7 @ 40000 updates 2023-05-03 00:04:42 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_40000.pt 2023-05-03 00:05:06 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_40000.pt 2023-05-03 00:05:20 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_40000.pt (epoch 7 @ 40000 updates, score 0.75) (writing took 38.70317239896394 seconds) 2023-05-03 00:06:00 - progress_bar.py[line:274] - INFO: epoch 007: 3827 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7750.8, nsentences=120, sample_size=4182.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=595.4, ups=0.08, wpb=7750.8, bsz=120, num_updates=40010, lr=1.07809e-05, gnorm=0.93, clip=10, loss_scale=32, train_wall=40, gb_free=24, wall=163973 2023-05-03 00:06:40 - progress_bar.py[line:274] - INFO: epoch 007: 3837 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7663.2, nsentences=120, sample_size=4153, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1941.8, ups=0.25, wpb=7663.2, bsz=120, num_updates=40020, 
lr=1.07756e-05, gnorm=0.951, clip=10, loss_scale=32, train_wall=39, gb_free=30.2, wall=164012 2023-05-03 00:07:19 - progress_bar.py[line:274] - INFO: epoch 007: 3847 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7610.2, nsentences=120, sample_size=4208.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1914.7, ups=0.25, wpb=7610.2, bsz=120, num_updates=40030, lr=1.07703e-05, gnorm=0.943, clip=0, loss_scale=32, train_wall=40, gb_free=30.3, wall=164052 2023-05-03 00:07:59 - progress_bar.py[line:274] - INFO: epoch 007: 3857 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7840.5, nsentences=120, sample_size=3807.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1984.7, ups=0.25, wpb=7840.5, bsz=120, num_updates=40040, lr=1.0765e-05, gnorm=0.982, clip=40, loss_scale=32, train_wall=39, gb_free=30.4, wall=164091 2023-05-03 00:08:39 - progress_bar.py[line:274] - INFO: epoch 007: 3867 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7631.1, nsentences=120, sample_size=4192.9, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1930.9, ups=0.25, wpb=7631.1, bsz=120, num_updates=40050, lr=1.07597e-05, gnorm=0.942, clip=20, loss_scale=32, train_wall=39, gb_free=28.7, wall=164131 2023-05-03 00:09:19 - progress_bar.py[line:274] - INFO: epoch 007: 3877 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7910.2, nsentences=120, sample_size=3899.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1963.9, ups=0.25, wpb=7910.2, bsz=120, num_updates=40060, lr=1.07545e-05, gnorm=0.958, clip=10, loss_scale=32, train_wall=40, gb_free=30.1, wall=164171 2023-05-03 00:09:58 - progress_bar.py[line:274] - INFO: epoch 007: 3887 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7459.2, nsentences=120, sample_size=4172.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1913, ups=0.26, wpb=7459.2, bsz=120, num_updates=40070, lr=1.07492e-05, gnorm=0.936, clip=20, loss_scale=32, train_wall=39, gb_free=30, 
wall=164210 2023-05-03 00:10:38 - progress_bar.py[line:274] - INFO: epoch 007: 3897 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7774.9, nsentences=120, sample_size=3971.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1947.4, ups=0.25, wpb=7774.9, bsz=120, num_updates=40080, lr=1.07439e-05, gnorm=0.976, clip=10, loss_scale=32, train_wall=40, gb_free=23.6, wall=164250 2023-05-03 00:11:17 - progress_bar.py[line:274] - INFO: epoch 007: 3907 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7704.5, nsentences=120, sample_size=3869.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1966.8, ups=0.26, wpb=7704.5, bsz=120, num_updates=40090, lr=1.07386e-05, gnorm=0.986, clip=20, loss_scale=32, train_wall=39, gb_free=29.9, wall=164289 2023-05-03 00:11:56 - progress_bar.py[line:274] - INFO: epoch 007: 3917 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7667.4, nsentences=120, sample_size=4009.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1950.8, ups=0.25, wpb=7667.4, bsz=120, num_updates=40100, lr=1.07333e-05, gnorm=0.975, clip=30, loss_scale=32, train_wall=39, gb_free=30.1, wall=164329 2023-05-03 00:12:35 - progress_bar.py[line:274] - INFO: epoch 007: 3927 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7643.5, nsentences=120, sample_size=4115.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1961.3, ups=0.26, wpb=7643.5, bsz=120, num_updates=40110, lr=1.07281e-05, gnorm=0.972, clip=50, loss_scale=32, train_wall=39, gb_free=29.8, wall=164368 2023-05-03 00:13:16 - progress_bar.py[line:274] - INFO: epoch 007: 3937 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7940.7, nsentences=120, sample_size=3806.2, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1967.1, ups=0.25, wpb=7940.7, bsz=120, num_updates=40120, lr=1.07228e-05, gnorm=0.983, clip=40, loss_scale=32, train_wall=40, gb_free=28.7, wall=164408 2023-05-03 00:13:55 - progress_bar.py[line:274] - INFO: epoch 007: 3947 / 
6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7537.3, nsentences=120, sample_size=3983, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1918.1, ups=0.25, wpb=7537.3, bsz=120, num_updates=40130, lr=1.07175e-05, gnorm=0.984, clip=40, loss_scale=32, train_wall=39, gb_free=30.1, wall=164447 2023-05-03 00:14:34 - progress_bar.py[line:274] - INFO: epoch 007: 3957 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7537.3, nsentences=120, sample_size=3973.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1907.6, ups=0.25, wpb=7537.3, bsz=120, num_updates=40140, lr=1.07122e-05, gnorm=0.977, clip=20, loss_scale=32, train_wall=39, gb_free=30.5, wall=164487 2023-05-03 00:15:14 - progress_bar.py[line:274] - INFO: epoch 007: 3967 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7929.6, nsentences=120, sample_size=3890.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1989, ups=0.25, wpb=7929.6, bsz=120, num_updates=40150, lr=1.07069e-05, gnorm=0.983, clip=40, loss_scale=32, train_wall=40, gb_free=30.3, wall=164527 2023-05-03 00:15:54 - progress_bar.py[line:274] - INFO: epoch 007: 3977 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7502.9, nsentences=120, sample_size=4156.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1872.2, ups=0.25, wpb=7502.9, bsz=120, num_updates=40160, lr=1.07016e-05, gnorm=0.96, clip=20, loss_scale=32, train_wall=40, gb_free=28.3, wall=164567 2023-05-03 00:16:34 - progress_bar.py[line:274] - INFO: epoch 007: 3987 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7695.9, nsentences=120, sample_size=4176.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1923.4, ups=0.25, wpb=7695.9, bsz=120, num_updates=40170, lr=1.06964e-05, gnorm=0.938, clip=20, loss_scale=32, train_wall=40, gb_free=29.3, wall=164607 2023-05-03 00:17:13 - progress_bar.py[line:274] - INFO: epoch 007: 3997 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7930.6, nsentences=120, 
sample_size=3827.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2043.2, ups=0.26, wpb=7930.6, bsz=120, num_updates=40180, lr=1.06911e-05, gnorm=0.956, clip=20, loss_scale=32, train_wall=39, gb_free=29.9, wall=164646 2023-05-03 00:17:53 - progress_bar.py[line:274] - INFO: epoch 007: 4007 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7751.4, nsentences=120, sample_size=3646.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1944.9, ups=0.25, wpb=7751.4, bsz=120, num_updates=40190, lr=1.06858e-05, gnorm=0.989, clip=60, loss_scale=32, train_wall=40, gb_free=31.4, wall=164685 2023-05-03 00:18:33 - progress_bar.py[line:274] - INFO: epoch 007: 4017 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7732.1, nsentences=120, sample_size=4045.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1924.1, ups=0.25, wpb=7732.1, bsz=120, num_updates=40200, lr=1.06805e-05, gnorm=0.979, clip=10, loss_scale=32, train_wall=40, gb_free=29.9, wall=164726 2023-05-03 00:19:13 - progress_bar.py[line:274] - INFO: epoch 007: 4027 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7799, nsentences=120, sample_size=4067.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1959.6, ups=0.25, wpb=7799, bsz=120, num_updates=40210, lr=1.06752e-05, gnorm=0.982, clip=30, loss_scale=32, train_wall=40, gb_free=30.6, wall=164765 2023-05-03 00:19:52 - progress_bar.py[line:274] - INFO: epoch 007: 4037 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7930.5, nsentences=120, sample_size=3832.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2011.4, ups=0.25, wpb=7930.5, bsz=120, num_updates=40220, lr=1.067e-05, gnorm=0.968, clip=20, loss_scale=64, train_wall=39, gb_free=29.3, wall=164805 2023-05-03 00:20:32 - progress_bar.py[line:274] - INFO: epoch 007: 4047 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7691.4, nsentences=120, sample_size=4132.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1959.9, ups=0.25, 
wpb=7691.4, bsz=120, num_updates=40230, lr=1.06647e-05, gnorm=0.967, clip=30, loss_scale=64, train_wall=39, gb_free=28.4, wall=164844 2023-05-03 00:21:11 - progress_bar.py[line:274] - INFO: epoch 007: 4057 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7642.6, nsentences=120, sample_size=4232.7, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1959.9, ups=0.26, wpb=7642.6, bsz=120, num_updates=40240, lr=1.06594e-05, gnorm=0.944, clip=20, loss_scale=64, train_wall=39, gb_free=28.4, wall=164883 2023-05-03 00:21:51 - progress_bar.py[line:274] - INFO: epoch 007: 4067 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7657, nsentences=120, sample_size=4156.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1880.4, ups=0.25, wpb=7657, bsz=120, num_updates=40250, lr=1.06541e-05, gnorm=0.966, clip=30, loss_scale=64, train_wall=41, gb_free=29.9, wall=164924 2023-05-03 00:22:31 - progress_bar.py[line:274] - INFO: epoch 007: 4077 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7822, nsentences=120, sample_size=4137.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1964.5, ups=0.25, wpb=7822, bsz=120, num_updates=40260, lr=1.06488e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=164964 2023-05-03 00:23:10 - progress_bar.py[line:274] - INFO: epoch 007: 4087 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7426.5, nsentences=120, sample_size=3914.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1888.2, ups=0.25, wpb=7426.5, bsz=120, num_updates=40270, lr=1.06435e-05, gnorm=0.985, clip=30, loss_scale=64, train_wall=39, gb_free=31.4, wall=165003 2023-05-03 00:23:50 - progress_bar.py[line:274] - INFO: epoch 007: 4097 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7863.2, nsentences=120, sample_size=3860.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2012.8, ups=0.26, wpb=7863.2, bsz=120, num_updates=40280, lr=1.06383e-05, gnorm=1.009, clip=60, loss_scale=64, 
train_wall=39, gb_free=29.1, wall=165042 2023-05-03 00:24:29 - progress_bar.py[line:274] - INFO: epoch 007: 4107 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7934.1, nsentences=120, sample_size=4269.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1996.6, ups=0.25, wpb=7934.1, bsz=120, num_updates=40290, lr=1.0633e-05, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=165082 2023-05-03 00:25:09 - progress_bar.py[line:274] - INFO: epoch 007: 4117 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7572.4, nsentences=120, sample_size=4080.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1887.3, ups=0.25, wpb=7572.4, bsz=120, num_updates=40300, lr=1.06277e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=27.6, wall=165122 2023-05-03 00:25:50 - progress_bar.py[line:274] - INFO: epoch 007: 4127 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7837.4, nsentences=120, sample_size=4003.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1943.2, ups=0.25, wpb=7837.4, bsz=120, num_updates=40310, lr=1.06224e-05, gnorm=0.986, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=165162 2023-05-03 00:26:30 - progress_bar.py[line:274] - INFO: epoch 007: 4137 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7885.2, nsentences=120, sample_size=4146.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1970.6, ups=0.25, wpb=7885.2, bsz=120, num_updates=40320, lr=1.06171e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=165202 2023-05-03 00:27:10 - progress_bar.py[line:274] - INFO: epoch 007: 4147 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7941.8, nsentences=120, sample_size=3921.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1973.3, ups=0.25, wpb=7941.8, bsz=120, num_updates=40330, lr=1.06118e-05, gnorm=0.975, clip=20, loss_scale=64, train_wall=40, gb_free=31.1, wall=165242 2023-05-03 00:27:50 - 
progress_bar.py[line:274] - INFO: epoch 007: 4157 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7818, nsentences=120, sample_size=4090.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1950.5, ups=0.25, wpb=7818, bsz=120, num_updates=40340, lr=1.06066e-05, gnorm=0.991, clip=50, loss_scale=64, train_wall=40, gb_free=30.5, wall=165283 2023-05-03 00:28:30 - progress_bar.py[line:274] - INFO: epoch 007: 4167 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7696.5, nsentences=120, sample_size=4108.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1940.1, ups=0.25, wpb=7696.5, bsz=120, num_updates=40350, lr=1.06013e-05, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=165322 2023-05-03 00:29:10 - progress_bar.py[line:274] - INFO: epoch 007: 4177 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7828.1, nsentences=120, sample_size=3942.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1952.9, ups=0.25, wpb=7828.1, bsz=120, num_updates=40360, lr=1.0596e-05, gnorm=0.978, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=165362 2023-05-03 00:29:50 - progress_bar.py[line:274] - INFO: epoch 007: 4187 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7790.7, nsentences=120, sample_size=3936, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1928.2, ups=0.25, wpb=7790.7, bsz=120, num_updates=40370, lr=1.05907e-05, gnorm=0.957, clip=40, loss_scale=64, train_wall=40, gb_free=29.1, wall=165403 2023-05-03 00:30:30 - progress_bar.py[line:274] - INFO: epoch 007: 4197 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7765.8, nsentences=120, sample_size=3971.2, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1963.6, ups=0.25, wpb=7765.8, bsz=120, num_updates=40380, lr=1.05854e-05, gnorm=0.974, clip=30, loss_scale=64, train_wall=39, gb_free=29.3, wall=165442 2023-05-03 00:31:10 - progress_bar.py[line:274] - INFO: epoch 007: 4207 / 6042 loss=2.402, loss_v1=0, loss_v2=0, 
nll_loss=1.142, ntokens=7876.8, nsentences=120, sample_size=4283.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1949, ups=0.25, wpb=7876.8, bsz=120, num_updates=40390, lr=1.05802e-05, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=29.3, wall=165483 2023-05-03 00:31:50 - progress_bar.py[line:274] - INFO: epoch 007: 4217 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7730.1, nsentences=120, sample_size=4130.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1938, ups=0.25, wpb=7730.1, bsz=120, num_updates=40400, lr=1.05749e-05, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=165523 2023-05-03 00:32:30 - progress_bar.py[line:274] - INFO: epoch 007: 4227 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7819.6, nsentences=120, sample_size=4020.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1967.2, ups=0.25, wpb=7819.6, bsz=120, num_updates=40410, lr=1.05696e-05, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=29.3, wall=165562 2023-05-03 00:33:10 - progress_bar.py[line:274] - INFO: epoch 007: 4237 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7651.2, nsentences=120, sample_size=3917.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1930.7, ups=0.25, wpb=7651.2, bsz=120, num_updates=40420, lr=1.05643e-05, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=165602 2023-05-03 00:33:50 - progress_bar.py[line:274] - INFO: epoch 007: 4247 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7958.7, nsentences=120, sample_size=3867.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1973.7, ups=0.25, wpb=7958.7, bsz=120, num_updates=40430, lr=1.0559e-05, gnorm=0.98, clip=30, loss_scale=64, train_wall=40, gb_free=28.5, wall=165642 2023-05-03 00:34:29 - progress_bar.py[line:274] - INFO: epoch 007: 4257 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7944.2, nsentences=120, sample_size=3794.6, sample_size_v1=0, 
sample_size_v2=0, ppl=2.18, wps=2041.6, ups=0.26, wpb=7944.2, bsz=120, num_updates=40440, lr=1.05537e-05, gnorm=0.989, clip=40, loss_scale=64, train_wall=39, gb_free=27.7, wall=165681 2023-05-03 00:35:09 - progress_bar.py[line:274] - INFO: epoch 007: 4267 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7447.2, nsentences=120, sample_size=4113.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1839.4, ups=0.25, wpb=7447.2, bsz=120, num_updates=40450, lr=1.05485e-05, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=165722 2023-05-03 00:35:50 - progress_bar.py[line:274] - INFO: epoch 007: 4277 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7941.6, nsentences=120, sample_size=4144.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1967.4, ups=0.25, wpb=7941.6, bsz=120, num_updates=40460, lr=1.05432e-05, gnorm=0.947, clip=10, loss_scale=64, train_wall=40, gb_free=31.1, wall=165762 2023-05-03 00:36:29 - progress_bar.py[line:274] - INFO: epoch 007: 4287 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7420, nsentences=120, sample_size=4268.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1863.4, ups=0.25, wpb=7420, bsz=120, num_updates=40470, lr=1.05379e-05, gnorm=0.962, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=165802 2023-05-03 00:37:10 - progress_bar.py[line:274] - INFO: epoch 007: 4297 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7789.1, nsentences=120, sample_size=4023.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1933.9, ups=0.25, wpb=7789.1, bsz=120, num_updates=40480, lr=1.05326e-05, gnorm=0.977, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=165842 2023-05-03 00:37:49 - progress_bar.py[line:274] - INFO: epoch 007: 4307 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7292.5, nsentences=120, sample_size=4187, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1859.4, ups=0.25, wpb=7292.5, bsz=120, num_updates=40490, 
lr=1.05273e-05, gnorm=0.959, clip=30, loss_scale=64, train_wall=39, gb_free=30.6, wall=165881 2023-05-03 00:38:30 - progress_bar.py[line:274] - INFO: epoch 007: 4317 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7682.8, nsentences=120, sample_size=4109.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1868.4, ups=0.24, wpb=7682.8, bsz=120, num_updates=40500, lr=1.05221e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=41, gb_free=29.4, wall=165923 2023-05-03 00:39:10 - progress_bar.py[line:274] - INFO: epoch 007: 4327 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7592.8, nsentences=120, sample_size=3990.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1897.6, ups=0.25, wpb=7592.8, bsz=120, num_updates=40510, lr=1.05168e-05, gnorm=0.991, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=165963 2023-05-03 00:39:50 - progress_bar.py[line:274] - INFO: epoch 007: 4337 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7639.4, nsentences=120, sample_size=4220.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1935.4, ups=0.25, wpb=7639.4, bsz=120, num_updates=40520, lr=1.05115e-05, gnorm=0.939, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=166002 2023-05-03 00:40:29 - progress_bar.py[line:274] - INFO: epoch 007: 4347 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7711, nsentences=120, sample_size=4095.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1942.3, ups=0.25, wpb=7711, bsz=120, num_updates=40530, lr=1.05062e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=166042 2023-05-03 00:41:10 - progress_bar.py[line:274] - INFO: epoch 007: 4357 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=7637.5, nsentences=120, sample_size=3881.7, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1896.7, ups=0.25, wpb=7637.5, bsz=120, num_updates=40540, lr=1.05009e-05, gnorm=1.014, clip=50, loss_scale=64, train_wall=40, gb_free=31.1, 
wall=166082 2023-05-03 00:41:49 - progress_bar.py[line:274] - INFO: epoch 007: 4367 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7761.2, nsentences=120, sample_size=4110.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1959.1, ups=0.25, wpb=7761.2, bsz=120, num_updates=40550, lr=1.04956e-05, gnorm=0.947, clip=20, loss_scale=64, train_wall=40, gb_free=31, wall=166122 2023-05-03 00:42:29 - progress_bar.py[line:274] - INFO: epoch 007: 4377 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7839.6, nsentences=120, sample_size=4147.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1959.6, ups=0.25, wpb=7839.6, bsz=120, num_updates=40560, lr=1.04904e-05, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=166162 2023-05-03 00:43:08 - progress_bar.py[line:274] - INFO: epoch 007: 4387 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7580, nsentences=120, sample_size=4147.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1939.2, ups=0.26, wpb=7580, bsz=120, num_updates=40570, lr=1.04851e-05, gnorm=0.946, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=166201 2023-05-03 00:43:48 - progress_bar.py[line:274] - INFO: epoch 007: 4397 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7455.3, nsentences=120, sample_size=4012.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1870.2, ups=0.25, wpb=7455.3, bsz=120, num_updates=40580, lr=1.04798e-05, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=29.1, wall=166241 2023-05-03 00:44:28 - progress_bar.py[line:274] - INFO: epoch 007: 4407 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7898.5, nsentences=120, sample_size=4327.3, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1975.7, ups=0.25, wpb=7898.5, bsz=120, num_updates=40590, lr=1.04745e-05, gnorm=0.934, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=166281 2023-05-03 00:45:08 - progress_bar.py[line:274] - INFO: epoch 007: 4417 / 6042 
loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7555.6, nsentences=120, sample_size=4230.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1875.8, ups=0.25, wpb=7555.6, bsz=120, num_updates=40600, lr=1.04692e-05, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=166321 2023-05-03 00:45:48 - progress_bar.py[line:274] - INFO: epoch 007: 4427 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7625.4, nsentences=120, sample_size=4115, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1928.6, ups=0.25, wpb=7625.4, bsz=120, num_updates=40610, lr=1.04639e-05, gnorm=0.994, clip=50, loss_scale=64, train_wall=39, gb_free=31.5, wall=166360 2023-05-03 00:46:28 - progress_bar.py[line:274] - INFO: epoch 007: 4437 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7978.3, nsentences=120, sample_size=3913.8, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1985.4, ups=0.25, wpb=7978.3, bsz=120, num_updates=40620, lr=1.04587e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=30.9, wall=166401 2023-05-03 00:47:08 - progress_bar.py[line:274] - INFO: epoch 007: 4447 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7867.1, nsentences=120, sample_size=3901.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1983.5, ups=0.25, wpb=7867.1, bsz=120, num_updates=40630, lr=1.04534e-05, gnorm=0.974, clip=40, loss_scale=64, train_wall=40, gb_free=30.3, wall=166440 2023-05-03 00:47:47 - progress_bar.py[line:274] - INFO: epoch 007: 4457 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7877.8, nsentences=120, sample_size=4034.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1994.9, ups=0.25, wpb=7877.8, bsz=120, num_updates=40640, lr=1.04481e-05, gnorm=0.966, clip=40, loss_scale=64, train_wall=39, gb_free=26.3, wall=166480 2023-05-03 00:48:28 - progress_bar.py[line:274] - INFO: epoch 007: 4467 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7736.8, nsentences=120, 
sample_size=4052.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1905.5, ups=0.25, wpb=7736.8, bsz=120, num_updates=40650, lr=1.04428e-05, gnorm=0.978, clip=50, loss_scale=64, train_wall=41, gb_free=29.7, wall=166520 2023-05-03 00:49:08 - progress_bar.py[line:274] - INFO: epoch 007: 4477 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7706.5, nsentences=120, sample_size=3846.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1930.6, ups=0.25, wpb=7706.5, bsz=120, num_updates=40660, lr=1.04375e-05, gnorm=0.993, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=166560 2023-05-03 00:49:48 - progress_bar.py[line:274] - INFO: epoch 007: 4487 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7877.5, nsentences=120, sample_size=3814.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1954.8, ups=0.25, wpb=7877.5, bsz=120, num_updates=40670, lr=1.04323e-05, gnorm=1.014, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=166601 2023-05-03 00:50:28 - progress_bar.py[line:274] - INFO: epoch 007: 4497 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7993.6, nsentences=120, sample_size=4108.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2004.1, ups=0.25, wpb=7993.6, bsz=120, num_updates=40680, lr=1.0427e-05, gnorm=0.966, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=166640 2023-05-03 00:51:09 - progress_bar.py[line:274] - INFO: epoch 007: 4507 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7657.5, nsentences=120, sample_size=4397.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1868.6, ups=0.24, wpb=7657.5, bsz=120, num_updates=40690, lr=1.04217e-05, gnorm=0.954, clip=20, loss_scale=64, train_wall=41, gb_free=29.8, wall=166681 2023-05-03 00:51:49 - progress_bar.py[line:274] - INFO: epoch 007: 4517 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7741.1, nsentences=120, sample_size=4200.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1947.1, 
ups=0.25, wpb=7741.1, bsz=120, num_updates=40700, lr=1.04164e-05, gnorm=0.95, clip=10, loss_scale=64, train_wall=40, gb_free=31, wall=166721 2023-05-03 00:52:28 - progress_bar.py[line:274] - INFO: epoch 007: 4527 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7620.8, nsentences=120, sample_size=3877.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1928.2, ups=0.25, wpb=7620.8, bsz=120, num_updates=40710, lr=1.04111e-05, gnorm=1.02, clip=40, loss_scale=64, train_wall=39, gb_free=30.2, wall=166761 2023-05-03 00:53:08 - progress_bar.py[line:274] - INFO: epoch 007: 4537 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=7953.2, nsentences=120, sample_size=4202.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1995.1, ups=0.25, wpb=7953.2, bsz=120, num_updates=40720, lr=1.04058e-05, gnorm=0.958, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=166801 2023-05-03 00:53:47 - progress_bar.py[line:274] - INFO: epoch 007: 4547 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7779.2, nsentences=120, sample_size=3935.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1986.9, ups=0.26, wpb=7779.2, bsz=120, num_updates=40730, lr=1.04006e-05, gnorm=0.987, clip=50, loss_scale=64, train_wall=39, gb_free=30.1, wall=166840 2023-05-03 00:53:59 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 00:54:31 - progress_bar.py[line:274] - INFO: epoch 007: 4558 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7846, nsentences=120, sample_size=4253.5, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1776.1, ups=0.23, wpb=7846, bsz=120, num_updates=40740, lr=1.03953e-05, gnorm=0.935, clip=30, loss_scale=64, train_wall=44, gb_free=27.6, wall=166884 2023-05-03 00:55:11 - progress_bar.py[line:274] - INFO: epoch 007: 4568 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7612.1, nsentences=120, sample_size=4001.2, sample_size_v1=0, 
sample_size_v2=0, ppl=2.18, wps=1925.9, ups=0.25, wpb=7612.1, bsz=120, num_updates=40750, lr=1.039e-05, gnorm=0.991, clip=30, loss_scale=64, train_wall=39, gb_free=29.3, wall=166923 2023-05-03 00:55:51 - progress_bar.py[line:274] - INFO: epoch 007: 4578 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7816.8, nsentences=120, sample_size=3660.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1972.8, ups=0.25, wpb=7816.8, bsz=120, num_updates=40760, lr=1.03847e-05, gnorm=1.029, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=166963 2023-05-03 00:56:31 - progress_bar.py[line:274] - INFO: epoch 007: 4588 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7782.5, nsentences=120, sample_size=4102.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1932.9, ups=0.25, wpb=7782.5, bsz=120, num_updates=40770, lr=1.03794e-05, gnorm=0.945, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=167003 2023-05-03 00:57:10 - progress_bar.py[line:274] - INFO: epoch 007: 4598 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7774.7, nsentences=120, sample_size=4102, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1968.8, ups=0.25, wpb=7774.7, bsz=120, num_updates=40780, lr=1.03742e-05, gnorm=0.94, clip=10, loss_scale=64, train_wall=39, gb_free=29.5, wall=167043 2023-05-03 00:57:50 - progress_bar.py[line:274] - INFO: epoch 007: 4608 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7814.5, nsentences=120, sample_size=4013.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1963.6, ups=0.25, wpb=7814.5, bsz=120, num_updates=40790, lr=1.03689e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=167083 2023-05-03 00:58:30 - progress_bar.py[line:274] - INFO: epoch 007: 4618 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=8177, nsentences=120, sample_size=4050.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2035.7, ups=0.25, wpb=8177, bsz=120, num_updates=40800, 
lr=1.03636e-05, gnorm=0.951, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=167123 2023-05-03 00:59:09 - progress_bar.py[line:274] - INFO: epoch 007: 4628 / 6042 loss=2.421, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7657.9, nsentences=120, sample_size=4356.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1953.3, ups=0.26, wpb=7657.9, bsz=120, num_updates=40810, lr=1.03583e-05, gnorm=0.949, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=167162 2023-05-03 00:59:50 - progress_bar.py[line:274] - INFO: epoch 007: 4638 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7857.7, nsentences=120, sample_size=4082.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1923, ups=0.24, wpb=7857.7, bsz=120, num_updates=40820, lr=1.0353e-05, gnorm=0.953, clip=20, loss_scale=64, train_wall=41, gb_free=30.1, wall=167203 2023-05-03 01:00:30 - progress_bar.py[line:274] - INFO: epoch 007: 4648 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7871.2, nsentences=120, sample_size=4016.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1996.3, ups=0.25, wpb=7871.2, bsz=120, num_updates=40830, lr=1.03477e-05, gnorm=0.971, clip=40, loss_scale=64, train_wall=39, gb_free=30.9, wall=167242 2023-05-03 01:01:10 - progress_bar.py[line:274] - INFO: epoch 007: 4658 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7483.8, nsentences=120, sample_size=4070.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1859.4, ups=0.25, wpb=7483.8, bsz=120, num_updates=40840, lr=1.03425e-05, gnorm=0.983, clip=30, loss_scale=64, train_wall=40, gb_free=26.9, wall=167282 2023-05-03 01:01:50 - progress_bar.py[line:274] - INFO: epoch 007: 4668 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7471.3, nsentences=120, sample_size=3775.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1878.8, ups=0.25, wpb=7471.3, bsz=120, num_updates=40850, lr=1.03372e-05, gnorm=0.992, clip=40, loss_scale=64, train_wall=40, gb_free=31, 
wall=167322 2023-05-03 01:02:29 - progress_bar.py[line:274] - INFO: epoch 007: 4678 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7658.3, nsentences=120, sample_size=3667.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1933.6, ups=0.25, wpb=7658.3, bsz=120, num_updates=40860, lr=1.03319e-05, gnorm=0.995, clip=60, loss_scale=64, train_wall=40, gb_free=29, wall=167362 2023-05-03 01:03:10 - progress_bar.py[line:274] - INFO: epoch 007: 4688 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7691.5, nsentences=120, sample_size=4460.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1879.1, ups=0.24, wpb=7691.5, bsz=120, num_updates=40870, lr=1.03266e-05, gnorm=0.941, clip=20, loss_scale=64, train_wall=41, gb_free=28.3, wall=167403 2023-05-03 01:03:50 - progress_bar.py[line:274] - INFO: epoch 007: 4698 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7633.2, nsentences=120, sample_size=4287.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1943.3, ups=0.25, wpb=7633.2, bsz=120, num_updates=40880, lr=1.03213e-05, gnorm=0.942, clip=10, loss_scale=64, train_wall=39, gb_free=29.4, wall=167442 2023-05-03 01:04:29 - progress_bar.py[line:274] - INFO: epoch 007: 4708 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7980.8, nsentences=120, sample_size=4023.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=2002.7, ups=0.25, wpb=7980.8, bsz=120, num_updates=40890, lr=1.0316e-05, gnorm=0.994, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=167482 2023-05-03 01:05:09 - progress_bar.py[line:274] - INFO: epoch 007: 4718 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7427.9, nsentences=120, sample_size=4048.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1871.3, ups=0.25, wpb=7427.9, bsz=120, num_updates=40900, lr=1.03108e-05, gnorm=1.009, clip=40, loss_scale=64, train_wall=40, gb_free=28.5, wall=167522 2023-05-03 01:05:48 - progress_bar.py[line:274] - INFO: epoch 007: 4728 / 6042 
loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7622.1, nsentences=120, sample_size=4058.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1941.3, ups=0.25, wpb=7622.1, bsz=120, num_updates=40910, lr=1.03055e-05, gnorm=0.978, clip=40, loss_scale=64, train_wall=39, gb_free=29.9, wall=167561 2023-05-03 01:06:28 - progress_bar.py[line:274] - INFO: epoch 007: 4738 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7695.1, nsentences=120, sample_size=3994.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1965.6, ups=0.26, wpb=7695.1, bsz=120, num_updates=40920, lr=1.03002e-05, gnorm=0.985, clip=40, loss_scale=64, train_wall=39, gb_free=30.2, wall=167600 2023-05-03 01:07:07 - progress_bar.py[line:274] - INFO: epoch 007: 4748 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7516.9, nsentences=120, sample_size=3988.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1889.6, ups=0.25, wpb=7516.9, bsz=120, num_updates=40930, lr=1.02949e-05, gnorm=0.944, clip=0, loss_scale=64, train_wall=40, gb_free=31.4, wall=167640 2023-05-03 01:07:47 - progress_bar.py[line:274] - INFO: epoch 007: 4758 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7856.9, nsentences=120, sample_size=4323, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1996, ups=0.25, wpb=7856.9, bsz=120, num_updates=40940, lr=1.02896e-05, gnorm=0.948, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=167679 2023-05-03 01:08:26 - progress_bar.py[line:274] - INFO: epoch 007: 4768 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7611.3, nsentences=120, sample_size=4173.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1914.5, ups=0.25, wpb=7611.3, bsz=120, num_updates=40950, lr=1.02844e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=28.2, wall=167719 2023-05-03 01:09:07 - progress_bar.py[line:274] - INFO: epoch 007: 4778 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7903.1, nsentences=120, 
sample_size=4261.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1944.5, ups=0.25, wpb=7903.1, bsz=120, num_updates=40960, lr=1.02791e-05, gnorm=0.972, clip=20, loss_scale=64, train_wall=41, gb_free=30.4, wall=167760 2023-05-03 01:09:47 - progress_bar.py[line:274] - INFO: epoch 007: 4788 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7657.1, nsentences=120, sample_size=3857.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1903.9, ups=0.25, wpb=7657.1, bsz=120, num_updates=40970, lr=1.02738e-05, gnorm=0.981, clip=50, loss_scale=64, train_wall=40, gb_free=30.7, wall=167800 2023-05-03 01:10:28 - progress_bar.py[line:274] - INFO: epoch 007: 4798 / 6042 loss=2.442, loss_v1=0, loss_v2=0, nll_loss=1.199, ntokens=8110.8, nsentences=120, sample_size=3924.5, sample_size_v1=0, sample_size_v2=0, ppl=2.3, wps=2013.7, ups=0.25, wpb=8110.8, bsz=120, num_updates=40980, lr=1.02685e-05, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=167840 2023-05-03 01:11:08 - progress_bar.py[line:274] - INFO: epoch 007: 4808 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7451.9, nsentences=120, sample_size=4200.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1845.3, ups=0.25, wpb=7451.9, bsz=120, num_updates=40990, lr=1.02632e-05, gnorm=0.992, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=167880 2023-05-03 01:11:48 - progress_bar.py[line:274] - INFO: epoch 007: 4818 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7592.3, nsentences=120, sample_size=4252.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1890.2, ups=0.25, wpb=7592.3, bsz=120, num_updates=41000, lr=1.02579e-05, gnorm=1.005, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=167921 2023-05-03 01:11:48 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 01:11:50 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 01:11:50 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 01:11:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:11:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:11:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:11:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:11:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:11:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:11:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:11:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:11:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:11:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:11:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:11:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:11:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:11:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:11:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:11:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:11:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:11:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 01:12:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:07 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 01:12:07 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 01:12:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-03 01:12:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:19 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 01:12:19 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 01:12:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 01:12:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:30 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 01:12:30 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 01:12:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:34 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 01:12:34 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 01:12:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:39 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 01:12:39 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 01:12:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 01:12:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 01:12:40 - progress_bar.py[line:282] - INFO: epoch 007 | valid on 'valid' subset | loss 3.246 | loss_v1 0 | loss_v2 0 | nll_loss 2.08 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.23 | score 0.748 | wps 3289 | wpb 3202.1 | bsz 39.4 | num_updates 41000 | best_score 0.7627 2023-05-03 01:12:40 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 7 @ 41000 updates 2023-05-03 01:12:40 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_41000.pt 2023-05-03 01:13:04 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_41000.pt 2023-05-03 01:13:18 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_41000.pt (epoch 7 @ 41000 updates, score 0.748) (writing took 38.043200745014474 seconds) 2023-05-03 01:13:57 - progress_bar.py[line:274] - INFO: epoch 007: 4828 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7799.4, nsentences=120, sample_size=4243.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=603.8, ups=0.08, wpb=7799.4, bsz=120, num_updates=41010, lr=1.02527e-05, gnorm=0.942, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, wall=168050 2023-05-03 01:14:37 - progress_bar.py[line:274] - INFO: epoch 007: 4838 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7802.2, nsentences=120, sample_size=4040.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1943.8, ups=0.25, wpb=7802.2, bsz=120, num_updates=41020, 
lr=1.02474e-05, gnorm=0.974, clip=50, loss_scale=64, train_wall=40, gb_free=29.3, wall=168090 2023-05-03 01:15:17 - progress_bar.py[line:274] - INFO: epoch 007: 4848 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7929.3, nsentences=120, sample_size=3714.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1986.1, ups=0.25, wpb=7929.3, bsz=120, num_updates=41030, lr=1.02421e-05, gnorm=1.017, clip=60, loss_scale=64, train_wall=40, gb_free=29.6, wall=168130 2023-05-03 01:15:57 - progress_bar.py[line:274] - INFO: epoch 007: 4858 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7630.5, nsentences=120, sample_size=3814.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1918, ups=0.25, wpb=7630.5, bsz=120, num_updates=41040, lr=1.02368e-05, gnorm=1, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=168170 2023-05-03 01:16:38 - progress_bar.py[line:274] - INFO: epoch 007: 4868 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7421.6, nsentences=120, sample_size=4095.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1837.4, ups=0.25, wpb=7421.6, bsz=120, num_updates=41050, lr=1.02315e-05, gnorm=1.004, clip=50, loss_scale=64, train_wall=40, gb_free=29, wall=168210 2023-05-03 01:17:17 - progress_bar.py[line:274] - INFO: epoch 007: 4878 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7877.5, nsentences=120, sample_size=4225.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1996.7, ups=0.25, wpb=7877.5, bsz=120, num_updates=41060, lr=1.02263e-05, gnorm=0.964, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=168249 2023-05-03 01:17:56 - progress_bar.py[line:274] - INFO: epoch 007: 4888 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7621.3, nsentences=120, sample_size=4168, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1949.2, ups=0.26, wpb=7621.3, bsz=120, num_updates=41070, lr=1.0221e-05, gnorm=0.972, clip=20, loss_scale=64, train_wall=39, gb_free=29.6, wall=168289 
2023-05-03 01:18:35 - progress_bar.py[line:274] - INFO: epoch 007: 4898 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7809.5, nsentences=120, sample_size=4221.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1988.3, ups=0.25, wpb=7809.5, bsz=120, num_updates=41080, lr=1.02157e-05, gnorm=0.95, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=168328 2023-05-03 01:19:14 - progress_bar.py[line:274] - INFO: epoch 007: 4908 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7322.3, nsentences=120, sample_size=3878.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1884, ups=0.26, wpb=7322.3, bsz=120, num_updates=41090, lr=1.02104e-05, gnorm=1.007, clip=50, loss_scale=64, train_wall=39, gb_free=28.7, wall=168367 2023-05-03 01:19:55 - progress_bar.py[line:274] - INFO: epoch 007: 4918 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7867.8, nsentences=120, sample_size=4163.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1952.5, ups=0.25, wpb=7867.8, bsz=120, num_updates=41100, lr=1.02051e-05, gnorm=0.984, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=168407 2023-05-03 01:20:34 - progress_bar.py[line:274] - INFO: epoch 007: 4928 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=8126, nsentences=120, sample_size=3966.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2048.2, ups=0.25, wpb=8126, bsz=120, num_updates=41110, lr=1.01998e-05, gnorm=0.981, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=168447 2023-05-03 01:21:14 - progress_bar.py[line:274] - INFO: epoch 007: 4938 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7819.7, nsentences=120, sample_size=4088.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1990.8, ups=0.25, wpb=7819.7, bsz=120, num_updates=41120, lr=1.01946e-05, gnorm=0.969, clip=30, loss_scale=64, train_wall=39, gb_free=29.9, wall=168486 2023-05-03 01:21:53 - progress_bar.py[line:274] - INFO: epoch 007: 4948 / 6042 loss=2.376, 
loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=8011.3, nsentences=120, sample_size=3763.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2042.9, ups=0.26, wpb=8011.3, bsz=120, num_updates=41130, lr=1.01893e-05, gnorm=1.023, clip=50, loss_scale=64, train_wall=39, gb_free=30.2, wall=168525 2023-05-03 01:22:33 - progress_bar.py[line:274] - INFO: epoch 007: 4958 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7897.8, nsentences=120, sample_size=3684.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1977.4, ups=0.25, wpb=7897.8, bsz=120, num_updates=41140, lr=1.0184e-05, gnorm=0.983, clip=30, loss_scale=64, train_wall=40, gb_free=31.2, wall=168565 2023-05-03 01:23:12 - progress_bar.py[line:274] - INFO: epoch 007: 4968 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7957.6, nsentences=120, sample_size=3972.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2014.3, ups=0.25, wpb=7957.6, bsz=120, num_updates=41150, lr=1.01787e-05, gnorm=0.967, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=168605 2023-05-03 01:23:52 - progress_bar.py[line:274] - INFO: epoch 007: 4978 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.173, ntokens=7662, nsentences=120, sample_size=3858.1, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1922.2, ups=0.25, wpb=7662, bsz=120, num_updates=41160, lr=1.01734e-05, gnorm=0.999, clip=60, loss_scale=64, train_wall=40, gb_free=30.1, wall=168645 2023-05-03 01:24:32 - progress_bar.py[line:274] - INFO: epoch 007: 4988 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7899.6, nsentences=120, sample_size=3998, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1990, ups=0.25, wpb=7899.6, bsz=120, num_updates=41170, lr=1.01681e-05, gnorm=0.973, clip=20, loss_scale=64, train_wall=40, gb_free=29.1, wall=168684 2023-05-03 01:25:11 - progress_bar.py[line:274] - INFO: epoch 007: 4998 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7915.1, nsentences=120, sample_size=3782.2, 
sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2005, ups=0.25, wpb=7915.1, bsz=120, num_updates=41180, lr=1.01629e-05, gnorm=0.963, clip=20, loss_scale=64, train_wall=39, gb_free=29.7, wall=168724 2023-05-03 01:25:51 - progress_bar.py[line:274] - INFO: epoch 007: 5008 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7558.7, nsentences=120, sample_size=4108.2, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1923.7, ups=0.25, wpb=7558.7, bsz=120, num_updates=41190, lr=1.01576e-05, gnorm=0.952, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=168763 2023-05-03 01:26:30 - progress_bar.py[line:274] - INFO: epoch 007: 5018 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7737.6, nsentences=120, sample_size=3946.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1941.2, ups=0.25, wpb=7737.6, bsz=120, num_updates=41200, lr=1.01523e-05, gnorm=0.965, clip=20, loss_scale=64, train_wall=40, gb_free=29.2, wall=168803 2023-05-03 01:27:10 - progress_bar.py[line:274] - INFO: epoch 007: 5028 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7581.4, nsentences=120, sample_size=4203, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1907.6, ups=0.25, wpb=7581.4, bsz=120, num_updates=41210, lr=1.0147e-05, gnorm=0.979, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=168843 2023-05-03 01:27:50 - progress_bar.py[line:274] - INFO: epoch 007: 5038 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7880.5, nsentences=120, sample_size=4115.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1974, ups=0.25, wpb=7880.5, bsz=120, num_updates=41220, lr=1.01417e-05, gnorm=1.044, clip=50, loss_scale=64, train_wall=40, gb_free=30.5, wall=168883 2023-05-03 01:28:30 - progress_bar.py[line:274] - INFO: epoch 007: 5048 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7942.1, nsentences=120, sample_size=3899.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1998.5, ups=0.25, wpb=7942.1, bsz=120, 
num_updates=41230, lr=1.01365e-05, gnorm=1, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=168922 2023-05-03 01:29:10 - progress_bar.py[line:274] - INFO: epoch 007: 5058 / 6042 loss=2.436, loss_v1=0, loss_v2=0, nll_loss=1.192, ntokens=7708.2, nsentences=120, sample_size=3820.9, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1937.3, ups=0.25, wpb=7708.2, bsz=120, num_updates=41240, lr=1.01312e-05, gnorm=1.009, clip=60, loss_scale=64, train_wall=40, gb_free=31.6, wall=168962 2023-05-03 01:29:41 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 01:29:53 - progress_bar.py[line:274] - INFO: epoch 007: 5069 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7683.4, nsentences=120, sample_size=4028.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1776.6, ups=0.23, wpb=7683.4, bsz=120, num_updates=41250, lr=1.01259e-05, gnorm=0.985, clip=40, loss_scale=64, train_wall=43, gb_free=30.6, wall=169005 2023-05-03 01:30:32 - progress_bar.py[line:274] - INFO: epoch 007: 5079 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7723, nsentences=120, sample_size=4025.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1962.5, ups=0.25, wpb=7723, bsz=120, num_updates=41260, lr=1.01206e-05, gnorm=0.989, clip=40, loss_scale=64, train_wall=39, gb_free=30.6, wall=169045 2023-05-03 01:31:12 - progress_bar.py[line:274] - INFO: epoch 007: 5089 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7713.2, nsentences=120, sample_size=3994.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1942.9, ups=0.25, wpb=7713.2, bsz=120, num_updates=41270, lr=1.01153e-05, gnorm=0.981, clip=50, loss_scale=64, train_wall=40, gb_free=30.3, wall=169084 2023-05-03 01:31:51 - progress_bar.py[line:274] - INFO: epoch 007: 5099 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7452.2, nsentences=120, sample_size=4076.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, 
wps=1911.8, ups=0.26, wpb=7452.2, bsz=120, num_updates=41280, lr=1.011e-05, gnorm=0.983, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=169123 2023-05-03 01:32:30 - progress_bar.py[line:274] - INFO: epoch 007: 5109 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7542.5, nsentences=120, sample_size=4059.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1904.5, ups=0.25, wpb=7542.5, bsz=120, num_updates=41290, lr=1.01048e-05, gnorm=0.969, clip=20, loss_scale=64, train_wall=40, gb_free=29, wall=169163 2023-05-03 01:33:10 - progress_bar.py[line:274] - INFO: epoch 007: 5119 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7750.1, nsentences=120, sample_size=4008.4, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1952.1, ups=0.25, wpb=7750.1, bsz=120, num_updates=41300, lr=1.00995e-05, gnorm=0.973, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=169203 2023-05-03 01:33:50 - progress_bar.py[line:274] - INFO: epoch 007: 5129 / 6042 loss=2.432, loss_v1=0, loss_v2=0, nll_loss=1.184, ntokens=7802.7, nsentences=120, sample_size=4044.1, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1959.4, ups=0.25, wpb=7802.7, bsz=120, num_updates=41310, lr=1.00942e-05, gnorm=0.97, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=169242 2023-05-03 01:34:30 - progress_bar.py[line:274] - INFO: epoch 007: 5139 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7554.8, nsentences=120, sample_size=3936.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1880, ups=0.25, wpb=7554.8, bsz=120, num_updates=41320, lr=1.00889e-05, gnorm=0.991, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=169283 2023-05-03 01:35:10 - progress_bar.py[line:274] - INFO: epoch 007: 5149 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7575.9, nsentences=120, sample_size=4139.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1907.5, ups=0.25, wpb=7575.9, bsz=120, num_updates=41330, lr=1.00836e-05, gnorm=0.959, 
clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=169322 2023-05-03 01:35:50 - progress_bar.py[line:274] - INFO: epoch 007: 5159 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=8023.1, nsentences=120, sample_size=3728.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2005.8, ups=0.25, wpb=8023.1, bsz=120, num_updates=41340, lr=1.00784e-05, gnorm=1.006, clip=60, loss_scale=64, train_wall=40, gb_free=30.9, wall=169362 2023-05-03 01:36:29 - progress_bar.py[line:274] - INFO: epoch 007: 5169 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7554.5, nsentences=120, sample_size=4108.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1915, ups=0.25, wpb=7554.5, bsz=120, num_updates=41350, lr=1.00731e-05, gnorm=0.964, clip=20, loss_scale=64, train_wall=39, gb_free=29.1, wall=169402 2023-05-03 01:37:11 - progress_bar.py[line:274] - INFO: epoch 007: 5179 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=8127.4, nsentences=120, sample_size=3839.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1970.3, ups=0.24, wpb=8127.4, bsz=120, num_updates=41360, lr=1.00678e-05, gnorm=1.005, clip=50, loss_scale=64, train_wall=41, gb_free=27.1, wall=169443 2023-05-03 01:37:50 - progress_bar.py[line:274] - INFO: epoch 007: 5189 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=8184.7, nsentences=120, sample_size=3874, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2069.7, ups=0.25, wpb=8184.7, bsz=120, num_updates=41370, lr=1.00625e-05, gnorm=0.98, clip=50, loss_scale=64, train_wall=39, gb_free=29, wall=169483 2023-05-03 01:38:31 - progress_bar.py[line:274] - INFO: epoch 007: 5199 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7902, nsentences=120, sample_size=4019.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1944.5, ups=0.25, wpb=7902, bsz=120, num_updates=41380, lr=1.00572e-05, gnorm=0.987, clip=30, loss_scale=64, train_wall=41, gb_free=27.1, wall=169523 2023-05-03 01:39:10 - 
progress_bar.py[line:274] - INFO: epoch 007: 5209 / 6042 loss=2.419, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7716.3, nsentences=120, sample_size=3774, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1963, ups=0.25, wpb=7716.3, bsz=120, num_updates=41390, lr=1.00519e-05, gnorm=0.978, clip=30, loss_scale=64, train_wall=39, gb_free=28.6, wall=169563 2023-05-03 01:39:50 - progress_bar.py[line:274] - INFO: epoch 007: 5219 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=8057.3, nsentences=120, sample_size=4081.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2023.9, ups=0.25, wpb=8057.3, bsz=120, num_updates=41400, lr=1.00467e-05, gnorm=0.966, clip=40, loss_scale=64, train_wall=40, gb_free=31.2, wall=169602 2023-05-03 01:40:29 - progress_bar.py[line:274] - INFO: epoch 007: 5229 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7558, nsentences=120, sample_size=4114.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1916.2, ups=0.25, wpb=7558, bsz=120, num_updates=41410, lr=1.00414e-05, gnorm=0.994, clip=50, loss_scale=64, train_wall=39, gb_free=29.3, wall=169642 2023-05-03 01:41:08 - progress_bar.py[line:274] - INFO: epoch 007: 5239 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7558, nsentences=120, sample_size=4191.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1953.3, ups=0.26, wpb=7558, bsz=120, num_updates=41420, lr=1.00361e-05, gnorm=0.959, clip=20, loss_scale=64, train_wall=39, gb_free=29.4, wall=169681 2023-05-03 01:41:48 - progress_bar.py[line:274] - INFO: epoch 007: 5249 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=8023.6, nsentences=120, sample_size=4010.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2017, ups=0.25, wpb=8023.6, bsz=120, num_updates=41430, lr=1.00308e-05, gnorm=0.987, clip=50, loss_scale=64, train_wall=40, gb_free=29.8, wall=169720 2023-05-03 01:42:28 - progress_bar.py[line:274] - INFO: epoch 007: 5259 / 6042 loss=2.37, loss_v1=0, loss_v2=0, 
nll_loss=1.11, ntokens=7800.9, nsentences=120, sample_size=4054.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1954.1, ups=0.25, wpb=7800.9, bsz=120, num_updates=41440, lr=1.00255e-05, gnorm=0.97, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=169760 2023-05-03 01:43:08 - progress_bar.py[line:274] - INFO: epoch 007: 5269 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7538.4, nsentences=120, sample_size=4241.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1882.6, ups=0.25, wpb=7538.4, bsz=120, num_updates=41450, lr=1.00202e-05, gnorm=0.977, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=169800 2023-05-03 01:43:48 - progress_bar.py[line:274] - INFO: epoch 007: 5279 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7611.3, nsentences=120, sample_size=4164.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1908, ups=0.25, wpb=7611.3, bsz=120, num_updates=41460, lr=1.0015e-05, gnorm=1.085, clip=40, loss_scale=64, train_wall=40, gb_free=29.5, wall=169840 2023-05-03 01:44:27 - progress_bar.py[line:274] - INFO: epoch 007: 5289 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7734.1, nsentences=120, sample_size=4166.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1981.3, ups=0.26, wpb=7734.1, bsz=120, num_updates=41470, lr=1.00097e-05, gnorm=0.938, clip=10, loss_scale=64, train_wall=39, gb_free=29.9, wall=169879 2023-05-03 01:45:06 - progress_bar.py[line:274] - INFO: epoch 007: 5299 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7413.2, nsentences=120, sample_size=4161.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1891.5, ups=0.26, wpb=7413.2, bsz=120, num_updates=41480, lr=1.00044e-05, gnorm=0.965, clip=30, loss_scale=64, train_wall=39, gb_free=31, wall=169918 2023-05-03 01:45:46 - progress_bar.py[line:274] - INFO: epoch 007: 5309 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7794.5, nsentences=120, sample_size=4031, sample_size_v1=0, 
sample_size_v2=0, ppl=2.25, wps=1938.7, ups=0.25, wpb=7794.5, bsz=120, num_updates=41490, lr=9.99912e-06, gnorm=0.98, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=169959 2023-05-03 01:46:26 - progress_bar.py[line:274] - INFO: epoch 007: 5319 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7536.9, nsentences=120, sample_size=3946.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1890.4, ups=0.25, wpb=7536.9, bsz=120, num_updates=41500, lr=9.99384e-06, gnorm=0.988, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=169998 2023-05-03 01:47:06 - progress_bar.py[line:274] - INFO: epoch 007: 5329 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7619.7, nsentences=120, sample_size=4233.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1922.5, ups=0.25, wpb=7619.7, bsz=120, num_updates=41510, lr=9.98856e-06, gnorm=0.964, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=170038 2023-05-03 01:47:45 - progress_bar.py[line:274] - INFO: epoch 007: 5339 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7808.8, nsentences=120, sample_size=4136.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1963.5, ups=0.25, wpb=7808.8, bsz=120, num_updates=41520, lr=9.98327e-06, gnorm=0.958, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=170078 2023-05-03 01:48:01 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-03 01:48:29 - progress_bar.py[line:274] - INFO: epoch 007: 5350 / 6042 loss=2.435, loss_v1=0, loss_v2=0, nll_loss=1.186, ntokens=7750.1, nsentences=120, sample_size=4105.4, sample_size_v1=0, sample_size_v2=0, ppl=2.28, wps=1780.9, ups=0.23, wpb=7750.1, bsz=120, num_updates=41530, lr=9.97799e-06, gnorm=0.99, clip=50, loss_scale=32, train_wall=43, gb_free=30.7, wall=170121 2023-05-03 01:49:08 - progress_bar.py[line:274] - INFO: epoch 007: 5360 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7659.7, 
nsentences=120, sample_size=3892.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1957.7, ups=0.26, wpb=7659.7, bsz=120, num_updates=41540, lr=9.97271e-06, gnorm=0.975, clip=30, loss_scale=32, train_wall=39, gb_free=28.9, wall=170161 2023-05-03 01:49:49 - progress_bar.py[line:274] - INFO: epoch 007: 5370 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7716.3, nsentences=120, sample_size=4384.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1871, ups=0.24, wpb=7716.3, bsz=120, num_updates=41550, lr=9.96743e-06, gnorm=0.94, clip=10, loss_scale=32, train_wall=41, gb_free=29.7, wall=170202 2023-05-03 01:50:29 - progress_bar.py[line:274] - INFO: epoch 007: 5380 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7935.1, nsentences=120, sample_size=3788.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2017.4, ups=0.25, wpb=7935.1, bsz=120, num_updates=41560, lr=9.96214e-06, gnorm=0.993, clip=40, loss_scale=32, train_wall=39, gb_free=29.7, wall=170241 2023-05-03 01:51:08 - progress_bar.py[line:274] - INFO: epoch 007: 5390 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7628.5, nsentences=120, sample_size=3852.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1932, ups=0.25, wpb=7628.5, bsz=120, num_updates=41570, lr=9.95686e-06, gnorm=0.997, clip=40, loss_scale=32, train_wall=39, gb_free=30.3, wall=170281 2023-05-03 01:51:48 - progress_bar.py[line:274] - INFO: epoch 007: 5400 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7720.1, nsentences=120, sample_size=4208.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1934.6, ups=0.25, wpb=7720.1, bsz=120, num_updates=41580, lr=9.95158e-06, gnorm=0.933, clip=0, loss_scale=32, train_wall=40, gb_free=31.1, wall=170320 2023-05-03 01:52:28 - progress_bar.py[line:274] - INFO: epoch 007: 5410 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7462.9, nsentences=120, sample_size=4080, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1890.4, 
ups=0.25, wpb=7462.9, bsz=120, num_updates=41590, lr=9.9463e-06, gnorm=0.979, clip=30, loss_scale=32, train_wall=39, gb_free=29.1, wall=170360 2023-05-03 01:53:07 - progress_bar.py[line:274] - INFO: epoch 007: 5420 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7594.3, nsentences=120, sample_size=4063.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1946.9, ups=0.26, wpb=7594.3, bsz=120, num_updates=41600, lr=9.94102e-06, gnorm=0.966, clip=40, loss_scale=32, train_wall=39, gb_free=29.5, wall=170399 2023-05-03 01:53:46 - progress_bar.py[line:274] - INFO: epoch 007: 5430 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7647.2, nsentences=120, sample_size=4216.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1927.5, ups=0.25, wpb=7647.2, bsz=120, num_updates=41610, lr=9.93573e-06, gnorm=0.972, clip=40, loss_scale=32, train_wall=40, gb_free=29.7, wall=170439 2023-05-03 01:54:25 - progress_bar.py[line:274] - INFO: epoch 007: 5440 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7561.7, nsentences=120, sample_size=4065.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1929.2, ups=0.26, wpb=7561.7, bsz=120, num_updates=41620, lr=9.93045e-06, gnorm=0.968, clip=40, loss_scale=32, train_wall=39, gb_free=30.3, wall=170478 2023-05-03 01:55:05 - progress_bar.py[line:274] - INFO: epoch 007: 5450 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7896.9, nsentences=120, sample_size=4095.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1979.5, ups=0.25, wpb=7896.9, bsz=120, num_updates=41630, lr=9.92517e-06, gnorm=0.965, clip=40, loss_scale=32, train_wall=40, gb_free=30.4, wall=170518 2023-05-03 01:55:44 - progress_bar.py[line:274] - INFO: epoch 007: 5460 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7509.2, nsentences=120, sample_size=4151, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1921.9, ups=0.26, wpb=7509.2, bsz=120, num_updates=41640, lr=9.91989e-06, gnorm=0.956, clip=20, 
loss_scale=32, train_wall=39, gb_free=29.7, wall=170557 2023-05-03 01:56:24 - progress_bar.py[line:274] - INFO: epoch 007: 5470 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7801.7, nsentences=120, sample_size=3787.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1970.1, ups=0.25, wpb=7801.7, bsz=120, num_updates=41650, lr=9.91461e-06, gnorm=0.977, clip=50, loss_scale=32, train_wall=40, gb_free=25.9, wall=170596 2023-05-03 01:57:04 - progress_bar.py[line:274] - INFO: epoch 007: 5480 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=8144.2, nsentences=120, sample_size=4276.9, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2029.6, ups=0.25, wpb=8144.2, bsz=120, num_updates=41660, lr=9.90932e-06, gnorm=0.921, clip=10, loss_scale=32, train_wall=40, gb_free=29.2, wall=170637 2023-05-03 01:57:44 - progress_bar.py[line:274] - INFO: epoch 007: 5490 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7523.4, nsentences=120, sample_size=4316.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1886.1, ups=0.25, wpb=7523.4, bsz=120, num_updates=41670, lr=9.90404e-06, gnorm=0.916, clip=0, loss_scale=32, train_wall=40, gb_free=30.9, wall=170676 2023-05-03 01:58:23 - progress_bar.py[line:274] - INFO: epoch 007: 5500 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7602, nsentences=120, sample_size=3746.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1938.8, ups=0.26, wpb=7602, bsz=120, num_updates=41680, lr=9.89876e-06, gnorm=1.013, clip=50, loss_scale=32, train_wall=39, gb_free=30.2, wall=170716 2023-05-03 01:59:03 - progress_bar.py[line:274] - INFO: epoch 007: 5510 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7756.7, nsentences=120, sample_size=3667, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1946.5, ups=0.25, wpb=7756.7, bsz=120, num_updates=41690, lr=9.89348e-06, gnorm=1.016, clip=60, loss_scale=32, train_wall=40, gb_free=30.6, wall=170756 2023-05-03 01:59:43 - 
progress_bar.py[line:274] - INFO: epoch 007: 5520 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7752.5, nsentences=120, sample_size=4242.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1958, ups=0.25, wpb=7752.5, bsz=120, num_updates=41700, lr=9.88819e-06, gnorm=0.958, clip=10, loss_scale=32, train_wall=40, gb_free=31.8, wall=170795 2023-05-03 02:00:23 - progress_bar.py[line:274] - INFO: epoch 007: 5530 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7350.6, nsentences=120, sample_size=4039.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1840.5, ups=0.25, wpb=7350.6, bsz=120, num_updates=41710, lr=9.88291e-06, gnorm=0.975, clip=60, loss_scale=32, train_wall=40, gb_free=29.5, wall=170835 2023-05-03 02:01:02 - progress_bar.py[line:274] - INFO: epoch 007: 5540 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=8065.1, nsentences=120, sample_size=4030.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2022.9, ups=0.25, wpb=8065.1, bsz=120, num_updates=41720, lr=9.87763e-06, gnorm=0.946, clip=30, loss_scale=32, train_wall=40, gb_free=25.9, wall=170875 2023-05-03 02:01:43 - progress_bar.py[line:274] - INFO: epoch 007: 5550 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7632.4, nsentences=120, sample_size=3900.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1899.9, ups=0.25, wpb=7632.4, bsz=120, num_updates=41730, lr=9.87235e-06, gnorm=0.962, clip=30, loss_scale=32, train_wall=40, gb_free=31.3, wall=170915 2023-05-03 02:02:23 - progress_bar.py[line:274] - INFO: epoch 007: 5560 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=8242, nsentences=120, sample_size=4546.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2060.7, ups=0.25, wpb=8242, bsz=120, num_updates=41740, lr=9.86707e-06, gnorm=0.921, clip=10, loss_scale=32, train_wall=40, gb_free=29.9, wall=170955 2023-05-03 02:03:02 - progress_bar.py[line:274] - INFO: epoch 007: 5570 / 6042 loss=2.401, loss_v1=0, loss_v2=0, 
nll_loss=1.152, ntokens=7927.5, nsentences=120, sample_size=4304.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2004.5, ups=0.25, wpb=7927.5, bsz=120, num_updates=41750, lr=9.86178e-06, gnorm=0.932, clip=20, loss_scale=32, train_wall=39, gb_free=30.2, wall=170995 2023-05-03 02:03:42 - progress_bar.py[line:274] - INFO: epoch 007: 5580 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7573.7, nsentences=120, sample_size=4021.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1909.8, ups=0.25, wpb=7573.7, bsz=120, num_updates=41760, lr=9.8565e-06, gnorm=0.974, clip=30, loss_scale=32, train_wall=40, gb_free=29.4, wall=171034 2023-05-03 02:04:22 - progress_bar.py[line:274] - INFO: epoch 007: 5590 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7973.6, nsentences=120, sample_size=3906.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1984.8, ups=0.25, wpb=7973.6, bsz=120, num_updates=41770, lr=9.85122e-06, gnorm=0.966, clip=40, loss_scale=32, train_wall=40, gb_free=30.8, wall=171074 2023-05-03 02:05:02 - progress_bar.py[line:274] - INFO: epoch 007: 5600 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7384.8, nsentences=120, sample_size=3995.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1859, ups=0.25, wpb=7384.8, bsz=120, num_updates=41780, lr=9.84594e-06, gnorm=0.983, clip=40, loss_scale=32, train_wall=40, gb_free=26.7, wall=171114 2023-05-03 02:05:42 - progress_bar.py[line:274] - INFO: epoch 007: 5610 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7750.5, nsentences=120, sample_size=4064, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1917.8, ups=0.25, wpb=7750.5, bsz=120, num_updates=41790, lr=9.84065e-06, gnorm=0.961, clip=30, loss_scale=32, train_wall=40, gb_free=30.6, wall=171155 2023-05-03 02:06:22 - progress_bar.py[line:274] - INFO: epoch 007: 5620 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7578.1, nsentences=120, sample_size=4400.9, sample_size_v1=0, 
sample_size_v2=0, ppl=2.21, wps=1886.7, ups=0.25, wpb=7578.1, bsz=120, num_updates=41800, lr=9.83537e-06, gnorm=0.947, clip=30, loss_scale=32, train_wall=40, gb_free=29.6, wall=171195 2023-05-03 02:07:02 - progress_bar.py[line:274] - INFO: epoch 007: 5630 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7674, nsentences=120, sample_size=4133.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1913.8, ups=0.25, wpb=7674, bsz=120, num_updates=41810, lr=9.83009e-06, gnorm=0.941, clip=10, loss_scale=32, train_wall=40, gb_free=29.5, wall=171235 2023-05-03 02:07:43 - progress_bar.py[line:274] - INFO: epoch 007: 5640 / 6042 loss=2.417, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=8045.7, nsentences=120, sample_size=3993.8, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1969.4, ups=0.24, wpb=8045.7, bsz=120, num_updates=41820, lr=9.82481e-06, gnorm=0.985, clip=50, loss_scale=32, train_wall=41, gb_free=29.9, wall=171276 2023-05-03 02:08:23 - progress_bar.py[line:274] - INFO: epoch 007: 5650 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7556.1, nsentences=120, sample_size=3915.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1885.3, ups=0.25, wpb=7556.1, bsz=120, num_updates=41830, lr=9.81953e-06, gnorm=0.98, clip=40, loss_scale=32, train_wall=40, gb_free=30.3, wall=171316 2023-05-03 02:09:03 - progress_bar.py[line:274] - INFO: epoch 007: 5660 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7765, nsentences=120, sample_size=4192.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1955.3, ups=0.25, wpb=7765, bsz=120, num_updates=41840, lr=9.81424e-06, gnorm=0.957, clip=40, loss_scale=32, train_wall=40, gb_free=30.2, wall=171356 2023-05-03 02:09:43 - progress_bar.py[line:274] - INFO: epoch 007: 5670 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7614.6, nsentences=120, sample_size=4500.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1924.4, ups=0.25, wpb=7614.6, bsz=120, num_updates=41850, 
lr=9.80896e-06, gnorm=0.926, clip=20, loss_scale=32, train_wall=39, gb_free=30.1, wall=171395 2023-05-03 02:10:23 - progress_bar.py[line:274] - INFO: epoch 007: 5680 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7837.4, nsentences=120, sample_size=4140.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1937, ups=0.25, wpb=7837.4, bsz=120, num_updates=41860, lr=9.80368e-06, gnorm=0.928, clip=10, loss_scale=32, train_wall=40, gb_free=30.7, wall=171436 2023-05-03 02:11:03 - progress_bar.py[line:274] - INFO: epoch 007: 5690 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7690.3, nsentences=120, sample_size=4078.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1914.9, ups=0.25, wpb=7690.3, bsz=120, num_updates=41870, lr=9.7984e-06, gnorm=0.982, clip=40, loss_scale=32, train_wall=40, gb_free=31.4, wall=171476 2023-05-03 02:11:43 - progress_bar.py[line:274] - INFO: epoch 007: 5700 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7620.9, nsentences=120, sample_size=4081.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1916.1, ups=0.25, wpb=7620.9, bsz=120, num_updates=41880, lr=9.79312e-06, gnorm=0.981, clip=30, loss_scale=32, train_wall=40, gb_free=29.7, wall=171515 2023-05-03 02:12:23 - progress_bar.py[line:274] - INFO: epoch 007: 5710 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7512, nsentences=120, sample_size=3973.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1886.4, ups=0.25, wpb=7512, bsz=120, num_updates=41890, lr=9.78783e-06, gnorm=0.978, clip=30, loss_scale=32, train_wall=40, gb_free=31.2, wall=171555 2023-05-03 02:13:03 - progress_bar.py[line:274] - INFO: epoch 007: 5720 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7766.5, nsentences=120, sample_size=4127.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1938.7, ups=0.25, wpb=7766.5, bsz=120, num_updates=41900, lr=9.78255e-06, gnorm=1.003, clip=40, loss_scale=32, train_wall=40, gb_free=25.8, 
wall=171595 2023-05-03 02:13:43 - progress_bar.py[line:274] - INFO: epoch 007: 5730 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7827.7, nsentences=120, sample_size=3939.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1949.3, ups=0.25, wpb=7827.7, bsz=120, num_updates=41910, lr=9.77727e-06, gnorm=0.977, clip=30, loss_scale=32, train_wall=40, gb_free=29.6, wall=171636 2023-05-03 02:14:23 - progress_bar.py[line:274] - INFO: epoch 007: 5740 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7582.5, nsentences=120, sample_size=4351.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1904.2, ups=0.25, wpb=7582.5, bsz=120, num_updates=41920, lr=9.77199e-06, gnorm=0.946, clip=10, loss_scale=32, train_wall=40, gb_free=28.6, wall=171675 2023-05-03 02:15:02 - progress_bar.py[line:274] - INFO: epoch 007: 5750 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7810.5, nsentences=120, sample_size=4282.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1973.8, ups=0.25, wpb=7810.5, bsz=120, num_updates=41930, lr=9.7667e-06, gnorm=0.917, clip=10, loss_scale=32, train_wall=39, gb_free=29.4, wall=171715 2023-05-03 02:15:42 - progress_bar.py[line:274] - INFO: epoch 007: 5760 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7484.5, nsentences=120, sample_size=3717, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1886.6, ups=0.25, wpb=7484.5, bsz=120, num_updates=41940, lr=9.76142e-06, gnorm=1.006, clip=60, loss_scale=32, train_wall=40, gb_free=29.1, wall=171755 2023-05-03 02:16:22 - progress_bar.py[line:274] - INFO: epoch 007: 5770 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.167, ntokens=7964.2, nsentences=120, sample_size=3678.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2000.3, ups=0.25, wpb=7964.2, bsz=120, num_updates=41950, lr=9.75614e-06, gnorm=1.008, clip=50, loss_scale=32, train_wall=40, gb_free=29.4, wall=171794 2023-05-03 02:17:02 - progress_bar.py[line:274] - INFO: epoch 007: 5780 / 
6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7777.6, nsentences=120, sample_size=3895.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1948.1, ups=0.25, wpb=7777.6, bsz=120, num_updates=41960, lr=9.75086e-06, gnorm=0.986, clip=30, loss_scale=32, train_wall=40, gb_free=31.3, wall=171834 2023-05-03 02:17:42 - progress_bar.py[line:274] - INFO: epoch 007: 5790 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7758.4, nsentences=120, sample_size=4172.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1954.8, ups=0.25, wpb=7758.4, bsz=120, num_updates=41970, lr=9.74558e-06, gnorm=0.962, clip=20, loss_scale=32, train_wall=40, gb_free=30.3, wall=171874 2023-05-03 02:18:21 - progress_bar.py[line:274] - INFO: epoch 007: 5800 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7790.2, nsentences=120, sample_size=3994.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1986.8, ups=0.26, wpb=7790.2, bsz=120, num_updates=41980, lr=9.74029e-06, gnorm=0.975, clip=20, loss_scale=32, train_wall=39, gb_free=29.8, wall=171913 2023-05-03 02:19:01 - progress_bar.py[line:274] - INFO: epoch 007: 5810 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7884.3, nsentences=120, sample_size=4058.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1977.9, ups=0.25, wpb=7884.3, bsz=120, num_updates=41990, lr=9.73501e-06, gnorm=0.976, clip=50, loss_scale=32, train_wall=40, gb_free=29, wall=171953 2023-05-03 02:19:41 - progress_bar.py[line:274] - INFO: epoch 007: 5820 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7691.3, nsentences=120, sample_size=3921.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1911.9, ups=0.25, wpb=7691.3, bsz=120, num_updates=42000, lr=9.72973e-06, gnorm=0.971, clip=30, loss_scale=32, train_wall=40, gb_free=28.8, wall=171993 2023-05-03 02:19:41 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 02:19:43 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 02:19:43 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 02:19:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:55 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-03 02:19:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:19:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:19:59 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 02:19:59 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 02:20:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:07 - task_invig.py[line:181] - INFO: example hyp(gen): 
region: 2023-05-03 02:20:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:11 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 02:20:11 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 02:20:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:19 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-03 02:20:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:23 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 02:20:23 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 02:20:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:27 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 02:20:27 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 02:20:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:32 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 02:20:32 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 02:20:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:20:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:20:32 - progress_bar.py[line:282] - INFO: epoch 007 | valid on 'valid' subset | loss 3.238 | loss_v1 0 | loss_v2 0 | nll_loss 2.072 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.2 | score 0.7524 | wps 3296.6 | wpb 3202.1 | bsz 39.4 | num_updates 42000 | best_score 0.7627 2023-05-03 02:20:32 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 7 @ 42000 updates 2023-05-03 02:20:32 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_42000.pt 2023-05-03 02:20:57 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_42000.pt 2023-05-03 02:21:11 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_7_42000.pt (epoch 7 @ 42000 updates, 
score 0.7524) (writing took 38.72852161992341 seconds) 2023-05-03 02:21:50 - progress_bar.py[line:274] - INFO: epoch 007: 5830 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7635.2, nsentences=120, sample_size=3876, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=590.6, ups=0.08, wpb=7635.2, bsz=120, num_updates=42010, lr=9.72445e-06, gnorm=1.01, clip=30, loss_scale=32, train_wall=39, gb_free=30.1, wall=172123 2023-05-03 02:22:30 - progress_bar.py[line:274] - INFO: epoch 007: 5840 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7852, nsentences=120, sample_size=4110.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1985.4, ups=0.25, wpb=7852, bsz=120, num_updates=42020, lr=9.71917e-06, gnorm=0.983, clip=60, loss_scale=32, train_wall=39, gb_free=29.9, wall=172162 2023-05-03 02:23:09 - progress_bar.py[line:274] - INFO: epoch 007: 5850 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7508.5, nsentences=120, sample_size=4064.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1914.8, ups=0.26, wpb=7508.5, bsz=120, num_updates=42030, lr=9.71388e-06, gnorm=0.967, clip=30, loss_scale=32, train_wall=39, gb_free=30, wall=172201 2023-05-03 02:23:49 - progress_bar.py[line:274] - INFO: epoch 007: 5860 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7691.5, nsentences=120, sample_size=3892.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1924.3, ups=0.25, wpb=7691.5, bsz=120, num_updates=42040, lr=9.7086e-06, gnorm=1.03, clip=70, loss_scale=64, train_wall=40, gb_free=30.2, wall=172241 2023-05-03 02:24:29 - progress_bar.py[line:274] - INFO: epoch 007: 5870 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7670.2, nsentences=120, sample_size=3914.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1931.2, ups=0.25, wpb=7670.2, bsz=120, num_updates=42050, lr=9.70332e-06, gnorm=0.989, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=172281 2023-05-03 02:25:08 - 
progress_bar.py[line:274] - INFO: epoch 007: 5880 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7621.6, nsentences=120, sample_size=4070.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1923, ups=0.25, wpb=7621.6, bsz=120, num_updates=42060, lr=9.69804e-06, gnorm=0.966, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=172321 2023-05-03 02:25:47 - progress_bar.py[line:274] - INFO: epoch 007: 5890 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7616.9, nsentences=120, sample_size=4211.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1958.4, ups=0.26, wpb=7616.9, bsz=120, num_updates=42070, lr=9.69275e-06, gnorm=0.958, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=172360 2023-05-03 02:26:26 - progress_bar.py[line:274] - INFO: epoch 007: 5900 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.162, ntokens=7494.1, nsentences=120, sample_size=4137, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1910.1, ups=0.25, wpb=7494.1, bsz=120, num_updates=42080, lr=9.68747e-06, gnorm=0.991, clip=50, loss_scale=64, train_wall=39, gb_free=30.7, wall=172399 2023-05-03 02:27:05 - progress_bar.py[line:274] - INFO: epoch 007: 5910 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7369.1, nsentences=120, sample_size=4126, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1885.8, ups=0.26, wpb=7369.1, bsz=120, num_updates=42090, lr=9.68219e-06, gnorm=0.964, clip=30, loss_scale=64, train_wall=39, gb_free=27.6, wall=172438 2023-05-03 02:27:46 - progress_bar.py[line:274] - INFO: epoch 007: 5920 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7885, nsentences=120, sample_size=3809.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1968.3, ups=0.25, wpb=7885, bsz=120, num_updates=42100, lr=9.67691e-06, gnorm=1.024, clip=70, loss_scale=64, train_wall=40, gb_free=29.8, wall=172478 2023-05-03 02:28:25 - progress_bar.py[line:274] - INFO: epoch 007: 5930 / 6042 loss=2.387, loss_v1=0, loss_v2=0, 
nll_loss=1.133, ntokens=7792.4, nsentences=120, sample_size=4142.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1961.3, ups=0.25, wpb=7792.4, bsz=120, num_updates=42110, lr=9.67163e-06, gnorm=0.963, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=172518 2023-05-03 02:29:05 - progress_bar.py[line:274] - INFO: epoch 007: 5940 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7559.9, nsentences=120, sample_size=3898.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1909.8, ups=0.25, wpb=7559.9, bsz=120, num_updates=42120, lr=9.66634e-06, gnorm=0.995, clip=50, loss_scale=64, train_wall=40, gb_free=30.3, wall=172557 2023-05-03 02:29:44 - progress_bar.py[line:274] - INFO: epoch 007: 5950 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7615.9, nsentences=120, sample_size=3877.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1935.3, ups=0.25, wpb=7615.9, bsz=120, num_updates=42130, lr=9.66106e-06, gnorm=1.004, clip=60, loss_scale=64, train_wall=39, gb_free=30.5, wall=172597 2023-05-03 02:30:24 - progress_bar.py[line:274] - INFO: epoch 007: 5960 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7478.4, nsentences=120, sample_size=3946.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1870.5, ups=0.25, wpb=7478.4, bsz=120, num_updates=42140, lr=9.65578e-06, gnorm=1.02, clip=60, loss_scale=64, train_wall=40, gb_free=30, wall=172637 2023-05-03 02:31:04 - progress_bar.py[line:274] - INFO: epoch 007: 5970 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7611.9, nsentences=120, sample_size=3880.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1906, ups=0.25, wpb=7611.9, bsz=120, num_updates=42150, lr=9.6505e-06, gnorm=0.975, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=172677 2023-05-03 02:31:43 - progress_bar.py[line:274] - INFO: epoch 007: 5980 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7469.3, nsentences=120, sample_size=4083.2, sample_size_v1=0, 
sample_size_v2=0, ppl=2.23, wps=1903.8, ups=0.25, wpb=7469.3, bsz=120, num_updates=42160, lr=9.64522e-06, gnorm=0.954, clip=10, loss_scale=64, train_wall=39, gb_free=27.8, wall=172716 2023-05-03 02:32:23 - progress_bar.py[line:274] - INFO: epoch 007: 5990 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7787.7, nsentences=120, sample_size=3960.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1975.2, ups=0.25, wpb=7787.7, bsz=120, num_updates=42170, lr=9.63993e-06, gnorm=0.988, clip=30, loss_scale=64, train_wall=39, gb_free=31, wall=172755 2023-05-03 02:33:03 - progress_bar.py[line:274] - INFO: epoch 007: 6000 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7415.9, nsentences=120, sample_size=3869.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1858.2, ups=0.25, wpb=7415.9, bsz=120, num_updates=42180, lr=9.63465e-06, gnorm=0.998, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=172795 2023-05-03 02:33:42 - progress_bar.py[line:274] - INFO: epoch 007: 6010 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7884.1, nsentences=120, sample_size=4409.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1984.6, ups=0.25, wpb=7884.1, bsz=120, num_updates=42190, lr=9.62937e-06, gnorm=0.925, clip=10, loss_scale=64, train_wall=40, gb_free=27.2, wall=172835 2023-05-03 02:34:23 - progress_bar.py[line:274] - INFO: epoch 007: 6020 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7772.1, nsentences=120, sample_size=4115.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1932.1, ups=0.25, wpb=7772.1, bsz=120, num_updates=42200, lr=9.62409e-06, gnorm=0.971, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=172875 2023-05-03 02:35:02 - progress_bar.py[line:274] - INFO: epoch 007: 6030 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7565.6, nsentences=120, sample_size=4232.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1913.6, ups=0.25, wpb=7565.6, bsz=120, 
num_updates=42210, lr=9.6188e-06, gnorm=0.964, clip=40, loss_scale=64, train_wall=39, gb_free=29.6, wall=172915 2023-05-03 02:35:42 - progress_bar.py[line:274] - INFO: epoch 007: 6040 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7517.1, nsentences=120, sample_size=4053.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1866, ups=0.25, wpb=7517.1, bsz=120, num_updates=42220, lr=9.61352e-06, gnorm=0.984, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=172955 2023-05-03 02:35:49 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 02:35:51 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 02:35:51 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 02:35:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:35:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:35:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:35:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:35:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:35:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:35:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:35:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:35:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:35:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:35:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:35:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:35:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:35:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:35:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:35:59 - task_invig.py[line:182] - INFO: 
example ref(gt): region: 2023-05-03 02:36:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:07 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 02:36:07 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 02:36:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:19 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 02:36:19 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 02:36:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:31 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 02:36:31 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 02:36:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:35 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 02:36:35 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 02:36:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:40 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 02:36:40 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 02:36:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 02:36:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 02:36:40 - progress_bar.py[line:282] - INFO: epoch 007 | valid on 'valid' subset | loss 3.228 | loss_v1 0 | loss_v2 0 | nll_loss 2.064 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.18 | score 0.7549 | wps 3295.6 | wpb 3202.1 | bsz 39.4 | num_updates 42222 | best_score 0.7627 2023-05-03 02:36:40 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 7 @ 42222 updates 2023-05-03 02:36:40 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-03 02:37:07 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-03 02:37:07 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt (epoch 7 @ 42222 updates, score 0.7549) (writing took 27.275630317861214 seconds) 2023-05-03 02:37:07 - train.py[line:332] - INFO: end of epoch 7 (average epoch stats below) 2023-05-03 02:37:07 - progress_bar.py[line:282] - INFO: epoch 007 | loss 2.388 | loss_v1 0 | loss_v2 0 | nll_loss 1.134 | ntokens 7729.28 | nsentences 119.992 | sample_size 4037.81 | sample_size_v1 0 | sample_size_v2 0 | ppl 2.19 | wps 1888.4 | ups 0.24 | wpb 7729.3 | bsz 120 | num_updates 42222 | lr 9.61247e-06 | gnorm 0.971 | clip 31.6 | loss_scale 64 | train_wall 24015 | gb_free 29.9 | wall 173040 2023-05-03 02:37:07 - trainer.py[line:639] - INFO: loading train data for epoch 8 2023-05-03 02:37:07 - dialog_dataset.py[line:647] - INFO: loading invig-train from /mnt/bn/hri-lq/datasets/hf-cache/invig 2023-05-03 02:37:08 - dialog_dataset.py[line:647] - INFO: 
loading guesswhat-train from /mnt/bn/hri-lq/datasets/hf-cache/guesswhat 2023-05-03 02:37:09 - dialog_dataset.py[line:647] - INFO: loading visdial-train from /mnt/bn/hri-lq/datasets/hf-cache/visdial 2023-05-03 02:37:11 - dialog_dataset.py[line:647] - INFO: loading refcoco-train from /mnt/bn/hri-lq/datasets/hf-cache/refcoco 2023-05-03 02:37:11 - dialog_dataset.py[line:647] - INFO: loading refcocog-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocog 2023-05-03 02:37:11 - dialog_dataset.py[line:647] - INFO: loading refcocoplus-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocoplus 2023-05-03 02:37:12 - dialog_dataset.py[line:647] - INFO: loading cc_sbu_align-train from /mnt/bn/hri-lq/datasets/hf-cache/cc_sbu_align 2023-05-03 02:37:12 - dialog_dataset.py[line:647] - INFO: loading llava_instruct_150k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_instruct_150k 2023-05-03 02:37:12 - dialog_dataset.py[line:647] - INFO: loading llava_conversation_58k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_conversation_58k 2023-05-03 02:37:13 - dialog_dataset.py[line:647] - INFO: loading llava_complex_reasoning_77k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_complex_reasoning_77k 2023-05-03 02:37:13 - dialog_dataset.py[line:647] - INFO: loading llava_detail_23k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_detail_23k 2023-05-03 02:37:13 - dialog_dataset.py[line:647] - INFO: loading openimages-train from /mnt/bn/hri-lq/datasets/hf-cache/openimages_v1.2 2023-05-03 02:37:14 - dialog_dataset.py[line:671] - INFO: load train data: 18 (90624/724992 samples) dataset(s) 2023-05-03 02:37:14 - dialog_dataset.py[line:672] - INFO: Tasks: invig_question(17652), invig_answer(17652), invig_grounding(17652), guesswhat_question(68653), guesswhat_answer(68653), guesswhat_grounding(68653), visdial_question(103447), visdial_answer(103447), visdial_caption(20689), refcoco_grounding(9523), refcoco_grounding(9920), refcoco_grounding(9494), cc_sbu_align_caption(3439), 
llava_instruct_150k(90372), llava_conversation_58k(46965), llava_complex_reasoning_77k(44353), llava_detail_23k(12471), openimages_detection(11957) 2023-05-03 02:37:14 - trainer.py[line:703] - INFO: begin training epoch 8 2023-05-03 02:37:14 - train.py[line:305] - INFO: Start iterating over samples 2023-05-03 02:37:46 - progress_bar.py[line:274] - INFO: epoch 008: 8 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7154.8, nsentences=116, sample_size=4017.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=578.5, ups=0.08, wpb=7154.8, bsz=116, num_updates=42230, lr=9.60824e-06, gnorm=0.991, clip=40, loss_scale=64, train_wall=38, gb_free=30.5, wall=173079 2023-05-03 02:38:26 - progress_bar.py[line:274] - INFO: epoch 008: 18 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7720.2, nsentences=120, sample_size=4114, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1920.9, ups=0.25, wpb=7720.2, bsz=120, num_updates=42240, lr=9.60296e-06, gnorm=0.947, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=173119 2023-05-03 02:39:06 - progress_bar.py[line:274] - INFO: epoch 008: 28 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7635.5, nsentences=120, sample_size=4252.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1919.7, ups=0.25, wpb=7635.5, bsz=120, num_updates=42250, lr=9.59768e-06, gnorm=0.976, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=173159 2023-05-03 02:39:45 - progress_bar.py[line:274] - INFO: epoch 008: 38 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7638.5, nsentences=120, sample_size=4330.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1946.5, ups=0.25, wpb=7638.5, bsz=120, num_updates=42260, lr=9.59239e-06, gnorm=0.934, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=173198 2023-05-03 02:40:25 - progress_bar.py[line:274] - INFO: epoch 008: 48 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7802.4, nsentences=120, sample_size=4140.8, 
sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1986, ups=0.25, wpb=7802.4, bsz=120, num_updates=42270, lr=9.58711e-06, gnorm=0.983, clip=50, loss_scale=64, train_wall=39, gb_free=30.8, wall=173237 2023-05-03 02:41:04 - progress_bar.py[line:274] - INFO: epoch 008: 58 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7432.1, nsentences=120, sample_size=4168.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1886.2, ups=0.25, wpb=7432.1, bsz=120, num_updates=42280, lr=9.58183e-06, gnorm=0.978, clip=50, loss_scale=64, train_wall=39, gb_free=30.6, wall=173277 2023-05-03 02:41:44 - progress_bar.py[line:274] - INFO: epoch 008: 68 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7910.1, nsentences=120, sample_size=4069.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1981, ups=0.25, wpb=7910.1, bsz=120, num_updates=42290, lr=9.57655e-06, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=173316 2023-05-03 02:42:24 - progress_bar.py[line:274] - INFO: epoch 008: 78 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7768.6, nsentences=120, sample_size=3777.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1955.9, ups=0.25, wpb=7768.6, bsz=120, num_updates=42300, lr=9.57127e-06, gnorm=1.007, clip=50, loss_scale=64, train_wall=40, gb_free=30.3, wall=173356 2023-05-03 02:43:04 - progress_bar.py[line:274] - INFO: epoch 008: 88 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.166, ntokens=8178.9, nsentences=120, sample_size=4093.2, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2007.4, ups=0.25, wpb=8178.9, bsz=120, num_updates=42310, lr=9.56598e-06, gnorm=0.971, clip=40, loss_scale=64, train_wall=41, gb_free=30.3, wall=173397 2023-05-03 02:43:44 - progress_bar.py[line:274] - INFO: epoch 008: 98 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7930.7, nsentences=120, sample_size=3934.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2020.9, ups=0.25, wpb=7930.7, bsz=120, 
num_updates=42320, lr=9.5607e-06, gnorm=0.966, clip=20, loss_scale=64, train_wall=39, gb_free=30, wall=173436 2023-05-03 02:44:23 - progress_bar.py[line:274] - INFO: epoch 008: 108 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7746.4, nsentences=120, sample_size=4061.7, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1968.8, ups=0.25, wpb=7746.4, bsz=120, num_updates=42330, lr=9.55542e-06, gnorm=0.955, clip=30, loss_scale=64, train_wall=39, gb_free=28.2, wall=173475 2023-05-03 02:45:04 - progress_bar.py[line:274] - INFO: epoch 008: 118 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7915.9, nsentences=120, sample_size=3914.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1950, ups=0.25, wpb=7915.9, bsz=120, num_updates=42340, lr=9.55014e-06, gnorm=0.983, clip=30, loss_scale=64, train_wall=41, gb_free=30.1, wall=173516 2023-05-03 02:45:43 - progress_bar.py[line:274] - INFO: epoch 008: 128 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7691.6, nsentences=120, sample_size=4019.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1939.1, ups=0.25, wpb=7691.6, bsz=120, num_updates=42350, lr=9.54485e-06, gnorm=0.992, clip=50, loss_scale=64, train_wall=40, gb_free=30.4, wall=173556 2023-05-03 02:46:22 - progress_bar.py[line:274] - INFO: epoch 008: 138 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7656.1, nsentences=120, sample_size=4261.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1969.1, ups=0.26, wpb=7656.1, bsz=120, num_updates=42360, lr=9.53957e-06, gnorm=0.936, clip=0, loss_scale=64, train_wall=39, gb_free=30.7, wall=173595 2023-05-03 02:47:02 - progress_bar.py[line:274] - INFO: epoch 008: 148 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7687.7, nsentences=120, sample_size=4354.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1949.1, ups=0.25, wpb=7687.7, bsz=120, num_updates=42370, lr=9.53429e-06, gnorm=0.939, clip=10, loss_scale=64, train_wall=39, 
gb_free=30.1, wall=173634 2023-05-03 02:47:41 - progress_bar.py[line:274] - INFO: epoch 008: 158 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7535.4, nsentences=120, sample_size=4052.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1908.5, ups=0.25, wpb=7535.4, bsz=120, num_updates=42380, lr=9.52901e-06, gnorm=0.969, clip=40, loss_scale=64, train_wall=39, gb_free=27.5, wall=173674 2023-05-03 02:48:21 - progress_bar.py[line:274] - INFO: epoch 008: 168 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7943.1, nsentences=120, sample_size=3972, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1975.4, ups=0.25, wpb=7943.1, bsz=120, num_updates=42390, lr=9.52373e-06, gnorm=1.015, clip=60, loss_scale=64, train_wall=40, gb_free=29.4, wall=173714 2023-05-03 02:49:01 - progress_bar.py[line:274] - INFO: epoch 008: 178 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7748.6, nsentences=120, sample_size=3699.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1937.1, ups=0.25, wpb=7748.6, bsz=120, num_updates=42400, lr=9.51844e-06, gnorm=0.999, clip=50, loss_scale=64, train_wall=40, gb_free=31.3, wall=173754 2023-05-03 02:49:42 - progress_bar.py[line:274] - INFO: epoch 008: 188 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7787.8, nsentences=120, sample_size=4212.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1929.6, ups=0.25, wpb=7787.8, bsz=120, num_updates=42410, lr=9.51316e-06, gnorm=0.952, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=173794 2023-05-03 02:50:21 - progress_bar.py[line:274] - INFO: epoch 008: 198 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7482.2, nsentences=120, sample_size=3945.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1912.2, ups=0.26, wpb=7482.2, bsz=120, num_updates=42420, lr=9.50788e-06, gnorm=0.985, clip=30, loss_scale=64, train_wall=39, gb_free=29.7, wall=173833 2023-05-03 02:51:01 - progress_bar.py[line:274] - INFO: epoch 008: 
208 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7826.3, nsentences=120, sample_size=4122, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1953.2, ups=0.25, wpb=7826.3, bsz=120, num_updates=42430, lr=9.5026e-06, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=173873 2023-05-03 02:51:41 - progress_bar.py[line:274] - INFO: epoch 008: 218 / 6042 loss=2.405, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7762.2, nsentences=120, sample_size=4509.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1933.7, ups=0.25, wpb=7762.2, bsz=120, num_updates=42440, lr=9.49731e-06, gnorm=0.909, clip=0, loss_scale=64, train_wall=40, gb_free=27.6, wall=173913 2023-05-03 02:52:21 - progress_bar.py[line:274] - INFO: epoch 008: 228 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7811.6, nsentences=120, sample_size=4335.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1946.1, ups=0.25, wpb=7811.6, bsz=120, num_updates=42450, lr=9.49203e-06, gnorm=0.937, clip=0, loss_scale=64, train_wall=40, gb_free=30.8, wall=173954 2023-05-03 02:53:01 - progress_bar.py[line:274] - INFO: epoch 008: 238 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7846.2, nsentences=120, sample_size=4149.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1963.5, ups=0.25, wpb=7846.2, bsz=120, num_updates=42460, lr=9.48675e-06, gnorm=0.948, clip=0, loss_scale=64, train_wall=40, gb_free=31.1, wall=173994 2023-05-03 02:53:41 - progress_bar.py[line:274] - INFO: epoch 008: 248 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7853.2, nsentences=120, sample_size=3884.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1978.1, ups=0.25, wpb=7853.2, bsz=120, num_updates=42470, lr=9.48147e-06, gnorm=0.996, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=174033 2023-05-03 02:54:21 - progress_bar.py[line:274] - INFO: epoch 008: 258 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7901.1, nsentences=120, 
sample_size=4374.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1945.3, ups=0.25, wpb=7901.1, bsz=120, num_updates=42480, lr=9.47619e-06, gnorm=0.947, clip=20, loss_scale=64, train_wall=41, gb_free=31.4, wall=174074 2023-05-03 02:55:01 - progress_bar.py[line:274] - INFO: epoch 008: 268 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7797.9, nsentences=120, sample_size=4121.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1953.6, ups=0.25, wpb=7797.9, bsz=120, num_updates=42490, lr=9.4709e-06, gnorm=0.984, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=174114 2023-05-03 02:55:41 - progress_bar.py[line:274] - INFO: epoch 008: 278 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7948.7, nsentences=120, sample_size=4358.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2011.1, ups=0.25, wpb=7948.7, bsz=120, num_updates=42500, lr=9.46562e-06, gnorm=0.946, clip=10, loss_scale=64, train_wall=39, gb_free=31.4, wall=174153 2023-05-03 02:56:21 - progress_bar.py[line:274] - INFO: epoch 008: 288 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7395.8, nsentences=120, sample_size=4216.4, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1854.1, ups=0.25, wpb=7395.8, bsz=120, num_updates=42510, lr=9.46034e-06, gnorm=0.971, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=174193 2023-05-03 02:57:00 - progress_bar.py[line:274] - INFO: epoch 008: 298 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7778.1, nsentences=120, sample_size=4072.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2002.5, ups=0.26, wpb=7778.1, bsz=120, num_updates=42520, lr=9.45506e-06, gnorm=0.971, clip=50, loss_scale=64, train_wall=39, gb_free=28.7, wall=174232 2023-05-03 02:57:39 - progress_bar.py[line:274] - INFO: epoch 008: 308 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7744.6, nsentences=120, sample_size=3814.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1960, ups=0.25, 
wpb=7744.6, bsz=120, num_updates=42530, lr=9.44978e-06, gnorm=1.002, clip=40, loss_scale=64, train_wall=39, gb_free=29.5, wall=174272 2023-05-03 02:58:18 - progress_bar.py[line:274] - INFO: epoch 008: 318 / 6042 loss=2.331, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7419, nsentences=120, sample_size=3971.3, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1931.6, ups=0.26, wpb=7419, bsz=120, num_updates=42540, lr=9.44449e-06, gnorm=1.007, clip=40, loss_scale=64, train_wall=38, gb_free=30.5, wall=174310 2023-05-03 02:58:57 - progress_bar.py[line:274] - INFO: epoch 008: 328 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7702.7, nsentences=120, sample_size=4308.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1956, ups=0.25, wpb=7702.7, bsz=120, num_updates=42550, lr=9.43921e-06, gnorm=0.954, clip=30, loss_scale=128, train_wall=39, gb_free=31.1, wall=174349 2023-05-03 02:59:25 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 02:59:42 - progress_bar.py[line:274] - INFO: epoch 008: 339 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7874.5, nsentences=120, sample_size=4075.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1757.8, ups=0.22, wpb=7874.5, bsz=120, num_updates=42560, lr=9.43393e-06, gnorm=0.948, clip=20, loss_scale=64, train_wall=45, gb_free=30.4, wall=174394 2023-05-03 03:00:22 - progress_bar.py[line:274] - INFO: epoch 008: 349 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=8035.7, nsentences=120, sample_size=4119.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2005.2, ups=0.25, wpb=8035.7, bsz=120, num_updates=42570, lr=9.42865e-06, gnorm=0.97, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=174434 2023-05-03 03:01:01 - progress_bar.py[line:274] - INFO: epoch 008: 359 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7610.7, nsentences=120, sample_size=4037.7, sample_size_v1=0, 
sample_size_v2=0, ppl=2.11, wps=1918.6, ups=0.25, wpb=7610.7, bsz=120, num_updates=42580, lr=9.42336e-06, gnorm=1.006, clip=50, loss_scale=64, train_wall=40, gb_free=29.8, wall=174474 2023-05-03 03:01:41 - progress_bar.py[line:274] - INFO: epoch 008: 369 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7495, nsentences=120, sample_size=4052.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1888.8, ups=0.25, wpb=7495, bsz=120, num_updates=42590, lr=9.41808e-06, gnorm=0.972, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=174514 2023-05-03 03:02:21 - progress_bar.py[line:274] - INFO: epoch 008: 379 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7640.4, nsentences=120, sample_size=3855.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1924.7, ups=0.25, wpb=7640.4, bsz=120, num_updates=42600, lr=9.4128e-06, gnorm=0.975, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=174553 2023-05-03 03:03:01 - progress_bar.py[line:274] - INFO: epoch 008: 389 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7890.6, nsentences=120, sample_size=4045.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1963.9, ups=0.25, wpb=7890.6, bsz=120, num_updates=42610, lr=9.40752e-06, gnorm=0.979, clip=30, loss_scale=64, train_wall=40, gb_free=30.8, wall=174593 2023-05-03 03:03:40 - progress_bar.py[line:274] - INFO: epoch 008: 399 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7750.7, nsentences=120, sample_size=4087.9, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1964.3, ups=0.25, wpb=7750.7, bsz=120, num_updates=42620, lr=9.40224e-06, gnorm=0.978, clip=20, loss_scale=64, train_wall=39, gb_free=29.4, wall=174633 2023-05-03 03:04:20 - progress_bar.py[line:274] - INFO: epoch 008: 409 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7858.9, nsentences=120, sample_size=4149.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1969.8, ups=0.25, wpb=7858.9, bsz=120, num_updates=42630, 
lr=9.39695e-06, gnorm=0.959, clip=20, loss_scale=64, train_wall=40, gb_free=26.4, wall=174673 2023-05-03 03:05:00 - progress_bar.py[line:274] - INFO: epoch 008: 419 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7485.8, nsentences=120, sample_size=4133, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1888.9, ups=0.25, wpb=7485.8, bsz=120, num_updates=42640, lr=9.39167e-06, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=174712 2023-05-03 03:05:40 - progress_bar.py[line:274] - INFO: epoch 008: 429 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7799.6, nsentences=120, sample_size=4195.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1954, ups=0.25, wpb=7799.6, bsz=120, num_updates=42650, lr=9.38639e-06, gnorm=0.964, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=174752 2023-05-03 03:06:20 - progress_bar.py[line:274] - INFO: epoch 008: 439 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7558.3, nsentences=120, sample_size=4268.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1893.3, ups=0.25, wpb=7558.3, bsz=120, num_updates=42660, lr=9.38111e-06, gnorm=0.946, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=174792 2023-05-03 03:07:00 - progress_bar.py[line:274] - INFO: epoch 008: 449 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7900.1, nsentences=120, sample_size=3869, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1992.4, ups=0.25, wpb=7900.1, bsz=120, num_updates=42670, lr=9.37583e-06, gnorm=0.988, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=174832 2023-05-03 03:07:40 - progress_bar.py[line:274] - INFO: epoch 008: 459 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7569.1, nsentences=120, sample_size=4010, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1868.5, ups=0.25, wpb=7569.1, bsz=120, num_updates=42680, lr=9.37054e-06, gnorm=0.966, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=174872 
2023-05-03 03:08:20 - progress_bar.py[line:274] - INFO: epoch 008: 469 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7456.3, nsentences=120, sample_size=4085.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1879.5, ups=0.25, wpb=7456.3, bsz=120, num_updates=42690, lr=9.36526e-06, gnorm=1, clip=50, loss_scale=64, train_wall=40, gb_free=30.9, wall=174912 2023-05-03 03:09:00 - progress_bar.py[line:274] - INFO: epoch 008: 479 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7662.6, nsentences=120, sample_size=3695.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1916, ups=0.25, wpb=7662.6, bsz=120, num_updates=42700, lr=9.35998e-06, gnorm=1.02, clip=60, loss_scale=64, train_wall=40, gb_free=27.4, wall=174952 2023-05-03 03:09:38 - progress_bar.py[line:274] - INFO: epoch 008: 489 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7582.5, nsentences=120, sample_size=3970.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1954.8, ups=0.26, wpb=7582.5, bsz=120, num_updates=42710, lr=9.3547e-06, gnorm=0.992, clip=50, loss_scale=64, train_wall=39, gb_free=28, wall=174991 2023-05-03 03:10:19 - progress_bar.py[line:274] - INFO: epoch 008: 499 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7714.8, nsentences=120, sample_size=4177.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1923, ups=0.25, wpb=7714.8, bsz=120, num_updates=42720, lr=9.34941e-06, gnorm=0.953, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=175031 2023-05-03 03:11:00 - progress_bar.py[line:274] - INFO: epoch 008: 509 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7975.1, nsentences=120, sample_size=4165.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1943.9, ups=0.24, wpb=7975.1, bsz=120, num_updates=42730, lr=9.34413e-06, gnorm=0.969, clip=30, loss_scale=64, train_wall=41, gb_free=28.8, wall=175072 2023-05-03 03:11:39 - progress_bar.py[line:274] - INFO: epoch 008: 519 / 6042 loss=2.356, loss_v1=0, 
loss_v2=0, nll_loss=1.092, ntokens=7604.3, nsentences=120, sample_size=3985.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1949.4, ups=0.26, wpb=7604.3, bsz=120, num_updates=42740, lr=9.33885e-06, gnorm=0.982, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=175111 2023-05-03 03:12:18 - progress_bar.py[line:274] - INFO: epoch 008: 529 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7705, nsentences=120, sample_size=4129.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1945.8, ups=0.25, wpb=7705, bsz=120, num_updates=42750, lr=9.33357e-06, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=30.8, wall=175151 2023-05-03 03:12:58 - progress_bar.py[line:274] - INFO: epoch 008: 539 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7513.1, nsentences=120, sample_size=4084, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1882.2, ups=0.25, wpb=7513.1, bsz=120, num_updates=42760, lr=9.32829e-06, gnorm=0.969, clip=40, loss_scale=64, train_wall=40, gb_free=31, wall=175191 2023-05-03 03:13:38 - progress_bar.py[line:274] - INFO: epoch 008: 549 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7889.8, nsentences=120, sample_size=4061.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1976.3, ups=0.25, wpb=7889.8, bsz=120, num_updates=42770, lr=9.323e-06, gnorm=0.96, clip=20, loss_scale=64, train_wall=40, gb_free=31.1, wall=175231 2023-05-03 03:14:18 - progress_bar.py[line:274] - INFO: epoch 008: 559 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7728.7, nsentences=120, sample_size=3980.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1923.3, ups=0.25, wpb=7728.7, bsz=120, num_updates=42780, lr=9.31772e-06, gnorm=0.989, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=175271 2023-05-03 03:14:57 - progress_bar.py[line:274] - INFO: epoch 008: 569 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7837.6, nsentences=120, sample_size=3599, sample_size_v1=0, 
sample_size_v2=0, ppl=2.2, wps=2007.1, ups=0.26, wpb=7837.6, bsz=120, num_updates=42790, lr=9.31244e-06, gnorm=1.049, clip=90, loss_scale=64, train_wall=39, gb_free=28.3, wall=175310 2023-05-03 03:15:36 - progress_bar.py[line:274] - INFO: epoch 008: 579 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7330, nsentences=120, sample_size=3974.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1872.2, ups=0.26, wpb=7330, bsz=120, num_updates=42800, lr=9.30716e-06, gnorm=1.006, clip=50, loss_scale=64, train_wall=39, gb_free=29.7, wall=175349 2023-05-03 03:16:17 - progress_bar.py[line:274] - INFO: epoch 008: 589 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7729.4, nsentences=120, sample_size=4192.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1920.5, ups=0.25, wpb=7729.4, bsz=120, num_updates=42810, lr=9.30188e-06, gnorm=0.968, clip=30, loss_scale=64, train_wall=40, gb_free=24.8, wall=175389 2023-05-03 03:16:56 - progress_bar.py[line:274] - INFO: epoch 008: 599 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7938.1, nsentences=120, sample_size=4043.3, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1995.4, ups=0.25, wpb=7938.1, bsz=120, num_updates=42820, lr=9.29659e-06, gnorm=0.971, clip=40, loss_scale=64, train_wall=40, gb_free=31.1, wall=175429 2023-05-03 03:17:36 - progress_bar.py[line:274] - INFO: epoch 008: 609 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7641.3, nsentences=120, sample_size=4352.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1917.3, ups=0.25, wpb=7641.3, bsz=120, num_updates=42830, lr=9.29131e-06, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=175469 2023-05-03 03:18:16 - progress_bar.py[line:274] - INFO: epoch 008: 619 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7432, nsentences=120, sample_size=4102, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1897.8, ups=0.26, wpb=7432, bsz=120, num_updates=42840, 
lr=9.28603e-06, gnorm=0.967, clip=20, loss_scale=64, train_wall=39, gb_free=27.8, wall=175508 2023-05-03 03:18:56 - progress_bar.py[line:274] - INFO: epoch 008: 629 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7877, nsentences=120, sample_size=3979.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1969.1, ups=0.25, wpb=7877, bsz=120, num_updates=42850, lr=9.28075e-06, gnorm=0.997, clip=60, loss_scale=64, train_wall=40, gb_free=29.9, wall=175548 2023-05-03 03:19:35 - progress_bar.py[line:274] - INFO: epoch 008: 639 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7956.7, nsentences=120, sample_size=3741.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2019, ups=0.25, wpb=7956.7, bsz=120, num_updates=42860, lr=9.27546e-06, gnorm=1.005, clip=60, loss_scale=64, train_wall=39, gb_free=27.3, wall=175587 2023-05-03 03:20:15 - progress_bar.py[line:274] - INFO: epoch 008: 649 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=8095.3, nsentences=120, sample_size=3867.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2025.2, ups=0.25, wpb=8095.3, bsz=120, num_updates=42870, lr=9.27018e-06, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=175627 2023-05-03 03:20:55 - progress_bar.py[line:274] - INFO: epoch 008: 659 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7789.4, nsentences=120, sample_size=4155, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1919.7, ups=0.25, wpb=7789.4, bsz=120, num_updates=42880, lr=9.2649e-06, gnorm=0.989, clip=50, loss_scale=64, train_wall=41, gb_free=30.3, wall=175668 2023-05-03 03:21:35 - progress_bar.py[line:274] - INFO: epoch 008: 669 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7752.4, nsentences=120, sample_size=3948.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1973.6, ups=0.25, wpb=7752.4, bsz=120, num_updates=42890, lr=9.25962e-06, gnorm=1.011, clip=70, loss_scale=64, train_wall=39, gb_free=30.7, wall=175707 
2023-05-03 03:22:15 - progress_bar.py[line:274] - INFO: epoch 008: 679 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7508.5, nsentences=120, sample_size=4196.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1884.8, ups=0.25, wpb=7508.5, bsz=120, num_updates=42900, lr=9.25434e-06, gnorm=0.979, clip=20, loss_scale=64, train_wall=40, gb_free=31.5, wall=175747 2023-05-03 03:22:55 - progress_bar.py[line:274] - INFO: epoch 008: 689 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7645.5, nsentences=120, sample_size=3996.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1914.3, ups=0.25, wpb=7645.5, bsz=120, num_updates=42910, lr=9.24905e-06, gnorm=1.013, clip=60, loss_scale=64, train_wall=40, gb_free=29.9, wall=175787 2023-05-03 03:23:34 - progress_bar.py[line:274] - INFO: epoch 008: 699 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7556.4, nsentences=120, sample_size=4261.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1924.7, ups=0.25, wpb=7556.4, bsz=120, num_updates=42920, lr=9.24377e-06, gnorm=0.972, clip=40, loss_scale=64, train_wall=39, gb_free=28.5, wall=175826 2023-05-03 03:24:13 - progress_bar.py[line:274] - INFO: epoch 008: 709 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7932, nsentences=120, sample_size=3796.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2007.2, ups=0.25, wpb=7932, bsz=120, num_updates=42930, lr=9.23849e-06, gnorm=1.002, clip=50, loss_scale=64, train_wall=39, gb_free=30.4, wall=175866 2023-05-03 03:24:54 - progress_bar.py[line:274] - INFO: epoch 008: 719 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7758.3, nsentences=120, sample_size=3973, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1898.9, ups=0.24, wpb=7758.3, bsz=120, num_updates=42940, lr=9.23321e-06, gnorm=0.966, clip=30, loss_scale=64, train_wall=41, gb_free=29.9, wall=175907 2023-05-03 03:25:33 - progress_bar.py[line:274] - INFO: epoch 008: 729 / 6042 loss=2.38, 
loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7868.6, nsentences=120, sample_size=4017.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2006.8, ups=0.26, wpb=7868.6, bsz=120, num_updates=42950, lr=9.22792e-06, gnorm=0.981, clip=50, loss_scale=64, train_wall=39, gb_free=30, wall=175946 2023-05-03 03:26:13 - progress_bar.py[line:274] - INFO: epoch 008: 739 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.074, ntokens=7479.9, nsentences=120, sample_size=4264.4, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1894.7, ups=0.25, wpb=7479.9, bsz=120, num_updates=42960, lr=9.22264e-06, gnorm=0.96, clip=30, loss_scale=64, train_wall=39, gb_free=29.9, wall=175985 2023-05-03 03:26:53 - progress_bar.py[line:274] - INFO: epoch 008: 749 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7722.5, nsentences=120, sample_size=3875.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1927.1, ups=0.25, wpb=7722.5, bsz=120, num_updates=42970, lr=9.21736e-06, gnorm=0.992, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=176025 2023-05-03 03:27:33 - progress_bar.py[line:274] - INFO: epoch 008: 759 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7798, nsentences=120, sample_size=4053.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1942.8, ups=0.25, wpb=7798, bsz=120, num_updates=42980, lr=9.21208e-06, gnorm=0.967, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=176066 2023-05-03 03:28:13 - progress_bar.py[line:274] - INFO: epoch 008: 769 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7962.8, nsentences=120, sample_size=4004.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1992, ups=0.25, wpb=7962.8, bsz=120, num_updates=42990, lr=9.2068e-06, gnorm=1.022, clip=60, loss_scale=64, train_wall=40, gb_free=29.1, wall=176106 2023-05-03 03:28:53 - progress_bar.py[line:274] - INFO: epoch 008: 779 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7560.8, nsentences=120, sample_size=4093.4, 
sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1873.5, ups=0.25, wpb=7560.8, bsz=120, num_updates=43000, lr=9.20151e-06, gnorm=0.958, clip=30, loss_scale=64, train_wall=40, gb_free=31.3, wall=176146 2023-05-03 03:28:53 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 03:28:55 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 03:28:55 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 03:28:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:28:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:28:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:28:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:28:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:28:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:28:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:28:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:06 - 
task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:12 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 03:29:12 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 03:29:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:24 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 03:29:24 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 03:29:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:35 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 03:29:35 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 03:29:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:40 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 03:29:40 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 03:29:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:44 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 03:29:44 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 03:29:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 03:29:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 03:29:45 - progress_bar.py[line:282] - INFO: epoch 008 | valid on 'valid' subset | loss 3.264 | loss_v1 0 | loss_v2 0 | nll_loss 2.1 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.29 | score 0.7461 | wps 3296.3 | wpb 3202.1 | bsz 39.4 | num_updates 43000 | best_score 0.7627 2023-05-03 03:29:45 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 8 @ 43000 updates 2023-05-03 03:29:45 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_43000.pt 2023-05-03 03:30:13 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_43000.pt 2023-05-03 03:30:27 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_43000.pt (epoch 8 @ 43000 updates, score 0.7461) (writing took 41.86478989315219 seconds) 2023-05-03 03:31:05 - progress_bar.py[line:274] - INFO: epoch 008: 789 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7626.4, nsentences=120, sample_size=3992.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=578.2, ups=0.08, wpb=7626.4, bsz=120, num_updates=43010, lr=9.19623e-06, gnorm=0.996, clip=50, loss_scale=64, train_wall=39, gb_free=29.8, wall=176278 2023-05-03 03:31:45 - progress_bar.py[line:274] - INFO: epoch 008: 799 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7447, nsentences=120, sample_size=3973.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1898.7, ups=0.25, wpb=7447, bsz=120, num_updates=43020, lr=9.19095e-06, gnorm=0.973, clip=50, loss_scale=64, train_wall=39, gb_free=30.8, 
wall=176317 2023-05-03 03:32:24 - progress_bar.py[line:274] - INFO: epoch 008: 809 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7838.7, nsentences=120, sample_size=3971.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1970.2, ups=0.25, wpb=7838.7, bsz=120, num_updates=43030, lr=9.18567e-06, gnorm=0.958, clip=20, loss_scale=64, train_wall=40, gb_free=25.4, wall=176357 2023-05-03 03:33:05 - progress_bar.py[line:274] - INFO: epoch 008: 819 / 6042 loss=2.42, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7893.4, nsentences=120, sample_size=4046.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1951.4, ups=0.25, wpb=7893.4, bsz=120, num_updates=43040, lr=9.18039e-06, gnorm=0.983, clip=40, loss_scale=64, train_wall=40, gb_free=30.7, wall=176397 2023-05-03 03:33:45 - progress_bar.py[line:274] - INFO: epoch 008: 829 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7504.8, nsentences=120, sample_size=4278.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1882.9, ups=0.25, wpb=7504.8, bsz=120, num_updates=43050, lr=9.1751e-06, gnorm=0.954, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=176437 2023-05-03 03:34:24 - progress_bar.py[line:274] - INFO: epoch 008: 839 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7640.1, nsentences=120, sample_size=4311.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1918.5, ups=0.25, wpb=7640.1, bsz=120, num_updates=43060, lr=9.16982e-06, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=176477 2023-05-03 03:35:04 - progress_bar.py[line:274] - INFO: epoch 008: 849 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7565.7, nsentences=120, sample_size=4247, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1902.2, ups=0.25, wpb=7565.7, bsz=120, num_updates=43070, lr=9.16454e-06, gnorm=0.926, clip=0, loss_scale=128, train_wall=40, gb_free=30.2, wall=176517 2023-05-03 03:35:43 - progress_bar.py[line:274] - INFO: epoch 008: 859 / 6042 
loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7688.7, nsentences=120, sample_size=3764.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1966.1, ups=0.26, wpb=7688.7, bsz=120, num_updates=43080, lr=9.15926e-06, gnorm=1.02, clip=60, loss_scale=128, train_wall=39, gb_free=29.9, wall=176556 2023-05-03 03:36:22 - progress_bar.py[line:274] - INFO: epoch 008: 869 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7435, nsentences=120, sample_size=4195.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1916.9, ups=0.26, wpb=7435, bsz=120, num_updates=43090, lr=9.15397e-06, gnorm=1.019, clip=80, loss_scale=128, train_wall=39, gb_free=28.8, wall=176595 2023-05-03 03:36:46 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 03:37:06 - progress_bar.py[line:274] - INFO: epoch 008: 880 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7568, nsentences=120, sample_size=4077, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1705.9, ups=0.23, wpb=7568, bsz=120, num_updates=43100, lr=9.14869e-06, gnorm=0.977, clip=40, loss_scale=64, train_wall=44, gb_free=31, wall=176639 2023-05-03 03:37:47 - progress_bar.py[line:274] - INFO: epoch 008: 890 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7717.7, nsentences=120, sample_size=4356.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1922.1, ups=0.25, wpb=7717.7, bsz=120, num_updates=43110, lr=9.14341e-06, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=25.1, wall=176679 2023-05-03 03:38:26 - progress_bar.py[line:274] - INFO: epoch 008: 900 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7695.8, nsentences=120, sample_size=4035.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1937.7, ups=0.25, wpb=7695.8, bsz=120, num_updates=43120, lr=9.13813e-06, gnorm=0.962, clip=40, loss_scale=64, train_wall=40, gb_free=30.3, wall=176719 2023-05-03 03:39:06 - progress_bar.py[line:274] 
- INFO: epoch 008: 910 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7494.5, nsentences=120, sample_size=4271.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1873.3, ups=0.25, wpb=7494.5, bsz=120, num_updates=43130, lr=9.13285e-06, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=29.1, wall=176759 2023-05-03 03:39:46 - progress_bar.py[line:274] - INFO: epoch 008: 920 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7586.5, nsentences=120, sample_size=4117.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1905.1, ups=0.25, wpb=7586.5, bsz=120, num_updates=43140, lr=9.12756e-06, gnorm=0.957, clip=40, loss_scale=64, train_wall=40, gb_free=28.3, wall=176799 2023-05-03 03:40:25 - progress_bar.py[line:274] - INFO: epoch 008: 930 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7856.8, nsentences=120, sample_size=3532, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2014.2, ups=0.26, wpb=7856.8, bsz=120, num_updates=43150, lr=9.12228e-06, gnorm=1.035, clip=80, loss_scale=64, train_wall=39, gb_free=29.8, wall=176838 2023-05-03 03:41:05 - progress_bar.py[line:274] - INFO: epoch 008: 940 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7746.9, nsentences=120, sample_size=4201.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1950.2, ups=0.25, wpb=7746.9, bsz=120, num_updates=43160, lr=9.117e-06, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=176877 2023-05-03 03:41:45 - progress_bar.py[line:274] - INFO: epoch 008: 950 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7694.6, nsentences=120, sample_size=3746.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1924.1, ups=0.25, wpb=7694.6, bsz=120, num_updates=43170, lr=9.11172e-06, gnorm=1.008, clip=40, loss_scale=64, train_wall=40, gb_free=30.9, wall=176917 2023-05-03 03:42:25 - progress_bar.py[line:274] - INFO: epoch 008: 960 / 6042 loss=2.328, loss_v1=0, loss_v2=0, nll_loss=1.063, ntokens=7582.5, 
nsentences=120, sample_size=4342.1, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1898.5, ups=0.25, wpb=7582.5, bsz=120, num_updates=43180, lr=9.10644e-06, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=176957 2023-05-03 03:43:04 - progress_bar.py[line:274] - INFO: epoch 008: 970 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7840.7, nsentences=120, sample_size=3970.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2026, ups=0.26, wpb=7840.7, bsz=120, num_updates=43190, lr=9.10115e-06, gnorm=0.993, clip=40, loss_scale=64, train_wall=39, gb_free=28.6, wall=176996 2023-05-03 03:43:43 - progress_bar.py[line:274] - INFO: epoch 008: 980 / 6042 loss=2.329, loss_v1=0, loss_v2=0, nll_loss=1.065, ntokens=7552.9, nsentences=120, sample_size=4138.3, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1893.9, ups=0.25, wpb=7552.9, bsz=120, num_updates=43200, lr=9.09587e-06, gnorm=1.001, clip=50, loss_scale=64, train_wall=40, gb_free=30.5, wall=177036 2023-05-03 03:44:24 - progress_bar.py[line:274] - INFO: epoch 008: 990 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7842.7, nsentences=120, sample_size=3975.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1944, ups=0.25, wpb=7842.7, bsz=120, num_updates=43210, lr=9.09059e-06, gnorm=0.976, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=177076 2023-05-03 03:45:04 - progress_bar.py[line:274] - INFO: epoch 008: 1000 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7690.6, nsentences=120, sample_size=4057.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1914.1, ups=0.25, wpb=7690.6, bsz=120, num_updates=43220, lr=9.08531e-06, gnorm=0.977, clip=40, loss_scale=64, train_wall=40, gb_free=31, wall=177116 2023-05-03 03:45:44 - progress_bar.py[line:274] - INFO: epoch 008: 1010 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7652.5, nsentences=120, sample_size=4144.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1927.6, 
ups=0.25, wpb=7652.5, bsz=120, num_updates=43230, lr=9.08002e-06, gnorm=1.008, clip=50, loss_scale=64, train_wall=40, gb_free=29.1, wall=177156 2023-05-03 03:46:23 - progress_bar.py[line:274] - INFO: epoch 008: 1020 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7853.4, nsentences=120, sample_size=3957.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2001.2, ups=0.25, wpb=7853.4, bsz=120, num_updates=43240, lr=9.07474e-06, gnorm=0.943, clip=20, loss_scale=64, train_wall=39, gb_free=29.6, wall=177195 2023-05-03 03:47:03 - progress_bar.py[line:274] - INFO: epoch 008: 1030 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=8062.9, nsentences=120, sample_size=3804.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2000.9, ups=0.25, wpb=8062.9, bsz=120, num_updates=43250, lr=9.06946e-06, gnorm=0.992, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=177236 2023-05-03 03:47:43 - progress_bar.py[line:274] - INFO: epoch 008: 1040 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7807.2, nsentences=120, sample_size=4174, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1939.8, ups=0.25, wpb=7807.2, bsz=120, num_updates=43260, lr=9.06418e-06, gnorm=0.94, clip=20, loss_scale=64, train_wall=40, gb_free=27.4, wall=177276 2023-05-03 03:48:23 - progress_bar.py[line:274] - INFO: epoch 008: 1050 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7604.3, nsentences=120, sample_size=4277.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1904.7, ups=0.25, wpb=7604.3, bsz=120, num_updates=43270, lr=9.0589e-06, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=177316 2023-05-03 03:49:03 - progress_bar.py[line:274] - INFO: epoch 008: 1060 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7752.5, nsentences=120, sample_size=4031.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1968.1, ups=0.25, wpb=7752.5, bsz=120, num_updates=43280, lr=9.05361e-06, gnorm=0.97, clip=30, 
loss_scale=64, train_wall=39, gb_free=30.6, wall=177355 2023-05-03 03:49:44 - progress_bar.py[line:274] - INFO: epoch 008: 1070 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7966.6, nsentences=120, sample_size=4370.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1940.8, ups=0.24, wpb=7966.6, bsz=120, num_updates=43290, lr=9.04833e-06, gnorm=0.952, clip=10, loss_scale=64, train_wall=41, gb_free=30.3, wall=177396 2023-05-03 03:50:24 - progress_bar.py[line:274] - INFO: epoch 008: 1080 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7977, nsentences=120, sample_size=3955.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1993.8, ups=0.25, wpb=7977, bsz=120, num_updates=43300, lr=9.04305e-06, gnorm=0.986, clip=30, loss_scale=64, train_wall=40, gb_free=29.3, wall=177436 2023-05-03 03:51:05 - progress_bar.py[line:274] - INFO: epoch 008: 1090 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=8046, nsentences=120, sample_size=4177.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1972.6, ups=0.25, wpb=8046, bsz=120, num_updates=43310, lr=9.03777e-06, gnorm=0.96, clip=20, loss_scale=64, train_wall=41, gb_free=29.5, wall=177477 2023-05-03 03:51:45 - progress_bar.py[line:274] - INFO: epoch 008: 1100 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7737.3, nsentences=120, sample_size=4060.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1919.5, ups=0.25, wpb=7737.3, bsz=120, num_updates=43320, lr=9.03249e-06, gnorm=0.984, clip=30, loss_scale=64, train_wall=40, gb_free=30.9, wall=177517 2023-05-03 03:52:25 - progress_bar.py[line:274] - INFO: epoch 008: 1110 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=8103.1, nsentences=120, sample_size=3799.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=2041.4, ups=0.25, wpb=8103.1, bsz=120, num_updates=43330, lr=9.0272e-06, gnorm=0.996, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=177557 2023-05-03 03:53:05 - 
progress_bar.py[line:274] - INFO: epoch 008: 1120 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7783, nsentences=120, sample_size=3835.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1943.6, ups=0.25, wpb=7783, bsz=120, num_updates=43340, lr=9.02192e-06, gnorm=0.993, clip=40, loss_scale=64, train_wall=40, gb_free=30.4, wall=177597 2023-05-03 03:53:44 - progress_bar.py[line:274] - INFO: epoch 008: 1130 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7491.5, nsentences=120, sample_size=4082.7, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1893.3, ups=0.25, wpb=7491.5, bsz=120, num_updates=43350, lr=9.01664e-06, gnorm=0.983, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=177637 2023-05-03 03:54:24 - progress_bar.py[line:274] - INFO: epoch 008: 1140 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7663.4, nsentences=120, sample_size=4033.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1926.3, ups=0.25, wpb=7663.4, bsz=120, num_updates=43360, lr=9.01136e-06, gnorm=1.005, clip=50, loss_scale=64, train_wall=40, gb_free=30.3, wall=177676 2023-05-03 03:55:04 - progress_bar.py[line:274] - INFO: epoch 008: 1150 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7911.9, nsentences=120, sample_size=3935.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1969.7, ups=0.25, wpb=7911.9, bsz=120, num_updates=43370, lr=9.00607e-06, gnorm=0.972, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=177717 2023-05-03 03:55:44 - progress_bar.py[line:274] - INFO: epoch 008: 1160 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7702.9, nsentences=120, sample_size=3946.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1949.2, ups=0.25, wpb=7702.9, bsz=120, num_updates=43380, lr=9.00079e-06, gnorm=1.015, clip=70, loss_scale=64, train_wall=39, gb_free=29.8, wall=177756 2023-05-03 03:56:23 - progress_bar.py[line:274] - INFO: epoch 008: 1170 / 6042 loss=2.353, loss_v1=0, 
loss_v2=0, nll_loss=1.089, ntokens=7589.2, nsentences=120, sample_size=3991.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1916.2, ups=0.25, wpb=7589.2, bsz=120, num_updates=43390, lr=8.99551e-06, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=29.2, wall=177796 2023-05-03 03:57:03 - progress_bar.py[line:274] - INFO: epoch 008: 1180 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7900.4, nsentences=120, sample_size=4435.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1990.2, ups=0.25, wpb=7900.4, bsz=120, num_updates=43400, lr=8.99023e-06, gnorm=0.934, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=177835 2023-05-03 03:57:42 - progress_bar.py[line:274] - INFO: epoch 008: 1190 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7425.7, nsentences=120, sample_size=4263.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1898.4, ups=0.26, wpb=7425.7, bsz=120, num_updates=43410, lr=8.98495e-06, gnorm=0.946, clip=30, loss_scale=64, train_wall=39, gb_free=24.3, wall=177875 2023-05-03 03:58:22 - progress_bar.py[line:274] - INFO: epoch 008: 1200 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7881.8, nsentences=120, sample_size=4182.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1979.4, ups=0.25, wpb=7881.8, bsz=120, num_updates=43420, lr=8.97966e-06, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=177914 2023-05-03 03:59:02 - progress_bar.py[line:274] - INFO: epoch 008: 1210 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7517.1, nsentences=120, sample_size=4542.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1877.1, ups=0.25, wpb=7517.1, bsz=120, num_updates=43430, lr=8.97438e-06, gnorm=0.935, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=177954 2023-05-03 03:59:42 - progress_bar.py[line:274] - INFO: epoch 008: 1220 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=8012.3, nsentences=120, sample_size=4102.6, 
sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=2016.6, ups=0.25, wpb=8012.3, bsz=120, num_updates=43440, lr=8.9691e-06, gnorm=0.97, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=177994 2023-05-03 04:00:21 - progress_bar.py[line:274] - INFO: epoch 008: 1230 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7801.4, nsentences=120, sample_size=3877.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1983.2, ups=0.25, wpb=7801.4, bsz=120, num_updates=43450, lr=8.96382e-06, gnorm=0.974, clip=40, loss_scale=64, train_wall=39, gb_free=29.5, wall=178034 2023-05-03 04:01:01 - progress_bar.py[line:274] - INFO: epoch 008: 1240 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7785.1, nsentences=120, sample_size=4133.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1945.1, ups=0.25, wpb=7785.1, bsz=120, num_updates=43460, lr=8.95854e-06, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=28.8, wall=178074 2023-05-03 04:01:42 - progress_bar.py[line:274] - INFO: epoch 008: 1250 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7852, nsentences=120, sample_size=4011.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1924.4, ups=0.25, wpb=7852, bsz=120, num_updates=43470, lr=8.95325e-06, gnorm=0.952, clip=20, loss_scale=64, train_wall=41, gb_free=30.2, wall=178114 2023-05-03 04:02:22 - progress_bar.py[line:274] - INFO: epoch 008: 1260 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7788.4, nsentences=120, sample_size=3955.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1949.8, ups=0.25, wpb=7788.4, bsz=120, num_updates=43480, lr=8.94797e-06, gnorm=0.981, clip=40, loss_scale=64, train_wall=40, gb_free=31.1, wall=178154 2023-05-03 04:03:01 - progress_bar.py[line:274] - INFO: epoch 008: 1270 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7974.7, nsentences=120, sample_size=3897.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2020.8, ups=0.25, wpb=7974.7, bsz=120, 
num_updates=43490, lr=8.94269e-06, gnorm=1.01, clip=60, loss_scale=64, train_wall=39, gb_free=31, wall=178194 2023-05-03 04:03:41 - progress_bar.py[line:274] - INFO: epoch 008: 1280 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7974.6, nsentences=120, sample_size=3806.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2011.5, ups=0.25, wpb=7974.6, bsz=120, num_updates=43500, lr=8.93741e-06, gnorm=0.999, clip=30, loss_scale=64, train_wall=40, gb_free=23.8, wall=178233 2023-05-03 04:04:20 - progress_bar.py[line:274] - INFO: epoch 008: 1290 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7598.7, nsentences=120, sample_size=3998.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1952.5, ups=0.26, wpb=7598.7, bsz=120, num_updates=43510, lr=8.93212e-06, gnorm=0.995, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=178272 2023-05-03 04:05:00 - progress_bar.py[line:274] - INFO: epoch 008: 1300 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7834.9, nsentences=120, sample_size=4067.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1955, ups=0.25, wpb=7834.9, bsz=120, num_updates=43520, lr=8.92684e-06, gnorm=0.997, clip=50, loss_scale=64, train_wall=40, gb_free=30.6, wall=178312 2023-05-03 04:05:41 - progress_bar.py[line:274] - INFO: epoch 008: 1310 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7713.6, nsentences=120, sample_size=3976.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1897.9, ups=0.25, wpb=7713.6, bsz=120, num_updates=43530, lr=8.92156e-06, gnorm=0.974, clip=20, loss_scale=64, train_wall=41, gb_free=28.8, wall=178353 2023-05-03 04:06:20 - progress_bar.py[line:274] - INFO: epoch 008: 1320 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7602.4, nsentences=120, sample_size=4414.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1922.5, ups=0.25, wpb=7602.4, bsz=120, num_updates=43540, lr=8.91628e-06, gnorm=0.938, clip=20, loss_scale=64, train_wall=39, 
gb_free=30.2, wall=178393 2023-05-03 04:07:00 - progress_bar.py[line:274] - INFO: epoch 008: 1330 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7291.1, nsentences=120, sample_size=4030.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1820.5, ups=0.25, wpb=7291.1, bsz=120, num_updates=43550, lr=8.911e-06, gnorm=1.028, clip=60, loss_scale=64, train_wall=40, gb_free=29.8, wall=178433 2023-05-03 04:07:40 - progress_bar.py[line:274] - INFO: epoch 008: 1340 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=8237.1, nsentences=120, sample_size=3909.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2085.5, ups=0.25, wpb=8237.1, bsz=120, num_updates=43560, lr=8.90571e-06, gnorm=0.972, clip=40, loss_scale=64, train_wall=39, gb_free=29.8, wall=178472 2023-05-03 04:08:19 - progress_bar.py[line:274] - INFO: epoch 008: 1350 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7640.1, nsentences=120, sample_size=4294.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1956.4, ups=0.26, wpb=7640.1, bsz=120, num_updates=43570, lr=8.90043e-06, gnorm=0.955, clip=20, loss_scale=64, train_wall=39, gb_free=29.6, wall=178511 2023-05-03 04:08:58 - progress_bar.py[line:274] - INFO: epoch 008: 1360 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7577.8, nsentences=120, sample_size=4346.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1942.6, ups=0.26, wpb=7577.8, bsz=120, num_updates=43580, lr=8.89515e-06, gnorm=0.949, clip=10, loss_scale=64, train_wall=39, gb_free=25.1, wall=178550 2023-05-03 04:09:39 - progress_bar.py[line:274] - INFO: epoch 008: 1370 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7975.3, nsentences=120, sample_size=4159.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1944.9, ups=0.24, wpb=7975.3, bsz=120, num_updates=43590, lr=8.88987e-06, gnorm=0.962, clip=20, loss_scale=64, train_wall=41, gb_free=30.1, wall=178591 2023-05-03 04:10:19 - progress_bar.py[line:274] - INFO: 
epoch 008: 1380 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7667.1, nsentences=120, sample_size=4279.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1929.9, ups=0.25, wpb=7667.1, bsz=120, num_updates=43600, lr=8.88458e-06, gnorm=0.94, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=178631 2023-05-03 04:10:58 - progress_bar.py[line:274] - INFO: epoch 008: 1390 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7656.9, nsentences=120, sample_size=4159.4, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1941.5, ups=0.25, wpb=7656.9, bsz=120, num_updates=43610, lr=8.8793e-06, gnorm=0.951, clip=20, loss_scale=128, train_wall=39, gb_free=29.8, wall=178670 2023-05-03 04:11:38 - progress_bar.py[line:274] - INFO: epoch 008: 1400 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7887.2, nsentences=120, sample_size=3634.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1962.7, ups=0.25, wpb=7887.2, bsz=120, num_updates=43620, lr=8.87402e-06, gnorm=1.004, clip=50, loss_scale=128, train_wall=40, gb_free=29.1, wall=178711 2023-05-03 04:12:18 - progress_bar.py[line:274] - INFO: epoch 008: 1410 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7817.9, nsentences=120, sample_size=3842.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1968.8, ups=0.25, wpb=7817.9, bsz=120, num_updates=43630, lr=8.86874e-06, gnorm=0.996, clip=50, loss_scale=128, train_wall=40, gb_free=29.4, wall=178750 2023-05-03 04:12:57 - progress_bar.py[line:274] - INFO: epoch 008: 1420 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7846.2, nsentences=120, sample_size=4000.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1983.2, ups=0.25, wpb=7846.2, bsz=120, num_updates=43640, lr=8.86346e-06, gnorm=0.982, clip=50, loss_scale=128, train_wall=39, gb_free=30.5, wall=178790 2023-05-03 04:13:37 - progress_bar.py[line:274] - INFO: epoch 008: 1430 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.126, 
ntokens=7812.7, nsentences=120, sample_size=4254.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1975.3, ups=0.25, wpb=7812.7, bsz=120, num_updates=43650, lr=8.85817e-06, gnorm=0.954, clip=0, loss_scale=128, train_wall=39, gb_free=30.5, wall=178829 2023-05-03 04:14:18 - progress_bar.py[line:274] - INFO: epoch 008: 1440 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7768.3, nsentences=120, sample_size=3976.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1907.3, ups=0.25, wpb=7768.3, bsz=120, num_updates=43660, lr=8.85289e-06, gnorm=1.008, clip=60, loss_scale=128, train_wall=41, gb_free=29, wall=178870 2023-05-03 04:14:57 - progress_bar.py[line:274] - INFO: epoch 008: 1450 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7685.8, nsentences=120, sample_size=3798.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1932.1, ups=0.25, wpb=7685.8, bsz=120, num_updates=43670, lr=8.84761e-06, gnorm=0.998, clip=50, loss_scale=128, train_wall=40, gb_free=30, wall=178910 2023-05-03 04:15:37 - progress_bar.py[line:274] - INFO: epoch 008: 1460 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7391.8, nsentences=120, sample_size=3969, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1863.5, ups=0.25, wpb=7391.8, bsz=120, num_updates=43680, lr=8.84233e-06, gnorm=1.007, clip=50, loss_scale=128, train_wall=40, gb_free=29.8, wall=178950 2023-05-03 04:16:13 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 04:16:21 - progress_bar.py[line:274] - INFO: epoch 008: 1471 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7901.8, nsentences=120, sample_size=3944.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1796.6, ups=0.23, wpb=7901.8, bsz=120, num_updates=43690, lr=8.83705e-06, gnorm=0.99, clip=40, loss_scale=64, train_wall=44, gb_free=29.7, wall=178994 2023-05-03 04:17:01 - progress_bar.py[line:274] - INFO: epoch 008: 1481 / 6042 
loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=8077, nsentences=120, sample_size=4127.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2013.4, ups=0.25, wpb=8077, bsz=120, num_updates=43700, lr=8.83176e-06, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=179034 2023-05-03 04:17:41 - progress_bar.py[line:274] - INFO: epoch 008: 1491 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7593.7, nsentences=120, sample_size=4019.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1916.1, ups=0.25, wpb=7593.7, bsz=120, num_updates=43710, lr=8.82648e-06, gnorm=0.98, clip=40, loss_scale=64, train_wall=40, gb_free=28.7, wall=179073 2023-05-03 04:18:21 - progress_bar.py[line:274] - INFO: epoch 008: 1501 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7762.2, nsentences=120, sample_size=4138, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1921.5, ups=0.25, wpb=7762.2, bsz=120, num_updates=43720, lr=8.8212e-06, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=179114 2023-05-03 04:19:02 - progress_bar.py[line:274] - INFO: epoch 008: 1511 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7645.9, nsentences=120, sample_size=4380.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1899.8, ups=0.25, wpb=7645.9, bsz=120, num_updates=43730, lr=8.81592e-06, gnorm=0.925, clip=10, loss_scale=64, train_wall=40, gb_free=29.4, wall=179154 2023-05-03 04:19:41 - progress_bar.py[line:274] - INFO: epoch 008: 1521 / 6042 loss=2.328, loss_v1=0, loss_v2=0, nll_loss=1.066, ntokens=7379.6, nsentences=120, sample_size=4121.2, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1871.2, ups=0.25, wpb=7379.6, bsz=120, num_updates=43740, lr=8.81063e-06, gnorm=0.973, clip=30, loss_scale=64, train_wall=39, gb_free=25.6, wall=179193 2023-05-03 04:20:20 - progress_bar.py[line:274] - INFO: epoch 008: 1531 / 6042 loss=2.327, loss_v1=0, loss_v2=0, nll_loss=1.062, ntokens=7494.4, nsentences=120, 
sample_size=4036.8, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1914.6, ups=0.26, wpb=7494.4, bsz=120, num_updates=43750, lr=8.80535e-06, gnorm=1.017, clip=60, loss_scale=64, train_wall=39, gb_free=29.6, wall=179233 2023-05-03 04:21:00 - progress_bar.py[line:274] - INFO: epoch 008: 1541 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7707.1, nsentences=120, sample_size=3693.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1921.9, ups=0.25, wpb=7707.1, bsz=120, num_updates=43760, lr=8.80007e-06, gnorm=1.037, clip=60, loss_scale=64, train_wall=40, gb_free=29.5, wall=179273 2023-05-03 04:21:41 - progress_bar.py[line:274] - INFO: epoch 008: 1551 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7943.6, nsentences=120, sample_size=4215.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1954, ups=0.25, wpb=7943.6, bsz=120, num_updates=43770, lr=8.79479e-06, gnorm=0.98, clip=40, loss_scale=64, train_wall=41, gb_free=30, wall=179313 2023-05-03 04:22:21 - progress_bar.py[line:274] - INFO: epoch 008: 1561 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7790.6, nsentences=120, sample_size=4149.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1952.4, ups=0.25, wpb=7790.6, bsz=120, num_updates=43780, lr=8.78951e-06, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=31.4, wall=179353 2023-05-03 04:23:00 - progress_bar.py[line:274] - INFO: epoch 008: 1571 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7776.1, nsentences=120, sample_size=4103, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1989.5, ups=0.26, wpb=7776.1, bsz=120, num_updates=43790, lr=8.78422e-06, gnorm=0.983, clip=30, loss_scale=64, train_wall=39, gb_free=29.9, wall=179392 2023-05-03 04:23:41 - progress_bar.py[line:274] - INFO: epoch 008: 1581 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7734.3, nsentences=120, sample_size=4355.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1895.1, ups=0.25, 
wpb=7734.3, bsz=120, num_updates=43800, lr=8.77894e-06, gnorm=0.951, clip=30, loss_scale=64, train_wall=41, gb_free=29.8, wall=179433 2023-05-03 04:24:21 - progress_bar.py[line:274] - INFO: epoch 008: 1591 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7605.8, nsentences=120, sample_size=3951.2, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1872.1, ups=0.25, wpb=7605.8, bsz=120, num_updates=43810, lr=8.77366e-06, gnorm=1.013, clip=60, loss_scale=64, train_wall=41, gb_free=30.1, wall=179474 2023-05-03 04:25:01 - progress_bar.py[line:274] - INFO: epoch 008: 1601 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7750, nsentences=120, sample_size=3822.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1961.3, ups=0.25, wpb=7750, bsz=120, num_updates=43820, lr=8.76838e-06, gnorm=1.001, clip=50, loss_scale=64, train_wall=39, gb_free=30.8, wall=179513 2023-05-03 04:25:40 - progress_bar.py[line:274] - INFO: epoch 008: 1611 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7661.1, nsentences=120, sample_size=3831.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1940.4, ups=0.25, wpb=7661.1, bsz=120, num_updates=43830, lr=8.7631e-06, gnorm=1.023, clip=70, loss_scale=64, train_wall=39, gb_free=30.6, wall=179553 2023-05-03 04:26:20 - progress_bar.py[line:274] - INFO: epoch 008: 1621 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=8109.9, nsentences=120, sample_size=3847.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2033.4, ups=0.25, wpb=8109.9, bsz=120, num_updates=43840, lr=8.75781e-06, gnorm=0.99, clip=40, loss_scale=64, train_wall=40, gb_free=29.3, wall=179593 2023-05-03 04:27:00 - progress_bar.py[line:274] - INFO: epoch 008: 1631 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7513.9, nsentences=120, sample_size=3979.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1897.1, ups=0.25, wpb=7513.9, bsz=120, num_updates=43850, lr=8.75253e-06, gnorm=0.983, clip=50, 
loss_scale=64, train_wall=40, gb_free=30.6, wall=179632 2023-05-03 04:27:40 - progress_bar.py[line:274] - INFO: epoch 008: 1641 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7628.3, nsentences=120, sample_size=4080.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1917.5, ups=0.25, wpb=7628.3, bsz=120, num_updates=43860, lr=8.74725e-06, gnorm=0.985, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=179672 2023-05-03 04:28:19 - progress_bar.py[line:274] - INFO: epoch 008: 1651 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=8139, nsentences=120, sample_size=4138.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2045.7, ups=0.25, wpb=8139, bsz=120, num_updates=43870, lr=8.74197e-06, gnorm=0.957, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=179712 2023-05-03 04:28:59 - progress_bar.py[line:274] - INFO: epoch 008: 1661 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7510.8, nsentences=120, sample_size=3881.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1890.9, ups=0.25, wpb=7510.8, bsz=120, num_updates=43880, lr=8.73668e-06, gnorm=0.987, clip=30, loss_scale=64, train_wall=40, gb_free=29, wall=179752 2023-05-03 04:29:39 - progress_bar.py[line:274] - INFO: epoch 008: 1671 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7580.5, nsentences=120, sample_size=3905.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1918, ups=0.25, wpb=7580.5, bsz=120, num_updates=43890, lr=8.7314e-06, gnorm=0.973, clip=40, loss_scale=64, train_wall=39, gb_free=30.3, wall=179791 2023-05-03 04:30:18 - progress_bar.py[line:274] - INFO: epoch 008: 1681 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7327.4, nsentences=120, sample_size=4058.3, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1872.2, ups=0.26, wpb=7327.4, bsz=120, num_updates=43900, lr=8.72612e-06, gnorm=0.98, clip=40, loss_scale=64, train_wall=39, gb_free=29.9, wall=179830 2023-05-03 04:30:59 - 
progress_bar.py[line:274] - INFO: epoch 008: 1691 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7979.2, nsentences=120, sample_size=4013.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1947.7, ups=0.24, wpb=7979.2, bsz=120, num_updates=43910, lr=8.72084e-06, gnorm=1.003, clip=40, loss_scale=64, train_wall=41, gb_free=30.9, wall=179871 2023-05-03 04:31:38 - progress_bar.py[line:274] - INFO: epoch 008: 1701 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7861.2, nsentences=120, sample_size=3722.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1980.9, ups=0.25, wpb=7861.2, bsz=120, num_updates=43920, lr=8.71556e-06, gnorm=0.998, clip=50, loss_scale=64, train_wall=40, gb_free=31.3, wall=179911 2023-05-03 04:32:19 - progress_bar.py[line:274] - INFO: epoch 008: 1711 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7831.9, nsentences=120, sample_size=4150, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1951.9, ups=0.25, wpb=7831.9, bsz=120, num_updates=43930, lr=8.71027e-06, gnorm=0.958, clip=30, loss_scale=64, train_wall=40, gb_free=29.4, wall=179951 2023-05-03 04:32:57 - progress_bar.py[line:274] - INFO: epoch 008: 1721 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7867.9, nsentences=120, sample_size=4043.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2025.5, ups=0.26, wpb=7867.9, bsz=120, num_updates=43940, lr=8.70499e-06, gnorm=0.953, clip=10, loss_scale=64, train_wall=39, gb_free=29.3, wall=179990 2023-05-03 04:33:37 - progress_bar.py[line:274] - INFO: epoch 008: 1731 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7815.4, nsentences=120, sample_size=4330.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1982.3, ups=0.25, wpb=7815.4, bsz=120, num_updates=43950, lr=8.69971e-06, gnorm=0.946, clip=10, loss_scale=64, train_wall=39, gb_free=30.3, wall=180029 2023-05-03 04:34:16 - progress_bar.py[line:274] - INFO: epoch 008: 1741 / 6042 loss=2.346, loss_v1=0, 
loss_v2=0, nll_loss=1.086, ntokens=7669, nsentences=120, sample_size=4030.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1945, ups=0.25, wpb=7669, bsz=120, num_updates=43960, lr=8.69443e-06, gnorm=1.009, clip=60, loss_scale=64, train_wall=39, gb_free=30.3, wall=180069 2023-05-03 04:34:57 - progress_bar.py[line:274] - INFO: epoch 008: 1751 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7867.9, nsentences=120, sample_size=4167.5, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1945.1, ups=0.25, wpb=7867.9, bsz=120, num_updates=43970, lr=8.68915e-06, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=180109 2023-05-03 04:35:36 - progress_bar.py[line:274] - INFO: epoch 008: 1761 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7564.8, nsentences=120, sample_size=4055.9, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1924.9, ups=0.25, wpb=7564.8, bsz=120, num_updates=43980, lr=8.68386e-06, gnorm=0.985, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=180148 2023-05-03 04:36:16 - progress_bar.py[line:274] - INFO: epoch 008: 1771 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7611.9, nsentences=120, sample_size=4171.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1920.2, ups=0.25, wpb=7611.9, bsz=120, num_updates=43990, lr=8.67858e-06, gnorm=0.974, clip=50, loss_scale=64, train_wall=40, gb_free=28.8, wall=180188 2023-05-03 04:36:55 - progress_bar.py[line:274] - INFO: epoch 008: 1781 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7903.5, nsentences=120, sample_size=3687.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1984.8, ups=0.25, wpb=7903.5, bsz=120, num_updates=44000, lr=8.6733e-06, gnorm=1.005, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=180228 2023-05-03 04:36:55 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 04:36:57 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 04:36:57 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 04:36:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:36:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:36:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:36:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:09 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-03 04:37:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:14 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 04:37:14 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 04:37:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:22 - task_invig.py[line:181] - INFO: example hyp(gen): 
region: 2023-05-03 04:37:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:26 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 04:37:26 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 04:37:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:33 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-03 04:37:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:37 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 04:37:37 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 04:37:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:41 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 04:37:41 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 04:37:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:46 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 04:37:46 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 04:37:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 04:37:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 04:37:47 - progress_bar.py[line:282] - INFO: epoch 008 | valid on 'valid' subset | loss 3.245 | loss_v1 0 | loss_v2 0 | nll_loss 2.079 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.23 | score 0.7505 | wps 3303.6 | wpb 3202.1 | bsz 39.4 | num_updates 44000 | best_score 0.7627 2023-05-03 04:37:47 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 8 @ 44000 updates 2023-05-03 04:37:47 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_44000.pt 2023-05-03 04:38:11 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_44000.pt 2023-05-03 04:38:26 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_44000.pt (epoch 8 @ 44000 updates, 
score 0.7505) (writing took 39.272813766030595 seconds) 2023-05-03 04:39:04 - progress_bar.py[line:274] - INFO: epoch 008: 1791 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7588.6, nsentences=120, sample_size=4055, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=588.2, ups=0.08, wpb=7588.6, bsz=120, num_updates=44010, lr=8.66802e-06, gnorm=0.991, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=180357 2023-05-03 04:39:44 - progress_bar.py[line:274] - INFO: epoch 008: 1801 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7691.6, nsentences=120, sample_size=3743.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1934.9, ups=0.25, wpb=7691.6, bsz=120, num_updates=44020, lr=8.66273e-06, gnorm=0.99, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=180397 2023-05-03 04:40:24 - progress_bar.py[line:274] - INFO: epoch 008: 1811 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7582.6, nsentences=120, sample_size=4198.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1909.7, ups=0.25, wpb=7582.6, bsz=120, num_updates=44030, lr=8.65745e-06, gnorm=0.992, clip=40, loss_scale=64, train_wall=40, gb_free=31, wall=180436 2023-05-03 04:41:04 - progress_bar.py[line:274] - INFO: epoch 008: 1821 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7614.8, nsentences=120, sample_size=4004.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1895, ups=0.25, wpb=7614.8, bsz=120, num_updates=44040, lr=8.65217e-06, gnorm=0.973, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=180477 2023-05-03 04:41:44 - progress_bar.py[line:274] - INFO: epoch 008: 1831 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7774.5, nsentences=120, sample_size=4064.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1969.2, ups=0.25, wpb=7774.5, bsz=120, num_updates=44050, lr=8.64689e-06, gnorm=1.004, clip=30, loss_scale=64, train_wall=39, gb_free=30.6, wall=180516 2023-05-03 04:42:25 - 
progress_bar.py[line:274] - INFO: epoch 008: 1841 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7651.8, nsentences=120, sample_size=3828.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1866.5, ups=0.24, wpb=7651.8, bsz=120, num_updates=44060, lr=8.64161e-06, gnorm=1, clip=50, loss_scale=64, train_wall=41, gb_free=30.9, wall=180557 2023-05-03 04:43:03 - progress_bar.py[line:274] - INFO: epoch 008: 1851 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7804, nsentences=120, sample_size=4093.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2010.8, ups=0.26, wpb=7804, bsz=120, num_updates=44070, lr=8.63632e-06, gnorm=0.96, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=180596 2023-05-03 04:43:42 - progress_bar.py[line:274] - INFO: epoch 008: 1861 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7753.5, nsentences=120, sample_size=3798, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1987.8, ups=0.26, wpb=7753.5, bsz=120, num_updates=44080, lr=8.63104e-06, gnorm=0.989, clip=30, loss_scale=64, train_wall=39, gb_free=29.2, wall=180635 2023-05-03 04:44:22 - progress_bar.py[line:274] - INFO: epoch 008: 1871 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7565.4, nsentences=120, sample_size=4242.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1896.8, ups=0.25, wpb=7565.4, bsz=120, num_updates=44090, lr=8.62576e-06, gnorm=0.972, clip=20, loss_scale=64, train_wall=40, gb_free=31.1, wall=180675 2023-05-03 04:45:01 - progress_bar.py[line:274] - INFO: epoch 008: 1881 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7639.7, nsentences=120, sample_size=4175, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1953.7, ups=0.26, wpb=7639.7, bsz=120, num_updates=44100, lr=8.62048e-06, gnorm=0.975, clip=40, loss_scale=64, train_wall=39, gb_free=30.2, wall=180714 2023-05-03 04:45:41 - progress_bar.py[line:274] - INFO: epoch 008: 1891 / 6042 loss=2.338, loss_v1=0, loss_v2=0, 
nll_loss=1.079, ntokens=7702.7, nsentences=120, sample_size=3985.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1924.5, ups=0.25, wpb=7702.7, bsz=120, num_updates=44110, lr=8.61519e-06, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=180754 2023-05-03 04:46:22 - progress_bar.py[line:274] - INFO: epoch 008: 1901 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7930.2, nsentences=120, sample_size=3800.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1962.1, ups=0.25, wpb=7930.2, bsz=120, num_updates=44120, lr=8.60991e-06, gnorm=1.034, clip=80, loss_scale=64, train_wall=40, gb_free=30.6, wall=180794 2023-05-03 04:47:01 - progress_bar.py[line:274] - INFO: epoch 008: 1911 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.074, ntokens=7485.1, nsentences=120, sample_size=3905.7, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1894, ups=0.25, wpb=7485.1, bsz=120, num_updates=44130, lr=8.60463e-06, gnorm=0.996, clip=50, loss_scale=64, train_wall=39, gb_free=29.2, wall=180834 2023-05-03 04:47:41 - progress_bar.py[line:274] - INFO: epoch 008: 1921 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7894.8, nsentences=120, sample_size=3882.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1997, ups=0.25, wpb=7894.8, bsz=120, num_updates=44140, lr=8.59935e-06, gnorm=0.996, clip=50, loss_scale=64, train_wall=39, gb_free=29.7, wall=180873 2023-05-03 04:48:21 - progress_bar.py[line:274] - INFO: epoch 008: 1931 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7791.4, nsentences=120, sample_size=3560.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1954.1, ups=0.25, wpb=7791.4, bsz=120, num_updates=44150, lr=8.59407e-06, gnorm=1.021, clip=70, loss_scale=64, train_wall=40, gb_free=28.9, wall=180913 2023-05-03 04:49:01 - progress_bar.py[line:274] - INFO: epoch 008: 1941 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7655, nsentences=120, sample_size=3921.9, sample_size_v1=0, 
sample_size_v2=0, ppl=2.16, wps=1920.4, ups=0.25, wpb=7655, bsz=120, num_updates=44160, lr=8.58878e-06, gnorm=0.992, clip=40, loss_scale=64, train_wall=40, gb_free=29.5, wall=180953 2023-05-03 04:49:41 - progress_bar.py[line:274] - INFO: epoch 008: 1951 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7743.7, nsentences=120, sample_size=3992.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1924.2, ups=0.25, wpb=7743.7, bsz=120, num_updates=44170, lr=8.5835e-06, gnorm=0.996, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=180993 2023-05-03 04:50:20 - progress_bar.py[line:274] - INFO: epoch 008: 1961 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7542.4, nsentences=120, sample_size=4016.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1905.8, ups=0.25, wpb=7542.4, bsz=120, num_updates=44180, lr=8.57822e-06, gnorm=0.96, clip=30, loss_scale=64, train_wall=39, gb_free=30.6, wall=181033 2023-05-03 04:51:01 - progress_bar.py[line:274] - INFO: epoch 008: 1971 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7596.1, nsentences=120, sample_size=3888.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1872.2, ups=0.25, wpb=7596.1, bsz=120, num_updates=44190, lr=8.57294e-06, gnorm=0.993, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=181073 2023-05-03 04:51:41 - progress_bar.py[line:274] - INFO: epoch 008: 1981 / 6042 loss=2.326, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7695.3, nsentences=120, sample_size=4006.5, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1944.8, ups=0.25, wpb=7695.3, bsz=120, num_updates=44200, lr=8.56766e-06, gnorm=1.004, clip=60, loss_scale=128, train_wall=39, gb_free=27, wall=181113 2023-05-03 04:52:20 - progress_bar.py[line:274] - INFO: epoch 008: 1991 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7687.3, nsentences=120, sample_size=4043.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1935.9, ups=0.25, wpb=7687.3, bsz=120, num_updates=44210, 
lr=8.56237e-06, gnorm=0.981, clip=30, loss_scale=128, train_wall=40, gb_free=30.9, wall=181153 2023-05-03 04:52:52 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 04:53:04 - progress_bar.py[line:274] - INFO: epoch 008: 2002 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7803.8, nsentences=120, sample_size=3908.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1792.3, ups=0.23, wpb=7803.8, bsz=120, num_updates=44220, lr=8.55709e-06, gnorm=0.971, clip=40, loss_scale=64, train_wall=43, gb_free=28, wall=181196 2023-05-03 04:53:43 - progress_bar.py[line:274] - INFO: epoch 008: 2012 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7571.1, nsentences=120, sample_size=3716.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1924.3, ups=0.25, wpb=7571.1, bsz=120, num_updates=44230, lr=8.55181e-06, gnorm=1.015, clip=60, loss_scale=64, train_wall=39, gb_free=29.3, wall=181236 2023-05-03 04:54:23 - progress_bar.py[line:274] - INFO: epoch 008: 2022 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7815.4, nsentences=120, sample_size=3768.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1970.8, ups=0.25, wpb=7815.4, bsz=120, num_updates=44240, lr=8.54653e-06, gnorm=0.986, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=181275 2023-05-03 04:55:03 - progress_bar.py[line:274] - INFO: epoch 008: 2032 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7931.3, nsentences=120, sample_size=3925.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1960.6, ups=0.25, wpb=7931.3, bsz=120, num_updates=44250, lr=8.54124e-06, gnorm=1.001, clip=50, loss_scale=64, train_wall=40, gb_free=29.6, wall=181316 2023-05-03 04:55:43 - progress_bar.py[line:274] - INFO: epoch 008: 2042 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7771.6, nsentences=120, sample_size=3878.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1975.9, 
ups=0.25, wpb=7771.6, bsz=120, num_updates=44260, lr=8.53596e-06, gnorm=1.006, clip=60, loss_scale=64, train_wall=39, gb_free=30.4, wall=181355 2023-05-03 04:56:22 - progress_bar.py[line:274] - INFO: epoch 008: 2052 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7672.7, nsentences=120, sample_size=4121.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1953.3, ups=0.25, wpb=7672.7, bsz=120, num_updates=44270, lr=8.53068e-06, gnorm=0.992, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=181394 2023-05-03 04:57:02 - progress_bar.py[line:274] - INFO: epoch 008: 2062 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7585.8, nsentences=120, sample_size=4167.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1915.8, ups=0.25, wpb=7585.8, bsz=120, num_updates=44280, lr=8.5254e-06, gnorm=0.989, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=181434 2023-05-03 04:57:41 - progress_bar.py[line:274] - INFO: epoch 008: 2072 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7823.9, nsentences=120, sample_size=3953.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1986.2, ups=0.25, wpb=7823.9, bsz=120, num_updates=44290, lr=8.52012e-06, gnorm=0.997, clip=50, loss_scale=64, train_wall=39, gb_free=28.7, wall=181473 2023-05-03 04:58:21 - progress_bar.py[line:274] - INFO: epoch 008: 2082 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7675.5, nsentences=120, sample_size=4004.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1927.1, ups=0.25, wpb=7675.5, bsz=120, num_updates=44300, lr=8.51483e-06, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=181513 2023-05-03 04:59:01 - progress_bar.py[line:274] - INFO: epoch 008: 2092 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7720.5, nsentences=120, sample_size=3861.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1895.7, ups=0.25, wpb=7720.5, bsz=120, num_updates=44310, lr=8.50955e-06, gnorm=1.015, clip=70, 
loss_scale=64, train_wall=41, gb_free=30.6, wall=181554 2023-05-03 04:59:42 - progress_bar.py[line:274] - INFO: epoch 008: 2102 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7671.9, nsentences=120, sample_size=4221.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1897.5, ups=0.25, wpb=7671.9, bsz=120, num_updates=44320, lr=8.50427e-06, gnorm=0.973, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=181594 2023-05-03 05:00:22 - progress_bar.py[line:274] - INFO: epoch 008: 2112 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7624.4, nsentences=120, sample_size=3974.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1914, ups=0.25, wpb=7624.4, bsz=120, num_updates=44330, lr=8.49899e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=181634 2023-05-03 05:01:02 - progress_bar.py[line:274] - INFO: epoch 008: 2122 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7848.9, nsentences=120, sample_size=4104.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1936, ups=0.25, wpb=7848.9, bsz=120, num_updates=44340, lr=8.49371e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=181675 2023-05-03 05:01:41 - progress_bar.py[line:274] - INFO: epoch 008: 2132 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7554.5, nsentences=120, sample_size=4080.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1926.9, ups=0.26, wpb=7554.5, bsz=120, num_updates=44350, lr=8.48842e-06, gnorm=0.975, clip=30, loss_scale=64, train_wall=39, gb_free=26, wall=181714 2023-05-03 05:02:21 - progress_bar.py[line:274] - INFO: epoch 008: 2142 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7790.2, nsentences=120, sample_size=4060.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1974.3, ups=0.25, wpb=7790.2, bsz=120, num_updates=44360, lr=8.48314e-06, gnorm=0.996, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=181753 2023-05-03 05:03:01 - 
progress_bar.py[line:274] - INFO: epoch 008: 2152 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7731.8, nsentences=120, sample_size=4002.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1949.3, ups=0.25, wpb=7731.8, bsz=120, num_updates=44370, lr=8.47786e-06, gnorm=0.982, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=181793 2023-05-03 05:03:40 - progress_bar.py[line:274] - INFO: epoch 008: 2162 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7459.4, nsentences=120, sample_size=4176, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1903.2, ups=0.26, wpb=7459.4, bsz=120, num_updates=44380, lr=8.47258e-06, gnorm=0.972, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=181832 2023-05-03 05:04:20 - progress_bar.py[line:274] - INFO: epoch 008: 2172 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7847.6, nsentences=120, sample_size=3724.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1972.6, ups=0.25, wpb=7847.6, bsz=120, num_updates=44390, lr=8.46729e-06, gnorm=0.977, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=181872 2023-05-03 05:05:00 - progress_bar.py[line:274] - INFO: epoch 008: 2182 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7873.6, nsentences=120, sample_size=3920.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1965.9, ups=0.25, wpb=7873.6, bsz=120, num_updates=44400, lr=8.46201e-06, gnorm=0.985, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=181912 2023-05-03 05:05:40 - progress_bar.py[line:274] - INFO: epoch 008: 2192 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7639.1, nsentences=120, sample_size=4215.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1900.6, ups=0.25, wpb=7639.1, bsz=120, num_updates=44410, lr=8.45673e-06, gnorm=0.936, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=181952 2023-05-03 05:06:20 - progress_bar.py[line:274] - INFO: epoch 008: 2202 / 6042 loss=2.358, loss_v1=0, loss_v2=0, 
nll_loss=1.1, ntokens=7674.9, nsentences=120, sample_size=4059.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1913.2, ups=0.25, wpb=7674.9, bsz=120, num_updates=44420, lr=8.45145e-06, gnorm=0.932, clip=10, loss_scale=64, train_wall=40, gb_free=27.8, wall=181992 2023-05-03 05:07:00 - progress_bar.py[line:274] - INFO: epoch 008: 2212 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7443.8, nsentences=120, sample_size=3827.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1874.3, ups=0.25, wpb=7443.8, bsz=120, num_updates=44430, lr=8.44617e-06, gnorm=0.985, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=182032 2023-05-03 05:07:40 - progress_bar.py[line:274] - INFO: epoch 008: 2222 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=8164.1, nsentences=120, sample_size=3956.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1999.6, ups=0.24, wpb=8164.1, bsz=120, num_updates=44440, lr=8.44088e-06, gnorm=0.994, clip=40, loss_scale=64, train_wall=41, gb_free=29.1, wall=182073 2023-05-03 05:08:21 - progress_bar.py[line:274] - INFO: epoch 008: 2232 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7914.6, nsentences=120, sample_size=3938.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1964.7, ups=0.25, wpb=7914.6, bsz=120, num_updates=44450, lr=8.4356e-06, gnorm=0.994, clip=40, loss_scale=64, train_wall=40, gb_free=25.9, wall=182113 2023-05-03 05:09:01 - progress_bar.py[line:274] - INFO: epoch 008: 2242 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7715.5, nsentences=120, sample_size=4267, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1908.1, ups=0.25, wpb=7715.5, bsz=120, num_updates=44460, lr=8.43032e-06, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=28.8, wall=182154 2023-05-03 05:09:40 - progress_bar.py[line:274] - INFO: epoch 008: 2252 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7314.5, nsentences=120, sample_size=3985.2, sample_size_v1=0, 
sample_size_v2=0, ppl=2.19, wps=1862.8, ups=0.25, wpb=7314.5, bsz=120, num_updates=44470, lr=8.42504e-06, gnorm=1.008, clip=50, loss_scale=64, train_wall=39, gb_free=29.5, wall=182193 2023-05-03 05:10:21 - progress_bar.py[line:274] - INFO: epoch 008: 2262 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7862.1, nsentences=120, sample_size=4219.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1939.2, ups=0.25, wpb=7862.1, bsz=120, num_updates=44480, lr=8.41976e-06, gnorm=0.958, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=182233 2023-05-03 05:11:02 - progress_bar.py[line:274] - INFO: epoch 008: 2272 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7729.3, nsentences=120, sample_size=4090, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1908.6, ups=0.25, wpb=7729.3, bsz=120, num_updates=44490, lr=8.41447e-06, gnorm=0.989, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=182274 2023-05-03 05:11:42 - progress_bar.py[line:274] - INFO: epoch 008: 2282 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7783.1, nsentences=120, sample_size=3901.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1916, ups=0.25, wpb=7783.1, bsz=120, num_updates=44500, lr=8.40919e-06, gnorm=1.008, clip=50, loss_scale=64, train_wall=41, gb_free=30.7, wall=182315 2023-05-03 05:12:21 - progress_bar.py[line:274] - INFO: epoch 008: 2292 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7740.8, nsentences=120, sample_size=3976.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1995.4, ups=0.26, wpb=7740.8, bsz=120, num_updates=44510, lr=8.40391e-06, gnorm=0.995, clip=60, loss_scale=64, train_wall=39, gb_free=30.8, wall=182353 2023-05-03 05:13:00 - progress_bar.py[line:274] - INFO: epoch 008: 2302 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7632.4, nsentences=120, sample_size=4229.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1948.7, ups=0.26, wpb=7632.4, bsz=120, 
num_updates=44520, lr=8.39863e-06, gnorm=0.958, clip=30, loss_scale=64, train_wall=39, gb_free=30.7, wall=182393 2023-05-03 05:13:40 - progress_bar.py[line:274] - INFO: epoch 008: 2312 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7754.3, nsentences=120, sample_size=3761.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1924.6, ups=0.25, wpb=7754.3, bsz=120, num_updates=44530, lr=8.39334e-06, gnorm=1.013, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=182433 2023-05-03 05:14:19 - progress_bar.py[line:274] - INFO: epoch 008: 2322 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7513.7, nsentences=120, sample_size=3873.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1928.7, ups=0.26, wpb=7513.7, bsz=120, num_updates=44540, lr=8.38806e-06, gnorm=0.981, clip=30, loss_scale=64, train_wall=39, gb_free=28.2, wall=182472 2023-05-03 05:15:00 - progress_bar.py[line:274] - INFO: epoch 008: 2332 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7994.3, nsentences=120, sample_size=3754.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1987.7, ups=0.25, wpb=7994.3, bsz=120, num_updates=44550, lr=8.38278e-06, gnorm=0.993, clip=50, loss_scale=64, train_wall=40, gb_free=29, wall=182512 2023-05-03 05:15:39 - progress_bar.py[line:274] - INFO: epoch 008: 2342 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=8050.4, nsentences=120, sample_size=3987.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2023.9, ups=0.25, wpb=8050.4, bsz=120, num_updates=44560, lr=8.3775e-06, gnorm=0.985, clip=50, loss_scale=64, train_wall=40, gb_free=27.8, wall=182552 2023-05-03 05:16:19 - progress_bar.py[line:274] - INFO: epoch 008: 2352 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7911.2, nsentences=120, sample_size=3973.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1994.7, ups=0.25, wpb=7911.2, bsz=120, num_updates=44570, lr=8.37222e-06, gnorm=0.956, clip=30, loss_scale=64, train_wall=40, 
gb_free=31.3, wall=182591 2023-05-03 05:16:59 - progress_bar.py[line:274] - INFO: epoch 008: 2362 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7717.9, nsentences=120, sample_size=4006.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1936.1, ups=0.25, wpb=7717.9, bsz=120, num_updates=44580, lr=8.36693e-06, gnorm=0.977, clip=30, loss_scale=64, train_wall=40, gb_free=27.6, wall=182631 2023-05-03 05:17:39 - progress_bar.py[line:274] - INFO: epoch 008: 2372 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7850.3, nsentences=120, sample_size=3852.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1973.1, ups=0.25, wpb=7850.3, bsz=120, num_updates=44590, lr=8.36165e-06, gnorm=0.989, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=182671 2023-05-03 05:18:18 - progress_bar.py[line:274] - INFO: epoch 008: 2382 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7530.7, nsentences=120, sample_size=3980.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1909.5, ups=0.25, wpb=7530.7, bsz=120, num_updates=44600, lr=8.35637e-06, gnorm=0.975, clip=30, loss_scale=64, train_wall=39, gb_free=30.6, wall=182711 2023-05-03 05:18:58 - progress_bar.py[line:274] - INFO: epoch 008: 2392 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7644.8, nsentences=120, sample_size=4176.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1937.3, ups=0.25, wpb=7644.8, bsz=120, num_updates=44610, lr=8.35109e-06, gnorm=0.96, clip=40, loss_scale=64, train_wall=39, gb_free=30.9, wall=182750 2023-05-03 05:19:38 - progress_bar.py[line:274] - INFO: epoch 008: 2402 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7726.5, nsentences=120, sample_size=4016, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1935.3, ups=0.25, wpb=7726.5, bsz=120, num_updates=44620, lr=8.34581e-06, gnorm=0.992, clip=30, loss_scale=64, train_wall=40, gb_free=27.4, wall=182790 2023-05-03 05:20:17 - progress_bar.py[line:274] - INFO: epoch 
008: 2412 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7726.7, nsentences=120, sample_size=4180.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1971, ups=0.26, wpb=7726.7, bsz=120, num_updates=44630, lr=8.34052e-06, gnorm=0.966, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=182829 2023-05-03 05:20:57 - progress_bar.py[line:274] - INFO: epoch 008: 2422 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7800, nsentences=120, sample_size=3897.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1923, ups=0.25, wpb=7800, bsz=120, num_updates=44640, lr=8.33524e-06, gnorm=1.005, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=182870 2023-05-03 05:21:37 - progress_bar.py[line:274] - INFO: epoch 008: 2432 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7650, nsentences=120, sample_size=4015.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1921.2, ups=0.25, wpb=7650, bsz=120, num_updates=44650, lr=8.32996e-06, gnorm=0.987, clip=40, loss_scale=64, train_wall=40, gb_free=27.1, wall=182910 2023-05-03 05:22:17 - progress_bar.py[line:274] - INFO: epoch 008: 2442 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7769.7, nsentences=120, sample_size=4055.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1934, ups=0.25, wpb=7769.7, bsz=120, num_updates=44660, lr=8.32468e-06, gnorm=0.996, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=182950 2023-05-03 05:22:57 - progress_bar.py[line:274] - INFO: epoch 008: 2452 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7823.4, nsentences=120, sample_size=3743.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1965.4, ups=0.25, wpb=7823.4, bsz=120, num_updates=44670, lr=8.31939e-06, gnorm=1.027, clip=60, loss_scale=64, train_wall=40, gb_free=29.7, wall=182990 2023-05-03 05:23:37 - progress_bar.py[line:274] - INFO: epoch 008: 2462 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7728.1, nsentences=120, 
sample_size=3911.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1939.6, ups=0.25, wpb=7728.1, bsz=120, num_updates=44680, lr=8.31411e-06, gnorm=0.995, clip=40, loss_scale=64, train_wall=40, gb_free=29.5, wall=183029 2023-05-03 05:24:17 - progress_bar.py[line:274] - INFO: epoch 008: 2472 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7613.2, nsentences=120, sample_size=3843.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1904.4, ups=0.25, wpb=7613.2, bsz=120, num_updates=44690, lr=8.30883e-06, gnorm=1.026, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=183069 2023-05-03 05:24:57 - progress_bar.py[line:274] - INFO: epoch 008: 2482 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7798.5, nsentences=120, sample_size=4091, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1964.6, ups=0.25, wpb=7798.5, bsz=120, num_updates=44700, lr=8.30355e-06, gnorm=1.018, clip=40, loss_scale=64, train_wall=40, gb_free=27.2, wall=183109 2023-05-03 05:25:36 - progress_bar.py[line:274] - INFO: epoch 008: 2492 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7666.8, nsentences=120, sample_size=3999.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1936.1, ups=0.25, wpb=7666.8, bsz=120, num_updates=44710, lr=8.29827e-06, gnorm=0.967, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=183149 2023-05-03 05:26:16 - progress_bar.py[line:274] - INFO: epoch 008: 2502 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7468.2, nsentences=120, sample_size=4071.3, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1895.7, ups=0.25, wpb=7468.2, bsz=120, num_updates=44720, lr=8.29298e-06, gnorm=0.968, clip=20, loss_scale=64, train_wall=39, gb_free=30.9, wall=183188 2023-05-03 05:26:55 - progress_bar.py[line:274] - INFO: epoch 008: 2512 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7649.1, nsentences=120, sample_size=3808.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1943.2, ups=0.25, 
wpb=7649.1, bsz=120, num_updates=44730, lr=8.2877e-06, gnorm=1.004, clip=60, loss_scale=128, train_wall=39, gb_free=31, wall=183227
2023-05-03 05:27:26 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0
2023-05-03 05:27:38 - progress_bar.py[line:274] - INFO: epoch 008: 2523 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7855.6, nsentences=120, sample_size=3772.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1827.9, ups=0.23, wpb=7855.6, bsz=120, num_updates=44740, lr=8.28242e-06, gnorm=1.051, clip=80, loss_scale=64, train_wall=43, gb_free=29.6, wall=183270
2023-05-03 05:28:18 - progress_bar.py[line:274] - INFO: epoch 008: 2533 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7629.9, nsentences=120, sample_size=4251.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1908.9, ups=0.25, wpb=7629.9, bsz=120, num_updates=44750, lr=8.27714e-06, gnorm=0.944, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=183310
2023-05-03 05:28:58 - progress_bar.py[line:274] - INFO: epoch 008: 2543 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7643.4, nsentences=120, sample_size=3871.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1912.7, ups=0.25, wpb=7643.4, bsz=120, num_updates=44760, lr=8.27185e-06, gnorm=1.032, clip=70, loss_scale=64, train_wall=40, gb_free=30.2, wall=183350
2023-05-03 05:29:38 - progress_bar.py[line:274] - INFO: epoch 008: 2553 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7803.8, nsentences=120, sample_size=3807.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1926.2, ups=0.25, wpb=7803.8, bsz=120, num_updates=44770, lr=8.26657e-06, gnorm=0.99, clip=50, loss_scale=64, train_wall=40, gb_free=29.4, wall=183391
2023-05-03 05:30:18 - progress_bar.py[line:274] - INFO: epoch 008: 2563 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7825.3, nsentences=120, sample_size=3803.9, sample_size_v1=0,
sample_size_v2=0, ppl=2.21, wps=1968.4, ups=0.25, wpb=7825.3, bsz=120, num_updates=44780, lr=8.26129e-06, gnorm=1.008, clip=50, loss_scale=64, train_wall=40, gb_free=28.8, wall=183431 2023-05-03 05:30:58 - progress_bar.py[line:274] - INFO: epoch 008: 2573 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7512.4, nsentences=120, sample_size=4247.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1900.4, ups=0.25, wpb=7512.4, bsz=120, num_updates=44790, lr=8.25601e-06, gnorm=0.986, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=183470 2023-05-03 05:31:37 - progress_bar.py[line:274] - INFO: epoch 008: 2583 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7541.3, nsentences=120, sample_size=4200.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1911.5, ups=0.25, wpb=7541.3, bsz=120, num_updates=44800, lr=8.25073e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=39, gb_free=31.1, wall=183510 2023-05-03 05:32:16 - progress_bar.py[line:274] - INFO: epoch 008: 2593 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7516.2, nsentences=120, sample_size=4362.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1925.6, ups=0.26, wpb=7516.2, bsz=120, num_updates=44810, lr=8.24544e-06, gnorm=0.943, clip=30, loss_scale=64, train_wall=39, gb_free=31.6, wall=183549 2023-05-03 05:32:56 - progress_bar.py[line:274] - INFO: epoch 008: 2603 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7810.6, nsentences=120, sample_size=3951.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1967.3, ups=0.25, wpb=7810.6, bsz=120, num_updates=44820, lr=8.24016e-06, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=28.6, wall=183588 2023-05-03 05:33:36 - progress_bar.py[line:274] - INFO: epoch 008: 2613 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7699.3, nsentences=120, sample_size=4102.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1921.9, ups=0.25, wpb=7699.3, bsz=120, 
num_updates=44830, lr=8.23488e-06, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=30.9, wall=183628 2023-05-03 05:34:16 - progress_bar.py[line:274] - INFO: epoch 008: 2623 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7477.8, nsentences=120, sample_size=4209.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1877.1, ups=0.25, wpb=7477.8, bsz=120, num_updates=44840, lr=8.2296e-06, gnorm=0.943, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=183668 2023-05-03 05:34:55 - progress_bar.py[line:274] - INFO: epoch 008: 2633 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7629.5, nsentences=120, sample_size=3851.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1931.2, ups=0.25, wpb=7629.5, bsz=120, num_updates=44850, lr=8.22432e-06, gnorm=0.981, clip=30, loss_scale=64, train_wall=39, gb_free=29.1, wall=183708 2023-05-03 05:35:36 - progress_bar.py[line:274] - INFO: epoch 008: 2643 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7683.7, nsentences=120, sample_size=3993.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1898.4, ups=0.25, wpb=7683.7, bsz=120, num_updates=44860, lr=8.21903e-06, gnorm=0.987, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=183748 2023-05-03 05:36:15 - progress_bar.py[line:274] - INFO: epoch 008: 2653 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7661.2, nsentences=120, sample_size=4089.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1939.4, ups=0.25, wpb=7661.2, bsz=120, num_updates=44870, lr=8.21375e-06, gnorm=0.972, clip=40, loss_scale=64, train_wall=39, gb_free=29.6, wall=183788 2023-05-03 05:36:54 - progress_bar.py[line:274] - INFO: epoch 008: 2663 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7536.7, nsentences=120, sample_size=3894.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1925.6, ups=0.26, wpb=7536.7, bsz=120, num_updates=44880, lr=8.20847e-06, gnorm=1.019, clip=70, loss_scale=64, train_wall=39, 
gb_free=28.9, wall=183827 2023-05-03 05:37:34 - progress_bar.py[line:274] - INFO: epoch 008: 2673 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7529.8, nsentences=120, sample_size=3921.2, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1889.8, ups=0.25, wpb=7529.8, bsz=120, num_updates=44890, lr=8.20319e-06, gnorm=0.975, clip=20, loss_scale=64, train_wall=40, gb_free=28.5, wall=183867 2023-05-03 05:38:13 - progress_bar.py[line:274] - INFO: epoch 008: 2683 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7719.6, nsentences=120, sample_size=4014.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1967.1, ups=0.25, wpb=7719.6, bsz=120, num_updates=44900, lr=8.1979e-06, gnorm=0.966, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=183906 2023-05-03 05:38:53 - progress_bar.py[line:274] - INFO: epoch 008: 2693 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7864.6, nsentences=120, sample_size=3914.2, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1972.9, ups=0.25, wpb=7864.6, bsz=120, num_updates=44910, lr=8.19262e-06, gnorm=0.99, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=183946 2023-05-03 05:39:33 - progress_bar.py[line:274] - INFO: epoch 008: 2703 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7625.7, nsentences=120, sample_size=3876.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1917.6, ups=0.25, wpb=7625.7, bsz=120, num_updates=44920, lr=8.18734e-06, gnorm=1.029, clip=50, loss_scale=64, train_wall=40, gb_free=26.8, wall=183986 2023-05-03 05:40:12 - progress_bar.py[line:274] - INFO: epoch 008: 2713 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7836.7, nsentences=120, sample_size=4255, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1990.9, ups=0.25, wpb=7836.7, bsz=120, num_updates=44930, lr=8.18206e-06, gnorm=0.958, clip=30, loss_scale=64, train_wall=39, gb_free=30.8, wall=184025 2023-05-03 05:40:52 - progress_bar.py[line:274] - INFO: epoch 
008: 2723 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7886.8, nsentences=120, sample_size=3770, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1979.5, ups=0.25, wpb=7886.8, bsz=120, num_updates=44940, lr=8.17678e-06, gnorm=0.999, clip=40, loss_scale=64, train_wall=40, gb_free=27.3, wall=184065
2023-05-03 05:41:32 - progress_bar.py[line:274] - INFO: epoch 008: 2733 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7590.7, nsentences=120, sample_size=3957.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1905.9, ups=0.25, wpb=7590.7, bsz=120, num_updates=44950, lr=8.17149e-06, gnorm=1.012, clip=60, loss_scale=64, train_wall=40, gb_free=30.3, wall=184105
2023-05-03 05:42:12 - progress_bar.py[line:274] - INFO: epoch 008: 2743 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7833.6, nsentences=120, sample_size=3956.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1953.8, ups=0.25, wpb=7833.6, bsz=120, num_updates=44960, lr=8.16621e-06, gnorm=0.972, clip=50, loss_scale=64, train_wall=40, gb_free=30.8, wall=184145
2023-05-03 05:42:25 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0
2023-05-03 05:42:56 - progress_bar.py[line:274] - INFO: epoch 008: 2754 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7540.2, nsentences=120, sample_size=3720.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1737.5, ups=0.23, wpb=7540.2, bsz=120, num_updates=44970, lr=8.16093e-06, gnorm=1.041, clip=70, loss_scale=32, train_wall=43, gb_free=31, wall=184188
2023-05-03 05:43:36 - progress_bar.py[line:274] - INFO: epoch 008: 2764 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7747.2, nsentences=120, sample_size=3967.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1923.5, ups=0.25, wpb=7747.2, bsz=120, num_updates=44980, lr=8.15565e-06, gnorm=0.974, clip=30, loss_scale=32, train_wall=40, gb_free=30.3, wall=184228
2023-05-03 05:44:16
- progress_bar.py[line:274] - INFO: epoch 008: 2774 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7692.7, nsentences=120, sample_size=4057, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1907.9, ups=0.25, wpb=7692.7, bsz=120, num_updates=44990, lr=8.15037e-06, gnorm=0.99, clip=30, loss_scale=32, train_wall=40, gb_free=29.5, wall=184269 2023-05-03 05:44:56 - progress_bar.py[line:274] - INFO: epoch 008: 2784 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7894.1, nsentences=120, sample_size=4106.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1962, ups=0.25, wpb=7894.1, bsz=120, num_updates=45000, lr=8.14508e-06, gnorm=0.951, clip=20, loss_scale=32, train_wall=40, gb_free=29.5, wall=184309 2023-05-03 05:44:56 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 05:44:58 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 05:44:58 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 05:45:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:05 - task_invig.py[line:181] - 
INFO: example hyp(gen): region: 2023-05-03 05:45:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:15 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 05:45:15 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 05:45:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:27 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 05:45:27 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 05:45:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:39 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 05:45:39 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 05:45:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:43 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 05:45:43 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 05:45:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 05:45:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 05:45:47 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 05:45:47 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region:
2023-05-03 05:45:48 - task_invig.py[line:181] - INFO: example hyp(gen): region:
2023-05-03 05:45:48 - task_invig.py[line:182] - INFO: example ref(gt): region:
2023-05-03 05:45:48 - progress_bar.py[line:282] - INFO: epoch 008 | valid on 'valid' subset | loss 3.249 | loss_v1 0 | loss_v2 0 | nll_loss 2.084 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.24 | score 0.7534 | wps 3289 | wpb 3202.1 | bsz 39.4 | num_updates 45000 | best_score 0.7627
2023-05-03 05:45:48 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 8 @ 45000 updates
2023-05-03 05:45:48 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_45000.pt
2023-05-03 05:46:12 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_45000.pt
2023-05-03 05:46:26 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_45000.pt (epoch 8 @ 45000 updates, score 0.7534) (writing took 37.95210030884482 seconds)
2023-05-03 05:47:05 - progress_bar.py[line:274] - INFO: epoch 008: 2794 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7835.7, nsentences=120, sample_size=4287.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=607.7, ups=0.08, wpb=7835.7, bsz=120, num_updates=45010, lr=8.1398e-06, gnorm=0.971, clip=30, loss_scale=32, train_wall=40, gb_free=29.5, wall=184438
2023-05-03 05:47:46 - progress_bar.py[line:274] - INFO: epoch 008: 2804 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7596.5, nsentences=120, sample_size=4508.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1852.8, ups=0.24, wpb=7596.5, bsz=120, num_updates=45020, lr=8.13452e-06, gnorm=0.931, clip=20, loss_scale=32, train_wall=41, gb_free=28.8,
wall=184479 2023-05-03 05:48:27 - progress_bar.py[line:274] - INFO: epoch 008: 2814 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7913.3, nsentences=120, sample_size=4028.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1966.6, ups=0.25, wpb=7913.3, bsz=120, num_updates=45030, lr=8.12924e-06, gnorm=0.954, clip=20, loss_scale=32, train_wall=40, gb_free=30.5, wall=184519 2023-05-03 05:49:07 - progress_bar.py[line:274] - INFO: epoch 008: 2824 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7531.8, nsentences=120, sample_size=3956.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1888, ups=0.25, wpb=7531.8, bsz=120, num_updates=45040, lr=8.12395e-06, gnorm=0.99, clip=40, loss_scale=32, train_wall=40, gb_free=27.7, wall=184559 2023-05-03 05:49:46 - progress_bar.py[line:274] - INFO: epoch 008: 2834 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7888.1, nsentences=120, sample_size=3771.4, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1982.3, ups=0.25, wpb=7888.1, bsz=120, num_updates=45050, lr=8.11867e-06, gnorm=1.017, clip=50, loss_scale=32, train_wall=40, gb_free=30.7, wall=184599 2023-05-03 05:50:27 - progress_bar.py[line:274] - INFO: epoch 008: 2844 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7704.1, nsentences=120, sample_size=3926.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1913.7, ups=0.25, wpb=7704.1, bsz=120, num_updates=45060, lr=8.11339e-06, gnorm=0.985, clip=30, loss_scale=32, train_wall=40, gb_free=30.9, wall=184639 2023-05-03 05:51:07 - progress_bar.py[line:274] - INFO: epoch 008: 2854 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7729.9, nsentences=120, sample_size=3805.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1932.9, ups=0.25, wpb=7729.9, bsz=120, num_updates=45070, lr=8.10811e-06, gnorm=0.994, clip=50, loss_scale=32, train_wall=40, gb_free=29.9, wall=184679 2023-05-03 05:51:47 - progress_bar.py[line:274] - INFO: epoch 008: 2864 / 
6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7321.2, nsentences=120, sample_size=3829.7, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1815.9, ups=0.25, wpb=7321.2, bsz=120, num_updates=45080, lr=8.10283e-06, gnorm=1.03, clip=80, loss_scale=32, train_wall=40, gb_free=29.7, wall=184719 2023-05-03 05:52:27 - progress_bar.py[line:274] - INFO: epoch 008: 2874 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7900.4, nsentences=120, sample_size=3909.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1967.2, ups=0.25, wpb=7900.4, bsz=120, num_updates=45090, lr=8.09754e-06, gnorm=1.022, clip=50, loss_scale=32, train_wall=40, gb_free=30.9, wall=184760 2023-05-03 05:53:07 - progress_bar.py[line:274] - INFO: epoch 008: 2884 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.07, ntokens=7482.9, nsentences=120, sample_size=4122.7, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1887.9, ups=0.25, wpb=7482.9, bsz=120, num_updates=45100, lr=8.09226e-06, gnorm=0.986, clip=70, loss_scale=32, train_wall=40, gb_free=30.2, wall=184799 2023-05-03 05:53:46 - progress_bar.py[line:274] - INFO: epoch 008: 2894 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7740.8, nsentences=120, sample_size=4121.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1961.8, ups=0.25, wpb=7740.8, bsz=120, num_updates=45110, lr=8.08698e-06, gnorm=0.968, clip=30, loss_scale=32, train_wall=39, gb_free=30, wall=184839 2023-05-03 05:54:26 - progress_bar.py[line:274] - INFO: epoch 008: 2904 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7719.6, nsentences=120, sample_size=4262.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1937.8, ups=0.25, wpb=7719.6, bsz=120, num_updates=45120, lr=8.0817e-06, gnorm=0.96, clip=30, loss_scale=32, train_wall=40, gb_free=29.1, wall=184879 2023-05-03 05:55:05 - progress_bar.py[line:274] - INFO: epoch 008: 2914 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7390.8, nsentences=120, 
sample_size=3936, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1902.1, ups=0.26, wpb=7390.8, bsz=120, num_updates=45130, lr=8.07642e-06, gnorm=0.998, clip=40, loss_scale=32, train_wall=39, gb_free=30, wall=184917 2023-05-03 05:55:45 - progress_bar.py[line:274] - INFO: epoch 008: 2924 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7584.3, nsentences=120, sample_size=3711.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1870.6, ups=0.25, wpb=7584.3, bsz=120, num_updates=45140, lr=8.07113e-06, gnorm=1.035, clip=60, loss_scale=32, train_wall=40, gb_free=31.5, wall=184958 2023-05-03 05:56:25 - progress_bar.py[line:274] - INFO: epoch 008: 2934 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7931.8, nsentences=120, sample_size=4118.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2000.8, ups=0.25, wpb=7931.8, bsz=120, num_updates=45150, lr=8.06585e-06, gnorm=0.975, clip=20, loss_scale=32, train_wall=40, gb_free=29.4, wall=184998 2023-05-03 05:57:05 - progress_bar.py[line:274] - INFO: epoch 008: 2944 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7677, nsentences=120, sample_size=4136.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1916.6, ups=0.25, wpb=7677, bsz=120, num_updates=45160, lr=8.06057e-06, gnorm=0.963, clip=30, loss_scale=32, train_wall=40, gb_free=30.1, wall=185038 2023-05-03 05:57:45 - progress_bar.py[line:274] - INFO: epoch 008: 2954 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7795.7, nsentences=120, sample_size=4059.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1976.3, ups=0.25, wpb=7795.7, bsz=120, num_updates=45170, lr=8.05529e-06, gnorm=0.98, clip=30, loss_scale=32, train_wall=39, gb_free=29.9, wall=185077 2023-05-03 05:58:24 - progress_bar.py[line:274] - INFO: epoch 008: 2964 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7304.8, nsentences=120, sample_size=4355.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1844.5, ups=0.25, 
wpb=7304.8, bsz=120, num_updates=45180, lr=8.05e-06, gnorm=0.972, clip=40, loss_scale=32, train_wall=40, gb_free=29.5, wall=185117 2023-05-03 05:59:04 - progress_bar.py[line:274] - INFO: epoch 008: 2974 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7790, nsentences=120, sample_size=3790.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1963.5, ups=0.25, wpb=7790, bsz=120, num_updates=45190, lr=8.04472e-06, gnorm=1.014, clip=60, loss_scale=32, train_wall=40, gb_free=27.4, wall=185156 2023-05-03 05:59:43 - progress_bar.py[line:274] - INFO: epoch 008: 2984 / 6042 loss=2.328, loss_v1=0, loss_v2=0, nll_loss=1.066, ntokens=7278.8, nsentences=120, sample_size=3944.4, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1837.6, ups=0.25, wpb=7278.8, bsz=120, num_updates=45200, lr=8.03944e-06, gnorm=1.017, clip=40, loss_scale=32, train_wall=40, gb_free=31.4, wall=185196 2023-05-03 06:00:24 - progress_bar.py[line:274] - INFO: epoch 008: 2994 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7866.2, nsentences=120, sample_size=3937, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1962.1, ups=0.25, wpb=7866.2, bsz=120, num_updates=45210, lr=8.03416e-06, gnorm=1.01, clip=60, loss_scale=32, train_wall=40, gb_free=28.2, wall=185236 2023-05-03 06:01:03 - progress_bar.py[line:274] - INFO: epoch 008: 3004 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7740.6, nsentences=120, sample_size=4041.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1963.3, ups=0.25, wpb=7740.6, bsz=120, num_updates=45220, lr=8.02888e-06, gnorm=0.962, clip=20, loss_scale=32, train_wall=39, gb_free=30, wall=185275 2023-05-03 06:01:43 - progress_bar.py[line:274] - INFO: epoch 008: 3014 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7720.1, nsentences=120, sample_size=3982.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1935, ups=0.25, wpb=7720.1, bsz=120, num_updates=45230, lr=8.02359e-06, gnorm=1.006, clip=60, loss_scale=32, 
train_wall=40, gb_free=30, wall=185315 2023-05-03 06:02:23 - progress_bar.py[line:274] - INFO: epoch 008: 3024 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7598.3, nsentences=120, sample_size=3784.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1877.8, ups=0.25, wpb=7598.3, bsz=120, num_updates=45240, lr=8.01831e-06, gnorm=1.038, clip=70, loss_scale=32, train_wall=40, gb_free=30.1, wall=185356 2023-05-03 06:03:03 - progress_bar.py[line:274] - INFO: epoch 008: 3034 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7775.2, nsentences=120, sample_size=4021.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1961.1, ups=0.25, wpb=7775.2, bsz=120, num_updates=45250, lr=8.01303e-06, gnorm=0.973, clip=40, loss_scale=32, train_wall=40, gb_free=29.2, wall=185395 2023-05-03 06:03:44 - progress_bar.py[line:274] - INFO: epoch 008: 3044 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7842.8, nsentences=120, sample_size=4129.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1912.2, ups=0.24, wpb=7842.8, bsz=120, num_updates=45260, lr=8.00775e-06, gnorm=0.965, clip=30, loss_scale=32, train_wall=41, gb_free=30.3, wall=185436 2023-05-03 06:04:23 - progress_bar.py[line:274] - INFO: epoch 008: 3054 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7710.3, nsentences=120, sample_size=4010.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1964.8, ups=0.25, wpb=7710.3, bsz=120, num_updates=45270, lr=8.00247e-06, gnorm=0.994, clip=40, loss_scale=32, train_wall=39, gb_free=28.7, wall=185476 2023-05-03 06:05:03 - progress_bar.py[line:274] - INFO: epoch 008: 3064 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7978.8, nsentences=120, sample_size=3970.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2018.8, ups=0.25, wpb=7978.8, bsz=120, num_updates=45280, lr=7.99718e-06, gnorm=0.972, clip=20, loss_scale=32, train_wall=39, gb_free=30.2, wall=185515 2023-05-03 06:05:43 - progress_bar.py[line:274] 
- INFO: epoch 008: 3074 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7701.2, nsentences=120, sample_size=3880, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1920, ups=0.25, wpb=7701.2, bsz=120, num_updates=45290, lr=7.9919e-06, gnorm=1.015, clip=50, loss_scale=32, train_wall=40, gb_free=30.1, wall=185555 2023-05-03 06:06:24 - progress_bar.py[line:274] - INFO: epoch 008: 3084 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7973.5, nsentences=120, sample_size=3659.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1951.4, ups=0.24, wpb=7973.5, bsz=120, num_updates=45300, lr=7.98662e-06, gnorm=1.002, clip=50, loss_scale=32, train_wall=41, gb_free=29.5, wall=185596 2023-05-03 06:07:03 - progress_bar.py[line:274] - INFO: epoch 008: 3094 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7750, nsentences=120, sample_size=4063.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1965.5, ups=0.25, wpb=7750, bsz=120, num_updates=45310, lr=7.98134e-06, gnorm=0.983, clip=50, loss_scale=32, train_wall=39, gb_free=29.6, wall=185636 2023-05-03 06:07:42 - progress_bar.py[line:274] - INFO: epoch 008: 3104 / 6042 loss=2.318, loss_v1=0, loss_v2=0, nll_loss=1.056, ntokens=7502.1, nsentences=120, sample_size=3931.3, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1919.9, ups=0.26, wpb=7502.1, bsz=120, num_updates=45320, lr=7.97605e-06, gnorm=0.989, clip=40, loss_scale=32, train_wall=39, gb_free=30.1, wall=185675 2023-05-03 06:08:22 - progress_bar.py[line:274] - INFO: epoch 008: 3114 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7606.8, nsentences=120, sample_size=4293.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1899, ups=0.25, wpb=7606.8, bsz=120, num_updates=45330, lr=7.97077e-06, gnorm=0.955, clip=20, loss_scale=32, train_wall=40, gb_free=29.9, wall=185715 2023-05-03 06:09:03 - progress_bar.py[line:274] - INFO: epoch 008: 3124 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7600.8, 
nsentences=120, sample_size=4174.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1891.4, ups=0.25, wpb=7600.8, bsz=120, num_updates=45340, lr=7.96549e-06, gnorm=0.969, clip=40, loss_scale=32, train_wall=40, gb_free=29.1, wall=185755 2023-05-03 06:09:42 - progress_bar.py[line:274] - INFO: epoch 008: 3134 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7570.2, nsentences=120, sample_size=3918.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1922.7, ups=0.25, wpb=7570.2, bsz=120, num_updates=45350, lr=7.96021e-06, gnorm=0.997, clip=40, loss_scale=32, train_wall=39, gb_free=24.9, wall=185794 2023-05-03 06:10:21 - progress_bar.py[line:274] - INFO: epoch 008: 3144 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7686.7, nsentences=120, sample_size=4079.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1953.4, ups=0.25, wpb=7686.7, bsz=120, num_updates=45360, lr=7.95493e-06, gnorm=0.999, clip=50, loss_scale=32, train_wall=39, gb_free=29.7, wall=185834 2023-05-03 06:11:02 - progress_bar.py[line:274] - INFO: epoch 008: 3154 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7577.2, nsentences=120, sample_size=4385.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1874.4, ups=0.25, wpb=7577.2, bsz=120, num_updates=45370, lr=7.94964e-06, gnorm=0.966, clip=30, loss_scale=32, train_wall=40, gb_free=29.9, wall=185874 2023-05-03 06:11:42 - progress_bar.py[line:274] - INFO: epoch 008: 3164 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7886.5, nsentences=120, sample_size=3954.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1975.3, ups=0.25, wpb=7886.5, bsz=120, num_updates=45380, lr=7.94436e-06, gnorm=0.995, clip=60, loss_scale=32, train_wall=40, gb_free=28.3, wall=185914 2023-05-03 06:12:21 - progress_bar.py[line:274] - INFO: epoch 008: 3174 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7562.4, nsentences=120, sample_size=4031.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, 
wps=1904.8, ups=0.25, wpb=7562.4, bsz=120, num_updates=45390, lr=7.93908e-06, gnorm=1.015, clip=50, loss_scale=32, train_wall=40, gb_free=27.7, wall=185954 2023-05-03 06:13:01 - progress_bar.py[line:274] - INFO: epoch 008: 3184 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7465.5, nsentences=120, sample_size=3931.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1865, ups=0.25, wpb=7465.5, bsz=120, num_updates=45400, lr=7.9338e-06, gnorm=1.016, clip=50, loss_scale=32, train_wall=40, gb_free=30, wall=185994 2023-05-03 06:13:42 - progress_bar.py[line:274] - INFO: epoch 008: 3194 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=8025.7, nsentences=120, sample_size=4080.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1996.4, ups=0.25, wpb=8025.7, bsz=120, num_updates=45410, lr=7.92851e-06, gnorm=0.975, clip=30, loss_scale=32, train_wall=40, gb_free=30.7, wall=186034 2023-05-03 06:14:22 - progress_bar.py[line:274] - INFO: epoch 008: 3204 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7820.5, nsentences=120, sample_size=3966.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1948.3, ups=0.25, wpb=7820.5, bsz=120, num_updates=45420, lr=7.92323e-06, gnorm=0.998, clip=50, loss_scale=32, train_wall=40, gb_free=30.3, wall=186074 2023-05-03 06:15:01 - progress_bar.py[line:274] - INFO: epoch 008: 3214 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7586.4, nsentences=120, sample_size=4162.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1930.1, ups=0.25, wpb=7586.4, bsz=120, num_updates=45430, lr=7.91795e-06, gnorm=0.972, clip=40, loss_scale=32, train_wall=39, gb_free=29.7, wall=186113 2023-05-03 06:15:41 - progress_bar.py[line:274] - INFO: epoch 008: 3224 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7956, nsentences=120, sample_size=4206.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1989.2, ups=0.25, wpb=7956, bsz=120, num_updates=45440, lr=7.91267e-06, gnorm=0.976, 
clip=40, loss_scale=32, train_wall=40, gb_free=31.2, wall=186153 2023-05-03 06:16:21 - progress_bar.py[line:274] - INFO: epoch 008: 3234 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7864.8, nsentences=120, sample_size=3872.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1968, ups=0.25, wpb=7864.8, bsz=120, num_updates=45450, lr=7.90739e-06, gnorm=0.991, clip=50, loss_scale=32, train_wall=40, gb_free=28.3, wall=186193 2023-05-03 06:17:01 - progress_bar.py[line:274] - INFO: epoch 008: 3244 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7569, nsentences=120, sample_size=4023.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1897, ups=0.25, wpb=7569, bsz=120, num_updates=45460, lr=7.9021e-06, gnorm=1.009, clip=40, loss_scale=32, train_wall=40, gb_free=29.9, wall=186233 2023-05-03 06:17:40 - progress_bar.py[line:274] - INFO: epoch 008: 3254 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7765.1, nsentences=120, sample_size=3770.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1969.2, ups=0.25, wpb=7765.1, bsz=120, num_updates=45470, lr=7.89682e-06, gnorm=0.995, clip=30, loss_scale=32, train_wall=39, gb_free=30.2, wall=186273 2023-05-03 06:18:21 - progress_bar.py[line:274] - INFO: epoch 008: 3264 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7521.9, nsentences=120, sample_size=3679.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1870.2, ups=0.25, wpb=7521.9, bsz=120, num_updates=45480, lr=7.89154e-06, gnorm=1.023, clip=50, loss_scale=64, train_wall=40, gb_free=30.4, wall=186313 2023-05-03 06:19:01 - progress_bar.py[line:274] - INFO: epoch 008: 3274 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=8143.7, nsentences=120, sample_size=4171.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2024.7, ups=0.25, wpb=8143.7, bsz=120, num_updates=45490, lr=7.88626e-06, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=186353 2023-05-03 06:19:40 - 
progress_bar.py[line:274] - INFO: epoch 008: 3284 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7573.6, nsentences=120, sample_size=3998.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1913.7, ups=0.25, wpb=7573.6, bsz=120, num_updates=45500, lr=7.88098e-06, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=186393 2023-05-03 06:20:20 - progress_bar.py[line:274] - INFO: epoch 008: 3294 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7602.8, nsentences=120, sample_size=3885.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1915.8, ups=0.25, wpb=7602.8, bsz=120, num_updates=45510, lr=7.87569e-06, gnorm=0.996, clip=50, loss_scale=64, train_wall=40, gb_free=30.3, wall=186432 2023-05-03 06:20:59 - progress_bar.py[line:274] - INFO: epoch 008: 3304 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7579.8, nsentences=120, sample_size=4297.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1921.9, ups=0.25, wpb=7579.8, bsz=120, num_updates=45520, lr=7.87041e-06, gnorm=0.944, clip=20, loss_scale=64, train_wall=39, gb_free=28.7, wall=186472 2023-05-03 06:21:38 - progress_bar.py[line:274] - INFO: epoch 008: 3314 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7758.3, nsentences=120, sample_size=3682.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1991.3, ups=0.26, wpb=7758.3, bsz=120, num_updates=45530, lr=7.86513e-06, gnorm=1.004, clip=50, loss_scale=64, train_wall=39, gb_free=29.9, wall=186511 2023-05-03 06:22:18 - progress_bar.py[line:274] - INFO: epoch 008: 3324 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.072, ntokens=7514.3, nsentences=120, sample_size=4204.6, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1914.6, ups=0.25, wpb=7514.3, bsz=120, num_updates=45540, lr=7.85985e-06, gnorm=0.965, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=186550 2023-05-03 06:22:58 - progress_bar.py[line:274] - INFO: epoch 008: 3334 / 6042 loss=2.38, loss_v1=0, 
loss_v2=0, nll_loss=1.125, ntokens=7851.3, nsentences=120, sample_size=3783.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1968.1, ups=0.25, wpb=7851.3, bsz=120, num_updates=45550, lr=7.85456e-06, gnorm=0.971, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=186590 2023-05-03 06:23:37 - progress_bar.py[line:274] - INFO: epoch 008: 3344 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7836.8, nsentences=120, sample_size=4025.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1988.2, ups=0.25, wpb=7836.8, bsz=120, num_updates=45560, lr=7.84928e-06, gnorm=0.976, clip=40, loss_scale=64, train_wall=39, gb_free=29.3, wall=186629 2023-05-03 06:24:16 - progress_bar.py[line:274] - INFO: epoch 008: 3354 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7906, nsentences=120, sample_size=3675.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2005.7, ups=0.25, wpb=7906, bsz=120, num_updates=45570, lr=7.844e-06, gnorm=1.016, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=186669 2023-05-03 06:24:56 - progress_bar.py[line:274] - INFO: epoch 008: 3364 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7660.8, nsentences=120, sample_size=4268.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1929.5, ups=0.25, wpb=7660.8, bsz=120, num_updates=45580, lr=7.83872e-06, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, gb_free=29.1, wall=186709 2023-05-03 06:25:36 - progress_bar.py[line:274] - INFO: epoch 008: 3374 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7715.9, nsentences=120, sample_size=4193.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1937.8, ups=0.25, wpb=7715.9, bsz=120, num_updates=45590, lr=7.83344e-06, gnorm=0.976, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=186748 2023-05-03 06:26:17 - progress_bar.py[line:274] - INFO: epoch 008: 3384 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7568.2, nsentences=120, sample_size=4243.8, 
sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1851.1, ups=0.24, wpb=7568.2, bsz=120, num_updates=45600, lr=7.82815e-06, gnorm=0.978, clip=30, loss_scale=64, train_wall=41, gb_free=28.3, wall=186789 2023-05-03 06:26:56 - progress_bar.py[line:274] - INFO: epoch 008: 3394 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7489.9, nsentences=120, sample_size=4055.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1902.9, ups=0.25, wpb=7489.9, bsz=120, num_updates=45610, lr=7.82287e-06, gnorm=0.939, clip=10, loss_scale=64, train_wall=39, gb_free=30.3, wall=186829 2023-05-03 06:27:36 - progress_bar.py[line:274] - INFO: epoch 008: 3404 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7498.2, nsentences=119.2, sample_size=3974.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1875.7, ups=0.25, wpb=7498.2, bsz=119.2, num_updates=45620, lr=7.81759e-06, gnorm=0.984, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=186869 2023-05-03 06:28:16 - progress_bar.py[line:274] - INFO: epoch 008: 3414 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7850.6, nsentences=120, sample_size=4077.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1991.2, ups=0.25, wpb=7850.6, bsz=120, num_updates=45630, lr=7.81231e-06, gnorm=0.953, clip=10, loss_scale=64, train_wall=39, gb_free=29.6, wall=186908 2023-05-03 06:28:55 - progress_bar.py[line:274] - INFO: epoch 008: 3424 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7508, nsentences=120, sample_size=3763.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1903.7, ups=0.25, wpb=7508, bsz=120, num_updates=45640, lr=7.80703e-06, gnorm=0.976, clip=20, loss_scale=64, train_wall=39, gb_free=29.4, wall=186947 2023-05-03 06:29:36 - progress_bar.py[line:274] - INFO: epoch 008: 3434 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7770, nsentences=120, sample_size=4104.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1916.8, ups=0.25, wpb=7770, bsz=120, 
num_updates=45650, lr=7.80174e-06, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=186988 2023-05-03 06:30:15 - progress_bar.py[line:274] - INFO: epoch 008: 3444 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7915.4, nsentences=120, sample_size=3795, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1988, ups=0.25, wpb=7915.4, bsz=120, num_updates=45660, lr=7.79646e-06, gnorm=0.988, clip=50, loss_scale=64, train_wall=40, gb_free=30.4, wall=187028 2023-05-03 06:30:55 - progress_bar.py[line:274] - INFO: epoch 008: 3454 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.163, ntokens=7977.1, nsentences=120, sample_size=4105, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2001.8, ups=0.25, wpb=7977.1, bsz=120, num_updates=45670, lr=7.79118e-06, gnorm=0.988, clip=40, loss_scale=64, train_wall=40, gb_free=29.5, wall=187068 2023-05-03 06:31:35 - progress_bar.py[line:274] - INFO: epoch 008: 3464 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7780.7, nsentences=120, sample_size=4058.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1952.7, ups=0.25, wpb=7780.7, bsz=120, num_updates=45680, lr=7.7859e-06, gnorm=0.953, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=187108 2023-05-03 06:32:15 - progress_bar.py[line:274] - INFO: epoch 008: 3474 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7497.8, nsentences=120, sample_size=4333.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1877.8, ups=0.25, wpb=7497.8, bsz=120, num_updates=45690, lr=7.78061e-06, gnorm=0.972, clip=10, loss_scale=64, train_wall=40, gb_free=30.7, wall=187147 2023-05-03 06:32:55 - progress_bar.py[line:274] - INFO: epoch 008: 3484 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7575.8, nsentences=120, sample_size=3735.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1897.6, ups=0.25, wpb=7575.8, bsz=120, num_updates=45700, lr=7.77533e-06, gnorm=1.004, clip=60, loss_scale=64, train_wall=40, 
gb_free=27.4, wall=187187 2023-05-03 06:33:35 - progress_bar.py[line:274] - INFO: epoch 008: 3494 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7818.6, nsentences=120, sample_size=4101.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1945.3, ups=0.25, wpb=7818.6, bsz=120, num_updates=45710, lr=7.77005e-06, gnorm=0.967, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=187228 2023-05-03 06:34:15 - progress_bar.py[line:274] - INFO: epoch 008: 3504 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7870.2, nsentences=120, sample_size=4190.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1976.1, ups=0.25, wpb=7870.2, bsz=120, num_updates=45720, lr=7.76477e-06, gnorm=0.931, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=187267 2023-05-03 06:34:56 - progress_bar.py[line:274] - INFO: epoch 008: 3514 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7676.9, nsentences=120, sample_size=4175.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1890.7, ups=0.25, wpb=7676.9, bsz=120, num_updates=45730, lr=7.75949e-06, gnorm=0.984, clip=10, loss_scale=64, train_wall=41, gb_free=28.9, wall=187308 2023-05-03 06:35:36 - progress_bar.py[line:274] - INFO: epoch 008: 3524 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7809, nsentences=120, sample_size=3831.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1932.6, ups=0.25, wpb=7809, bsz=120, num_updates=45740, lr=7.7542e-06, gnorm=1.014, clip=60, loss_scale=64, train_wall=40, gb_free=28.9, wall=187348 2023-05-03 06:36:16 - progress_bar.py[line:274] - INFO: epoch 008: 3534 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7584.3, nsentences=120, sample_size=3916.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1914.5, ups=0.25, wpb=7584.3, bsz=120, num_updates=45750, lr=7.74892e-06, gnorm=1.006, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=187388 2023-05-03 06:36:55 - progress_bar.py[line:274] - INFO: epoch 
008: 3544 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7779.5, nsentences=120, sample_size=4091.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1957, ups=0.25, wpb=7779.5, bsz=120, num_updates=45760, lr=7.74364e-06, gnorm=0.984, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=187428 2023-05-03 06:37:35 - progress_bar.py[line:274] - INFO: epoch 008: 3554 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7741.2, nsentences=120, sample_size=4074.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1955.1, ups=0.25, wpb=7741.2, bsz=120, num_updates=45770, lr=7.73836e-06, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=187467 2023-05-03 06:38:14 - progress_bar.py[line:274] - INFO: epoch 008: 3564 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7331.2, nsentences=120, sample_size=4218.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1877.6, ups=0.26, wpb=7331.2, bsz=120, num_updates=45780, lr=7.73308e-06, gnorm=0.97, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=187506 2023-05-03 06:38:54 - progress_bar.py[line:274] - INFO: epoch 008: 3574 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7887.7, nsentences=120, sample_size=4159.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1974.1, ups=0.25, wpb=7887.7, bsz=120, num_updates=45790, lr=7.72779e-06, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, gb_free=28, wall=187546 2023-05-03 06:39:34 - progress_bar.py[line:274] - INFO: epoch 008: 3584 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7794.6, nsentences=120, sample_size=4125.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1968, ups=0.25, wpb=7794.6, bsz=120, num_updates=45800, lr=7.72251e-06, gnorm=0.994, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=187586 2023-05-03 06:40:13 - progress_bar.py[line:274] - INFO: epoch 008: 3594 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7919.3, 
nsentences=120, sample_size=3835, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1997.4, ups=0.25, wpb=7919.3, bsz=120, num_updates=45810, lr=7.71723e-06, gnorm=1.011, clip=50, loss_scale=64, train_wall=40, gb_free=31, wall=187626 2023-05-03 06:40:52 - progress_bar.py[line:274] - INFO: epoch 008: 3604 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7631.5, nsentences=120, sample_size=3922.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1954.9, ups=0.26, wpb=7631.5, bsz=120, num_updates=45820, lr=7.71195e-06, gnorm=0.978, clip=50, loss_scale=64, train_wall=39, gb_free=29, wall=187665 2023-05-03 06:41:32 - progress_bar.py[line:274] - INFO: epoch 008: 3614 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.072, ntokens=7905.6, nsentences=120, sample_size=3933.8, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1992, ups=0.25, wpb=7905.6, bsz=120, num_updates=45830, lr=7.70666e-06, gnorm=1.012, clip=60, loss_scale=64, train_wall=40, gb_free=30.1, wall=187704 2023-05-03 06:42:13 - progress_bar.py[line:274] - INFO: epoch 008: 3624 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7777.7, nsentences=120, sample_size=4470.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1912.1, ups=0.25, wpb=7777.7, bsz=120, num_updates=45840, lr=7.70138e-06, gnorm=0.924, clip=10, loss_scale=64, train_wall=41, gb_free=29.8, wall=187745 2023-05-03 06:42:52 - progress_bar.py[line:274] - INFO: epoch 008: 3634 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7729.7, nsentences=120, sample_size=4071.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1949.4, ups=0.25, wpb=7729.7, bsz=120, num_updates=45850, lr=7.6961e-06, gnorm=0.983, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=187785 2023-05-03 06:43:32 - progress_bar.py[line:274] - INFO: epoch 008: 3644 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7726.3, nsentences=120, sample_size=3990.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1953.9, 
ups=0.25, wpb=7726.3, bsz=120, num_updates=45860, lr=7.69082e-06, gnorm=0.983, clip=30, loss_scale=64, train_wall=39, gb_free=30.6, wall=187824 2023-05-03 06:44:11 - progress_bar.py[line:274] - INFO: epoch 008: 3654 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7745.6, nsentences=120, sample_size=4166.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1954, ups=0.25, wpb=7745.6, bsz=120, num_updates=45870, lr=7.68554e-06, gnorm=0.975, clip=40, loss_scale=64, train_wall=40, gb_free=29, wall=187864 2023-05-03 06:44:52 - progress_bar.py[line:274] - INFO: epoch 008: 3664 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7987.6, nsentences=120, sample_size=3968.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1992.1, ups=0.25, wpb=7987.6, bsz=120, num_updates=45880, lr=7.68025e-06, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=187904 2023-05-03 06:45:31 - progress_bar.py[line:274] - INFO: epoch 008: 3674 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7531.6, nsentences=120, sample_size=3888.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1899.6, ups=0.25, wpb=7531.6, bsz=120, num_updates=45890, lr=7.67497e-06, gnorm=1.005, clip=60, loss_scale=64, train_wall=40, gb_free=25.3, wall=187944 2023-05-03 06:46:10 - progress_bar.py[line:274] - INFO: epoch 008: 3684 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7534.8, nsentences=120, sample_size=3657.2, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1922.7, ups=0.26, wpb=7534.8, bsz=120, num_updates=45900, lr=7.66969e-06, gnorm=1.057, clip=80, loss_scale=64, train_wall=39, gb_free=30.7, wall=187983 2023-05-03 06:46:50 - progress_bar.py[line:274] - INFO: epoch 008: 3694 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7594.3, nsentences=120, sample_size=4246.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1909, ups=0.25, wpb=7594.3, bsz=120, num_updates=45910, lr=7.66441e-06, gnorm=0.942, clip=10, 
loss_scale=64, train_wall=40, gb_free=29.2, wall=188023 2023-05-03 06:47:30 - progress_bar.py[line:274] - INFO: epoch 008: 3704 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7463.2, nsentences=120, sample_size=4709.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1878.1, ups=0.25, wpb=7463.2, bsz=120, num_updates=45920, lr=7.65912e-06, gnorm=0.925, clip=10, loss_scale=64, train_wall=40, gb_free=29.2, wall=188062 2023-05-03 06:48:09 - progress_bar.py[line:274] - INFO: epoch 008: 3714 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7693.4, nsentences=120, sample_size=3985.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1980.4, ups=0.26, wpb=7693.4, bsz=120, num_updates=45930, lr=7.65384e-06, gnorm=0.977, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=188101 2023-05-03 06:48:49 - progress_bar.py[line:274] - INFO: epoch 008: 3724 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7654.6, nsentences=120, sample_size=3962.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1906.5, ups=0.25, wpb=7654.6, bsz=120, num_updates=45940, lr=7.64856e-06, gnorm=0.988, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=188141 2023-05-03 06:49:29 - progress_bar.py[line:274] - INFO: epoch 008: 3734 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.177, ntokens=8032, nsentences=120, sample_size=3782.4, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=2010.6, ups=0.25, wpb=8032, bsz=120, num_updates=45950, lr=7.64328e-06, gnorm=1.007, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=188181 2023-05-03 06:50:08 - progress_bar.py[line:274] - INFO: epoch 008: 3744 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7552.4, nsentences=120, sample_size=3831.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1909.1, ups=0.25, wpb=7552.4, bsz=120, num_updates=45960, lr=7.638e-06, gnorm=1.009, clip=50, loss_scale=64, train_wall=39, gb_free=30.8, wall=188221 2023-05-03 06:50:49 - 
progress_bar.py[line:274] - INFO: epoch 008: 3754 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7861.5, nsentences=120, sample_size=4152.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1947.7, ups=0.25, wpb=7861.5, bsz=120, num_updates=45970, lr=7.63271e-06, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=30.9, wall=188261 2023-05-03 06:51:29 - progress_bar.py[line:274] - INFO: epoch 008: 3764 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7762.2, nsentences=120, sample_size=4256.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1944.1, ups=0.25, wpb=7762.2, bsz=120, num_updates=45980, lr=7.62743e-06, gnorm=0.939, clip=20, loss_scale=64, train_wall=40, gb_free=29.1, wall=188301 2023-05-03 06:52:05 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 06:52:12 - progress_bar.py[line:274] - INFO: epoch 008: 3775 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7647.1, nsentences=120, sample_size=3982.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1747.5, ups=0.23, wpb=7647.1, bsz=120, num_updates=45990, lr=7.62215e-06, gnorm=0.983, clip=30, loss_scale=64, train_wall=44, gb_free=29.5, wall=188345 2023-05-03 06:52:52 - progress_bar.py[line:274] - INFO: epoch 008: 3785 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7768.2, nsentences=120, sample_size=3909.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1955.6, ups=0.25, wpb=7768.2, bsz=120, num_updates=46000, lr=7.61687e-06, gnorm=0.985, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=188385 2023-05-03 06:52:52 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 06:52:54 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 06:52:54 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 06:52:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:52:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:52:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:52:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:52:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:52:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:52:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:52:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:52:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:52:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 06:53:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:11 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 06:53:11 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 06:53:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-03 06:53:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:23 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 06:53:23 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 06:53:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 06:53:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:34 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 06:53:34 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 06:53:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:38 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 06:53:38 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 06:53:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:43 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 06:53:43 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 06:53:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 06:53:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 06:53:43 - progress_bar.py[line:282] - INFO: epoch 008 | valid on 'valid' subset | loss 3.256 | loss_v1 0 | loss_v2 0 | nll_loss 2.092 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.26 | score 0.7505 | wps 3301.1 | wpb 3202.1 | bsz 39.4 | num_updates 46000 | best_score 0.7627 2023-05-03 06:53:43 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 8 @ 46000 updates 2023-05-03 06:53:43 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_46000.pt 2023-05-03 06:54:08 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_46000.pt 2023-05-03 06:54:22 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_46000.pt (epoch 8 @ 46000 updates, score 0.7505) (writing took 38.27688589086756 seconds) 2023-05-03 06:55:02 - progress_bar.py[line:274] - INFO: epoch 008: 3795 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7559.1, nsentences=120, sample_size=3984.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=584.2, ups=0.08, wpb=7559.1, bsz=120, num_updates=46010, lr=7.61159e-06, gnorm=0.995, clip=50, loss_scale=64, train_wall=40, gb_free=31.3, wall=188514 2023-05-03 06:55:42 - progress_bar.py[line:274] - INFO: epoch 008: 3805 / 6042 loss=2.418, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7678.3, nsentences=120, sample_size=4254.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1899.1, ups=0.25, wpb=7678.3, bsz=120, 
num_updates=46020, lr=7.6063e-06, gnorm=0.941, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=188554 2023-05-03 06:56:21 - progress_bar.py[line:274] - INFO: epoch 008: 3815 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7786.9, nsentences=120, sample_size=3895.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1988.8, ups=0.26, wpb=7786.9, bsz=120, num_updates=46030, lr=7.60102e-06, gnorm=0.986, clip=60, loss_scale=64, train_wall=39, gb_free=28.7, wall=188594 2023-05-03 06:57:01 - progress_bar.py[line:274] - INFO: epoch 008: 3825 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7553, nsentences=120, sample_size=4101.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1906.9, ups=0.25, wpb=7553, bsz=120, num_updates=46040, lr=7.59574e-06, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=188633 2023-05-03 06:57:41 - progress_bar.py[line:274] - INFO: epoch 008: 3835 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7631.2, nsentences=120, sample_size=4272.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1907.5, ups=0.25, wpb=7631.2, bsz=120, num_updates=46050, lr=7.59046e-06, gnorm=0.965, clip=50, loss_scale=64, train_wall=40, gb_free=28.6, wall=188673 2023-05-03 06:58:21 - progress_bar.py[line:274] - INFO: epoch 008: 3845 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7751.1, nsentences=120, sample_size=4079.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1947.5, ups=0.25, wpb=7751.1, bsz=120, num_updates=46060, lr=7.58517e-06, gnorm=1.009, clip=50, loss_scale=64, train_wall=40, gb_free=30.6, wall=188713 2023-05-03 06:59:00 - progress_bar.py[line:274] - INFO: epoch 008: 3855 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7633.5, nsentences=120, sample_size=3745.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1917.8, ups=0.25, wpb=7633.5, bsz=120, num_updates=46070, lr=7.57989e-06, gnorm=1.035, clip=60, loss_scale=64, train_wall=40, 
gb_free=28.5, wall=188753 2023-05-03 06:59:39 - progress_bar.py[line:274] - INFO: epoch 008: 3865 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7920, nsentences=120, sample_size=4142.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2048.4, ups=0.26, wpb=7920, bsz=120, num_updates=46080, lr=7.57461e-06, gnorm=0.973, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=188791 2023-05-03 07:00:18 - progress_bar.py[line:274] - INFO: epoch 008: 3875 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7565, nsentences=120, sample_size=3690, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1927.8, ups=0.25, wpb=7565, bsz=120, num_updates=46090, lr=7.56933e-06, gnorm=1.048, clip=70, loss_scale=64, train_wall=39, gb_free=30.8, wall=188831 2023-05-03 07:00:58 - progress_bar.py[line:274] - INFO: epoch 008: 3885 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7379.5, nsentences=120, sample_size=4147.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1858.4, ups=0.25, wpb=7379.5, bsz=120, num_updates=46100, lr=7.56405e-06, gnorm=0.978, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=188870 2023-05-03 07:01:39 - progress_bar.py[line:274] - INFO: epoch 008: 3895 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7935.4, nsentences=120, sample_size=3980.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1951.6, ups=0.25, wpb=7935.4, bsz=120, num_updates=46110, lr=7.55876e-06, gnorm=0.997, clip=50, loss_scale=64, train_wall=41, gb_free=30.5, wall=188911 2023-05-03 07:02:18 - progress_bar.py[line:274] - INFO: epoch 008: 3905 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7291.4, nsentences=120, sample_size=4244.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1846.7, ups=0.25, wpb=7291.4, bsz=120, num_updates=46120, lr=7.55348e-06, gnorm=0.977, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=188951 2023-05-03 07:02:58 - progress_bar.py[line:274] - INFO: epoch 008: 3915 
/ 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7797.6, nsentences=120, sample_size=3955.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1934.8, ups=0.25, wpb=7797.6, bsz=120, num_updates=46130, lr=7.5482e-06, gnorm=0.991, clip=40, loss_scale=64, train_wall=40, gb_free=28.7, wall=188991 2023-05-03 07:03:39 - progress_bar.py[line:274] - INFO: epoch 008: 3925 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7781.6, nsentences=120, sample_size=3873.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1925.3, ups=0.25, wpb=7781.6, bsz=120, num_updates=46140, lr=7.54292e-06, gnorm=0.999, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=189031 2023-05-03 07:04:06 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-03 07:04:22 - progress_bar.py[line:274] - INFO: epoch 008: 3936 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7764.5, nsentences=120, sample_size=4339.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1794.3, ups=0.23, wpb=7764.5, bsz=120, num_updates=46150, lr=7.53764e-06, gnorm=0.957, clip=30, loss_scale=32, train_wall=43, gb_free=30.2, wall=189075 2023-05-03 07:05:02 - progress_bar.py[line:274] - INFO: epoch 008: 3946 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7971.5, nsentences=120, sample_size=3810.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2006.3, ups=0.25, wpb=7971.5, bsz=120, num_updates=46160, lr=7.53235e-06, gnorm=1.002, clip=50, loss_scale=32, train_wall=40, gb_free=29.9, wall=189114 2023-05-03 07:05:41 - progress_bar.py[line:274] - INFO: epoch 008: 3956 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7630.5, nsentences=120, sample_size=4024.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1946.1, ups=0.26, wpb=7630.5, bsz=120, num_updates=46170, lr=7.52707e-06, gnorm=1.003, clip=40, loss_scale=32, train_wall=39, gb_free=30.4, wall=189154 2023-05-03 07:06:21 - 
progress_bar.py[line:274] - INFO: epoch 008: 3966 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7847.9, nsentences=120, sample_size=4172.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1971.3, ups=0.25, wpb=7847.9, bsz=120, num_updates=46180, lr=7.52179e-06, gnorm=0.954, clip=40, loss_scale=32, train_wall=40, gb_free=30.5, wall=189193 2023-05-03 07:07:01 - progress_bar.py[line:274] - INFO: epoch 008: 3976 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7619.3, nsentences=120, sample_size=3719.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1917.5, ups=0.25, wpb=7619.3, bsz=120, num_updates=46190, lr=7.51651e-06, gnorm=1.019, clip=50, loss_scale=32, train_wall=40, gb_free=30.7, wall=189233 2023-05-03 07:07:40 - progress_bar.py[line:274] - INFO: epoch 008: 3986 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7775.7, nsentences=120, sample_size=3743.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1963, ups=0.25, wpb=7775.7, bsz=120, num_updates=46200, lr=7.51122e-06, gnorm=0.999, clip=50, loss_scale=32, train_wall=40, gb_free=29.9, wall=189273 2023-05-03 07:08:20 - progress_bar.py[line:274] - INFO: epoch 008: 3996 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7790.8, nsentences=120, sample_size=4177.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1975.7, ups=0.25, wpb=7790.8, bsz=120, num_updates=46210, lr=7.50594e-06, gnorm=0.976, clip=30, loss_scale=32, train_wall=39, gb_free=30.1, wall=189312 2023-05-03 07:08:59 - progress_bar.py[line:274] - INFO: epoch 008: 4006 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7772.1, nsentences=120, sample_size=3728.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1954.5, ups=0.25, wpb=7772.1, bsz=120, num_updates=46220, lr=7.50066e-06, gnorm=1.017, clip=70, loss_scale=32, train_wall=40, gb_free=30, wall=189352 2023-05-03 07:09:39 - progress_bar.py[line:274] - INFO: epoch 008: 4016 / 6042 loss=2.347, loss_v1=0, loss_v2=0, 
nll_loss=1.078, ntokens=7545.4, nsentences=120, sample_size=4029.3, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1910.3, ups=0.25, wpb=7545.4, bsz=120, num_updates=46230, lr=7.49538e-06, gnorm=0.985, clip=30, loss_scale=32, train_wall=39, gb_free=26.7, wall=189391 2023-05-03 07:10:19 - progress_bar.py[line:274] - INFO: epoch 008: 4026 / 6042 loss=2.409, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7784.8, nsentences=120, sample_size=4059.6, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1951.9, ups=0.25, wpb=7784.8, bsz=120, num_updates=46240, lr=7.4901e-06, gnorm=0.985, clip=40, loss_scale=32, train_wall=40, gb_free=29.6, wall=189431 2023-05-03 07:10:58 - progress_bar.py[line:274] - INFO: epoch 008: 4036 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7925.1, nsentences=120, sample_size=3813.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2015.1, ups=0.25, wpb=7925.1, bsz=120, num_updates=46250, lr=7.48481e-06, gnorm=1.02, clip=50, loss_scale=32, train_wall=39, gb_free=29.6, wall=189471 2023-05-03 07:11:39 - progress_bar.py[line:274] - INFO: epoch 008: 4046 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7998.1, nsentences=120, sample_size=4049, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1945.6, ups=0.24, wpb=7998.1, bsz=120, num_updates=46260, lr=7.47953e-06, gnorm=0.955, clip=30, loss_scale=32, train_wall=41, gb_free=29.9, wall=189512 2023-05-03 07:12:20 - progress_bar.py[line:274] - INFO: epoch 008: 4056 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7669.5, nsentences=120, sample_size=4127.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1903.4, ups=0.25, wpb=7669.5, bsz=120, num_updates=46270, lr=7.47425e-06, gnorm=0.98, clip=40, loss_scale=32, train_wall=40, gb_free=28.4, wall=189552 2023-05-03 07:12:59 - progress_bar.py[line:274] - INFO: epoch 008: 4066 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7741, nsentences=120, sample_size=3752.7, sample_size_v1=0, 
sample_size_v2=0, ppl=2.16, wps=1963.8, ups=0.25, wpb=7741, bsz=120, num_updates=46280, lr=7.46897e-06, gnorm=1.033, clip=70, loss_scale=32, train_wall=39, gb_free=30.2, wall=189591 2023-05-03 07:13:39 - progress_bar.py[line:274] - INFO: epoch 008: 4076 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7818.9, nsentences=120, sample_size=4116.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1970.6, ups=0.25, wpb=7818.9, bsz=120, num_updates=46290, lr=7.46369e-06, gnorm=0.967, clip=40, loss_scale=32, train_wall=40, gb_free=30.4, wall=189631 2023-05-03 07:14:19 - progress_bar.py[line:274] - INFO: epoch 008: 4086 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7838.4, nsentences=120, sample_size=4041.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1946.3, ups=0.25, wpb=7838.4, bsz=120, num_updates=46300, lr=7.4584e-06, gnorm=0.989, clip=60, loss_scale=32, train_wall=40, gb_free=31.2, wall=189671 2023-05-03 07:14:59 - progress_bar.py[line:274] - INFO: epoch 008: 4096 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7880.7, nsentences=120, sample_size=4015.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1961.7, ups=0.25, wpb=7880.7, bsz=120, num_updates=46310, lr=7.45312e-06, gnorm=0.98, clip=20, loss_scale=32, train_wall=40, gb_free=29.5, wall=189712 2023-05-03 07:15:39 - progress_bar.py[line:274] - INFO: epoch 008: 4106 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7873.1, nsentences=120, sample_size=4113.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1983, ups=0.25, wpb=7873.1, bsz=120, num_updates=46320, lr=7.44784e-06, gnorm=0.974, clip=50, loss_scale=32, train_wall=40, gb_free=30, wall=189751 2023-05-03 07:16:18 - progress_bar.py[line:274] - INFO: epoch 008: 4116 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7551.3, nsentences=120, sample_size=3925.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1902.2, ups=0.25, wpb=7551.3, bsz=120, num_updates=46330, 
lr=7.44256e-06, gnorm=0.993, clip=60, loss_scale=32, train_wall=40, gb_free=30.2, wall=189791 2023-05-03 07:16:58 - progress_bar.py[line:274] - INFO: epoch 008: 4126 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7741.6, nsentences=120, sample_size=3991.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1946.7, ups=0.25, wpb=7741.6, bsz=120, num_updates=46340, lr=7.43727e-06, gnorm=0.992, clip=40, loss_scale=32, train_wall=40, gb_free=29.8, wall=189831 2023-05-03 07:17:38 - progress_bar.py[line:274] - INFO: epoch 008: 4136 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7813.2, nsentences=120, sample_size=4052, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1982, ups=0.25, wpb=7813.2, bsz=120, num_updates=46350, lr=7.43199e-06, gnorm=0.967, clip=50, loss_scale=32, train_wall=39, gb_free=28.8, wall=189870 2023-05-03 07:18:17 - progress_bar.py[line:274] - INFO: epoch 008: 4146 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7994.7, nsentences=120, sample_size=3966.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2009.4, ups=0.25, wpb=7994.7, bsz=120, num_updates=46360, lr=7.42671e-06, gnorm=0.98, clip=30, loss_scale=32, train_wall=40, gb_free=30.2, wall=189910 2023-05-03 07:18:57 - progress_bar.py[line:274] - INFO: epoch 008: 4156 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7956.3, nsentences=120, sample_size=3834.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2026.5, ups=0.25, wpb=7956.3, bsz=120, num_updates=46370, lr=7.42143e-06, gnorm=0.997, clip=40, loss_scale=32, train_wall=39, gb_free=28, wall=189949 2023-05-03 07:19:37 - progress_bar.py[line:274] - INFO: epoch 008: 4166 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7483.9, nsentences=120, sample_size=4327.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1855.2, ups=0.25, wpb=7483.9, bsz=120, num_updates=46380, lr=7.41615e-06, gnorm=0.973, clip=40, loss_scale=32, train_wall=40, gb_free=30.3, wall=189990 
2023-05-03 07:20:17 - progress_bar.py[line:274] - INFO: epoch 008: 4176 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7644.5, nsentences=120, sample_size=3851.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1900.4, ups=0.25, wpb=7644.5, bsz=120, num_updates=46390, lr=7.41086e-06, gnorm=1.008, clip=40, loss_scale=32, train_wall=40, gb_free=30.8, wall=190030 2023-05-03 07:20:57 - progress_bar.py[line:274] - INFO: epoch 008: 4186 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7444, nsentences=120, sample_size=3794.1, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1875.4, ups=0.25, wpb=7444, bsz=120, num_updates=46400, lr=7.40558e-06, gnorm=1.125, clip=80, loss_scale=32, train_wall=40, gb_free=30.6, wall=190069 2023-05-03 07:21:37 - progress_bar.py[line:274] - INFO: epoch 008: 4196 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7719.7, nsentences=120, sample_size=3998.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1920.5, ups=0.25, wpb=7719.7, bsz=120, num_updates=46410, lr=7.4003e-06, gnorm=0.993, clip=60, loss_scale=32, train_wall=40, gb_free=30.1, wall=190110 2023-05-03 07:22:17 - progress_bar.py[line:274] - INFO: epoch 008: 4206 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7585.3, nsentences=120, sample_size=4125.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1920.3, ups=0.25, wpb=7585.3, bsz=120, num_updates=46420, lr=7.39502e-06, gnorm=0.965, clip=30, loss_scale=32, train_wall=39, gb_free=30.5, wall=190149 2023-05-03 07:22:56 - progress_bar.py[line:274] - INFO: epoch 008: 4216 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7617.9, nsentences=120, sample_size=4102.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1922.5, ups=0.25, wpb=7617.9, bsz=120, num_updates=46430, lr=7.38974e-06, gnorm=0.979, clip=30, loss_scale=32, train_wall=40, gb_free=25.6, wall=190189 2023-05-03 07:23:36 - progress_bar.py[line:274] - INFO: epoch 008: 4226 / 6042 
loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7701.4, nsentences=120, sample_size=4222.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1928.4, ups=0.25, wpb=7701.4, bsz=120, num_updates=46440, lr=7.38445e-06, gnorm=0.981, clip=20, loss_scale=32, train_wall=40, gb_free=31.2, wall=190229 2023-05-03 07:24:15 - progress_bar.py[line:274] - INFO: epoch 008: 4236 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7744.2, nsentences=120, sample_size=3856.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1990.1, ups=0.26, wpb=7744.2, bsz=120, num_updates=46450, lr=7.37917e-06, gnorm=1.022, clip=50, loss_scale=32, train_wall=39, gb_free=29.1, wall=190268 2023-05-03 07:24:55 - progress_bar.py[line:274] - INFO: epoch 008: 4246 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7723.6, nsentences=120, sample_size=4042.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1963.4, ups=0.25, wpb=7723.6, bsz=120, num_updates=46460, lr=7.37389e-06, gnorm=0.987, clip=20, loss_scale=32, train_wall=39, gb_free=30.4, wall=190307 2023-05-03 07:25:35 - progress_bar.py[line:274] - INFO: epoch 008: 4256 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7844.8, nsentences=120, sample_size=4072.1, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1940.9, ups=0.25, wpb=7844.8, bsz=120, num_updates=46470, lr=7.36861e-06, gnorm=0.974, clip=30, loss_scale=32, train_wall=40, gb_free=30.7, wall=190347 2023-05-03 07:26:16 - progress_bar.py[line:274] - INFO: epoch 008: 4266 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7565.8, nsentences=120, sample_size=4026.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1861.4, ups=0.25, wpb=7565.8, bsz=120, num_updates=46480, lr=7.36332e-06, gnorm=0.972, clip=30, loss_scale=32, train_wall=41, gb_free=30.3, wall=190388 2023-05-03 07:26:56 - progress_bar.py[line:274] - INFO: epoch 008: 4276 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7994.1, nsentences=120, 
sample_size=3887.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1977.1, ups=0.25, wpb=7994.1, bsz=120, num_updates=46490, lr=7.35804e-06, gnorm=0.999, clip=40, loss_scale=32, train_wall=40, gb_free=30.8, wall=190428 2023-05-03 07:27:36 - progress_bar.py[line:274] - INFO: epoch 008: 4286 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7655.5, nsentences=120, sample_size=4244.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1907.4, ups=0.25, wpb=7655.5, bsz=120, num_updates=46500, lr=7.35276e-06, gnorm=0.958, clip=30, loss_scale=32, train_wall=40, gb_free=27.8, wall=190469 2023-05-03 07:28:17 - progress_bar.py[line:274] - INFO: epoch 008: 4296 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=8107, nsentences=120, sample_size=4116.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2007.8, ups=0.25, wpb=8107, bsz=120, num_updates=46510, lr=7.34748e-06, gnorm=0.977, clip=30, loss_scale=32, train_wall=40, gb_free=30.8, wall=190509 2023-05-03 07:28:57 - progress_bar.py[line:274] - INFO: epoch 008: 4306 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7507.1, nsentences=120, sample_size=3950, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1861, ups=0.25, wpb=7507.1, bsz=120, num_updates=46520, lr=7.3422e-06, gnorm=0.998, clip=50, loss_scale=32, train_wall=40, gb_free=28.1, wall=190549 2023-05-03 07:29:37 - progress_bar.py[line:274] - INFO: epoch 008: 4316 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7631.4, nsentences=120, sample_size=3954.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1923.7, ups=0.25, wpb=7631.4, bsz=120, num_updates=46530, lr=7.33691e-06, gnorm=0.989, clip=50, loss_scale=32, train_wall=40, gb_free=30, wall=190589 2023-05-03 07:30:17 - progress_bar.py[line:274] - INFO: epoch 008: 4326 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7790.7, nsentences=120, sample_size=4132.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1944, ups=0.25, 
wpb=7790.7, bsz=120, num_updates=46540, lr=7.33163e-06, gnorm=0.945, clip=30, loss_scale=32, train_wall=40, gb_free=30.2, wall=190629 2023-05-03 07:30:57 - progress_bar.py[line:274] - INFO: epoch 008: 4336 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7868.7, nsentences=120, sample_size=3838.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1971.7, ups=0.25, wpb=7868.7, bsz=120, num_updates=46550, lr=7.32635e-06, gnorm=1.017, clip=60, loss_scale=32, train_wall=40, gb_free=29.3, wall=190669 2023-05-03 07:31:37 - progress_bar.py[line:274] - INFO: epoch 008: 4346 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7785.8, nsentences=120, sample_size=3683.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1932.2, ups=0.25, wpb=7785.8, bsz=120, num_updates=46560, lr=7.32107e-06, gnorm=1.03, clip=70, loss_scale=32, train_wall=40, gb_free=29.6, wall=190709 2023-05-03 07:32:16 - progress_bar.py[line:274] - INFO: epoch 008: 4356 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7774.9, nsentences=120, sample_size=4178.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1963.6, ups=0.25, wpb=7774.9, bsz=120, num_updates=46570, lr=7.31578e-06, gnorm=0.952, clip=20, loss_scale=32, train_wall=40, gb_free=29.3, wall=190749 2023-05-03 07:32:56 - progress_bar.py[line:274] - INFO: epoch 008: 4366 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7974.2, nsentences=120, sample_size=3755.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1998.3, ups=0.25, wpb=7974.2, bsz=120, num_updates=46580, lr=7.3105e-06, gnorm=0.985, clip=40, loss_scale=32, train_wall=40, gb_free=29.8, wall=190789 2023-05-03 07:33:36 - progress_bar.py[line:274] - INFO: epoch 008: 4376 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7939, nsentences=120, sample_size=3976.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2002.2, ups=0.25, wpb=7939, bsz=120, num_updates=46590, lr=7.30522e-06, gnorm=0.957, clip=20, loss_scale=32, 
train_wall=40, gb_free=30.3, wall=190828 2023-05-03 07:34:16 - progress_bar.py[line:274] - INFO: epoch 008: 4386 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7532, nsentences=120, sample_size=3991.2, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1887.6, ups=0.25, wpb=7532, bsz=120, num_updates=46600, lr=7.29994e-06, gnorm=0.984, clip=10, loss_scale=32, train_wall=40, gb_free=30, wall=190868 2023-05-03 07:34:55 - progress_bar.py[line:274] - INFO: epoch 008: 4396 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7552.2, nsentences=120, sample_size=3929.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1923.9, ups=0.25, wpb=7552.2, bsz=120, num_updates=46610, lr=7.29466e-06, gnorm=0.976, clip=60, loss_scale=32, train_wall=39, gb_free=26.4, wall=190908 2023-05-03 07:35:36 - progress_bar.py[line:274] - INFO: epoch 008: 4406 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=8009.6, nsentences=120, sample_size=3926.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1975.8, ups=0.25, wpb=8009.6, bsz=120, num_updates=46620, lr=7.28937e-06, gnorm=1.013, clip=40, loss_scale=32, train_wall=40, gb_free=30.9, wall=190948 2023-05-03 07:36:15 - progress_bar.py[line:274] - INFO: epoch 008: 4416 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7759.9, nsentences=120, sample_size=4124.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1962.8, ups=0.25, wpb=7759.9, bsz=120, num_updates=46630, lr=7.28409e-06, gnorm=0.987, clip=40, loss_scale=32, train_wall=39, gb_free=31, wall=190988 2023-05-03 07:36:55 - progress_bar.py[line:274] - INFO: epoch 008: 4426 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7927.3, nsentences=120, sample_size=3844.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2001.2, ups=0.25, wpb=7927.3, bsz=120, num_updates=46640, lr=7.27881e-06, gnorm=1.004, clip=40, loss_scale=32, train_wall=40, gb_free=28.1, wall=191027 2023-05-03 07:37:34 - progress_bar.py[line:274] - 
INFO: epoch 008: 4436 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7759.2, nsentences=120, sample_size=3896.2, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1960.2, ups=0.25, wpb=7759.2, bsz=120, num_updates=46650, lr=7.27353e-06, gnorm=0.99, clip=50, loss_scale=32, train_wall=40, gb_free=26.5, wall=191067 2023-05-03 07:38:14 - progress_bar.py[line:274] - INFO: epoch 008: 4446 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7783.4, nsentences=120, sample_size=4118.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1944.6, ups=0.25, wpb=7783.4, bsz=120, num_updates=46660, lr=7.26825e-06, gnorm=0.966, clip=40, loss_scale=64, train_wall=40, gb_free=27.5, wall=191107 2023-05-03 07:38:55 - progress_bar.py[line:274] - INFO: epoch 008: 4456 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=8045.7, nsentences=120, sample_size=3931.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2003.6, ups=0.25, wpb=8045.7, bsz=120, num_updates=46670, lr=7.26296e-06, gnorm=0.998, clip=40, loss_scale=64, train_wall=40, gb_free=29.5, wall=191147 2023-05-03 07:39:35 - progress_bar.py[line:274] - INFO: epoch 008: 4466 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7567.6, nsentences=120, sample_size=4336.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1897, ups=0.25, wpb=7567.6, bsz=120, num_updates=46680, lr=7.25768e-06, gnorm=0.976, clip=10, loss_scale=64, train_wall=40, gb_free=31.1, wall=191187 2023-05-03 07:40:14 - progress_bar.py[line:274] - INFO: epoch 008: 4476 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7537.5, nsentences=120, sample_size=3789.9, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1904.1, ups=0.25, wpb=7537.5, bsz=120, num_updates=46690, lr=7.2524e-06, gnorm=0.995, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=191227 2023-05-03 07:40:55 - progress_bar.py[line:274] - INFO: epoch 008: 4486 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.143, 
ntokens=7930.2, nsentences=120, sample_size=3610.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1956.6, ups=0.25, wpb=7930.2, bsz=120, num_updates=46700, lr=7.24712e-06, gnorm=1.024, clip=70, loss_scale=64, train_wall=40, gb_free=29.8, wall=191267 2023-05-03 07:41:35 - progress_bar.py[line:274] - INFO: epoch 008: 4496 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7805.3, nsentences=120, sample_size=4146.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1945.2, ups=0.25, wpb=7805.3, bsz=120, num_updates=46710, lr=7.24183e-06, gnorm=0.981, clip=40, loss_scale=64, train_wall=40, gb_free=31.2, wall=191307 2023-05-03 07:42:15 - progress_bar.py[line:274] - INFO: epoch 008: 4506 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7691.1, nsentences=120, sample_size=4102.4, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1923.7, ups=0.25, wpb=7691.1, bsz=120, num_updates=46720, lr=7.23655e-06, gnorm=0.959, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=191347 2023-05-03 07:42:55 - progress_bar.py[line:274] - INFO: epoch 008: 4516 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=8133.5, nsentences=120, sample_size=3721.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2031.9, ups=0.25, wpb=8133.5, bsz=120, num_updates=46730, lr=7.23127e-06, gnorm=0.981, clip=50, loss_scale=64, train_wall=40, gb_free=27.2, wall=191387 2023-05-03 07:43:35 - progress_bar.py[line:274] - INFO: epoch 008: 4526 / 6042 loss=2.415, loss_v1=0, loss_v2=0, nll_loss=1.169, ntokens=7878.4, nsentences=120, sample_size=4187.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1977.1, ups=0.25, wpb=7878.4, bsz=120, num_updates=46740, lr=7.22599e-06, gnorm=0.951, clip=30, loss_scale=64, train_wall=40, gb_free=28.5, wall=191427 2023-05-03 07:44:14 - progress_bar.py[line:274] - INFO: epoch 008: 4536 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7464.8, nsentences=120, sample_size=3995.6, sample_size_v1=0, sample_size_v2=0, 
ppl=2.1, wps=1884, ups=0.25, wpb=7464.8, bsz=120, num_updates=46750, lr=7.22071e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=30.9, wall=191467 2023-05-03 07:44:54 - progress_bar.py[line:274] - INFO: epoch 008: 4546 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7950, nsentences=120, sample_size=3884.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1997, ups=0.25, wpb=7950, bsz=120, num_updates=46760, lr=7.21542e-06, gnorm=0.993, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=191507 2023-05-03 07:45:35 - progress_bar.py[line:274] - INFO: epoch 008: 4556 / 6042 loss=2.423, loss_v1=0, loss_v2=0, nll_loss=1.179, ntokens=7778.4, nsentences=120, sample_size=4072.5, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1916.3, ups=0.25, wpb=7778.4, bsz=120, num_updates=46770, lr=7.21014e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=41, gb_free=30.3, wall=191547 2023-05-03 07:46:14 - progress_bar.py[line:274] - INFO: epoch 008: 4566 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7841.5, nsentences=120, sample_size=3684.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1983.1, ups=0.25, wpb=7841.5, bsz=120, num_updates=46780, lr=7.20486e-06, gnorm=1.008, clip=40, loss_scale=64, train_wall=39, gb_free=29.4, wall=191587 2023-05-03 07:46:55 - progress_bar.py[line:274] - INFO: epoch 008: 4576 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7659.6, nsentences=120, sample_size=4157.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1896.8, ups=0.25, wpb=7659.6, bsz=120, num_updates=46790, lr=7.19958e-06, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=191627 2023-05-03 07:47:34 - progress_bar.py[line:274] - INFO: epoch 008: 4586 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7649.2, nsentences=120, sample_size=4306.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1932, ups=0.25, wpb=7649.2, bsz=120, num_updates=46800, lr=7.1943e-06, gnorm=0.961, 
clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=191667 2023-05-03 07:48:15 - progress_bar.py[line:274] - INFO: epoch 008: 4596 / 6042 loss=2.424, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=7960.4, nsentences=120, sample_size=3973.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1968.3, ups=0.25, wpb=7960.4, bsz=120, num_updates=46810, lr=7.18901e-06, gnorm=1.022, clip=40, loss_scale=64, train_wall=40, gb_free=28.7, wall=191707 2023-05-03 07:48:56 - progress_bar.py[line:274] - INFO: epoch 008: 4606 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=8019.7, nsentences=120, sample_size=4017.5, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1956.2, ups=0.24, wpb=8019.7, bsz=120, num_updates=46820, lr=7.18373e-06, gnorm=0.947, clip=10, loss_scale=64, train_wall=41, gb_free=30.9, wall=191748 2023-05-03 07:49:35 - progress_bar.py[line:274] - INFO: epoch 008: 4616 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7819.5, nsentences=120, sample_size=4008.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1976.3, ups=0.25, wpb=7819.5, bsz=120, num_updates=46830, lr=7.17845e-06, gnorm=0.96, clip=20, loss_scale=64, train_wall=39, gb_free=30.6, wall=191788 2023-05-03 07:50:15 - progress_bar.py[line:274] - INFO: epoch 008: 4626 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7689.9, nsentences=120, sample_size=4227.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1915.2, ups=0.25, wpb=7689.9, bsz=120, num_updates=46840, lr=7.17317e-06, gnorm=0.945, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=191828 2023-05-03 07:50:56 - progress_bar.py[line:274] - INFO: epoch 008: 4636 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7952.9, nsentences=120, sample_size=3939.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1971, ups=0.25, wpb=7952.9, bsz=120, num_updates=46850, lr=7.16788e-06, gnorm=0.978, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=191868 2023-05-03 07:51:35 - 
progress_bar.py[line:274] - INFO: epoch 008: 4646 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7894, nsentences=120, sample_size=4115.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1996.3, ups=0.25, wpb=7894, bsz=120, num_updates=46860, lr=7.1626e-06, gnorm=0.986, clip=50, loss_scale=64, train_wall=39, gb_free=28.7, wall=191908 2023-05-03 07:52:14 - progress_bar.py[line:274] - INFO: epoch 008: 4656 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7669.2, nsentences=120, sample_size=4247, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1957.4, ups=0.26, wpb=7669.2, bsz=120, num_updates=46870, lr=7.15732e-06, gnorm=0.947, clip=30, loss_scale=64, train_wall=39, gb_free=30.8, wall=191947 2023-05-03 07:52:53 - progress_bar.py[line:274] - INFO: epoch 008: 4666 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7598, nsentences=120, sample_size=4127.2, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1965.9, ups=0.26, wpb=7598, bsz=120, num_updates=46880, lr=7.15204e-06, gnorm=0.981, clip=30, loss_scale=64, train_wall=39, gb_free=29.4, wall=191986 2023-05-03 07:53:32 - progress_bar.py[line:274] - INFO: epoch 008: 4676 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7573.5, nsentences=120, sample_size=4060.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1969, ups=0.26, wpb=7573.5, bsz=120, num_updates=46890, lr=7.14676e-06, gnorm=0.983, clip=20, loss_scale=64, train_wall=38, gb_free=30.8, wall=192024 2023-05-03 07:54:11 - progress_bar.py[line:274] - INFO: epoch 008: 4686 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.074, ntokens=7620.8, nsentences=120, sample_size=4300.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1919.7, ups=0.25, wpb=7620.8, bsz=120, num_updates=46900, lr=7.14147e-06, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=28.9, wall=192064 2023-05-03 07:54:51 - progress_bar.py[line:274] - INFO: epoch 008: 4696 / 6042 loss=2.362, loss_v1=0, loss_v2=0, 
nll_loss=1.106, ntokens=7628.7, nsentences=120, sample_size=4021.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1912, ups=0.25, wpb=7628.7, bsz=120, num_updates=46910, lr=7.13619e-06, gnorm=0.974, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=192104 2023-05-03 07:55:31 - progress_bar.py[line:274] - INFO: epoch 008: 4706 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7386.3, nsentences=120, sample_size=4266.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1873.5, ups=0.25, wpb=7386.3, bsz=120, num_updates=46920, lr=7.13091e-06, gnorm=0.985, clip=50, loss_scale=64, train_wall=39, gb_free=29.6, wall=192143 2023-05-03 07:56:09 - progress_bar.py[line:274] - INFO: epoch 008: 4716 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7779.5, nsentences=120, sample_size=4037.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2011.8, ups=0.26, wpb=7779.5, bsz=120, num_updates=46930, lr=7.12563e-06, gnorm=0.972, clip=30, loss_scale=64, train_wall=39, gb_free=26.8, wall=192182 2023-05-03 07:56:49 - progress_bar.py[line:274] - INFO: epoch 008: 4726 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7977, nsentences=120, sample_size=4080.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1997.9, ups=0.25, wpb=7977, bsz=120, num_updates=46940, lr=7.12035e-06, gnorm=0.961, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, wall=192222 2023-05-03 07:57:29 - progress_bar.py[line:274] - INFO: epoch 008: 4736 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7846.5, nsentences=120, sample_size=3742.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1973.2, ups=0.25, wpb=7846.5, bsz=120, num_updates=46950, lr=7.11506e-06, gnorm=1.012, clip=50, loss_scale=64, train_wall=40, gb_free=26.6, wall=192261 2023-05-03 07:58:09 - progress_bar.py[line:274] - INFO: epoch 008: 4746 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7687, nsentences=120, sample_size=4267.5, sample_size_v1=0, 
sample_size_v2=0, ppl=2.17, wps=1900.8, ups=0.25, wpb=7687, bsz=120, num_updates=46960, lr=7.10978e-06, gnorm=0.955, clip=30, loss_scale=64, train_wall=40, gb_free=29.2, wall=192302 2023-05-03 07:58:49 - progress_bar.py[line:274] - INFO: epoch 008: 4756 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7796.6, nsentences=120, sample_size=4084, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1986.6, ups=0.25, wpb=7796.6, bsz=120, num_updates=46970, lr=7.1045e-06, gnorm=0.977, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=192341 2023-05-03 07:59:29 - progress_bar.py[line:274] - INFO: epoch 008: 4766 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7647.1, nsentences=120, sample_size=3843.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1902.2, ups=0.25, wpb=7647.1, bsz=120, num_updates=46980, lr=7.09922e-06, gnorm=0.984, clip=40, loss_scale=64, train_wall=40, gb_free=29.1, wall=192381 2023-05-03 08:00:09 - progress_bar.py[line:274] - INFO: epoch 008: 4776 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7609, nsentences=120, sample_size=4157.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1887.2, ups=0.25, wpb=7609, bsz=120, num_updates=46990, lr=7.09393e-06, gnorm=0.985, clip=50, loss_scale=64, train_wall=40, gb_free=29.8, wall=192422 2023-05-03 08:00:49 - progress_bar.py[line:274] - INFO: epoch 008: 4786 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.176, ntokens=7486.1, nsentences=120, sample_size=3838.8, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1886.9, ups=0.25, wpb=7486.1, bsz=120, num_updates=47000, lr=7.08865e-06, gnorm=1, clip=50, loss_scale=64, train_wall=40, gb_free=29.4, wall=192461 2023-05-03 08:00:49 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 08:00:51 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 08:00:51 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 08:00:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:00:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:00:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:00:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:00:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:00:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:00:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:00:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:00:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:00:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:00:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:00:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:00:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:00:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:00:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:00:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 08:01:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:07 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 08:01:07 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 08:01:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-03 08:01:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:19 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 08:01:19 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 08:01:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 08:01:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:31 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 08:01:31 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 08:01:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:35 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 08:01:35 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 08:01:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:40 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 08:01:40 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 08:01:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 08:01:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 08:01:40 - progress_bar.py[line:282] - INFO: epoch 008 | valid on 'valid' subset | loss 3.253 | loss_v1 0 | loss_v2 0 | nll_loss 2.086 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.25 | score 0.7583 | wps 3300 | wpb 3202.1 | bsz 39.4 | num_updates 47000 | best_score 0.7627 2023-05-03 08:01:40 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 8 @ 47000 updates 2023-05-03 08:01:40 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_47000.pt 2023-05-03 08:02:05 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_47000.pt 2023-05-03 08:02:18 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_47000.pt (epoch 8 @ 47000 updates, score 0.7583) (writing took 38.55611839191988 seconds) 2023-05-03 08:02:58 - progress_bar.py[line:274] - INFO: epoch 008: 4796 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7708.2, nsentences=120, sample_size=4084.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=595.6, ups=0.08, wpb=7708.2, bsz=120, num_updates=47010, lr=7.08337e-06, gnorm=0.969, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=192591 2023-05-03 08:03:39 - progress_bar.py[line:274] - INFO: epoch 008: 4806 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7471.4, nsentences=120, sample_size=4168, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1833.4, ups=0.25, wpb=7471.4, bsz=120, num_updates=47020, 
lr=7.07809e-06, gnorm=0.982, clip=30, loss_scale=64, train_wall=41, gb_free=30.5, wall=192631 2023-05-03 08:04:19 - progress_bar.py[line:274] - INFO: epoch 008: 4816 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7602.2, nsentences=120, sample_size=4414.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1878, ups=0.25, wpb=7602.2, bsz=120, num_updates=47030, lr=7.07281e-06, gnorm=0.937, clip=30, loss_scale=64, train_wall=40, gb_free=27.4, wall=192672 2023-05-03 08:04:59 - progress_bar.py[line:274] - INFO: epoch 008: 4826 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7684.8, nsentences=120, sample_size=3888.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1951.6, ups=0.25, wpb=7684.8, bsz=120, num_updates=47040, lr=7.06752e-06, gnorm=1.012, clip=50, loss_scale=64, train_wall=39, gb_free=29.8, wall=192711 2023-05-03 08:05:39 - progress_bar.py[line:274] - INFO: epoch 008: 4836 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7934.6, nsentences=120, sample_size=3965.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1973.6, ups=0.25, wpb=7934.6, bsz=120, num_updates=47050, lr=7.06224e-06, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=29.1, wall=192751 2023-05-03 08:06:19 - progress_bar.py[line:274] - INFO: epoch 008: 4846 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7821, nsentences=120, sample_size=3992.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1962, ups=0.25, wpb=7821, bsz=120, num_updates=47060, lr=7.05696e-06, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=192791 2023-05-03 08:06:58 - progress_bar.py[line:274] - INFO: epoch 008: 4856 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7507.2, nsentences=120, sample_size=4043.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1919.7, ups=0.26, wpb=7507.2, bsz=120, num_updates=47070, lr=7.05168e-06, gnorm=0.996, clip=60, loss_scale=64, train_wall=39, gb_free=29.9, wall=192830 
2023-05-03 08:07:38 - progress_bar.py[line:274] - INFO: epoch 008: 4866 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7810.3, nsentences=120, sample_size=4134.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1977.2, ups=0.25, wpb=7810.3, bsz=120, num_updates=47080, lr=7.04639e-06, gnorm=0.958, clip=10, loss_scale=64, train_wall=39, gb_free=30.3, wall=192870 2023-05-03 08:08:17 - progress_bar.py[line:274] - INFO: epoch 008: 4876 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7652.6, nsentences=120, sample_size=4066.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1914.6, ups=0.25, wpb=7652.6, bsz=120, num_updates=47090, lr=7.04111e-06, gnorm=1.01, clip=60, loss_scale=64, train_wall=40, gb_free=30.2, wall=192910 2023-05-03 08:08:57 - progress_bar.py[line:274] - INFO: epoch 008: 4886 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7497.4, nsentences=120, sample_size=4403.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1901.7, ups=0.25, wpb=7497.4, bsz=120, num_updates=47100, lr=7.03583e-06, gnorm=0.948, clip=0, loss_scale=64, train_wall=39, gb_free=28.9, wall=192949 2023-05-03 08:09:37 - progress_bar.py[line:274] - INFO: epoch 008: 4896 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7502.7, nsentences=120, sample_size=4193.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1873.6, ups=0.25, wpb=7502.7, bsz=120, num_updates=47110, lr=7.03055e-06, gnorm=0.986, clip=50, loss_scale=64, train_wall=40, gb_free=30.6, wall=192989 2023-05-03 08:10:16 - progress_bar.py[line:274] - INFO: epoch 008: 4906 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7497.4, nsentences=120, sample_size=3890.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1908.1, ups=0.25, wpb=7497.4, bsz=120, num_updates=47120, lr=7.02527e-06, gnorm=1.028, clip=80, loss_scale=64, train_wall=39, gb_free=29.5, wall=193029 2023-05-03 08:10:57 - progress_bar.py[line:274] - INFO: epoch 008: 4916 / 6042 
loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7653.1, nsentences=120, sample_size=3943.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1881, ups=0.25, wpb=7653.1, bsz=120, num_updates=47130, lr=7.01998e-06, gnorm=0.989, clip=40, loss_scale=64, train_wall=41, gb_free=29.9, wall=193069 2023-05-03 08:11:37 - progress_bar.py[line:274] - INFO: epoch 008: 4926 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=8120.9, nsentences=120, sample_size=3949.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2051.3, ups=0.25, wpb=8120.9, bsz=120, num_updates=47140, lr=7.0147e-06, gnorm=1.005, clip=60, loss_scale=64, train_wall=40, gb_free=29.5, wall=193109 2023-05-03 08:12:16 - progress_bar.py[line:274] - INFO: epoch 008: 4936 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7725.6, nsentences=120, sample_size=4319.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1938.5, ups=0.25, wpb=7725.6, bsz=120, num_updates=47150, lr=7.00942e-06, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=27.4, wall=193149 2023-05-03 08:12:56 - progress_bar.py[line:274] - INFO: epoch 008: 4946 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7891.4, nsentences=120, sample_size=3850.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1978.8, ups=0.25, wpb=7891.4, bsz=120, num_updates=47160, lr=7.00414e-06, gnorm=1.036, clip=80, loss_scale=64, train_wall=40, gb_free=30.8, wall=193189 2023-05-03 08:13:37 - progress_bar.py[line:274] - INFO: epoch 008: 4956 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7922.2, nsentences=120, sample_size=3930.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1949.1, ups=0.25, wpb=7922.2, bsz=120, num_updates=47170, lr=6.99886e-06, gnorm=1.005, clip=40, loss_scale=128, train_wall=41, gb_free=31.1, wall=193229 2023-05-03 08:13:49 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 08:14:21 - 
progress_bar.py[line:274] - INFO: epoch 008: 4967 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7530.7, nsentences=120, sample_size=4061.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1704.1, ups=0.23, wpb=7530.7, bsz=120, num_updates=47180, lr=6.99357e-06, gnorm=0.986, clip=30, loss_scale=64, train_wall=44, gb_free=29, wall=193274 2023-05-03 08:15:01 - progress_bar.py[line:274] - INFO: epoch 008: 4977 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7723.2, nsentences=120, sample_size=3769.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1913.9, ups=0.25, wpb=7723.2, bsz=120, num_updates=47190, lr=6.98829e-06, gnorm=1, clip=60, loss_scale=64, train_wall=40, gb_free=30.1, wall=193314 2023-05-03 08:15:41 - progress_bar.py[line:274] - INFO: epoch 008: 4987 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7628.3, nsentences=120, sample_size=4321.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1920.9, ups=0.25, wpb=7628.3, bsz=120, num_updates=47200, lr=6.98301e-06, gnorm=0.951, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=193354 2023-05-03 08:16:21 - progress_bar.py[line:274] - INFO: epoch 008: 4997 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7967.4, nsentences=120, sample_size=4389.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2003.5, ups=0.25, wpb=7967.4, bsz=120, num_updates=47210, lr=6.97773e-06, gnorm=0.932, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=193393 2023-05-03 08:17:01 - progress_bar.py[line:274] - INFO: epoch 008: 5007 / 6042 loss=2.41, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7589.5, nsentences=120, sample_size=4157.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1886, ups=0.25, wpb=7589.5, bsz=120, num_updates=47220, lr=6.97244e-06, gnorm=1.001, clip=60, loss_scale=64, train_wall=40, gb_free=29.8, wall=193434 2023-05-03 08:17:41 - progress_bar.py[line:274] - INFO: epoch 008: 5017 / 6042 loss=2.397, loss_v1=0, loss_v2=0, 
nll_loss=1.146, ntokens=7670.5, nsentences=120, sample_size=4149.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1922.3, ups=0.25, wpb=7670.5, bsz=120, num_updates=47230, lr=6.96716e-06, gnorm=1, clip=60, loss_scale=64, train_wall=40, gb_free=28.7, wall=193474 2023-05-03 08:18:21 - progress_bar.py[line:274] - INFO: epoch 008: 5027 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7798.9, nsentences=120, sample_size=3961.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1975.5, ups=0.25, wpb=7798.9, bsz=120, num_updates=47240, lr=6.96188e-06, gnorm=0.985, clip=30, loss_scale=64, train_wall=39, gb_free=30.7, wall=193513 2023-05-03 08:19:01 - progress_bar.py[line:274] - INFO: epoch 008: 5037 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=8173.7, nsentences=120, sample_size=3895.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2037.6, ups=0.25, wpb=8173.7, bsz=120, num_updates=47250, lr=6.9566e-06, gnorm=0.978, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=193553 2023-05-03 08:19:41 - progress_bar.py[line:274] - INFO: epoch 008: 5047 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7873.1, nsentences=120, sample_size=3691.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1947.2, ups=0.25, wpb=7873.1, bsz=120, num_updates=47260, lr=6.95132e-06, gnorm=0.998, clip=30, loss_scale=64, train_wall=40, gb_free=31.1, wall=193594 2023-05-03 08:20:21 - progress_bar.py[line:274] - INFO: epoch 008: 5057 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7643, nsentences=120, sample_size=3948.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1921.6, ups=0.25, wpb=7643, bsz=120, num_updates=47270, lr=6.94603e-06, gnorm=0.987, clip=60, loss_scale=64, train_wall=40, gb_free=30.6, wall=193633 2023-05-03 08:21:01 - progress_bar.py[line:274] - INFO: epoch 008: 5067 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7556.1, nsentences=120, sample_size=4035.7, sample_size_v1=0, 
sample_size_v2=0, ppl=2.18, wps=1899.1, ups=0.25, wpb=7556.1, bsz=120, num_updates=47280, lr=6.94075e-06, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=193673 2023-05-03 08:21:41 - progress_bar.py[line:274] - INFO: epoch 008: 5077 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7764.6, nsentences=120, sample_size=3898.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1947.9, ups=0.25, wpb=7764.6, bsz=120, num_updates=47290, lr=6.93547e-06, gnorm=0.988, clip=60, loss_scale=64, train_wall=40, gb_free=28.2, wall=193713 2023-05-03 08:22:20 - progress_bar.py[line:274] - INFO: epoch 008: 5087 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7704.7, nsentences=120, sample_size=4367.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1936.9, ups=0.25, wpb=7704.7, bsz=120, num_updates=47300, lr=6.93019e-06, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=28.7, wall=193753 2023-05-03 08:23:00 - progress_bar.py[line:274] - INFO: epoch 008: 5097 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7634.2, nsentences=120, sample_size=4129.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1911.5, ups=0.25, wpb=7634.2, bsz=120, num_updates=47310, lr=6.92491e-06, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=29.1, wall=193793 2023-05-03 08:23:40 - progress_bar.py[line:274] - INFO: epoch 008: 5107 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7814.1, nsentences=120, sample_size=3937.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1948.7, ups=0.25, wpb=7814.1, bsz=120, num_updates=47320, lr=6.91962e-06, gnorm=0.966, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=193833 2023-05-03 08:24:20 - progress_bar.py[line:274] - INFO: epoch 008: 5117 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7701, nsentences=120, sample_size=4116.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1962.8, ups=0.25, wpb=7701, bsz=120, 
num_updates=47330, lr=6.91434e-06, gnorm=0.988, clip=40, loss_scale=64, train_wall=39, gb_free=29.4, wall=193872 2023-05-03 08:25:00 - progress_bar.py[line:274] - INFO: epoch 008: 5127 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7627, nsentences=120, sample_size=3995.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1902, ups=0.25, wpb=7627, bsz=120, num_updates=47340, lr=6.90906e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=30.3, wall=193912 2023-05-03 08:25:39 - progress_bar.py[line:274] - INFO: epoch 008: 5137 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7572.6, nsentences=120, sample_size=3848, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1910.9, ups=0.25, wpb=7572.6, bsz=120, num_updates=47350, lr=6.90378e-06, gnorm=1.01, clip=50, loss_scale=64, train_wall=40, gb_free=30.9, wall=193952 2023-05-03 08:26:19 - progress_bar.py[line:274] - INFO: epoch 008: 5147 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7775.4, nsentences=120, sample_size=3797.1, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1955, ups=0.25, wpb=7775.4, bsz=120, num_updates=47360, lr=6.89849e-06, gnorm=1.022, clip=60, loss_scale=64, train_wall=40, gb_free=30.2, wall=193992 2023-05-03 08:26:59 - progress_bar.py[line:274] - INFO: epoch 008: 5157 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7728.3, nsentences=120, sample_size=4263, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1950.5, ups=0.25, wpb=7728.3, bsz=120, num_updates=47370, lr=6.89321e-06, gnorm=0.966, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=194031 2023-05-03 08:27:39 - progress_bar.py[line:274] - INFO: epoch 008: 5167 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7986.2, nsentences=120, sample_size=4126.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1984, ups=0.25, wpb=7986.2, bsz=120, num_updates=47380, lr=6.88793e-06, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, 
wall=194071 2023-05-03 08:28:19 - progress_bar.py[line:274] - INFO: epoch 008: 5177 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7603.6, nsentences=120, sample_size=4008.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1905.9, ups=0.25, wpb=7603.6, bsz=120, num_updates=47390, lr=6.88265e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=29.5, wall=194111 2023-05-03 08:28:59 - progress_bar.py[line:274] - INFO: epoch 008: 5187 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=8208.5, nsentences=120, sample_size=4229.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2025.9, ups=0.25, wpb=8208.5, bsz=120, num_updates=47400, lr=6.87737e-06, gnorm=0.95, clip=10, loss_scale=64, train_wall=40, gb_free=30.9, wall=194152 2023-05-03 08:29:39 - progress_bar.py[line:274] - INFO: epoch 008: 5197 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7949.8, nsentences=120, sample_size=4011.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2009.6, ups=0.25, wpb=7949.8, bsz=120, num_updates=47410, lr=6.87208e-06, gnorm=0.978, clip=40, loss_scale=64, train_wall=39, gb_free=28.5, wall=194191 2023-05-03 08:30:18 - progress_bar.py[line:274] - INFO: epoch 008: 5207 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7543.4, nsentences=120, sample_size=4062.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1919, ups=0.25, wpb=7543.4, bsz=120, num_updates=47420, lr=6.8668e-06, gnorm=1.002, clip=40, loss_scale=64, train_wall=39, gb_free=31.2, wall=194231 2023-05-03 08:30:58 - progress_bar.py[line:274] - INFO: epoch 008: 5217 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7961.2, nsentences=120, sample_size=3881.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1997.5, ups=0.25, wpb=7961.2, bsz=120, num_updates=47430, lr=6.86152e-06, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=30.9, wall=194271 2023-05-03 08:31:38 - progress_bar.py[line:274] - INFO: epoch 008: 5227 / 6042 
loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7724.8, nsentences=120, sample_size=3804.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1950.2, ups=0.25, wpb=7724.8, bsz=120, num_updates=47440, lr=6.85624e-06, gnorm=0.997, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=194310 2023-05-03 08:32:18 - progress_bar.py[line:274] - INFO: epoch 008: 5237 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7598.1, nsentences=120, sample_size=4043.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1907.6, ups=0.25, wpb=7598.1, bsz=120, num_updates=47450, lr=6.85096e-06, gnorm=0.98, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=194350 2023-05-03 08:32:57 - progress_bar.py[line:274] - INFO: epoch 008: 5247 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7789, nsentences=120, sample_size=4132.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1969.6, ups=0.25, wpb=7789, bsz=120, num_updates=47460, lr=6.84567e-06, gnorm=0.951, clip=10, loss_scale=64, train_wall=39, gb_free=29, wall=194390 2023-05-03 08:33:37 - progress_bar.py[line:274] - INFO: epoch 008: 5257 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7706.7, nsentences=120, sample_size=4104.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1927.6, ups=0.25, wpb=7706.7, bsz=120, num_updates=47470, lr=6.84039e-06, gnorm=0.98, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=194430 2023-05-03 08:34:16 - progress_bar.py[line:274] - INFO: epoch 008: 5267 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7637.3, nsentences=120, sample_size=3882.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1950.2, ups=0.26, wpb=7637.3, bsz=120, num_updates=47480, lr=6.83511e-06, gnorm=0.995, clip=50, loss_scale=64, train_wall=39, gb_free=31.2, wall=194469 2023-05-03 08:34:56 - progress_bar.py[line:274] - INFO: epoch 008: 5277 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7624.7, nsentences=120, 
sample_size=4174.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1895.8, ups=0.25, wpb=7624.7, bsz=120, num_updates=47490, lr=6.82983e-06, gnorm=0.968, clip=40, loss_scale=64, train_wall=40, gb_free=31, wall=194509 2023-05-03 08:35:36 - progress_bar.py[line:274] - INFO: epoch 008: 5287 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7637.5, nsentences=120, sample_size=4171.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1920.9, ups=0.25, wpb=7637.5, bsz=120, num_updates=47500, lr=6.82454e-06, gnorm=0.979, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=194549 2023-05-03 08:36:16 - progress_bar.py[line:274] - INFO: epoch 008: 5297 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7493.1, nsentences=120, sample_size=4061.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1904.9, ups=0.25, wpb=7493.1, bsz=120, num_updates=47510, lr=6.81926e-06, gnorm=0.992, clip=20, loss_scale=64, train_wall=39, gb_free=25.2, wall=194588 2023-05-03 08:36:55 - progress_bar.py[line:274] - INFO: epoch 008: 5307 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7692.8, nsentences=120, sample_size=3966.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1956.6, ups=0.25, wpb=7692.8, bsz=120, num_updates=47520, lr=6.81398e-06, gnorm=0.973, clip=20, loss_scale=64, train_wall=39, gb_free=31, wall=194627 2023-05-03 08:37:35 - progress_bar.py[line:274] - INFO: epoch 008: 5317 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7712.1, nsentences=120, sample_size=3742.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1900.8, ups=0.25, wpb=7712.1, bsz=120, num_updates=47530, lr=6.8087e-06, gnorm=1.04, clip=60, loss_scale=64, train_wall=40, gb_free=29.9, wall=194668 2023-05-03 08:38:15 - progress_bar.py[line:274] - INFO: epoch 008: 5327 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7857.8, nsentences=120, sample_size=3770.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1972.3, ups=0.25, 
wpb=7857.8, bsz=120, num_updates=47540, lr=6.80342e-06, gnorm=1.003, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=194708 2023-05-03 08:38:55 - progress_bar.py[line:274] - INFO: epoch 008: 5337 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7629.2, nsentences=120, sample_size=4255.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1899.7, ups=0.25, wpb=7629.2, bsz=120, num_updates=47550, lr=6.79813e-06, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=194748 2023-05-03 08:39:35 - progress_bar.py[line:274] - INFO: epoch 008: 5347 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7680.8, nsentences=120, sample_size=4165.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1965.4, ups=0.26, wpb=7680.8, bsz=120, num_updates=47560, lr=6.79285e-06, gnorm=0.997, clip=50, loss_scale=64, train_wall=39, gb_free=29.7, wall=194787 2023-05-03 08:40:14 - progress_bar.py[line:274] - INFO: epoch 008: 5357 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7932.2, nsentences=120, sample_size=4070.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=2003.2, ups=0.25, wpb=7932.2, bsz=120, num_updates=47570, lr=6.78757e-06, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=194827 2023-05-03 08:40:54 - progress_bar.py[line:274] - INFO: epoch 008: 5367 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7845.9, nsentences=120, sample_size=3421.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1993, ups=0.25, wpb=7845.9, bsz=120, num_updates=47580, lr=6.78229e-06, gnorm=1.045, clip=40, loss_scale=64, train_wall=39, gb_free=29.4, wall=194866 2023-05-03 08:41:33 - progress_bar.py[line:274] - INFO: epoch 008: 5377 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7588.3, nsentences=120, sample_size=4105.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1919.1, ups=0.25, wpb=7588.3, bsz=120, num_updates=47590, lr=6.77701e-06, gnorm=0.956, clip=40, 
loss_scale=64, train_wall=39, gb_free=29.5, wall=194906 2023-05-03 08:42:12 - progress_bar.py[line:274] - INFO: epoch 008: 5387 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7490, nsentences=120, sample_size=4088.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1926.2, ups=0.26, wpb=7490, bsz=120, num_updates=47600, lr=6.77172e-06, gnorm=0.975, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=194944 2023-05-03 08:42:52 - progress_bar.py[line:274] - INFO: epoch 008: 5397 / 6042 loss=2.329, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7510.9, nsentences=120, sample_size=3881.4, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1874.9, ups=0.25, wpb=7510.9, bsz=120, num_updates=47610, lr=6.76644e-06, gnorm=0.99, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=194984 2023-05-03 08:43:31 - progress_bar.py[line:274] - INFO: epoch 008: 5407 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7570, nsentences=120, sample_size=4206.7, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1963.5, ups=0.26, wpb=7570, bsz=120, num_updates=47620, lr=6.76116e-06, gnorm=1.006, clip=40, loss_scale=64, train_wall=38, gb_free=30.5, wall=195023 2023-05-03 08:44:10 - progress_bar.py[line:274] - INFO: epoch 008: 5417 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7858.8, nsentences=120, sample_size=4063.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1986.9, ups=0.25, wpb=7858.8, bsz=120, num_updates=47630, lr=6.75588e-06, gnorm=0.964, clip=30, loss_scale=64, train_wall=39, gb_free=30.5, wall=195063 2023-05-03 08:44:50 - progress_bar.py[line:274] - INFO: epoch 008: 5427 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.155, ntokens=7623.6, nsentences=120, sample_size=4129.6, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1909, ups=0.25, wpb=7623.6, bsz=120, num_updates=47640, lr=6.75059e-06, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=31.6, wall=195103 2023-05-03 08:45:30 - 
progress_bar.py[line:274] - INFO: epoch 008: 5437 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7496.9, nsentences=120, sample_size=3880.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1863.2, ups=0.25, wpb=7496.9, bsz=120, num_updates=47650, lr=6.74531e-06, gnorm=1.011, clip=40, loss_scale=64, train_wall=40, gb_free=28.1, wall=195143 2023-05-03 08:46:10 - progress_bar.py[line:274] - INFO: epoch 008: 5447 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7851.6, nsentences=120, sample_size=4053.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1966.5, ups=0.25, wpb=7851.6, bsz=120, num_updates=47660, lr=6.74003e-06, gnorm=0.982, clip=50, loss_scale=64, train_wall=40, gb_free=29.8, wall=195183 2023-05-03 08:46:50 - progress_bar.py[line:274] - INFO: epoch 008: 5457 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7887.5, nsentences=120, sample_size=4080.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1958.4, ups=0.25, wpb=7887.5, bsz=120, num_updates=47670, lr=6.73475e-06, gnorm=0.958, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=195223 2023-05-03 08:47:30 - progress_bar.py[line:274] - INFO: epoch 008: 5467 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7874.9, nsentences=120, sample_size=3906, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1974.8, ups=0.25, wpb=7874.9, bsz=120, num_updates=47680, lr=6.72947e-06, gnorm=0.984, clip=30, loss_scale=64, train_wall=40, gb_free=28.9, wall=195263 2023-05-03 08:48:10 - progress_bar.py[line:274] - INFO: epoch 008: 5477 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7615.9, nsentences=120, sample_size=4093.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1915.3, ups=0.25, wpb=7615.9, bsz=120, num_updates=47690, lr=6.72418e-06, gnorm=0.972, clip=20, loss_scale=128, train_wall=40, gb_free=30.2, wall=195303 2023-05-03 08:48:41 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, 
setting loss scale to: 64.0 2023-05-03 08:48:53 - progress_bar.py[line:274] - INFO: epoch 008: 5488 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7596.4, nsentences=120, sample_size=4074.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1758.2, ups=0.23, wpb=7596.4, bsz=120, num_updates=47700, lr=6.7189e-06, gnorm=0.991, clip=50, loss_scale=64, train_wall=43, gb_free=28.6, wall=195346 2023-05-03 08:49:33 - progress_bar.py[line:274] - INFO: epoch 008: 5498 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7888.9, nsentences=120, sample_size=3810.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1981.7, ups=0.25, wpb=7888.9, bsz=120, num_updates=47710, lr=6.71362e-06, gnorm=0.993, clip=40, loss_scale=64, train_wall=40, gb_free=28.7, wall=195386 2023-05-03 08:50:13 - progress_bar.py[line:274] - INFO: epoch 008: 5508 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7410.1, nsentences=120, sample_size=4007.6, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1871.8, ups=0.25, wpb=7410.1, bsz=120, num_updates=47720, lr=6.70834e-06, gnorm=0.988, clip=30, loss_scale=64, train_wall=40, gb_free=29.2, wall=195425 2023-05-03 08:50:53 - progress_bar.py[line:274] - INFO: epoch 008: 5518 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7727, nsentences=120, sample_size=4064.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1938.7, ups=0.25, wpb=7727, bsz=120, num_updates=47730, lr=6.70305e-06, gnorm=0.993, clip=60, loss_scale=64, train_wall=40, gb_free=30.3, wall=195465 2023-05-03 08:51:33 - progress_bar.py[line:274] - INFO: epoch 008: 5528 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7324.3, nsentences=120, sample_size=4270.9, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1828, ups=0.25, wpb=7324.3, bsz=120, num_updates=47740, lr=6.69777e-06, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=195505 2023-05-03 08:52:12 - progress_bar.py[line:274] - INFO: epoch 
008: 5538 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7631, nsentences=120, sample_size=4132.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1939.7, ups=0.25, wpb=7631, bsz=120, num_updates=47750, lr=6.69249e-06, gnorm=0.943, clip=30, loss_scale=64, train_wall=39, gb_free=29.3, wall=195544 2023-05-03 08:52:52 - progress_bar.py[line:274] - INFO: epoch 008: 5548 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7785.4, nsentences=120, sample_size=4109.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1944.4, ups=0.25, wpb=7785.4, bsz=120, num_updates=47760, lr=6.68721e-06, gnorm=0.955, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=195585 2023-05-03 08:53:32 - progress_bar.py[line:274] - INFO: epoch 008: 5558 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7938.2, nsentences=120, sample_size=4263.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1974.3, ups=0.25, wpb=7938.2, bsz=120, num_updates=47770, lr=6.68193e-06, gnorm=0.974, clip=30, loss_scale=64, train_wall=40, gb_free=28.2, wall=195625 2023-05-03 08:54:12 - progress_bar.py[line:274] - INFO: epoch 008: 5568 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7897.5, nsentences=120, sample_size=4008.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1968.8, ups=0.25, wpb=7897.5, bsz=120, num_updates=47780, lr=6.67664e-06, gnorm=0.99, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=195665 2023-05-03 08:54:52 - progress_bar.py[line:274] - INFO: epoch 008: 5578 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7550.1, nsentences=120, sample_size=4092.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1908.9, ups=0.25, wpb=7550.1, bsz=120, num_updates=47790, lr=6.67136e-06, gnorm=0.995, clip=50, loss_scale=64, train_wall=39, gb_free=29, wall=195704 2023-05-03 08:55:32 - progress_bar.py[line:274] - INFO: epoch 008: 5588 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=8115.5, 
nsentences=120, sample_size=4312.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=2032.8, ups=0.25, wpb=8115.5, bsz=120, num_updates=47800, lr=6.66608e-06, gnorm=0.956, clip=30, loss_scale=64, train_wall=40, gb_free=29.1, wall=195744 2023-05-03 08:56:12 - progress_bar.py[line:274] - INFO: epoch 008: 5598 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7681.2, nsentences=120, sample_size=4048.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1936.5, ups=0.25, wpb=7681.2, bsz=120, num_updates=47810, lr=6.6608e-06, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=31.6, wall=195784 2023-05-03 08:56:51 - progress_bar.py[line:274] - INFO: epoch 008: 5608 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7575.3, nsentences=120, sample_size=4050, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1923.6, ups=0.25, wpb=7575.3, bsz=120, num_updates=47820, lr=6.65552e-06, gnorm=0.987, clip=50, loss_scale=64, train_wall=39, gb_free=29.9, wall=195823 2023-05-03 08:57:31 - progress_bar.py[line:274] - INFO: epoch 008: 5618 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7905.9, nsentences=120, sample_size=3972, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1966.2, ups=0.25, wpb=7905.9, bsz=120, num_updates=47830, lr=6.65023e-06, gnorm=1.034, clip=60, loss_scale=64, train_wall=40, gb_free=30.8, wall=195864 2023-05-03 08:58:11 - progress_bar.py[line:274] - INFO: epoch 008: 5628 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7681.2, nsentences=120, sample_size=4240, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1949, ups=0.25, wpb=7681.2, bsz=120, num_updates=47840, lr=6.64495e-06, gnorm=0.945, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=195903 2023-05-03 08:58:51 - progress_bar.py[line:274] - INFO: epoch 008: 5638 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7786, nsentences=120, sample_size=4212.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1907, 
ups=0.24, wpb=7786, bsz=120, num_updates=47850, lr=6.63967e-06, gnorm=0.999, clip=40, loss_scale=64, train_wall=41, gb_free=25.7, wall=195944 2023-05-03 08:59:31 - progress_bar.py[line:274] - INFO: epoch 008: 5648 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7411.3, nsentences=120, sample_size=4019.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1854.3, ups=0.25, wpb=7411.3, bsz=120, num_updates=47860, lr=6.63439e-06, gnorm=0.947, clip=10, loss_scale=64, train_wall=40, gb_free=28.4, wall=195984 2023-05-03 09:00:12 - progress_bar.py[line:274] - INFO: epoch 008: 5658 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7718.5, nsentences=120, sample_size=4204.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1896.7, ups=0.25, wpb=7718.5, bsz=120, num_updates=47870, lr=6.6291e-06, gnorm=0.97, clip=20, loss_scale=64, train_wall=41, gb_free=29.2, wall=196024 2023-05-03 09:00:52 - progress_bar.py[line:274] - INFO: epoch 008: 5668 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7854.9, nsentences=120, sample_size=3952, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1986.9, ups=0.25, wpb=7854.9, bsz=120, num_updates=47880, lr=6.62382e-06, gnorm=0.994, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=196064 2023-05-03 09:01:32 - progress_bar.py[line:274] - INFO: epoch 008: 5678 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7711.8, nsentences=120, sample_size=4183.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1920.7, ups=0.25, wpb=7711.8, bsz=120, num_updates=47890, lr=6.61854e-06, gnorm=0.969, clip=10, loss_scale=64, train_wall=40, gb_free=29.4, wall=196104 2023-05-03 09:02:13 - progress_bar.py[line:274] - INFO: epoch 008: 5688 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7813.4, nsentences=120, sample_size=4272.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1906.4, ups=0.24, wpb=7813.4, bsz=120, num_updates=47900, lr=6.61326e-06, gnorm=0.977, clip=50, 
loss_scale=64, train_wall=41, gb_free=28.4, wall=196145 2023-05-03 09:02:53 - progress_bar.py[line:274] - INFO: epoch 008: 5698 / 6042 loss=2.429, loss_v1=0, loss_v2=0, nll_loss=1.174, ntokens=7779.2, nsentences=120, sample_size=3754.1, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1937.7, ups=0.25, wpb=7779.2, bsz=120, num_updates=47910, lr=6.60798e-06, gnorm=1.055, clip=60, loss_scale=64, train_wall=40, gb_free=29.7, wall=196185 2023-05-03 09:03:33 - progress_bar.py[line:274] - INFO: epoch 008: 5708 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7624.6, nsentences=120, sample_size=4085.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1911.1, ups=0.25, wpb=7624.6, bsz=120, num_updates=47920, lr=6.60269e-06, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=31.4, wall=196225 2023-05-03 09:04:13 - progress_bar.py[line:274] - INFO: epoch 008: 5718 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7836.8, nsentences=120, sample_size=4312.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1961.3, ups=0.25, wpb=7836.8, bsz=120, num_updates=47930, lr=6.59741e-06, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=196265 2023-05-03 09:04:52 - progress_bar.py[line:274] - INFO: epoch 008: 5728 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=8012, nsentences=120, sample_size=3858.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2035.5, ups=0.25, wpb=8012, bsz=120, num_updates=47940, lr=6.59213e-06, gnorm=1.004, clip=50, loss_scale=64, train_wall=39, gb_free=30.4, wall=196305 2023-05-03 09:05:32 - progress_bar.py[line:274] - INFO: epoch 008: 5738 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7764.2, nsentences=120, sample_size=3998, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1966.3, ups=0.25, wpb=7764.2, bsz=120, num_updates=47950, lr=6.58685e-06, gnorm=0.982, clip=30, loss_scale=64, train_wall=39, gb_free=25.2, wall=196344 2023-05-03 09:06:11 - 
progress_bar.py[line:274] - INFO: epoch 008: 5748 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7871.1, nsentences=120, sample_size=3996.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1975.3, ups=0.25, wpb=7871.1, bsz=120, num_updates=47960, lr=6.58157e-06, gnorm=1.008, clip=50, loss_scale=64, train_wall=40, gb_free=30.8, wall=196384 2023-05-03 09:06:51 - progress_bar.py[line:274] - INFO: epoch 008: 5758 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7622.4, nsentences=120, sample_size=3914.7, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1936.2, ups=0.25, wpb=7622.4, bsz=120, num_updates=47970, lr=6.57628e-06, gnorm=1, clip=60, loss_scale=64, train_wall=39, gb_free=29.4, wall=196423 2023-05-03 09:07:31 - progress_bar.py[line:274] - INFO: epoch 008: 5768 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7665.9, nsentences=120, sample_size=3829, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1919.7, ups=0.25, wpb=7665.9, bsz=120, num_updates=47980, lr=6.571e-06, gnorm=0.987, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=196463 2023-05-03 09:08:10 - progress_bar.py[line:274] - INFO: epoch 008: 5778 / 6042 loss=2.323, loss_v1=0, loss_v2=0, nll_loss=1.055, ntokens=7946.4, nsentences=120, sample_size=3643.9, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1998.7, ups=0.25, wpb=7946.4, bsz=120, num_updates=47990, lr=6.56572e-06, gnorm=1.029, clip=80, loss_scale=64, train_wall=40, gb_free=29.7, wall=196503 2023-05-03 09:08:50 - progress_bar.py[line:274] - INFO: epoch 008: 5788 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7651.4, nsentences=120, sample_size=4114, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1923.9, ups=0.25, wpb=7651.4, bsz=120, num_updates=48000, lr=6.56044e-06, gnorm=0.962, clip=40, loss_scale=64, train_wall=40, gb_free=29.2, wall=196543 2023-05-03 09:08:50 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 09:08:52 - 
task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 09:08:52 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 09:08:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:08:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:08:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:08:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:08:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:08:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:08:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:08:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:08:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:08:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:08:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:08:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:08:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:08:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:04 - 
task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:09 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 09:09:09 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 09:09:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:17 - 
task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:21 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 09:09:21 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 09:09:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:28 - 
task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:32 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 09:09:32 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 09:09:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:36 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 09:09:36 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 09:09:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:41 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 09:09:41 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 09:09:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:09:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:09:41 - progress_bar.py[line:282] - INFO: epoch 008 | valid on 'valid' subset | loss 3.251 | loss_v1 0 | loss_v2 0 | nll_loss 2.085 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.24 | score 0.7593 | wps 3304.7 | wpb 3202.1 | bsz 39.4 | num_updates 48000 | best_score 0.7627 2023-05-03 09:09:41 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 8 @ 48000 updates 2023-05-03 09:09:41 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_48000.pt 2023-05-03 09:10:06 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_48000.pt 2023-05-03 09:10:20 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_8_48000.pt (epoch 8 @ 48000 updates, 
score 0.7593) (writing took 38.58748286799528 seconds) 2023-05-03 09:10:59 - progress_bar.py[line:274] - INFO: epoch 008: 5798 / 6042 loss=2.425, loss_v1=0, loss_v2=0, nll_loss=1.172, ntokens=7636.6, nsentences=120, sample_size=4164.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=593, ups=0.08, wpb=7636.6, bsz=120, num_updates=48010, lr=6.55515e-06, gnorm=0.942, clip=20, loss_scale=64, train_wall=39, gb_free=30.7, wall=196671 2023-05-03 09:11:38 - progress_bar.py[line:274] - INFO: epoch 008: 5808 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7733.8, nsentences=120, sample_size=4079.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1969.1, ups=0.25, wpb=7733.8, bsz=120, num_updates=48020, lr=6.54987e-06, gnorm=0.988, clip=50, loss_scale=64, train_wall=39, gb_free=30.6, wall=196711 2023-05-03 09:12:18 - progress_bar.py[line:274] - INFO: epoch 008: 5818 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7599.4, nsentences=120, sample_size=4277.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1890.9, ups=0.25, wpb=7599.4, bsz=120, num_updates=48030, lr=6.54459e-06, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=28.9, wall=196751 2023-05-03 09:12:58 - progress_bar.py[line:274] - INFO: epoch 008: 5828 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7864.5, nsentences=120, sample_size=4268.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1968.7, ups=0.25, wpb=7864.5, bsz=120, num_updates=48040, lr=6.53931e-06, gnorm=0.955, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=196791 2023-05-03 09:13:39 - progress_bar.py[line:274] - INFO: epoch 008: 5838 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7720.2, nsentences=120, sample_size=3949.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1921.4, ups=0.25, wpb=7720.2, bsz=120, num_updates=48050, lr=6.53403e-06, gnorm=0.985, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=196831 2023-05-03 09:14:18 - 
progress_bar.py[line:274] - INFO: epoch 008: 5848 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7822.1, nsentences=120, sample_size=4207.6, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1983.5, ups=0.25, wpb=7822.1, bsz=120, num_updates=48060, lr=6.52874e-06, gnorm=0.944, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=196871 2023-05-03 09:14:58 - progress_bar.py[line:274] - INFO: epoch 008: 5858 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7735.7, nsentences=120, sample_size=4054.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1952.4, ups=0.25, wpb=7735.7, bsz=120, num_updates=48070, lr=6.52346e-06, gnorm=0.977, clip=40, loss_scale=64, train_wall=40, gb_free=26.7, wall=196910 2023-05-03 09:15:37 - progress_bar.py[line:274] - INFO: epoch 008: 5868 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.168, ntokens=7937.6, nsentences=120, sample_size=4050.3, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2009.5, ups=0.25, wpb=7937.6, bsz=120, num_updates=48080, lr=6.51818e-06, gnorm=1.003, clip=50, loss_scale=64, train_wall=39, gb_free=29.9, wall=196950 2023-05-03 09:16:17 - progress_bar.py[line:274] - INFO: epoch 008: 5878 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7697.9, nsentences=120, sample_size=4003.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1909.3, ups=0.25, wpb=7697.9, bsz=120, num_updates=48090, lr=6.5129e-06, gnorm=0.987, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=196990 2023-05-03 09:16:57 - progress_bar.py[line:274] - INFO: epoch 008: 5888 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7654.7, nsentences=120, sample_size=3947.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1953.7, ups=0.26, wpb=7654.7, bsz=120, num_updates=48100, lr=6.50762e-06, gnorm=0.979, clip=30, loss_scale=64, train_wall=39, gb_free=24.8, wall=197029 2023-05-03 09:17:36 - progress_bar.py[line:274] - INFO: epoch 008: 5898 / 6042 loss=2.374, loss_v1=0, 
loss_v2=0, nll_loss=1.125, ntokens=7671.8, nsentences=120, sample_size=3940.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1942.5, ups=0.25, wpb=7671.8, bsz=120, num_updates=48110, lr=6.50233e-06, gnorm=0.987, clip=30, loss_scale=64, train_wall=39, gb_free=28.9, wall=197069 2023-05-03 09:18:16 - progress_bar.py[line:274] - INFO: epoch 008: 5908 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7484.2, nsentences=120, sample_size=3900.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1886.5, ups=0.25, wpb=7484.2, bsz=120, num_updates=48120, lr=6.49705e-06, gnorm=1.007, clip=50, loss_scale=64, train_wall=40, gb_free=27.8, wall=197108 2023-05-03 09:18:57 - progress_bar.py[line:274] - INFO: epoch 008: 5918 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=8012.8, nsentences=120, sample_size=3501.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1961.8, ups=0.24, wpb=8012.8, bsz=120, num_updates=48130, lr=6.49177e-06, gnorm=1.006, clip=50, loss_scale=64, train_wall=41, gb_free=30.2, wall=197149 2023-05-03 09:19:37 - progress_bar.py[line:274] - INFO: epoch 008: 5928 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7994.3, nsentences=120, sample_size=3785, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1991.4, ups=0.25, wpb=7994.3, bsz=120, num_updates=48140, lr=6.48649e-06, gnorm=0.966, clip=40, loss_scale=64, train_wall=40, gb_free=31.3, wall=197189 2023-05-03 09:20:17 - progress_bar.py[line:274] - INFO: epoch 008: 5938 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7540.2, nsentences=120, sample_size=4025.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1895.6, ups=0.25, wpb=7540.2, bsz=120, num_updates=48150, lr=6.4812e-06, gnorm=0.986, clip=60, loss_scale=64, train_wall=40, gb_free=29.6, wall=197229 2023-05-03 09:20:56 - progress_bar.py[line:274] - INFO: epoch 008: 5948 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7733.9, nsentences=120, sample_size=4084.2, 
sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1957.9, ups=0.25, wpb=7733.9, bsz=120, num_updates=48160, lr=6.47592e-06, gnorm=0.972, clip=30, loss_scale=64, train_wall=39, gb_free=30.6, wall=197269 2023-05-03 09:21:36 - progress_bar.py[line:274] - INFO: epoch 008: 5958 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7716.3, nsentences=120, sample_size=3999.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1953, ups=0.25, wpb=7716.3, bsz=120, num_updates=48170, lr=6.47064e-06, gnorm=0.96, clip=20, loss_scale=64, train_wall=39, gb_free=31.4, wall=197308 2023-05-03 09:22:15 - progress_bar.py[line:274] - INFO: epoch 008: 5968 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7708.3, nsentences=120, sample_size=3936.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1978.5, ups=0.26, wpb=7708.3, bsz=120, num_updates=48180, lr=6.46536e-06, gnorm=0.994, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=197347 2023-05-03 09:22:54 - progress_bar.py[line:274] - INFO: epoch 008: 5978 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7783.7, nsentences=120, sample_size=3841.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1966.2, ups=0.25, wpb=7783.7, bsz=120, num_updates=48190, lr=6.46008e-06, gnorm=0.973, clip=40, loss_scale=64, train_wall=40, gb_free=29.2, wall=197387 2023-05-03 09:23:34 - progress_bar.py[line:274] - INFO: epoch 008: 5988 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7566.4, nsentences=120, sample_size=3891.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1910.4, ups=0.25, wpb=7566.4, bsz=120, num_updates=48200, lr=6.45479e-06, gnorm=1.004, clip=60, loss_scale=64, train_wall=40, gb_free=30.8, wall=197426 2023-05-03 09:24:14 - progress_bar.py[line:274] - INFO: epoch 008: 5998 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7673.7, nsentences=120, sample_size=3851.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1895.5, ups=0.25, wpb=7673.7, 
bsz=120, num_updates=48210, lr=6.44951e-06, gnorm=0.981, clip=50, loss_scale=128, train_wall=40, gb_free=29.8, wall=197467 2023-05-03 09:24:54 - progress_bar.py[line:274] - INFO: epoch 008: 6008 / 6042 loss=2.325, loss_v1=0, loss_v2=0, nll_loss=1.063, ntokens=7588.6, nsentences=120, sample_size=4049.7, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1896.5, ups=0.25, wpb=7588.6, bsz=120, num_updates=48220, lr=6.44423e-06, gnorm=0.992, clip=60, loss_scale=128, train_wall=40, gb_free=28.7, wall=197507 2023-05-03 09:25:34 - progress_bar.py[line:274] - INFO: epoch 008: 6018 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7916.5, nsentences=120, sample_size=4359.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2017.8, ups=0.25, wpb=7916.5, bsz=120, num_updates=48230, lr=6.43895e-06, gnorm=0.954, clip=30, loss_scale=128, train_wall=39, gb_free=29.2, wall=197546 2023-05-03 09:26:14 - progress_bar.py[line:274] - INFO: epoch 008: 6028 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7591.5, nsentences=120, sample_size=4143.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1864.9, ups=0.25, wpb=7591.5, bsz=120, num_updates=48240, lr=6.43366e-06, gnorm=0.964, clip=20, loss_scale=128, train_wall=41, gb_free=29.3, wall=197587 2023-05-03 09:26:54 - progress_bar.py[line:274] - INFO: epoch 008: 6038 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7874.5, nsentences=120, sample_size=3897.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1993.5, ups=0.25, wpb=7874.5, bsz=120, num_updates=48250, lr=6.42838e-06, gnorm=1.015, clip=70, loss_scale=128, train_wall=39, gb_free=29.9, wall=197626 2023-05-03 09:27:08 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 09:27:09 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 09:27:09 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 09:27:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 09:27:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:26 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 09:27:26 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 09:27:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-03 09:27:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:38 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 09:27:38 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 09:27:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 09:27:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:50 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 09:27:50 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 09:27:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:54 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 09:27:54 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 09:27:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:58 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 09:27:58 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 09:27:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 09:27:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 09:27:59 - progress_bar.py[line:282] - INFO: epoch 008 | valid on 'valid' subset | loss 3.243 | loss_v1 0 | loss_v2 0 | nll_loss 2.076 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.22 | score 0.7539 | wps 3300.7 | wpb 3202.1 | bsz 39.4 | num_updates 48254 | best_score 0.7627 2023-05-03 09:27:59 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 8 @ 48254 updates 2023-05-03 09:27:59 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-03 09:28:26 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-03 09:28:26 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt (epoch 8 @ 48254 updates, score 0.7539) (writing took 27.364202245138586 seconds) 2023-05-03 09:28:26 - train.py[line:332] - INFO: end of epoch 8 (average epoch stats below) 2023-05-03 09:28:26 - progress_bar.py[line:282] - INFO: epoch 008 | loss 2.372 | loss_v1 0 | loss_v2 0 | nll_loss 1.115 | ntokens 7724.78 | nsentences 119.992 | sample_size 4029.41 | sample_size_v1 0 | sample_size_v2 0 | ppl 2.17 | wps 1888.1 | ups 0.24 | wpb 7724.8 | bsz 120 | num_updates 48254 | lr 6.42627e-06 | gnorm 0.982 | clip 37.6 | loss_scale 128 | train_wall 24006 | gb_free 31.3 | wall 197719 2023-05-03 09:28:26 - trainer.py[line:639] - INFO: loading train data for epoch 9 2023-05-03 09:28:26 - dialog_dataset.py[line:647] - INFO: loading invig-train from 
/mnt/bn/hri-lq/datasets/hf-cache/invig 2023-05-03 09:28:26 - dialog_dataset.py[line:647] - INFO: loading guesswhat-train from /mnt/bn/hri-lq/datasets/hf-cache/guesswhat 2023-05-03 09:28:28 - dialog_dataset.py[line:647] - INFO: loading visdial-train from /mnt/bn/hri-lq/datasets/hf-cache/visdial 2023-05-03 09:28:30 - dialog_dataset.py[line:647] - INFO: loading refcoco-train from /mnt/bn/hri-lq/datasets/hf-cache/refcoco 2023-05-03 09:28:30 - dialog_dataset.py[line:647] - INFO: loading refcocog-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocog 2023-05-03 09:28:31 - dialog_dataset.py[line:647] - INFO: loading refcocoplus-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocoplus 2023-05-03 09:28:31 - dialog_dataset.py[line:647] - INFO: loading cc_sbu_align-train from /mnt/bn/hri-lq/datasets/hf-cache/cc_sbu_align 2023-05-03 09:28:31 - dialog_dataset.py[line:647] - INFO: loading llava_instruct_150k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_instruct_150k 2023-05-03 09:28:32 - dialog_dataset.py[line:647] - INFO: loading llava_conversation_58k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_conversation_58k 2023-05-03 09:28:32 - dialog_dataset.py[line:647] - INFO: loading llava_complex_reasoning_77k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_complex_reasoning_77k 2023-05-03 09:28:33 - dialog_dataset.py[line:647] - INFO: loading llava_detail_23k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_detail_23k 2023-05-03 09:28:33 - dialog_dataset.py[line:647] - INFO: loading openimages-train from /mnt/bn/hri-lq/datasets/hf-cache/openimages_v1.2 2023-05-03 09:28:33 - dialog_dataset.py[line:671] - INFO: load train data: 18 (90624/724992 samples) dataset(s) 2023-05-03 09:28:33 - dialog_dataset.py[line:672] - INFO: Tasks: invig_question(17652), invig_answer(17652), invig_grounding(17652), guesswhat_question(68653), guesswhat_answer(68653), guesswhat_grounding(68653), visdial_question(103447), visdial_answer(103447), visdial_caption(20689), refcoco_grounding(9523), 
refcoco_grounding(9920), refcoco_grounding(9494), cc_sbu_align_caption(3439), llava_instruct_150k(90372), llava_conversation_58k(46965), llava_complex_reasoning_77k(44353), llava_detail_23k(12471), openimages_detection(11957) 2023-05-03 09:28:34 - trainer.py[line:703] - INFO: begin training epoch 9 2023-05-03 09:28:34 - train.py[line:305] - INFO: Start iterating over samples 2023-05-03 09:28:57 - progress_bar.py[line:274] - INFO: epoch 009: 6 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7139.4, nsentences=116, sample_size=3743.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=578.1, ups=0.08, wpb=7139.4, bsz=116, num_updates=48260, lr=6.4231e-06, gnorm=1.038, clip=40, loss_scale=128, train_wall=37, gb_free=30.8, wall=197750 2023-05-03 09:29:13 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 09:29:40 - progress_bar.py[line:274] - INFO: epoch 009: 17 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7678.4, nsentences=120, sample_size=3833.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1776.5, ups=0.23, wpb=7678.4, bsz=120, num_updates=48270, lr=6.41782e-06, gnorm=1.004, clip=30, loss_scale=64, train_wall=43, gb_free=29.1, wall=197793 2023-05-03 09:30:20 - progress_bar.py[line:274] - INFO: epoch 009: 27 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7640.2, nsentences=120, sample_size=3841.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1918, ups=0.25, wpb=7640.2, bsz=120, num_updates=48280, lr=6.41254e-06, gnorm=0.983, clip=50, loss_scale=64, train_wall=40, gb_free=30.3, wall=197833 2023-05-03 09:31:00 - progress_bar.py[line:274] - INFO: epoch 009: 37 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7779, nsentences=120, sample_size=3774.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1948.2, ups=0.25, wpb=7779, bsz=120, num_updates=48290, lr=6.40725e-06, gnorm=1.011, clip=50, loss_scale=64, train_wall=40, 
gb_free=30.8, wall=197873 2023-05-03 09:31:39 - progress_bar.py[line:274] - INFO: epoch 009: 47 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7622.6, nsentences=120, sample_size=4152.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1948.9, ups=0.26, wpb=7622.6, bsz=120, num_updates=48300, lr=6.40197e-06, gnorm=0.977, clip=30, loss_scale=64, train_wall=39, gb_free=27.1, wall=197912 2023-05-03 09:32:19 - progress_bar.py[line:274] - INFO: epoch 009: 57 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7552.6, nsentences=120, sample_size=3881.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1918.4, ups=0.25, wpb=7552.6, bsz=120, num_updates=48310, lr=6.39669e-06, gnorm=1.007, clip=50, loss_scale=64, train_wall=39, gb_free=30.7, wall=197951 2023-05-03 09:32:57 - progress_bar.py[line:274] - INFO: epoch 009: 67 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7819.7, nsentences=120, sample_size=3888.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=2023.2, ups=0.26, wpb=7819.7, bsz=120, num_updates=48320, lr=6.39141e-06, gnorm=1.014, clip=50, loss_scale=64, train_wall=39, gb_free=29.9, wall=197990 2023-05-03 09:33:37 - progress_bar.py[line:274] - INFO: epoch 009: 77 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7859.9, nsentences=120, sample_size=3841.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1986.4, ups=0.25, wpb=7859.9, bsz=120, num_updates=48330, lr=6.38613e-06, gnorm=1, clip=60, loss_scale=64, train_wall=39, gb_free=29.5, wall=198029 2023-05-03 09:34:17 - progress_bar.py[line:274] - INFO: epoch 009: 87 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7749.8, nsentences=120, sample_size=3966.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1930.1, ups=0.25, wpb=7749.8, bsz=120, num_updates=48340, lr=6.38084e-06, gnorm=1.02, clip=50, loss_scale=64, train_wall=40, gb_free=28.8, wall=198070 2023-05-03 09:34:56 - progress_bar.py[line:274] - INFO: epoch 009: 97 / 
6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7982.3, nsentences=120, sample_size=3914.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2026.5, ups=0.25, wpb=7982.3, bsz=120, num_updates=48350, lr=6.37556e-06, gnorm=0.977, clip=50, loss_scale=64, train_wall=39, gb_free=30.7, wall=198109 2023-05-03 09:35:36 - progress_bar.py[line:274] - INFO: epoch 009: 107 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7586.7, nsentences=120, sample_size=4056.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1937.1, ups=0.26, wpb=7586.7, bsz=120, num_updates=48360, lr=6.37028e-06, gnorm=1.006, clip=60, loss_scale=64, train_wall=39, gb_free=29.8, wall=198148 2023-05-03 09:36:15 - progress_bar.py[line:274] - INFO: epoch 009: 117 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7657.6, nsentences=120, sample_size=3918, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1939.7, ups=0.25, wpb=7657.6, bsz=120, num_updates=48370, lr=6.365e-06, gnorm=1.014, clip=50, loss_scale=64, train_wall=39, gb_free=29.8, wall=198188 2023-05-03 09:36:55 - progress_bar.py[line:274] - INFO: epoch 009: 127 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7996.5, nsentences=120, sample_size=3905.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1997.2, ups=0.25, wpb=7996.5, bsz=120, num_updates=48380, lr=6.35971e-06, gnorm=0.986, clip=50, loss_scale=64, train_wall=40, gb_free=31.1, wall=198228 2023-05-03 09:37:36 - progress_bar.py[line:274] - INFO: epoch 009: 137 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7574.2, nsentences=120, sample_size=4206.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1867.5, ups=0.25, wpb=7574.2, bsz=120, num_updates=48390, lr=6.35443e-06, gnorm=0.974, clip=30, loss_scale=64, train_wall=40, gb_free=25.9, wall=198268 2023-05-03 09:38:15 - progress_bar.py[line:274] - INFO: epoch 009: 147 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7896, nsentences=120, 
sample_size=3939.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2013.7, ups=0.26, wpb=7896, bsz=120, num_updates=48400, lr=6.34915e-06, gnorm=0.98, clip=20, loss_scale=64, train_wall=39, gb_free=30.4, wall=198307 2023-05-03 09:38:55 - progress_bar.py[line:274] - INFO: epoch 009: 157 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7594.9, nsentences=120, sample_size=4312.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1886.9, ups=0.25, wpb=7594.9, bsz=120, num_updates=48410, lr=6.34387e-06, gnorm=0.937, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=198348 2023-05-03 09:39:35 - progress_bar.py[line:274] - INFO: epoch 009: 167 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7716.3, nsentences=120, sample_size=4228.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1940.5, ups=0.25, wpb=7716.3, bsz=120, num_updates=48420, lr=6.33859e-06, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=198387 2023-05-03 09:40:15 - progress_bar.py[line:274] - INFO: epoch 009: 177 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7449.6, nsentences=120, sample_size=4273.3, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1883.7, ups=0.25, wpb=7449.6, bsz=120, num_updates=48430, lr=6.3333e-06, gnorm=0.978, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=198427 2023-05-03 09:40:54 - progress_bar.py[line:274] - INFO: epoch 009: 187 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7923, nsentences=120, sample_size=3660.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=2018.4, ups=0.25, wpb=7923, bsz=120, num_updates=48440, lr=6.32802e-06, gnorm=1.006, clip=50, loss_scale=64, train_wall=39, gb_free=31.1, wall=198466 2023-05-03 09:41:33 - progress_bar.py[line:274] - INFO: epoch 009: 197 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7559.6, nsentences=120, sample_size=4064.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1910.7, ups=0.25, wpb=7559.6, 
bsz=120, num_updates=48450, lr=6.32274e-06, gnorm=0.976, clip=40, loss_scale=64, train_wall=39, gb_free=30.2, wall=198506 2023-05-03 09:42:13 - progress_bar.py[line:274] - INFO: epoch 009: 207 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7822.8, nsentences=120, sample_size=3950.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1974.5, ups=0.25, wpb=7822.8, bsz=120, num_updates=48460, lr=6.31746e-06, gnorm=0.994, clip=50, loss_scale=64, train_wall=40, gb_free=30.4, wall=198545 2023-05-03 09:42:53 - progress_bar.py[line:274] - INFO: epoch 009: 217 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7696.2, nsentences=120, sample_size=3865, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1926.6, ups=0.25, wpb=7696.2, bsz=120, num_updates=48470, lr=6.31218e-06, gnorm=1.017, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=198585 2023-05-03 09:43:33 - progress_bar.py[line:274] - INFO: epoch 009: 227 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7792.9, nsentences=120, sample_size=3885.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1948.4, ups=0.25, wpb=7792.9, bsz=120, num_updates=48480, lr=6.30689e-06, gnorm=0.99, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=198625 2023-05-03 09:44:13 - progress_bar.py[line:274] - INFO: epoch 009: 237 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.154, ntokens=7963.8, nsentences=120, sample_size=3905.3, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1972.1, ups=0.25, wpb=7963.8, bsz=120, num_updates=48490, lr=6.30161e-06, gnorm=1.018, clip=60, loss_scale=64, train_wall=40, gb_free=30.1, wall=198666 2023-05-03 09:44:53 - progress_bar.py[line:274] - INFO: epoch 009: 247 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7896.1, nsentences=120, sample_size=3887.3, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1977.7, ups=0.25, wpb=7896.1, bsz=120, num_updates=48500, lr=6.29633e-06, gnorm=0.991, clip=50, loss_scale=64, 
train_wall=40, gb_free=29.7, wall=198706 2023-05-03 09:45:33 - progress_bar.py[line:274] - INFO: epoch 009: 257 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7998.9, nsentences=120, sample_size=4237.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1989.7, ups=0.25, wpb=7998.9, bsz=120, num_updates=48510, lr=6.29105e-06, gnorm=0.935, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=198746 2023-05-03 09:46:13 - progress_bar.py[line:274] - INFO: epoch 009: 267 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7621.2, nsentences=120, sample_size=4189.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1911.1, ups=0.25, wpb=7621.2, bsz=120, num_updates=48520, lr=6.28576e-06, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, gb_free=28.8, wall=198786 2023-05-03 09:46:54 - progress_bar.py[line:274] - INFO: epoch 009: 277 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7847.6, nsentences=120, sample_size=4213.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1938.3, ups=0.25, wpb=7847.6, bsz=120, num_updates=48530, lr=6.28048e-06, gnorm=0.967, clip=20, loss_scale=64, train_wall=40, gb_free=28.7, wall=198826 2023-05-03 09:47:33 - progress_bar.py[line:274] - INFO: epoch 009: 287 / 6042 loss=2.324, loss_v1=0, loss_v2=0, nll_loss=1.056, ntokens=7309, nsentences=120, sample_size=4387.8, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1862.6, ups=0.25, wpb=7309, bsz=120, num_updates=48540, lr=6.2752e-06, gnorm=0.973, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=198865 2023-05-03 09:48:13 - progress_bar.py[line:274] - INFO: epoch 009: 297 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7682.9, nsentences=120, sample_size=4264.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1936.1, ups=0.25, wpb=7682.9, bsz=120, num_updates=48550, lr=6.26992e-06, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=198905 2023-05-03 09:48:52 - progress_bar.py[line:274] - 
INFO: epoch 009: 307 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7439.1, nsentences=120, sample_size=4050.1, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1899.1, ups=0.26, wpb=7439.1, bsz=120, num_updates=48560, lr=6.26464e-06, gnorm=1.025, clip=50, loss_scale=64, train_wall=39, gb_free=30.9, wall=198944 2023-05-03 09:49:32 - progress_bar.py[line:274] - INFO: epoch 009: 317 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7688.3, nsentences=120, sample_size=3886.5, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1931, ups=0.25, wpb=7688.3, bsz=120, num_updates=48570, lr=6.25935e-06, gnorm=1.012, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=198984 2023-05-03 09:50:12 - progress_bar.py[line:274] - INFO: epoch 009: 327 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7670.6, nsentences=120, sample_size=4326, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1924.9, ups=0.25, wpb=7670.6, bsz=120, num_updates=48580, lr=6.25407e-06, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=199024 2023-05-03 09:50:51 - progress_bar.py[line:274] - INFO: epoch 009: 337 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7524.8, nsentences=120, sample_size=4410.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1888.1, ups=0.25, wpb=7524.8, bsz=120, num_updates=48590, lr=6.24879e-06, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=199064 2023-05-03 09:51:31 - progress_bar.py[line:274] - INFO: epoch 009: 347 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7577.2, nsentences=120, sample_size=4055.4, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1901.5, ups=0.25, wpb=7577.2, bsz=120, num_updates=48600, lr=6.24351e-06, gnorm=0.969, clip=40, loss_scale=64, train_wall=40, gb_free=30.4, wall=199104 2023-05-03 09:52:11 - progress_bar.py[line:274] - INFO: epoch 009: 357 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7463.8, 
nsentences=120, sample_size=3973.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1896.7, ups=0.25, wpb=7463.8, bsz=120, num_updates=48610, lr=6.23823e-06, gnorm=0.992, clip=50, loss_scale=64, train_wall=39, gb_free=31, wall=199143 2023-05-03 09:52:50 - progress_bar.py[line:274] - INFO: epoch 009: 367 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7734.3, nsentences=120, sample_size=4230.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1941.2, ups=0.25, wpb=7734.3, bsz=120, num_updates=48620, lr=6.23294e-06, gnorm=0.948, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=199183 2023-05-03 09:53:30 - progress_bar.py[line:274] - INFO: epoch 009: 377 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7665, nsentences=120, sample_size=4174.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1922.4, ups=0.25, wpb=7665, bsz=120, num_updates=48630, lr=6.22766e-06, gnorm=0.945, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=199223 2023-05-03 09:54:10 - progress_bar.py[line:274] - INFO: epoch 009: 387 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7737, nsentences=120, sample_size=4128.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1940.9, ups=0.25, wpb=7737, bsz=120, num_updates=48640, lr=6.22238e-06, gnorm=0.995, clip=40, loss_scale=64, train_wall=40, gb_free=31.4, wall=199263 2023-05-03 09:54:50 - progress_bar.py[line:274] - INFO: epoch 009: 397 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7716.6, nsentences=120, sample_size=3870.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1937.9, ups=0.25, wpb=7716.6, bsz=120, num_updates=48650, lr=6.2171e-06, gnorm=1.001, clip=50, loss_scale=64, train_wall=40, gb_free=31, wall=199302 2023-05-03 09:55:30 - progress_bar.py[line:274] - INFO: epoch 009: 407 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7803, nsentences=120, sample_size=3823.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1971, ups=0.25, 
wpb=7803, bsz=120, num_updates=48660, lr=6.21181e-06, gnorm=1, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=199342 2023-05-03 09:56:09 - progress_bar.py[line:274] - INFO: epoch 009: 417 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7723.2, nsentences=120, sample_size=3999.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1965.8, ups=0.25, wpb=7723.2, bsz=120, num_updates=48670, lr=6.20653e-06, gnorm=0.969, clip=40, loss_scale=64, train_wall=39, gb_free=30.2, wall=199381 2023-05-03 09:56:49 - progress_bar.py[line:274] - INFO: epoch 009: 427 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7780, nsentences=120, sample_size=3706.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1923, ups=0.25, wpb=7780, bsz=120, num_updates=48680, lr=6.20125e-06, gnorm=1.004, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=199422 2023-05-03 09:57:29 - progress_bar.py[line:274] - INFO: epoch 009: 437 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7866.7, nsentences=120, sample_size=3766.7, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1977.4, ups=0.25, wpb=7866.7, bsz=120, num_updates=48690, lr=6.19597e-06, gnorm=1.016, clip=70, loss_scale=64, train_wall=40, gb_free=30, wall=199462 2023-05-03 09:58:09 - progress_bar.py[line:274] - INFO: epoch 009: 447 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7749.6, nsentences=120, sample_size=3953.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1966.1, ups=0.25, wpb=7749.6, bsz=120, num_updates=48700, lr=6.19069e-06, gnorm=0.991, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=199501 2023-05-03 09:58:48 - progress_bar.py[line:274] - INFO: epoch 009: 457 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7560.2, nsentences=120, sample_size=4136.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1896.5, ups=0.25, wpb=7560.2, bsz=120, num_updates=48710, lr=6.1854e-06, gnorm=0.962, clip=40, loss_scale=64, train_wall=40, 
gb_free=30.4, wall=199541 2023-05-03 09:59:28 - progress_bar.py[line:274] - INFO: epoch 009: 467 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7719.8, nsentences=120, sample_size=4148.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1946, ups=0.25, wpb=7719.8, bsz=120, num_updates=48720, lr=6.18012e-06, gnorm=0.995, clip=60, loss_scale=64, train_wall=40, gb_free=25, wall=199581 2023-05-03 10:00:08 - progress_bar.py[line:274] - INFO: epoch 009: 477 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7889.6, nsentences=120, sample_size=3925.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1975.5, ups=0.25, wpb=7889.6, bsz=120, num_updates=48730, lr=6.17484e-06, gnorm=1.021, clip=60, loss_scale=64, train_wall=40, gb_free=30, wall=199620 2023-05-03 10:00:48 - progress_bar.py[line:274] - INFO: epoch 009: 487 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7483.5, nsentences=120, sample_size=4237.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1873, ups=0.25, wpb=7483.5, bsz=120, num_updates=48740, lr=6.16956e-06, gnorm=0.966, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=199660 2023-05-03 10:01:27 - progress_bar.py[line:274] - INFO: epoch 009: 497 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7457.4, nsentences=120, sample_size=4369.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1902.1, ups=0.26, wpb=7457.4, bsz=120, num_updates=48750, lr=6.16428e-06, gnorm=0.948, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=199700 2023-05-03 10:02:06 - progress_bar.py[line:274] - INFO: epoch 009: 507 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7774.1, nsentences=120, sample_size=4185, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1995.8, ups=0.26, wpb=7774.1, bsz=120, num_updates=48760, lr=6.15899e-06, gnorm=0.988, clip=50, loss_scale=64, train_wall=39, gb_free=28.4, wall=199739 2023-05-03 10:02:46 - progress_bar.py[line:274] - INFO: epoch 009: 517 / 
6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=8041, nsentences=120, sample_size=3650, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2017.3, ups=0.25, wpb=8041, bsz=120, num_updates=48770, lr=6.15371e-06, gnorm=1.016, clip=60, loss_scale=64, train_wall=40, gb_free=29.2, wall=199778 2023-05-03 10:03:26 - progress_bar.py[line:274] - INFO: epoch 009: 527 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7976.3, nsentences=120, sample_size=4424.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1999.4, ups=0.25, wpb=7976.3, bsz=120, num_updates=48780, lr=6.14843e-06, gnorm=0.95, clip=20, loss_scale=128, train_wall=40, gb_free=28.5, wall=199818 2023-05-03 10:04:06 - progress_bar.py[line:274] - INFO: epoch 009: 537 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7754.9, nsentences=120, sample_size=4377.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1955.2, ups=0.25, wpb=7754.9, bsz=120, num_updates=48790, lr=6.14315e-06, gnorm=0.962, clip=30, loss_scale=128, train_wall=40, gb_free=31.2, wall=199858 2023-05-03 10:04:45 - progress_bar.py[line:274] - INFO: epoch 009: 547 / 6042 loss=2.327, loss_v1=0, loss_v2=0, nll_loss=1.062, ntokens=7749.7, nsentences=120, sample_size=3931.7, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1943.8, ups=0.25, wpb=7749.7, bsz=120, num_updates=48800, lr=6.13786e-06, gnorm=0.977, clip=40, loss_scale=128, train_wall=40, gb_free=28.1, wall=199898 2023-05-03 10:05:25 - progress_bar.py[line:274] - INFO: epoch 009: 557 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7824.6, nsentences=120, sample_size=4307.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1966.9, ups=0.25, wpb=7824.6, bsz=120, num_updates=48810, lr=6.13258e-06, gnorm=0.97, clip=30, loss_scale=128, train_wall=40, gb_free=28.8, wall=199938 2023-05-03 10:06:06 - progress_bar.py[line:274] - INFO: epoch 009: 567 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7924.5, nsentences=120, 
sample_size=3835.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1961.4, ups=0.25, wpb=7924.5, bsz=120, num_updates=48820, lr=6.1273e-06, gnorm=1.017, clip=60, loss_scale=128, train_wall=40, gb_free=31.1, wall=199978 2023-05-03 10:06:09 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 10:06:49 - progress_bar.py[line:274] - INFO: epoch 009: 578 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7354, nsentences=120, sample_size=4015.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1688.5, ups=0.23, wpb=7354, bsz=120, num_updates=48830, lr=6.12202e-06, gnorm=0.991, clip=40, loss_scale=64, train_wall=43, gb_free=30.5, wall=200022 2023-05-03 10:07:28 - progress_bar.py[line:274] - INFO: epoch 009: 588 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7676.6, nsentences=120, sample_size=4196, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1962.8, ups=0.26, wpb=7676.6, bsz=120, num_updates=48840, lr=6.11674e-06, gnorm=0.959, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=200061 2023-05-03 10:08:08 - progress_bar.py[line:274] - INFO: epoch 009: 598 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=8215.5, nsentences=120, sample_size=4378, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2050.1, ups=0.25, wpb=8215.5, bsz=120, num_updates=48850, lr=6.11145e-06, gnorm=0.918, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=200101 2023-05-03 10:08:48 - progress_bar.py[line:274] - INFO: epoch 009: 608 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7829.1, nsentences=120, sample_size=4238.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1959.9, ups=0.25, wpb=7829.1, bsz=120, num_updates=48860, lr=6.10617e-06, gnorm=0.945, clip=20, loss_scale=64, train_wall=40, gb_free=28.3, wall=200141 2023-05-03 10:09:28 - progress_bar.py[line:274] - INFO: epoch 009: 618 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.133, 
ntokens=7557.5, nsentences=120, sample_size=4063.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1906.8, ups=0.25, wpb=7557.5, bsz=120, num_updates=48870, lr=6.10089e-06, gnorm=0.958, clip=20, loss_scale=64, train_wall=40, gb_free=27.5, wall=200180 2023-05-03 10:10:09 - progress_bar.py[line:274] - INFO: epoch 009: 628 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7788.5, nsentences=120, sample_size=3901.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1913.4, ups=0.25, wpb=7788.5, bsz=120, num_updates=48880, lr=6.09561e-06, gnorm=0.996, clip=40, loss_scale=64, train_wall=41, gb_free=30.7, wall=200221 2023-05-03 10:10:48 - progress_bar.py[line:274] - INFO: epoch 009: 638 / 6042 loss=2.307, loss_v1=0, loss_v2=0, nll_loss=1.04, ntokens=7528.9, nsentences=120, sample_size=3932.6, sample_size_v1=0, sample_size_v2=0, ppl=2.06, wps=1904.8, ups=0.25, wpb=7528.9, bsz=120, num_updates=48890, lr=6.09032e-06, gnorm=1.009, clip=60, loss_scale=64, train_wall=39, gb_free=29.7, wall=200261 2023-05-03 10:11:28 - progress_bar.py[line:274] - INFO: epoch 009: 648 / 6042 loss=2.406, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7777.3, nsentences=120, sample_size=4081.7, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1964.4, ups=0.25, wpb=7777.3, bsz=120, num_updates=48900, lr=6.08504e-06, gnorm=1, clip=50, loss_scale=64, train_wall=40, gb_free=30.8, wall=200300 2023-05-03 10:12:08 - progress_bar.py[line:274] - INFO: epoch 009: 658 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7555, nsentences=120, sample_size=3909, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1889.4, ups=0.25, wpb=7555, bsz=120, num_updates=48910, lr=6.07976e-06, gnorm=1.024, clip=50, loss_scale=64, train_wall=40, gb_free=29.6, wall=200340 2023-05-03 10:12:47 - progress_bar.py[line:274] - INFO: epoch 009: 668 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7882.1, nsentences=120, sample_size=3980.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, 
wps=2030.8, ups=0.26, wpb=7882.1, bsz=120, num_updates=48920, lr=6.07448e-06, gnorm=0.951, clip=30, loss_scale=64, train_wall=39, gb_free=30.5, wall=200379 2023-05-03 10:13:27 - progress_bar.py[line:274] - INFO: epoch 009: 678 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7280.6, nsentences=120, sample_size=4160.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1818.5, ups=0.25, wpb=7280.6, bsz=120, num_updates=48930, lr=6.0692e-06, gnorm=0.972, clip=50, loss_scale=64, train_wall=40, gb_free=27.7, wall=200419 2023-05-03 10:14:06 - progress_bar.py[line:274] - INFO: epoch 009: 688 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7651.3, nsentences=120, sample_size=4013.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1930.4, ups=0.25, wpb=7651.3, bsz=120, num_updates=48940, lr=6.06391e-06, gnorm=0.968, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=200459 2023-05-03 10:14:47 - progress_bar.py[line:274] - INFO: epoch 009: 698 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7632.3, nsentences=120, sample_size=3938.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1860.9, ups=0.24, wpb=7632.3, bsz=120, num_updates=48950, lr=6.05863e-06, gnorm=0.97, clip=30, loss_scale=64, train_wall=41, gb_free=28.1, wall=200500 2023-05-03 10:15:27 - progress_bar.py[line:274] - INFO: epoch 009: 708 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7902.1, nsentences=120, sample_size=4212.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2011.6, ups=0.25, wpb=7902.1, bsz=120, num_updates=48960, lr=6.05335e-06, gnorm=0.956, clip=40, loss_scale=64, train_wall=39, gb_free=25.2, wall=200539 2023-05-03 10:16:07 - progress_bar.py[line:274] - INFO: epoch 009: 718 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7630, nsentences=120, sample_size=3981.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1893.4, ups=0.25, wpb=7630, bsz=120, num_updates=48970, lr=6.04807e-06, gnorm=0.982, 
clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=200579 2023-05-03 10:16:47 - progress_bar.py[line:274] - INFO: epoch 009: 728 / 6042 loss=2.321, loss_v1=0, loss_v2=0, nll_loss=1.059, ntokens=7629, nsentences=120, sample_size=3788, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1923, ups=0.25, wpb=7629, bsz=120, num_updates=48980, lr=6.04279e-06, gnorm=1.013, clip=40, loss_scale=64, train_wall=40, gb_free=27.3, wall=200619 2023-05-03 10:17:27 - progress_bar.py[line:274] - INFO: epoch 009: 738 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7703.1, nsentences=120, sample_size=3881.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1917.8, ups=0.25, wpb=7703.1, bsz=120, num_updates=48990, lr=6.0375e-06, gnorm=0.988, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=200659 2023-05-03 10:18:08 - progress_bar.py[line:274] - INFO: epoch 009: 748 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7751.6, nsentences=120, sample_size=3996, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1886.8, ups=0.24, wpb=7751.6, bsz=120, num_updates=49000, lr=6.03222e-06, gnorm=0.948, clip=30, loss_scale=64, train_wall=41, gb_free=29.9, wall=200700 2023-05-03 10:18:08 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 10:18:10 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 10:18:10 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 10:18:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 10:18:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:27 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 10:18:27 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 10:18:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-03 10:18:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:39 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 10:18:39 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 10:18:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 10:18:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:50 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 10:18:50 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 10:18:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:54 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 10:18:54 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 10:18:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:59 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 10:18:59 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 10:18:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 10:18:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 10:18:59 - progress_bar.py[line:282] - INFO: epoch 009 | valid on 'valid' subset | loss 3.258 | loss_v1 0 | loss_v2 0 | nll_loss 2.092 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.26 | score 0.7529 | wps 3306.1 | wpb 3202.1 | bsz 39.4 | num_updates 49000 | best_score 0.7627 2023-05-03 10:18:59 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 9 @ 49000 updates 2023-05-03 10:18:59 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_49000.pt 2023-05-03 10:19:25 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_49000.pt 2023-05-03 10:19:39 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_49000.pt (epoch 9 @ 49000 updates, score 0.7529) (writing took 39.373980097007006 seconds) 2023-05-03 10:20:18 - progress_bar.py[line:274] - INFO: epoch 009: 758 / 6042 loss=2.401, loss_v1=0, loss_v2=0, nll_loss=1.149, ntokens=7720.7, nsentences=120, sample_size=4209.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=592.1, ups=0.08, wpb=7720.7, bsz=120, num_updates=49010, lr=6.02694e-06, gnorm=0.972, clip=30, loss_scale=64, train_wall=39, gb_free=26.4, wall=200831 2023-05-03 10:20:58 - progress_bar.py[line:274] - INFO: epoch 009: 768 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7567, nsentences=120, sample_size=4083.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1885.4, ups=0.25, wpb=7567, bsz=120, num_updates=49020, 
lr=6.02166e-06, gnorm=0.967, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=200871 2023-05-03 10:21:37 - progress_bar.py[line:274] - INFO: epoch 009: 778 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7457.2, nsentences=120, sample_size=3946.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1911.2, ups=0.26, wpb=7457.2, bsz=120, num_updates=49030, lr=6.01637e-06, gnorm=0.997, clip=60, loss_scale=64, train_wall=39, gb_free=27.3, wall=200910 2023-05-03 10:22:18 - progress_bar.py[line:274] - INFO: epoch 009: 788 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7797.4, nsentences=120, sample_size=3910.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1932.5, ups=0.25, wpb=7797.4, bsz=120, num_updates=49040, lr=6.01109e-06, gnorm=1.003, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=200950 2023-05-03 10:22:57 - progress_bar.py[line:274] - INFO: epoch 009: 798 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7521.3, nsentences=120, sample_size=3922.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1890.4, ups=0.25, wpb=7521.3, bsz=120, num_updates=49050, lr=6.00581e-06, gnorm=0.983, clip=50, loss_scale=64, train_wall=40, gb_free=29.2, wall=200990 2023-05-03 10:23:37 - progress_bar.py[line:274] - INFO: epoch 009: 808 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7524.3, nsentences=120, sample_size=3888.8, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1914.3, ups=0.25, wpb=7524.3, bsz=120, num_updates=49060, lr=6.00053e-06, gnorm=1.007, clip=40, loss_scale=64, train_wall=39, gb_free=29.2, wall=201029 2023-05-03 10:24:17 - progress_bar.py[line:274] - INFO: epoch 009: 818 / 6042 loss=2.327, loss_v1=0, loss_v2=0, nll_loss=1.06, ntokens=7560.2, nsentences=120, sample_size=3919.6, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1860.2, ups=0.25, wpb=7560.2, bsz=120, num_updates=49070, lr=5.99525e-06, gnorm=1.01, clip=40, loss_scale=64, train_wall=41, gb_free=30.1, wall=201070 
2023-05-03 10:24:58 - progress_bar.py[line:274] - INFO: epoch 009: 828 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7793, nsentences=120, sample_size=3732.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1917.2, ups=0.25, wpb=7793, bsz=120, num_updates=49080, lr=5.98996e-06, gnorm=1.018, clip=70, loss_scale=64, train_wall=41, gb_free=28.5, wall=201111 2023-05-03 10:25:38 - progress_bar.py[line:274] - INFO: epoch 009: 838 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7615.9, nsentences=120, sample_size=3952, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1899, ups=0.25, wpb=7615.9, bsz=120, num_updates=49090, lr=5.98468e-06, gnorm=1.009, clip=50, loss_scale=64, train_wall=40, gb_free=28.3, wall=201151 2023-05-03 10:26:18 - progress_bar.py[line:274] - INFO: epoch 009: 848 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7666.1, nsentences=120, sample_size=4173.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1924.8, ups=0.25, wpb=7666.1, bsz=120, num_updates=49100, lr=5.9794e-06, gnorm=0.991, clip=50, loss_scale=64, train_wall=40, gb_free=30.3, wall=201190 2023-05-03 10:26:57 - progress_bar.py[line:274] - INFO: epoch 009: 858 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7780.8, nsentences=120, sample_size=4002.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1969.9, ups=0.25, wpb=7780.8, bsz=120, num_updates=49110, lr=5.97412e-06, gnorm=0.984, clip=50, loss_scale=64, train_wall=39, gb_free=29.3, wall=201230 2023-05-03 10:27:37 - progress_bar.py[line:274] - INFO: epoch 009: 868 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7543.3, nsentences=120, sample_size=3954.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1914.4, ups=0.25, wpb=7543.3, bsz=120, num_updates=49120, lr=5.96884e-06, gnorm=1.003, clip=60, loss_scale=64, train_wall=39, gb_free=30.5, wall=201269 2023-05-03 10:28:17 - progress_bar.py[line:274] - INFO: epoch 009: 878 / 6042 loss=2.415, 
loss_v1=0, loss_v2=0, nll_loss=1.158, ntokens=7892.9, nsentences=120, sample_size=3992.8, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1991.4, ups=0.25, wpb=7892.9, bsz=120, num_updates=49130, lr=5.96355e-06, gnorm=0.989, clip=40, loss_scale=64, train_wall=40, gb_free=27.1, wall=201309 2023-05-03 10:28:56 - progress_bar.py[line:274] - INFO: epoch 009: 888 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7626.7, nsentences=120, sample_size=4125.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1916, ups=0.25, wpb=7626.7, bsz=120, num_updates=49140, lr=5.95827e-06, gnorm=0.991, clip=50, loss_scale=64, train_wall=40, gb_free=30.5, wall=201349 2023-05-03 10:29:37 - progress_bar.py[line:274] - INFO: epoch 009: 898 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.07, ntokens=7566.9, nsentences=120, sample_size=4229, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1843.6, ups=0.24, wpb=7566.9, bsz=120, num_updates=49150, lr=5.95299e-06, gnorm=0.974, clip=30, loss_scale=64, train_wall=41, gb_free=30.6, wall=201390 2023-05-03 10:30:18 - progress_bar.py[line:274] - INFO: epoch 009: 908 / 6042 loss=2.329, loss_v1=0, loss_v2=0, nll_loss=1.058, ntokens=7774.2, nsentences=120, sample_size=4018.9, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1933.2, ups=0.25, wpb=7774.2, bsz=120, num_updates=49160, lr=5.94771e-06, gnorm=0.98, clip=40, loss_scale=64, train_wall=40, gb_free=29, wall=201430 2023-05-03 10:30:57 - progress_bar.py[line:274] - INFO: epoch 009: 918 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7707.1, nsentences=120, sample_size=3952.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1951.9, ups=0.25, wpb=7707.1, bsz=120, num_updates=49170, lr=5.94242e-06, gnorm=0.994, clip=40, loss_scale=64, train_wall=39, gb_free=31.4, wall=201470 2023-05-03 10:31:37 - progress_bar.py[line:274] - INFO: epoch 009: 928 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7773.5, nsentences=120, sample_size=4243.2, 
sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1951, ups=0.25, wpb=7773.5, bsz=120, num_updates=49180, lr=5.93714e-06, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=201509 2023-05-03 10:32:17 - progress_bar.py[line:274] - INFO: epoch 009: 938 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7833.5, nsentences=120, sample_size=4117.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1950.3, ups=0.25, wpb=7833.5, bsz=120, num_updates=49190, lr=5.93186e-06, gnorm=0.971, clip=20, loss_scale=64, train_wall=40, gb_free=28.7, wall=201550 2023-05-03 10:32:58 - progress_bar.py[line:274] - INFO: epoch 009: 948 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7666.8, nsentences=120, sample_size=4393.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1883.4, ups=0.25, wpb=7666.8, bsz=120, num_updates=49200, lr=5.92658e-06, gnorm=0.936, clip=20, loss_scale=64, train_wall=41, gb_free=27.7, wall=201590 2023-05-03 10:33:38 - progress_bar.py[line:274] - INFO: epoch 009: 958 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7734.9, nsentences=120, sample_size=4102, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1931.1, ups=0.25, wpb=7734.9, bsz=120, num_updates=49210, lr=5.9213e-06, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=201630 2023-05-03 10:34:18 - progress_bar.py[line:274] - INFO: epoch 009: 968 / 6042 loss=2.407, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=7988.4, nsentences=120, sample_size=4034.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1985.5, ups=0.25, wpb=7988.4, bsz=120, num_updates=49220, lr=5.91601e-06, gnorm=0.987, clip=30, loss_scale=64, train_wall=40, gb_free=28.8, wall=201671 2023-05-03 10:34:58 - progress_bar.py[line:274] - INFO: epoch 009: 978 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7943.9, nsentences=120, sample_size=3978.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1972.4, ups=0.25, wpb=7943.9, bsz=120, 
num_updates=49230, lr=5.91073e-06, gnorm=0.977, clip=40, loss_scale=64, train_wall=40, gb_free=28.3, wall=201711 2023-05-03 10:35:39 - progress_bar.py[line:274] - INFO: epoch 009: 988 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7738.8, nsentences=120, sample_size=4000.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1917.4, ups=0.25, wpb=7738.8, bsz=120, num_updates=49240, lr=5.90545e-06, gnorm=0.998, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=201751 2023-05-03 10:36:18 - progress_bar.py[line:274] - INFO: epoch 009: 998 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7795.2, nsentences=120, sample_size=4115.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1965.6, ups=0.25, wpb=7795.2, bsz=120, num_updates=49250, lr=5.90017e-06, gnorm=0.963, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=201791 2023-05-03 10:36:59 - progress_bar.py[line:274] - INFO: epoch 009: 1008 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7729.9, nsentences=120, sample_size=4011.8, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1919.4, ups=0.25, wpb=7729.9, bsz=120, num_updates=49260, lr=5.89489e-06, gnorm=0.999, clip=60, loss_scale=64, train_wall=40, gb_free=29.9, wall=201831 2023-05-03 10:37:39 - progress_bar.py[line:274] - INFO: epoch 009: 1018 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7532, nsentences=120, sample_size=4330.4, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1876.4, ups=0.25, wpb=7532, bsz=120, num_updates=49270, lr=5.8896e-06, gnorm=0.933, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=201871 2023-05-03 10:38:18 - progress_bar.py[line:274] - INFO: epoch 009: 1028 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=8152.8, nsentences=120, sample_size=3782.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2057.5, ups=0.25, wpb=8152.8, bsz=120, num_updates=49280, lr=5.88432e-06, gnorm=1.006, clip=60, loss_scale=64, train_wall=40, 
gb_free=26.5, wall=201911 2023-05-03 10:38:59 - progress_bar.py[line:274] - INFO: epoch 009: 1038 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7957, nsentences=120, sample_size=4191.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1966.4, ups=0.25, wpb=7957, bsz=120, num_updates=49290, lr=5.87904e-06, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=28.4, wall=201951 2023-05-03 10:39:39 - progress_bar.py[line:274] - INFO: epoch 009: 1048 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7826.3, nsentences=120, sample_size=3909, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1963.8, ups=0.25, wpb=7826.3, bsz=120, num_updates=49300, lr=5.87376e-06, gnorm=1.01, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=201991 2023-05-03 10:40:19 - progress_bar.py[line:274] - INFO: epoch 009: 1058 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7800.7, nsentences=120, sample_size=4291.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1933.3, ups=0.25, wpb=7800.7, bsz=120, num_updates=49310, lr=5.86847e-06, gnorm=0.967, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=202032 2023-05-03 10:40:58 - progress_bar.py[line:274] - INFO: epoch 009: 1068 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7960.4, nsentences=120, sample_size=3863.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2031.3, ups=0.26, wpb=7960.4, bsz=120, num_updates=49320, lr=5.86319e-06, gnorm=0.989, clip=40, loss_scale=64, train_wall=39, gb_free=28.8, wall=202071 2023-05-03 10:41:38 - progress_bar.py[line:274] - INFO: epoch 009: 1078 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7623, nsentences=120, sample_size=4125.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1919.1, ups=0.25, wpb=7623, bsz=120, num_updates=49330, lr=5.85791e-06, gnorm=0.968, clip=30, loss_scale=64, train_wall=40, gb_free=28.8, wall=202110 2023-05-03 10:42:19 - trainer.py[line:922] - INFO: NOTE: gradient 
overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 10:42:23 - progress_bar.py[line:274] - INFO: epoch 009: 1089 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7888.9, nsentences=120, sample_size=4057, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1770.6, ups=0.22, wpb=7888.9, bsz=120, num_updates=49340, lr=5.85263e-06, gnorm=0.992, clip=30, loss_scale=64, train_wall=44, gb_free=28.1, wall=202155 2023-05-03 10:43:03 - progress_bar.py[line:274] - INFO: epoch 009: 1099 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7967.3, nsentences=120, sample_size=3956.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1991.7, ups=0.25, wpb=7967.3, bsz=120, num_updates=49350, lr=5.84735e-06, gnorm=0.982, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=202195 2023-05-03 10:43:43 - progress_bar.py[line:274] - INFO: epoch 009: 1109 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7895.2, nsentences=120, sample_size=3814.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1960.8, ups=0.25, wpb=7895.2, bsz=120, num_updates=49360, lr=5.84206e-06, gnorm=1.006, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=202235 2023-05-03 10:44:23 - progress_bar.py[line:274] - INFO: epoch 009: 1119 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7809.6, nsentences=120, sample_size=4213, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1960.5, ups=0.25, wpb=7809.6, bsz=120, num_updates=49370, lr=5.83678e-06, gnorm=0.974, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=202275 2023-05-03 10:45:03 - progress_bar.py[line:274] - INFO: epoch 009: 1129 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7875.9, nsentences=120, sample_size=4262.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1958.2, ups=0.25, wpb=7875.9, bsz=120, num_updates=49380, lr=5.8315e-06, gnorm=0.931, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=202315 2023-05-03 10:45:43 - 
progress_bar.py[line:274] - INFO: epoch 009: 1139 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7864.4, nsentences=120, sample_size=4160.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1973.9, ups=0.25, wpb=7864.4, bsz=120, num_updates=49390, lr=5.82622e-06, gnorm=0.96, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=202355 2023-05-03 10:46:22 - progress_bar.py[line:274] - INFO: epoch 009: 1149 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7824.2, nsentences=120, sample_size=3600.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1972.5, ups=0.25, wpb=7824.2, bsz=120, num_updates=49400, lr=5.82093e-06, gnorm=1.018, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=202395 2023-05-03 10:47:03 - progress_bar.py[line:274] - INFO: epoch 009: 1159 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7954.4, nsentences=120, sample_size=3988.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1961.3, ups=0.25, wpb=7954.4, bsz=120, num_updates=49410, lr=5.81565e-06, gnorm=0.969, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=202435 2023-05-03 10:47:43 - progress_bar.py[line:274] - INFO: epoch 009: 1169 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7483.4, nsentences=120, sample_size=4070.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1872.8, ups=0.25, wpb=7483.4, bsz=120, num_updates=49420, lr=5.81037e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=28.4, wall=202475 2023-05-03 10:48:23 - progress_bar.py[line:274] - INFO: epoch 009: 1179 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7987.9, nsentences=120, sample_size=3943.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1988.5, ups=0.25, wpb=7987.9, bsz=120, num_updates=49430, lr=5.80509e-06, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=202516 2023-05-03 10:49:03 - progress_bar.py[line:274] - INFO: epoch 009: 1189 / 6042 loss=2.377, loss_v1=0, 
loss_v2=0, nll_loss=1.116, ntokens=7569.3, nsentences=120, sample_size=4355.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1880.6, ups=0.25, wpb=7569.3, bsz=120, num_updates=49440, lr=5.79981e-06, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=31.1, wall=202556 2023-05-03 10:49:43 - progress_bar.py[line:274] - INFO: epoch 009: 1199 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7896.4, nsentences=120, sample_size=4188.2, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1981, ups=0.25, wpb=7896.4, bsz=120, num_updates=49450, lr=5.79452e-06, gnorm=0.983, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=202596 2023-05-03 10:50:22 - progress_bar.py[line:274] - INFO: epoch 009: 1209 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7441.2, nsentences=120, sample_size=4046.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1910.4, ups=0.26, wpb=7441.2, bsz=120, num_updates=49460, lr=5.78924e-06, gnorm=0.969, clip=30, loss_scale=64, train_wall=39, gb_free=28.8, wall=202635 2023-05-03 10:51:02 - progress_bar.py[line:274] - INFO: epoch 009: 1219 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=8158.7, nsentences=120, sample_size=4169.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2047.2, ups=0.25, wpb=8158.7, bsz=120, num_updates=49470, lr=5.78396e-06, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=28.8, wall=202674 2023-05-03 10:51:42 - progress_bar.py[line:274] - INFO: epoch 009: 1229 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7498.5, nsentences=120, sample_size=3961, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1898.3, ups=0.25, wpb=7498.5, bsz=120, num_updates=49480, lr=5.77868e-06, gnorm=0.998, clip=50, loss_scale=64, train_wall=39, gb_free=30, wall=202714 2023-05-03 10:52:21 - progress_bar.py[line:274] - INFO: epoch 009: 1239 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7908.6, nsentences=120, sample_size=4005.9, 
sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=2001.6, ups=0.25, wpb=7908.6, bsz=120, num_updates=49490, lr=5.7734e-06, gnorm=0.978, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=202753 2023-05-03 10:53:01 - progress_bar.py[line:274] - INFO: epoch 009: 1249 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7977.5, nsentences=120, sample_size=3924, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2006.2, ups=0.25, wpb=7977.5, bsz=120, num_updates=49500, lr=5.76811e-06, gnorm=0.997, clip=40, loss_scale=64, train_wall=40, gb_free=29.4, wall=202793 2023-05-03 10:53:40 - progress_bar.py[line:274] - INFO: epoch 009: 1259 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7731.5, nsentences=120, sample_size=4058.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1961.8, ups=0.25, wpb=7731.5, bsz=120, num_updates=49510, lr=5.76283e-06, gnorm=0.977, clip=40, loss_scale=64, train_wall=39, gb_free=29.3, wall=202833 2023-05-03 10:54:20 - progress_bar.py[line:274] - INFO: epoch 009: 1269 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7748.2, nsentences=120, sample_size=3728.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1944.3, ups=0.25, wpb=7748.2, bsz=120, num_updates=49520, lr=5.75755e-06, gnorm=1.002, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=202873 2023-05-03 10:55:01 - progress_bar.py[line:274] - INFO: epoch 009: 1279 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7986, nsentences=120, sample_size=4102.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1973.3, ups=0.25, wpb=7986, bsz=120, num_updates=49530, lr=5.75227e-06, gnorm=0.961, clip=20, loss_scale=64, train_wall=40, gb_free=26.6, wall=202913 2023-05-03 10:55:40 - progress_bar.py[line:274] - INFO: epoch 009: 1289 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7675.4, nsentences=120, sample_size=3943, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1930.2, ups=0.25, wpb=7675.4, bsz=120, 
num_updates=49540, lr=5.74698e-06, gnorm=0.986, clip=50, loss_scale=64, train_wall=40, gb_free=28.7, wall=202953 2023-05-03 10:56:21 - progress_bar.py[line:274] - INFO: epoch 009: 1299 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=8041.4, nsentences=120, sample_size=3922.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1994.2, ups=0.25, wpb=8041.4, bsz=120, num_updates=49550, lr=5.7417e-06, gnorm=1, clip=50, loss_scale=64, train_wall=40, gb_free=28.7, wall=202993 2023-05-03 10:57:01 - progress_bar.py[line:274] - INFO: epoch 009: 1309 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7676.2, nsentences=120, sample_size=3958.8, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1905, ups=0.25, wpb=7676.2, bsz=120, num_updates=49560, lr=5.73642e-06, gnorm=1.001, clip=60, loss_scale=64, train_wall=40, gb_free=29.8, wall=203033 2023-05-03 10:57:41 - progress_bar.py[line:274] - INFO: epoch 009: 1319 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7603.3, nsentences=120, sample_size=4098.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1913.9, ups=0.25, wpb=7603.3, bsz=120, num_updates=49570, lr=5.73114e-06, gnorm=0.979, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=203073 2023-05-03 10:58:20 - progress_bar.py[line:274] - INFO: epoch 009: 1329 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7457.9, nsentences=120, sample_size=3947.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1915.8, ups=0.26, wpb=7457.9, bsz=120, num_updates=49580, lr=5.72586e-06, gnorm=0.989, clip=50, loss_scale=64, train_wall=39, gb_free=29.6, wall=203112 2023-05-03 10:58:59 - progress_bar.py[line:274] - INFO: epoch 009: 1339 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7950.5, nsentences=120, sample_size=3993.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2012.9, ups=0.25, wpb=7950.5, bsz=120, num_updates=49590, lr=5.72057e-06, gnorm=1.007, clip=70, loss_scale=64, train_wall=39, 
gb_free=29.3, wall=203152 2023-05-03 10:59:39 - progress_bar.py[line:274] - INFO: epoch 009: 1349 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7813.9, nsentences=120, sample_size=4221, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1950.2, ups=0.25, wpb=7813.9, bsz=120, num_updates=49600, lr=5.71529e-06, gnorm=0.967, clip=50, loss_scale=64, train_wall=40, gb_free=30.8, wall=203192 2023-05-03 11:00:19 - progress_bar.py[line:274] - INFO: epoch 009: 1359 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7670.9, nsentences=120, sample_size=3936.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1915.6, ups=0.25, wpb=7670.9, bsz=120, num_updates=49610, lr=5.71001e-06, gnorm=0.986, clip=40, loss_scale=64, train_wall=40, gb_free=28.8, wall=203232 2023-05-03 11:00:59 - progress_bar.py[line:274] - INFO: epoch 009: 1369 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7772.6, nsentences=120, sample_size=4161.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1938.2, ups=0.25, wpb=7772.6, bsz=120, num_updates=49620, lr=5.70473e-06, gnorm=0.947, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=203272 2023-05-03 11:01:39 - progress_bar.py[line:274] - INFO: epoch 009: 1379 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7539.1, nsentences=120, sample_size=4105.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1922.1, ups=0.25, wpb=7539.1, bsz=120, num_updates=49630, lr=5.69945e-06, gnorm=0.995, clip=40, loss_scale=64, train_wall=39, gb_free=27.5, wall=203311 2023-05-03 11:02:18 - progress_bar.py[line:274] - INFO: epoch 009: 1389 / 6042 loss=2.313, loss_v1=0, loss_v2=0, nll_loss=1.051, ntokens=7524.5, nsentences=120, sample_size=4033.1, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1904.5, ups=0.25, wpb=7524.5, bsz=120, num_updates=49640, lr=5.69416e-06, gnorm=0.991, clip=60, loss_scale=64, train_wall=39, gb_free=30.7, wall=203350 2023-05-03 11:02:57 - progress_bar.py[line:274] - INFO: epoch 
009: 1399 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7915.8, nsentences=120, sample_size=3895.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2006.1, ups=0.25, wpb=7915.8, bsz=120, num_updates=49650, lr=5.68888e-06, gnorm=0.969, clip=30, loss_scale=64, train_wall=39, gb_free=31.2, wall=203390 2023-05-03 11:03:37 - progress_bar.py[line:274] - INFO: epoch 009: 1409 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7472.5, nsentences=120, sample_size=4221.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1913.4, ups=0.26, wpb=7472.5, bsz=120, num_updates=49660, lr=5.6836e-06, gnorm=0.977, clip=20, loss_scale=64, train_wall=39, gb_free=30.8, wall=203429 2023-05-03 11:04:17 - progress_bar.py[line:274] - INFO: epoch 009: 1419 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7976.8, nsentences=120, sample_size=3966.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1982.9, ups=0.25, wpb=7976.8, bsz=120, num_updates=49670, lr=5.67832e-06, gnorm=0.98, clip=30, loss_scale=64, train_wall=40, gb_free=29.3, wall=203469 2023-05-03 11:04:56 - progress_bar.py[line:274] - INFO: epoch 009: 1429 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7750.8, nsentences=120, sample_size=3897.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1982.1, ups=0.26, wpb=7750.8, bsz=120, num_updates=49680, lr=5.67303e-06, gnorm=0.971, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=203508 2023-05-03 11:05:36 - progress_bar.py[line:274] - INFO: epoch 009: 1439 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=8068.8, nsentences=120, sample_size=3783.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2035.1, ups=0.25, wpb=8068.8, bsz=120, num_updates=49690, lr=5.66775e-06, gnorm=1.01, clip=50, loss_scale=64, train_wall=40, gb_free=30.8, wall=203548 2023-05-03 11:06:15 - progress_bar.py[line:274] - INFO: epoch 009: 1449 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7629.4, 
nsentences=120, sample_size=4070.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1932.1, ups=0.25, wpb=7629.4, bsz=120, num_updates=49700, lr=5.66247e-06, gnorm=0.953, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=203587 2023-05-03 11:06:54 - progress_bar.py[line:274] - INFO: epoch 009: 1459 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7628, nsentences=120, sample_size=4150.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1935.3, ups=0.25, wpb=7628, bsz=120, num_updates=49710, lr=5.65719e-06, gnorm=0.978, clip=40, loss_scale=64, train_wall=39, gb_free=30.7, wall=203627 2023-05-03 11:07:34 - progress_bar.py[line:274] - INFO: epoch 009: 1469 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7818.1, nsentences=120, sample_size=4045.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1982.3, ups=0.25, wpb=7818.1, bsz=120, num_updates=49720, lr=5.65191e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=203666 2023-05-03 11:08:13 - progress_bar.py[line:274] - INFO: epoch 009: 1479 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7942.8, nsentences=120, sample_size=3930.1, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=2007.7, ups=0.25, wpb=7942.8, bsz=120, num_updates=49730, lr=5.64662e-06, gnorm=0.987, clip=40, loss_scale=64, train_wall=39, gb_free=30.4, wall=203706 2023-05-03 11:08:53 - progress_bar.py[line:274] - INFO: epoch 009: 1489 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7540.2, nsentences=120, sample_size=4229.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1927, ups=0.26, wpb=7540.2, bsz=120, num_updates=49740, lr=5.64134e-06, gnorm=0.957, clip=20, loss_scale=64, train_wall=39, gb_free=29.7, wall=203745 2023-05-03 11:09:32 - progress_bar.py[line:274] - INFO: epoch 009: 1499 / 6042 loss=2.328, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7529.3, nsentences=120, sample_size=3933.1, sample_size_v1=0, sample_size_v2=0, ppl=2.1, 
wps=1910.4, ups=0.25, wpb=7529.3, bsz=120, num_updates=49750, lr=5.63606e-06, gnorm=1.029, clip=50, loss_scale=64, train_wall=39, gb_free=30.1, wall=203784 2023-05-03 11:10:12 - progress_bar.py[line:274] - INFO: epoch 009: 1509 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.07, ntokens=7724, nsentences=120, sample_size=4094.9, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1934.8, ups=0.25, wpb=7724, bsz=120, num_updates=49760, lr=5.63078e-06, gnorm=1.041, clip=70, loss_scale=64, train_wall=40, gb_free=29.8, wall=203824 2023-05-03 11:10:51 - progress_bar.py[line:274] - INFO: epoch 009: 1519 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7400.6, nsentences=120, sample_size=4132.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1878.2, ups=0.25, wpb=7400.6, bsz=120, num_updates=49770, lr=5.6255e-06, gnorm=0.983, clip=50, loss_scale=64, train_wall=39, gb_free=30.7, wall=203864 2023-05-03 11:11:31 - progress_bar.py[line:274] - INFO: epoch 009: 1529 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7842, nsentences=120, sample_size=4168.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1955.2, ups=0.25, wpb=7842, bsz=120, num_updates=49780, lr=5.62021e-06, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=29.1, wall=203904 2023-05-03 11:12:11 - progress_bar.py[line:274] - INFO: epoch 009: 1539 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7907.8, nsentences=120, sample_size=4314.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2001, ups=0.25, wpb=7907.8, bsz=120, num_updates=49790, lr=5.61493e-06, gnorm=0.947, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=203943 2023-05-03 11:12:51 - progress_bar.py[line:274] - INFO: epoch 009: 1549 / 6042 loss=2.312, loss_v1=0, loss_v2=0, nll_loss=1.049, ntokens=7482.2, nsentences=120, sample_size=4494.3, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1862.6, ups=0.25, wpb=7482.2, bsz=120, num_updates=49800, lr=5.60965e-06, gnorm=0.937, 
clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=203984 2023-05-03 11:13:31 - progress_bar.py[line:274] - INFO: epoch 009: 1559 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.065, ntokens=7695.6, nsentences=120, sample_size=3928.5, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1924.5, ups=0.25, wpb=7695.6, bsz=120, num_updates=49810, lr=5.60437e-06, gnorm=0.998, clip=40, loss_scale=64, train_wall=40, gb_free=30.9, wall=204024 2023-05-03 11:14:11 - progress_bar.py[line:274] - INFO: epoch 009: 1569 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7301.8, nsentences=120, sample_size=4096.7, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1830.5, ups=0.25, wpb=7301.8, bsz=120, num_updates=49820, lr=5.59908e-06, gnorm=0.996, clip=60, loss_scale=64, train_wall=40, gb_free=29.6, wall=204063 2023-05-03 11:14:51 - progress_bar.py[line:274] - INFO: epoch 009: 1579 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7716.5, nsentences=120, sample_size=3981.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1927.5, ups=0.25, wpb=7716.5, bsz=120, num_updates=49830, lr=5.5938e-06, gnorm=0.998, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=204103 2023-05-03 11:15:30 - progress_bar.py[line:274] - INFO: epoch 009: 1589 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7527.6, nsentences=120, sample_size=4199.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1913.5, ups=0.25, wpb=7527.6, bsz=120, num_updates=49840, lr=5.58852e-06, gnorm=0.963, clip=20, loss_scale=64, train_wall=39, gb_free=30.9, wall=204143 2023-05-03 11:16:09 - progress_bar.py[line:274] - INFO: epoch 009: 1599 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7776.3, nsentences=120, sample_size=4021.2, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1986.6, ups=0.26, wpb=7776.3, bsz=120, num_updates=49850, lr=5.58324e-06, gnorm=0.995, clip=50, loss_scale=64, train_wall=39, gb_free=29.6, wall=204182 2023-05-03 11:16:49 - 
progress_bar.py[line:274] - INFO: epoch 009: 1609 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7456.6, nsentences=120, sample_size=4264.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1894.6, ups=0.25, wpb=7456.6, bsz=120, num_updates=49860, lr=5.57796e-06, gnorm=0.937, clip=10, loss_scale=128, train_wall=39, gb_free=29.8, wall=204221 2023-05-03 11:17:25 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 11:17:33 - progress_bar.py[line:274] - INFO: epoch 009: 1620 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=8261.1, nsentences=120, sample_size=3875.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1872.3, ups=0.23, wpb=8261.1, bsz=120, num_updates=49870, lr=5.57267e-06, gnorm=1.001, clip=50, loss_scale=64, train_wall=44, gb_free=30.2, wall=204265 2023-05-03 11:18:13 - progress_bar.py[line:274] - INFO: epoch 009: 1630 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7557.5, nsentences=120, sample_size=4261.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1895.9, ups=0.25, wpb=7557.5, bsz=120, num_updates=49880, lr=5.56739e-06, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=31.3, wall=204305 2023-05-03 11:18:53 - progress_bar.py[line:274] - INFO: epoch 009: 1640 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7822.1, nsentences=120, sample_size=3991.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1962.3, ups=0.25, wpb=7822.1, bsz=120, num_updates=49890, lr=5.56211e-06, gnorm=0.981, clip=40, loss_scale=64, train_wall=40, gb_free=30.7, wall=204345 2023-05-03 11:19:33 - progress_bar.py[line:274] - INFO: epoch 009: 1650 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7871.9, nsentences=120, sample_size=3831.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1956.2, ups=0.25, wpb=7871.9, bsz=120, num_updates=49900, lr=5.55683e-06, gnorm=1.027, clip=70, loss_scale=64, train_wall=40, 
gb_free=30.8, wall=204385 2023-05-03 11:20:13 - progress_bar.py[line:274] - INFO: epoch 009: 1660 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7403.2, nsentences=120, sample_size=4245.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1867.4, ups=0.25, wpb=7403.2, bsz=120, num_updates=49910, lr=5.55155e-06, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=29.6, wall=204425 2023-05-03 11:20:53 - progress_bar.py[line:274] - INFO: epoch 009: 1670 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7395.3, nsentences=120, sample_size=4149.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1822.7, ups=0.25, wpb=7395.3, bsz=120, num_updates=49920, lr=5.54626e-06, gnorm=0.981, clip=20, loss_scale=64, train_wall=40, gb_free=31.3, wall=204466 2023-05-03 11:21:33 - progress_bar.py[line:274] - INFO: epoch 009: 1680 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7637.9, nsentences=120, sample_size=4121.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1898.5, ups=0.25, wpb=7637.9, bsz=120, num_updates=49930, lr=5.54098e-06, gnorm=0.965, clip=40, loss_scale=64, train_wall=40, gb_free=30.7, wall=204506 2023-05-03 11:22:13 - progress_bar.py[line:274] - INFO: epoch 009: 1690 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7633.1, nsentences=120, sample_size=4006.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1910.5, ups=0.25, wpb=7633.1, bsz=120, num_updates=49940, lr=5.5357e-06, gnorm=0.979, clip=30, loss_scale=64, train_wall=40, gb_free=31.1, wall=204546 2023-05-03 11:22:54 - progress_bar.py[line:274] - INFO: epoch 009: 1700 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7767.7, nsentences=120, sample_size=4169.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1929.6, ups=0.25, wpb=7767.7, bsz=120, num_updates=49950, lr=5.53042e-06, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=204586 2023-05-03 11:23:34 - progress_bar.py[line:274] - INFO: epoch 
009: 1710 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7780.4, nsentences=120, sample_size=3843.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1944.2, ups=0.25, wpb=7780.4, bsz=120, num_updates=49960, lr=5.52513e-06, gnorm=0.999, clip=70, loss_scale=64, train_wall=40, gb_free=30.6, wall=204626 2023-05-03 11:24:13 - progress_bar.py[line:274] - INFO: epoch 009: 1720 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7751.5, nsentences=120, sample_size=3906.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1957.2, ups=0.25, wpb=7751.5, bsz=120, num_updates=49970, lr=5.51985e-06, gnorm=0.996, clip=50, loss_scale=64, train_wall=40, gb_free=30.8, wall=204666 2023-05-03 11:24:52 - progress_bar.py[line:274] - INFO: epoch 009: 1730 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7859.8, nsentences=120, sample_size=3649.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2008.6, ups=0.26, wpb=7859.8, bsz=120, num_updates=49980, lr=5.51457e-06, gnorm=1.027, clip=60, loss_scale=64, train_wall=39, gb_free=28.9, wall=204705 2023-05-03 11:25:32 - progress_bar.py[line:274] - INFO: epoch 009: 1740 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7731.6, nsentences=120, sample_size=3969.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1969, ups=0.25, wpb=7731.6, bsz=120, num_updates=49990, lr=5.50929e-06, gnorm=1.004, clip=40, loss_scale=64, train_wall=39, gb_free=28.1, wall=204744 2023-05-03 11:26:12 - progress_bar.py[line:274] - INFO: epoch 009: 1750 / 6042 loss=2.428, loss_v1=0, loss_v2=0, nll_loss=1.182, ntokens=8069.4, nsentences=120, sample_size=3994.7, sample_size_v1=0, sample_size_v2=0, ppl=2.27, wps=1994.4, ups=0.25, wpb=8069.4, bsz=120, num_updates=50000, lr=5.50401e-06, gnorm=1, clip=50, loss_scale=64, train_wall=40, gb_free=30.7, wall=204785 2023-05-03 11:26:12 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 11:26:14 - task_invig.py[line:181] - INFO: example 
hyp(gen): sure. region: 2023-05-03 11:26:14 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 11:26:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:26 - task_invig.py[line:182] - INFO: example 
ref(gt): region: 2023-05-03 11:26:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:31 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 11:26:31 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 11:26:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:39 - task_invig.py[line:181] - INFO: example 
hyp(gen): region: 2023-05-03 11:26:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:43 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 11:26:43 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 11:26:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:50 - task_invig.py[line:182] - INFO: example 
ref(gt): region: 2023-05-03 11:26:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:54 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 11:26:54 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 11:26:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:26:58 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 11:26:58 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 11:26:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:26:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:27:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:27:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:27:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:27:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:27:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:27:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:27:03 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 11:27:03 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 11:27:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 11:27:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 11:27:03 - progress_bar.py[line:282] - INFO: epoch 009 | valid on 'valid' subset | loss 3.271 | loss_v1 0 | loss_v2 0 | nll_loss 2.106 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.31 | score 0.7524 | wps 3295.5 | wpb 3202.1 | bsz 39.4 | num_updates 50000 | best_score 0.7627 2023-05-03 11:27:03 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 9 @ 50000 updates 2023-05-03 11:27:03 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_50000.pt 2023-05-03 11:27:28 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_50000.pt 2023-05-03 11:27:41 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_50000.pt (epoch 9 @ 50000 updates, 
score 0.7524) (writing took 38.29431224311702 seconds) 2023-05-03 11:28:21 - progress_bar.py[line:274] - INFO: epoch 009: 1760 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7655.6, nsentences=120, sample_size=4218.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=594.8, ups=0.08, wpb=7655.6, bsz=120, num_updates=50010, lr=5.49872e-06, gnorm=0.988, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=204913 2023-05-03 11:29:00 - progress_bar.py[line:274] - INFO: epoch 009: 1770 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7635.1, nsentences=120, sample_size=4036.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1933.6, ups=0.25, wpb=7635.1, bsz=120, num_updates=50020, lr=5.49344e-06, gnorm=0.987, clip=40, loss_scale=64, train_wall=39, gb_free=29.2, wall=204953 2023-05-03 11:29:40 - progress_bar.py[line:274] - INFO: epoch 009: 1780 / 6042 loss=2.327, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7735.9, nsentences=120, sample_size=4093.6, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1925, ups=0.25, wpb=7735.9, bsz=120, num_updates=50030, lr=5.48816e-06, gnorm=1.038, clip=60, loss_scale=64, train_wall=40, gb_free=29, wall=204993 2023-05-03 11:30:21 - progress_bar.py[line:274] - INFO: epoch 009: 1790 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7596.6, nsentences=120, sample_size=4164.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1878.2, ups=0.25, wpb=7596.6, bsz=120, num_updates=50040, lr=5.48288e-06, gnorm=0.968, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=205033 2023-05-03 11:31:00 - progress_bar.py[line:274] - INFO: epoch 009: 1800 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7595.5, nsentences=120, sample_size=4104.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1929.4, ups=0.25, wpb=7595.5, bsz=120, num_updates=50050, lr=5.47759e-06, gnorm=0.983, clip=30, loss_scale=64, train_wall=39, gb_free=31.1, wall=205073 2023-05-03 11:31:40 - 
progress_bar.py[line:274] - INFO: epoch 009: 1810 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7537.2, nsentences=120, sample_size=4127.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1888.1, ups=0.25, wpb=7537.2, bsz=120, num_updates=50060, lr=5.47231e-06, gnorm=0.97, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=205113 2023-05-03 11:32:21 - progress_bar.py[line:274] - INFO: epoch 009: 1820 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7797, nsentences=120, sample_size=3771.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1904, ups=0.24, wpb=7797, bsz=120, num_updates=50070, lr=5.46703e-06, gnorm=1, clip=50, loss_scale=64, train_wall=41, gb_free=30.1, wall=205154 2023-05-03 11:33:02 - progress_bar.py[line:274] - INFO: epoch 009: 1830 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.074, ntokens=7836, nsentences=120, sample_size=4157, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1940.2, ups=0.25, wpb=7836, bsz=120, num_updates=50080, lr=5.46175e-06, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=205194 2023-05-03 11:33:42 - progress_bar.py[line:274] - INFO: epoch 009: 1840 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7676.5, nsentences=120, sample_size=3855.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1896, ups=0.25, wpb=7676.5, bsz=120, num_updates=50090, lr=5.45647e-06, gnorm=1.002, clip=50, loss_scale=64, train_wall=40, gb_free=31.1, wall=205235 2023-05-03 11:34:22 - progress_bar.py[line:274] - INFO: epoch 009: 1850 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7740.9, nsentences=120, sample_size=4231.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1935.7, ups=0.25, wpb=7740.9, bsz=120, num_updates=50100, lr=5.45118e-06, gnorm=0.972, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=205274 2023-05-03 11:35:02 - progress_bar.py[line:274] - INFO: epoch 009: 1860 / 6042 loss=2.354, loss_v1=0, loss_v2=0, 
nll_loss=1.104, ntokens=7593.2, nsentences=120, sample_size=4055.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1897.2, ups=0.25, wpb=7593.2, bsz=120, num_updates=50110, lr=5.4459e-06, gnorm=0.947, clip=10, loss_scale=64, train_wall=40, gb_free=30.2, wall=205315 2023-05-03 11:35:42 - progress_bar.py[line:274] - INFO: epoch 009: 1870 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7647.5, nsentences=120, sample_size=4341.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1915.3, ups=0.25, wpb=7647.5, bsz=120, num_updates=50120, lr=5.44062e-06, gnorm=0.977, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=205354 2023-05-03 11:36:21 - progress_bar.py[line:274] - INFO: epoch 009: 1880 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.072, ntokens=7639.4, nsentences=120, sample_size=4159.1, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1942.7, ups=0.25, wpb=7639.4, bsz=120, num_updates=50130, lr=5.43534e-06, gnorm=0.966, clip=40, loss_scale=64, train_wall=39, gb_free=30.2, wall=205394 2023-05-03 11:37:00 - progress_bar.py[line:274] - INFO: epoch 009: 1890 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7667.4, nsentences=120, sample_size=4088.2, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1956.7, ups=0.26, wpb=7667.4, bsz=120, num_updates=50140, lr=5.43006e-06, gnorm=0.952, clip=20, loss_scale=64, train_wall=39, gb_free=30.8, wall=205433 2023-05-03 11:37:40 - progress_bar.py[line:274] - INFO: epoch 009: 1900 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7802.3, nsentences=120, sample_size=4062, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1957.7, ups=0.25, wpb=7802.3, bsz=120, num_updates=50150, lr=5.42477e-06, gnorm=0.974, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=205473 2023-05-03 11:38:20 - progress_bar.py[line:274] - INFO: epoch 009: 1910 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7861.9, nsentences=120, sample_size=4323.1, sample_size_v1=0, 
sample_size_v2=0, ppl=2.16, wps=1983.9, ups=0.25, wpb=7861.9, bsz=120, num_updates=50160, lr=5.41949e-06, gnorm=0.957, clip=30, loss_scale=64, train_wall=40, gb_free=31.6, wall=205512 2023-05-03 11:38:59 - progress_bar.py[line:274] - INFO: epoch 009: 1920 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7838.5, nsentences=120, sample_size=4001.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1984.7, ups=0.25, wpb=7838.5, bsz=120, num_updates=50170, lr=5.41421e-06, gnorm=0.978, clip=60, loss_scale=64, train_wall=39, gb_free=29.4, wall=205552 2023-05-03 11:39:39 - progress_bar.py[line:274] - INFO: epoch 009: 1930 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7808.7, nsentences=120, sample_size=4020.1, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1962.2, ups=0.25, wpb=7808.7, bsz=120, num_updates=50180, lr=5.40893e-06, gnorm=1.012, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=205592 2023-05-03 11:40:19 - progress_bar.py[line:274] - INFO: epoch 009: 1940 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7751.7, nsentences=120, sample_size=3735.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1936.9, ups=0.25, wpb=7751.7, bsz=120, num_updates=50190, lr=5.40364e-06, gnorm=1.011, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=205632 2023-05-03 11:40:59 - progress_bar.py[line:274] - INFO: epoch 009: 1950 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7635.7, nsentences=120, sample_size=4129.9, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1918.5, ups=0.25, wpb=7635.7, bsz=120, num_updates=50200, lr=5.39836e-06, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=28, wall=205672 2023-05-03 11:41:40 - progress_bar.py[line:274] - INFO: epoch 009: 1960 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7552.1, nsentences=120, sample_size=4196.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1862.5, ups=0.25, wpb=7552.1, bsz=120, 
num_updates=50210, lr=5.39308e-06, gnorm=0.965, clip=40, loss_scale=64, train_wall=40, gb_free=29.5, wall=205712 2023-05-03 11:42:20 - progress_bar.py[line:274] - INFO: epoch 009: 1970 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7571.7, nsentences=120, sample_size=4338, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1898.6, ups=0.25, wpb=7571.7, bsz=120, num_updates=50220, lr=5.3878e-06, gnorm=0.966, clip=20, loss_scale=64, train_wall=40, gb_free=30.8, wall=205752 2023-05-03 11:42:59 - progress_bar.py[line:274] - INFO: epoch 009: 1980 / 6042 loss=2.328, loss_v1=0, loss_v2=0, nll_loss=1.066, ntokens=7725.4, nsentences=120, sample_size=3956.3, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1962.4, ups=0.25, wpb=7725.4, bsz=120, num_updates=50230, lr=5.38252e-06, gnorm=1.003, clip=50, loss_scale=64, train_wall=39, gb_free=29.8, wall=205791 2023-05-03 11:43:39 - progress_bar.py[line:274] - INFO: epoch 009: 1990 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7744, nsentences=120, sample_size=3782.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1948.4, ups=0.25, wpb=7744, bsz=120, num_updates=50240, lr=5.37723e-06, gnorm=1.014, clip=60, loss_scale=64, train_wall=40, gb_free=28, wall=205831 2023-05-03 11:44:19 - progress_bar.py[line:274] - INFO: epoch 009: 2000 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.15, ntokens=7920.5, nsentences=120, sample_size=3770.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1970.8, ups=0.25, wpb=7920.5, bsz=120, num_updates=50250, lr=5.37195e-06, gnorm=1.003, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=205871 2023-05-03 11:44:58 - progress_bar.py[line:274] - INFO: epoch 009: 2010 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7413.7, nsentences=120, sample_size=4419.3, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1892.2, ups=0.26, wpb=7413.7, bsz=120, num_updates=50260, lr=5.36667e-06, gnorm=0.936, clip=20, loss_scale=64, train_wall=39, 
gb_free=30, wall=205910 2023-05-03 11:45:38 - progress_bar.py[line:274] - INFO: epoch 009: 2020 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7922.6, nsentences=120, sample_size=3601.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1989, ups=0.25, wpb=7922.6, bsz=120, num_updates=50270, lr=5.36139e-06, gnorm=1.029, clip=70, loss_scale=64, train_wall=40, gb_free=29.3, wall=205950 2023-05-03 11:46:17 - progress_bar.py[line:274] - INFO: epoch 009: 2030 / 6042 loss=2.316, loss_v1=0, loss_v2=0, nll_loss=1.053, ntokens=7753.8, nsentences=120, sample_size=4035.7, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1975.3, ups=0.25, wpb=7753.8, bsz=120, num_updates=50280, lr=5.35611e-06, gnorm=0.973, clip=30, loss_scale=64, train_wall=39, gb_free=29.2, wall=205990 2023-05-03 11:46:57 - progress_bar.py[line:274] - INFO: epoch 009: 2040 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7856.8, nsentences=120, sample_size=4249.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1958.8, ups=0.25, wpb=7856.8, bsz=120, num_updates=50290, lr=5.35082e-06, gnorm=0.945, clip=0, loss_scale=64, train_wall=40, gb_free=29.3, wall=206030 2023-05-03 11:47:37 - progress_bar.py[line:274] - INFO: epoch 009: 2050 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7779.2, nsentences=120, sample_size=4040.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1977, ups=0.25, wpb=7779.2, bsz=120, num_updates=50300, lr=5.34554e-06, gnorm=0.968, clip=30, loss_scale=64, train_wall=39, gb_free=25.1, wall=206069 2023-05-03 11:48:16 - progress_bar.py[line:274] - INFO: epoch 009: 2060 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7811.6, nsentences=120, sample_size=4188.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1962.3, ups=0.25, wpb=7811.6, bsz=120, num_updates=50310, lr=5.34026e-06, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=206109 2023-05-03 11:48:57 - progress_bar.py[line:274] - INFO: epoch 009: 
2070 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7663.5, nsentences=120, sample_size=4316.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1891.5, ups=0.25, wpb=7663.5, bsz=120, num_updates=50320, lr=5.33498e-06, gnorm=0.94, clip=0, loss_scale=64, train_wall=40, gb_free=29.6, wall=206149 2023-05-03 11:49:37 - progress_bar.py[line:274] - INFO: epoch 009: 2080 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7520.1, nsentences=120, sample_size=4193.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1873.4, ups=0.25, wpb=7520.1, bsz=120, num_updates=50330, lr=5.32969e-06, gnorm=0.985, clip=60, loss_scale=64, train_wall=40, gb_free=29, wall=206189 2023-05-03 11:50:17 - progress_bar.py[line:274] - INFO: epoch 009: 2090 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7869.7, nsentences=120, sample_size=3967, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1967.3, ups=0.25, wpb=7869.7, bsz=120, num_updates=50340, lr=5.32441e-06, gnorm=0.988, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=206229 2023-05-03 11:50:58 - progress_bar.py[line:274] - INFO: epoch 009: 2100 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7902.8, nsentences=120, sample_size=3951.9, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1947.5, ups=0.25, wpb=7902.8, bsz=120, num_updates=50350, lr=5.31913e-06, gnorm=1.011, clip=40, loss_scale=64, train_wall=41, gb_free=26.3, wall=206270 2023-05-03 11:51:37 - progress_bar.py[line:274] - INFO: epoch 009: 2110 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7576.9, nsentences=120, sample_size=4165.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1921.3, ups=0.25, wpb=7576.9, bsz=120, num_updates=50360, lr=5.31385e-06, gnorm=0.965, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=206310 2023-05-03 11:52:17 - progress_bar.py[line:274] - INFO: epoch 009: 2120 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7761.6, 
nsentences=120, sample_size=4148.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1927, ups=0.25, wpb=7761.6, bsz=120, num_updates=50370, lr=5.30857e-06, gnorm=0.998, clip=40, loss_scale=64, train_wall=40, gb_free=29.4, wall=206350 2023-05-03 11:52:57 - progress_bar.py[line:274] - INFO: epoch 009: 2130 / 6042 loss=2.324, loss_v1=0, loss_v2=0, nll_loss=1.056, ntokens=7631.3, nsentences=120, sample_size=4269.1, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1910, ups=0.25, wpb=7631.3, bsz=120, num_updates=50380, lr=5.30328e-06, gnorm=0.944, clip=20, loss_scale=128, train_wall=40, gb_free=31.2, wall=206390 2023-05-03 11:53:37 - progress_bar.py[line:274] - INFO: epoch 009: 2140 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7524.4, nsentences=120, sample_size=3927.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1886.1, ups=0.25, wpb=7524.4, bsz=120, num_updates=50390, lr=5.298e-06, gnorm=1.022, clip=60, loss_scale=128, train_wall=40, gb_free=30.6, wall=206430 2023-05-03 11:54:17 - progress_bar.py[line:274] - INFO: epoch 009: 2150 / 6042 loss=2.308, loss_v1=0, loss_v2=0, nll_loss=1.041, ntokens=7823.6, nsentences=120, sample_size=3759.8, sample_size_v1=0, sample_size_v2=0, ppl=2.06, wps=1960.8, ups=0.25, wpb=7823.6, bsz=120, num_updates=50400, lr=5.29272e-06, gnorm=1.017, clip=70, loss_scale=128, train_wall=40, gb_free=27.7, wall=206470 2023-05-03 11:54:21 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 11:55:01 - progress_bar.py[line:274] - INFO: epoch 009: 2161 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7702.8, nsentences=120, sample_size=3871.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1773.4, ups=0.23, wpb=7702.8, bsz=120, num_updates=50410, lr=5.28744e-06, gnorm=1.002, clip=50, loss_scale=64, train_wall=43, gb_free=29.8, wall=206513 2023-05-03 11:55:40 - progress_bar.py[line:274] - INFO: epoch 009: 2171 / 6042 loss=2.377, 
loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7907.7, nsentences=120, sample_size=3989.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1979, ups=0.25, wpb=7907.7, bsz=120, num_updates=50420, lr=5.28216e-06, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=206553 2023-05-03 11:56:21 - progress_bar.py[line:274] - INFO: epoch 009: 2181 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7724.3, nsentences=120, sample_size=3852.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1923.8, ups=0.25, wpb=7724.3, bsz=120, num_updates=50430, lr=5.27687e-06, gnorm=1.004, clip=60, loss_scale=64, train_wall=40, gb_free=30.5, wall=206593 2023-05-03 11:57:01 - progress_bar.py[line:274] - INFO: epoch 009: 2191 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7734, nsentences=120, sample_size=4064.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1934.7, ups=0.25, wpb=7734, bsz=120, num_updates=50440, lr=5.27159e-06, gnorm=0.999, clip=60, loss_scale=64, train_wall=40, gb_free=29.9, wall=206633 2023-05-03 11:57:41 - progress_bar.py[line:274] - INFO: epoch 009: 2201 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7442.7, nsentences=120, sample_size=4189.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1858.8, ups=0.25, wpb=7442.7, bsz=120, num_updates=50450, lr=5.26631e-06, gnorm=0.953, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=206673 2023-05-03 11:58:20 - progress_bar.py[line:274] - INFO: epoch 009: 2211 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7607.9, nsentences=120, sample_size=4028.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1931.4, ups=0.25, wpb=7607.9, bsz=120, num_updates=50460, lr=5.26103e-06, gnorm=1.021, clip=50, loss_scale=64, train_wall=39, gb_free=29.3, wall=206713 2023-05-03 11:59:00 - progress_bar.py[line:274] - INFO: epoch 009: 2221 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7887.1, nsentences=120, sample_size=4268.5, 
sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1966.8, ups=0.25, wpb=7887.1, bsz=120, num_updates=50470, lr=5.25574e-06, gnorm=0.939, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=206753 2023-05-03 11:59:40 - progress_bar.py[line:274] - INFO: epoch 009: 2231 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7942, nsentences=120, sample_size=3887.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1975.2, ups=0.25, wpb=7942, bsz=120, num_updates=50480, lr=5.25046e-06, gnorm=0.975, clip=30, loss_scale=64, train_wall=40, gb_free=28.6, wall=206793 2023-05-03 12:00:20 - progress_bar.py[line:274] - INFO: epoch 009: 2241 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7607.5, nsentences=120, sample_size=4334.9, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1913.6, ups=0.25, wpb=7607.5, bsz=120, num_updates=50490, lr=5.24518e-06, gnorm=0.946, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=206833 2023-05-03 12:01:00 - progress_bar.py[line:274] - INFO: epoch 009: 2251 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7646.3, nsentences=120, sample_size=3870.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1927.8, ups=0.25, wpb=7646.3, bsz=120, num_updates=50500, lr=5.2399e-06, gnorm=0.994, clip=30, loss_scale=64, train_wall=40, gb_free=30.8, wall=206872 2023-05-03 12:01:40 - progress_bar.py[line:274] - INFO: epoch 009: 2261 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.152, ntokens=7978, nsentences=120, sample_size=4145.4, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1986.5, ups=0.25, wpb=7978, bsz=120, num_updates=50510, lr=5.23462e-06, gnorm=0.985, clip=40, loss_scale=64, train_wall=40, gb_free=24.8, wall=206912 2023-05-03 12:02:20 - progress_bar.py[line:274] - INFO: epoch 009: 2271 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7615.2, nsentences=120, sample_size=4072, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1905.8, ups=0.25, wpb=7615.2, bsz=120, 
num_updates=50520, lr=5.22933e-06, gnorm=0.97, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=206952 2023-05-03 12:03:00 - progress_bar.py[line:274] - INFO: epoch 009: 2281 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7847.2, nsentences=120, sample_size=4052.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1947.8, ups=0.25, wpb=7847.2, bsz=120, num_updates=50530, lr=5.22405e-06, gnorm=0.971, clip=40, loss_scale=64, train_wall=40, gb_free=29.3, wall=206993 2023-05-03 12:03:40 - progress_bar.py[line:274] - INFO: epoch 009: 2291 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7605.5, nsentences=120, sample_size=3843.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1911, ups=0.25, wpb=7605.5, bsz=120, num_updates=50540, lr=5.21877e-06, gnorm=0.974, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=207032 2023-05-03 12:04:20 - progress_bar.py[line:274] - INFO: epoch 009: 2301 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7885.3, nsentences=120, sample_size=4018.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1967.1, ups=0.25, wpb=7885.3, bsz=120, num_updates=50550, lr=5.21349e-06, gnorm=0.977, clip=30, loss_scale=64, train_wall=40, gb_free=29.1, wall=207073 2023-05-03 12:05:00 - progress_bar.py[line:274] - INFO: epoch 009: 2311 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7430.8, nsentences=120, sample_size=4245.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1858.2, ups=0.25, wpb=7430.8, bsz=120, num_updates=50560, lr=5.2082e-06, gnorm=0.938, clip=10, loss_scale=64, train_wall=40, gb_free=29.9, wall=207113 2023-05-03 12:05:40 - progress_bar.py[line:274] - INFO: epoch 009: 2321 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7757.3, nsentences=120, sample_size=3886.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1952.5, ups=0.25, wpb=7757.3, bsz=120, num_updates=50570, lr=5.20292e-06, gnorm=1.015, clip=70, loss_scale=64, train_wall=40, 
gb_free=29.4, wall=207152 2023-05-03 12:06:19 - progress_bar.py[line:274] - INFO: epoch 009: 2331 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7706.4, nsentences=120, sample_size=3880.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1944.2, ups=0.25, wpb=7706.4, bsz=120, num_updates=50580, lr=5.19764e-06, gnorm=0.986, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=207192 2023-05-03 12:06:59 - progress_bar.py[line:274] - INFO: epoch 009: 2341 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7735.4, nsentences=120, sample_size=4120.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1945.4, ups=0.25, wpb=7735.4, bsz=120, num_updates=50590, lr=5.19236e-06, gnorm=0.96, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=207232 2023-05-03 12:07:39 - progress_bar.py[line:274] - INFO: epoch 009: 2351 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7698.5, nsentences=120, sample_size=3890.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1958.2, ups=0.25, wpb=7698.5, bsz=120, num_updates=50600, lr=5.18708e-06, gnorm=0.984, clip=50, loss_scale=64, train_wall=39, gb_free=30.2, wall=207271 2023-05-03 12:08:19 - progress_bar.py[line:274] - INFO: epoch 009: 2361 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7862.5, nsentences=120, sample_size=4125.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1963.9, ups=0.25, wpb=7862.5, bsz=120, num_updates=50610, lr=5.18179e-06, gnorm=0.986, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=207311 2023-05-03 12:08:58 - progress_bar.py[line:274] - INFO: epoch 009: 2371 / 6042 loss=2.314, loss_v1=0, loss_v2=0, nll_loss=1.055, ntokens=7835.1, nsentences=120, sample_size=3609.1, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1968.3, ups=0.25, wpb=7835.1, bsz=120, num_updates=50620, lr=5.17651e-06, gnorm=1.015, clip=60, loss_scale=64, train_wall=40, gb_free=29.5, wall=207351 2023-05-03 12:09:38 - progress_bar.py[line:274] - INFO: 
epoch 009: 2381 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7805.9, nsentences=120, sample_size=3840.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1992.4, ups=0.26, wpb=7805.9, bsz=120, num_updates=50630, lr=5.17123e-06, gnorm=0.992, clip=40, loss_scale=64, train_wall=39, gb_free=28.7, wall=207390 2023-05-03 12:10:17 - progress_bar.py[line:274] - INFO: epoch 009: 2391 / 6042 loss=2.32, loss_v1=0, loss_v2=0, nll_loss=1.055, ntokens=7548.4, nsentences=120, sample_size=4403.8, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1902.9, ups=0.25, wpb=7548.4, bsz=120, num_updates=50640, lr=5.16595e-06, gnorm=0.942, clip=30, loss_scale=64, train_wall=40, gb_free=30.8, wall=207430 2023-05-03 12:10:58 - progress_bar.py[line:274] - INFO: epoch 009: 2401 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7625.2, nsentences=120, sample_size=4137.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1892.1, ups=0.25, wpb=7625.2, bsz=120, num_updates=50650, lr=5.16067e-06, gnorm=0.971, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=207470 2023-05-03 12:11:36 - progress_bar.py[line:274] - INFO: epoch 009: 2411 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7653.6, nsentences=120, sample_size=4339.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1966.9, ups=0.26, wpb=7653.6, bsz=120, num_updates=50660, lr=5.15538e-06, gnorm=0.943, clip=10, loss_scale=64, train_wall=39, gb_free=29.7, wall=207509 2023-05-03 12:12:16 - progress_bar.py[line:274] - INFO: epoch 009: 2421 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7454.2, nsentences=120, sample_size=4096.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1885, ups=0.25, wpb=7454.2, bsz=120, num_updates=50670, lr=5.1501e-06, gnorm=0.981, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=207548 2023-05-03 12:12:57 - progress_bar.py[line:274] - INFO: epoch 009: 2431 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7783.7, 
nsentences=120, sample_size=4241.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1916.3, ups=0.25, wpb=7783.7, bsz=120, num_updates=50680, lr=5.14482e-06, gnorm=0.968, clip=40, loss_scale=64, train_wall=41, gb_free=29.7, wall=207589 2023-05-03 12:13:37 - progress_bar.py[line:274] - INFO: epoch 009: 2441 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=8124.2, nsentences=120, sample_size=3747.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2008.5, ups=0.25, wpb=8124.2, bsz=120, num_updates=50690, lr=5.13954e-06, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=207630 2023-05-03 12:14:17 - progress_bar.py[line:274] - INFO: epoch 009: 2451 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7979.1, nsentences=120, sample_size=3744.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1992.8, ups=0.25, wpb=7979.1, bsz=120, num_updates=50700, lr=5.13425e-06, gnorm=1.019, clip=70, loss_scale=64, train_wall=40, gb_free=31.3, wall=207670 2023-05-03 12:14:58 - progress_bar.py[line:274] - INFO: epoch 009: 2461 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7460.8, nsentences=120, sample_size=4003.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1831.9, ups=0.25, wpb=7460.8, bsz=120, num_updates=50710, lr=5.12897e-06, gnorm=1.002, clip=60, loss_scale=64, train_wall=41, gb_free=30.1, wall=207710 2023-05-03 12:15:37 - progress_bar.py[line:274] - INFO: epoch 009: 2471 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7678, nsentences=120, sample_size=4175.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1970.1, ups=0.26, wpb=7678, bsz=120, num_updates=50720, lr=5.12369e-06, gnorm=0.995, clip=40, loss_scale=64, train_wall=39, gb_free=31, wall=207749 2023-05-03 12:16:17 - progress_bar.py[line:274] - INFO: epoch 009: 2481 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7706.1, nsentences=120, sample_size=4240.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, 
wps=1934.2, ups=0.25, wpb=7706.1, bsz=120, num_updates=50730, lr=5.11841e-06, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=207789 2023-05-03 12:16:56 - progress_bar.py[line:274] - INFO: epoch 009: 2491 / 6042 loss=2.314, loss_v1=0, loss_v2=0, nll_loss=1.047, ntokens=7625, nsentences=120, sample_size=4291.4, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1936, ups=0.25, wpb=7625, bsz=120, num_updates=50740, lr=5.11313e-06, gnorm=0.945, clip=30, loss_scale=64, train_wall=39, gb_free=30.9, wall=207828 2023-05-03 12:17:36 - progress_bar.py[line:274] - INFO: epoch 009: 2501 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7673.2, nsentences=120, sample_size=4191.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1913, ups=0.25, wpb=7673.2, bsz=120, num_updates=50750, lr=5.10784e-06, gnorm=0.966, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=207869 2023-05-03 12:18:16 - progress_bar.py[line:274] - INFO: epoch 009: 2511 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7807.9, nsentences=120, sample_size=4120.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1965.2, ups=0.25, wpb=7807.9, bsz=120, num_updates=50760, lr=5.10256e-06, gnorm=0.99, clip=40, loss_scale=64, train_wall=40, gb_free=30.9, wall=207908 2023-05-03 12:18:55 - progress_bar.py[line:274] - INFO: epoch 009: 2521 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7813.3, nsentences=120, sample_size=3852.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1979.2, ups=0.25, wpb=7813.3, bsz=120, num_updates=50770, lr=5.09728e-06, gnorm=1.038, clip=60, loss_scale=64, train_wall=39, gb_free=30.4, wall=207948 2023-05-03 12:19:35 - progress_bar.py[line:274] - INFO: epoch 009: 2531 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7887.4, nsentences=120, sample_size=4075.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1992.3, ups=0.25, wpb=7887.4, bsz=120, num_updates=50780, lr=5.092e-06, gnorm=0.976, 
clip=40, loss_scale=64, train_wall=40, gb_free=30.4, wall=207987 2023-05-03 12:20:15 - progress_bar.py[line:274] - INFO: epoch 009: 2541 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7704.9, nsentences=120, sample_size=4179.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1928.5, ups=0.25, wpb=7704.9, bsz=120, num_updates=50790, lr=5.08672e-06, gnorm=0.972, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=208027 2023-05-03 12:20:55 - progress_bar.py[line:274] - INFO: epoch 009: 2551 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7741.5, nsentences=120, sample_size=3729.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1924.3, ups=0.25, wpb=7741.5, bsz=120, num_updates=50800, lr=5.08143e-06, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=208068 2023-05-03 12:21:35 - progress_bar.py[line:274] - INFO: epoch 009: 2561 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7937.2, nsentences=120, sample_size=3749.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1967.1, ups=0.25, wpb=7937.2, bsz=120, num_updates=50810, lr=5.07615e-06, gnorm=0.99, clip=60, loss_scale=64, train_wall=40, gb_free=29.4, wall=208108 2023-05-03 12:22:15 - progress_bar.py[line:274] - INFO: epoch 009: 2571 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7326, nsentences=120, sample_size=4221.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1856.5, ups=0.25, wpb=7326, bsz=120, num_updates=50820, lr=5.07087e-06, gnorm=1.002, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=208147 2023-05-03 12:22:55 - progress_bar.py[line:274] - INFO: epoch 009: 2581 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7715.5, nsentences=120, sample_size=4249.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1937.8, ups=0.25, wpb=7715.5, bsz=120, num_updates=50830, lr=5.06559e-06, gnorm=0.938, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, wall=208187 2023-05-03 12:23:34 - 
progress_bar.py[line:274] - INFO: epoch 009: 2591 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7795.3, nsentences=120, sample_size=3907, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1975.9, ups=0.25, wpb=7795.3, bsz=120, num_updates=50840, lr=5.0603e-06, gnorm=1.035, clip=50, loss_scale=64, train_wall=39, gb_free=29.9, wall=208227 2023-05-03 12:24:15 - progress_bar.py[line:274] - INFO: epoch 009: 2601 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7565.6, nsentences=120, sample_size=4246, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1875.5, ups=0.25, wpb=7565.6, bsz=120, num_updates=50850, lr=5.05502e-06, gnorm=0.953, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=208267 2023-05-03 12:24:55 - progress_bar.py[line:274] - INFO: epoch 009: 2611 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7869.4, nsentences=120, sample_size=4006.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1961.2, ups=0.25, wpb=7869.4, bsz=120, num_updates=50860, lr=5.04974e-06, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=208307 2023-05-03 12:25:34 - progress_bar.py[line:274] - INFO: epoch 009: 2621 / 6042 loss=2.322, loss_v1=0, loss_v2=0, nll_loss=1.055, ntokens=7611.9, nsentences=120, sample_size=4048.2, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1913.9, ups=0.25, wpb=7611.9, bsz=120, num_updates=50870, lr=5.04446e-06, gnorm=0.975, clip=40, loss_scale=64, train_wall=40, gb_free=28.7, wall=208347 2023-05-03 12:26:15 - progress_bar.py[line:274] - INFO: epoch 009: 2631 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7606.4, nsentences=120, sample_size=4266.9, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1872.1, ups=0.25, wpb=7606.4, bsz=120, num_updates=50880, lr=5.03918e-06, gnorm=0.967, clip=30, loss_scale=64, train_wall=41, gb_free=30.1, wall=208388 2023-05-03 12:26:55 - progress_bar.py[line:274] - INFO: epoch 009: 2641 / 6042 loss=2.319, loss_v1=0, loss_v2=0, 
nll_loss=1.054, ntokens=7443.7, nsentences=120, sample_size=4132.7, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1867.8, ups=0.25, wpb=7443.7, bsz=120, num_updates=50890, lr=5.03389e-06, gnorm=0.948, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=208427 2023-05-03 12:27:34 - progress_bar.py[line:274] - INFO: epoch 009: 2651 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7529.4, nsentences=120, sample_size=3883.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1902.7, ups=0.25, wpb=7529.4, bsz=120, num_updates=50900, lr=5.02861e-06, gnorm=1.003, clip=50, loss_scale=64, train_wall=40, gb_free=31, wall=208467 2023-05-03 12:28:14 - progress_bar.py[line:274] - INFO: epoch 009: 2661 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7655, nsentences=120, sample_size=4063.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1917, ups=0.25, wpb=7655, bsz=120, num_updates=50910, lr=5.02333e-06, gnorm=0.997, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=208507 2023-05-03 12:28:54 - progress_bar.py[line:274] - INFO: epoch 009: 2671 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7677.6, nsentences=120, sample_size=3835.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1930, ups=0.25, wpb=7677.6, bsz=120, num_updates=50920, lr=5.01805e-06, gnorm=1.021, clip=70, loss_scale=128, train_wall=40, gb_free=29.1, wall=208547 2023-05-03 12:29:34 - progress_bar.py[line:274] - INFO: epoch 009: 2681 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7660.5, nsentences=120, sample_size=4045.2, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1947.3, ups=0.25, wpb=7660.5, bsz=120, num_updates=50930, lr=5.01277e-06, gnorm=0.978, clip=30, loss_scale=128, train_wall=39, gb_free=29.9, wall=208586 2023-05-03 12:30:13 - progress_bar.py[line:274] - INFO: epoch 009: 2691 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7811.9, nsentences=120, sample_size=4165.6, sample_size_v1=0, 
sample_size_v2=0, ppl=2.14, wps=1988.9, ups=0.25, wpb=7811.9, bsz=120, num_updates=50940, lr=5.00748e-06, gnorm=0.962, clip=20, loss_scale=128, train_wall=39, gb_free=30.5, wall=208625 2023-05-03 12:30:53 - progress_bar.py[line:274] - INFO: epoch 009: 2701 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7848.1, nsentences=120, sample_size=3970.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1950.5, ups=0.25, wpb=7848.1, bsz=120, num_updates=50950, lr=5.0022e-06, gnorm=0.984, clip=50, loss_scale=128, train_wall=40, gb_free=30.5, wall=208666 2023-05-03 12:31:33 - progress_bar.py[line:274] - INFO: epoch 009: 2711 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7848.9, nsentences=120, sample_size=3938.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1957.7, ups=0.25, wpb=7848.9, bsz=120, num_updates=50960, lr=4.99692e-06, gnorm=1.012, clip=40, loss_scale=128, train_wall=40, gb_free=30.3, wall=208706 2023-05-03 12:32:12 - progress_bar.py[line:274] - INFO: epoch 009: 2721 / 6042 loss=2.322, loss_v1=0, loss_v2=0, nll_loss=1.06, ntokens=7798.8, nsentences=120, sample_size=3811.3, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1993.5, ups=0.26, wpb=7798.8, bsz=120, num_updates=50970, lr=4.99164e-06, gnorm=1.019, clip=50, loss_scale=128, train_wall=39, gb_free=28.9, wall=208745 2023-05-03 12:32:52 - progress_bar.py[line:274] - INFO: epoch 009: 2731 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7560.3, nsentences=119.2, sample_size=4200.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1920.6, ups=0.25, wpb=7560.3, bsz=119.2, num_updates=50980, lr=4.98635e-06, gnorm=0.953, clip=20, loss_scale=128, train_wall=39, gb_free=29.8, wall=208784 2023-05-03 12:33:32 - progress_bar.py[line:274] - INFO: epoch 009: 2741 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7800.1, nsentences=120, sample_size=3839.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1950.4, ups=0.25, wpb=7800.1, bsz=120, 
num_updates=50990, lr=4.98107e-06, gnorm=0.997, clip=40, loss_scale=128, train_wall=40, gb_free=30.9, wall=208824 2023-05-03 12:34:12 - progress_bar.py[line:274] - INFO: epoch 009: 2751 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7495.3, nsentences=120, sample_size=4015.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1878.3, ups=0.25, wpb=7495.3, bsz=120, num_updates=51000, lr=4.97579e-06, gnorm=0.991, clip=40, loss_scale=128, train_wall=40, gb_free=29.7, wall=208864 2023-05-03 12:34:12 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 12:34:13 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 12:34:13 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 12:34:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:21 - task_invig.py[line:182] - INFO: 
example ref(gt): region: 2023-05-03 12:34:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:30 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 12:34:30 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 12:34:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:42 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 12:34:42 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 12:34:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:53 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 12:34:53 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 12:34:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:58 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 12:34:58 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 12:34:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:34:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:34:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:35:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:35:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:35:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:35:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:35:02 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 12:35:02 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 12:35:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 12:35:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 12:35:03 - progress_bar.py[line:282] - INFO: epoch 009 | valid on 'valid' subset | loss 3.25 | loss_v1 0 | loss_v2 0 | nll_loss 2.084 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.24 | score 0.7544 | wps 3306.2 | wpb 3202.1 | bsz 39.4 | num_updates 51000 | best_score 0.7627 2023-05-03 12:35:03 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 9 @ 51000 updates 2023-05-03 12:35:03 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_51000.pt 2023-05-03 12:35:27 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_51000.pt 2023-05-03 12:35:41 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_51000.pt (epoch 9 @ 51000 updates, score 0.7544) (writing took 38.18549222406 seconds) 2023-05-03 12:36:21 - progress_bar.py[line:274] - INFO: epoch 009: 2761 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7740.1, nsentences=120, sample_size=3866.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=599.4, ups=0.08, wpb=7740.1, bsz=120, num_updates=51010, lr=4.97051e-06, gnorm=0.985, clip=40, loss_scale=128, train_wall=40, gb_free=30.7, wall=208993 2023-05-03 12:37:00 - progress_bar.py[line:274] - INFO: epoch 009: 2771 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=8007.6, nsentences=120, sample_size=4204.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2031.9, ups=0.25, wpb=8007.6, bsz=120, num_updates=51020, lr=4.96523e-06, gnorm=0.955, clip=20, loss_scale=128, train_wall=39, gb_free=24.8, 
wall=209033 2023-05-03 12:37:28 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 12:37:44 - progress_bar.py[line:274] - INFO: epoch 009: 2782 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7536.7, nsentences=120, sample_size=4123.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1713.4, ups=0.23, wpb=7536.7, bsz=120, num_updates=51030, lr=4.95994e-06, gnorm=0.972, clip=20, loss_scale=64, train_wall=44, gb_free=29.8, wall=209077 2023-05-03 12:38:24 - progress_bar.py[line:274] - INFO: epoch 009: 2792 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7635.2, nsentences=120, sample_size=4053.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1903.3, ups=0.25, wpb=7635.2, bsz=120, num_updates=51040, lr=4.95466e-06, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=31.4, wall=209117 2023-05-03 12:39:03 - progress_bar.py[line:274] - INFO: epoch 009: 2802 / 6042 loss=2.392, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7513.2, nsentences=120, sample_size=4177.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1924.2, ups=0.26, wpb=7513.2, bsz=120, num_updates=51050, lr=4.94938e-06, gnorm=0.973, clip=30, loss_scale=64, train_wall=39, gb_free=30.9, wall=209156 2023-05-03 12:39:44 - progress_bar.py[line:274] - INFO: epoch 009: 2812 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7794.8, nsentences=120, sample_size=4005.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1902.6, ups=0.24, wpb=7794.8, bsz=120, num_updates=51060, lr=4.9441e-06, gnorm=1.002, clip=40, loss_scale=64, train_wall=41, gb_free=30.1, wall=209197 2023-05-03 12:40:24 - progress_bar.py[line:274] - INFO: epoch 009: 2822 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7587.6, nsentences=120, sample_size=3799.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1923.7, ups=0.25, wpb=7587.6, bsz=120, num_updates=51070, lr=4.93882e-06, gnorm=0.997, clip=50, 
loss_scale=64, train_wall=39, gb_free=29.5, wall=209236 2023-05-03 12:41:03 - progress_bar.py[line:274] - INFO: epoch 009: 2832 / 6042 loss=2.32, loss_v1=0, loss_v2=0, nll_loss=1.057, ntokens=7819.4, nsentences=120, sample_size=4094, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1978.5, ups=0.25, wpb=7819.4, bsz=120, num_updates=51080, lr=4.93353e-06, gnorm=0.978, clip=20, loss_scale=64, train_wall=39, gb_free=29.6, wall=209276 2023-05-03 12:41:43 - progress_bar.py[line:274] - INFO: epoch 009: 2842 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7637.1, nsentences=120, sample_size=3775, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1906.6, ups=0.25, wpb=7637.1, bsz=120, num_updates=51090, lr=4.92825e-06, gnorm=1.017, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=209316 2023-05-03 12:42:22 - progress_bar.py[line:274] - INFO: epoch 009: 2852 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7699.4, nsentences=120, sample_size=4216.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1961.5, ups=0.25, wpb=7699.4, bsz=120, num_updates=51100, lr=4.92297e-06, gnorm=0.984, clip=50, loss_scale=64, train_wall=39, gb_free=30.5, wall=209355 2023-05-03 12:43:01 - progress_bar.py[line:274] - INFO: epoch 009: 2862 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7454.2, nsentences=120, sample_size=3774.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1929.6, ups=0.26, wpb=7454.2, bsz=120, num_updates=51110, lr=4.91769e-06, gnorm=1.045, clip=80, loss_scale=64, train_wall=39, gb_free=30.1, wall=209394 2023-05-03 12:43:40 - progress_bar.py[line:274] - INFO: epoch 009: 2872 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7938.6, nsentences=120, sample_size=4147.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2018.6, ups=0.25, wpb=7938.6, bsz=120, num_updates=51120, lr=4.9124e-06, gnorm=0.977, clip=40, loss_scale=64, train_wall=39, gb_free=29.2, wall=209433 2023-05-03 12:44:20 - 
progress_bar.py[line:274] - INFO: epoch 009: 2882 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7794, nsentences=120, sample_size=4219.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1962.7, ups=0.25, wpb=7794, bsz=120, num_updates=51130, lr=4.90712e-06, gnorm=0.971, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=209473 2023-05-03 12:45:00 - progress_bar.py[line:274] - INFO: epoch 009: 2892 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7594.1, nsentences=120, sample_size=4053.1, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1896.1, ups=0.25, wpb=7594.1, bsz=120, num_updates=51140, lr=4.90184e-06, gnorm=0.983, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=209513 2023-05-03 12:45:39 - progress_bar.py[line:274] - INFO: epoch 009: 2902 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7768.2, nsentences=120, sample_size=4053.7, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1998.1, ups=0.26, wpb=7768.2, bsz=120, num_updates=51150, lr=4.89656e-06, gnorm=0.989, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=209552 2023-05-03 12:46:18 - progress_bar.py[line:274] - INFO: epoch 009: 2912 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7597.8, nsentences=120, sample_size=3905.1, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1957, ups=0.26, wpb=7597.8, bsz=120, num_updates=51160, lr=4.89128e-06, gnorm=0.992, clip=50, loss_scale=64, train_wall=39, gb_free=29.4, wall=209590 2023-05-03 12:46:57 - progress_bar.py[line:274] - INFO: epoch 009: 2922 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7507.2, nsentences=120, sample_size=3660.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1910.3, ups=0.25, wpb=7507.2, bsz=120, num_updates=51170, lr=4.88599e-06, gnorm=1.045, clip=70, loss_scale=64, train_wall=39, gb_free=29.7, wall=209630 2023-05-03 12:47:38 - progress_bar.py[line:274] - INFO: epoch 009: 2932 / 6042 loss=2.366, loss_v1=0, loss_v2=0, 
nll_loss=1.114, ntokens=7970.9, nsentences=120, sample_size=4014.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1962, ups=0.25, wpb=7970.9, bsz=120, num_updates=51180, lr=4.88071e-06, gnorm=0.978, clip=40, loss_scale=64, train_wall=41, gb_free=29.8, wall=209670 2023-05-03 12:48:18 - progress_bar.py[line:274] - INFO: epoch 009: 2942 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=8218.3, nsentences=120, sample_size=3834.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=2061.6, ups=0.25, wpb=8218.3, bsz=120, num_updates=51190, lr=4.87543e-06, gnorm=0.988, clip=50, loss_scale=64, train_wall=40, gb_free=29.8, wall=209710 2023-05-03 12:48:58 - progress_bar.py[line:274] - INFO: epoch 009: 2952 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7884.3, nsentences=120, sample_size=3963.2, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1970.5, ups=0.25, wpb=7884.3, bsz=120, num_updates=51200, lr=4.87015e-06, gnorm=0.995, clip=40, loss_scale=64, train_wall=40, gb_free=28.8, wall=209750 2023-05-03 12:49:38 - progress_bar.py[line:274] - INFO: epoch 009: 2962 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7844.4, nsentences=120, sample_size=3947.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1962.7, ups=0.25, wpb=7844.4, bsz=120, num_updates=51210, lr=4.86486e-06, gnorm=0.96, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=209790 2023-05-03 12:50:18 - progress_bar.py[line:274] - INFO: epoch 009: 2972 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7639, nsentences=120, sample_size=3785.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1916.6, ups=0.25, wpb=7639, bsz=120, num_updates=51220, lr=4.85958e-06, gnorm=1.025, clip=60, loss_scale=64, train_wall=40, gb_free=30, wall=209830 2023-05-03 12:50:57 - progress_bar.py[line:274] - INFO: epoch 009: 2982 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7744.9, nsentences=120, sample_size=3838.8, sample_size_v1=0, 
sample_size_v2=0, ppl=2.1, wps=1962.6, ups=0.25, wpb=7744.9, bsz=120, num_updates=51230, lr=4.8543e-06, gnorm=1.001, clip=50, loss_scale=64, train_wall=39, gb_free=31, wall=209869 2023-05-03 12:51:36 - progress_bar.py[line:274] - INFO: epoch 009: 2992 / 6042 loss=2.327, loss_v1=0, loss_v2=0, nll_loss=1.061, ntokens=7546.2, nsentences=120, sample_size=4113.1, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1921.6, ups=0.25, wpb=7546.2, bsz=120, num_updates=51240, lr=4.84902e-06, gnorm=0.996, clip=40, loss_scale=64, train_wall=39, gb_free=29.4, wall=209909 2023-05-03 12:52:16 - progress_bar.py[line:274] - INFO: epoch 009: 3002 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7762.1, nsentences=120, sample_size=3946.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1949.2, ups=0.25, wpb=7762.1, bsz=120, num_updates=51250, lr=4.84374e-06, gnorm=0.991, clip=50, loss_scale=64, train_wall=40, gb_free=30.7, wall=209949 2023-05-03 12:52:55 - progress_bar.py[line:274] - INFO: epoch 009: 3012 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7884.1, nsentences=120, sample_size=3681.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2020.4, ups=0.26, wpb=7884.1, bsz=120, num_updates=51260, lr=4.83845e-06, gnorm=0.997, clip=40, loss_scale=64, train_wall=39, gb_free=29.3, wall=209988 2023-05-03 12:53:36 - progress_bar.py[line:274] - INFO: epoch 009: 3022 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7425.3, nsentences=120, sample_size=4272.1, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1830, ups=0.25, wpb=7425.3, bsz=120, num_updates=51270, lr=4.83317e-06, gnorm=0.957, clip=30, loss_scale=64, train_wall=41, gb_free=30.4, wall=210028 2023-05-03 12:53:40 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-03 12:54:18 - progress_bar.py[line:274] - INFO: epoch 009: 3033 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7408.7, 
nsentences=120, sample_size=3967.7, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1741.6, ups=0.24, wpb=7408.7, bsz=120, num_updates=51280, lr=4.82789e-06, gnorm=0.982, clip=50, loss_scale=32, train_wall=42, gb_free=30.1, wall=210071 2023-05-03 12:54:58 - progress_bar.py[line:274] - INFO: epoch 009: 3043 / 6042 loss=2.323, loss_v1=0, loss_v2=0, nll_loss=1.06, ntokens=7667.4, nsentences=120, sample_size=4104.1, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1949.6, ups=0.25, wpb=7667.4, bsz=120, num_updates=51290, lr=4.82261e-06, gnorm=0.992, clip=40, loss_scale=32, train_wall=39, gb_free=31, wall=210110 2023-05-03 12:55:37 - progress_bar.py[line:274] - INFO: epoch 009: 3053 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7707.7, nsentences=120, sample_size=4245.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1962.3, ups=0.25, wpb=7707.7, bsz=120, num_updates=51300, lr=4.81733e-06, gnorm=0.954, clip=20, loss_scale=32, train_wall=39, gb_free=26.1, wall=210149 2023-05-03 12:56:17 - progress_bar.py[line:274] - INFO: epoch 009: 3063 / 6042 loss=2.327, loss_v1=0, loss_v2=0, nll_loss=1.063, ntokens=7714.9, nsentences=120, sample_size=3876.1, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1930.7, ups=0.25, wpb=7714.9, bsz=120, num_updates=51310, lr=4.81204e-06, gnorm=1.003, clip=60, loss_scale=32, train_wall=40, gb_free=30.8, wall=210189 2023-05-03 12:56:57 - progress_bar.py[line:274] - INFO: epoch 009: 3073 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7382.8, nsentences=120, sample_size=4300.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1844.2, ups=0.25, wpb=7382.8, bsz=120, num_updates=51320, lr=4.80676e-06, gnorm=0.987, clip=50, loss_scale=32, train_wall=40, gb_free=29.7, wall=210229 2023-05-03 12:57:37 - progress_bar.py[line:274] - INFO: epoch 009: 3083 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7669.1, nsentences=120, sample_size=3733, sample_size_v1=0, sample_size_v2=0, ppl=2.12, 
wps=1925.2, ups=0.25, wpb=7669.1, bsz=120, num_updates=51330, lr=4.80148e-06, gnorm=1.048, clip=70, loss_scale=32, train_wall=40, gb_free=30.1, wall=210269 2023-05-03 12:58:16 - progress_bar.py[line:274] - INFO: epoch 009: 3093 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7778.4, nsentences=120, sample_size=3756.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1986, ups=0.26, wpb=7778.4, bsz=120, num_updates=51340, lr=4.7962e-06, gnorm=1.037, clip=80, loss_scale=32, train_wall=39, gb_free=28.7, wall=210308 2023-05-03 12:58:55 - progress_bar.py[line:274] - INFO: epoch 009: 3103 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7609.1, nsentences=120, sample_size=3645.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1943.7, ups=0.26, wpb=7609.1, bsz=120, num_updates=51350, lr=4.79091e-06, gnorm=1.026, clip=60, loss_scale=32, train_wall=39, gb_free=29.8, wall=210347 2023-05-03 12:59:34 - progress_bar.py[line:274] - INFO: epoch 009: 3113 / 6042 loss=2.412, loss_v1=0, loss_v2=0, nll_loss=1.17, ntokens=7735.7, nsentences=120, sample_size=3971.7, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=1970.2, ups=0.25, wpb=7735.7, bsz=120, num_updates=51360, lr=4.78563e-06, gnorm=1.011, clip=70, loss_scale=32, train_wall=39, gb_free=30.9, wall=210387 2023-05-03 13:00:14 - progress_bar.py[line:274] - INFO: epoch 009: 3123 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7602.8, nsentences=120, sample_size=4292.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1928.7, ups=0.25, wpb=7602.8, bsz=120, num_updates=51370, lr=4.78035e-06, gnorm=0.974, clip=40, loss_scale=32, train_wall=39, gb_free=28.7, wall=210426 2023-05-03 13:00:53 - progress_bar.py[line:274] - INFO: epoch 009: 3133 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7355.1, nsentences=120, sample_size=4290.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1878.6, ups=0.26, wpb=7355.1, bsz=120, num_updates=51380, lr=4.77507e-06, 
gnorm=0.974, clip=30, loss_scale=32, train_wall=39, gb_free=30.9, wall=210465 2023-05-03 13:01:34 - progress_bar.py[line:274] - INFO: epoch 009: 3143 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7878.6, nsentences=120, sample_size=3801.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1933.6, ups=0.25, wpb=7878.6, bsz=120, num_updates=51390, lr=4.76979e-06, gnorm=1.008, clip=50, loss_scale=32, train_wall=41, gb_free=29.6, wall=210506 2023-05-03 13:02:14 - progress_bar.py[line:274] - INFO: epoch 009: 3153 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7801.5, nsentences=120, sample_size=3988.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1953.8, ups=0.25, wpb=7801.5, bsz=120, num_updates=51400, lr=4.7645e-06, gnorm=1.015, clip=50, loss_scale=32, train_wall=40, gb_free=29.9, wall=210546 2023-05-03 13:02:53 - progress_bar.py[line:274] - INFO: epoch 009: 3163 / 6042 loss=2.319, loss_v1=0, loss_v2=0, nll_loss=1.057, ntokens=8059.2, nsentences=120, sample_size=3659.6, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=2022.8, ups=0.25, wpb=8059.2, bsz=120, num_updates=51410, lr=4.75922e-06, gnorm=1.023, clip=80, loss_scale=32, train_wall=40, gb_free=27.6, wall=210586 2023-05-03 13:03:33 - progress_bar.py[line:274] - INFO: epoch 009: 3173 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7528.8, nsentences=120, sample_size=4106, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1918.7, ups=0.25, wpb=7528.8, bsz=120, num_updates=51420, lr=4.75394e-06, gnorm=0.971, clip=40, loss_scale=32, train_wall=39, gb_free=29.9, wall=210625 2023-05-03 13:04:14 - progress_bar.py[line:274] - INFO: epoch 009: 3183 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7820.6, nsentences=120, sample_size=4179.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1902.5, ups=0.24, wpb=7820.6, bsz=120, num_updates=51430, lr=4.74866e-06, gnorm=0.964, clip=40, loss_scale=32, train_wall=41, gb_free=29.7, wall=210666 2023-05-03 
13:04:54 - progress_bar.py[line:274] - INFO: epoch 009: 3193 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7660.1, nsentences=120, sample_size=3959.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1925.3, ups=0.25, wpb=7660.1, bsz=120, num_updates=51440, lr=4.74338e-06, gnorm=0.996, clip=40, loss_scale=32, train_wall=40, gb_free=29.3, wall=210706 2023-05-03 13:05:33 - progress_bar.py[line:274] - INFO: epoch 009: 3203 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7814.2, nsentences=120, sample_size=3808.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1957, ups=0.25, wpb=7814.2, bsz=120, num_updates=51450, lr=4.73809e-06, gnorm=0.997, clip=40, loss_scale=32, train_wall=40, gb_free=31.4, wall=210746 2023-05-03 13:06:13 - progress_bar.py[line:274] - INFO: epoch 009: 3213 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7714.9, nsentences=120, sample_size=4138.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1943.9, ups=0.25, wpb=7714.9, bsz=120, num_updates=51460, lr=4.73281e-06, gnorm=0.977, clip=30, loss_scale=32, train_wall=40, gb_free=30.4, wall=210786 2023-05-03 13:06:53 - progress_bar.py[line:274] - INFO: epoch 009: 3223 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7974, nsentences=120, sample_size=4013.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1985.8, ups=0.25, wpb=7974, bsz=120, num_updates=51470, lr=4.72753e-06, gnorm=0.948, clip=20, loss_scale=32, train_wall=40, gb_free=28.2, wall=210826 2023-05-03 13:07:34 - progress_bar.py[line:274] - INFO: epoch 009: 3233 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7628.9, nsentences=120, sample_size=4132.4, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1893.8, ups=0.25, wpb=7628.9, bsz=120, num_updates=51480, lr=4.72225e-06, gnorm=0.978, clip=40, loss_scale=32, train_wall=40, gb_free=30.3, wall=210866 2023-05-03 13:08:13 - progress_bar.py[line:274] - INFO: epoch 009: 3243 / 6042 loss=2.36, loss_v1=0, 
loss_v2=0, nll_loss=1.101, ntokens=7809.3, nsentences=120, sample_size=4064.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1969.6, ups=0.25, wpb=7809.3, bsz=120, num_updates=51490, lr=4.71696e-06, gnorm=0.984, clip=40, loss_scale=32, train_wall=40, gb_free=28.2, wall=210906 2023-05-03 13:08:53 - progress_bar.py[line:274] - INFO: epoch 009: 3253 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7544, nsentences=120, sample_size=3776.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1906.9, ups=0.25, wpb=7544, bsz=120, num_updates=51500, lr=4.71168e-06, gnorm=1.006, clip=70, loss_scale=32, train_wall=39, gb_free=28.3, wall=210945 2023-05-03 13:09:32 - progress_bar.py[line:274] - INFO: epoch 009: 3263 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7497.4, nsentences=120, sample_size=4017.4, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1889, ups=0.25, wpb=7497.4, bsz=120, num_updates=51510, lr=4.7064e-06, gnorm=0.98, clip=30, loss_scale=32, train_wall=40, gb_free=29.7, wall=210985 2023-05-03 13:10:12 - progress_bar.py[line:274] - INFO: epoch 009: 3273 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.072, ntokens=7872.8, nsentences=120, sample_size=3914.8, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1994.8, ups=0.25, wpb=7872.8, bsz=120, num_updates=51520, lr=4.70112e-06, gnorm=0.99, clip=40, loss_scale=32, train_wall=39, gb_free=31.4, wall=211024 2023-05-03 13:10:52 - progress_bar.py[line:274] - INFO: epoch 009: 3283 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.144, ntokens=7708.8, nsentences=120, sample_size=3929.7, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1920.3, ups=0.25, wpb=7708.8, bsz=120, num_updates=51530, lr=4.69584e-06, gnorm=0.977, clip=30, loss_scale=32, train_wall=40, gb_free=29.8, wall=211065 2023-05-03 13:11:32 - progress_bar.py[line:274] - INFO: epoch 009: 3293 / 6042 loss=2.305, loss_v1=0, loss_v2=0, nll_loss=1.035, ntokens=7378.5, nsentences=120, sample_size=3975.5, 
sample_size_v1=0, sample_size_v2=0, ppl=2.05, wps=1857.8, ups=0.25, wpb=7378.5, bsz=120, num_updates=51540, lr=4.69055e-06, gnorm=0.994, clip=50, loss_scale=32, train_wall=40, gb_free=30.3, wall=211104 2023-05-03 13:12:12 - progress_bar.py[line:274] - INFO: epoch 009: 3303 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7759.6, nsentences=120, sample_size=4191, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1952.5, ups=0.25, wpb=7759.6, bsz=120, num_updates=51550, lr=4.68527e-06, gnorm=0.955, clip=20, loss_scale=32, train_wall=40, gb_free=27, wall=211144 2023-05-03 13:12:51 - progress_bar.py[line:274] - INFO: epoch 009: 3313 / 6042 loss=2.314, loss_v1=0, loss_v2=0, nll_loss=1.048, ntokens=7436.2, nsentences=120, sample_size=4053.6, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1903.5, ups=0.26, wpb=7436.2, bsz=120, num_updates=51560, lr=4.67999e-06, gnorm=0.981, clip=40, loss_scale=32, train_wall=39, gb_free=29.8, wall=211183 2023-05-03 13:13:31 - progress_bar.py[line:274] - INFO: epoch 009: 3323 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7728.1, nsentences=120, sample_size=4076.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1934.9, ups=0.25, wpb=7728.1, bsz=120, num_updates=51570, lr=4.67471e-06, gnorm=0.969, clip=40, loss_scale=32, train_wall=40, gb_free=30.1, wall=211223 2023-05-03 13:14:11 - progress_bar.py[line:274] - INFO: epoch 009: 3333 / 6042 loss=2.331, loss_v1=0, loss_v2=0, nll_loss=1.066, ntokens=7559.7, nsentences=120, sample_size=3890.6, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1869.6, ups=0.25, wpb=7559.7, bsz=120, num_updates=51580, lr=4.66943e-06, gnorm=0.998, clip=60, loss_scale=32, train_wall=40, gb_free=28.9, wall=211263 2023-05-03 13:14:51 - progress_bar.py[line:274] - INFO: epoch 009: 3343 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7774.6, nsentences=120, sample_size=4006.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1951.7, ups=0.25, wpb=7774.6, bsz=120, 
num_updates=51590, lr=4.66414e-06, gnorm=0.981, clip=30, loss_scale=32, train_wall=40, gb_free=30.4, wall=211303 2023-05-03 13:15:30 - progress_bar.py[line:274] - INFO: epoch 009: 3353 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7903.4, nsentences=120, sample_size=3963.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2013.1, ups=0.25, wpb=7903.4, bsz=120, num_updates=51600, lr=4.65886e-06, gnorm=1.001, clip=50, loss_scale=32, train_wall=39, gb_free=31.4, wall=211343 2023-05-03 13:16:10 - progress_bar.py[line:274] - INFO: epoch 009: 3363 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7848.2, nsentences=120, sample_size=3866.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1962.7, ups=0.25, wpb=7848.2, bsz=120, num_updates=51610, lr=4.65358e-06, gnorm=1.006, clip=40, loss_scale=32, train_wall=40, gb_free=29.5, wall=211383 2023-05-03 13:16:50 - progress_bar.py[line:274] - INFO: epoch 009: 3373 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7693.2, nsentences=120, sample_size=3985.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1943.8, ups=0.25, wpb=7693.2, bsz=120, num_updates=51620, lr=4.6483e-06, gnorm=1.012, clip=50, loss_scale=32, train_wall=40, gb_free=30.3, wall=211422 2023-05-03 13:17:30 - progress_bar.py[line:274] - INFO: epoch 009: 3383 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7806.4, nsentences=120, sample_size=4085.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1940.8, ups=0.25, wpb=7806.4, bsz=120, num_updates=51630, lr=4.64301e-06, gnorm=0.987, clip=50, loss_scale=32, train_wall=40, gb_free=30.6, wall=211462 2023-05-03 13:18:09 - progress_bar.py[line:274] - INFO: epoch 009: 3393 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7624, nsentences=120, sample_size=3828.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1939.3, ups=0.25, wpb=7624, bsz=120, num_updates=51640, lr=4.63773e-06, gnorm=0.999, clip=40, loss_scale=32, train_wall=39, 
gb_free=27.1, wall=211502 2023-05-03 13:18:50 - progress_bar.py[line:274] - INFO: epoch 009: 3403 / 6042 loss=2.414, loss_v1=0, loss_v2=0, nll_loss=1.161, ntokens=7713.8, nsentences=120, sample_size=3842.4, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1905.4, ups=0.25, wpb=7713.8, bsz=120, num_updates=51650, lr=4.63245e-06, gnorm=1.004, clip=50, loss_scale=32, train_wall=40, gb_free=30.4, wall=211542 2023-05-03 13:19:29 - progress_bar.py[line:274] - INFO: epoch 009: 3413 / 6042 loss=2.322, loss_v1=0, loss_v2=0, nll_loss=1.056, ntokens=7448.5, nsentences=120, sample_size=4233.7, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1878.1, ups=0.25, wpb=7448.5, bsz=120, num_updates=51660, lr=4.62717e-06, gnorm=0.963, clip=30, loss_scale=32, train_wall=40, gb_free=30, wall=211582 2023-05-03 13:20:10 - progress_bar.py[line:274] - INFO: epoch 009: 3423 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7443.5, nsentences=120, sample_size=4219.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1828.1, ups=0.25, wpb=7443.5, bsz=120, num_updates=51670, lr=4.62189e-06, gnorm=0.973, clip=20, loss_scale=32, train_wall=41, gb_free=28.4, wall=211623 2023-05-03 13:20:50 - progress_bar.py[line:274] - INFO: epoch 009: 3433 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7686, nsentences=120, sample_size=4158.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1914.5, ups=0.25, wpb=7686, bsz=120, num_updates=51680, lr=4.6166e-06, gnorm=0.962, clip=20, loss_scale=32, train_wall=40, gb_free=29.8, wall=211663 2023-05-03 13:21:30 - progress_bar.py[line:274] - INFO: epoch 009: 3443 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7885.5, nsentences=120, sample_size=4124.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1970.6, ups=0.25, wpb=7885.5, bsz=120, num_updates=51690, lr=4.61132e-06, gnorm=0.973, clip=20, loss_scale=32, train_wall=40, gb_free=29.9, wall=211703 2023-05-03 13:22:10 - progress_bar.py[line:274] - INFO: epoch 
009: 3453 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.074, ntokens=7641.6, nsentences=120, sample_size=3884.3, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1909.7, ups=0.25, wpb=7641.6, bsz=120, num_updates=51700, lr=4.60604e-06, gnorm=1.022, clip=60, loss_scale=32, train_wall=40, gb_free=29.9, wall=211743 2023-05-03 13:22:50 - progress_bar.py[line:274] - INFO: epoch 009: 3463 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=8100.9, nsentences=120, sample_size=3949.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2029.2, ups=0.25, wpb=8100.9, bsz=120, num_updates=51710, lr=4.60076e-06, gnorm=1.04, clip=70, loss_scale=32, train_wall=40, gb_free=27.2, wall=211783 2023-05-03 13:23:31 - progress_bar.py[line:274] - INFO: epoch 009: 3473 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7903.2, nsentences=120, sample_size=4363.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1937.5, ups=0.25, wpb=7903.2, bsz=120, num_updates=51720, lr=4.59547e-06, gnorm=0.942, clip=30, loss_scale=32, train_wall=41, gb_free=28, wall=211823 2023-05-03 13:24:10 - progress_bar.py[line:274] - INFO: epoch 009: 3483 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.072, ntokens=7384.1, nsentences=120, sample_size=4249.4, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1879.7, ups=0.25, wpb=7384.1, bsz=120, num_updates=51730, lr=4.59019e-06, gnorm=0.995, clip=50, loss_scale=32, train_wall=39, gb_free=30.2, wall=211863 2023-05-03 13:24:49 - progress_bar.py[line:274] - INFO: epoch 009: 3493 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7477.1, nsentences=120, sample_size=3917.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1950.1, ups=0.26, wpb=7477.1, bsz=120, num_updates=51740, lr=4.58491e-06, gnorm=1.007, clip=50, loss_scale=32, train_wall=38, gb_free=30.2, wall=211901 2023-05-03 13:25:28 - progress_bar.py[line:274] - INFO: epoch 009: 3503 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7855.9, 
nsentences=120, sample_size=4229.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1982.6, ups=0.25, wpb=7855.9, bsz=120, num_updates=51750, lr=4.57963e-06, gnorm=0.96, clip=10, loss_scale=32, train_wall=40, gb_free=30.1, wall=211941 2023-05-03 13:26:08 - progress_bar.py[line:274] - INFO: epoch 009: 3513 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7739.1, nsentences=120, sample_size=4122.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1956.9, ups=0.25, wpb=7739.1, bsz=120, num_updates=51760, lr=4.57435e-06, gnorm=0.983, clip=40, loss_scale=32, train_wall=39, gb_free=30.6, wall=211980 2023-05-03 13:26:47 - progress_bar.py[line:274] - INFO: epoch 009: 3523 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7857.3, nsentences=120, sample_size=3935.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1981.2, ups=0.25, wpb=7857.3, bsz=120, num_updates=51770, lr=4.56906e-06, gnorm=0.991, clip=50, loss_scale=32, train_wall=40, gb_free=30.4, wall=212020 2023-05-03 13:27:28 - progress_bar.py[line:274] - INFO: epoch 009: 3533 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7666.9, nsentences=120, sample_size=4082.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1879.9, ups=0.25, wpb=7666.9, bsz=120, num_updates=51780, lr=4.56378e-06, gnorm=0.947, clip=10, loss_scale=32, train_wall=41, gb_free=29.1, wall=212061 2023-05-03 13:28:08 - progress_bar.py[line:274] - INFO: epoch 009: 3543 / 6042 loss=2.331, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7557.5, nsentences=120, sample_size=4052.1, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1881.7, ups=0.25, wpb=7557.5, bsz=120, num_updates=51790, lr=4.5585e-06, gnorm=0.964, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=212101 2023-05-03 13:28:48 - progress_bar.py[line:274] - INFO: epoch 009: 3553 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7894.7, nsentences=120, sample_size=3902, sample_size_v1=0, sample_size_v2=0, ppl=2.15, 
wps=1972.9, ups=0.25, wpb=7894.7, bsz=120, num_updates=51800, lr=4.55322e-06, gnorm=0.993, clip=30, loss_scale=64, train_wall=40, gb_free=26.5, wall=212141 2023-05-03 13:29:29 - progress_bar.py[line:274] - INFO: epoch 009: 3563 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7548.4, nsentences=120, sample_size=4257.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1873.3, ups=0.25, wpb=7548.4, bsz=120, num_updates=51810, lr=4.54794e-06, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=212181 2023-05-03 13:30:09 - progress_bar.py[line:274] - INFO: epoch 009: 3573 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7986, nsentences=120, sample_size=4140.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2005.3, ups=0.25, wpb=7986, bsz=120, num_updates=51820, lr=4.54265e-06, gnorm=0.972, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=212221 2023-05-03 13:30:49 - progress_bar.py[line:274] - INFO: epoch 009: 3583 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7818.9, nsentences=120, sample_size=3826.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1929.9, ups=0.25, wpb=7818.9, bsz=120, num_updates=51830, lr=4.53737e-06, gnorm=1.029, clip=60, loss_scale=64, train_wall=40, gb_free=30.1, wall=212261 2023-05-03 13:31:28 - progress_bar.py[line:274] - INFO: epoch 009: 3593 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7695.2, nsentences=120, sample_size=4121.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1969.4, ups=0.26, wpb=7695.2, bsz=120, num_updates=51840, lr=4.53209e-06, gnorm=0.98, clip=30, loss_scale=64, train_wall=39, gb_free=27.4, wall=212301 2023-05-03 13:32:08 - progress_bar.py[line:274] - INFO: epoch 009: 3603 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7879.5, nsentences=120, sample_size=4080.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1961.9, ups=0.25, wpb=7879.5, bsz=120, num_updates=51850, lr=4.52681e-06, 
gnorm=0.963, clip=20, loss_scale=64, train_wall=40, gb_free=28.7, wall=212341 2023-05-03 13:32:47 - progress_bar.py[line:274] - INFO: epoch 009: 3613 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7675.4, nsentences=120, sample_size=4399.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1987.9, ups=0.26, wpb=7675.4, bsz=120, num_updates=51860, lr=4.52152e-06, gnorm=0.948, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=212379 2023-05-03 13:33:27 - progress_bar.py[line:274] - INFO: epoch 009: 3623 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7793.4, nsentences=120, sample_size=4000.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1928.5, ups=0.25, wpb=7793.4, bsz=120, num_updates=51870, lr=4.51624e-06, gnorm=1.014, clip=60, loss_scale=64, train_wall=40, gb_free=29.7, wall=212420 2023-05-03 13:34:07 - progress_bar.py[line:274] - INFO: epoch 009: 3633 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7685.4, nsentences=120, sample_size=4076.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1921.1, ups=0.25, wpb=7685.4, bsz=120, num_updates=51880, lr=4.51096e-06, gnorm=0.977, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=212460 2023-05-03 13:34:47 - progress_bar.py[line:274] - INFO: epoch 009: 3643 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7610.5, nsentences=120, sample_size=4338.7, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1910.8, ups=0.25, wpb=7610.5, bsz=120, num_updates=51890, lr=4.50568e-06, gnorm=0.945, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=212500 2023-05-03 13:35:27 - progress_bar.py[line:274] - INFO: epoch 009: 3653 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7775.8, nsentences=120, sample_size=3875.2, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1959.4, ups=0.25, wpb=7775.8, bsz=120, num_updates=51900, lr=4.5004e-06, gnorm=0.984, clip=40, loss_scale=64, train_wall=40, gb_free=30.6, wall=212539 
2023-05-03 13:36:07 - progress_bar.py[line:274] - INFO: epoch 009: 3663 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=8045.9, nsentences=120, sample_size=4097.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1994.9, ups=0.25, wpb=8045.9, bsz=120, num_updates=51910, lr=4.49511e-06, gnorm=0.997, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=212580 2023-05-03 13:36:46 - progress_bar.py[line:274] - INFO: epoch 009: 3673 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7718, nsentences=120, sample_size=4139.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1985.6, ups=0.26, wpb=7718, bsz=120, num_updates=51920, lr=4.48983e-06, gnorm=0.975, clip=30, loss_scale=64, train_wall=39, gb_free=31, wall=212618 2023-05-03 13:37:25 - progress_bar.py[line:274] - INFO: epoch 009: 3683 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7617.5, nsentences=120, sample_size=3971.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1947.7, ups=0.26, wpb=7617.5, bsz=120, num_updates=51930, lr=4.48455e-06, gnorm=1, clip=60, loss_scale=64, train_wall=39, gb_free=30.4, wall=212658 2023-05-03 13:38:04 - progress_bar.py[line:274] - INFO: epoch 009: 3693 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7502.4, nsentences=120, sample_size=4165.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1915.8, ups=0.26, wpb=7502.4, bsz=120, num_updates=51940, lr=4.47927e-06, gnorm=0.973, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=212697 2023-05-03 13:38:44 - progress_bar.py[line:274] - INFO: epoch 009: 3703 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7651.5, nsentences=120, sample_size=4143.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1902.8, ups=0.25, wpb=7651.5, bsz=120, num_updates=51950, lr=4.47399e-06, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=212737 2023-05-03 13:39:24 - progress_bar.py[line:274] - INFO: epoch 009: 3713 / 6042 loss=2.342, 
loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7848.8, nsentences=120, sample_size=3985.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1972.5, ups=0.25, wpb=7848.8, bsz=120, num_updates=51960, lr=4.4687e-06, gnorm=0.982, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=212777 2023-05-03 13:40:03 - progress_bar.py[line:274] - INFO: epoch 009: 3723 / 6042 loss=2.309, loss_v1=0, loss_v2=0, nll_loss=1.04, ntokens=7548.7, nsentences=120, sample_size=3840.9, sample_size_v1=0, sample_size_v2=0, ppl=2.06, wps=1932.8, ups=0.26, wpb=7548.7, bsz=120, num_updates=51970, lr=4.46342e-06, gnorm=1.001, clip=50, loss_scale=64, train_wall=39, gb_free=29.6, wall=212816 2023-05-03 13:40:44 - progress_bar.py[line:274] - INFO: epoch 009: 3733 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7725.6, nsentences=120, sample_size=3935.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1894, ups=0.25, wpb=7725.6, bsz=120, num_updates=51980, lr=4.45814e-06, gnorm=0.994, clip=40, loss_scale=64, train_wall=41, gb_free=30.2, wall=212857 2023-05-03 13:41:25 - progress_bar.py[line:274] - INFO: epoch 009: 3743 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7961.9, nsentences=120, sample_size=4321.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1954.7, ups=0.25, wpb=7961.9, bsz=120, num_updates=51990, lr=4.45286e-06, gnorm=0.948, clip=10, loss_scale=64, train_wall=41, gb_free=30.3, wall=212897 2023-05-03 13:42:05 - progress_bar.py[line:274] - INFO: epoch 009: 3753 / 6042 loss=2.325, loss_v1=0, loss_v2=0, nll_loss=1.061, ntokens=7451.8, nsentences=120, sample_size=4117.7, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1870.4, ups=0.25, wpb=7451.8, bsz=120, num_updates=52000, lr=4.44757e-06, gnorm=1.002, clip=50, loss_scale=64, train_wall=40, gb_free=29.2, wall=212937 2023-05-03 13:42:05 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 13:42:06 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 13:42:06 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 13:42:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:19 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-03 13:42:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:23 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 13:42:23 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 13:42:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:31 - task_invig.py[line:181] - INFO: example hyp(gen): 
region: 2023-05-03 13:42:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:35 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 13:42:35 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 13:42:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:43 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-03 13:42:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:47 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 13:42:47 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 13:42:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:51 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 13:42:51 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 13:42:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:56 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 13:42:56 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 13:42:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 13:42:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 13:42:56 - progress_bar.py[line:282] - INFO: epoch 009 | valid on 'valid' subset | loss 3.264 | loss_v1 0 | loss_v2 0 | nll_loss 2.098 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.28 | score 0.7549 | wps 3293.4 | wpb 3202.1 | bsz 39.4 | num_updates 52000 | best_score 0.7627 2023-05-03 13:42:56 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 9 @ 52000 updates 2023-05-03 13:42:56 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_52000.pt 2023-05-03 13:43:20 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_52000.pt 2023-05-03 13:43:34 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_52000.pt (epoch 9 @ 52000 updates, 
score 0.7549) (writing took 38.14312398200855 seconds) 2023-05-03 13:44:13 - progress_bar.py[line:274] - INFO: epoch 009: 3763 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7882.3, nsentences=120, sample_size=3836, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=613.9, ups=0.08, wpb=7882.3, bsz=120, num_updates=52010, lr=4.44229e-06, gnorm=1.02, clip=60, loss_scale=64, train_wall=39, gb_free=29.5, wall=213066 2023-05-03 13:44:53 - progress_bar.py[line:274] - INFO: epoch 009: 3773 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7805.5, nsentences=120, sample_size=4077.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1960.2, ups=0.25, wpb=7805.5, bsz=120, num_updates=52020, lr=4.43701e-06, gnorm=0.947, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=213105 2023-05-03 13:45:33 - progress_bar.py[line:274] - INFO: epoch 009: 3783 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7484, nsentences=120, sample_size=4113.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1875.4, ups=0.25, wpb=7484, bsz=120, num_updates=52030, lr=4.43173e-06, gnorm=0.979, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=213145 2023-05-03 13:46:12 - progress_bar.py[line:274] - INFO: epoch 009: 3793 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7495.7, nsentences=120, sample_size=4254.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1891.1, ups=0.25, wpb=7495.7, bsz=120, num_updates=52040, lr=4.42645e-06, gnorm=0.999, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=213185 2023-05-03 13:46:52 - progress_bar.py[line:274] - INFO: epoch 009: 3803 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7777.1, nsentences=120, sample_size=3930.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1971.7, ups=0.25, wpb=7777.1, bsz=120, num_updates=52050, lr=4.42116e-06, gnorm=0.977, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=213224 2023-05-03 13:47:32 - 
progress_bar.py[line:274] - INFO: epoch 009: 3813 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=8051.7, nsentences=120, sample_size=3799.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=2020.8, ups=0.25, wpb=8051.7, bsz=120, num_updates=52060, lr=4.41588e-06, gnorm=0.996, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=213264 2023-05-03 13:48:11 - progress_bar.py[line:274] - INFO: epoch 009: 3823 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7541.6, nsentences=120, sample_size=4071.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1935.7, ups=0.26, wpb=7541.6, bsz=120, num_updates=52070, lr=4.4106e-06, gnorm=0.962, clip=50, loss_scale=64, train_wall=39, gb_free=30.2, wall=213303 2023-05-03 13:48:51 - progress_bar.py[line:274] - INFO: epoch 009: 3833 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7667, nsentences=120, sample_size=4012.7, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1908, ups=0.25, wpb=7667, bsz=120, num_updates=52080, lr=4.40532e-06, gnorm=0.981, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=213343 2023-05-03 13:49:32 - progress_bar.py[line:274] - INFO: epoch 009: 3843 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7991.9, nsentences=120, sample_size=4035.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1958.4, ups=0.25, wpb=7991.9, bsz=120, num_updates=52090, lr=4.40004e-06, gnorm=0.958, clip=20, loss_scale=64, train_wall=41, gb_free=30.6, wall=213384 2023-05-03 13:50:11 - progress_bar.py[line:274] - INFO: epoch 009: 3853 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7769.4, nsentences=120, sample_size=3811.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1965.6, ups=0.25, wpb=7769.4, bsz=120, num_updates=52100, lr=4.39475e-06, gnorm=0.989, clip=40, loss_scale=64, train_wall=39, gb_free=26.9, wall=213424 2023-05-03 13:50:51 - progress_bar.py[line:274] - INFO: epoch 009: 3863 / 6042 loss=2.369, loss_v1=0, loss_v2=0, 
nll_loss=1.118, ntokens=7727.8, nsentences=120, sample_size=4371.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1937.6, ups=0.25, wpb=7727.8, bsz=120, num_updates=52110, lr=4.38947e-06, gnorm=0.963, clip=40, loss_scale=64, train_wall=40, gb_free=30.9, wall=213464 2023-05-03 13:51:31 - progress_bar.py[line:274] - INFO: epoch 009: 3873 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7777.2, nsentences=120, sample_size=4108.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1960.6, ups=0.25, wpb=7777.2, bsz=120, num_updates=52120, lr=4.38419e-06, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=213503 2023-05-03 13:52:11 - progress_bar.py[line:274] - INFO: epoch 009: 3883 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7825.5, nsentences=120, sample_size=3913.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1958.7, ups=0.25, wpb=7825.5, bsz=120, num_updates=52130, lr=4.37891e-06, gnorm=0.98, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=213543 2023-05-03 13:52:50 - progress_bar.py[line:274] - INFO: epoch 009: 3893 / 6042 loss=2.331, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7398.3, nsentences=120, sample_size=4462.2, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1867.6, ups=0.25, wpb=7398.3, bsz=120, num_updates=52140, lr=4.37362e-06, gnorm=0.94, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=213583 2023-05-03 13:53:31 - progress_bar.py[line:274] - INFO: epoch 009: 3903 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7897.1, nsentences=120, sample_size=4319, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1942.3, ups=0.25, wpb=7897.1, bsz=120, num_updates=52150, lr=4.36834e-06, gnorm=0.969, clip=30, loss_scale=64, train_wall=41, gb_free=29, wall=213624 2023-05-03 13:54:10 - progress_bar.py[line:274] - INFO: epoch 009: 3913 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7659.8, nsentences=120, sample_size=3980.3, sample_size_v1=0, 
sample_size_v2=0, ppl=2.13, wps=1990.5, ups=0.26, wpb=7659.8, bsz=120, num_updates=52160, lr=4.36306e-06, gnorm=1.005, clip=40, loss_scale=64, train_wall=38, gb_free=28, wall=213662 2023-05-03 13:54:49 - progress_bar.py[line:274] - INFO: epoch 009: 3923 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7963.1, nsentences=120, sample_size=3931.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2012.8, ups=0.25, wpb=7963.1, bsz=120, num_updates=52170, lr=4.35778e-06, gnorm=0.997, clip=50, loss_scale=64, train_wall=39, gb_free=29.6, wall=213702 2023-05-03 13:55:29 - progress_bar.py[line:274] - INFO: epoch 009: 3933 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=8050.3, nsentences=120, sample_size=3874.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1998.3, ups=0.25, wpb=8050.3, bsz=120, num_updates=52180, lr=4.3525e-06, gnorm=0.98, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=213742 2023-05-03 13:56:09 - progress_bar.py[line:274] - INFO: epoch 009: 3943 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7523.8, nsentences=120, sample_size=4190.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1912.3, ups=0.25, wpb=7523.8, bsz=120, num_updates=52190, lr=4.34721e-06, gnorm=0.959, clip=30, loss_scale=64, train_wall=39, gb_free=29.4, wall=213781 2023-05-03 13:56:48 - progress_bar.py[line:274] - INFO: epoch 009: 3953 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7547.7, nsentences=120, sample_size=4065.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1916.6, ups=0.25, wpb=7547.7, bsz=120, num_updates=52200, lr=4.34193e-06, gnorm=0.96, clip=20, loss_scale=64, train_wall=39, gb_free=31.2, wall=213821 2023-05-03 13:57:28 - progress_bar.py[line:274] - INFO: epoch 009: 3963 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7674.4, nsentences=120, sample_size=3848.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1901, ups=0.25, wpb=7674.4, bsz=120, num_updates=52210, 
lr=4.33665e-06, gnorm=1.002, clip=50, loss_scale=64, train_wall=40, gb_free=30.3, wall=213861 2023-05-03 13:58:08 - progress_bar.py[line:274] - INFO: epoch 009: 3973 / 6042 loss=2.32, loss_v1=0, loss_v2=0, nll_loss=1.05, ntokens=7442.5, nsentences=120, sample_size=4024.3, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1874.7, ups=0.25, wpb=7442.5, bsz=120, num_updates=52220, lr=4.33137e-06, gnorm=0.972, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=213901 2023-05-03 13:58:49 - progress_bar.py[line:274] - INFO: epoch 009: 3983 / 6042 loss=2.416, loss_v1=0, loss_v2=0, nll_loss=1.171, ntokens=8183.4, nsentences=120, sample_size=3870.2, sample_size_v1=0, sample_size_v2=0, ppl=2.25, wps=2009.6, ups=0.25, wpb=8183.4, bsz=120, num_updates=52230, lr=4.32609e-06, gnorm=0.976, clip=30, loss_scale=64, train_wall=41, gb_free=30.8, wall=213941 2023-05-03 13:59:29 - progress_bar.py[line:274] - INFO: epoch 009: 3993 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7832.6, nsentences=120, sample_size=3986.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1960.3, ups=0.25, wpb=7832.6, bsz=120, num_updates=52240, lr=4.3208e-06, gnorm=0.982, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=213981 2023-05-03 14:00:10 - progress_bar.py[line:274] - INFO: epoch 009: 4003 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7630.2, nsentences=120, sample_size=4152.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1876, ups=0.25, wpb=7630.2, bsz=120, num_updates=52250, lr=4.31552e-06, gnorm=0.962, clip=20, loss_scale=64, train_wall=41, gb_free=29.6, wall=214022 2023-05-03 14:00:50 - progress_bar.py[line:274] - INFO: epoch 009: 4013 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7777.5, nsentences=120, sample_size=4303.7, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1943.5, ups=0.25, wpb=7777.5, bsz=120, num_updates=52260, lr=4.31024e-06, gnorm=0.972, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, 
wall=214062 2023-05-03 14:01:28 - progress_bar.py[line:274] - INFO: epoch 009: 4023 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7445, nsentences=120, sample_size=4250.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1923.4, ups=0.26, wpb=7445, bsz=120, num_updates=52270, lr=4.30496e-06, gnorm=0.968, clip=40, loss_scale=64, train_wall=39, gb_free=29.9, wall=214101 2023-05-03 14:02:08 - progress_bar.py[line:274] - INFO: epoch 009: 4033 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7754.5, nsentences=120, sample_size=4206, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1933.6, ups=0.25, wpb=7754.5, bsz=120, num_updates=52280, lr=4.29967e-06, gnorm=0.998, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=214141 2023-05-03 14:02:48 - progress_bar.py[line:274] - INFO: epoch 009: 4043 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7681.2, nsentences=120, sample_size=4104.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1945.7, ups=0.25, wpb=7681.2, bsz=120, num_updates=52290, lr=4.29439e-06, gnorm=0.977, clip=60, loss_scale=64, train_wall=39, gb_free=30.5, wall=214180 2023-05-03 14:03:28 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 14:03:32 - progress_bar.py[line:274] - INFO: epoch 009: 4054 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7540.7, nsentences=120, sample_size=4004.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1714.6, ups=0.23, wpb=7540.7, bsz=120, num_updates=52300, lr=4.28911e-06, gnorm=0.984, clip=40, loss_scale=64, train_wall=44, gb_free=29.3, wall=214224 2023-05-03 14:04:11 - progress_bar.py[line:274] - INFO: epoch 009: 4064 / 6042 loss=2.322, loss_v1=0, loss_v2=0, nll_loss=1.058, ntokens=7435.7, nsentences=120, sample_size=3717.6, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1896.6, ups=0.26, wpb=7435.7, bsz=120, num_updates=52310, lr=4.28383e-06, gnorm=1.028, clip=50, 
loss_scale=64, train_wall=39, gb_free=29.8, wall=214263 2023-05-03 14:04:51 - progress_bar.py[line:274] - INFO: epoch 009: 4074 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7731.6, nsentences=120, sample_size=4209.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1942.6, ups=0.25, wpb=7731.6, bsz=120, num_updates=52320, lr=4.27855e-06, gnorm=0.984, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=214303 2023-05-03 14:05:30 - progress_bar.py[line:274] - INFO: epoch 009: 4084 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7663.4, nsentences=120, sample_size=4367.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1946.7, ups=0.25, wpb=7663.4, bsz=120, num_updates=52330, lr=4.27326e-06, gnorm=0.957, clip=20, loss_scale=64, train_wall=39, gb_free=30.6, wall=214343 2023-05-03 14:06:10 - progress_bar.py[line:274] - INFO: epoch 009: 4094 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.153, ntokens=7806.1, nsentences=120, sample_size=4096, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1967.3, ups=0.25, wpb=7806.1, bsz=120, num_updates=52340, lr=4.26798e-06, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=214382 2023-05-03 14:06:50 - progress_bar.py[line:274] - INFO: epoch 009: 4104 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7766.4, nsentences=120, sample_size=3950.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1952.2, ups=0.25, wpb=7766.4, bsz=120, num_updates=52350, lr=4.2627e-06, gnorm=1.018, clip=60, loss_scale=64, train_wall=40, gb_free=32.1, wall=214422 2023-05-03 14:07:30 - progress_bar.py[line:274] - INFO: epoch 009: 4114 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=8013, nsentences=120, sample_size=3744.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1983.1, ups=0.25, wpb=8013, bsz=120, num_updates=52360, lr=4.25742e-06, gnorm=1.015, clip=60, loss_scale=64, train_wall=40, gb_free=30.7, wall=214463 2023-05-03 14:08:10 - 
progress_bar.py[line:274] - INFO: epoch 009: 4124 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7749.2, nsentences=120, sample_size=4087.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1945, ups=0.25, wpb=7749.2, bsz=120, num_updates=52370, lr=4.25213e-06, gnorm=0.965, clip=50, loss_scale=64, train_wall=40, gb_free=30.3, wall=214502 2023-05-03 14:08:50 - progress_bar.py[line:274] - INFO: epoch 009: 4134 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=8035.5, nsentences=120, sample_size=3680.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2025.2, ups=0.25, wpb=8035.5, bsz=120, num_updates=52380, lr=4.24685e-06, gnorm=1.026, clip=70, loss_scale=64, train_wall=40, gb_free=29.9, wall=214542 2023-05-03 14:09:30 - progress_bar.py[line:274] - INFO: epoch 009: 4144 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7918.5, nsentences=120, sample_size=3595.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1979.6, ups=0.25, wpb=7918.5, bsz=120, num_updates=52390, lr=4.24157e-06, gnorm=1.019, clip=70, loss_scale=64, train_wall=40, gb_free=30.9, wall=214582 2023-05-03 14:10:09 - progress_bar.py[line:274] - INFO: epoch 009: 4154 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7623, nsentences=120, sample_size=4158.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1924.2, ups=0.25, wpb=7623, bsz=120, num_updates=52400, lr=4.23629e-06, gnorm=0.982, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=214622 2023-05-03 14:10:48 - progress_bar.py[line:274] - INFO: epoch 009: 4164 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7564.3, nsentences=120, sample_size=3981.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1927, ups=0.25, wpb=7564.3, bsz=120, num_updates=52410, lr=4.23101e-06, gnorm=0.988, clip=40, loss_scale=64, train_wall=39, gb_free=30.1, wall=214661 2023-05-03 14:11:28 - progress_bar.py[line:274] - INFO: epoch 009: 4174 / 6042 loss=2.353, loss_v1=0, loss_v2=0, 
nll_loss=1.095, ntokens=7443, nsentences=120, sample_size=4475.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1882.2, ups=0.25, wpb=7443, bsz=120, num_updates=52420, lr=4.22572e-06, gnorm=0.933, clip=10, loss_scale=64, train_wall=39, gb_free=30.1, wall=214700 2023-05-03 14:12:08 - progress_bar.py[line:274] - INFO: epoch 009: 4184 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7757.8, nsentences=120, sample_size=3947.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1947, ups=0.25, wpb=7757.8, bsz=120, num_updates=52430, lr=4.22044e-06, gnorm=0.972, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=214740 2023-05-03 14:12:48 - progress_bar.py[line:274] - INFO: epoch 009: 4194 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7853.8, nsentences=120, sample_size=3623.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1958.2, ups=0.25, wpb=7853.8, bsz=120, num_updates=52440, lr=4.21516e-06, gnorm=1.026, clip=70, loss_scale=64, train_wall=40, gb_free=31.3, wall=214780 2023-05-03 14:13:28 - progress_bar.py[line:274] - INFO: epoch 009: 4204 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7800.9, nsentences=120, sample_size=4039.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1941.3, ups=0.25, wpb=7800.9, bsz=120, num_updates=52450, lr=4.20988e-06, gnorm=0.989, clip=40, loss_scale=64, train_wall=40, gb_free=29.1, wall=214821 2023-05-03 14:14:09 - progress_bar.py[line:274] - INFO: epoch 009: 4214 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7822.3, nsentences=120, sample_size=3881.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1935, ups=0.25, wpb=7822.3, bsz=120, num_updates=52460, lr=4.2046e-06, gnorm=0.997, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=214861 2023-05-03 14:14:48 - progress_bar.py[line:274] - INFO: epoch 009: 4224 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7638.2, nsentences=120, sample_size=3996.7, sample_size_v1=0, 
sample_size_v2=0, ppl=2.17, wps=1913.1, ups=0.25, wpb=7638.2, bsz=120, num_updates=52470, lr=4.19931e-06, gnorm=0.989, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=214901 2023-05-03 14:15:29 - progress_bar.py[line:274] - INFO: epoch 009: 4234 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7830.7, nsentences=120, sample_size=3924.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1936.2, ups=0.25, wpb=7830.7, bsz=120, num_updates=52480, lr=4.19403e-06, gnorm=1.022, clip=50, loss_scale=64, train_wall=40, gb_free=29.8, wall=214941 2023-05-03 14:16:09 - progress_bar.py[line:274] - INFO: epoch 009: 4244 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7560, nsentences=120, sample_size=3958.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1870.2, ups=0.25, wpb=7560, bsz=120, num_updates=52490, lr=4.18875e-06, gnorm=0.976, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=214982 2023-05-03 14:16:49 - progress_bar.py[line:274] - INFO: epoch 009: 4254 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7718.2, nsentences=120, sample_size=3660.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1953.9, ups=0.25, wpb=7718.2, bsz=120, num_updates=52500, lr=4.18347e-06, gnorm=1.021, clip=70, loss_scale=64, train_wall=39, gb_free=30, wall=215021 2023-05-03 14:17:29 - progress_bar.py[line:274] - INFO: epoch 009: 4264 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7270.1, nsentences=120, sample_size=3963.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1820.1, ups=0.25, wpb=7270.1, bsz=120, num_updates=52510, lr=4.17818e-06, gnorm=0.987, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=215061 2023-05-03 14:18:09 - progress_bar.py[line:274] - INFO: epoch 009: 4274 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=8116.9, nsentences=120, sample_size=3995.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2029.2, ups=0.25, wpb=8116.9, bsz=120, num_updates=52520, 
lr=4.1729e-06, gnorm=0.979, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=215101 2023-05-03 14:18:48 - progress_bar.py[line:274] - INFO: epoch 009: 4284 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7710, nsentences=120, sample_size=3793.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1950.6, ups=0.25, wpb=7710, bsz=120, num_updates=52530, lr=4.16762e-06, gnorm=0.994, clip=60, loss_scale=64, train_wall=39, gb_free=30.6, wall=215141 2023-05-03 14:19:28 - progress_bar.py[line:274] - INFO: epoch 009: 4294 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7806, nsentences=120, sample_size=4099.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1954.5, ups=0.25, wpb=7806, bsz=120, num_updates=52540, lr=4.16234e-06, gnorm=1.003, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=215181 2023-05-03 14:20:08 - progress_bar.py[line:274] - INFO: epoch 009: 4304 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7428.7, nsentences=120, sample_size=4236.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1873.7, ups=0.25, wpb=7428.7, bsz=120, num_updates=52550, lr=4.15706e-06, gnorm=0.992, clip=60, loss_scale=64, train_wall=40, gb_free=29.8, wall=215220 2023-05-03 14:20:49 - progress_bar.py[line:274] - INFO: epoch 009: 4314 / 6042 loss=2.4, loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7605.5, nsentences=120, sample_size=4431.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1862.2, ups=0.24, wpb=7605.5, bsz=120, num_updates=52560, lr=4.15177e-06, gnorm=0.94, clip=20, loss_scale=64, train_wall=41, gb_free=30.3, wall=215261 2023-05-03 14:21:29 - progress_bar.py[line:274] - INFO: epoch 009: 4324 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7701.3, nsentences=120, sample_size=4182.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1932.5, ups=0.25, wpb=7701.3, bsz=120, num_updates=52570, lr=4.14649e-06, gnorm=0.977, clip=30, loss_scale=64, train_wall=40, gb_free=28.6, wall=215301 
2023-05-03 14:22:08 - progress_bar.py[line:274] - INFO: epoch 009: 4334 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7942.3, nsentences=120, sample_size=4296.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1996.1, ups=0.25, wpb=7942.3, bsz=120, num_updates=52580, lr=4.14121e-06, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=215341 2023-05-03 14:22:48 - progress_bar.py[line:274] - INFO: epoch 009: 4344 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7394.3, nsentences=120, sample_size=4242.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1860, ups=0.25, wpb=7394.3, bsz=120, num_updates=52590, lr=4.13593e-06, gnorm=0.963, clip=20, loss_scale=64, train_wall=40, gb_free=31.3, wall=215381 2023-05-03 14:23:28 - progress_bar.py[line:274] - INFO: epoch 009: 4354 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7584.8, nsentences=120, sample_size=3833.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1902.9, ups=0.25, wpb=7584.8, bsz=120, num_updates=52600, lr=4.13065e-06, gnorm=0.967, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=215421 2023-05-03 14:24:08 - progress_bar.py[line:274] - INFO: epoch 009: 4364 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7888.8, nsentences=120, sample_size=4105.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1951, ups=0.25, wpb=7888.8, bsz=120, num_updates=52610, lr=4.12536e-06, gnorm=0.982, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=215461 2023-05-03 14:24:48 - progress_bar.py[line:274] - INFO: epoch 009: 4374 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.165, ntokens=7675.9, nsentences=120, sample_size=3732.6, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=1931, ups=0.25, wpb=7675.9, bsz=120, num_updates=52620, lr=4.12008e-06, gnorm=1.024, clip=60, loss_scale=64, train_wall=40, gb_free=28.5, wall=215501 2023-05-03 14:25:28 - progress_bar.py[line:274] - INFO: epoch 009: 4384 / 6042 loss=2.357, 
loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7678.8, nsentences=120, sample_size=4339.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1924.2, ups=0.25, wpb=7678.8, bsz=120, num_updates=52630, lr=4.1148e-06, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=215541 2023-05-03 14:26:07 - progress_bar.py[line:274] - INFO: epoch 009: 4394 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7656.4, nsentences=120, sample_size=4155.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1951.4, ups=0.25, wpb=7656.4, bsz=120, num_updates=52640, lr=4.10952e-06, gnorm=0.963, clip=30, loss_scale=64, train_wall=39, gb_free=28.8, wall=215580 2023-05-03 14:26:47 - progress_bar.py[line:274] - INFO: epoch 009: 4404 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7668.7, nsentences=120, sample_size=4089.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1938.5, ups=0.25, wpb=7668.7, bsz=120, num_updates=52650, lr=4.10423e-06, gnorm=0.991, clip=50, loss_scale=64, train_wall=39, gb_free=29.9, wall=215619 2023-05-03 14:27:27 - progress_bar.py[line:274] - INFO: epoch 009: 4414 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.066, ntokens=7716.7, nsentences=120, sample_size=3971.2, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1910.8, ups=0.25, wpb=7716.7, bsz=120, num_updates=52660, lr=4.09895e-06, gnorm=1.005, clip=40, loss_scale=64, train_wall=40, gb_free=30.9, wall=215660 2023-05-03 14:28:07 - progress_bar.py[line:274] - INFO: epoch 009: 4424 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7825.7, nsentences=120, sample_size=4150.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1978.9, ups=0.25, wpb=7825.7, bsz=120, num_updates=52670, lr=4.09367e-06, gnorm=0.969, clip=30, loss_scale=64, train_wall=39, gb_free=28.4, wall=215699 2023-05-03 14:28:48 - progress_bar.py[line:274] - INFO: epoch 009: 4434 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=8109.5, nsentences=120, 
sample_size=4109.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1992.9, ups=0.25, wpb=8109.5, bsz=120, num_updates=52680, lr=4.08839e-06, gnorm=0.953, clip=10, loss_scale=64, train_wall=41, gb_free=30.8, wall=215740 2023-05-03 14:29:27 - progress_bar.py[line:274] - INFO: epoch 009: 4444 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7646, nsentences=120, sample_size=4173.1, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1951.2, ups=0.26, wpb=7646, bsz=120, num_updates=52690, lr=4.08311e-06, gnorm=0.958, clip=20, loss_scale=64, train_wall=39, gb_free=30.2, wall=215779 2023-05-03 14:30:06 - progress_bar.py[line:274] - INFO: epoch 009: 4454 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7496.1, nsentences=120, sample_size=4120, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1902.8, ups=0.25, wpb=7496.1, bsz=120, num_updates=52700, lr=4.07782e-06, gnorm=0.984, clip=40, loss_scale=64, train_wall=39, gb_free=30.8, wall=215819 2023-05-03 14:30:46 - progress_bar.py[line:274] - INFO: epoch 009: 4464 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7608.9, nsentences=120, sample_size=3958.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1915.3, ups=0.25, wpb=7608.9, bsz=120, num_updates=52710, lr=4.07254e-06, gnorm=0.989, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=215858 2023-05-03 14:31:26 - progress_bar.py[line:274] - INFO: epoch 009: 4474 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7663.4, nsentences=120, sample_size=3822.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1906.9, ups=0.25, wpb=7663.4, bsz=120, num_updates=52720, lr=4.06726e-06, gnorm=0.999, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=215899 2023-05-03 14:32:06 - progress_bar.py[line:274] - INFO: epoch 009: 4484 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7878.8, nsentences=120, sample_size=3873.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1990.6, ups=0.25, 
wpb=7878.8, bsz=120, num_updates=52730, lr=4.06198e-06, gnorm=0.99, clip=50, loss_scale=64, train_wall=40, gb_free=27.3, wall=215938 2023-05-03 14:32:46 - progress_bar.py[line:274] - INFO: epoch 009: 4494 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=8021, nsentences=120, sample_size=3842, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1991.9, ups=0.25, wpb=8021, bsz=120, num_updates=52740, lr=4.0567e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=29.4, wall=215978 2023-05-03 14:33:26 - progress_bar.py[line:274] - INFO: epoch 009: 4504 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7491.1, nsentences=120, sample_size=4154.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1865.3, ups=0.25, wpb=7491.1, bsz=120, num_updates=52750, lr=4.05141e-06, gnorm=0.974, clip=40, loss_scale=64, train_wall=40, gb_free=30.4, wall=216019 2023-05-03 14:34:05 - progress_bar.py[line:274] - INFO: epoch 009: 4514 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7698.5, nsentences=120, sample_size=4241.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1966.7, ups=0.26, wpb=7698.5, bsz=120, num_updates=52760, lr=4.04613e-06, gnorm=0.97, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=216058 2023-05-03 14:34:45 - progress_bar.py[line:274] - INFO: epoch 009: 4524 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7717.3, nsentences=120, sample_size=4306.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1920.3, ups=0.25, wpb=7717.3, bsz=120, num_updates=52770, lr=4.04085e-06, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=216098 2023-05-03 14:35:26 - progress_bar.py[line:274] - INFO: epoch 009: 4534 / 6042 loss=2.321, loss_v1=0, loss_v2=0, nll_loss=1.056, ntokens=7809.6, nsentences=120, sample_size=3907.6, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1937.8, ups=0.25, wpb=7809.6, bsz=120, num_updates=52780, lr=4.03557e-06, gnorm=0.95, clip=20, loss_scale=64, 
train_wall=40, gb_free=28.9, wall=216138 2023-05-03 14:36:06 - progress_bar.py[line:274] - INFO: epoch 009: 4544 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7816.7, nsentences=120, sample_size=4222.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1957.4, ups=0.25, wpb=7816.7, bsz=120, num_updates=52790, lr=4.03028e-06, gnorm=0.969, clip=50, loss_scale=64, train_wall=40, gb_free=29.8, wall=216178 2023-05-03 14:36:46 - progress_bar.py[line:274] - INFO: epoch 009: 4554 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7860.5, nsentences=120, sample_size=4263.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1931.8, ups=0.25, wpb=7860.5, bsz=120, num_updates=52800, lr=4.025e-06, gnorm=0.962, clip=20, loss_scale=64, train_wall=41, gb_free=27.6, wall=216219 2023-05-03 14:37:26 - progress_bar.py[line:274] - INFO: epoch 009: 4564 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7848.1, nsentences=120, sample_size=4041.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1970.1, ups=0.25, wpb=7848.1, bsz=120, num_updates=52810, lr=4.01972e-06, gnorm=0.975, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=216259 2023-05-03 14:37:50 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 14:38:11 - progress_bar.py[line:274] - INFO: epoch 009: 4575 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7748.6, nsentences=120, sample_size=4209.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1747.8, ups=0.23, wpb=7748.6, bsz=120, num_updates=52820, lr=4.01444e-06, gnorm=0.966, clip=30, loss_scale=64, train_wall=44, gb_free=26.7, wall=216303 2023-05-03 14:38:49 - progress_bar.py[line:274] - INFO: epoch 009: 4585 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7837, nsentences=120, sample_size=3878.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=2031.8, ups=0.26, wpb=7837, bsz=120, num_updates=52830, 
lr=4.00916e-06, gnorm=1.009, clip=70, loss_scale=64, train_wall=39, gb_free=29.9, wall=216342 2023-05-03 14:39:29 - progress_bar.py[line:274] - INFO: epoch 009: 4595 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7964.1, nsentences=120, sample_size=4176.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1985, ups=0.25, wpb=7964.1, bsz=120, num_updates=52840, lr=4.00387e-06, gnorm=0.997, clip=60, loss_scale=64, train_wall=40, gb_free=25.2, wall=216382 2023-05-03 14:40:09 - progress_bar.py[line:274] - INFO: epoch 009: 4605 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7992.9, nsentences=120, sample_size=3945.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=2016.9, ups=0.25, wpb=7992.9, bsz=120, num_updates=52850, lr=3.99859e-06, gnorm=0.998, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=216421 2023-05-03 14:40:49 - progress_bar.py[line:274] - INFO: epoch 009: 4615 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7913.6, nsentences=120, sample_size=4074, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1978.3, ups=0.25, wpb=7913.6, bsz=120, num_updates=52860, lr=3.99331e-06, gnorm=0.998, clip=30, loss_scale=64, train_wall=40, gb_free=27.3, wall=216461 2023-05-03 14:41:29 - progress_bar.py[line:274] - INFO: epoch 009: 4625 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7653.7, nsentences=120, sample_size=4117.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1929.3, ups=0.25, wpb=7653.7, bsz=120, num_updates=52870, lr=3.98803e-06, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=29.1, wall=216501 2023-05-03 14:42:08 - progress_bar.py[line:274] - INFO: epoch 009: 4635 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7709.9, nsentences=120, sample_size=4103.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1967.9, ups=0.26, wpb=7709.9, bsz=120, num_updates=52880, lr=3.98274e-06, gnorm=0.968, clip=40, loss_scale=64, train_wall=39, gb_free=31.2, 
wall=216540 2023-05-03 14:42:47 - progress_bar.py[line:274] - INFO: epoch 009: 4645 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7488.2, nsentences=120, sample_size=3969.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1886, ups=0.25, wpb=7488.2, bsz=120, num_updates=52890, lr=3.97746e-06, gnorm=0.99, clip=50, loss_scale=64, train_wall=40, gb_free=28.6, wall=216580 2023-05-03 14:43:27 - progress_bar.py[line:274] - INFO: epoch 009: 4655 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7729.5, nsentences=120, sample_size=3983.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1968.3, ups=0.25, wpb=7729.5, bsz=120, num_updates=52900, lr=3.97218e-06, gnorm=0.988, clip=60, loss_scale=64, train_wall=39, gb_free=28.7, wall=216619 2023-05-03 14:44:06 - progress_bar.py[line:274] - INFO: epoch 009: 4665 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7573.8, nsentences=120, sample_size=3970.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1922.9, ups=0.25, wpb=7573.8, bsz=120, num_updates=52910, lr=3.9669e-06, gnorm=0.971, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=216659 2023-05-03 14:44:46 - progress_bar.py[line:274] - INFO: epoch 009: 4675 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7959.2, nsentences=120, sample_size=3944.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2005.3, ups=0.25, wpb=7959.2, bsz=120, num_updates=52920, lr=3.96162e-06, gnorm=0.982, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=216698 2023-05-03 14:45:26 - progress_bar.py[line:274] - INFO: epoch 009: 4685 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7818.1, nsentences=120, sample_size=3948.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1952.7, ups=0.25, wpb=7818.1, bsz=120, num_updates=52930, lr=3.95633e-06, gnorm=1.02, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=216738 2023-05-03 14:46:05 - progress_bar.py[line:274] - INFO: epoch 009: 4695 / 6042 
loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7513.4, nsentences=120, sample_size=4320.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1894, ups=0.25, wpb=7513.4, bsz=120, num_updates=52940, lr=3.95105e-06, gnorm=0.955, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=216778 2023-05-03 14:46:45 - progress_bar.py[line:274] - INFO: epoch 009: 4705 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7585.2, nsentences=120, sample_size=4250.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1912.3, ups=0.25, wpb=7585.2, bsz=120, num_updates=52950, lr=3.94577e-06, gnorm=0.978, clip=40, loss_scale=64, train_wall=40, gb_free=29, wall=216818 2023-05-03 14:47:24 - progress_bar.py[line:274] - INFO: epoch 009: 4715 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7938, nsentences=120, sample_size=3867.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2026.5, ups=0.26, wpb=7938, bsz=120, num_updates=52960, lr=3.94049e-06, gnorm=1.008, clip=60, loss_scale=64, train_wall=39, gb_free=29.5, wall=216857 2023-05-03 14:48:04 - progress_bar.py[line:274] - INFO: epoch 009: 4725 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7951.6, nsentences=120, sample_size=4028.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1993, ups=0.25, wpb=7951.6, bsz=120, num_updates=52970, lr=3.93521e-06, gnorm=1.001, clip=60, loss_scale=64, train_wall=40, gb_free=30, wall=216897 2023-05-03 14:48:44 - progress_bar.py[line:274] - INFO: epoch 009: 4735 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7514.6, nsentences=120, sample_size=3967.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1908.2, ups=0.25, wpb=7514.6, bsz=120, num_updates=52980, lr=3.92992e-06, gnorm=0.989, clip=50, loss_scale=64, train_wall=39, gb_free=30.2, wall=216936 2023-05-03 14:49:24 - progress_bar.py[line:274] - INFO: epoch 009: 4745 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7921, nsentences=120, 
sample_size=3732.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1982.8, ups=0.25, wpb=7921, bsz=120, num_updates=52990, lr=3.92464e-06, gnorm=0.975, clip=30, loss_scale=64, train_wall=40, gb_free=28.9, wall=216976 2023-05-03 14:50:04 - progress_bar.py[line:274] - INFO: epoch 009: 4755 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7747.1, nsentences=120, sample_size=4013.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1935.5, ups=0.25, wpb=7747.1, bsz=120, num_updates=53000, lr=3.91936e-06, gnorm=0.98, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=217016 2023-05-03 14:50:04 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 14:50:05 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 14:50:05 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 14:50:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:13 - 
task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:22 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 14:50:22 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 14:50:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:34 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 14:50:34 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 14:50:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:45 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 14:50:45 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 14:50:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:50 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 14:50:50 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 14:50:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:54 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 14:50:54 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 14:50:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 14:50:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 14:50:55 - progress_bar.py[line:282] - INFO: epoch 009 | valid on 'valid' subset | loss 3.248 | loss_v1 0 | loss_v2 0 | nll_loss 2.082 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.23 | score 0.7598 | wps 3302 | wpb 3202.1 | bsz 39.4 | num_updates 53000 | best_score 0.7627 2023-05-03 14:50:55 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 9 @ 53000 updates 2023-05-03 14:50:55 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_53000.pt 2023-05-03 14:51:20 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_53000.pt 2023-05-03 14:51:33 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_53000.pt (epoch 9 @ 53000 updates, score 0.7598) (writing took 38.77605805196799 seconds) 2023-05-03 14:52:13 - progress_bar.py[line:274] - INFO: epoch 009: 4765 / 6042 loss=2.328, loss_v1=0, loss_v2=0, nll_loss=1.074, ntokens=7441.5, nsentences=120, sample_size=3953.4, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=574.2, ups=0.08, wpb=7441.5, bsz=120, num_updates=53010, lr=3.91408e-06, gnorm=0.981, clip=50, loss_scale=64, train_wall=40, gb_free=30.4, wall=217146 2023-05-03 14:52:53 - progress_bar.py[line:274] - INFO: epoch 009: 4775 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.14, ntokens=7816.5, nsentences=120, sample_size=4168.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1961.7, ups=0.25, wpb=7816.5, bsz=120, num_updates=53020, lr=3.90879e-06, gnorm=0.981, clip=50, loss_scale=64, train_wall=40, gb_free=30.6, 
wall=217185 2023-05-03 14:53:33 - progress_bar.py[line:274] - INFO: epoch 009: 4785 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7393.2, nsentences=120, sample_size=4046, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1865.9, ups=0.25, wpb=7393.2, bsz=120, num_updates=53030, lr=3.90351e-06, gnorm=0.994, clip=60, loss_scale=64, train_wall=40, gb_free=29.7, wall=217225 2023-05-03 14:54:12 - progress_bar.py[line:274] - INFO: epoch 009: 4795 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7786, nsentences=120, sample_size=3751.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1990, ups=0.26, wpb=7786, bsz=120, num_updates=53040, lr=3.89823e-06, gnorm=0.991, clip=40, loss_scale=64, train_wall=39, gb_free=31.2, wall=217264 2023-05-03 14:54:51 - progress_bar.py[line:274] - INFO: epoch 009: 4805 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7963.3, nsentences=120, sample_size=3868.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2030.2, ups=0.25, wpb=7963.3, bsz=120, num_updates=53050, lr=3.89295e-06, gnorm=0.967, clip=20, loss_scale=64, train_wall=39, gb_free=30, wall=217303 2023-05-03 14:55:31 - progress_bar.py[line:274] - INFO: epoch 009: 4815 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7498.1, nsentences=120, sample_size=4318, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1867.7, ups=0.25, wpb=7498.1, bsz=120, num_updates=53060, lr=3.88767e-06, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=217344 2023-05-03 14:56:12 - progress_bar.py[line:274] - INFO: epoch 009: 4825 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7868.9, nsentences=120, sample_size=3910.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1943, ups=0.25, wpb=7868.9, bsz=120, num_updates=53070, lr=3.88238e-06, gnorm=0.997, clip=60, loss_scale=64, train_wall=40, gb_free=29.9, wall=217384 2023-05-03 14:56:50 - progress_bar.py[line:274] - INFO: epoch 009: 4835 / 6042 
loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7756.6, nsentences=120, sample_size=3746.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2005.9, ups=0.26, wpb=7756.6, bsz=120, num_updates=53080, lr=3.8771e-06, gnorm=1.004, clip=60, loss_scale=64, train_wall=39, gb_free=30.3, wall=217423 2023-05-03 14:57:31 - progress_bar.py[line:274] - INFO: epoch 009: 4845 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=8371.2, nsentences=120, sample_size=3701.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2079.6, ups=0.25, wpb=8371.2, bsz=120, num_updates=53090, lr=3.87182e-06, gnorm=1.001, clip=60, loss_scale=64, train_wall=40, gb_free=29.3, wall=217463 2023-05-03 14:58:10 - progress_bar.py[line:274] - INFO: epoch 009: 4855 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7981.7, nsentences=120, sample_size=4072.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2005.7, ups=0.25, wpb=7981.7, bsz=120, num_updates=53100, lr=3.86654e-06, gnorm=0.97, clip=40, loss_scale=64, train_wall=40, gb_free=30.3, wall=217503 2023-05-03 14:58:51 - progress_bar.py[line:274] - INFO: epoch 009: 4865 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7655, nsentences=120, sample_size=3675.4, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1898.5, ups=0.25, wpb=7655, bsz=120, num_updates=53110, lr=3.86126e-06, gnorm=0.983, clip=60, loss_scale=64, train_wall=40, gb_free=30.7, wall=217543 2023-05-03 14:59:31 - progress_bar.py[line:274] - INFO: epoch 009: 4875 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7886, nsentences=120, sample_size=4080, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1967.5, ups=0.25, wpb=7886, bsz=120, num_updates=53120, lr=3.85597e-06, gnorm=0.994, clip=40, loss_scale=64, train_wall=40, gb_free=24.6, wall=217583 2023-05-03 15:00:11 - progress_bar.py[line:274] - INFO: epoch 009: 4885 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7959.7, nsentences=120, 
sample_size=3952, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1976.9, ups=0.25, wpb=7959.7, bsz=120, num_updates=53130, lr=3.85069e-06, gnorm=0.975, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=217623 2023-05-03 15:00:51 - progress_bar.py[line:274] - INFO: epoch 009: 4895 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7611, nsentences=120, sample_size=3878.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1902.9, ups=0.25, wpb=7611, bsz=120, num_updates=53140, lr=3.84541e-06, gnorm=0.998, clip=50, loss_scale=64, train_wall=40, gb_free=29.7, wall=217663 2023-05-03 15:01:30 - progress_bar.py[line:274] - INFO: epoch 009: 4905 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7517.2, nsentences=120, sample_size=4097.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1917.3, ups=0.26, wpb=7517.2, bsz=120, num_updates=53150, lr=3.84013e-06, gnorm=0.988, clip=40, loss_scale=64, train_wall=39, gb_free=28, wall=217703 2023-05-03 15:02:10 - progress_bar.py[line:274] - INFO: epoch 009: 4915 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7618.4, nsentences=120, sample_size=4277.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1939.5, ups=0.25, wpb=7618.4, bsz=120, num_updates=53160, lr=3.83484e-06, gnorm=0.967, clip=40, loss_scale=64, train_wall=39, gb_free=30.4, wall=217742 2023-05-03 15:02:49 - progress_bar.py[line:274] - INFO: epoch 009: 4925 / 6042 loss=2.411, loss_v1=0, loss_v2=0, nll_loss=1.164, ntokens=8114.9, nsentences=120, sample_size=3933, sample_size_v1=0, sample_size_v2=0, ppl=2.24, wps=2041.1, ups=0.25, wpb=8114.9, bsz=120, num_updates=53170, lr=3.82956e-06, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=217782 2023-05-03 15:03:29 - progress_bar.py[line:274] - INFO: epoch 009: 4935 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7780.8, nsentences=120, sample_size=4188.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1963, ups=0.25, 
wpb=7780.8, bsz=120, num_updates=53180, lr=3.82428e-06, gnorm=0.952, clip=20, loss_scale=64, train_wall=40, gb_free=26.8, wall=217821 2023-05-03 15:04:09 - progress_bar.py[line:274] - INFO: epoch 009: 4945 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7825.7, nsentences=120, sample_size=4117.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1960.7, ups=0.25, wpb=7825.7, bsz=120, num_updates=53190, lr=3.819e-06, gnorm=0.973, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=217861 2023-05-03 15:04:49 - progress_bar.py[line:274] - INFO: epoch 009: 4955 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=8156.6, nsentences=120, sample_size=3897.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=2018.1, ups=0.25, wpb=8156.6, bsz=120, num_updates=53200, lr=3.81372e-06, gnorm=0.989, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=217902 2023-05-03 15:05:29 - progress_bar.py[line:274] - INFO: epoch 009: 4965 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7700, nsentences=120, sample_size=3962.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1938.3, ups=0.25, wpb=7700, bsz=120, num_updates=53210, lr=3.80843e-06, gnorm=1.001, clip=70, loss_scale=64, train_wall=40, gb_free=30.6, wall=217941 2023-05-03 15:06:09 - progress_bar.py[line:274] - INFO: epoch 009: 4975 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7762, nsentences=120, sample_size=4028.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1955, ups=0.25, wpb=7762, bsz=120, num_updates=53220, lr=3.80315e-06, gnorm=0.964, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=217981 2023-05-03 15:06:48 - progress_bar.py[line:274] - INFO: epoch 009: 4985 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7940.6, nsentences=120, sample_size=3858.1, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1995.5, ups=0.25, wpb=7940.6, bsz=120, num_updates=53230, lr=3.79787e-06, gnorm=1.001, clip=50, loss_scale=64, 
train_wall=40, gb_free=29.9, wall=218021 2023-05-03 15:07:28 - progress_bar.py[line:274] - INFO: epoch 009: 4995 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=8164.7, nsentences=120, sample_size=4082.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2056.8, ups=0.25, wpb=8164.7, bsz=120, num_updates=53240, lr=3.79259e-06, gnorm=0.977, clip=40, loss_scale=64, train_wall=40, gb_free=31, wall=218061 2023-05-03 15:08:08 - progress_bar.py[line:274] - INFO: epoch 009: 5005 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7487.1, nsentences=120, sample_size=4201.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1857.6, ups=0.25, wpb=7487.1, bsz=120, num_updates=53250, lr=3.78731e-06, gnorm=0.983, clip=40, loss_scale=64, train_wall=40, gb_free=29.3, wall=218101 2023-05-03 15:08:48 - progress_bar.py[line:274] - INFO: epoch 009: 5015 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7684.2, nsentences=120, sample_size=4228, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1924.4, ups=0.25, wpb=7684.2, bsz=120, num_updates=53260, lr=3.78202e-06, gnorm=0.969, clip=40, loss_scale=64, train_wall=40, gb_free=30.3, wall=218141 2023-05-03 15:09:28 - progress_bar.py[line:274] - INFO: epoch 009: 5025 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7604.6, nsentences=120, sample_size=4273.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1916.2, ups=0.25, wpb=7604.6, bsz=120, num_updates=53270, lr=3.77674e-06, gnorm=0.992, clip=40, loss_scale=64, train_wall=40, gb_free=30.7, wall=218181 2023-05-03 15:10:08 - progress_bar.py[line:274] - INFO: epoch 009: 5035 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7673.9, nsentences=120, sample_size=4038.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1928, ups=0.25, wpb=7673.9, bsz=120, num_updates=53280, lr=3.77146e-06, gnorm=0.989, clip=50, loss_scale=64, train_wall=40, gb_free=29.2, wall=218220 2023-05-03 15:10:49 - progress_bar.py[line:274] - 
INFO: epoch 009: 5045 / 6042 loss=2.398, loss_v1=0, loss_v2=0, nll_loss=1.157, ntokens=8094.1, nsentences=120, sample_size=4247.1, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1987.4, ups=0.25, wpb=8094.1, bsz=120, num_updates=53290, lr=3.76618e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=41, gb_free=31.2, wall=218261 2023-05-03 15:11:29 - progress_bar.py[line:274] - INFO: epoch 009: 5055 / 6042 loss=2.404, loss_v1=0, loss_v2=0, nll_loss=1.16, ntokens=7488.8, nsentences=120, sample_size=3999.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1860.5, ups=0.25, wpb=7488.8, bsz=120, num_updates=53300, lr=3.76089e-06, gnorm=0.986, clip=40, loss_scale=64, train_wall=40, gb_free=31.7, wall=218301 2023-05-03 15:12:09 - progress_bar.py[line:274] - INFO: epoch 009: 5065 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7477.8, nsentences=120, sample_size=3844.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1870.2, ups=0.25, wpb=7477.8, bsz=120, num_updates=53310, lr=3.75561e-06, gnorm=0.988, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=218341 2023-05-03 15:12:49 - progress_bar.py[line:274] - INFO: epoch 009: 5075 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7728.9, nsentences=120, sample_size=3796.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1943.6, ups=0.25, wpb=7728.9, bsz=120, num_updates=53320, lr=3.75033e-06, gnorm=1, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=218381 2023-05-03 15:13:28 - progress_bar.py[line:274] - INFO: epoch 009: 5085 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7877.2, nsentences=120, sample_size=3839.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1979.6, ups=0.25, wpb=7877.2, bsz=120, num_updates=53330, lr=3.74505e-06, gnorm=0.987, clip=30, loss_scale=128, train_wall=40, gb_free=31.6, wall=218421 2023-05-03 15:13:57 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 
2023-05-03 15:14:12 - progress_bar.py[line:274] - INFO: epoch 009: 5096 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7729.9, nsentences=120, sample_size=4313.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1758.4, ups=0.23, wpb=7729.9, bsz=120, num_updates=53340, lr=3.73977e-06, gnorm=0.947, clip=30, loss_scale=64, train_wall=44, gb_free=29.9, wall=218465 2023-05-03 15:14:52 - progress_bar.py[line:274] - INFO: epoch 009: 5106 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7745.3, nsentences=120, sample_size=3938.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1961.4, ups=0.25, wpb=7745.3, bsz=120, num_updates=53350, lr=3.73448e-06, gnorm=0.958, clip=10, loss_scale=64, train_wall=39, gb_free=27.3, wall=218504 2023-05-03 15:15:32 - progress_bar.py[line:274] - INFO: epoch 009: 5116 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7807.7, nsentences=120, sample_size=4053.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1956.5, ups=0.25, wpb=7807.7, bsz=120, num_updates=53360, lr=3.7292e-06, gnorm=0.986, clip=50, loss_scale=64, train_wall=40, gb_free=30.4, wall=218544 2023-05-03 15:16:12 - progress_bar.py[line:274] - INFO: epoch 009: 5126 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7643.2, nsentences=120, sample_size=3930.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1914.8, ups=0.25, wpb=7643.2, bsz=120, num_updates=53370, lr=3.72392e-06, gnorm=1.007, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=218584 2023-05-03 15:16:51 - progress_bar.py[line:274] - INFO: epoch 009: 5136 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7704.7, nsentences=120, sample_size=3936.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1940.7, ups=0.25, wpb=7704.7, bsz=120, num_updates=53380, lr=3.71864e-06, gnorm=0.978, clip=30, loss_scale=64, train_wall=40, gb_free=31.1, wall=218624 2023-05-03 15:17:31 - progress_bar.py[line:274] - INFO: epoch 009: 5146 / 6042 
loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7837.4, nsentences=120, sample_size=4013.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1960.9, ups=0.25, wpb=7837.4, bsz=120, num_updates=53390, lr=3.71336e-06, gnorm=0.97, clip=40, loss_scale=64, train_wall=40, gb_free=30.4, wall=218664 2023-05-03 15:18:11 - progress_bar.py[line:274] - INFO: epoch 009: 5156 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7551.9, nsentences=120, sample_size=3905.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1915.4, ups=0.25, wpb=7551.9, bsz=120, num_updates=53400, lr=3.70807e-06, gnorm=1.002, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=218703 2023-05-03 15:18:50 - progress_bar.py[line:274] - INFO: epoch 009: 5166 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7508.3, nsentences=120, sample_size=4199.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1908.2, ups=0.25, wpb=7508.3, bsz=120, num_updates=53410, lr=3.70279e-06, gnorm=0.984, clip=30, loss_scale=64, train_wall=39, gb_free=29, wall=218743 2023-05-03 15:19:31 - progress_bar.py[line:274] - INFO: epoch 009: 5176 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7746.9, nsentences=120, sample_size=4240.9, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1908.8, ups=0.25, wpb=7746.9, bsz=120, num_updates=53420, lr=3.69751e-06, gnorm=0.966, clip=20, loss_scale=64, train_wall=41, gb_free=30.3, wall=218783 2023-05-03 15:20:11 - progress_bar.py[line:274] - INFO: epoch 009: 5186 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7856.5, nsentences=120, sample_size=4289.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1944.8, ups=0.25, wpb=7856.5, bsz=120, num_updates=53430, lr=3.69223e-06, gnorm=0.963, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=218824 2023-05-03 15:20:51 - progress_bar.py[line:274] - INFO: epoch 009: 5196 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7694.6, nsentences=120, 
sample_size=3831.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1927.6, ups=0.25, wpb=7694.6, bsz=120, num_updates=53440, lr=3.68694e-06, gnorm=1.028, clip=70, loss_scale=64, train_wall=40, gb_free=30.4, wall=218864 2023-05-03 15:21:31 - progress_bar.py[line:274] - INFO: epoch 009: 5206 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7709.8, nsentences=120, sample_size=4067.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1924.4, ups=0.25, wpb=7709.8, bsz=120, num_updates=53450, lr=3.68166e-06, gnorm=0.997, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=218904 2023-05-03 15:22:12 - progress_bar.py[line:274] - INFO: epoch 009: 5216 / 6042 loss=2.326, loss_v1=0, loss_v2=0, nll_loss=1.059, ntokens=7738.7, nsentences=120, sample_size=4002, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1901.3, ups=0.25, wpb=7738.7, bsz=120, num_updates=53460, lr=3.67638e-06, gnorm=0.977, clip=40, loss_scale=64, train_wall=41, gb_free=27.7, wall=218944 2023-05-03 15:22:52 - progress_bar.py[line:274] - INFO: epoch 009: 5226 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7664.7, nsentences=120, sample_size=4044.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1923.6, ups=0.25, wpb=7664.7, bsz=120, num_updates=53470, lr=3.6711e-06, gnorm=1.003, clip=30, loss_scale=64, train_wall=40, gb_free=31.2, wall=218984 2023-05-03 15:23:32 - progress_bar.py[line:274] - INFO: epoch 009: 5236 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7470.7, nsentences=120, sample_size=4239.1, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1864.3, ups=0.25, wpb=7470.7, bsz=120, num_updates=53480, lr=3.66582e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=219024 2023-05-03 15:24:12 - progress_bar.py[line:274] - INFO: epoch 009: 5246 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7720.2, nsentences=120, sample_size=4410.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1935, ups=0.25, 
wpb=7720.2, bsz=120, num_updates=53490, lr=3.66053e-06, gnorm=0.955, clip=30, loss_scale=64, train_wall=40, gb_free=31.6, wall=219064 2023-05-03 15:24:52 - progress_bar.py[line:274] - INFO: epoch 009: 5256 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7891.7, nsentences=120, sample_size=3978.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1972.9, ups=0.25, wpb=7891.7, bsz=120, num_updates=53500, lr=3.65525e-06, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=31.4, wall=219104 2023-05-03 15:25:31 - progress_bar.py[line:274] - INFO: epoch 009: 5266 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7673.5, nsentences=120, sample_size=3826.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1941.1, ups=0.25, wpb=7673.5, bsz=120, num_updates=53510, lr=3.64997e-06, gnorm=0.99, clip=40, loss_scale=64, train_wall=39, gb_free=30.5, wall=219144 2023-05-03 15:26:11 - progress_bar.py[line:274] - INFO: epoch 009: 5276 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7566.1, nsentences=120, sample_size=3930.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1885.7, ups=0.25, wpb=7566.1, bsz=120, num_updates=53520, lr=3.64469e-06, gnorm=1.003, clip=50, loss_scale=64, train_wall=40, gb_free=29.6, wall=219184 2023-05-03 15:26:50 - progress_bar.py[line:274] - INFO: epoch 009: 5286 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7460.1, nsentences=120, sample_size=3790.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1908.9, ups=0.26, wpb=7460.1, bsz=120, num_updates=53530, lr=3.6394e-06, gnorm=1.052, clip=70, loss_scale=64, train_wall=39, gb_free=29.9, wall=219223 2023-05-03 15:27:31 - progress_bar.py[line:274] - INFO: epoch 009: 5296 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7769.7, nsentences=120, sample_size=3840.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1933.1, ups=0.25, wpb=7769.7, bsz=120, num_updates=53540, lr=3.63412e-06, gnorm=1.008, clip=30, 
loss_scale=64, train_wall=40, gb_free=29.7, wall=219263 2023-05-03 15:28:10 - progress_bar.py[line:274] - INFO: epoch 009: 5306 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7538.8, nsentences=120, sample_size=4003, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1914.4, ups=0.25, wpb=7538.8, bsz=120, num_updates=53550, lr=3.62884e-06, gnorm=1.004, clip=60, loss_scale=64, train_wall=39, gb_free=29.5, wall=219302 2023-05-03 15:28:50 - progress_bar.py[line:274] - INFO: epoch 009: 5316 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7864, nsentences=120, sample_size=4227.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1953.5, ups=0.25, wpb=7864, bsz=120, num_updates=53560, lr=3.62356e-06, gnorm=0.947, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=219343 2023-05-03 15:29:30 - progress_bar.py[line:274] - INFO: epoch 009: 5326 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7800.6, nsentences=120, sample_size=3883.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1943.9, ups=0.25, wpb=7800.6, bsz=120, num_updates=53570, lr=3.61828e-06, gnorm=0.984, clip=40, loss_scale=64, train_wall=40, gb_free=30.7, wall=219383 2023-05-03 15:30:11 - progress_bar.py[line:274] - INFO: epoch 009: 5336 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7788.2, nsentences=120, sample_size=4356.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1899.4, ups=0.24, wpb=7788.2, bsz=120, num_updates=53580, lr=3.61299e-06, gnorm=0.955, clip=30, loss_scale=64, train_wall=41, gb_free=27.7, wall=219424 2023-05-03 15:30:51 - progress_bar.py[line:274] - INFO: epoch 009: 5346 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7590, nsentences=120, sample_size=4016.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1933.8, ups=0.25, wpb=7590, bsz=120, num_updates=53590, lr=3.60771e-06, gnorm=1.006, clip=60, loss_scale=64, train_wall=39, gb_free=28.8, wall=219463 2023-05-03 15:31:30 - 
progress_bar.py[line:274] - INFO: epoch 009: 5356 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7612.5, nsentences=120, sample_size=4051.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1913.7, ups=0.25, wpb=7612.5, bsz=120, num_updates=53600, lr=3.60243e-06, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=28.3, wall=219503 2023-05-03 15:32:11 - progress_bar.py[line:274] - INFO: epoch 009: 5366 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7645.3, nsentences=120, sample_size=4040.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1894.8, ups=0.25, wpb=7645.3, bsz=120, num_updates=53610, lr=3.59715e-06, gnorm=0.995, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=219543 2023-05-03 15:32:50 - progress_bar.py[line:274] - INFO: epoch 009: 5376 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7700.4, nsentences=120, sample_size=4115.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1943.3, ups=0.25, wpb=7700.4, bsz=120, num_updates=53620, lr=3.59187e-06, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=219583 2023-05-03 15:33:30 - progress_bar.py[line:274] - INFO: epoch 009: 5386 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7875.9, nsentences=120, sample_size=3790.1, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1994.7, ups=0.25, wpb=7875.9, bsz=120, num_updates=53630, lr=3.58658e-06, gnorm=1.004, clip=50, loss_scale=64, train_wall=39, gb_free=29.3, wall=219622 2023-05-03 15:34:09 - progress_bar.py[line:274] - INFO: epoch 009: 5396 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7458.9, nsentences=120, sample_size=3811.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1898.5, ups=0.25, wpb=7458.9, bsz=120, num_updates=53640, lr=3.5813e-06, gnorm=1.009, clip=50, loss_scale=64, train_wall=39, gb_free=30.7, wall=219662 2023-05-03 15:34:48 - progress_bar.py[line:274] - INFO: epoch 009: 5406 / 6042 loss=2.363, loss_v1=0, 
loss_v2=0, nll_loss=1.11, ntokens=7505.3, nsentences=120, sample_size=4311.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1910.2, ups=0.25, wpb=7505.3, bsz=120, num_updates=53650, lr=3.57602e-06, gnorm=0.975, clip=30, loss_scale=64, train_wall=39, gb_free=29.7, wall=219701 2023-05-03 15:35:28 - progress_bar.py[line:274] - INFO: epoch 009: 5416 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7676.9, nsentences=120, sample_size=4205.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1934.3, ups=0.25, wpb=7676.9, bsz=120, num_updates=53660, lr=3.57074e-06, gnorm=0.979, clip=30, loss_scale=64, train_wall=40, gb_free=28.4, wall=219741 2023-05-03 15:36:08 - progress_bar.py[line:274] - INFO: epoch 009: 5426 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7623.3, nsentences=120, sample_size=3844.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1917.9, ups=0.25, wpb=7623.3, bsz=120, num_updates=53670, lr=3.56545e-06, gnorm=1.03, clip=60, loss_scale=64, train_wall=40, gb_free=30, wall=219780 2023-05-03 15:36:47 - progress_bar.py[line:274] - INFO: epoch 009: 5436 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7469.3, nsentences=120, sample_size=3994.3, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1886.4, ups=0.25, wpb=7469.3, bsz=120, num_updates=53680, lr=3.56017e-06, gnorm=0.999, clip=50, loss_scale=64, train_wall=40, gb_free=28, wall=219820 2023-05-03 15:37:27 - progress_bar.py[line:274] - INFO: epoch 009: 5446 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7766.6, nsentences=120, sample_size=3903, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1941.3, ups=0.25, wpb=7766.6, bsz=120, num_updates=53690, lr=3.55489e-06, gnorm=0.99, clip=50, loss_scale=64, train_wall=40, gb_free=30.8, wall=219860 2023-05-03 15:38:07 - progress_bar.py[line:274] - INFO: epoch 009: 5456 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7658.4, nsentences=120, sample_size=3904, sample_size_v1=0, 
sample_size_v2=0, ppl=2.13, wps=1939.3, ups=0.25, wpb=7658.4, bsz=120, num_updates=53700, lr=3.54961e-06, gnorm=1.004, clip=40, loss_scale=64, train_wall=39, gb_free=30.2, wall=219899 2023-05-03 15:38:47 - progress_bar.py[line:274] - INFO: epoch 009: 5466 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7566, nsentences=120, sample_size=4214, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1898.4, ups=0.25, wpb=7566, bsz=120, num_updates=53710, lr=3.54433e-06, gnorm=0.965, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=219939 2023-05-03 15:39:27 - progress_bar.py[line:274] - INFO: epoch 009: 5476 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7716.2, nsentences=120, sample_size=3963.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1929.6, ups=0.25, wpb=7716.2, bsz=120, num_updates=53720, lr=3.53904e-06, gnorm=0.982, clip=50, loss_scale=64, train_wall=40, gb_free=30.8, wall=219979 2023-05-03 15:40:06 - progress_bar.py[line:274] - INFO: epoch 009: 5486 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7758.4, nsentences=120, sample_size=3896.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1958.6, ups=0.25, wpb=7758.4, bsz=120, num_updates=53730, lr=3.53376e-06, gnorm=1.031, clip=60, loss_scale=64, train_wall=40, gb_free=30.9, wall=220019 2023-05-03 15:40:46 - progress_bar.py[line:274] - INFO: epoch 009: 5496 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.139, ntokens=7845, nsentences=120, sample_size=4199, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1963.1, ups=0.25, wpb=7845, bsz=120, num_updates=53740, lr=3.52848e-06, gnorm=0.988, clip=20, loss_scale=64, train_wall=40, gb_free=28.5, wall=220059 2023-05-03 15:41:26 - progress_bar.py[line:274] - INFO: epoch 009: 5506 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7847.1, nsentences=120, sample_size=4087.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1971.9, ups=0.25, wpb=7847.1, bsz=120, num_updates=53750, 
lr=3.5232e-06, gnorm=1.006, clip=60, loss_scale=64, train_wall=40, gb_free=29.6, wall=220099 2023-05-03 15:42:05 - progress_bar.py[line:274] - INFO: epoch 009: 5516 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7818.3, nsentences=120, sample_size=3960.1, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1991.2, ups=0.25, wpb=7818.3, bsz=120, num_updates=53760, lr=3.51792e-06, gnorm=0.996, clip=50, loss_scale=64, train_wall=39, gb_free=30.8, wall=220138 2023-05-03 15:42:45 - progress_bar.py[line:274] - INFO: epoch 009: 5526 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7536.1, nsentences=120, sample_size=4012.7, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1894.2, ups=0.25, wpb=7536.1, bsz=120, num_updates=53770, lr=3.51263e-06, gnorm=1.001, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=220178 2023-05-03 15:43:26 - progress_bar.py[line:274] - INFO: epoch 009: 5536 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7760.1, nsentences=120, sample_size=3979.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1919.2, ups=0.25, wpb=7760.1, bsz=120, num_updates=53780, lr=3.50735e-06, gnorm=0.988, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=220218 2023-05-03 15:44:05 - progress_bar.py[line:274] - INFO: epoch 009: 5546 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7647.9, nsentences=120, sample_size=3934, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1936.2, ups=0.25, wpb=7647.9, bsz=120, num_updates=53790, lr=3.50207e-06, gnorm=1, clip=40, loss_scale=64, train_wall=39, gb_free=30.2, wall=220258 2023-05-03 15:44:45 - progress_bar.py[line:274] - INFO: epoch 009: 5556 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7845.7, nsentences=120, sample_size=3997.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1946, ups=0.25, wpb=7845.7, bsz=120, num_updates=53800, lr=3.49679e-06, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=220298 
2023-05-03 15:45:25 - progress_bar.py[line:274] - INFO: epoch 009: 5566 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7909.4, nsentences=120, sample_size=4249, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1998.4, ups=0.25, wpb=7909.4, bsz=120, num_updates=53810, lr=3.4915e-06, gnorm=0.953, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=220338 2023-05-03 15:45:41 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-03 15:46:09 - progress_bar.py[line:274] - INFO: epoch 009: 5577 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7835.6, nsentences=120, sample_size=4046.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1786.2, ups=0.23, wpb=7835.6, bsz=120, num_updates=53820, lr=3.48622e-06, gnorm=0.975, clip=20, loss_scale=32, train_wall=44, gb_free=30.2, wall=220381 2023-05-03 15:46:49 - progress_bar.py[line:274] - INFO: epoch 009: 5587 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7945.7, nsentences=120, sample_size=3994.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1996.7, ups=0.25, wpb=7945.7, bsz=120, num_updates=53830, lr=3.48094e-06, gnorm=0.982, clip=40, loss_scale=32, train_wall=40, gb_free=28.2, wall=220421 2023-05-03 15:47:29 - progress_bar.py[line:274] - INFO: epoch 009: 5597 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7470.7, nsentences=120, sample_size=4203.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1877.3, ups=0.25, wpb=7470.7, bsz=120, num_updates=53840, lr=3.47566e-06, gnorm=0.996, clip=40, loss_scale=32, train_wall=40, gb_free=29.9, wall=220461 2023-05-03 15:48:08 - progress_bar.py[line:274] - INFO: epoch 009: 5607 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7798.6, nsentences=120, sample_size=4271.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1993.1, ups=0.26, wpb=7798.6, bsz=120, num_updates=53850, lr=3.47038e-06, gnorm=0.945, clip=20, 
loss_scale=32, train_wall=39, gb_free=29.6, wall=220500 2023-05-03 15:48:47 - progress_bar.py[line:274] - INFO: epoch 009: 5617 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7639, nsentences=120, sample_size=3951.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1929.7, ups=0.25, wpb=7639, bsz=120, num_updates=53860, lr=3.46509e-06, gnorm=0.994, clip=50, loss_scale=32, train_wall=40, gb_free=30, wall=220540 2023-05-03 15:49:28 - progress_bar.py[line:274] - INFO: epoch 009: 5627 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7993.7, nsentences=120, sample_size=4116.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1954.5, ups=0.24, wpb=7993.7, bsz=120, num_updates=53870, lr=3.45981e-06, gnorm=0.975, clip=20, loss_scale=32, train_wall=41, gb_free=30.9, wall=220581 2023-05-03 15:50:07 - progress_bar.py[line:274] - INFO: epoch 009: 5637 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7657.9, nsentences=120, sample_size=4108.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1945.6, ups=0.25, wpb=7657.9, bsz=120, num_updates=53880, lr=3.45453e-06, gnorm=0.95, clip=30, loss_scale=32, train_wall=39, gb_free=29.4, wall=220620 2023-05-03 15:50:47 - progress_bar.py[line:274] - INFO: epoch 009: 5647 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7583.2, nsentences=120, sample_size=4069.9, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1925.4, ups=0.25, wpb=7583.2, bsz=120, num_updates=53890, lr=3.44925e-06, gnorm=0.979, clip=40, loss_scale=32, train_wall=39, gb_free=29.9, wall=220659 2023-05-03 15:51:27 - progress_bar.py[line:274] - INFO: epoch 009: 5657 / 6042 loss=2.395, loss_v1=0, loss_v2=0, nll_loss=1.143, ntokens=7811.6, nsentences=120, sample_size=4225.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1935.5, ups=0.25, wpb=7811.6, bsz=120, num_updates=53900, lr=3.44397e-06, gnorm=0.963, clip=30, loss_scale=32, train_wall=40, gb_free=30.9, wall=220700 2023-05-03 15:52:08 - 
progress_bar.py[line:274] - INFO: epoch 009: 5667 / 6042 loss=2.326, loss_v1=0, loss_v2=0, nll_loss=1.058, ntokens=7688.9, nsentences=120, sample_size=4337.6, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1908.6, ups=0.25, wpb=7688.9, bsz=120, num_updates=53910, lr=3.43868e-06, gnorm=0.944, clip=30, loss_scale=32, train_wall=40, gb_free=29.3, wall=220740 2023-05-03 15:52:47 - progress_bar.py[line:274] - INFO: epoch 009: 5677 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7784.7, nsentences=120, sample_size=3758.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1959.1, ups=0.25, wpb=7784.7, bsz=120, num_updates=53920, lr=3.4334e-06, gnorm=1.037, clip=70, loss_scale=32, train_wall=40, gb_free=29.4, wall=220780 2023-05-03 15:53:26 - progress_bar.py[line:274] - INFO: epoch 009: 5687 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7604.1, nsentences=120, sample_size=4061.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1951.4, ups=0.26, wpb=7604.1, bsz=120, num_updates=53930, lr=3.42812e-06, gnorm=0.952, clip=10, loss_scale=32, train_wall=39, gb_free=29.1, wall=220819 2023-05-03 15:54:06 - progress_bar.py[line:274] - INFO: epoch 009: 5697 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7417.7, nsentences=120, sample_size=4069.2, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1883.2, ups=0.25, wpb=7417.7, bsz=120, num_updates=53940, lr=3.42284e-06, gnorm=0.99, clip=50, loss_scale=32, train_wall=39, gb_free=29.6, wall=220858 2023-05-03 15:54:46 - progress_bar.py[line:274] - INFO: epoch 009: 5707 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7544.2, nsentences=120, sample_size=4006.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1872.9, ups=0.25, wpb=7544.2, bsz=120, num_updates=53950, lr=3.41755e-06, gnorm=0.976, clip=50, loss_scale=32, train_wall=40, gb_free=29.9, wall=220898 2023-05-03 15:55:24 - progress_bar.py[line:274] - INFO: epoch 009: 5717 / 6042 loss=2.399, loss_v1=0, 
loss_v2=0, nll_loss=1.14, ntokens=7652.1, nsentences=120, sample_size=4089.3, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1997, ups=0.26, wpb=7652.1, bsz=120, num_updates=53960, lr=3.41227e-06, gnorm=0.991, clip=50, loss_scale=32, train_wall=38, gb_free=29.4, wall=220937 2023-05-03 15:56:04 - progress_bar.py[line:274] - INFO: epoch 009: 5727 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7865, nsentences=120, sample_size=4208.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1994.7, ups=0.25, wpb=7865, bsz=120, num_updates=53970, lr=3.40699e-06, gnorm=0.986, clip=40, loss_scale=32, train_wall=39, gb_free=30, wall=220976 2023-05-03 15:56:45 - progress_bar.py[line:274] - INFO: epoch 009: 5737 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7454.9, nsentences=120, sample_size=4110.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1821.1, ups=0.24, wpb=7454.9, bsz=120, num_updates=53980, lr=3.40171e-06, gnorm=0.976, clip=40, loss_scale=32, train_wall=41, gb_free=29.7, wall=221017 2023-05-03 15:57:24 - progress_bar.py[line:274] - INFO: epoch 009: 5747 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7822.3, nsentences=120, sample_size=4161.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1991.6, ups=0.25, wpb=7822.3, bsz=120, num_updates=53990, lr=3.39643e-06, gnorm=0.99, clip=50, loss_scale=32, train_wall=39, gb_free=30.4, wall=221056 2023-05-03 15:58:03 - progress_bar.py[line:274] - INFO: epoch 009: 5757 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.074, ntokens=7676.2, nsentences=120, sample_size=3910.5, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1950.5, ups=0.25, wpb=7676.2, bsz=120, num_updates=54000, lr=3.39114e-06, gnorm=1.007, clip=50, loss_scale=32, train_wall=39, gb_free=29.5, wall=221096 2023-05-03 15:58:03 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 15:58:05 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 15:58:05 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 15:58:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:17 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-03 15:58:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:22 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 15:58:22 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 15:58:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:30 - task_invig.py[line:181] - INFO: example hyp(gen): 
region: 2023-05-03 15:58:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:34 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 15:58:34 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 15:58:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:42 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-03 15:58:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:45 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 15:58:45 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 15:58:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:49 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 15:58:49 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 15:58:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:54 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 15:58:54 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 15:58:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 15:58:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 15:58:55 - progress_bar.py[line:282] - INFO: epoch 009 | valid on 'valid' subset | loss 3.253 | loss_v1 0 | loss_v2 0 | nll_loss 2.087 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.25 | score 0.7549 | wps 3314 | wpb 3202.1 | bsz 39.4 | num_updates 54000 | best_score 0.7627 2023-05-03 15:58:55 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 9 @ 54000 updates 2023-05-03 15:58:55 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_54000.pt 2023-05-03 15:59:19 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_54000.pt 2023-05-03 15:59:34 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_9_54000.pt (epoch 9 @ 54000 updates, 
score 0.7549) (writing took 39.172184712952 seconds) 2023-05-03 16:00:13 - progress_bar.py[line:274] - INFO: epoch 009: 5767 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7698, nsentences=120, sample_size=3899.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=595.4, ups=0.08, wpb=7698, bsz=120, num_updates=54010, lr=3.38586e-06, gnorm=0.988, clip=40, loss_scale=32, train_wall=39, gb_free=29.8, wall=221225 2023-05-03 16:00:53 - progress_bar.py[line:274] - INFO: epoch 009: 5777 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7924.1, nsentences=120, sample_size=3919.4, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1937.6, ups=0.24, wpb=7924.1, bsz=120, num_updates=54020, lr=3.38058e-06, gnorm=0.985, clip=30, loss_scale=32, train_wall=41, gb_free=30.4, wall=221266 2023-05-03 16:01:33 - progress_bar.py[line:274] - INFO: epoch 009: 5787 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7510.6, nsentences=120, sample_size=3848.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1873.9, ups=0.25, wpb=7510.6, bsz=120, num_updates=54030, lr=3.3753e-06, gnorm=1.004, clip=60, loss_scale=32, train_wall=40, gb_free=30.8, wall=221306 2023-05-03 16:02:13 - progress_bar.py[line:274] - INFO: epoch 009: 5797 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7651.1, nsentences=120, sample_size=4134.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1919.5, ups=0.25, wpb=7651.1, bsz=120, num_updates=54040, lr=3.37001e-06, gnorm=0.956, clip=40, loss_scale=32, train_wall=40, gb_free=29.9, wall=221346 2023-05-03 16:02:52 - progress_bar.py[line:274] - INFO: epoch 009: 5807 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7602.7, nsentences=120, sample_size=4059.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1959.4, ups=0.26, wpb=7602.7, bsz=120, num_updates=54050, lr=3.36473e-06, gnorm=1.002, clip=40, loss_scale=32, train_wall=39, gb_free=30.4, wall=221385 2023-05-03 16:03:31 - 
progress_bar.py[line:274] - INFO: epoch 009: 5817 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7744.4, nsentences=120, sample_size=4121.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1977.9, ups=0.26, wpb=7744.4, bsz=120, num_updates=54060, lr=3.35945e-06, gnorm=0.995, clip=40, loss_scale=32, train_wall=39, gb_free=30.7, wall=221424 2023-05-03 16:04:11 - progress_bar.py[line:274] - INFO: epoch 009: 5827 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7862.9, nsentences=120, sample_size=4056.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1962.2, ups=0.25, wpb=7862.9, bsz=120, num_updates=54070, lr=3.35417e-06, gnorm=0.98, clip=40, loss_scale=32, train_wall=40, gb_free=23.7, wall=221464 2023-05-03 16:04:51 - progress_bar.py[line:274] - INFO: epoch 009: 5837 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7777.2, nsentences=120, sample_size=4084.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1979.2, ups=0.25, wpb=7777.2, bsz=120, num_updates=54080, lr=3.34889e-06, gnorm=0.988, clip=30, loss_scale=32, train_wall=39, gb_free=30.5, wall=221503 2023-05-03 16:05:31 - progress_bar.py[line:274] - INFO: epoch 009: 5847 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7930.7, nsentences=120, sample_size=4245.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1966.6, ups=0.25, wpb=7930.7, bsz=120, num_updates=54090, lr=3.3436e-06, gnorm=0.973, clip=40, loss_scale=32, train_wall=40, gb_free=30.2, wall=221543 2023-05-03 16:06:11 - progress_bar.py[line:274] - INFO: epoch 009: 5857 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7409.4, nsentences=120, sample_size=4198.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1870.2, ups=0.25, wpb=7409.4, bsz=120, num_updates=54100, lr=3.33832e-06, gnorm=0.972, clip=50, loss_scale=32, train_wall=40, gb_free=30.6, wall=221583 2023-05-03 16:06:52 - progress_bar.py[line:274] - INFO: epoch 009: 5867 / 6042 loss=2.325, loss_v1=0, 
loss_v2=0, nll_loss=1.065, ntokens=7591.1, nsentences=120, sample_size=4265.7, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1856.9, ups=0.24, wpb=7591.1, bsz=120, num_updates=54110, lr=3.33304e-06, gnorm=0.952, clip=20, loss_scale=32, train_wall=41, gb_free=30.3, wall=221624 2023-05-03 16:07:31 - progress_bar.py[line:274] - INFO: epoch 009: 5877 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7806, nsentences=120, sample_size=4058, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1992.8, ups=0.26, wpb=7806, bsz=120, num_updates=54120, lr=3.32776e-06, gnorm=0.964, clip=20, loss_scale=32, train_wall=39, gb_free=29.6, wall=221663 2023-05-03 16:08:10 - progress_bar.py[line:274] - INFO: epoch 009: 5887 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.074, ntokens=7560.8, nsentences=120, sample_size=3894.7, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1907.7, ups=0.25, wpb=7560.8, bsz=120, num_updates=54130, lr=3.32248e-06, gnorm=1.003, clip=50, loss_scale=32, train_wall=40, gb_free=29.8, wall=221703 2023-05-03 16:08:50 - progress_bar.py[line:274] - INFO: epoch 009: 5897 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7634.7, nsentences=120, sample_size=3897.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1923.3, ups=0.25, wpb=7634.7, bsz=120, num_updates=54140, lr=3.31719e-06, gnorm=1.005, clip=50, loss_scale=32, train_wall=40, gb_free=29.3, wall=221742 2023-05-03 16:09:30 - progress_bar.py[line:274] - INFO: epoch 009: 5907 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7270.1, nsentences=120, sample_size=4321.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1834.9, ups=0.25, wpb=7270.1, bsz=120, num_updates=54150, lr=3.31191e-06, gnorm=0.937, clip=20, loss_scale=32, train_wall=40, gb_free=30.5, wall=221782 2023-05-03 16:10:09 - progress_bar.py[line:274] - INFO: epoch 009: 5917 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7650, nsentences=120, sample_size=4054.5, 
sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1930, ups=0.25, wpb=7650, bsz=120, num_updates=54160, lr=3.30663e-06, gnorm=0.986, clip=40, loss_scale=32, train_wall=40, gb_free=30.6, wall=221822 2023-05-03 16:10:49 - progress_bar.py[line:274] - INFO: epoch 009: 5927 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7727.4, nsentences=120, sample_size=3885.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1923.2, ups=0.25, wpb=7727.4, bsz=120, num_updates=54170, lr=3.30135e-06, gnorm=1.019, clip=30, loss_scale=32, train_wall=40, gb_free=29.9, wall=221862 2023-05-03 16:11:29 - progress_bar.py[line:274] - INFO: epoch 009: 5937 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7652.1, nsentences=120, sample_size=4009.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1928.2, ups=0.25, wpb=7652.1, bsz=120, num_updates=54180, lr=3.29606e-06, gnorm=0.968, clip=10, loss_scale=32, train_wall=40, gb_free=30.8, wall=221902 2023-05-03 16:12:09 - progress_bar.py[line:274] - INFO: epoch 009: 5947 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7469.8, nsentences=120, sample_size=4110.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1876.7, ups=0.25, wpb=7469.8, bsz=120, num_updates=54190, lr=3.29078e-06, gnorm=0.955, clip=10, loss_scale=32, train_wall=40, gb_free=30.9, wall=221941 2023-05-03 16:12:49 - progress_bar.py[line:274] - INFO: epoch 009: 5957 / 6042 loss=2.403, loss_v1=0, loss_v2=0, nll_loss=1.151, ntokens=7815.3, nsentences=120, sample_size=4114.5, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1928.3, ups=0.25, wpb=7815.3, bsz=120, num_updates=54200, lr=3.2855e-06, gnorm=0.987, clip=50, loss_scale=32, train_wall=40, gb_free=30.5, wall=221982 2023-05-03 16:13:30 - progress_bar.py[line:274] - INFO: epoch 009: 5967 / 6042 loss=2.324, loss_v1=0, loss_v2=0, nll_loss=1.062, ntokens=7577.4, nsentences=120, sample_size=3781.6, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1874.2, ups=0.25, wpb=7577.4, bsz=120, 
num_updates=54210, lr=3.28022e-06, gnorm=0.998, clip=50, loss_scale=32, train_wall=40, gb_free=30.1, wall=222022 2023-05-03 16:14:10 - progress_bar.py[line:274] - INFO: epoch 009: 5977 / 6042 loss=2.327, loss_v1=0, loss_v2=0, nll_loss=1.064, ntokens=7693.7, nsentences=120, sample_size=4080.2, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1937.7, ups=0.25, wpb=7693.7, bsz=120, num_updates=54220, lr=3.27494e-06, gnorm=0.992, clip=30, loss_scale=32, train_wall=40, gb_free=30.8, wall=222062 2023-05-03 16:14:49 - progress_bar.py[line:274] - INFO: epoch 009: 5987 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7789.7, nsentences=120, sample_size=3965.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1997.7, ups=0.26, wpb=7789.7, bsz=120, num_updates=54230, lr=3.26965e-06, gnorm=0.985, clip=50, loss_scale=32, train_wall=39, gb_free=30.5, wall=222101 2023-05-03 16:15:27 - progress_bar.py[line:274] - INFO: epoch 009: 5997 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7422.9, nsentences=120, sample_size=4127.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1912.7, ups=0.26, wpb=7422.9, bsz=120, num_updates=54240, lr=3.26437e-06, gnorm=0.98, clip=30, loss_scale=32, train_wall=39, gb_free=29.6, wall=222140 2023-05-03 16:16:06 - progress_bar.py[line:274] - INFO: epoch 009: 6007 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7736.7, nsentences=120, sample_size=3871.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1986.1, ups=0.26, wpb=7736.7, bsz=120, num_updates=54250, lr=3.25909e-06, gnorm=1.019, clip=50, loss_scale=32, train_wall=39, gb_free=27.5, wall=222179 2023-05-03 16:16:47 - progress_bar.py[line:274] - INFO: epoch 009: 6017 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=8008.5, nsentences=120, sample_size=3988, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1993.7, ups=0.25, wpb=8008.5, bsz=120, num_updates=54260, lr=3.25381e-06, gnorm=0.98, clip=30, loss_scale=32, train_wall=40, 
gb_free=30.7, wall=222219 2023-05-03 16:17:26 - progress_bar.py[line:274] - INFO: epoch 009: 6027 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=8052.6, nsentences=120, sample_size=4091.3, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=2056.6, ups=0.26, wpb=8052.6, bsz=120, num_updates=54270, lr=3.24853e-06, gnorm=0.963, clip=40, loss_scale=32, train_wall=39, gb_free=27.2, wall=222258 2023-05-03 16:18:06 - progress_bar.py[line:274] - INFO: epoch 009: 6037 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=8022.5, nsentences=120, sample_size=3828.2, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1973.8, ups=0.25, wpb=8022.5, bsz=120, num_updates=54280, lr=3.24324e-06, gnorm=0.973, clip=30, loss_scale=32, train_wall=41, gb_free=29.9, wall=222299 2023-05-03 16:18:26 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 16:18:27 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 16:18:27 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 16:18:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 16:18:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:44 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 16:18:44 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 16:18:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-03 16:18:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:56 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 16:18:56 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 16:18:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:18:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:18:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 16:19:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:07 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 16:19:07 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 16:19:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:11 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 16:19:11 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 16:19:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:16 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 16:19:16 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 16:19:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 16:19:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 16:19:17 - progress_bar.py[line:282] - INFO: epoch 009 | valid on 'valid' subset | loss 3.26 | loss_v1 0 | loss_v2 0 | nll_loss 2.093 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.27 | score 0.7573 | wps 3304 | wpb 3202.1 | bsz 39.4 | num_updates 54285 | best_score 0.7627 2023-05-03 16:19:17 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 9 @ 54285 updates 2023-05-03 16:19:17 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-03 16:19:43 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt 2023-05-03 16:19:43 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_last.pt (epoch 9 @ 54285 updates, score 0.7573) (writing took 26.392783724004403 seconds) 2023-05-03 16:19:43 - train.py[line:332] - INFO: end of epoch 9 (average epoch stats below) 2023-05-03 16:19:43 - progress_bar.py[line:282] - INFO: epoch 009 | loss 2.36 | loss_v1 0 | loss_v2 0 | nll_loss 1.103 | ntokens 7720.17 | nsentences 119.992 | sample_size 4038.19 | sample_size_v1 0 | sample_size_v2 0 | ppl 2.15 | wps 1886.8 | ups 0.24 | wpb 7720.2 | bsz 120 | num_updates 54285 | lr 3.2406e-06 | gnorm 0.983 | clip 38.9 | loss_scale 32 | train_wall 24007 | gb_free 30.8 | wall 222396 2023-05-03 16:19:43 - trainer.py[line:639] - INFO: loading train data for epoch 10 2023-05-03 16:19:43 - dialog_dataset.py[line:647] - INFO: loading invig-train from 
/mnt/bn/hri-lq/datasets/hf-cache/invig 2023-05-03 16:19:43 - dialog_dataset.py[line:647] - INFO: loading guesswhat-train from /mnt/bn/hri-lq/datasets/hf-cache/guesswhat 2023-05-03 16:19:45 - dialog_dataset.py[line:647] - INFO: loading visdial-train from /mnt/bn/hri-lq/datasets/hf-cache/visdial 2023-05-03 16:19:46 - dialog_dataset.py[line:647] - INFO: loading refcoco-train from /mnt/bn/hri-lq/datasets/hf-cache/refcoco 2023-05-03 16:19:47 - dialog_dataset.py[line:647] - INFO: loading refcocog-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocog 2023-05-03 16:19:47 - dialog_dataset.py[line:647] - INFO: loading refcocoplus-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocoplus 2023-05-03 16:19:47 - dialog_dataset.py[line:647] - INFO: loading cc_sbu_align-train from /mnt/bn/hri-lq/datasets/hf-cache/cc_sbu_align 2023-05-03 16:19:47 - dialog_dataset.py[line:647] - INFO: loading llava_instruct_150k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_instruct_150k 2023-05-03 16:19:48 - dialog_dataset.py[line:647] - INFO: loading llava_conversation_58k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_conversation_58k 2023-05-03 16:19:48 - dialog_dataset.py[line:647] - INFO: loading llava_complex_reasoning_77k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_complex_reasoning_77k 2023-05-03 16:19:49 - dialog_dataset.py[line:647] - INFO: loading llava_detail_23k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_detail_23k 2023-05-03 16:19:49 - dialog_dataset.py[line:647] - INFO: loading openimages-train from /mnt/bn/hri-lq/datasets/hf-cache/openimages_v1.2 2023-05-03 16:19:49 - dialog_dataset.py[line:671] - INFO: load train data: 18 (90624/724992 samples) dataset(s) 2023-05-03 16:19:49 - dialog_dataset.py[line:672] - INFO: Tasks: invig_question(17652), invig_answer(17652), invig_grounding(17652), guesswhat_question(68653), guesswhat_answer(68653), guesswhat_grounding(68653), visdial_question(103447), visdial_answer(103447), visdial_caption(20689), refcoco_grounding(9523), 
refcoco_grounding(9920), refcoco_grounding(9494), cc_sbu_align_caption(3439), llava_instruct_150k(90372), llava_conversation_58k(46965), llava_complex_reasoning_77k(44353), llava_detail_23k(12471), openimages_detection(11957) 2023-05-03 16:19:50 - trainer.py[line:703] - INFO: begin training epoch 10 2023-05-03 16:19:50 - train.py[line:305] - INFO: Start iterating over samples 2023-05-03 16:20:10 - progress_bar.py[line:274] - INFO: epoch 010: 5 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7072.6, nsentences=116, sample_size=3996.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=570.6, ups=0.08, wpb=7072.6, bsz=116, num_updates=54290, lr=3.23796e-06, gnorm=1.008, clip=50, loss_scale=32, train_wall=39, gb_free=29.5, wall=222423 2023-05-03 16:20:51 - progress_bar.py[line:274] - INFO: epoch 010: 15 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7824.1, nsentences=120, sample_size=4015.1, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1942.5, ups=0.25, wpb=7824.1, bsz=120, num_updates=54300, lr=3.23268e-06, gnorm=0.96, clip=40, loss_scale=32, train_wall=40, gb_free=30.2, wall=222463 2023-05-03 16:21:31 - progress_bar.py[line:274] - INFO: epoch 010: 25 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7669.3, nsentences=120, sample_size=3921.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1920, ups=0.25, wpb=7669.3, bsz=120, num_updates=54310, lr=3.2274e-06, gnorm=0.988, clip=50, loss_scale=32, train_wall=40, gb_free=29.7, wall=222503 2023-05-03 16:22:11 - progress_bar.py[line:274] - INFO: epoch 010: 35 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7733.3, nsentences=120, sample_size=3923.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1904.1, ups=0.25, wpb=7733.3, bsz=120, num_updates=54320, lr=3.22211e-06, gnorm=0.982, clip=20, loss_scale=32, train_wall=41, gb_free=30.1, wall=222544 2023-05-03 16:22:51 - progress_bar.py[line:274] - INFO: epoch 010: 45 / 6042 loss=2.338, loss_v1=0, 
loss_v2=0, nll_loss=1.076, ntokens=7917.3, nsentences=120, sample_size=4249.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1991.8, ups=0.25, wpb=7917.3, bsz=120, num_updates=54330, lr=3.21683e-06, gnorm=0.973, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=222583 2023-05-03 16:23:31 - progress_bar.py[line:274] - INFO: epoch 010: 55 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7526.2, nsentences=120, sample_size=3804.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1874.8, ups=0.25, wpb=7526.2, bsz=120, num_updates=54340, lr=3.21155e-06, gnorm=1.005, clip=50, loss_scale=64, train_wall=40, gb_free=29.2, wall=222624 2023-05-03 16:24:11 - progress_bar.py[line:274] - INFO: epoch 010: 65 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7856.3, nsentences=120, sample_size=4031.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1952.9, ups=0.25, wpb=7856.3, bsz=120, num_updates=54350, lr=3.20627e-06, gnorm=1.011, clip=60, loss_scale=64, train_wall=40, gb_free=30.7, wall=222664 2023-05-03 16:24:52 - progress_bar.py[line:274] - INFO: epoch 010: 75 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7710.5, nsentences=120, sample_size=3983.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1914.6, ups=0.25, wpb=7710.5, bsz=120, num_updates=54360, lr=3.20099e-06, gnorm=1.024, clip=50, loss_scale=64, train_wall=40, gb_free=31.4, wall=222704 2023-05-03 16:25:31 - progress_bar.py[line:274] - INFO: epoch 010: 85 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7746, nsentences=120, sample_size=4115.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1941.6, ups=0.25, wpb=7746, bsz=120, num_updates=54370, lr=3.1957e-06, gnorm=0.965, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=222744 2023-05-03 16:26:11 - progress_bar.py[line:274] - INFO: epoch 010: 95 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7861.8, nsentences=120, sample_size=3848, sample_size_v1=0, 
sample_size_v2=0, ppl=2.14, wps=2010.3, ups=0.26, wpb=7861.8, bsz=120, num_updates=54380, lr=3.19042e-06, gnorm=1.002, clip=60, loss_scale=64, train_wall=39, gb_free=29.2, wall=222783 2023-05-03 16:26:51 - progress_bar.py[line:274] - INFO: epoch 010: 105 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7902.9, nsentences=120, sample_size=4160.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1950, ups=0.25, wpb=7902.9, bsz=120, num_updates=54390, lr=3.18514e-06, gnorm=0.948, clip=10, loss_scale=64, train_wall=40, gb_free=27.5, wall=222824 2023-05-03 16:27:31 - progress_bar.py[line:274] - INFO: epoch 010: 115 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7664.7, nsentences=120, sample_size=4209.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1929.3, ups=0.25, wpb=7664.7, bsz=120, num_updates=54400, lr=3.17986e-06, gnorm=0.97, clip=20, loss_scale=64, train_wall=40, gb_free=30.4, wall=222863 2023-05-03 16:28:10 - progress_bar.py[line:274] - INFO: epoch 010: 125 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7679.1, nsentences=120, sample_size=3931.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1955.2, ups=0.25, wpb=7679.1, bsz=120, num_updates=54410, lr=3.17458e-06, gnorm=1.048, clip=40, loss_scale=64, train_wall=39, gb_free=30.4, wall=222903 2023-05-03 16:28:49 - progress_bar.py[line:274] - INFO: epoch 010: 135 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7535.5, nsentences=120, sample_size=4250.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1937.5, ups=0.26, wpb=7535.5, bsz=120, num_updates=54420, lr=3.16929e-06, gnorm=0.938, clip=30, loss_scale=64, train_wall=39, gb_free=28.4, wall=222941 2023-05-03 16:29:29 - progress_bar.py[line:274] - INFO: epoch 010: 145 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7865.3, nsentences=120, sample_size=4150.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1949, ups=0.25, wpb=7865.3, bsz=120, num_updates=54430, 
lr=3.16401e-06, gnorm=0.966, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=222982 2023-05-03 16:30:09 - progress_bar.py[line:274] - INFO: epoch 010: 155 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7466.8, nsentences=120, sample_size=4078.3, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1888.5, ups=0.25, wpb=7466.8, bsz=120, num_updates=54440, lr=3.15873e-06, gnorm=0.995, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=223021 2023-05-03 16:30:49 - progress_bar.py[line:274] - INFO: epoch 010: 165 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7698.7, nsentences=120, sample_size=4179.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1936.1, ups=0.25, wpb=7698.7, bsz=120, num_updates=54450, lr=3.15345e-06, gnorm=0.977, clip=20, loss_scale=64, train_wall=40, gb_free=28.5, wall=223061 2023-05-03 16:31:28 - progress_bar.py[line:274] - INFO: epoch 010: 175 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7723.6, nsentences=120, sample_size=4112, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1949.4, ups=0.25, wpb=7723.6, bsz=120, num_updates=54460, lr=3.14816e-06, gnorm=0.968, clip=40, loss_scale=64, train_wall=40, gb_free=28.5, wall=223101 2023-05-03 16:32:08 - progress_bar.py[line:274] - INFO: epoch 010: 185 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7834.2, nsentences=120, sample_size=3792.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1962.2, ups=0.25, wpb=7834.2, bsz=120, num_updates=54470, lr=3.14288e-06, gnorm=1.006, clip=40, loss_scale=64, train_wall=40, gb_free=29.5, wall=223141 2023-05-03 16:32:48 - progress_bar.py[line:274] - INFO: epoch 010: 195 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7812.2, nsentences=120, sample_size=3825.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1942.4, ups=0.25, wpb=7812.2, bsz=120, num_updates=54480, lr=3.1376e-06, gnorm=0.975, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=223181 
2023-05-03 16:33:28 - progress_bar.py[line:274] - INFO: epoch 010: 205 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7691.6, nsentences=120, sample_size=3972.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1937.3, ups=0.25, wpb=7691.6, bsz=120, num_updates=54490, lr=3.13232e-06, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=223221 2023-05-03 16:34:08 - progress_bar.py[line:274] - INFO: epoch 010: 215 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7663.4, nsentences=120, sample_size=4250.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1935.9, ups=0.25, wpb=7663.4, bsz=120, num_updates=54500, lr=3.12704e-06, gnorm=0.945, clip=10, loss_scale=64, train_wall=40, gb_free=29.2, wall=223260 2023-05-03 16:34:47 - progress_bar.py[line:274] - INFO: epoch 010: 225 / 6042 loss=2.326, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7520.9, nsentences=120, sample_size=3835.9, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1919.7, ups=0.26, wpb=7520.9, bsz=120, num_updates=54510, lr=3.12175e-06, gnorm=1.012, clip=50, loss_scale=64, train_wall=39, gb_free=30.8, wall=223299 2023-05-03 16:35:27 - progress_bar.py[line:274] - INFO: epoch 010: 235 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7918.5, nsentences=120, sample_size=3963.3, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1978.7, ups=0.25, wpb=7918.5, bsz=120, num_updates=54520, lr=3.11647e-06, gnorm=1.006, clip=40, loss_scale=64, train_wall=40, gb_free=28.7, wall=223339 2023-05-03 16:36:07 - progress_bar.py[line:274] - INFO: epoch 010: 245 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7902.1, nsentences=120, sample_size=4112.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1983, ups=0.25, wpb=7902.1, bsz=120, num_updates=54530, lr=3.11119e-06, gnorm=0.996, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=223379 2023-05-03 16:36:47 - progress_bar.py[line:274] - INFO: epoch 010: 255 / 6042 loss=2.347, 
loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7875.8, nsentences=120, sample_size=4130.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1958.1, ups=0.25, wpb=7875.8, bsz=120, num_updates=54540, lr=3.10591e-06, gnorm=0.96, clip=30, loss_scale=64, train_wall=40, gb_free=30.8, wall=223419 2023-05-03 16:37:26 - progress_bar.py[line:274] - INFO: epoch 010: 265 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7815.3, nsentences=120, sample_size=3867.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1980.9, ups=0.25, wpb=7815.3, bsz=120, num_updates=54550, lr=3.10063e-06, gnorm=1.001, clip=60, loss_scale=64, train_wall=39, gb_free=30.3, wall=223459 2023-05-03 16:38:06 - progress_bar.py[line:274] - INFO: epoch 010: 275 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7679.9, nsentences=120, sample_size=4268.9, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1934.1, ups=0.25, wpb=7679.9, bsz=120, num_updates=54560, lr=3.09534e-06, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=28.9, wall=223499 2023-05-03 16:38:46 - progress_bar.py[line:274] - INFO: epoch 010: 285 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7522.1, nsentences=120, sample_size=4444.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1907.9, ups=0.25, wpb=7522.1, bsz=120, num_updates=54570, lr=3.09006e-06, gnorm=0.933, clip=20, loss_scale=64, train_wall=39, gb_free=27.9, wall=223538 2023-05-03 16:39:24 - progress_bar.py[line:274] - INFO: epoch 010: 295 / 6042 loss=2.329, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7570.8, nsentences=120, sample_size=3927.5, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1969.1, ups=0.26, wpb=7570.8, bsz=120, num_updates=54580, lr=3.08478e-06, gnorm=0.998, clip=40, loss_scale=64, train_wall=38, gb_free=29.8, wall=223576 2023-05-03 16:40:04 - progress_bar.py[line:274] - INFO: epoch 010: 305 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7824.1, nsentences=120, sample_size=3962.6, 
sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1965.6, ups=0.25, wpb=7824.1, bsz=120, num_updates=54590, lr=3.0795e-06, gnorm=0.996, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=223616 2023-05-03 16:40:44 - progress_bar.py[line:274] - INFO: epoch 010: 315 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7658.7, nsentences=120, sample_size=4041.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1908.3, ups=0.25, wpb=7658.7, bsz=120, num_updates=54600, lr=3.07421e-06, gnorm=0.994, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=223656 2023-05-03 16:41:24 - progress_bar.py[line:274] - INFO: epoch 010: 325 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7765.4, nsentences=120, sample_size=3990.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1946.7, ups=0.25, wpb=7765.4, bsz=120, num_updates=54610, lr=3.06893e-06, gnorm=0.954, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=223696 2023-05-03 16:42:04 - progress_bar.py[line:274] - INFO: epoch 010: 335 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7453.5, nsentences=120, sample_size=4147.5, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1857.3, ups=0.25, wpb=7453.5, bsz=120, num_updates=54620, lr=3.06365e-06, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=30.1, wall=223736 2023-05-03 16:42:44 - progress_bar.py[line:274] - INFO: epoch 010: 345 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7978.3, nsentences=120, sample_size=3600.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2017.7, ups=0.25, wpb=7978.3, bsz=120, num_updates=54630, lr=3.05837e-06, gnorm=1.121, clip=50, loss_scale=64, train_wall=39, gb_free=29.2, wall=223776 2023-05-03 16:43:24 - progress_bar.py[line:274] - INFO: epoch 010: 355 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7947.3, nsentences=120, sample_size=4045.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1952.1, ups=0.25, wpb=7947.3, bsz=120, 
num_updates=54640, lr=3.05309e-06, gnorm=0.971, clip=30, loss_scale=64, train_wall=41, gb_free=29.4, wall=223817 2023-05-03 16:44:05 - progress_bar.py[line:274] - INFO: epoch 010: 365 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7427.8, nsentences=120, sample_size=4080.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1843, ups=0.25, wpb=7427.8, bsz=120, num_updates=54650, lr=3.0478e-06, gnorm=0.959, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=223857 2023-05-03 16:44:44 - progress_bar.py[line:274] - INFO: epoch 010: 375 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7722.2, nsentences=120, sample_size=4458.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1966.4, ups=0.25, wpb=7722.2, bsz=120, num_updates=54660, lr=3.04252e-06, gnorm=0.921, clip=10, loss_scale=64, train_wall=39, gb_free=27.5, wall=223896 2023-05-03 16:45:24 - progress_bar.py[line:274] - INFO: epoch 010: 385 / 6042 loss=2.328, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7699.6, nsentences=120, sample_size=3914.6, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1925.2, ups=0.25, wpb=7699.6, bsz=120, num_updates=54670, lr=3.03724e-06, gnorm=0.995, clip=60, loss_scale=64, train_wall=40, gb_free=30, wall=223936 2023-05-03 16:46:04 - progress_bar.py[line:274] - INFO: epoch 010: 395 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7975.5, nsentences=120, sample_size=3762.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1991.7, ups=0.25, wpb=7975.5, bsz=120, num_updates=54680, lr=3.03196e-06, gnorm=1.015, clip=50, loss_scale=64, train_wall=40, gb_free=29.4, wall=223976 2023-05-03 16:46:44 - progress_bar.py[line:274] - INFO: epoch 010: 405 / 6042 loss=2.323, loss_v1=0, loss_v2=0, nll_loss=1.061, ntokens=7794.4, nsentences=120, sample_size=3912.7, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1934.4, ups=0.25, wpb=7794.4, bsz=120, num_updates=54690, lr=3.02667e-06, gnorm=0.995, clip=40, loss_scale=64, train_wall=40, 
gb_free=28.3, wall=224017 2023-05-03 16:47:24 - progress_bar.py[line:274] - INFO: epoch 010: 415 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7787.3, nsentences=120, sample_size=3924.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1965.1, ups=0.25, wpb=7787.3, bsz=120, num_updates=54700, lr=3.02139e-06, gnorm=0.968, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=224056 2023-05-03 16:48:03 - progress_bar.py[line:274] - INFO: epoch 010: 425 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7769.1, nsentences=120, sample_size=3771.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1980.6, ups=0.25, wpb=7769.1, bsz=120, num_updates=54710, lr=3.01611e-06, gnorm=1.034, clip=60, loss_scale=64, train_wall=39, gb_free=29.8, wall=224095 2023-05-03 16:48:42 - progress_bar.py[line:274] - INFO: epoch 010: 435 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7922.7, nsentences=120, sample_size=3824.2, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=2029.6, ups=0.26, wpb=7922.7, bsz=120, num_updates=54720, lr=3.01083e-06, gnorm=1.003, clip=60, loss_scale=64, train_wall=39, gb_free=31, wall=224134 2023-05-03 16:49:22 - progress_bar.py[line:274] - INFO: epoch 010: 445 / 6042 loss=2.329, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7866, nsentences=120, sample_size=4026.2, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1967.1, ups=0.25, wpb=7866, bsz=120, num_updates=54730, lr=3.00555e-06, gnorm=0.977, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=224174 2023-05-03 16:50:01 - progress_bar.py[line:274] - INFO: epoch 010: 455 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.066, ntokens=7655.4, nsentences=120, sample_size=4057.7, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1955.2, ups=0.26, wpb=7655.4, bsz=120, num_updates=54740, lr=3.00026e-06, gnorm=0.991, clip=50, loss_scale=64, train_wall=39, gb_free=30.1, wall=224214 2023-05-03 16:50:41 - progress_bar.py[line:274] - INFO: epoch 010: 465 
/ 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7938.9, nsentences=120, sample_size=3919.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2004.1, ups=0.25, wpb=7938.9, bsz=120, num_updates=54750, lr=2.99498e-06, gnorm=0.953, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=224253 2023-05-03 16:51:21 - progress_bar.py[line:274] - INFO: epoch 010: 475 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7815.6, nsentences=120, sample_size=3922.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1963.8, ups=0.25, wpb=7815.6, bsz=120, num_updates=54760, lr=2.9897e-06, gnorm=0.998, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=224293 2023-05-03 16:52:00 - progress_bar.py[line:274] - INFO: epoch 010: 485 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7736.2, nsentences=120, sample_size=3863.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1979.8, ups=0.26, wpb=7736.2, bsz=120, num_updates=54770, lr=2.98442e-06, gnorm=0.981, clip=30, loss_scale=64, train_wall=39, gb_free=30.5, wall=224332 2023-05-03 16:52:40 - progress_bar.py[line:274] - INFO: epoch 010: 495 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7636, nsentences=120, sample_size=4188.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1886.5, ups=0.25, wpb=7636, bsz=120, num_updates=54780, lr=2.97914e-06, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=29.2, wall=224373 2023-05-03 16:53:20 - progress_bar.py[line:274] - INFO: epoch 010: 505 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7839, nsentences=120, sample_size=4050.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1969.5, ups=0.25, wpb=7839, bsz=120, num_updates=54790, lr=2.97385e-06, gnorm=0.978, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=224412 2023-05-03 16:54:01 - progress_bar.py[line:274] - INFO: epoch 010: 515 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=8062.9, nsentences=120, 
sample_size=4093.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1982.6, ups=0.25, wpb=8062.9, bsz=120, num_updates=54800, lr=2.96857e-06, gnorm=0.982, clip=50, loss_scale=64, train_wall=41, gb_free=30.8, wall=224453 2023-05-03 16:54:40 - progress_bar.py[line:274] - INFO: epoch 010: 525 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7902.8, nsentences=120, sample_size=4000.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1986.4, ups=0.25, wpb=7902.8, bsz=120, num_updates=54810, lr=2.96329e-06, gnorm=0.958, clip=20, loss_scale=64, train_wall=40, gb_free=29.1, wall=224493 2023-05-03 16:55:21 - progress_bar.py[line:274] - INFO: epoch 010: 535 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7631.5, nsentences=120, sample_size=3816.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1893, ups=0.25, wpb=7631.5, bsz=120, num_updates=54820, lr=2.95801e-06, gnorm=0.992, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=224533 2023-05-03 16:56:00 - progress_bar.py[line:274] - INFO: epoch 010: 545 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7552.2, nsentences=120, sample_size=4188.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1910.1, ups=0.25, wpb=7552.2, bsz=120, num_updates=54830, lr=2.95272e-06, gnorm=0.993, clip=50, loss_scale=64, train_wall=39, gb_free=30.5, wall=224573 2023-05-03 16:56:41 - progress_bar.py[line:274] - INFO: epoch 010: 555 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=8207.2, nsentences=120, sample_size=4359.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2030.3, ups=0.25, wpb=8207.2, bsz=120, num_updates=54840, lr=2.94744e-06, gnorm=0.917, clip=10, loss_scale=128, train_wall=40, gb_free=29.6, wall=224613 2023-05-03 16:57:20 - progress_bar.py[line:274] - INFO: epoch 010: 565 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7683, nsentences=120, sample_size=4391.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1932.8, ups=0.25, 
wpb=7683, bsz=120, num_updates=54850, lr=2.94216e-06, gnorm=0.947, clip=20, loss_scale=128, train_wall=40, gb_free=29.8, wall=224653 2023-05-03 16:58:00 - progress_bar.py[line:274] - INFO: epoch 010: 575 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7591.1, nsentences=120, sample_size=4320.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1914.1, ups=0.25, wpb=7591.1, bsz=120, num_updates=54860, lr=2.93688e-06, gnorm=0.952, clip=20, loss_scale=128, train_wall=40, gb_free=29.4, wall=224693 2023-05-03 16:58:08 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 16:58:44 - progress_bar.py[line:274] - INFO: epoch 010: 586 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7892.3, nsentences=120, sample_size=4075.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1781.1, ups=0.23, wpb=7892.3, bsz=120, num_updates=54870, lr=2.9316e-06, gnorm=0.974, clip=30, loss_scale=64, train_wall=44, gb_free=28.7, wall=224737 2023-05-03 16:59:24 - progress_bar.py[line:274] - INFO: epoch 010: 596 / 6042 loss=2.323, loss_v1=0, loss_v2=0, nll_loss=1.054, ntokens=7546.7, nsentences=120, sample_size=3973.5, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1897.7, ups=0.25, wpb=7546.7, bsz=120, num_updates=54880, lr=2.92631e-06, gnorm=1.017, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=224777 2023-05-03 17:00:04 - progress_bar.py[line:274] - INFO: epoch 010: 606 / 6042 loss=2.331, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7587.2, nsentences=120, sample_size=4399.2, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1920.6, ups=0.25, wpb=7587.2, bsz=120, num_updates=54890, lr=2.92103e-06, gnorm=0.926, clip=10, loss_scale=64, train_wall=39, gb_free=30.5, wall=224816 2023-05-03 17:00:43 - progress_bar.py[line:274] - INFO: epoch 010: 616 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7782.5, nsentences=120, sample_size=4001.3, sample_size_v1=0, 
sample_size_v2=0, ppl=2.16, wps=1968.9, ups=0.25, wpb=7782.5, bsz=120, num_updates=54900, lr=2.91575e-06, gnorm=0.992, clip=40, loss_scale=64, train_wall=39, gb_free=29.5, wall=224856 2023-05-03 17:01:22 - progress_bar.py[line:274] - INFO: epoch 010: 626 / 6042 loss=2.331, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7576.6, nsentences=120, sample_size=4113.9, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1929.5, ups=0.25, wpb=7576.6, bsz=120, num_updates=54910, lr=2.91047e-06, gnorm=0.964, clip=20, loss_scale=64, train_wall=39, gb_free=29.6, wall=224895 2023-05-03 17:02:02 - progress_bar.py[line:274] - INFO: epoch 010: 636 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7679.1, nsentences=120, sample_size=3929, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1928.3, ups=0.25, wpb=7679.1, bsz=120, num_updates=54920, lr=2.90519e-06, gnorm=0.991, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=224935 2023-05-03 17:02:42 - progress_bar.py[line:274] - INFO: epoch 010: 646 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7610.4, nsentences=120, sample_size=4232, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1907.9, ups=0.25, wpb=7610.4, bsz=120, num_updates=54930, lr=2.8999e-06, gnorm=0.951, clip=20, loss_scale=64, train_wall=40, gb_free=26.3, wall=224975 2023-05-03 17:03:22 - progress_bar.py[line:274] - INFO: epoch 010: 656 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7801.1, nsentences=120, sample_size=3959.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1959, ups=0.25, wpb=7801.1, bsz=120, num_updates=54940, lr=2.89462e-06, gnorm=1.007, clip=50, loss_scale=64, train_wall=40, gb_free=30.2, wall=225014 2023-05-03 17:04:01 - progress_bar.py[line:274] - INFO: epoch 010: 666 / 6042 loss=2.309, loss_v1=0, loss_v2=0, nll_loss=1.042, ntokens=7599.7, nsentences=120, sample_size=4032.9, sample_size_v1=0, sample_size_v2=0, ppl=2.06, wps=1945.8, ups=0.26, wpb=7599.7, bsz=120, num_updates=54950, 
lr=2.88934e-06, gnorm=0.996, clip=70, loss_scale=64, train_wall=39, gb_free=29.7, wall=225054 2023-05-03 17:04:41 - progress_bar.py[line:274] - INFO: epoch 010: 676 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7789.6, nsentences=120, sample_size=3985.7, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1963.6, ups=0.25, wpb=7789.6, bsz=120, num_updates=54960, lr=2.88406e-06, gnorm=0.982, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=225093 2023-05-03 17:05:21 - progress_bar.py[line:274] - INFO: epoch 010: 686 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7560.5, nsentences=120, sample_size=4451.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1898.5, ups=0.25, wpb=7560.5, bsz=120, num_updates=54970, lr=2.87877e-06, gnorm=0.939, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=225133 2023-05-03 17:06:00 - progress_bar.py[line:274] - INFO: epoch 010: 696 / 6042 loss=2.328, loss_v1=0, loss_v2=0, nll_loss=1.064, ntokens=7565.2, nsentences=120, sample_size=4045.9, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1902.4, ups=0.25, wpb=7565.2, bsz=120, num_updates=54980, lr=2.87349e-06, gnorm=0.982, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=225173 2023-05-03 17:06:40 - progress_bar.py[line:274] - INFO: epoch 010: 706 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7562.3, nsentences=120, sample_size=3962.9, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1910.8, ups=0.25, wpb=7562.3, bsz=120, num_updates=54990, lr=2.86821e-06, gnorm=0.992, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=225212 2023-05-03 17:07:20 - progress_bar.py[line:274] - INFO: epoch 010: 716 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7812.1, nsentences=120, sample_size=3898.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1952.9, ups=0.25, wpb=7812.1, bsz=120, num_updates=55000, lr=2.86293e-06, gnorm=0.992, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, 
wall=225252 2023-05-03 17:07:20 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 17:07:22 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 17:07:22 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 17:07:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:33 - task_invig.py[line:182] - INFO: example 
ref(gt): region: 2023-05-03 17:07:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:38 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 17:07:38 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 17:07:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:45 - task_invig.py[line:181] - INFO: example 
hyp(gen): region: 2023-05-03 17:07:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:50 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 17:07:50 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 17:07:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:57 - task_invig.py[line:182] - INFO: example 
ref(gt): region: 2023-05-03 17:07:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:07:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:07:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:08:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:08:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:08:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:08:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:08:02 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 17:08:02 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 17:08:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:08:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:08:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:08:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:08:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:08:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:08:06 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 17:08:06 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 17:08:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:08:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:08:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:08:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:08:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:08:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:08:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:08:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:08:11 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 17:08:11 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 17:08:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 17:08:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 17:08:11 - progress_bar.py[line:282] - INFO: epoch 010 | valid on 'valid' subset | loss 3.271 | loss_v1 0 | loss_v2 0 | nll_loss 2.106 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.3 | score 0.7563 | wps 3296.6 | wpb 3202.1 | bsz 39.4 | num_updates 55000 | best_score 0.7627 2023-05-03 17:08:11 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 10 @ 55000 updates 2023-05-03 17:08:11 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_55000.pt 2023-05-03 17:08:34 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_55000.pt 2023-05-03 17:08:48 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_55000.pt (epoch 10 @ 55000 
updates, score 0.7563) (writing took 36.408049114979804 seconds) 2023-05-03 17:09:27 - progress_bar.py[line:274] - INFO: epoch 010: 726 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7521.2, nsentences=120, sample_size=4026.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=591.7, ups=0.08, wpb=7521.2, bsz=120, num_updates=55010, lr=2.85765e-06, gnorm=0.991, clip=30, loss_scale=64, train_wall=39, gb_free=29.9, wall=225380 2023-05-03 17:09:47 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0 2023-05-03 17:10:10 - progress_bar.py[line:274] - INFO: epoch 010: 737 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7454.1, nsentences=120, sample_size=4059.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1723.3, ups=0.23, wpb=7454.1, bsz=120, num_updates=55020, lr=2.85236e-06, gnorm=0.973, clip=20, loss_scale=32, train_wall=43, gb_free=29.7, wall=225423 2023-05-03 17:10:50 - progress_bar.py[line:274] - INFO: epoch 010: 747 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.146, ntokens=7713.4, nsentences=120, sample_size=3795.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1926.8, ups=0.25, wpb=7713.4, bsz=120, num_updates=55030, lr=2.84708e-06, gnorm=0.98, clip=40, loss_scale=32, train_wall=40, gb_free=29.7, wall=225463 2023-05-03 17:11:30 - progress_bar.py[line:274] - INFO: epoch 010: 757 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7771.2, nsentences=120, sample_size=4081.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1936.3, ups=0.25, wpb=7771.2, bsz=120, num_updates=55040, lr=2.8418e-06, gnorm=0.987, clip=20, loss_scale=32, train_wall=40, gb_free=29.7, wall=225503 2023-05-03 17:12:10 - progress_bar.py[line:274] - INFO: epoch 010: 767 / 6042 loss=2.301, loss_v1=0, loss_v2=0, nll_loss=1.05, ntokens=7622.3, nsentences=120, sample_size=4181.3, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1913.1, ups=0.25, wpb=7622.3, bsz=120, 
num_updates=55050, lr=2.83652e-06, gnorm=0.959, clip=30, loss_scale=32, train_wall=40, gb_free=28.7, wall=225543 2023-05-03 17:12:51 - progress_bar.py[line:274] - INFO: epoch 010: 777 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7592.3, nsentences=120, sample_size=4183.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1871.1, ups=0.25, wpb=7592.3, bsz=120, num_updates=55060, lr=2.83124e-06, gnorm=0.968, clip=40, loss_scale=32, train_wall=40, gb_free=30.1, wall=225583 2023-05-03 17:13:31 - progress_bar.py[line:274] - INFO: epoch 010: 787 / 6042 loss=2.322, loss_v1=0, loss_v2=0, nll_loss=1.058, ntokens=7657.7, nsentences=120, sample_size=3930.1, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1929.1, ups=0.25, wpb=7657.7, bsz=120, num_updates=55070, lr=2.82595e-06, gnorm=0.977, clip=30, loss_scale=32, train_wall=40, gb_free=30.8, wall=225623 2023-05-03 17:14:10 - progress_bar.py[line:274] - INFO: epoch 010: 797 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7596, nsentences=120, sample_size=4007.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1910.3, ups=0.25, wpb=7596, bsz=120, num_updates=55080, lr=2.82067e-06, gnorm=0.991, clip=40, loss_scale=32, train_wall=40, gb_free=29.4, wall=225663 2023-05-03 17:14:51 - progress_bar.py[line:274] - INFO: epoch 010: 807 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7738.2, nsentences=120, sample_size=4022.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1914.5, ups=0.25, wpb=7738.2, bsz=120, num_updates=55090, lr=2.81539e-06, gnorm=0.998, clip=40, loss_scale=32, train_wall=40, gb_free=30.6, wall=225703 2023-05-03 17:15:30 - progress_bar.py[line:274] - INFO: epoch 010: 817 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7634.3, nsentences=120, sample_size=4130.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1923.8, ups=0.25, wpb=7634.3, bsz=120, num_updates=55100, lr=2.81011e-06, gnorm=0.979, clip=50, loss_scale=32, train_wall=40, 
gb_free=30.4, wall=225743 2023-05-03 17:16:10 - progress_bar.py[line:274] - INFO: epoch 010: 827 / 6042 loss=2.326, loss_v1=0, loss_v2=0, nll_loss=1.059, ntokens=7419, nsentences=120, sample_size=4282.4, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1881.7, ups=0.25, wpb=7419, bsz=120, num_updates=55110, lr=2.80482e-06, gnorm=0.996, clip=40, loss_scale=32, train_wall=39, gb_free=24.6, wall=225782 2023-05-03 17:16:49 - progress_bar.py[line:274] - INFO: epoch 010: 837 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7700.4, nsentences=120, sample_size=4138.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1979.2, ups=0.26, wpb=7700.4, bsz=120, num_updates=55120, lr=2.79954e-06, gnorm=0.994, clip=40, loss_scale=32, train_wall=39, gb_free=29.6, wall=225821 2023-05-03 17:17:29 - progress_bar.py[line:274] - INFO: epoch 010: 847 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7629.1, nsentences=120, sample_size=4026.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1894, ups=0.25, wpb=7629.1, bsz=120, num_updates=55130, lr=2.79426e-06, gnorm=0.994, clip=40, loss_scale=32, train_wall=40, gb_free=26.9, wall=225862 2023-05-03 17:18:09 - progress_bar.py[line:274] - INFO: epoch 010: 857 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7825.2, nsentences=120, sample_size=4021.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1980.9, ups=0.25, wpb=7825.2, bsz=120, num_updates=55140, lr=2.78898e-06, gnorm=0.992, clip=40, loss_scale=32, train_wall=39, gb_free=29.4, wall=225901 2023-05-03 17:18:49 - progress_bar.py[line:274] - INFO: epoch 010: 867 / 6042 loss=2.326, loss_v1=0, loss_v2=0, nll_loss=1.064, ntokens=7692.5, nsentences=120, sample_size=4007.8, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1909.9, ups=0.25, wpb=7692.5, bsz=120, num_updates=55150, lr=2.7837e-06, gnorm=1.02, clip=50, loss_scale=32, train_wall=40, gb_free=30.6, wall=225941 2023-05-03 17:19:29 - progress_bar.py[line:274] - INFO: epoch 010: 877 
/ 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7754.3, nsentences=120, sample_size=4069.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1928.9, ups=0.25, wpb=7754.3, bsz=120, num_updates=55160, lr=2.77841e-06, gnorm=0.954, clip=10, loss_scale=32, train_wall=40, gb_free=30.3, wall=225982 2023-05-03 17:20:09 - progress_bar.py[line:274] - INFO: epoch 010: 887 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7771.9, nsentences=120, sample_size=3880.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1936.4, ups=0.25, wpb=7771.9, bsz=120, num_updates=55170, lr=2.77313e-06, gnorm=0.982, clip=40, loss_scale=32, train_wall=40, gb_free=28.8, wall=226022 2023-05-03 17:20:49 - progress_bar.py[line:274] - INFO: epoch 010: 897 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7841.7, nsentences=120, sample_size=3752.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1980.9, ups=0.25, wpb=7841.7, bsz=120, num_updates=55180, lr=2.76785e-06, gnorm=1.011, clip=60, loss_scale=32, train_wall=40, gb_free=30.6, wall=226061 2023-05-03 17:21:28 - progress_bar.py[line:274] - INFO: epoch 010: 907 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7939.4, nsentences=120, sample_size=3852.3, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2001.4, ups=0.25, wpb=7939.4, bsz=120, num_updates=55190, lr=2.76257e-06, gnorm=1.004, clip=60, loss_scale=32, train_wall=40, gb_free=30.1, wall=226101 2023-05-03 17:22:09 - progress_bar.py[line:274] - INFO: epoch 010: 917 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7704.9, nsentences=120, sample_size=3850.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1891.9, ups=0.25, wpb=7704.9, bsz=120, num_updates=55200, lr=2.75728e-06, gnorm=1.005, clip=50, loss_scale=32, train_wall=41, gb_free=31, wall=226142 2023-05-03 17:22:48 - progress_bar.py[line:274] - INFO: epoch 010: 927 / 6042 loss=2.316, loss_v1=0, loss_v2=0, nll_loss=1.056, ntokens=7490.3, nsentences=120, 
sample_size=4223.7, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1907.9, ups=0.25, wpb=7490.3, bsz=120, num_updates=55210, lr=2.752e-06, gnorm=0.946, clip=10, loss_scale=32, train_wall=39, gb_free=29.6, wall=226181 2023-05-03 17:23:30 - progress_bar.py[line:274] - INFO: epoch 010: 937 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7928.1, nsentences=120, sample_size=3808.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1926.7, ups=0.24, wpb=7928.1, bsz=120, num_updates=55220, lr=2.74672e-06, gnorm=0.983, clip=30, loss_scale=32, train_wall=41, gb_free=30.8, wall=226222 2023-05-03 17:24:10 - progress_bar.py[line:274] - INFO: epoch 010: 947 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7851, nsentences=120, sample_size=4016.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1928.2, ups=0.25, wpb=7851, bsz=120, num_updates=55230, lr=2.74144e-06, gnorm=1.006, clip=50, loss_scale=32, train_wall=41, gb_free=29.7, wall=226263 2023-05-03 17:24:50 - progress_bar.py[line:274] - INFO: epoch 010: 957 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=8001.8, nsentences=120, sample_size=3923.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1999, ups=0.25, wpb=8001.8, bsz=120, num_updates=55240, lr=2.73616e-06, gnorm=0.995, clip=50, loss_scale=32, train_wall=40, gb_free=30.6, wall=226303 2023-05-03 17:25:30 - progress_bar.py[line:274] - INFO: epoch 010: 967 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7695.4, nsentences=120, sample_size=3944.3, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1947.9, ups=0.25, wpb=7695.4, bsz=120, num_updates=55250, lr=2.73087e-06, gnorm=0.984, clip=30, loss_scale=32, train_wall=39, gb_free=29.9, wall=226342 2023-05-03 17:26:10 - progress_bar.py[line:274] - INFO: epoch 010: 977 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7952, nsentences=120, sample_size=3984.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1987.2, ups=0.25, wpb=7952, 
bsz=120, num_updates=55260, lr=2.72559e-06, gnorm=0.964, clip=30, loss_scale=32, train_wall=40, gb_free=30, wall=226382 2023-05-03 17:26:49 - progress_bar.py[line:274] - INFO: epoch 010: 987 / 6042 loss=2.326, loss_v1=0, loss_v2=0, nll_loss=1.06, ntokens=7610.9, nsentences=120, sample_size=4021.2, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1926.5, ups=0.25, wpb=7610.9, bsz=120, num_updates=55270, lr=2.72031e-06, gnorm=0.988, clip=40, loss_scale=32, train_wall=39, gb_free=30.8, wall=226422 2023-05-03 17:27:29 - progress_bar.py[line:274] - INFO: epoch 010: 997 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7632, nsentences=120, sample_size=3958, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1943.1, ups=0.25, wpb=7632, bsz=120, num_updates=55280, lr=2.71503e-06, gnorm=0.997, clip=50, loss_scale=32, train_wall=39, gb_free=30.9, wall=226461 2023-05-03 17:28:09 - progress_bar.py[line:274] - INFO: epoch 010: 1007 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.074, ntokens=7794.6, nsentences=120, sample_size=4335.2, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1954.5, ups=0.25, wpb=7794.6, bsz=120, num_updates=55290, lr=2.70975e-06, gnorm=0.985, clip=40, loss_scale=32, train_wall=40, gb_free=28.9, wall=226501 2023-05-03 17:28:47 - progress_bar.py[line:274] - INFO: epoch 010: 1017 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7830.4, nsentences=120, sample_size=3936.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2016.1, ups=0.26, wpb=7830.4, bsz=120, num_updates=55300, lr=2.70446e-06, gnorm=0.994, clip=40, loss_scale=32, train_wall=39, gb_free=29.9, wall=226540 2023-05-03 17:29:27 - progress_bar.py[line:274] - INFO: epoch 010: 1027 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7785.6, nsentences=120, sample_size=4048.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1955.1, ups=0.25, wpb=7785.6, bsz=120, num_updates=55310, lr=2.69918e-06, gnorm=0.977, clip=30, loss_scale=32, train_wall=40, 
gb_free=30.8, wall=226580 2023-05-03 17:30:07 - progress_bar.py[line:274] - INFO: epoch 010: 1037 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7789.3, nsentences=120, sample_size=3875.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1961.3, ups=0.25, wpb=7789.3, bsz=120, num_updates=55320, lr=2.6939e-06, gnorm=0.997, clip=40, loss_scale=32, train_wall=40, gb_free=29.3, wall=226619 2023-05-03 17:30:47 - progress_bar.py[line:274] - INFO: epoch 010: 1047 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7944.8, nsentences=120, sample_size=3801, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1958.1, ups=0.25, wpb=7944.8, bsz=120, num_updates=55330, lr=2.68862e-06, gnorm=0.997, clip=40, loss_scale=32, train_wall=41, gb_free=27.5, wall=226660 2023-05-03 17:31:27 - progress_bar.py[line:274] - INFO: epoch 010: 1057 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7410.7, nsentences=120, sample_size=4115.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1871.6, ups=0.25, wpb=7410.7, bsz=120, num_updates=55340, lr=2.68333e-06, gnorm=0.982, clip=40, loss_scale=32, train_wall=40, gb_free=27.8, wall=226700 2023-05-03 17:32:07 - progress_bar.py[line:274] - INFO: epoch 010: 1067 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7828, nsentences=120, sample_size=3874.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1946.2, ups=0.25, wpb=7828, bsz=120, num_updates=55350, lr=2.67805e-06, gnorm=0.976, clip=40, loss_scale=32, train_wall=40, gb_free=29.4, wall=226740 2023-05-03 17:32:47 - progress_bar.py[line:274] - INFO: epoch 010: 1077 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7883.7, nsentences=120, sample_size=3929, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1966.6, ups=0.25, wpb=7883.7, bsz=120, num_updates=55360, lr=2.67277e-06, gnorm=0.992, clip=50, loss_scale=32, train_wall=40, gb_free=29.8, wall=226780 2023-05-03 17:33:27 - progress_bar.py[line:274] - INFO: epoch 010: 
1087 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7660.1, nsentences=120, sample_size=4231.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1951.7, ups=0.25, wpb=7660.1, bsz=120, num_updates=55370, lr=2.66749e-06, gnorm=0.937, clip=0, loss_scale=32, train_wall=39, gb_free=31.1, wall=226819 2023-05-03 17:34:07 - progress_bar.py[line:274] - INFO: epoch 010: 1097 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7952.1, nsentences=120, sample_size=4243.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1981.1, ups=0.25, wpb=7952.1, bsz=120, num_updates=55380, lr=2.66221e-06, gnorm=0.931, clip=10, loss_scale=32, train_wall=40, gb_free=30.7, wall=226859 2023-05-03 17:34:47 - progress_bar.py[line:274] - INFO: epoch 010: 1107 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7720.6, nsentences=120, sample_size=4212.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1941.4, ups=0.25, wpb=7720.6, bsz=120, num_updates=55390, lr=2.65692e-06, gnorm=0.994, clip=50, loss_scale=32, train_wall=40, gb_free=29.9, wall=226899 2023-05-03 17:35:26 - progress_bar.py[line:274] - INFO: epoch 010: 1117 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.07, ntokens=7930.9, nsentences=120, sample_size=3704.9, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1987.5, ups=0.25, wpb=7930.9, bsz=120, num_updates=55400, lr=2.65164e-06, gnorm=1.003, clip=40, loss_scale=32, train_wall=40, gb_free=31.2, wall=226939 2023-05-03 17:36:06 - progress_bar.py[line:274] - INFO: epoch 010: 1127 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7655.5, nsentences=120, sample_size=3804.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1918.3, ups=0.25, wpb=7655.5, bsz=120, num_updates=55410, lr=2.64636e-06, gnorm=1.018, clip=70, loss_scale=32, train_wall=40, gb_free=28.7, wall=226979 2023-05-03 17:36:46 - progress_bar.py[line:274] - INFO: epoch 010: 1137 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7992, 
nsentences=120, sample_size=4283.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2014.7, ups=0.25, wpb=7992, bsz=120, num_updates=55420, lr=2.64108e-06, gnorm=0.931, clip=10, loss_scale=32, train_wall=40, gb_free=30.9, wall=227019 2023-05-03 17:37:26 - progress_bar.py[line:274] - INFO: epoch 010: 1147 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7956.1, nsentences=120, sample_size=3908.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2005.9, ups=0.25, wpb=7956.1, bsz=120, num_updates=55430, lr=2.6358e-06, gnorm=0.965, clip=40, loss_scale=32, train_wall=40, gb_free=29.6, wall=227058 2023-05-03 17:38:06 - progress_bar.py[line:274] - INFO: epoch 010: 1157 / 6042 loss=2.315, loss_v1=0, loss_v2=0, nll_loss=1.051, ntokens=7717.2, nsentences=120, sample_size=3640.4, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1936.4, ups=0.25, wpb=7717.2, bsz=120, num_updates=55440, lr=2.63051e-06, gnorm=1.02, clip=70, loss_scale=32, train_wall=40, gb_free=30.9, wall=227098 2023-05-03 17:38:45 - progress_bar.py[line:274] - INFO: epoch 010: 1167 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7815.2, nsentences=120, sample_size=3855.5, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1970.3, ups=0.25, wpb=7815.2, bsz=120, num_updates=55450, lr=2.62523e-06, gnorm=1.018, clip=60, loss_scale=32, train_wall=40, gb_free=31.4, wall=227138 2023-05-03 17:39:25 - progress_bar.py[line:274] - INFO: epoch 010: 1177 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=8020.3, nsentences=120, sample_size=4052.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2008.6, ups=0.25, wpb=8020.3, bsz=120, num_updates=55460, lr=2.61995e-06, gnorm=0.981, clip=40, loss_scale=32, train_wall=40, gb_free=30.8, wall=227178 2023-05-03 17:40:05 - progress_bar.py[line:274] - INFO: epoch 010: 1187 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7584, nsentences=120, sample_size=3883.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, 
wps=1886.5, ups=0.25, wpb=7584, bsz=120, num_updates=55470, lr=2.61467e-06, gnorm=1.015, clip=60, loss_scale=32, train_wall=40, gb_free=24.2, wall=227218 2023-05-03 17:40:45 - progress_bar.py[line:274] - INFO: epoch 010: 1197 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7564.1, nsentences=120, sample_size=3946.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1914.9, ups=0.25, wpb=7564.1, bsz=120, num_updates=55480, lr=2.60938e-06, gnorm=1, clip=40, loss_scale=32, train_wall=39, gb_free=28.1, wall=227257 2023-05-03 17:41:25 - progress_bar.py[line:274] - INFO: epoch 010: 1207 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7838, nsentences=120, sample_size=4242.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1960.9, ups=0.25, wpb=7838, bsz=120, num_updates=55490, lr=2.6041e-06, gnorm=0.96, clip=30, loss_scale=32, train_wall=40, gb_free=27.2, wall=227297 2023-05-03 17:42:05 - progress_bar.py[line:274] - INFO: epoch 010: 1217 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7827, nsentences=120, sample_size=3791.9, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1972.7, ups=0.25, wpb=7827, bsz=120, num_updates=55500, lr=2.59882e-06, gnorm=1.02, clip=70, loss_scale=32, train_wall=40, gb_free=27.4, wall=227337 2023-05-03 17:42:44 - progress_bar.py[line:274] - INFO: epoch 010: 1227 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7675.6, nsentences=120, sample_size=3873.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1955, ups=0.25, wpb=7675.6, bsz=120, num_updates=55510, lr=2.59354e-06, gnorm=1.003, clip=50, loss_scale=32, train_wall=39, gb_free=28.6, wall=227376 2023-05-03 17:43:24 - progress_bar.py[line:274] - INFO: epoch 010: 1237 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7618.2, nsentences=120, sample_size=4133.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1915.4, ups=0.25, wpb=7618.2, bsz=120, num_updates=55520, lr=2.58826e-06, gnorm=0.953, clip=20, 
loss_scale=32, train_wall=40, gb_free=28.9, wall=227416 2023-05-03 17:44:03 - progress_bar.py[line:274] - INFO: epoch 010: 1247 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7943.2, nsentences=120, sample_size=4088.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=2013.2, ups=0.25, wpb=7943.2, bsz=120, num_updates=55530, lr=2.58297e-06, gnorm=1.005, clip=50, loss_scale=64, train_wall=39, gb_free=26.6, wall=227455 2023-05-03 17:44:42 - progress_bar.py[line:274] - INFO: epoch 010: 1257 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7860.8, nsentences=120, sample_size=3963.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1992.8, ups=0.25, wpb=7860.8, bsz=120, num_updates=55540, lr=2.57769e-06, gnorm=0.986, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=227495 2023-05-03 17:45:23 - progress_bar.py[line:274] - INFO: epoch 010: 1267 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7823.8, nsentences=120, sample_size=4061, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1948.1, ups=0.25, wpb=7823.8, bsz=120, num_updates=55550, lr=2.57241e-06, gnorm=0.989, clip=50, loss_scale=64, train_wall=40, gb_free=25.6, wall=227535 2023-05-03 17:46:02 - progress_bar.py[line:274] - INFO: epoch 010: 1277 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7730, nsentences=120, sample_size=3994.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1961.1, ups=0.25, wpb=7730, bsz=120, num_updates=55560, lr=2.56713e-06, gnorm=0.995, clip=60, loss_scale=64, train_wall=39, gb_free=29, wall=227575 2023-05-03 17:46:42 - progress_bar.py[line:274] - INFO: epoch 010: 1287 / 6042 loss=2.331, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7612.5, nsentences=120, sample_size=4138.3, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1893.4, ups=0.25, wpb=7612.5, bsz=120, num_updates=55570, lr=2.56185e-06, gnorm=0.967, clip=40, loss_scale=64, train_wall=40, gb_free=30.8, wall=227615 2023-05-03 17:47:23 - 
progress_bar.py[line:274] - INFO: epoch 010: 1297 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7603, nsentences=120, sample_size=3973.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1885.3, ups=0.25, wpb=7603, bsz=120, num_updates=55580, lr=2.55656e-06, gnorm=0.999, clip=50, loss_scale=64, train_wall=40, gb_free=29.9, wall=227655 2023-05-03 17:48:03 - progress_bar.py[line:274] - INFO: epoch 010: 1307 / 6042 loss=2.329, loss_v1=0, loss_v2=0, nll_loss=1.058, ntokens=7607.2, nsentences=120, sample_size=3999.6, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1899, ups=0.25, wpb=7607.2, bsz=120, num_updates=55590, lr=2.55128e-06, gnorm=0.985, clip=30, loss_scale=64, train_wall=40, gb_free=28.9, wall=227695 2023-05-03 17:48:43 - progress_bar.py[line:274] - INFO: epoch 010: 1317 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7776.8, nsentences=120, sample_size=3811.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1946.9, ups=0.25, wpb=7776.8, bsz=120, num_updates=55600, lr=2.546e-06, gnorm=0.986, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=227735 2023-05-03 17:49:23 - progress_bar.py[line:274] - INFO: epoch 010: 1327 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7986.4, nsentences=120, sample_size=3802.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1968.2, ups=0.25, wpb=7986.4, bsz=120, num_updates=55610, lr=2.54072e-06, gnorm=1.01, clip=60, loss_scale=64, train_wall=41, gb_free=30.7, wall=227776 2023-05-03 17:50:02 - progress_bar.py[line:274] - INFO: epoch 010: 1337 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=8018.9, nsentences=120, sample_size=4328.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2042.5, ups=0.25, wpb=8018.9, bsz=120, num_updates=55620, lr=2.53543e-06, gnorm=0.961, clip=10, loss_scale=64, train_wall=39, gb_free=29.1, wall=227815 2023-05-03 17:50:43 - progress_bar.py[line:274] - INFO: epoch 010: 1347 / 6042 loss=2.363, loss_v1=0, loss_v2=0, 
nll_loss=1.107, ntokens=7780.5, nsentences=120, sample_size=3725.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1914, ups=0.25, wpb=7780.5, bsz=120, num_updates=55630, lr=2.53015e-06, gnorm=1.004, clip=40, loss_scale=64, train_wall=41, gb_free=30.1, wall=227856 2023-05-03 17:51:23 - progress_bar.py[line:274] - INFO: epoch 010: 1357 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7539.8, nsentences=120, sample_size=3996.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1900.8, ups=0.25, wpb=7539.8, bsz=120, num_updates=55640, lr=2.52487e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=28.4, wall=227895 2023-05-03 17:52:03 - progress_bar.py[line:274] - INFO: epoch 010: 1367 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7783.9, nsentences=120, sample_size=3810.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1937.2, ups=0.25, wpb=7783.9, bsz=120, num_updates=55650, lr=2.51959e-06, gnorm=1.005, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=227935 2023-05-03 17:52:44 - progress_bar.py[line:274] - INFO: epoch 010: 1377 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7871.6, nsentences=120, sample_size=4038, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1895.5, ups=0.24, wpb=7871.6, bsz=120, num_updates=55660, lr=2.51431e-06, gnorm=0.988, clip=30, loss_scale=64, train_wall=41, gb_free=29.9, wall=227977 2023-05-03 17:53:24 - progress_bar.py[line:274] - INFO: epoch 010: 1387 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7672.1, nsentences=120, sample_size=4144.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1940.3, ups=0.25, wpb=7672.1, bsz=120, num_updates=55670, lr=2.50902e-06, gnorm=1, clip=40, loss_scale=64, train_wall=39, gb_free=30.2, wall=228016 2023-05-03 17:54:04 - progress_bar.py[line:274] - INFO: epoch 010: 1397 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=8104.3, nsentences=120, sample_size=4203.3, sample_size_v1=0, 
sample_size_v2=0, ppl=2.18, wps=2019.8, ups=0.25, wpb=8104.3, bsz=120, num_updates=55680, lr=2.50374e-06, gnorm=0.952, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=228057 2023-05-03 17:54:44 - progress_bar.py[line:274] - INFO: epoch 010: 1407 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7634.7, nsentences=120, sample_size=4034.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1921, ups=0.25, wpb=7634.7, bsz=120, num_updates=55690, lr=2.49846e-06, gnorm=0.99, clip=30, loss_scale=64, train_wall=40, gb_free=30.9, wall=228096 2023-05-03 17:55:24 - progress_bar.py[line:274] - INFO: epoch 010: 1417 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7908.8, nsentences=120, sample_size=4030.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1983.8, ups=0.25, wpb=7908.8, bsz=120, num_updates=55700, lr=2.49318e-06, gnorm=0.984, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=228136 2023-05-03 17:56:04 - progress_bar.py[line:274] - INFO: epoch 010: 1427 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7718, nsentences=120, sample_size=3811.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1938.8, ups=0.25, wpb=7718, bsz=120, num_updates=55710, lr=2.4879e-06, gnorm=1.017, clip=60, loss_scale=64, train_wall=40, gb_free=29.9, wall=228176 2023-05-03 17:56:44 - progress_bar.py[line:274] - INFO: epoch 010: 1437 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7894.2, nsentences=120, sample_size=4074.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1969.1, ups=0.25, wpb=7894.2, bsz=120, num_updates=55720, lr=2.48261e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=40, gb_free=29.3, wall=228216 2023-05-03 17:57:24 - progress_bar.py[line:274] - INFO: epoch 010: 1447 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7479.3, nsentences=120, sample_size=3790.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1864.7, ups=0.25, wpb=7479.3, bsz=120, num_updates=55730, 
lr=2.47733e-06, gnorm=1.03, clip=60, loss_scale=64, train_wall=40, gb_free=30.2, wall=228256 2023-05-03 17:58:04 - progress_bar.py[line:274] - INFO: epoch 010: 1457 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.074, ntokens=7732.2, nsentences=120, sample_size=4001, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1940.6, ups=0.25, wpb=7732.2, bsz=120, num_updates=55740, lr=2.47205e-06, gnorm=1.004, clip=60, loss_scale=64, train_wall=40, gb_free=30.1, wall=228296 2023-05-03 17:58:44 - progress_bar.py[line:274] - INFO: epoch 010: 1467 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=8089.4, nsentences=120, sample_size=4117.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1996.4, ups=0.25, wpb=8089.4, bsz=120, num_updates=55750, lr=2.46677e-06, gnorm=0.971, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=228337 2023-05-03 17:59:24 - progress_bar.py[line:274] - INFO: epoch 010: 1477 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=8040, nsentences=120, sample_size=3906.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2012.6, ups=0.25, wpb=8040, bsz=120, num_updates=55760, lr=2.46148e-06, gnorm=0.974, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=228377 2023-05-03 18:00:05 - progress_bar.py[line:274] - INFO: epoch 010: 1487 / 6042 loss=2.422, loss_v1=0, loss_v2=0, nll_loss=1.178, ntokens=7966.7, nsentences=120, sample_size=4218.3, sample_size_v1=0, sample_size_v2=0, ppl=2.26, wps=1953.9, ups=0.25, wpb=7966.7, bsz=120, num_updates=55770, lr=2.4562e-06, gnorm=0.955, clip=10, loss_scale=64, train_wall=41, gb_free=31.1, wall=228417 2023-05-03 18:00:44 - progress_bar.py[line:274] - INFO: epoch 010: 1497 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7683.8, nsentences=120, sample_size=4035.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1972.5, ups=0.26, wpb=7683.8, bsz=120, num_updates=55780, lr=2.45092e-06, gnorm=1.003, clip=60, loss_scale=64, train_wall=39, gb_free=31, wall=228456 
2023-05-03 18:01:23 - progress_bar.py[line:274] - INFO: epoch 010: 1507 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7635, nsentences=120, sample_size=4409.3, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1940.1, ups=0.25, wpb=7635, bsz=120, num_updates=55790, lr=2.44564e-06, gnorm=0.945, clip=20, loss_scale=64, train_wall=39, gb_free=29.5, wall=228496 2023-05-03 18:02:02 - progress_bar.py[line:274] - INFO: epoch 010: 1517 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7681.8, nsentences=120, sample_size=4060.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1966.4, ups=0.26, wpb=7681.8, bsz=120, num_updates=55800, lr=2.44036e-06, gnorm=0.967, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=228535 2023-05-03 18:02:43 - progress_bar.py[line:274] - INFO: epoch 010: 1527 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7860.3, nsentences=120, sample_size=3847.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1942.8, ups=0.25, wpb=7860.3, bsz=120, num_updates=55810, lr=2.43507e-06, gnorm=0.963, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=228575 2023-05-03 18:03:23 - progress_bar.py[line:274] - INFO: epoch 010: 1537 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=8085.2, nsentences=120, sample_size=4252.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2009.8, ups=0.25, wpb=8085.2, bsz=120, num_updates=55820, lr=2.42979e-06, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=228615 2023-05-03 18:04:02 - progress_bar.py[line:274] - INFO: epoch 010: 1547 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7744.1, nsentences=120, sample_size=4070.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1969.2, ups=0.25, wpb=7744.1, bsz=120, num_updates=55830, lr=2.42451e-06, gnorm=0.97, clip=40, loss_scale=64, train_wall=39, gb_free=30.3, wall=228655 2023-05-03 18:04:42 - progress_bar.py[line:274] - INFO: epoch 010: 1557 / 6042 loss=2.398, 
loss_v1=0, loss_v2=0, nll_loss=1.145, ntokens=7959.8, nsentences=120, sample_size=4102.8, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1982.6, ups=0.25, wpb=7959.8, bsz=120, num_updates=55840, lr=2.41923e-06, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=26.9, wall=228695 2023-05-03 18:05:23 - progress_bar.py[line:274] - INFO: epoch 010: 1567 / 6042 loss=2.322, loss_v1=0, loss_v2=0, nll_loss=1.056, ntokens=7302.9, nsentences=120, sample_size=4141.4, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1799, ups=0.25, wpb=7302.9, bsz=120, num_updates=55850, lr=2.41394e-06, gnorm=0.975, clip=30, loss_scale=64, train_wall=41, gb_free=30.4, wall=228735 2023-05-03 18:06:03 - progress_bar.py[line:274] - INFO: epoch 010: 1577 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7831.7, nsentences=120, sample_size=4023.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1956.3, ups=0.25, wpb=7831.7, bsz=120, num_updates=55860, lr=2.40866e-06, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=31.4, wall=228775 2023-05-03 18:06:43 - progress_bar.py[line:274] - INFO: epoch 010: 1587 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7824.7, nsentences=120, sample_size=3714.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1971.2, ups=0.25, wpb=7824.7, bsz=120, num_updates=55870, lr=2.40338e-06, gnorm=1.02, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=228815 2023-05-03 18:07:22 - progress_bar.py[line:274] - INFO: epoch 010: 1597 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7796.6, nsentences=120, sample_size=3881.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1974.6, ups=0.25, wpb=7796.6, bsz=120, num_updates=55880, lr=2.3981e-06, gnorm=0.996, clip=60, loss_scale=64, train_wall=39, gb_free=29.9, wall=228855 2023-05-03 18:08:02 - progress_bar.py[line:274] - INFO: epoch 010: 1607 / 6042 loss=2.385, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7335.2, nsentences=120, sample_size=3913.5, 
sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1855.4, ups=0.25, wpb=7335.2, bsz=120, num_updates=55890, lr=2.39282e-06, gnorm=0.999, clip=40, loss_scale=64, train_wall=39, gb_free=27.5, wall=228894 2023-05-03 18:08:41 - progress_bar.py[line:274] - INFO: epoch 010: 1617 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=8072.6, nsentences=120, sample_size=4063.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2040, ups=0.25, wpb=8072.6, bsz=120, num_updates=55900, lr=2.38753e-06, gnorm=0.977, clip=50, loss_scale=64, train_wall=39, gb_free=29.3, wall=228934 2023-05-03 18:09:21 - progress_bar.py[line:274] - INFO: epoch 010: 1627 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.07, ntokens=7562.5, nsentences=120, sample_size=4103, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1915.7, ups=0.25, wpb=7562.5, bsz=120, num_updates=55910, lr=2.38225e-06, gnorm=1.023, clip=60, loss_scale=64, train_wall=39, gb_free=30.3, wall=228973 2023-05-03 18:10:01 - progress_bar.py[line:274] - INFO: epoch 010: 1637 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7644.3, nsentences=120, sample_size=4270.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1919.8, ups=0.25, wpb=7644.3, bsz=120, num_updates=55920, lr=2.37697e-06, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=31.1, wall=229013 2023-05-03 18:10:41 - progress_bar.py[line:274] - INFO: epoch 010: 1647 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7920.1, nsentences=120, sample_size=4166.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1959.7, ups=0.25, wpb=7920.1, bsz=120, num_updates=55930, lr=2.37169e-06, gnorm=0.969, clip=40, loss_scale=64, train_wall=40, gb_free=27.8, wall=229053 2023-05-03 18:11:20 - progress_bar.py[line:274] - INFO: epoch 010: 1657 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7485.3, nsentences=120, sample_size=4116.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1903.8, ups=0.25, wpb=7485.3, bsz=120, 
num_updates=55940, lr=2.36641e-06, gnorm=0.96, clip=20, loss_scale=64, train_wall=39, gb_free=29.8, wall=229093 2023-05-03 18:12:01 - progress_bar.py[line:274] - INFO: epoch 010: 1667 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7456.7, nsentences=120, sample_size=4237.2, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1856.1, ups=0.25, wpb=7456.7, bsz=120, num_updates=55950, lr=2.36112e-06, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=229133 2023-05-03 18:12:40 - progress_bar.py[line:274] - INFO: epoch 010: 1677 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7542.1, nsentences=120, sample_size=4162.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1930.2, ups=0.26, wpb=7542.1, bsz=120, num_updates=55960, lr=2.35584e-06, gnorm=0.963, clip=40, loss_scale=64, train_wall=39, gb_free=29.8, wall=229172 2023-05-03 18:13:20 - progress_bar.py[line:274] - INFO: epoch 010: 1687 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7988.8, nsentences=120, sample_size=4094.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1993.3, ups=0.25, wpb=7988.8, bsz=120, num_updates=55970, lr=2.35056e-06, gnorm=0.993, clip=40, loss_scale=64, train_wall=40, gb_free=28.8, wall=229212 2023-05-03 18:13:59 - progress_bar.py[line:274] - INFO: epoch 010: 1697 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7516.4, nsentences=120, sample_size=4358.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1887.1, ups=0.25, wpb=7516.4, bsz=120, num_updates=55980, lr=2.34528e-06, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=229252 2023-05-03 18:14:39 - progress_bar.py[line:274] - INFO: epoch 010: 1707 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7697.3, nsentences=120, sample_size=3987.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1949.4, ups=0.25, wpb=7697.3, bsz=120, num_updates=55990, lr=2.33999e-06, gnorm=0.999, clip=60, loss_scale=64, train_wall=39, 
gb_free=29.8, wall=229291 2023-05-03 18:15:19 - progress_bar.py[line:274] - INFO: epoch 010: 1717 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7749, nsentences=120, sample_size=4105.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1937.1, ups=0.25, wpb=7749, bsz=120, num_updates=56000, lr=2.33471e-06, gnorm=0.991, clip=30, loss_scale=64, train_wall=40, gb_free=29.7, wall=229331 2023-05-03 18:15:19 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 18:15:22 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 18:15:22 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 18:15:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:31 - task_invig.py[line:181] - INFO: example 
hyp(gen): region: 2023-05-03 18:15:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:38 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 18:15:38 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 18:15:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:51 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 18:15:51 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 18:15:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:15:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:15:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:16:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:16:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:16:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:16:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:16:02 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 18:16:02 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 18:16:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:16:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:16:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:16:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:16:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:16:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:16:06 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 18:16:06 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 18:16:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:16:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:16:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:16:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:16:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:16:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:16:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:16:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:16:11 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 18:16:11 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 18:16:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 18:16:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 18:16:11 - progress_bar.py[line:282] - INFO: epoch 010 | valid on 'valid' subset | loss 3.269 | loss_v1 0 | loss_v2 0 | nll_loss 2.104 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.3 | score 0.7603 | wps 3292.7 | wpb 3202.1 | bsz 39.4 | num_updates 56000 | best_score 0.7627 2023-05-03 18:16:11 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 10 @ 56000 updates 2023-05-03 18:16:11 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_56000.pt 2023-05-03 18:16:33 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_56000.pt 2023-05-03 18:17:00 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_56000.pt (epoch 10 @ 56000 updates, score 0.7603) (writing took 49.092398065840825 seconds) 2023-05-03 18:17:40 - progress_bar.py[line:274] - INFO: epoch 010: 1727 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7912.9, nsentences=120, sample_size=4015.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=560.3, ups=0.07, wpb=7912.9, bsz=120, num_updates=56010, lr=2.32943e-06, gnorm=0.985, clip=40, loss_scale=64, train_wall=40, gb_free=30.3, wall=229473 2023-05-03 18:18:20 - progress_bar.py[line:274] - INFO: epoch 010: 1737 / 6042 loss=2.327, loss_v1=0, loss_v2=0, nll_loss=1.066, ntokens=7848.1, nsentences=120, sample_size=4095.3, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1969.9, ups=0.25, wpb=7848.1, bsz=120, num_updates=56020, lr=2.32415e-06, gnorm=0.966, clip=20, loss_scale=64, train_wall=40, 
gb_free=30.4, wall=229513 2023-05-03 18:19:00 - progress_bar.py[line:274] - INFO: epoch 010: 1747 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7622.4, nsentences=120, sample_size=3969.9, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1894.6, ups=0.25, wpb=7622.4, bsz=120, num_updates=56030, lr=2.31887e-06, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=229553 2023-05-03 18:19:40 - progress_bar.py[line:274] - INFO: epoch 010: 1757 / 6042 loss=2.321, loss_v1=0, loss_v2=0, nll_loss=1.058, ntokens=7622.7, nsentences=120, sample_size=4076.9, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1918.5, ups=0.25, wpb=7622.7, bsz=120, num_updates=56040, lr=2.31358e-06, gnorm=0.983, clip=30, loss_scale=128, train_wall=40, gb_free=25.8, wall=229592 2023-05-03 18:20:20 - progress_bar.py[line:274] - INFO: epoch 010: 1767 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7623, nsentences=120, sample_size=3870.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1913.3, ups=0.25, wpb=7623, bsz=120, num_updates=56050, lr=2.3083e-06, gnorm=1.02, clip=60, loss_scale=128, train_wall=40, gb_free=29.7, wall=229632 2023-05-03 18:21:00 - progress_bar.py[line:274] - INFO: epoch 010: 1777 / 6042 loss=2.314, loss_v1=0, loss_v2=0, nll_loss=1.05, ntokens=7627.4, nsentences=120, sample_size=4014, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1914.7, ups=0.25, wpb=7627.4, bsz=120, num_updates=56060, lr=2.30302e-06, gnorm=0.981, clip=40, loss_scale=128, train_wall=40, gb_free=31.2, wall=229672 2023-05-03 18:21:39 - progress_bar.py[line:274] - INFO: epoch 010: 1787 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7735.7, nsentences=120, sample_size=4094.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1973.9, ups=0.26, wpb=7735.7, bsz=120, num_updates=56070, lr=2.29774e-06, gnorm=0.968, clip=20, loss_scale=128, train_wall=39, gb_free=31, wall=229711 2023-05-03 18:22:19 - progress_bar.py[line:274] - INFO: epoch 010: 
1797 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7835.6, nsentences=120, sample_size=4084.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1972.1, ups=0.25, wpb=7835.6, bsz=120, num_updates=56080, lr=2.29246e-06, gnorm=0.966, clip=30, loss_scale=128, train_wall=40, gb_free=27.5, wall=229751 2023-05-03 18:22:47 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 18:23:03 - progress_bar.py[line:274] - INFO: epoch 010: 1808 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7643.4, nsentences=120, sample_size=4102.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1719.6, ups=0.22, wpb=7643.4, bsz=120, num_updates=56090, lr=2.28717e-06, gnorm=0.972, clip=30, loss_scale=64, train_wall=44, gb_free=29.4, wall=229796 2023-05-03 18:23:43 - progress_bar.py[line:274] - INFO: epoch 010: 1818 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7822, nsentences=120, sample_size=3946.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1965.5, ups=0.25, wpb=7822, bsz=120, num_updates=56100, lr=2.28189e-06, gnorm=0.998, clip=50, loss_scale=64, train_wall=40, gb_free=28.4, wall=229835 2023-05-03 18:24:23 - progress_bar.py[line:274] - INFO: epoch 010: 1828 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7708.3, nsentences=120, sample_size=4020, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1917.4, ups=0.25, wpb=7708.3, bsz=120, num_updates=56110, lr=2.27661e-06, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=229876 2023-05-03 18:25:02 - progress_bar.py[line:274] - INFO: epoch 010: 1838 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7538.7, nsentences=120, sample_size=4147.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1920.7, ups=0.25, wpb=7538.7, bsz=120, num_updates=56120, lr=2.27133e-06, gnorm=0.98, clip=20, loss_scale=64, train_wall=39, gb_free=26.7, wall=229915 2023-05-03 18:25:42 - 
progress_bar.py[line:274] - INFO: epoch 010: 1848 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7815.8, nsentences=120, sample_size=4288.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1958.1, ups=0.25, wpb=7815.8, bsz=120, num_updates=56130, lr=2.26604e-06, gnorm=0.952, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=229955 2023-05-03 18:26:22 - progress_bar.py[line:274] - INFO: epoch 010: 1858 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7777.1, nsentences=120, sample_size=3982.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1968.6, ups=0.25, wpb=7777.1, bsz=120, num_updates=56140, lr=2.26076e-06, gnorm=0.985, clip=40, loss_scale=64, train_wall=39, gb_free=28.6, wall=229994 2023-05-03 18:27:02 - progress_bar.py[line:274] - INFO: epoch 010: 1868 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7642.6, nsentences=120, sample_size=4004, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1895.5, ups=0.25, wpb=7642.6, bsz=120, num_updates=56150, lr=2.25548e-06, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=27.3, wall=230035 2023-05-03 18:27:42 - progress_bar.py[line:274] - INFO: epoch 010: 1878 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7787.1, nsentences=120, sample_size=4001.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1932.4, ups=0.25, wpb=7787.1, bsz=120, num_updates=56160, lr=2.2502e-06, gnorm=0.979, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=230075 2023-05-03 18:28:22 - progress_bar.py[line:274] - INFO: epoch 010: 1888 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7653.8, nsentences=120, sample_size=3920.7, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1926, ups=0.25, wpb=7653.8, bsz=120, num_updates=56170, lr=2.24492e-06, gnorm=1.013, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=230115 2023-05-03 18:29:03 - progress_bar.py[line:274] - INFO: epoch 010: 1898 / 6042 loss=2.371, loss_v1=0, 
loss_v2=0, nll_loss=1.114, ntokens=8034.4, nsentences=120, sample_size=4330.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1947.3, ups=0.24, wpb=8034.4, bsz=120, num_updates=56180, lr=2.23963e-06, gnorm=0.967, clip=40, loss_scale=64, train_wall=41, gb_free=30, wall=230156 2023-05-03 18:29:43 - progress_bar.py[line:274] - INFO: epoch 010: 1908 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7831.9, nsentences=120, sample_size=3849.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1963.6, ups=0.25, wpb=7831.9, bsz=120, num_updates=56190, lr=2.23435e-06, gnorm=0.998, clip=60, loss_scale=64, train_wall=40, gb_free=29.9, wall=230196 2023-05-03 18:30:23 - progress_bar.py[line:274] - INFO: epoch 010: 1918 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7932.1, nsentences=120, sample_size=3785.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1993, ups=0.25, wpb=7932.1, bsz=120, num_updates=56200, lr=2.22907e-06, gnorm=1.02, clip=70, loss_scale=64, train_wall=40, gb_free=30.6, wall=230236 2023-05-03 18:31:04 - progress_bar.py[line:274] - INFO: epoch 010: 1928 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7924.4, nsentences=120, sample_size=3928.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1956.5, ups=0.25, wpb=7924.4, bsz=120, num_updates=56210, lr=2.22379e-06, gnorm=0.986, clip=50, loss_scale=64, train_wall=40, gb_free=29.4, wall=230276 2023-05-03 18:31:43 - progress_bar.py[line:274] - INFO: epoch 010: 1938 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7734.5, nsentences=120, sample_size=4147.7, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1971.3, ups=0.25, wpb=7734.5, bsz=120, num_updates=56220, lr=2.21851e-06, gnorm=0.965, clip=20, loss_scale=64, train_wall=39, gb_free=30.6, wall=230315 2023-05-03 18:32:22 - progress_bar.py[line:274] - INFO: epoch 010: 1948 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7857.2, nsentences=120, sample_size=3928.6, 
sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2008.6, ups=0.26, wpb=7857.2, bsz=120, num_updates=56230, lr=2.21322e-06, gnorm=0.992, clip=30, loss_scale=64, train_wall=39, gb_free=29.6, wall=230354 2023-05-03 18:33:01 - progress_bar.py[line:274] - INFO: epoch 010: 1958 / 6042 loss=2.321, loss_v1=0, loss_v2=0, nll_loss=1.058, ntokens=7397.4, nsentences=120, sample_size=3915.5, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1885.2, ups=0.25, wpb=7397.4, bsz=120, num_updates=56240, lr=2.20794e-06, gnorm=0.995, clip=40, loss_scale=64, train_wall=39, gb_free=28.4, wall=230394 2023-05-03 18:33:41 - progress_bar.py[line:274] - INFO: epoch 010: 1968 / 6042 loss=2.328, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7782.4, nsentences=120, sample_size=3950.8, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1936.2, ups=0.25, wpb=7782.4, bsz=120, num_updates=56250, lr=2.20266e-06, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=230434 2023-05-03 18:34:21 - progress_bar.py[line:274] - INFO: epoch 010: 1978 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.07, ntokens=7972.9, nsentences=120, sample_size=3837.6, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1994.9, ups=0.25, wpb=7972.9, bsz=120, num_updates=56260, lr=2.19738e-06, gnorm=1.011, clip=80, loss_scale=64, train_wall=40, gb_free=30.3, wall=230474 2023-05-03 18:35:01 - progress_bar.py[line:274] - INFO: epoch 010: 1988 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7748.1, nsentences=120, sample_size=3976.2, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1958.2, ups=0.25, wpb=7748.1, bsz=120, num_updates=56270, lr=2.19209e-06, gnorm=0.988, clip=50, loss_scale=64, train_wall=39, gb_free=30.7, wall=230513 2023-05-03 18:35:41 - progress_bar.py[line:274] - INFO: epoch 010: 1998 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7546.9, nsentences=120, sample_size=4141.7, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1904.7, ups=0.25, wpb=7546.9, 
bsz=120, num_updates=56280, lr=2.18681e-06, gnorm=0.993, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=230553 2023-05-03 18:36:22 - progress_bar.py[line:274] - INFO: epoch 010: 2008 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7618.4, nsentences=120, sample_size=4176.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1840.9, ups=0.24, wpb=7618.4, bsz=120, num_updates=56290, lr=2.18153e-06, gnorm=0.958, clip=30, loss_scale=64, train_wall=41, gb_free=30.5, wall=230594 2023-05-03 18:37:02 - progress_bar.py[line:274] - INFO: epoch 010: 2018 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7409.7, nsentences=120, sample_size=3973.7, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1861.8, ups=0.25, wpb=7409.7, bsz=120, num_updates=56300, lr=2.17625e-06, gnorm=0.968, clip=40, loss_scale=64, train_wall=40, gb_free=27.4, wall=230634 2023-05-03 18:37:42 - progress_bar.py[line:274] - INFO: epoch 010: 2028 / 6042 loss=2.387, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7959.8, nsentences=120, sample_size=4253.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1997.1, ups=0.25, wpb=7959.8, bsz=120, num_updates=56310, lr=2.17097e-06, gnorm=0.965, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=230674 2023-05-03 18:38:21 - progress_bar.py[line:274] - INFO: epoch 010: 2038 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7792.6, nsentences=120, sample_size=4206.3, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1962, ups=0.25, wpb=7792.6, bsz=120, num_updates=56320, lr=2.16568e-06, gnorm=0.971, clip=30, loss_scale=64, train_wall=40, gb_free=28.4, wall=230714 2023-05-03 18:39:01 - progress_bar.py[line:274] - INFO: epoch 010: 2048 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7842.5, nsentences=120, sample_size=3936.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1970.7, ups=0.25, wpb=7842.5, bsz=120, num_updates=56330, lr=2.1604e-06, gnorm=0.99, clip=50, loss_scale=64, 
train_wall=40, gb_free=30.1, wall=230754 2023-05-03 18:39:40 - progress_bar.py[line:274] - INFO: epoch 010: 2058 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7616.7, nsentences=120, sample_size=4052.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1943.1, ups=0.26, wpb=7616.7, bsz=120, num_updates=56340, lr=2.15512e-06, gnorm=0.994, clip=40, loss_scale=64, train_wall=39, gb_free=30.5, wall=230793 2023-05-03 18:40:19 - progress_bar.py[line:274] - INFO: epoch 010: 2068 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7726.6, nsentences=120, sample_size=3894, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1972.1, ups=0.26, wpb=7726.6, bsz=120, num_updates=56350, lr=2.14984e-06, gnorm=1.004, clip=40, loss_scale=64, train_wall=39, gb_free=30.6, wall=230832 2023-05-03 18:40:59 - progress_bar.py[line:274] - INFO: epoch 010: 2078 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7570.1, nsentences=120, sample_size=4077.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1907.7, ups=0.25, wpb=7570.1, bsz=120, num_updates=56360, lr=2.14455e-06, gnorm=0.998, clip=60, loss_scale=64, train_wall=40, gb_free=29.8, wall=230872 2023-05-03 18:41:39 - progress_bar.py[line:274] - INFO: epoch 010: 2088 / 6042 loss=2.324, loss_v1=0, loss_v2=0, nll_loss=1.061, ntokens=7692.4, nsentences=120, sample_size=4229.8, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1946, ups=0.25, wpb=7692.4, bsz=120, num_updates=56370, lr=2.13927e-06, gnorm=0.971, clip=40, loss_scale=64, train_wall=39, gb_free=31.4, wall=230911 2023-05-03 18:42:18 - progress_bar.py[line:274] - INFO: epoch 010: 2098 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7674.6, nsentences=120, sample_size=4113.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1928.5, ups=0.25, wpb=7674.6, bsz=120, num_updates=56380, lr=2.13399e-06, gnorm=0.976, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=230951 2023-05-03 18:42:58 - progress_bar.py[line:274] 
- INFO: epoch 010: 2108 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7546.3, nsentences=120, sample_size=4356.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1900.6, ups=0.25, wpb=7546.3, bsz=120, num_updates=56390, lr=2.12871e-06, gnorm=0.943, clip=20, loss_scale=64, train_wall=40, gb_free=27.6, wall=230991 2023-05-03 18:43:38 - progress_bar.py[line:274] - INFO: epoch 010: 2118 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7693, nsentences=120, sample_size=3890.3, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1942.1, ups=0.25, wpb=7693, bsz=120, num_updates=56400, lr=2.12343e-06, gnorm=1.003, clip=70, loss_scale=64, train_wall=40, gb_free=31.4, wall=231030 2023-05-03 18:44:18 - progress_bar.py[line:274] - INFO: epoch 010: 2128 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7519.2, nsentences=120, sample_size=3978, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1874.1, ups=0.25, wpb=7519.2, bsz=120, num_updates=56410, lr=2.11814e-06, gnorm=0.98, clip=50, loss_scale=64, train_wall=40, gb_free=29.3, wall=231070 2023-05-03 18:44:58 - progress_bar.py[line:274] - INFO: epoch 010: 2138 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7706.2, nsentences=120, sample_size=4168.4, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1929.4, ups=0.25, wpb=7706.2, bsz=120, num_updates=56420, lr=2.11286e-06, gnorm=0.983, clip=40, loss_scale=64, train_wall=40, gb_free=28.8, wall=231110 2023-05-03 18:45:37 - progress_bar.py[line:274] - INFO: epoch 010: 2148 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7819.9, nsentences=120, sample_size=3947.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1977.2, ups=0.25, wpb=7819.9, bsz=120, num_updates=56430, lr=2.10758e-06, gnorm=1.02, clip=60, loss_scale=64, train_wall=39, gb_free=30.1, wall=231150 2023-05-03 18:46:18 - progress_bar.py[line:274] - INFO: epoch 010: 2158 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.083, 
ntokens=7638.1, nsentences=120, sample_size=4063.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1903.8, ups=0.25, wpb=7638.1, bsz=120, num_updates=56440, lr=2.1023e-06, gnorm=0.991, clip=30, loss_scale=64, train_wall=40, gb_free=28.5, wall=231190 2023-05-03 18:46:58 - progress_bar.py[line:274] - INFO: epoch 010: 2168 / 6042 loss=2.311, loss_v1=0, loss_v2=0, nll_loss=1.05, ntokens=7951.3, nsentences=120, sample_size=3988.6, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1967.6, ups=0.25, wpb=7951.3, bsz=120, num_updates=56450, lr=2.09702e-06, gnorm=0.984, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=231230 2023-05-03 18:47:38 - progress_bar.py[line:274] - INFO: epoch 010: 2178 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=8089.8, nsentences=120, sample_size=3889.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2017.9, ups=0.25, wpb=8089.8, bsz=120, num_updates=56460, lr=2.09173e-06, gnorm=0.981, clip=50, loss_scale=64, train_wall=40, gb_free=29.3, wall=231270 2023-05-03 18:48:18 - progress_bar.py[line:274] - INFO: epoch 010: 2188 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.072, ntokens=7727.4, nsentences=120, sample_size=3796.6, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1948.7, ups=0.25, wpb=7727.4, bsz=120, num_updates=56470, lr=2.08645e-06, gnorm=1.021, clip=70, loss_scale=64, train_wall=40, gb_free=30.4, wall=231310 2023-05-03 18:48:57 - progress_bar.py[line:274] - INFO: epoch 010: 2198 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7599.7, nsentences=120, sample_size=4258.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1930.2, ups=0.25, wpb=7599.7, bsz=120, num_updates=56480, lr=2.08117e-06, gnorm=0.936, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=231350 2023-05-03 18:49:37 - progress_bar.py[line:274] - INFO: epoch 010: 2208 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7276.8, nsentences=120, sample_size=4264, sample_size_v1=0, sample_size_v2=0, 
ppl=2.11, wps=1844.8, ups=0.25, wpb=7276.8, bsz=120, num_updates=56490, lr=2.07589e-06, gnorm=0.964, clip=40, loss_scale=64, train_wall=39, gb_free=29.7, wall=231389 2023-05-03 18:50:17 - progress_bar.py[line:274] - INFO: epoch 010: 2218 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7807, nsentences=120, sample_size=3920, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1918.1, ups=0.25, wpb=7807, bsz=120, num_updates=56500, lr=2.0706e-06, gnorm=0.99, clip=50, loss_scale=64, train_wall=41, gb_free=29.6, wall=231430 2023-05-03 18:50:57 - progress_bar.py[line:274] - INFO: epoch 010: 2228 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7775.6, nsentences=120, sample_size=4139.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1939.2, ups=0.25, wpb=7775.6, bsz=120, num_updates=56510, lr=2.06532e-06, gnorm=0.991, clip=40, loss_scale=64, train_wall=40, gb_free=23.6, wall=231470 2023-05-03 18:51:37 - progress_bar.py[line:274] - INFO: epoch 010: 2238 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7661, nsentences=120, sample_size=4404.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1913.3, ups=0.25, wpb=7661, bsz=120, num_updates=56520, lr=2.06004e-06, gnorm=0.962, clip=30, loss_scale=64, train_wall=40, gb_free=29, wall=231510 2023-05-03 18:52:17 - progress_bar.py[line:274] - INFO: epoch 010: 2248 / 6042 loss=2.326, loss_v1=0, loss_v2=0, nll_loss=1.062, ntokens=7524, nsentences=120, sample_size=3978.9, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1885.5, ups=0.25, wpb=7524, bsz=120, num_updates=56530, lr=2.05476e-06, gnorm=1.024, clip=50, loss_scale=64, train_wall=40, gb_free=25.3, wall=231550 2023-05-03 18:52:57 - progress_bar.py[line:274] - INFO: epoch 010: 2258 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7778.4, nsentences=120, sample_size=3875.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1954.9, ups=0.25, wpb=7778.4, bsz=120, num_updates=56540, lr=2.04948e-06, gnorm=0.996, 
clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=231590 2023-05-03 18:53:38 - progress_bar.py[line:274] - INFO: epoch 010: 2268 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=7794.4, nsentences=120, sample_size=3827.8, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1912.6, ups=0.25, wpb=7794.4, bsz=120, num_updates=56550, lr=2.04419e-06, gnorm=1.001, clip=40, loss_scale=64, train_wall=41, gb_free=30, wall=231630 2023-05-03 18:54:18 - progress_bar.py[line:274] - INFO: epoch 010: 2278 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=8059.2, nsentences=120, sample_size=3897.5, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2011.8, ups=0.25, wpb=8059.2, bsz=120, num_updates=56560, lr=2.03891e-06, gnorm=0.999, clip=50, loss_scale=64, train_wall=40, gb_free=29.5, wall=231670 2023-05-03 18:54:58 - progress_bar.py[line:274] - INFO: epoch 010: 2288 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7423.6, nsentences=120, sample_size=4108.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1864.5, ups=0.25, wpb=7423.6, bsz=120, num_updates=56570, lr=2.03363e-06, gnorm=0.987, clip=60, loss_scale=64, train_wall=40, gb_free=27.8, wall=231710 2023-05-03 18:55:37 - progress_bar.py[line:274] - INFO: epoch 010: 2298 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.07, ntokens=7551.6, nsentences=120, sample_size=3726.7, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1915.2, ups=0.25, wpb=7551.6, bsz=120, num_updates=56580, lr=2.02835e-06, gnorm=1.021, clip=70, loss_scale=64, train_wall=39, gb_free=29.7, wall=231750 2023-05-03 18:56:17 - progress_bar.py[line:274] - INFO: epoch 010: 2308 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=7734.4, nsentences=120, sample_size=3874.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1934.3, ups=0.25, wpb=7734.4, bsz=120, num_updates=56590, lr=2.02307e-06, gnorm=1.016, clip=60, loss_scale=64, train_wall=40, gb_free=30.8, wall=231790 2023-05-03 18:56:57 - 
progress_bar.py[line:274] - INFO: epoch 010: 2318 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7625.6, nsentences=120, sample_size=4258.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1912.5, ups=0.25, wpb=7625.6, bsz=120, num_updates=56600, lr=2.01778e-06, gnorm=0.964, clip=20, loss_scale=128, train_wall=40, gb_free=28.7, wall=231829 2023-05-03 18:57:37 - progress_bar.py[line:274] - INFO: epoch 010: 2328 / 6042 loss=2.32, loss_v1=0, loss_v2=0, nll_loss=1.062, ntokens=7536.2, nsentences=120, sample_size=4069.1, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1897.5, ups=0.25, wpb=7536.2, bsz=120, num_updates=56610, lr=2.0125e-06, gnorm=0.964, clip=40, loss_scale=128, train_wall=40, gb_free=29.1, wall=231869 2023-05-03 18:58:16 - progress_bar.py[line:274] - INFO: epoch 010: 2338 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7686.6, nsentences=120, sample_size=4033.7, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1932.3, ups=0.25, wpb=7686.6, bsz=120, num_updates=56620, lr=2.00722e-06, gnorm=0.992, clip=30, loss_scale=128, train_wall=40, gb_free=30.5, wall=231909 2023-05-03 18:58:57 - progress_bar.py[line:274] - INFO: epoch 010: 2348 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7873.4, nsentences=120, sample_size=4206.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1939.7, ups=0.25, wpb=7873.4, bsz=120, num_updates=56630, lr=2.00194e-06, gnorm=0.965, clip=20, loss_scale=128, train_wall=41, gb_free=30.9, wall=231950 2023-05-03 18:59:37 - progress_bar.py[line:274] - INFO: epoch 010: 2358 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7723.4, nsentences=120, sample_size=4162.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1945.6, ups=0.25, wpb=7723.4, bsz=120, num_updates=56640, lr=1.99665e-06, gnorm=0.997, clip=20, loss_scale=128, train_wall=40, gb_free=26.6, wall=231989 2023-05-03 19:00:16 - progress_bar.py[line:274] - INFO: epoch 010: 2368 / 6042 loss=2.347, loss_v1=0, 
loss_v2=0, nll_loss=1.091, ntokens=8117.2, nsentences=120, sample_size=3913.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2064.8, ups=0.25, wpb=8117.2, bsz=120, num_updates=56650, lr=1.99137e-06, gnorm=0.996, clip=50, loss_scale=128, train_wall=39, gb_free=29.7, wall=232029 2023-05-03 19:00:56 - progress_bar.py[line:274] - INFO: epoch 010: 2378 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7665.7, nsentences=120, sample_size=3979.2, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1940, ups=0.25, wpb=7665.7, bsz=120, num_updates=56660, lr=1.98609e-06, gnorm=0.993, clip=50, loss_scale=128, train_wall=39, gb_free=30.1, wall=232068 2023-05-03 19:01:36 - progress_bar.py[line:274] - INFO: epoch 010: 2388 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7497.5, nsentences=120, sample_size=4377.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1870.8, ups=0.25, wpb=7497.5, bsz=120, num_updates=56670, lr=1.98081e-06, gnorm=0.972, clip=50, loss_scale=128, train_wall=40, gb_free=29.2, wall=232108 2023-05-03 19:02:16 - progress_bar.py[line:274] - INFO: epoch 010: 2398 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7920.6, nsentences=120, sample_size=3970, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1955.5, ups=0.25, wpb=7920.6, bsz=120, num_updates=56680, lr=1.97553e-06, gnorm=0.983, clip=30, loss_scale=128, train_wall=40, gb_free=30.8, wall=232149 2023-05-03 19:02:36 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 19:03:00 - progress_bar.py[line:274] - INFO: epoch 010: 2409 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7966.7, nsentences=120, sample_size=4049.4, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1826.4, ups=0.23, wpb=7966.7, bsz=120, num_updates=56690, lr=1.97024e-06, gnorm=0.98, clip=30, loss_scale=64, train_wall=44, gb_free=29.3, wall=232192 2023-05-03 19:03:40 - progress_bar.py[line:274] - INFO: 
epoch 010: 2419 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.159, ntokens=7817.6, nsentences=120, sample_size=4001.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1928.1, ups=0.25, wpb=7817.6, bsz=120, num_updates=56700, lr=1.96496e-06, gnorm=0.992, clip=40, loss_scale=64, train_wall=40, gb_free=29, wall=232233 2023-05-03 19:04:20 - progress_bar.py[line:274] - INFO: epoch 010: 2429 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7628.7, nsentences=120, sample_size=3956.5, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1929.8, ups=0.25, wpb=7628.7, bsz=120, num_updates=56710, lr=1.95968e-06, gnorm=0.983, clip=50, loss_scale=64, train_wall=39, gb_free=29.3, wall=232272 2023-05-03 19:04:59 - progress_bar.py[line:274] - INFO: epoch 010: 2439 / 6042 loss=2.319, loss_v1=0, loss_v2=0, nll_loss=1.052, ntokens=7594.8, nsentences=120, sample_size=3913.4, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1921.3, ups=0.25, wpb=7594.8, bsz=120, num_updates=56720, lr=1.9544e-06, gnorm=1.001, clip=40, loss_scale=64, train_wall=39, gb_free=30.6, wall=232312 2023-05-03 19:05:40 - progress_bar.py[line:274] - INFO: epoch 010: 2449 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=8259.2, nsentences=120, sample_size=4156.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2056.3, ups=0.25, wpb=8259.2, bsz=120, num_updates=56730, lr=1.94912e-06, gnorm=0.976, clip=50, loss_scale=64, train_wall=40, gb_free=24.5, wall=232352 2023-05-03 19:06:19 - progress_bar.py[line:274] - INFO: epoch 010: 2459 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=7658.6, nsentences=120, sample_size=4227.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1921.1, ups=0.25, wpb=7658.6, bsz=120, num_updates=56740, lr=1.94383e-06, gnorm=0.981, clip=30, loss_scale=64, train_wall=40, gb_free=29.5, wall=232392 2023-05-03 19:06:59 - progress_bar.py[line:274] - INFO: epoch 010: 2469 / 6042 loss=2.325, loss_v1=0, loss_v2=0, nll_loss=1.066, ntokens=7500.5, 
nsentences=120, sample_size=4162.6, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1917.9, ups=0.26, wpb=7500.5, bsz=120, num_updates=56750, lr=1.93855e-06, gnorm=0.976, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=232431 2023-05-03 19:07:38 - progress_bar.py[line:274] - INFO: epoch 010: 2479 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7917.2, nsentences=120, sample_size=4072.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=2008.7, ups=0.25, wpb=7917.2, bsz=120, num_updates=56760, lr=1.93327e-06, gnorm=0.974, clip=40, loss_scale=64, train_wall=39, gb_free=28.2, wall=232470 2023-05-03 19:08:18 - progress_bar.py[line:274] - INFO: epoch 010: 2489 / 6042 loss=2.397, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7705.4, nsentences=120, sample_size=4242.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=1922, ups=0.25, wpb=7705.4, bsz=120, num_updates=56770, lr=1.92799e-06, gnorm=0.956, clip=30, loss_scale=64, train_wall=40, gb_free=31.4, wall=232511 2023-05-03 19:08:58 - progress_bar.py[line:274] - INFO: epoch 010: 2499 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7644.6, nsentences=120, sample_size=4093.3, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1915.9, ups=0.25, wpb=7644.6, bsz=120, num_updates=56780, lr=1.9227e-06, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=232550 2023-05-03 19:09:38 - progress_bar.py[line:274] - INFO: epoch 010: 2509 / 6042 loss=2.31, loss_v1=0, loss_v2=0, nll_loss=1.045, ntokens=7608.3, nsentences=120, sample_size=4108.6, sample_size_v1=0, sample_size_v2=0, ppl=2.06, wps=1904.7, ups=0.25, wpb=7608.3, bsz=120, num_updates=56790, lr=1.91742e-06, gnorm=0.978, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=232590 2023-05-03 19:10:18 - progress_bar.py[line:274] - INFO: epoch 010: 2519 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=8019.5, nsentences=120, sample_size=4345.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, 
wps=1994.7, ups=0.25, wpb=8019.5, bsz=120, num_updates=56800, lr=1.91214e-06, gnorm=0.988, clip=40, loss_scale=64, train_wall=40, gb_free=29.4, wall=232631 2023-05-03 19:10:58 - progress_bar.py[line:274] - INFO: epoch 010: 2529 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=8023.4, nsentences=120, sample_size=3987.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1994.7, ups=0.25, wpb=8023.4, bsz=120, num_updates=56810, lr=1.90686e-06, gnorm=1.012, clip=60, loss_scale=64, train_wall=40, gb_free=29.8, wall=232671 2023-05-03 19:11:39 - progress_bar.py[line:274] - INFO: epoch 010: 2539 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7881.5, nsentences=120, sample_size=4034.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1959.8, ups=0.25, wpb=7881.5, bsz=120, num_updates=56820, lr=1.90158e-06, gnorm=0.968, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=232711 2023-05-03 19:12:19 - progress_bar.py[line:274] - INFO: epoch 010: 2549 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7648, nsentences=120, sample_size=4160.3, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1899.1, ups=0.25, wpb=7648, bsz=120, num_updates=56830, lr=1.89629e-06, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, gb_free=29.2, wall=232751 2023-05-03 19:12:58 - progress_bar.py[line:274] - INFO: epoch 010: 2559 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7754.3, nsentences=120, sample_size=4048.6, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1992.5, ups=0.26, wpb=7754.3, bsz=120, num_updates=56840, lr=1.89101e-06, gnorm=0.984, clip=50, loss_scale=64, train_wall=39, gb_free=30.9, wall=232790 2023-05-03 19:13:37 - progress_bar.py[line:274] - INFO: epoch 010: 2569 / 6042 loss=2.331, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7600.6, nsentences=120, sample_size=4038.6, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1943.6, ups=0.26, wpb=7600.6, bsz=120, num_updates=56850, lr=1.88573e-06, gnorm=0.984, 
clip=40, loss_scale=64, train_wall=39, gb_free=29.8, wall=232829 2023-05-03 19:14:17 - progress_bar.py[line:274] - INFO: epoch 010: 2579 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7708.2, nsentences=120, sample_size=3817.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1916.4, ups=0.25, wpb=7708.2, bsz=120, num_updates=56860, lr=1.88045e-06, gnorm=1.019, clip=70, loss_scale=64, train_wall=40, gb_free=30.9, wall=232870 2023-05-03 19:14:57 - progress_bar.py[line:274] - INFO: epoch 010: 2589 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7593.6, nsentences=120, sample_size=4076.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1898.1, ups=0.25, wpb=7593.6, bsz=120, num_updates=56870, lr=1.87517e-06, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=28.3, wall=232910 2023-05-03 19:15:37 - progress_bar.py[line:274] - INFO: epoch 010: 2599 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7695.7, nsentences=120, sample_size=3990.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1926.7, ups=0.25, wpb=7695.7, bsz=120, num_updates=56880, lr=1.86988e-06, gnorm=0.982, clip=30, loss_scale=64, train_wall=40, gb_free=29.2, wall=232949 2023-05-03 19:16:17 - progress_bar.py[line:274] - INFO: epoch 010: 2609 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7812.2, nsentences=120, sample_size=3905.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1978, ups=0.25, wpb=7812.2, bsz=120, num_updates=56890, lr=1.8646e-06, gnorm=0.985, clip=40, loss_scale=64, train_wall=39, gb_free=31.3, wall=232989 2023-05-03 19:16:56 - progress_bar.py[line:274] - INFO: epoch 010: 2619 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7612.6, nsentences=120, sample_size=3941.7, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1923, ups=0.25, wpb=7612.6, bsz=120, num_updates=56900, lr=1.85932e-06, gnorm=0.994, clip=50, loss_scale=64, train_wall=40, gb_free=29.4, wall=233029 2023-05-03 19:17:36 - 
progress_bar.py[line:274] - INFO: epoch 010: 2629 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7632, nsentences=120, sample_size=3805.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1932, ups=0.25, wpb=7632, bsz=120, num_updates=56910, lr=1.85404e-06, gnorm=1.024, clip=60, loss_scale=64, train_wall=39, gb_free=30.4, wall=233068 2023-05-03 19:18:15 - progress_bar.py[line:274] - INFO: epoch 010: 2639 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.072, ntokens=7779.9, nsentences=120, sample_size=3972.8, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1966.3, ups=0.25, wpb=7779.9, bsz=120, num_updates=56920, lr=1.84875e-06, gnorm=0.989, clip=30, loss_scale=64, train_wall=39, gb_free=28.9, wall=233108 2023-05-03 19:18:56 - progress_bar.py[line:274] - INFO: epoch 010: 2649 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7715.4, nsentences=120, sample_size=4188.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1901, ups=0.25, wpb=7715.4, bsz=120, num_updates=56930, lr=1.84347e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=41, gb_free=29.5, wall=233148 2023-05-03 19:19:35 - progress_bar.py[line:274] - INFO: epoch 010: 2659 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7661.2, nsentences=120, sample_size=4290.9, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1932, ups=0.25, wpb=7661.2, bsz=120, num_updates=56940, lr=1.83819e-06, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=233188 2023-05-03 19:20:15 - progress_bar.py[line:274] - INFO: epoch 010: 2669 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7745.3, nsentences=120, sample_size=4196.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1963.3, ups=0.25, wpb=7745.3, bsz=120, num_updates=56950, lr=1.83291e-06, gnorm=0.956, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=233227 2023-05-03 19:20:54 - progress_bar.py[line:274] - INFO: epoch 010: 2679 / 6042 loss=2.341, loss_v1=0, loss_v2=0, 
nll_loss=1.081, ntokens=7538.9, nsentences=120, sample_size=4048.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1950.8, ups=0.26, wpb=7538.9, bsz=120, num_updates=56960, lr=1.82763e-06, gnorm=0.997, clip=30, loss_scale=64, train_wall=39, gb_free=30.4, wall=233266 2023-05-03 19:21:34 - progress_bar.py[line:274] - INFO: epoch 010: 2689 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7819.8, nsentences=120, sample_size=4088.7, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1953.9, ups=0.25, wpb=7819.8, bsz=120, num_updates=56970, lr=1.82234e-06, gnorm=0.991, clip=40, loss_scale=64, train_wall=40, gb_free=30.4, wall=233306 2023-05-03 19:22:13 - progress_bar.py[line:274] - INFO: epoch 010: 2699 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7895.3, nsentences=120, sample_size=4044.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1996, ups=0.25, wpb=7895.3, bsz=120, num_updates=56980, lr=1.81706e-06, gnorm=0.983, clip=40, loss_scale=64, train_wall=39, gb_free=30.8, wall=233346 2023-05-03 19:22:53 - progress_bar.py[line:274] - INFO: epoch 010: 2709 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7540.9, nsentences=120, sample_size=4399.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1875.5, ups=0.25, wpb=7540.9, bsz=120, num_updates=56990, lr=1.81178e-06, gnorm=0.954, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=233386 2023-05-03 19:23:33 - progress_bar.py[line:274] - INFO: epoch 010: 2719 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.122, ntokens=8006.6, nsentences=120, sample_size=3830.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2004.4, ups=0.25, wpb=8006.6, bsz=120, num_updates=57000, lr=1.8065e-06, gnorm=0.991, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=233426 2023-05-03 19:23:33 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 19:23:35 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 19:23:35 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 19:23:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:47 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-03 19:23:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:52 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 19:23:52 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 19:23:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:23:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:23:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:00 - task_invig.py[line:181] - INFO: example hyp(gen): 
region: 2023-05-03 19:24:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:04 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 19:24:04 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 19:24:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:11 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-03 19:24:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:15 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 19:24:15 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 19:24:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:19 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 19:24:19 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 19:24:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:24 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 19:24:24 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 19:24:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 19:24:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 19:24:24 - progress_bar.py[line:282] - INFO: epoch 010 | valid on 'valid' subset | loss 3.266 | loss_v1 0 | loss_v2 0 | nll_loss 2.1 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.29 | score 0.7573 | wps 3301.2 | wpb 3202.1 | bsz 39.4 | num_updates 57000 | best_score 0.7627 2023-05-03 19:24:24 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 10 @ 57000 updates 2023-05-03 19:24:24 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_57000.pt 2023-05-03 19:24:49 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_57000.pt 2023-05-03 19:25:03 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_57000.pt (epoch 10 @ 57000 
updates, score 0.7573) (writing took 38.31199691910297 seconds) 2023-05-03 19:25:42 - progress_bar.py[line:274] - INFO: epoch 010: 2729 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7331, nsentences=120, sample_size=4078.3, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=570.7, ups=0.08, wpb=7331, bsz=120, num_updates=57010, lr=1.80121e-06, gnorm=1.027, clip=70, loss_scale=64, train_wall=39, gb_free=31.1, wall=233554 2023-05-03 19:26:22 - progress_bar.py[line:274] - INFO: epoch 010: 2739 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7426.8, nsentences=120, sample_size=4414.3, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1861.7, ups=0.25, wpb=7426.8, bsz=120, num_updates=57020, lr=1.79593e-06, gnorm=0.955, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=233594 2023-05-03 19:27:02 - progress_bar.py[line:274] - INFO: epoch 010: 2749 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7759.3, nsentences=120, sample_size=4039, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1912.5, ups=0.25, wpb=7759.3, bsz=120, num_updates=57030, lr=1.79065e-06, gnorm=0.985, clip=40, loss_scale=64, train_wall=41, gb_free=29.8, wall=233635 2023-05-03 19:27:42 - progress_bar.py[line:274] - INFO: epoch 010: 2759 / 6042 loss=2.324, loss_v1=0, loss_v2=0, nll_loss=1.06, ntokens=7607, nsentences=120, sample_size=3853.6, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1923.4, ups=0.25, wpb=7607, bsz=120, num_updates=57040, lr=1.78537e-06, gnorm=1.009, clip=50, loss_scale=64, train_wall=39, gb_free=30.7, wall=233674 2023-05-03 19:28:21 - progress_bar.py[line:274] - INFO: epoch 010: 2769 / 6042 loss=2.324, loss_v1=0, loss_v2=0, nll_loss=1.07, ntokens=7679.2, nsentences=120, sample_size=4027.5, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1963.6, ups=0.26, wpb=7679.2, bsz=120, num_updates=57050, lr=1.78009e-06, gnorm=1.003, clip=40, loss_scale=64, train_wall=39, gb_free=29.6, wall=233713 2023-05-03 19:29:00 - 
progress_bar.py[line:274] - INFO: epoch 010: 2779 / 6042 loss=2.323, loss_v1=0, loss_v2=0, nll_loss=1.06, ntokens=7618.6, nsentences=120, sample_size=4120.5, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1923.5, ups=0.25, wpb=7618.6, bsz=120, num_updates=57060, lr=1.7748e-06, gnorm=0.97, clip=30, loss_scale=64, train_wall=40, gb_free=27.4, wall=233753 2023-05-03 19:29:40 - progress_bar.py[line:274] - INFO: epoch 010: 2789 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7746.9, nsentences=120, sample_size=3984.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1939.1, ups=0.25, wpb=7746.9, bsz=120, num_updates=57070, lr=1.76952e-06, gnorm=0.999, clip=40, loss_scale=64, train_wall=40, gb_free=29.3, wall=233793 2023-05-03 19:30:20 - progress_bar.py[line:274] - INFO: epoch 010: 2799 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7652, nsentences=120, sample_size=4083.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1918.2, ups=0.25, wpb=7652, bsz=120, num_updates=57080, lr=1.76424e-06, gnorm=0.98, clip=50, loss_scale=64, train_wall=40, gb_free=29.1, wall=233833 2023-05-03 19:31:01 - progress_bar.py[line:274] - INFO: epoch 010: 2809 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=8047.6, nsentences=120, sample_size=4009.4, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1961.8, ups=0.24, wpb=8047.6, bsz=120, num_updates=57090, lr=1.75896e-06, gnorm=0.963, clip=30, loss_scale=64, train_wall=41, gb_free=30.3, wall=233874 2023-05-03 19:31:41 - progress_bar.py[line:274] - INFO: epoch 010: 2819 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7770.2, nsentences=120, sample_size=4076.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1937.7, ups=0.25, wpb=7770.2, bsz=120, num_updates=57100, lr=1.75368e-06, gnorm=0.995, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=233914 2023-05-03 19:32:22 - progress_bar.py[line:274] - INFO: epoch 010: 2829 / 6042 loss=2.345, loss_v1=0, loss_v2=0, 
nll_loss=1.085, ntokens=7831.7, nsentences=120, sample_size=3568.2, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1920.8, ups=0.25, wpb=7831.7, bsz=120, num_updates=57110, lr=1.74839e-06, gnorm=0.999, clip=50, loss_scale=64, train_wall=41, gb_free=30.8, wall=233955 2023-05-03 19:33:02 - progress_bar.py[line:274] - INFO: epoch 010: 2839 / 6042 loss=2.309, loss_v1=0, loss_v2=0, nll_loss=1.044, ntokens=7796.2, nsentences=120, sample_size=4277.1, sample_size_v1=0, sample_size_v2=0, ppl=2.06, wps=1955.2, ups=0.25, wpb=7796.2, bsz=120, num_updates=57120, lr=1.74311e-06, gnorm=0.973, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=233995 2023-05-03 19:33:42 - progress_bar.py[line:274] - INFO: epoch 010: 2849 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7640.5, nsentences=120, sample_size=4037.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1926, ups=0.25, wpb=7640.5, bsz=120, num_updates=57130, lr=1.73783e-06, gnorm=0.994, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=234034 2023-05-03 19:34:21 - progress_bar.py[line:274] - INFO: epoch 010: 2859 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7723.4, nsentences=120, sample_size=4027.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1951.9, ups=0.25, wpb=7723.4, bsz=120, num_updates=57140, lr=1.73255e-06, gnorm=0.977, clip=50, loss_scale=64, train_wall=39, gb_free=29.8, wall=234074 2023-05-03 19:35:01 - progress_bar.py[line:274] - INFO: epoch 010: 2869 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7524.8, nsentences=120, sample_size=3889.7, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1918.9, ups=0.26, wpb=7524.8, bsz=120, num_updates=57150, lr=1.72726e-06, gnorm=1.041, clip=60, loss_scale=64, train_wall=39, gb_free=30.3, wall=234113 2023-05-03 19:35:41 - progress_bar.py[line:274] - INFO: epoch 010: 2879 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7943.1, nsentences=120, sample_size=3995.1, sample_size_v1=0, 
sample_size_v2=0, ppl=2.11, wps=1965.9, ups=0.25, wpb=7943.1, bsz=120, num_updates=57160, lr=1.72198e-06, gnorm=0.978, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=234153 2023-05-03 19:36:20 - progress_bar.py[line:274] - INFO: epoch 010: 2889 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7637.3, nsentences=120, sample_size=3908.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1946.1, ups=0.25, wpb=7637.3, bsz=120, num_updates=57170, lr=1.7167e-06, gnorm=1.012, clip=60, loss_scale=64, train_wall=39, gb_free=29.1, wall=234193 2023-05-03 19:37:00 - progress_bar.py[line:274] - INFO: epoch 010: 2899 / 6042 loss=2.321, loss_v1=0, loss_v2=0, nll_loss=1.053, ntokens=7699, nsentences=120, sample_size=3935.4, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1928.4, ups=0.25, wpb=7699, bsz=120, num_updates=57180, lr=1.71142e-06, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=234233 2023-05-03 19:37:40 - progress_bar.py[line:274] - INFO: epoch 010: 2909 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7878.5, nsentences=120, sample_size=4153.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1957.8, ups=0.25, wpb=7878.5, bsz=120, num_updates=57190, lr=1.70614e-06, gnorm=0.987, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=234273 2023-05-03 19:38:20 - progress_bar.py[line:274] - INFO: epoch 010: 2919 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.074, ntokens=7482.7, nsentences=120, sample_size=4042.2, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1901.6, ups=0.25, wpb=7482.7, bsz=120, num_updates=57200, lr=1.70085e-06, gnorm=0.957, clip=40, loss_scale=128, train_wall=39, gb_free=29.6, wall=234312 2023-05-03 19:38:28 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 19:39:04 - progress_bar.py[line:274] - INFO: epoch 010: 2930 / 6042 loss=2.325, loss_v1=0, loss_v2=0, nll_loss=1.062, ntokens=7659.7, 
nsentences=120, sample_size=4021.2, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1743.5, ups=0.23, wpb=7659.7, bsz=120, num_updates=57210, lr=1.69557e-06, gnorm=0.968, clip=30, loss_scale=64, train_wall=44, gb_free=30.6, wall=234356 2023-05-03 19:39:43 - progress_bar.py[line:274] - INFO: epoch 010: 2940 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7766.9, nsentences=120, sample_size=4088.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1948.2, ups=0.25, wpb=7766.9, bsz=120, num_updates=57220, lr=1.69029e-06, gnorm=0.986, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=234396 2023-05-03 19:40:23 - progress_bar.py[line:274] - INFO: epoch 010: 2950 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7804.3, nsentences=120, sample_size=4193.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1958.3, ups=0.25, wpb=7804.3, bsz=120, num_updates=57230, lr=1.68501e-06, gnorm=0.937, clip=30, loss_scale=64, train_wall=40, gb_free=28.8, wall=234436 2023-05-03 19:41:03 - progress_bar.py[line:274] - INFO: epoch 010: 2960 / 6042 loss=2.305, loss_v1=0, loss_v2=0, nll_loss=1.04, ntokens=7667, nsentences=120, sample_size=4179.2, sample_size_v1=0, sample_size_v2=0, ppl=2.06, wps=1932.7, ups=0.25, wpb=7667, bsz=120, num_updates=57240, lr=1.67973e-06, gnorm=0.943, clip=10, loss_scale=64, train_wall=40, gb_free=30.8, wall=234475 2023-05-03 19:41:43 - progress_bar.py[line:274] - INFO: epoch 010: 2970 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7442.3, nsentences=120, sample_size=4066.8, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1863.2, ups=0.25, wpb=7442.3, bsz=120, num_updates=57250, lr=1.67444e-06, gnorm=0.98, clip=40, loss_scale=64, train_wall=40, gb_free=28.4, wall=234515 2023-05-03 19:42:22 - progress_bar.py[line:274] - INFO: epoch 010: 2980 / 6042 loss=2.32, loss_v1=0, loss_v2=0, nll_loss=1.056, ntokens=7310.1, nsentences=120, sample_size=4314.2, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1861.8, 
ups=0.25, wpb=7310.1, bsz=120, num_updates=57260, lr=1.66916e-06, gnorm=0.967, clip=20, loss_scale=64, train_wall=39, gb_free=29.4, wall=234555 2023-05-03 19:43:02 - progress_bar.py[line:274] - INFO: epoch 010: 2990 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7937, nsentences=120, sample_size=3767.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1972.4, ups=0.25, wpb=7937, bsz=120, num_updates=57270, lr=1.66388e-06, gnorm=1.003, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=234595 2023-05-03 19:43:43 - progress_bar.py[line:274] - INFO: epoch 010: 3000 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7855, nsentences=120, sample_size=4131.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1933.9, ups=0.25, wpb=7855, bsz=120, num_updates=57280, lr=1.6586e-06, gnorm=0.974, clip=30, loss_scale=64, train_wall=41, gb_free=28.7, wall=234636 2023-05-03 19:44:23 - progress_bar.py[line:274] - INFO: epoch 010: 3010 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7481.6, nsentences=120, sample_size=4114.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1872.8, ups=0.25, wpb=7481.6, bsz=120, num_updates=57290, lr=1.65331e-06, gnorm=0.98, clip=20, loss_scale=64, train_wall=40, gb_free=29.4, wall=234675 2023-05-03 19:45:03 - progress_bar.py[line:274] - INFO: epoch 010: 3020 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7638.3, nsentences=120, sample_size=4043.2, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1904.7, ups=0.25, wpb=7638.3, bsz=120, num_updates=57300, lr=1.64803e-06, gnorm=0.979, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=234716 2023-05-03 19:45:43 - progress_bar.py[line:274] - INFO: epoch 010: 3030 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7606.9, nsentences=120, sample_size=4178.3, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1918.4, ups=0.25, wpb=7606.9, bsz=120, num_updates=57310, lr=1.64275e-06, gnorm=0.972, clip=20, 
loss_scale=64, train_wall=40, gb_free=30, wall=234755 2023-05-03 19:46:23 - progress_bar.py[line:274] - INFO: epoch 010: 3040 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7835.5, nsentences=120, sample_size=3906.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1957.2, ups=0.25, wpb=7835.5, bsz=120, num_updates=57320, lr=1.63747e-06, gnorm=0.977, clip=40, loss_scale=64, train_wall=40, gb_free=30.3, wall=234795 2023-05-03 19:47:03 - progress_bar.py[line:274] - INFO: epoch 010: 3050 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7649.8, nsentences=120, sample_size=4027.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1912.2, ups=0.25, wpb=7649.8, bsz=120, num_updates=57330, lr=1.63219e-06, gnorm=1.011, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=234835 2023-05-03 19:47:43 - progress_bar.py[line:274] - INFO: epoch 010: 3060 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.112, ntokens=8054.1, nsentences=120, sample_size=4013.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2001.2, ups=0.25, wpb=8054.1, bsz=120, num_updates=57340, lr=1.6269e-06, gnorm=0.99, clip=50, loss_scale=64, train_wall=40, gb_free=28.5, wall=234876 2023-05-03 19:48:22 - progress_bar.py[line:274] - INFO: epoch 010: 3070 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7788.2, nsentences=120, sample_size=3565.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1982.2, ups=0.25, wpb=7788.2, bsz=120, num_updates=57350, lr=1.62162e-06, gnorm=1.051, clip=60, loss_scale=64, train_wall=39, gb_free=30.8, wall=234915 2023-05-03 19:49:02 - progress_bar.py[line:274] - INFO: epoch 010: 3080 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7752.4, nsentences=120, sample_size=3913.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1935.2, ups=0.25, wpb=7752.4, bsz=120, num_updates=57360, lr=1.61634e-06, gnorm=0.992, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=234955 2023-05-03 19:49:42 - 
progress_bar.py[line:274] - INFO: epoch 010: 3090 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7894.8, nsentences=120, sample_size=3816.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1988.6, ups=0.25, wpb=7894.8, bsz=120, num_updates=57370, lr=1.61106e-06, gnorm=1.017, clip=60, loss_scale=64, train_wall=40, gb_free=27.7, wall=234995 2023-05-03 19:50:22 - progress_bar.py[line:274] - INFO: epoch 010: 3100 / 6042 loss=2.324, loss_v1=0, loss_v2=0, nll_loss=1.061, ntokens=7524.5, nsentences=120, sample_size=4124.3, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1887.7, ups=0.25, wpb=7524.5, bsz=120, num_updates=57380, lr=1.60578e-06, gnorm=0.958, clip=0, loss_scale=64, train_wall=40, gb_free=29.7, wall=235034 2023-05-03 19:51:02 - progress_bar.py[line:274] - INFO: epoch 010: 3110 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7468.2, nsentences=120, sample_size=3813.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1887, ups=0.25, wpb=7468.2, bsz=120, num_updates=57390, lr=1.60049e-06, gnorm=1.033, clip=50, loss_scale=64, train_wall=40, gb_free=28.9, wall=235074 2023-05-03 19:51:41 - progress_bar.py[line:274] - INFO: epoch 010: 3120 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7868.7, nsentences=120, sample_size=3920.3, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=2008.5, ups=0.26, wpb=7868.7, bsz=120, num_updates=57400, lr=1.59521e-06, gnorm=0.983, clip=40, loss_scale=64, train_wall=39, gb_free=29.6, wall=235113 2023-05-03 19:52:20 - progress_bar.py[line:274] - INFO: epoch 010: 3130 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7760.2, nsentences=120, sample_size=3893.3, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1970.6, ups=0.25, wpb=7760.2, bsz=120, num_updates=57410, lr=1.58993e-06, gnorm=0.98, clip=40, loss_scale=64, train_wall=39, gb_free=29.8, wall=235153 2023-05-03 19:53:00 - progress_bar.py[line:274] - INFO: epoch 010: 3140 / 6042 loss=2.334, loss_v1=0, 
loss_v2=0, nll_loss=1.072, ntokens=7567.7, nsentences=120, sample_size=4106.7, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1893, ups=0.25, wpb=7567.7, bsz=120, num_updates=57420, lr=1.58465e-06, gnorm=0.975, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=235193 2023-05-03 19:53:40 - progress_bar.py[line:274] - INFO: epoch 010: 3150 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7701.7, nsentences=120, sample_size=4011.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1923.2, ups=0.25, wpb=7701.7, bsz=120, num_updates=57430, lr=1.57936e-06, gnorm=0.987, clip=50, loss_scale=64, train_wall=40, gb_free=29.5, wall=235233 2023-05-03 19:54:20 - progress_bar.py[line:274] - INFO: epoch 010: 3160 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7818.5, nsentences=120, sample_size=4013.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1965.1, ups=0.25, wpb=7818.5, bsz=120, num_updates=57440, lr=1.57408e-06, gnorm=1.001, clip=50, loss_scale=64, train_wall=40, gb_free=31, wall=235272 2023-05-03 19:54:59 - progress_bar.py[line:274] - INFO: epoch 010: 3170 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7669.2, nsentences=120, sample_size=3827.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1945.1, ups=0.25, wpb=7669.2, bsz=120, num_updates=57450, lr=1.5688e-06, gnorm=1.024, clip=40, loss_scale=64, train_wall=39, gb_free=31.2, wall=235312 2023-05-03 19:55:39 - progress_bar.py[line:274] - INFO: epoch 010: 3180 / 6042 loss=2.399, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7796.6, nsentences=120, sample_size=3799.3, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1957.2, ups=0.25, wpb=7796.6, bsz=120, num_updates=57460, lr=1.56352e-06, gnorm=1.023, clip=60, loss_scale=64, train_wall=40, gb_free=30.8, wall=235352 2023-05-03 19:56:19 - progress_bar.py[line:274] - INFO: epoch 010: 3190 / 6042 loss=2.313, loss_v1=0, loss_v2=0, nll_loss=1.047, ntokens=7501.1, nsentences=120, sample_size=3993.8, 
sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1889.8, ups=0.25, wpb=7501.1, bsz=120, num_updates=57470, lr=1.55824e-06, gnorm=1.012, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=235391 2023-05-03 19:56:59 - progress_bar.py[line:274] - INFO: epoch 010: 3200 / 6042 loss=2.354, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7854, nsentences=120, sample_size=3851.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1941.5, ups=0.25, wpb=7854, bsz=120, num_updates=57480, lr=1.55295e-06, gnorm=1.01, clip=50, loss_scale=64, train_wall=40, gb_free=30.8, wall=235432 2023-05-03 19:57:40 - progress_bar.py[line:274] - INFO: epoch 010: 3210 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7691.6, nsentences=120, sample_size=4391.1, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1910.5, ups=0.25, wpb=7691.6, bsz=120, num_updates=57490, lr=1.54767e-06, gnorm=0.927, clip=10, loss_scale=64, train_wall=40, gb_free=31.1, wall=235472 2023-05-03 19:58:20 - progress_bar.py[line:274] - INFO: epoch 010: 3220 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=8065, nsentences=120, sample_size=3947.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2015.5, ups=0.25, wpb=8065, bsz=120, num_updates=57500, lr=1.54239e-06, gnorm=1.017, clip=40, loss_scale=64, train_wall=40, gb_free=31, wall=235512 2023-05-03 19:59:00 - progress_bar.py[line:274] - INFO: epoch 010: 3230 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7998, nsentences=120, sample_size=4153.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2002.6, ups=0.25, wpb=7998, bsz=120, num_updates=57510, lr=1.53711e-06, gnorm=0.99, clip=40, loss_scale=64, train_wall=40, gb_free=27.4, wall=235552 2023-05-03 19:59:39 - progress_bar.py[line:274] - INFO: epoch 010: 3240 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7812.4, nsentences=120, sample_size=4250.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1975.1, ups=0.25, wpb=7812.4, bsz=120, 
num_updates=57520, lr=1.53182e-06, gnorm=0.947, clip=10, loss_scale=64, train_wall=39, gb_free=29.3, wall=235592 2023-05-03 20:00:19 - progress_bar.py[line:274] - INFO: epoch 010: 3250 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7977.4, nsentences=120, sample_size=4050.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1986.1, ups=0.25, wpb=7977.4, bsz=120, num_updates=57530, lr=1.52654e-06, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=29.3, wall=235632 2023-05-03 20:00:59 - progress_bar.py[line:274] - INFO: epoch 010: 3260 / 6042 loss=2.326, loss_v1=0, loss_v2=0, nll_loss=1.059, ntokens=7298.3, nsentences=120, sample_size=4060.9, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1820.1, ups=0.25, wpb=7298.3, bsz=120, num_updates=57540, lr=1.52126e-06, gnorm=0.993, clip=60, loss_scale=64, train_wall=40, gb_free=31.3, wall=235672 2023-05-03 20:01:39 - progress_bar.py[line:274] - INFO: epoch 010: 3270 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7455.7, nsentences=120, sample_size=4264, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1878, ups=0.25, wpb=7455.7, bsz=120, num_updates=57550, lr=1.51598e-06, gnorm=0.95, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=235712 2023-05-03 20:02:20 - progress_bar.py[line:274] - INFO: epoch 010: 3280 / 6042 loss=2.375, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7763.7, nsentences=120, sample_size=4074, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1919.6, ups=0.25, wpb=7763.7, bsz=120, num_updates=57560, lr=1.5107e-06, gnorm=0.972, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=235752 2023-05-03 20:03:00 - progress_bar.py[line:274] - INFO: epoch 010: 3290 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7835, nsentences=120, sample_size=4201.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1947.1, ups=0.25, wpb=7835, bsz=120, num_updates=57570, lr=1.50541e-06, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, 
gb_free=30.6, wall=235792 2023-05-03 20:03:39 - progress_bar.py[line:274] - INFO: epoch 010: 3300 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7457.4, nsentences=120, sample_size=3995.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1902.3, ups=0.26, wpb=7457.4, bsz=120, num_updates=57580, lr=1.50013e-06, gnorm=0.998, clip=60, loss_scale=64, train_wall=39, gb_free=29.7, wall=235831 2023-05-03 20:04:19 - progress_bar.py[line:274] - INFO: epoch 010: 3310 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.131, ntokens=7923.4, nsentences=120, sample_size=4008.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1986.5, ups=0.25, wpb=7923.4, bsz=120, num_updates=57590, lr=1.49485e-06, gnorm=0.984, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=235871 2023-05-03 20:04:58 - progress_bar.py[line:274] - INFO: epoch 010: 3320 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7578.8, nsentences=120, sample_size=4149.7, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1918.1, ups=0.25, wpb=7578.8, bsz=120, num_updates=57600, lr=1.48957e-06, gnorm=0.984, clip=50, loss_scale=64, train_wall=39, gb_free=31, wall=235911 2023-05-03 20:05:38 - progress_bar.py[line:274] - INFO: epoch 010: 3330 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7525.1, nsentences=120, sample_size=3812.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1912.1, ups=0.25, wpb=7525.1, bsz=120, num_updates=57610, lr=1.48429e-06, gnorm=1.02, clip=70, loss_scale=64, train_wall=39, gb_free=31, wall=235950 2023-05-03 20:06:18 - progress_bar.py[line:274] - INFO: epoch 010: 3340 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7957.8, nsentences=120, sample_size=4006.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1979, ups=0.25, wpb=7957.8, bsz=120, num_updates=57620, lr=1.479e-06, gnorm=0.979, clip=30, loss_scale=64, train_wall=40, gb_free=30.6, wall=235990 2023-05-03 20:06:58 - progress_bar.py[line:274] - INFO: epoch 010: 
3350 / 6042 loss=2.396, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7889.6, nsentences=120, sample_size=3940.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1995, ups=0.25, wpb=7889.6, bsz=120, num_updates=57630, lr=1.47372e-06, gnorm=0.985, clip=50, loss_scale=64, train_wall=39, gb_free=29, wall=236030 2023-05-03 20:07:37 - progress_bar.py[line:274] - INFO: epoch 010: 3360 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7783.1, nsentences=120, sample_size=4171.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1955.5, ups=0.25, wpb=7783.1, bsz=120, num_updates=57640, lr=1.46844e-06, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=28.9, wall=236070 2023-05-03 20:08:17 - progress_bar.py[line:274] - INFO: epoch 010: 3370 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7831.4, nsentences=120, sample_size=4131.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1972.2, ups=0.25, wpb=7831.4, bsz=120, num_updates=57650, lr=1.46316e-06, gnorm=0.963, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=236109 2023-05-03 20:08:56 - progress_bar.py[line:274] - INFO: epoch 010: 3380 / 6042 loss=2.318, loss_v1=0, loss_v2=0, nll_loss=1.048, ntokens=7471.1, nsentences=120, sample_size=3985.8, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1921.1, ups=0.26, wpb=7471.1, bsz=120, num_updates=57660, lr=1.45787e-06, gnorm=0.983, clip=40, loss_scale=64, train_wall=39, gb_free=31.3, wall=236148 2023-05-03 20:09:36 - progress_bar.py[line:274] - INFO: epoch 010: 3390 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7550, nsentences=120, sample_size=4308.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1869.4, ups=0.25, wpb=7550, bsz=120, num_updates=57670, lr=1.45259e-06, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=29.9, wall=236189 2023-05-03 20:10:16 - progress_bar.py[line:274] - INFO: epoch 010: 3400 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7579.5, 
nsentences=120, sample_size=3905.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1888.5, ups=0.25, wpb=7579.5, bsz=120, num_updates=57680, lr=1.44731e-06, gnorm=0.989, clip=40, loss_scale=64, train_wall=40, gb_free=28.8, wall=236229 2023-05-03 20:10:55 - progress_bar.py[line:274] - INFO: epoch 010: 3410 / 6042 loss=2.315, loss_v1=0, loss_v2=0, nll_loss=1.048, ntokens=7576, nsentences=120, sample_size=4090.1, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1948.2, ups=0.26, wpb=7576, bsz=120, num_updates=57690, lr=1.44203e-06, gnorm=0.987, clip=50, loss_scale=64, train_wall=39, gb_free=30, wall=236268 2023-05-03 20:11:36 - progress_bar.py[line:274] - INFO: epoch 010: 3420 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=8057.7, nsentences=120, sample_size=4136.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1997.3, ups=0.25, wpb=8057.7, bsz=120, num_updates=57700, lr=1.43675e-06, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=30.1, wall=236308 2023-05-03 20:12:16 - progress_bar.py[line:274] - INFO: epoch 010: 3430 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.07, ntokens=7523.2, nsentences=120, sample_size=4197.6, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1881.1, ups=0.25, wpb=7523.2, bsz=120, num_updates=57710, lr=1.43146e-06, gnorm=0.962, clip=40, loss_scale=64, train_wall=40, gb_free=30.3, wall=236348 2023-05-03 20:12:55 - progress_bar.py[line:274] - INFO: epoch 010: 3440 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7519.1, nsentences=120, sample_size=3895.1, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1897.1, ups=0.25, wpb=7519.1, bsz=120, num_updates=57720, lr=1.42618e-06, gnorm=0.978, clip=40, loss_scale=128, train_wall=40, gb_free=29.1, wall=236388 2023-05-03 20:13:35 - progress_bar.py[line:274] - INFO: epoch 010: 3450 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.102, ntokens=8011.3, nsentences=120, sample_size=3743.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, 
wps=2012.6, ups=0.25, wpb=8011.3, bsz=120, num_updates=57730, lr=1.4209e-06, gnorm=1.016, clip=70, loss_scale=128, train_wall=40, gb_free=28, wall=236428 2023-05-03 20:14:15 - progress_bar.py[line:274] - INFO: epoch 010: 3460 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=8113.9, nsentences=120, sample_size=3861.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2023.1, ups=0.25, wpb=8113.9, bsz=120, num_updates=57740, lr=1.41562e-06, gnorm=0.987, clip=40, loss_scale=128, train_wall=40, gb_free=29.5, wall=236468 2023-05-03 20:14:24 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 20:15:00 - progress_bar.py[line:274] - INFO: epoch 010: 3471 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7938.5, nsentences=120, sample_size=4261.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1774.9, ups=0.22, wpb=7938.5, bsz=120, num_updates=57750, lr=1.41034e-06, gnorm=0.95, clip=20, loss_scale=64, train_wall=45, gb_free=30.4, wall=236512 2023-05-03 20:15:40 - progress_bar.py[line:274] - INFO: epoch 010: 3481 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7724.8, nsentences=120, sample_size=4002.7, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1909.8, ups=0.25, wpb=7724.8, bsz=120, num_updates=57760, lr=1.40505e-06, gnorm=0.993, clip=50, loss_scale=64, train_wall=40, gb_free=29.8, wall=236553 2023-05-03 20:16:20 - progress_bar.py[line:274] - INFO: epoch 010: 3491 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7636.4, nsentences=120, sample_size=3956.4, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1915.3, ups=0.25, wpb=7636.4, bsz=120, num_updates=57770, lr=1.39977e-06, gnorm=0.994, clip=50, loss_scale=64, train_wall=40, gb_free=30.3, wall=236593 2023-05-03 20:16:59 - progress_bar.py[line:274] - INFO: epoch 010: 3501 / 6042 loss=2.383, loss_v1=0, loss_v2=0, nll_loss=1.126, ntokens=7856.5, nsentences=120, sample_size=3846.9, 
sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2023.1, ups=0.26, wpb=7856.5, bsz=120, num_updates=57780, lr=1.39449e-06, gnorm=1, clip=70, loss_scale=64, train_wall=39, gb_free=29.8, wall=236632 2023-05-03 20:17:39 - progress_bar.py[line:274] - INFO: epoch 010: 3511 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7882.2, nsentences=120, sample_size=4030.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1981.4, ups=0.25, wpb=7882.2, bsz=120, num_updates=57790, lr=1.38921e-06, gnorm=0.968, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=236671 2023-05-03 20:18:18 - progress_bar.py[line:274] - INFO: epoch 010: 3521 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.11, ntokens=7686.5, nsentences=120, sample_size=3991.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1949.5, ups=0.25, wpb=7686.5, bsz=120, num_updates=57800, lr=1.38392e-06, gnorm=0.968, clip=30, loss_scale=64, train_wall=39, gb_free=30.6, wall=236711 2023-05-03 20:18:58 - progress_bar.py[line:274] - INFO: epoch 010: 3531 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7422.1, nsentences=120, sample_size=4071.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1869.7, ups=0.25, wpb=7422.1, bsz=120, num_updates=57810, lr=1.37864e-06, gnorm=0.967, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=236750 2023-05-03 20:19:38 - progress_bar.py[line:274] - INFO: epoch 010: 3541 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7687.6, nsentences=120, sample_size=3895.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1942.6, ups=0.25, wpb=7687.6, bsz=120, num_updates=57820, lr=1.37336e-06, gnorm=0.987, clip=40, loss_scale=64, train_wall=40, gb_free=30.9, wall=236790 2023-05-03 20:20:17 - progress_bar.py[line:274] - INFO: epoch 010: 3551 / 6042 loss=2.317, loss_v1=0, loss_v2=0, nll_loss=1.055, ntokens=7605.9, nsentences=120, sample_size=4300.5, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1934.4, ups=0.25, wpb=7605.9, bsz=120, 
num_updates=57830, lr=1.36808e-06, gnorm=0.938, clip=10, loss_scale=64, train_wall=39, gb_free=29.9, wall=236829 2023-05-03 20:20:57 - progress_bar.py[line:274] - INFO: epoch 010: 3561 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.133, ntokens=7681.1, nsentences=120, sample_size=3955.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1917.9, ups=0.25, wpb=7681.1, bsz=120, num_updates=57840, lr=1.3628e-06, gnorm=1.003, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=236869 2023-05-03 20:21:37 - progress_bar.py[line:274] - INFO: epoch 010: 3571 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7952.7, nsentences=120, sample_size=4060.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2008.1, ups=0.25, wpb=7952.7, bsz=120, num_updates=57850, lr=1.35751e-06, gnorm=0.967, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=236909 2023-05-03 20:22:16 - progress_bar.py[line:274] - INFO: epoch 010: 3581 / 6042 loss=2.329, loss_v1=0, loss_v2=0, nll_loss=1.061, ntokens=7589.7, nsentences=120, sample_size=4093.2, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1948.4, ups=0.26, wpb=7589.7, bsz=120, num_updates=57860, lr=1.35223e-06, gnorm=0.983, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=236948 2023-05-03 20:22:55 - progress_bar.py[line:274] - INFO: epoch 010: 3591 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.108, ntokens=7641.3, nsentences=120, sample_size=4073.2, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1934.3, ups=0.25, wpb=7641.3, bsz=120, num_updates=57870, lr=1.34695e-06, gnorm=0.995, clip=40, loss_scale=64, train_wall=39, gb_free=30.2, wall=236987 2023-05-03 20:23:34 - progress_bar.py[line:274] - INFO: epoch 010: 3601 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7658.1, nsentences=120, sample_size=3903.2, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1946.5, ups=0.25, wpb=7658.1, bsz=120, num_updates=57880, lr=1.34167e-06, gnorm=1.002, clip=50, loss_scale=64, 
train_wall=39, gb_free=29.2, wall=237027 2023-05-03 20:24:15 - progress_bar.py[line:274] - INFO: epoch 010: 3611 / 6042 loss=2.32, loss_v1=0, loss_v2=0, nll_loss=1.056, ntokens=7923.8, nsentences=120, sample_size=4234.8, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1971.7, ups=0.25, wpb=7923.8, bsz=120, num_updates=57890, lr=1.33639e-06, gnorm=0.955, clip=30, loss_scale=64, train_wall=40, gb_free=28.9, wall=237067 2023-05-03 20:24:54 - progress_bar.py[line:274] - INFO: epoch 010: 3621 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7851.8, nsentences=120, sample_size=4085.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1970.8, ups=0.25, wpb=7851.8, bsz=120, num_updates=57900, lr=1.3311e-06, gnorm=0.98, clip=40, loss_scale=64, train_wall=40, gb_free=29.4, wall=237107 2023-05-03 20:25:34 - progress_bar.py[line:274] - INFO: epoch 010: 3631 / 6042 loss=2.408, loss_v1=0, loss_v2=0, nll_loss=1.156, ntokens=7630.9, nsentences=120, sample_size=4093.5, sample_size_v1=0, sample_size_v2=0, ppl=2.23, wps=1910.1, ups=0.25, wpb=7630.9, bsz=120, num_updates=57910, lr=1.32582e-06, gnorm=0.989, clip=50, loss_scale=64, train_wall=40, gb_free=29.3, wall=237147 2023-05-03 20:26:14 - progress_bar.py[line:274] - INFO: epoch 010: 3641 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7679.7, nsentences=120, sample_size=4118.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1927.5, ups=0.25, wpb=7679.7, bsz=120, num_updates=57920, lr=1.32054e-06, gnorm=0.968, clip=20, loss_scale=64, train_wall=40, gb_free=28.9, wall=237187 2023-05-03 20:26:55 - progress_bar.py[line:274] - INFO: epoch 010: 3651 / 6042 loss=2.324, loss_v1=0, loss_v2=0, nll_loss=1.06, ntokens=7421.6, nsentences=120, sample_size=4257.8, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1828.4, ups=0.25, wpb=7421.6, bsz=120, num_updates=57930, lr=1.31526e-06, gnorm=0.943, clip=30, loss_scale=64, train_wall=41, gb_free=30.3, wall=237227 2023-05-03 20:27:35 - progress_bar.py[line:274] 
- INFO: epoch 010: 3661 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=8001.3, nsentences=120, sample_size=4208.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2013.1, ups=0.25, wpb=8001.3, bsz=120, num_updates=57940, lr=1.30997e-06, gnorm=0.969, clip=30, loss_scale=64, train_wall=40, gb_free=30.3, wall=237267 2023-05-03 20:28:13 - progress_bar.py[line:274] - INFO: epoch 010: 3671 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7916.2, nsentences=120, sample_size=4036.8, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2039.1, ups=0.26, wpb=7916.2, bsz=120, num_updates=57950, lr=1.30469e-06, gnorm=0.992, clip=40, loss_scale=64, train_wall=39, gb_free=29.8, wall=237306 2023-05-03 20:28:52 - progress_bar.py[line:274] - INFO: epoch 010: 3681 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7452.2, nsentences=120, sample_size=3960.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1921.7, ups=0.26, wpb=7452.2, bsz=120, num_updates=57960, lr=1.29941e-06, gnorm=0.98, clip=30, loss_scale=64, train_wall=39, gb_free=28.8, wall=237345 2023-05-03 20:29:32 - progress_bar.py[line:274] - INFO: epoch 010: 3691 / 6042 loss=2.335, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7546.9, nsentences=120, sample_size=4102.4, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1891.8, ups=0.25, wpb=7546.9, bsz=120, num_updates=57970, lr=1.29413e-06, gnorm=0.962, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=237384 2023-05-03 20:30:12 - progress_bar.py[line:274] - INFO: epoch 010: 3701 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7666.3, nsentences=120, sample_size=3978.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1925.9, ups=0.25, wpb=7666.3, bsz=120, num_updates=57980, lr=1.28885e-06, gnorm=0.958, clip=20, loss_scale=64, train_wall=40, gb_free=29, wall=237424 2023-05-03 20:30:51 - progress_bar.py[line:274] - INFO: epoch 010: 3711 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.122, 
ntokens=8012.5, nsentences=120, sample_size=3802.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2050.3, ups=0.26, wpb=8012.5, bsz=120, num_updates=57990, lr=1.28356e-06, gnorm=1.008, clip=60, loss_scale=64, train_wall=39, gb_free=29.3, wall=237463 2023-05-03 20:31:31 - progress_bar.py[line:274] - INFO: epoch 010: 3721 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7717.5, nsentences=120, sample_size=4159.1, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1912, ups=0.25, wpb=7717.5, bsz=120, num_updates=58000, lr=1.27828e-06, gnorm=0.961, clip=20, loss_scale=64, train_wall=40, gb_free=31, wall=237504 2023-05-03 20:31:31 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 20:31:33 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 20:31:33 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 20:31:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-03 20:31:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:50 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 20:31:50 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 20:31:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:58 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:31:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:31:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:02 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 20:32:02 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 20:32:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:13 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 20:32:13 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 20:32:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:18 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 20:32:18 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 20:32:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:22 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 20:32:22 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 20:32:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 20:32:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 20:32:23 - progress_bar.py[line:282] - INFO: epoch 010 | valid on 'valid' subset | loss 3.269 | loss_v1 0 | loss_v2 0 | nll_loss 2.104 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.3 | score 0.7578 | wps 3297.8 | wpb 3202.1 | bsz 39.4 | num_updates 58000 | best_score 0.7627 2023-05-03 20:32:23 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 10 @ 58000 updates 2023-05-03 20:32:23 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_58000.pt 2023-05-03 20:32:47 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_58000.pt 2023-05-03 20:33:01 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_58000.pt (epoch 10 @ 58000 updates, score 0.7578) (writing took 37.979153549997136 seconds) 2023-05-03 20:33:39 - progress_bar.py[line:274] - INFO: epoch 010: 3731 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7465, nsentences=120, sample_size=4153.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=582.9, ups=0.08, wpb=7465, bsz=120, num_updates=58010, lr=1.273e-06, gnorm=0.993, clip=50, loss_scale=64, train_wall=39, gb_free=29.5, wall=237632 2023-05-03 20:34:20 - progress_bar.py[line:274] - INFO: epoch 010: 3741 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7545.9, nsentences=120, sample_size=4382.5, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1875.8, ups=0.25, wpb=7545.9, bsz=120, num_updates=58020, lr=1.26772e-06, gnorm=0.942, clip=10, loss_scale=64, train_wall=40, gb_free=30.3, 
wall=237672 2023-05-03 20:35:00 - progress_bar.py[line:274] - INFO: epoch 010: 3751 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7819.1, nsentences=120, sample_size=3912.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1953.7, ups=0.25, wpb=7819.1, bsz=120, num_updates=58030, lr=1.26244e-06, gnorm=0.984, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=237712 2023-05-03 20:35:39 - progress_bar.py[line:274] - INFO: epoch 010: 3761 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7914.1, nsentences=120, sample_size=4131.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1986.4, ups=0.25, wpb=7914.1, bsz=120, num_updates=58040, lr=1.25715e-06, gnorm=0.976, clip=40, loss_scale=64, train_wall=40, gb_free=29.3, wall=237752 2023-05-03 20:36:20 - progress_bar.py[line:274] - INFO: epoch 010: 3771 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7738.4, nsentences=120, sample_size=3896.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1898.1, ups=0.25, wpb=7738.4, bsz=120, num_updates=58050, lr=1.25187e-06, gnorm=0.98, clip=30, loss_scale=64, train_wall=41, gb_free=23.6, wall=237793 2023-05-03 20:37:00 - progress_bar.py[line:274] - INFO: epoch 010: 3781 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=8029.7, nsentences=120, sample_size=3757.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=2003, ups=0.25, wpb=8029.7, bsz=120, num_updates=58060, lr=1.24659e-06, gnorm=1.01, clip=50, loss_scale=64, train_wall=40, gb_free=30.8, wall=237833 2023-05-03 20:37:40 - progress_bar.py[line:274] - INFO: epoch 010: 3791 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.081, ntokens=7714.2, nsentences=120, sample_size=3802.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1921.9, ups=0.25, wpb=7714.2, bsz=120, num_updates=58070, lr=1.24131e-06, gnorm=0.981, clip=30, loss_scale=64, train_wall=40, gb_free=30.2, wall=237873 2023-05-03 20:38:20 - progress_bar.py[line:274] - INFO: epoch 010: 3801 / 
6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7778.4, nsentences=120, sample_size=4020, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1976.6, ups=0.25, wpb=7778.4, bsz=120, num_updates=58080, lr=1.23602e-06, gnorm=0.999, clip=50, loss_scale=64, train_wall=39, gb_free=30.2, wall=237912 2023-05-03 20:38:59 - progress_bar.py[line:274] - INFO: epoch 010: 3811 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7760.2, nsentences=120, sample_size=4095.4, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1973.3, ups=0.25, wpb=7760.2, bsz=120, num_updates=58090, lr=1.23074e-06, gnorm=1.008, clip=50, loss_scale=64, train_wall=39, gb_free=29.6, wall=237952 2023-05-03 20:39:39 - progress_bar.py[line:274] - INFO: epoch 010: 3821 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7873.3, nsentences=120, sample_size=3749.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1967.4, ups=0.25, wpb=7873.3, bsz=120, num_updates=58100, lr=1.22546e-06, gnorm=1.02, clip=60, loss_scale=64, train_wall=40, gb_free=28.6, wall=237992 2023-05-03 20:40:18 - progress_bar.py[line:274] - INFO: epoch 010: 3831 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7526.6, nsentences=120, sample_size=4136.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1924.8, ups=0.26, wpb=7526.6, bsz=120, num_updates=58110, lr=1.22018e-06, gnorm=0.983, clip=40, loss_scale=64, train_wall=39, gb_free=29.4, wall=238031 2023-05-03 20:40:57 - progress_bar.py[line:274] - INFO: epoch 010: 3841 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7609, nsentences=120, sample_size=3948.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1951.1, ups=0.26, wpb=7609, bsz=120, num_updates=58120, lr=1.2149e-06, gnorm=1.027, clip=60, loss_scale=64, train_wall=39, gb_free=30.2, wall=238070 2023-05-03 20:41:37 - progress_bar.py[line:274] - INFO: epoch 010: 3851 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7755.8, nsentences=120, 
sample_size=4055.9, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1945.6, ups=0.25, wpb=7755.8, bsz=120, num_updates=58130, lr=1.20961e-06, gnorm=0.99, clip=30, loss_scale=64, train_wall=40, gb_free=26.9, wall=238110 2023-05-03 20:42:17 - progress_bar.py[line:274] - INFO: epoch 010: 3861 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7639.7, nsentences=120, sample_size=4018.2, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1905.7, ups=0.25, wpb=7639.7, bsz=120, num_updates=58140, lr=1.20433e-06, gnorm=0.992, clip=50, loss_scale=64, train_wall=40, gb_free=30.3, wall=238150 2023-05-03 20:42:57 - progress_bar.py[line:274] - INFO: epoch 010: 3871 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=7532.7, nsentences=120, sample_size=4069.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1877.2, ups=0.25, wpb=7532.7, bsz=120, num_updates=58150, lr=1.19905e-06, gnorm=0.954, clip=10, loss_scale=64, train_wall=40, gb_free=28, wall=238190 2023-05-03 20:43:37 - progress_bar.py[line:274] - INFO: epoch 010: 3881 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7954.7, nsentences=120, sample_size=4062.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1981.7, ups=0.25, wpb=7954.7, bsz=120, num_updates=58160, lr=1.19377e-06, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=31.2, wall=238230 2023-05-03 20:44:17 - progress_bar.py[line:274] - INFO: epoch 010: 3891 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7444.3, nsentences=120, sample_size=4241.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1878.5, ups=0.25, wpb=7444.3, bsz=120, num_updates=58170, lr=1.18848e-06, gnorm=0.966, clip=30, loss_scale=64, train_wall=40, gb_free=29, wall=238270 2023-05-03 20:44:57 - progress_bar.py[line:274] - INFO: epoch 010: 3901 / 6042 loss=2.315, loss_v1=0, loss_v2=0, nll_loss=1.047, ntokens=7754, nsentences=120, sample_size=4403, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1940.8, ups=0.25, 
wpb=7754, bsz=120, num_updates=58180, lr=1.1832e-06, gnorm=0.951, clip=40, loss_scale=64, train_wall=40, gb_free=30.7, wall=238310 2023-05-03 20:45:37 - progress_bar.py[line:274] - INFO: epoch 010: 3911 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.114, ntokens=7853.3, nsentences=120, sample_size=4054.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1960.3, ups=0.25, wpb=7853.3, bsz=120, num_updates=58190, lr=1.17792e-06, gnorm=0.983, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=238350 2023-05-03 20:46:17 - progress_bar.py[line:274] - INFO: epoch 010: 3921 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7881.8, nsentences=120, sample_size=3685.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1998.2, ups=0.25, wpb=7881.8, bsz=120, num_updates=58200, lr=1.17264e-06, gnorm=1.026, clip=70, loss_scale=64, train_wall=39, gb_free=24.6, wall=238389 2023-05-03 20:46:57 - progress_bar.py[line:274] - INFO: epoch 010: 3931 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7760.9, nsentences=120, sample_size=4116.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1941.6, ups=0.25, wpb=7760.9, bsz=120, num_updates=58210, lr=1.16736e-06, gnorm=0.973, clip=40, loss_scale=64, train_wall=40, gb_free=30.3, wall=238429 2023-05-03 20:47:37 - progress_bar.py[line:274] - INFO: epoch 010: 3941 / 6042 loss=2.312, loss_v1=0, loss_v2=0, nll_loss=1.054, ntokens=7605.1, nsentences=120, sample_size=4015.6, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1897.7, ups=0.25, wpb=7605.1, bsz=120, num_updates=58220, lr=1.16207e-06, gnorm=0.992, clip=50, loss_scale=64, train_wall=40, gb_free=29.6, wall=238469 2023-05-03 20:48:17 - progress_bar.py[line:274] - INFO: epoch 010: 3951 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7532.4, nsentences=120, sample_size=4026.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1883.1, ups=0.25, wpb=7532.4, bsz=120, num_updates=58230, lr=1.15679e-06, gnorm=0.953, clip=0, 
loss_scale=64, train_wall=40, gb_free=30.5, wall=238509 2023-05-03 20:48:57 - progress_bar.py[line:274] - INFO: epoch 010: 3961 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.062, ntokens=7825.3, nsentences=120, sample_size=3874.9, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1951.4, ups=0.25, wpb=7825.3, bsz=120, num_updates=58240, lr=1.15151e-06, gnorm=0.999, clip=50, loss_scale=64, train_wall=40, gb_free=29.6, wall=238549 2023-05-03 20:49:36 - progress_bar.py[line:274] - INFO: epoch 010: 3971 / 6042 loss=2.384, loss_v1=0, loss_v2=0, nll_loss=1.127, ntokens=7437.8, nsentences=120, sample_size=3971.5, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1879.8, ups=0.25, wpb=7437.8, bsz=120, num_updates=58250, lr=1.14623e-06, gnorm=1.002, clip=60, loss_scale=64, train_wall=39, gb_free=30.2, wall=238589 2023-05-03 20:50:04 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 20:50:21 - progress_bar.py[line:274] - INFO: epoch 010: 3982 / 6042 loss=2.372, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7732.4, nsentences=120, sample_size=4315.4, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1744.4, ups=0.23, wpb=7732.4, bsz=120, num_updates=58260, lr=1.14095e-06, gnorm=0.954, clip=20, loss_scale=64, train_wall=44, gb_free=29.4, wall=238633 2023-05-03 20:51:00 - progress_bar.py[line:274] - INFO: epoch 010: 3992 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7898.6, nsentences=120, sample_size=4091.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1994.1, ups=0.25, wpb=7898.6, bsz=120, num_updates=58270, lr=1.13566e-06, gnorm=0.977, clip=50, loss_scale=64, train_wall=40, gb_free=23.6, wall=238673 2023-05-03 20:51:40 - progress_bar.py[line:274] - INFO: epoch 010: 4002 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7488.2, nsentences=120, sample_size=4196.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1888.9, ups=0.25, wpb=7488.2, bsz=120, 
num_updates=58280, lr=1.13038e-06, gnorm=0.984, clip=30, loss_scale=64, train_wall=40, gb_free=31, wall=238712 2023-05-03 20:52:19 - progress_bar.py[line:274] - INFO: epoch 010: 4012 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7818.4, nsentences=120, sample_size=3824, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1976.3, ups=0.25, wpb=7818.4, bsz=120, num_updates=58290, lr=1.1251e-06, gnorm=1.032, clip=50, loss_scale=64, train_wall=39, gb_free=30.4, wall=238752 2023-05-03 20:52:58 - progress_bar.py[line:274] - INFO: epoch 010: 4022 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7766, nsentences=120, sample_size=4082.9, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1990.6, ups=0.26, wpb=7766, bsz=120, num_updates=58300, lr=1.11982e-06, gnorm=0.991, clip=60, loss_scale=64, train_wall=39, gb_free=28.3, wall=238791 2023-05-03 20:53:38 - progress_bar.py[line:274] - INFO: epoch 010: 4032 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7843.1, nsentences=120, sample_size=4188.7, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1971.7, ups=0.25, wpb=7843.1, bsz=120, num_updates=58310, lr=1.11453e-06, gnorm=0.968, clip=40, loss_scale=64, train_wall=40, gb_free=31.1, wall=238831 2023-05-03 20:54:18 - progress_bar.py[line:274] - INFO: epoch 010: 4042 / 6042 loss=2.367, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7664.3, nsentences=120, sample_size=3920.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1925.7, ups=0.25, wpb=7664.3, bsz=120, num_updates=58320, lr=1.10925e-06, gnorm=0.997, clip=50, loss_scale=64, train_wall=40, gb_free=29.8, wall=238871 2023-05-03 20:54:58 - progress_bar.py[line:274] - INFO: epoch 010: 4052 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.147, ntokens=7953.5, nsentences=120, sample_size=3998.6, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=2014.3, ups=0.25, wpb=7953.5, bsz=120, num_updates=58330, lr=1.10397e-06, gnorm=0.979, clip=40, loss_scale=64, train_wall=39, 
gb_free=30.4, wall=238910 2023-05-03 20:55:38 - progress_bar.py[line:274] - INFO: epoch 010: 4062 / 6042 loss=2.394, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7586.6, nsentences=120, sample_size=4158.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1864.9, ups=0.25, wpb=7586.6, bsz=120, num_updates=58340, lr=1.09869e-06, gnorm=0.956, clip=20, loss_scale=64, train_wall=41, gb_free=29.5, wall=238951 2023-05-03 20:56:19 - progress_bar.py[line:274] - INFO: epoch 010: 4072 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7821.5, nsentences=120, sample_size=4030.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1920.5, ups=0.25, wpb=7821.5, bsz=120, num_updates=58350, lr=1.09341e-06, gnorm=0.967, clip=30, loss_scale=64, train_wall=41, gb_free=30, wall=238991 2023-05-03 20:56:59 - progress_bar.py[line:274] - INFO: epoch 010: 4082 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7640.3, nsentences=120, sample_size=4105.2, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1896.8, ups=0.25, wpb=7640.3, bsz=120, num_updates=58360, lr=1.08812e-06, gnorm=0.965, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=239032 2023-05-03 20:57:39 - progress_bar.py[line:274] - INFO: epoch 010: 4092 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7897, nsentences=120, sample_size=3897.7, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1989.3, ups=0.25, wpb=7897, bsz=120, num_updates=58370, lr=1.08284e-06, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=239071 2023-05-03 20:58:19 - progress_bar.py[line:274] - INFO: epoch 010: 4102 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7401.7, nsentences=120, sample_size=4346.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1855, ups=0.25, wpb=7401.7, bsz=120, num_updates=58380, lr=1.07756e-06, gnorm=0.945, clip=10, loss_scale=64, train_wall=40, gb_free=30.6, wall=239111 2023-05-03 20:58:58 - progress_bar.py[line:274] - INFO: epoch 010: 
4112 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7771.6, nsentences=120, sample_size=4142.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1970.3, ups=0.25, wpb=7771.6, bsz=120, num_updates=58390, lr=1.07228e-06, gnorm=0.965, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=239151 2023-05-03 20:59:38 - progress_bar.py[line:274] - INFO: epoch 010: 4122 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7675.8, nsentences=120, sample_size=4323.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1915, ups=0.25, wpb=7675.8, bsz=120, num_updates=58400, lr=1.067e-06, gnorm=0.945, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=239191 2023-05-03 21:00:18 - progress_bar.py[line:274] - INFO: epoch 010: 4132 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7916.2, nsentences=120, sample_size=4411.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1986, ups=0.25, wpb=7916.2, bsz=120, num_updates=58410, lr=1.06171e-06, gnorm=0.946, clip=10, loss_scale=64, train_wall=40, gb_free=29.7, wall=239231 2023-05-03 21:00:59 - progress_bar.py[line:274] - INFO: epoch 010: 4142 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=8055.7, nsentences=120, sample_size=4065.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1993, ups=0.25, wpb=8055.7, bsz=120, num_updates=58420, lr=1.05643e-06, gnorm=0.963, clip=20, loss_scale=64, train_wall=40, gb_free=29.5, wall=239271 2023-05-03 21:01:38 - progress_bar.py[line:274] - INFO: epoch 010: 4152 / 6042 loss=2.338, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7997.2, nsentences=120, sample_size=3579.7, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=2009.9, ups=0.25, wpb=7997.2, bsz=120, num_updates=58430, lr=1.05115e-06, gnorm=1.008, clip=30, loss_scale=64, train_wall=40, gb_free=30.7, wall=239311 2023-05-03 21:02:19 - progress_bar.py[line:274] - INFO: epoch 010: 4162 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7822.2, nsentences=120, 
sample_size=3793.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1930.1, ups=0.25, wpb=7822.2, bsz=120, num_updates=58440, lr=1.04587e-06, gnorm=1.022, clip=60, loss_scale=64, train_wall=40, gb_free=29.5, wall=239351 2023-05-03 21:03:00 - progress_bar.py[line:274] - INFO: epoch 010: 4172 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7749.4, nsentences=120, sample_size=4381.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1904.2, ups=0.25, wpb=7749.4, bsz=120, num_updates=58450, lr=1.04058e-06, gnorm=0.931, clip=10, loss_scale=64, train_wall=41, gb_free=29.1, wall=239392 2023-05-03 21:03:39 - progress_bar.py[line:274] - INFO: epoch 010: 4182 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7505.4, nsentences=120, sample_size=4308.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1921, ups=0.26, wpb=7505.4, bsz=120, num_updates=58460, lr=1.0353e-06, gnorm=0.953, clip=30, loss_scale=64, train_wall=39, gb_free=28.1, wall=239431 2023-05-03 21:04:19 - progress_bar.py[line:274] - INFO: epoch 010: 4192 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7447, nsentences=120, sample_size=4088.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1851.7, ups=0.25, wpb=7447, bsz=120, num_updates=58470, lr=1.03002e-06, gnorm=0.986, clip=50, loss_scale=64, train_wall=40, gb_free=27.8, wall=239471 2023-05-03 21:04:58 - progress_bar.py[line:274] - INFO: epoch 010: 4202 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7778.9, nsentences=120, sample_size=4118.2, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1981.3, ups=0.25, wpb=7778.9, bsz=120, num_updates=58480, lr=1.02474e-06, gnorm=0.967, clip=30, loss_scale=64, train_wall=39, gb_free=27.1, wall=239511 2023-05-03 21:05:38 - progress_bar.py[line:274] - INFO: epoch 010: 4212 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7794.4, nsentences=120, sample_size=4038, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1956.4, ups=0.25, 
wpb=7794.4, bsz=120, num_updates=58490, lr=1.01946e-06, gnorm=0.982, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=239551 2023-05-03 21:06:18 - progress_bar.py[line:274] - INFO: epoch 010: 4222 / 6042 loss=2.388, loss_v1=0, loss_v2=0, nll_loss=1.134, ntokens=7828.9, nsentences=120, sample_size=3814.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1945.4, ups=0.25, wpb=7828.9, bsz=120, num_updates=58500, lr=1.01417e-06, gnorm=1.018, clip=60, loss_scale=64, train_wall=40, gb_free=30.1, wall=239591 2023-05-03 21:06:59 - progress_bar.py[line:274] - INFO: epoch 010: 4232 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7733.6, nsentences=120, sample_size=3828.4, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1881, ups=0.24, wpb=7733.6, bsz=120, num_updates=58510, lr=1.00889e-06, gnorm=0.986, clip=30, loss_scale=64, train_wall=41, gb_free=30.2, wall=239632 2023-05-03 21:07:39 - progress_bar.py[line:274] - INFO: epoch 010: 4242 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.124, ntokens=7873.2, nsentences=120, sample_size=3947.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1986.7, ups=0.25, wpb=7873.2, bsz=120, num_updates=58520, lr=1.00361e-06, gnorm=0.98, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=239672 2023-05-03 21:08:19 - progress_bar.py[line:274] - INFO: epoch 010: 4252 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.079, ntokens=7673.1, nsentences=120, sample_size=3825.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1924, ups=0.25, wpb=7673.1, bsz=120, num_updates=58530, lr=9.98327e-07, gnorm=1.01, clip=60, loss_scale=64, train_wall=40, gb_free=30.5, wall=239711 2023-05-03 21:08:58 - progress_bar.py[line:274] - INFO: epoch 010: 4262 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7800.2, nsentences=120, sample_size=3582.8, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1987.6, ups=0.25, wpb=7800.2, bsz=120, num_updates=58540, lr=9.93045e-07, gnorm=1.007, clip=40, loss_scale=64, 
train_wall=39, gb_free=31.4, wall=239751 2023-05-03 21:09:38 - progress_bar.py[line:274] - INFO: epoch 010: 4272 / 6042 loss=2.324, loss_v1=0, loss_v2=0, nll_loss=1.065, ntokens=7522.6, nsentences=120, sample_size=4439.3, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1907.9, ups=0.25, wpb=7522.6, bsz=120, num_updates=58550, lr=9.87763e-07, gnorm=0.921, clip=10, loss_scale=64, train_wall=39, gb_free=31.3, wall=239790 2023-05-03 21:10:18 - progress_bar.py[line:274] - INFO: epoch 010: 4282 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7619.5, nsentences=120, sample_size=4141.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1898.3, ups=0.25, wpb=7619.5, bsz=120, num_updates=58560, lr=9.82481e-07, gnorm=0.979, clip=20, loss_scale=64, train_wall=40, gb_free=30.3, wall=239830 2023-05-03 21:10:58 - progress_bar.py[line:274] - INFO: epoch 010: 4292 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7635.4, nsentences=120, sample_size=4095, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1902.1, ups=0.25, wpb=7635.4, bsz=120, num_updates=58570, lr=9.77199e-07, gnorm=0.982, clip=40, loss_scale=64, train_wall=40, gb_free=30.7, wall=239870 2023-05-03 21:11:38 - progress_bar.py[line:274] - INFO: epoch 010: 4302 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7771.5, nsentences=120, sample_size=4093.7, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1947.4, ups=0.25, wpb=7771.5, bsz=120, num_updates=58580, lr=9.71917e-07, gnorm=0.952, clip=10, loss_scale=64, train_wall=40, gb_free=28.1, wall=239910 2023-05-03 21:12:17 - progress_bar.py[line:274] - INFO: epoch 010: 4312 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7537.5, nsentences=120, sample_size=3725.8, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1924.1, ups=0.26, wpb=7537.5, bsz=120, num_updates=58590, lr=9.66634e-07, gnorm=1.042, clip=70, loss_scale=64, train_wall=39, gb_free=30.3, wall=239949 2023-05-03 21:12:56 - progress_bar.py[line:274] 
- INFO: epoch 010: 4322 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7645.5, nsentences=120, sample_size=3850.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1952.8, ups=0.26, wpb=7645.5, bsz=120, num_updates=58600, lr=9.61352e-07, gnorm=1.011, clip=50, loss_scale=64, train_wall=39, gb_free=31.5, wall=239989 2023-05-03 21:13:36 - progress_bar.py[line:274] - INFO: epoch 010: 4332 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7909.6, nsentences=120, sample_size=4027.2, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1990.9, ups=0.25, wpb=7909.6, bsz=120, num_updates=58610, lr=9.5607e-07, gnorm=0.999, clip=60, loss_scale=64, train_wall=40, gb_free=30.1, wall=240028 2023-05-03 21:14:15 - progress_bar.py[line:274] - INFO: epoch 010: 4342 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7433.9, nsentences=120, sample_size=3898.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1898.3, ups=0.26, wpb=7433.9, bsz=120, num_updates=58620, lr=9.50788e-07, gnorm=0.99, clip=40, loss_scale=64, train_wall=39, gb_free=25.5, wall=240067 2023-05-03 21:14:55 - progress_bar.py[line:274] - INFO: epoch 010: 4352 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7844.1, nsentences=120, sample_size=3695.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1971.2, ups=0.25, wpb=7844.1, bsz=120, num_updates=58630, lr=9.45506e-07, gnorm=1.023, clip=60, loss_scale=64, train_wall=40, gb_free=28.4, wall=240107 2023-05-03 21:15:34 - progress_bar.py[line:274] - INFO: epoch 010: 4362 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.136, ntokens=7604.1, nsentences=120, sample_size=4026.6, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1917, ups=0.25, wpb=7604.1, bsz=120, num_updates=58640, lr=9.40224e-07, gnorm=0.994, clip=40, loss_scale=64, train_wall=40, gb_free=31, wall=240147 2023-05-03 21:16:14 - progress_bar.py[line:274] - INFO: epoch 010: 4372 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.113, 
ntokens=7533.1, nsentences=120, sample_size=3959.1, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1910.6, ups=0.25, wpb=7533.1, bsz=120, num_updates=58650, lr=9.34941e-07, gnorm=0.966, clip=20, loss_scale=64, train_wall=39, gb_free=30.4, wall=240186 2023-05-03 21:16:53 - progress_bar.py[line:274] - INFO: epoch 010: 4382 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.106, ntokens=7372.2, nsentences=120, sample_size=3774.9, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1874.1, ups=0.25, wpb=7372.2, bsz=120, num_updates=58660, lr=9.29659e-07, gnorm=1.023, clip=50, loss_scale=64, train_wall=39, gb_free=30.5, wall=240226 2023-05-03 21:17:33 - progress_bar.py[line:274] - INFO: epoch 010: 4392 / 6042 loss=2.369, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7840.4, nsentences=120, sample_size=4079.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1962.5, ups=0.25, wpb=7840.4, bsz=120, num_updates=58670, lr=9.24377e-07, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=30.5, wall=240266 2023-05-03 21:18:13 - progress_bar.py[line:274] - INFO: epoch 010: 4402 / 6042 loss=2.391, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7961.8, nsentences=120, sample_size=4106.9, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1988, ups=0.25, wpb=7961.8, bsz=120, num_updates=58680, lr=9.19095e-07, gnorm=0.981, clip=50, loss_scale=64, train_wall=40, gb_free=29.2, wall=240306 2023-05-03 21:18:53 - progress_bar.py[line:274] - INFO: epoch 010: 4412 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=8014.3, nsentences=120, sample_size=4314, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2003.3, ups=0.25, wpb=8014.3, bsz=120, num_updates=58690, lr=9.13813e-07, gnorm=0.927, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=240346 2023-05-03 21:19:33 - progress_bar.py[line:274] - INFO: epoch 010: 4422 / 6042 loss=2.325, loss_v1=0, loss_v2=0, nll_loss=1.064, ntokens=7505.6, nsentences=120, sample_size=4060.4, sample_size_v1=0, sample_size_v2=0, 
ppl=2.09, wps=1876.8, ups=0.25, wpb=7505.6, bsz=120, num_updates=58700, lr=9.08531e-07, gnorm=0.972, clip=50, loss_scale=64, train_wall=40, gb_free=29.8, wall=240386 2023-05-03 21:20:13 - progress_bar.py[line:274] - INFO: epoch 010: 4432 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7918.6, nsentences=120, sample_size=3719.4, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1993.7, ups=0.25, wpb=7918.6, bsz=120, num_updates=58710, lr=9.03249e-07, gnorm=1.008, clip=60, loss_scale=64, train_wall=40, gb_free=29, wall=240425 2023-05-03 21:20:53 - progress_bar.py[line:274] - INFO: epoch 010: 4442 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7856.1, nsentences=120, sample_size=3841.5, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1964.1, ups=0.25, wpb=7856.1, bsz=120, num_updates=58720, lr=8.97966e-07, gnorm=1.013, clip=70, loss_scale=64, train_wall=40, gb_free=29.4, wall=240465 2023-05-03 21:21:34 - progress_bar.py[line:274] - INFO: epoch 010: 4452 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7838.5, nsentences=120, sample_size=3955.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1922.5, ups=0.25, wpb=7838.5, bsz=120, num_updates=58730, lr=8.92684e-07, gnorm=0.995, clip=40, loss_scale=64, train_wall=41, gb_free=29.2, wall=240506 2023-05-03 21:22:14 - progress_bar.py[line:274] - INFO: epoch 010: 4462 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.097, ntokens=7791.8, nsentences=120, sample_size=4060.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1957.2, ups=0.25, wpb=7791.8, bsz=120, num_updates=58740, lr=8.87402e-07, gnorm=0.971, clip=30, loss_scale=64, train_wall=40, gb_free=30.4, wall=240546 2023-05-03 21:22:52 - progress_bar.py[line:274] - INFO: epoch 010: 4472 / 6042 loss=2.311, loss_v1=0, loss_v2=0, nll_loss=1.05, ntokens=7577.8, nsentences=120, sample_size=3919.9, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1952.9, ups=0.26, wpb=7577.8, bsz=120, num_updates=58750, lr=8.8212e-07, 
gnorm=0.991, clip=40, loss_scale=64, train_wall=39, gb_free=30.2, wall=240585 2023-05-03 21:23:32 - progress_bar.py[line:274] - INFO: epoch 010: 4482 / 6042 loss=2.366, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7956.5, nsentences=120, sample_size=4051.7, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2007.9, ups=0.25, wpb=7956.5, bsz=120, num_updates=58760, lr=8.76838e-07, gnorm=0.996, clip=40, loss_scale=64, train_wall=40, gb_free=28.7, wall=240624 2023-05-03 21:24:11 - progress_bar.py[line:274] - INFO: epoch 010: 4492 / 6042 loss=2.348, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7721.4, nsentences=120, sample_size=4097.9, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1972.6, ups=0.26, wpb=7721.4, bsz=120, num_updates=58770, lr=8.71556e-07, gnorm=0.995, clip=50, loss_scale=128, train_wall=39, gb_free=31, wall=240664 2023-05-03 21:24:51 - progress_bar.py[line:274] - INFO: epoch 010: 4502 / 6042 loss=2.331, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7576, nsentences=120, sample_size=4071.8, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1898, ups=0.25, wpb=7576, bsz=120, num_updates=58780, lr=8.66273e-07, gnorm=0.997, clip=30, loss_scale=128, train_wall=40, gb_free=29.5, wall=240704 2023-05-03 21:24:59 - trainer.py[line:922] - INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 21:25:35 - progress_bar.py[line:274] - INFO: epoch 010: 4513 / 6042 loss=2.316, loss_v1=0, loss_v2=0, nll_loss=1.053, ntokens=7790.2, nsentences=120, sample_size=4251.4, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1784.8, ups=0.23, wpb=7790.2, bsz=120, num_updates=58790, lr=8.60991e-07, gnorm=0.968, clip=10, loss_scale=64, train_wall=44, gb_free=30.3, wall=240747 2023-05-03 21:26:15 - progress_bar.py[line:274] - INFO: epoch 010: 4523 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.119, ntokens=7711.7, nsentences=120, sample_size=4349.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1925.3, ups=0.25, wpb=7711.7, 
bsz=120, num_updates=58800, lr=8.55709e-07, gnorm=0.938, clip=10, loss_scale=64, train_wall=40, gb_free=30.4, wall=240787 2023-05-03 21:26:55 - progress_bar.py[line:274] - INFO: epoch 010: 4533 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.138, ntokens=7668.6, nsentences=119.2, sample_size=3996.1, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1900.5, ups=0.25, wpb=7668.6, bsz=119.2, num_updates=58810, lr=8.50427e-07, gnorm=0.98, clip=40, loss_scale=64, train_wall=40, gb_free=29.4, wall=240828 2023-05-03 21:27:36 - progress_bar.py[line:274] - INFO: epoch 010: 4543 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7769, nsentences=120, sample_size=4184.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1899.1, ups=0.24, wpb=7769, bsz=120, num_updates=58820, lr=8.45145e-07, gnorm=0.995, clip=60, loss_scale=64, train_wall=41, gb_free=29.3, wall=240868 2023-05-03 21:28:16 - progress_bar.py[line:274] - INFO: epoch 010: 4553 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7937.6, nsentences=120, sample_size=4167.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1979.6, ups=0.25, wpb=7937.6, bsz=120, num_updates=58830, lr=8.39863e-07, gnorm=0.973, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=240909 2023-05-03 21:28:55 - progress_bar.py[line:274] - INFO: epoch 010: 4563 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7509.6, nsentences=120, sample_size=4320.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1910, ups=0.25, wpb=7509.6, bsz=120, num_updates=58840, lr=8.34581e-07, gnorm=0.957, clip=10, loss_scale=64, train_wall=39, gb_free=30.2, wall=240948 2023-05-03 21:29:35 - progress_bar.py[line:274] - INFO: epoch 010: 4573 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7891.7, nsentences=120, sample_size=3704, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1981.7, ups=0.25, wpb=7891.7, bsz=120, num_updates=58850, lr=8.29298e-07, gnorm=0.985, clip=40, loss_scale=64, 
train_wall=40, gb_free=29.8, wall=240988 2023-05-03 21:30:14 - progress_bar.py[line:274] - INFO: epoch 010: 4583 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7731.2, nsentences=120, sample_size=4051.1, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1975.6, ups=0.26, wpb=7731.2, bsz=120, num_updates=58860, lr=8.24016e-07, gnorm=0.998, clip=30, loss_scale=64, train_wall=39, gb_free=29.8, wall=241027 2023-05-03 21:30:55 - progress_bar.py[line:274] - INFO: epoch 010: 4593 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7912.8, nsentences=120, sample_size=3829.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1948.4, ups=0.25, wpb=7912.8, bsz=120, num_updates=58870, lr=8.18734e-07, gnorm=1.008, clip=60, loss_scale=64, train_wall=41, gb_free=30.9, wall=241067 2023-05-03 21:31:35 - progress_bar.py[line:274] - INFO: epoch 010: 4603 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7983, nsentences=120, sample_size=4334.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1982.8, ups=0.25, wpb=7983, bsz=120, num_updates=58880, lr=8.13452e-07, gnorm=0.945, clip=20, loss_scale=64, train_wall=40, gb_free=30, wall=241108 2023-05-03 21:32:14 - progress_bar.py[line:274] - INFO: epoch 010: 4613 / 6042 loss=2.322, loss_v1=0, loss_v2=0, nll_loss=1.056, ntokens=7640.2, nsentences=120, sample_size=4043.9, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1947.7, ups=0.25, wpb=7640.2, bsz=120, num_updates=58890, lr=8.0817e-07, gnorm=1.003, clip=60, loss_scale=64, train_wall=39, gb_free=29.7, wall=241147 2023-05-03 21:32:55 - progress_bar.py[line:274] - INFO: epoch 010: 4623 / 6042 loss=2.393, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7956.8, nsentences=120, sample_size=3842.5, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1959.8, ups=0.25, wpb=7956.8, bsz=120, num_updates=58900, lr=8.02888e-07, gnorm=0.989, clip=50, loss_scale=64, train_wall=41, gb_free=30.8, wall=241188 2023-05-03 21:33:35 - progress_bar.py[line:274] - 
INFO: epoch 010: 4633 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7961.9, nsentences=120, sample_size=3975.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2004.6, ups=0.25, wpb=7961.9, bsz=120, num_updates=58910, lr=7.97605e-07, gnorm=0.994, clip=40, loss_scale=64, train_wall=40, gb_free=29.8, wall=241227 2023-05-03 21:34:13 - progress_bar.py[line:274] - INFO: epoch 010: 4643 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.07, ntokens=7559.3, nsentences=120, sample_size=3926.4, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1957.4, ups=0.26, wpb=7559.3, bsz=120, num_updates=58920, lr=7.92323e-07, gnorm=1.005, clip=60, loss_scale=64, train_wall=39, gb_free=28.7, wall=241266 2023-05-03 21:34:53 - progress_bar.py[line:274] - INFO: epoch 010: 4653 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7703.7, nsentences=120, sample_size=4064.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1926, ups=0.25, wpb=7703.7, bsz=120, num_updates=58930, lr=7.87041e-07, gnorm=0.967, clip=40, loss_scale=64, train_wall=40, gb_free=29.6, wall=241306 2023-05-03 21:35:33 - progress_bar.py[line:274] - INFO: epoch 010: 4663 / 6042 loss=2.329, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7628.1, nsentences=120, sample_size=3926.5, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1925.3, ups=0.25, wpb=7628.1, bsz=120, num_updates=58940, lr=7.81759e-07, gnorm=1.005, clip=40, loss_scale=64, train_wall=40, gb_free=30.3, wall=241346 2023-05-03 21:36:14 - progress_bar.py[line:274] - INFO: epoch 010: 4673 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.137, ntokens=7859.1, nsentences=120, sample_size=3814.4, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1931.6, ups=0.25, wpb=7859.1, bsz=120, num_updates=58950, lr=7.76477e-07, gnorm=1.018, clip=50, loss_scale=64, train_wall=41, gb_free=30.5, wall=241386 2023-05-03 21:36:53 - progress_bar.py[line:274] - INFO: epoch 010: 4683 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.107, 
ntokens=7937.1, nsentences=120, sample_size=3788.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2023.9, ups=0.25, wpb=7937.1, bsz=120, num_updates=58960, lr=7.71195e-07, gnorm=1.031, clip=70, loss_scale=64, train_wall=39, gb_free=30, wall=241425 2023-05-03 21:37:34 - progress_bar.py[line:274] - INFO: epoch 010: 4693 / 6042 loss=2.376, loss_v1=0, loss_v2=0, nll_loss=1.116, ntokens=7979.7, nsentences=120, sample_size=4104.6, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1966.7, ups=0.25, wpb=7979.7, bsz=120, num_updates=58970, lr=7.65912e-07, gnorm=0.96, clip=30, loss_scale=64, train_wall=41, gb_free=30.3, wall=241466 2023-05-03 21:38:13 - progress_bar.py[line:274] - INFO: epoch 010: 4703 / 6042 loss=2.389, loss_v1=0, loss_v2=0, nll_loss=1.141, ntokens=7843.6, nsentences=120, sample_size=3918.8, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1974.6, ups=0.25, wpb=7843.6, bsz=120, num_updates=58980, lr=7.6063e-07, gnorm=1.011, clip=50, loss_scale=64, train_wall=40, gb_free=30.9, wall=241506 2023-05-03 21:38:52 - progress_bar.py[line:274] - INFO: epoch 010: 4713 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7855.4, nsentences=120, sample_size=3960.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=2022.7, ups=0.26, wpb=7855.4, bsz=120, num_updates=58990, lr=7.55348e-07, gnorm=1.011, clip=70, loss_scale=64, train_wall=39, gb_free=30, wall=241545 2023-05-03 21:39:31 - progress_bar.py[line:274] - INFO: epoch 010: 4723 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7671.3, nsentences=120, sample_size=3992, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1953.8, ups=0.25, wpb=7671.3, bsz=120, num_updates=59000, lr=7.50066e-07, gnorm=0.961, clip=20, loss_scale=64, train_wall=39, gb_free=31.2, wall=241584 2023-05-03 21:39:31 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 21:39:33 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 21:39:33 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 21:39:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:45 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-03 21:39:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:50 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 21:39:50 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 21:39:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:58 - task_invig.py[line:181] - INFO: example hyp(gen): 
region: 2023-05-03 21:39:58 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:39:59 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:39:59 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:02 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 21:40:02 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 21:40:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:10 - task_invig.py[line:182] - INFO: example ref(gt): 
region: 2023-05-03 21:40:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:14 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 21:40:14 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 21:40:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:18 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 21:40:18 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 21:40:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:22 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 21:40:22 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 21:40:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 21:40:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 21:40:23 - progress_bar.py[line:282] - INFO: epoch 010 | valid on 'valid' subset | loss 3.269 | loss_v1 0 | loss_v2 0 | nll_loss 2.104 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.3 | score 0.7593 | wps 3296.4 | wpb 3202.1 | bsz 39.4 | num_updates 59000 | best_score 0.7627 2023-05-03 21:40:23 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 10 @ 59000 updates 2023-05-03 21:40:23 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_59000.pt 2023-05-03 21:40:47 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_59000.pt 2023-05-03 21:41:01 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_59000.pt (epoch 10 @ 59000 
updates, score 0.7593) (writing took 38.13779605994932 seconds) 2023-05-03 21:41:40 - progress_bar.py[line:274] - INFO: epoch 010: 4733 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7639, nsentences=120, sample_size=3997.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=592.5, ups=0.08, wpb=7639, bsz=120, num_updates=59010, lr=7.44784e-07, gnorm=0.993, clip=30, loss_scale=64, train_wall=39, gb_free=29.4, wall=241713 2023-05-03 21:42:19 - progress_bar.py[line:274] - INFO: epoch 010: 4743 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7528.3, nsentences=120, sample_size=3922.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1932.7, ups=0.26, wpb=7528.3, bsz=120, num_updates=59020, lr=7.39502e-07, gnorm=1.012, clip=70, loss_scale=64, train_wall=39, gb_free=31.4, wall=241752 2023-05-03 21:42:58 - progress_bar.py[line:274] - INFO: epoch 010: 4753 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7526.5, nsentences=120, sample_size=4172.3, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1919.8, ups=0.26, wpb=7526.5, bsz=120, num_updates=59030, lr=7.3422e-07, gnorm=0.991, clip=50, loss_scale=64, train_wall=39, gb_free=31, wall=241791 2023-05-03 21:43:38 - progress_bar.py[line:274] - INFO: epoch 010: 4763 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7644, nsentences=120, sample_size=4220.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1955.7, ups=0.26, wpb=7644, bsz=120, num_updates=59040, lr=7.28937e-07, gnorm=0.974, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=241830 2023-05-03 21:44:18 - progress_bar.py[line:274] - INFO: epoch 010: 4773 / 6042 loss=2.39, loss_v1=0, loss_v2=0, nll_loss=1.132, ntokens=7832.7, nsentences=120, sample_size=4283.5, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1933.7, ups=0.25, wpb=7832.7, bsz=120, num_updates=59050, lr=7.23655e-07, gnorm=0.949, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=241871 2023-05-03 21:44:58 - 
progress_bar.py[line:274] - INFO: epoch 010: 4783 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7616.6, nsentences=120, sample_size=4142.5, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1900.3, ups=0.25, wpb=7616.6, bsz=120, num_updates=59060, lr=7.18373e-07, gnorm=0.985, clip=40, loss_scale=64, train_wall=40, gb_free=29.4, wall=241911 2023-05-03 21:45:38 - progress_bar.py[line:274] - INFO: epoch 010: 4793 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=7637.3, nsentences=120, sample_size=3949.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1937.3, ups=0.25, wpb=7637.3, bsz=120, num_updates=59070, lr=7.13091e-07, gnorm=0.988, clip=40, loss_scale=64, train_wall=39, gb_free=28.5, wall=241950 2023-05-03 21:46:18 - progress_bar.py[line:274] - INFO: epoch 010: 4803 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7690.3, nsentences=120, sample_size=4052.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1914, ups=0.25, wpb=7690.3, bsz=120, num_updates=59080, lr=7.07809e-07, gnorm=1.01, clip=40, loss_scale=64, train_wall=40, gb_free=30.9, wall=241990 2023-05-03 21:46:57 - progress_bar.py[line:274] - INFO: epoch 010: 4813 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7300.9, nsentences=120, sample_size=3907.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1842.8, ups=0.25, wpb=7300.9, bsz=120, num_updates=59090, lr=7.02527e-07, gnorm=1.013, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=242030 2023-05-03 21:47:37 - progress_bar.py[line:274] - INFO: epoch 010: 4823 / 6042 loss=2.318, loss_v1=0, loss_v2=0, nll_loss=1.054, ntokens=7367.6, nsentences=120, sample_size=4091, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1863.4, ups=0.25, wpb=7367.6, bsz=120, num_updates=59100, lr=6.97244e-07, gnorm=0.977, clip=40, loss_scale=64, train_wall=39, gb_free=29.8, wall=242069 2023-05-03 21:48:17 - progress_bar.py[line:274] - INFO: epoch 010: 4833 / 6042 loss=2.338, loss_v1=0, loss_v2=0, 
nll_loss=1.072, ntokens=7629.8, nsentences=120, sample_size=4473.5, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1903.4, ups=0.25, wpb=7629.8, bsz=120, num_updates=59110, lr=6.91962e-07, gnorm=0.943, clip=0, loss_scale=64, train_wall=40, gb_free=29.4, wall=242109 2023-05-03 21:48:56 - progress_bar.py[line:274] - INFO: epoch 010: 4843 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=8226.7, nsentences=120, sample_size=3923.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=2097.4, ups=0.25, wpb=8226.7, bsz=120, num_updates=59120, lr=6.8668e-07, gnorm=0.983, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=242149 2023-05-03 21:49:35 - progress_bar.py[line:274] - INFO: epoch 010: 4853 / 6042 loss=2.316, loss_v1=0, loss_v2=0, nll_loss=1.048, ntokens=7845.2, nsentences=120, sample_size=3911.3, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1996.7, ups=0.25, wpb=7845.2, bsz=120, num_updates=59130, lr=6.81398e-07, gnorm=0.992, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=242188 2023-05-03 21:50:15 - progress_bar.py[line:274] - INFO: epoch 010: 4863 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7664.1, nsentences=120, sample_size=4319.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1936.1, ups=0.25, wpb=7664.1, bsz=120, num_updates=59140, lr=6.76116e-07, gnorm=0.957, clip=20, loss_scale=64, train_wall=40, gb_free=29.7, wall=242228 2023-05-03 21:50:55 - progress_bar.py[line:274] - INFO: epoch 010: 4873 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7897.7, nsentences=120, sample_size=3870.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1993.7, ups=0.25, wpb=7897.7, bsz=120, num_updates=59150, lr=6.70834e-07, gnorm=0.993, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=242267 2023-05-03 21:51:34 - progress_bar.py[line:274] - INFO: epoch 010: 4883 / 6042 loss=2.314, loss_v1=0, loss_v2=0, nll_loss=1.044, ntokens=7575.9, nsentences=120, sample_size=4133.5, sample_size_v1=0, 
sample_size_v2=0, ppl=2.06, wps=1925.8, ups=0.25, wpb=7575.9, bsz=120, num_updates=59160, lr=6.65552e-07, gnorm=0.988, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=242307 2023-05-03 21:52:15 - progress_bar.py[line:274] - INFO: epoch 010: 4893 / 6042 loss=2.373, loss_v1=0, loss_v2=0, nll_loss=1.125, ntokens=7620.6, nsentences=120, sample_size=3908.2, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1867.2, ups=0.25, wpb=7620.6, bsz=120, num_updates=59170, lr=6.60269e-07, gnorm=0.989, clip=60, loss_scale=64, train_wall=41, gb_free=30.2, wall=242347 2023-05-03 21:52:55 - progress_bar.py[line:274] - INFO: epoch 010: 4903 / 6042 loss=2.31, loss_v1=0, loss_v2=0, nll_loss=1.041, ntokens=7453.8, nsentences=120, sample_size=4126.5, sample_size_v1=0, sample_size_v2=0, ppl=2.06, wps=1878.2, ups=0.25, wpb=7453.8, bsz=120, num_updates=59180, lr=6.54987e-07, gnorm=0.983, clip=40, loss_scale=64, train_wall=40, gb_free=30.4, wall=242387 2023-05-03 21:53:34 - progress_bar.py[line:274] - INFO: epoch 010: 4913 / 6042 loss=2.316, loss_v1=0, loss_v2=0, nll_loss=1.056, ntokens=7505.4, nsentences=120, sample_size=3732.7, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1899.6, ups=0.25, wpb=7505.4, bsz=120, num_updates=59190, lr=6.49705e-07, gnorm=1.007, clip=60, loss_scale=64, train_wall=39, gb_free=29.2, wall=242427 2023-05-03 21:54:13 - progress_bar.py[line:274] - INFO: epoch 010: 4923 / 6042 loss=2.301, loss_v1=0, loss_v2=0, nll_loss=1.042, ntokens=7552.9, nsentences=120, sample_size=4104.5, sample_size_v1=0, sample_size_v2=0, ppl=2.06, wps=1923.3, ups=0.25, wpb=7552.9, bsz=120, num_updates=59200, lr=6.44423e-07, gnorm=0.971, clip=30, loss_scale=64, train_wall=39, gb_free=29.1, wall=242466 2023-05-03 21:54:53 - progress_bar.py[line:274] - INFO: epoch 010: 4933 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7925.3, nsentences=120, sample_size=3839.9, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1991.2, ups=0.25, wpb=7925.3, bsz=120, 
num_updates=59210, lr=6.39141e-07, gnorm=1.013, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=242506 2023-05-03 21:55:33 - progress_bar.py[line:274] - INFO: epoch 010: 4943 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7794, nsentences=120, sample_size=3854, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1963.6, ups=0.25, wpb=7794, bsz=120, num_updates=59220, lr=6.33859e-07, gnorm=1.003, clip=50, loss_scale=64, train_wall=40, gb_free=29.6, wall=242545 2023-05-03 21:56:13 - progress_bar.py[line:274] - INFO: epoch 010: 4953 / 6042 loss=2.322, loss_v1=0, loss_v2=0, nll_loss=1.057, ntokens=7796.2, nsentences=120, sample_size=4094.9, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1929.9, ups=0.25, wpb=7796.2, bsz=120, num_updates=59230, lr=6.28576e-07, gnorm=0.959, clip=20, loss_scale=64, train_wall=40, gb_free=29.8, wall=242586 2023-05-03 21:56:53 - progress_bar.py[line:274] - INFO: epoch 010: 4963 / 6042 loss=2.327, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7697.4, nsentences=120, sample_size=3818.2, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1927.7, ups=0.25, wpb=7697.4, bsz=120, num_updates=59240, lr=6.23294e-07, gnorm=0.985, clip=50, loss_scale=64, train_wall=40, gb_free=30.1, wall=242626 2023-05-03 21:57:33 - progress_bar.py[line:274] - INFO: epoch 010: 4973 / 6042 loss=2.331, loss_v1=0, loss_v2=0, nll_loss=1.065, ntokens=7798.1, nsentences=120, sample_size=4084.2, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1937.7, ups=0.25, wpb=7798.1, bsz=120, num_updates=59250, lr=6.18012e-07, gnorm=0.991, clip=40, loss_scale=64, train_wall=40, gb_free=29.4, wall=242666 2023-05-03 21:58:13 - progress_bar.py[line:274] - INFO: epoch 010: 4983 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7642.8, nsentences=120, sample_size=3767.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1939.6, ups=0.25, wpb=7642.8, bsz=120, num_updates=59260, lr=6.1273e-07, gnorm=1.017, clip=70, loss_scale=64, train_wall=39, 
gb_free=29.4, wall=242705 2023-05-03 21:58:53 - progress_bar.py[line:274] - INFO: epoch 010: 4993 / 6042 loss=2.323, loss_v1=0, loss_v2=0, nll_loss=1.065, ntokens=7896.9, nsentences=120, sample_size=3873.3, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1961.8, ups=0.25, wpb=7896.9, bsz=120, num_updates=59270, lr=6.07448e-07, gnorm=1.016, clip=60, loss_scale=64, train_wall=40, gb_free=30.6, wall=242746 2023-05-03 21:59:33 - progress_bar.py[line:274] - INFO: epoch 010: 5003 / 6042 loss=2.324, loss_v1=0, loss_v2=0, nll_loss=1.063, ntokens=7676.5, nsentences=120, sample_size=3547.7, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1927.5, ups=0.25, wpb=7676.5, bsz=120, num_updates=59280, lr=6.02166e-07, gnorm=1.02, clip=60, loss_scale=64, train_wall=40, gb_free=29.6, wall=242785 2023-05-03 22:00:13 - progress_bar.py[line:274] - INFO: epoch 010: 5013 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.135, ntokens=7631.3, nsentences=120, sample_size=4047.2, sample_size_v1=0, sample_size_v2=0, ppl=2.2, wps=1910.3, ups=0.25, wpb=7631.3, bsz=120, num_updates=59290, lr=5.96884e-07, gnorm=0.971, clip=40, loss_scale=64, train_wall=40, gb_free=31.6, wall=242825 2023-05-03 22:00:52 - progress_bar.py[line:274] - INFO: epoch 010: 5023 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.091, ntokens=7798.5, nsentences=120, sample_size=4018.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1985.5, ups=0.25, wpb=7798.5, bsz=120, num_updates=59300, lr=5.91601e-07, gnorm=0.971, clip=30, loss_scale=128, train_wall=39, gb_free=29.5, wall=242865 2023-05-03 22:01:32 - progress_bar.py[line:274] - INFO: epoch 010: 5033 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.113, ntokens=7993.1, nsentences=120, sample_size=3736.3, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2010.5, ups=0.25, wpb=7993.1, bsz=120, num_updates=59310, lr=5.86319e-07, gnorm=1.023, clip=60, loss_scale=128, train_wall=40, gb_free=30.8, wall=242904 2023-05-03 22:01:56 - trainer.py[line:922] - INFO: NOTE: 
gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 22:02:15 - progress_bar.py[line:274] - INFO: epoch 010: 5044 / 6042 loss=2.313, loss_v1=0, loss_v2=0, nll_loss=1.051, ntokens=7885.6, nsentences=120, sample_size=4209.8, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1811.7, ups=0.23, wpb=7885.6, bsz=120, num_updates=59320, lr=5.81037e-07, gnorm=0.96, clip=30, loss_scale=64, train_wall=43, gb_free=30.2, wall=242948 2023-05-03 22:02:55 - progress_bar.py[line:274] - INFO: epoch 010: 5054 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7653.6, nsentences=120, sample_size=4056.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1952.7, ups=0.26, wpb=7653.6, bsz=120, num_updates=59330, lr=5.75755e-07, gnorm=0.984, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=242987 2023-05-03 22:03:34 - progress_bar.py[line:274] - INFO: epoch 010: 5064 / 6042 loss=2.324, loss_v1=0, loss_v2=0, nll_loss=1.057, ntokens=7459.5, nsentences=120, sample_size=4103.6, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1872.4, ups=0.25, wpb=7459.5, bsz=120, num_updates=59340, lr=5.70473e-07, gnorm=0.995, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=243027 2023-05-03 22:04:14 - progress_bar.py[line:274] - INFO: epoch 010: 5074 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7542.8, nsentences=120, sample_size=4211.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1893.8, ups=0.25, wpb=7542.8, bsz=120, num_updates=59350, lr=5.65191e-07, gnorm=0.964, clip=20, loss_scale=64, train_wall=40, gb_free=30.1, wall=243067 2023-05-03 22:04:54 - progress_bar.py[line:274] - INFO: epoch 010: 5084 / 6042 loss=2.327, loss_v1=0, loss_v2=0, nll_loss=1.063, ntokens=7808.1, nsentences=120, sample_size=3922.5, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1972.4, ups=0.25, wpb=7808.1, bsz=120, num_updates=59360, lr=5.59908e-07, gnorm=0.98, clip=30, loss_scale=64, train_wall=40, gb_free=28.7, wall=243106 2023-05-03 
22:05:33 - progress_bar.py[line:274] - INFO: epoch 010: 5094 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7545.8, nsentences=120, sample_size=4204.4, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1920.1, ups=0.25, wpb=7545.8, bsz=120, num_updates=59370, lr=5.54626e-07, gnorm=0.967, clip=20, loss_scale=64, train_wall=39, gb_free=30, wall=243146 2023-05-03 22:06:14 - progress_bar.py[line:274] - INFO: epoch 010: 5104 / 6042 loss=2.374, loss_v1=0, loss_v2=0, nll_loss=1.117, ntokens=7819, nsentences=120, sample_size=4257, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1934.5, ups=0.25, wpb=7819, bsz=120, num_updates=59380, lr=5.49344e-07, gnorm=0.96, clip=40, loss_scale=64, train_wall=40, gb_free=30.1, wall=243186 2023-05-03 22:06:54 - progress_bar.py[line:274] - INFO: epoch 010: 5114 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.078, ntokens=7853.2, nsentences=120, sample_size=3772.5, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1941, ups=0.25, wpb=7853.2, bsz=120, num_updates=59390, lr=5.44062e-07, gnorm=0.982, clip=50, loss_scale=64, train_wall=40, gb_free=26.1, wall=243226 2023-05-03 22:07:34 - progress_bar.py[line:274] - INFO: epoch 010: 5124 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.073, ntokens=7740.3, nsentences=120, sample_size=4065.3, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1932.9, ups=0.25, wpb=7740.3, bsz=120, num_updates=59400, lr=5.3878e-07, gnorm=0.986, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=243267 2023-05-03 22:08:14 - progress_bar.py[line:274] - INFO: epoch 010: 5134 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7339.3, nsentences=120, sample_size=3942.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1852.3, ups=0.25, wpb=7339.3, bsz=120, num_updates=59410, lr=5.33498e-07, gnorm=0.998, clip=50, loss_scale=64, train_wall=40, gb_free=29.5, wall=243306 2023-05-03 22:08:54 - progress_bar.py[line:274] - INFO: epoch 010: 5144 / 6042 loss=2.312, loss_v1=0, 
loss_v2=0, nll_loss=1.049, ntokens=7792.7, nsentences=120, sample_size=4069.8, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1956.5, ups=0.25, wpb=7792.7, bsz=120, num_updates=59420, lr=5.28216e-07, gnorm=0.97, clip=20, loss_scale=64, train_wall=40, gb_free=29.9, wall=243346 2023-05-03 22:09:32 - progress_bar.py[line:274] - INFO: epoch 010: 5154 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7600.1, nsentences=120, sample_size=4102, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1966, ups=0.26, wpb=7600.1, bsz=120, num_updates=59430, lr=5.22933e-07, gnorm=0.971, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=243385 2023-05-03 22:10:12 - progress_bar.py[line:274] - INFO: epoch 010: 5164 / 6042 loss=2.381, loss_v1=0, loss_v2=0, nll_loss=1.123, ntokens=7664.1, nsentences=120, sample_size=4176.1, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1937, ups=0.25, wpb=7664.1, bsz=120, num_updates=59440, lr=5.17651e-07, gnorm=0.967, clip=40, loss_scale=64, train_wall=39, gb_free=30, wall=243424 2023-05-03 22:10:51 - progress_bar.py[line:274] - INFO: epoch 010: 5174 / 6042 loss=2.38, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7532.3, nsentences=120, sample_size=3908, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1895.7, ups=0.25, wpb=7532.3, bsz=120, num_updates=59450, lr=5.12369e-07, gnorm=1.006, clip=50, loss_scale=64, train_wall=40, gb_free=29.4, wall=243464 2023-05-03 22:11:31 - progress_bar.py[line:274] - INFO: epoch 010: 5184 / 6042 loss=2.362, loss_v1=0, loss_v2=0, nll_loss=1.104, ntokens=8032.4, nsentences=120, sample_size=4082.6, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=2015.6, ups=0.25, wpb=8032.4, bsz=120, num_updates=59460, lr=5.07087e-07, gnorm=0.956, clip=20, loss_scale=64, train_wall=40, gb_free=27.8, wall=243504 2023-05-03 22:12:12 - progress_bar.py[line:274] - INFO: epoch 010: 5194 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7960.5, nsentences=120, sample_size=3867.6, sample_size_v1=0, 
sample_size_v2=0, ppl=2.16, wps=1977.7, ups=0.25, wpb=7960.5, bsz=120, num_updates=59470, lr=5.01805e-07, gnorm=0.989, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=243544 2023-05-03 22:12:52 - progress_bar.py[line:274] - INFO: epoch 010: 5204 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7696.2, nsentences=120, sample_size=3904.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1927.2, ups=0.25, wpb=7696.2, bsz=120, num_updates=59480, lr=4.96523e-07, gnorm=1.007, clip=50, loss_scale=64, train_wall=40, gb_free=31.2, wall=243584 2023-05-03 22:13:31 - progress_bar.py[line:274] - INFO: epoch 010: 5214 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7557.1, nsentences=120, sample_size=4174.8, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1899.8, ups=0.25, wpb=7557.1, bsz=120, num_updates=59490, lr=4.9124e-07, gnorm=0.98, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=243624 2023-05-03 22:14:10 - progress_bar.py[line:274] - INFO: epoch 010: 5224 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7830.5, nsentences=120, sample_size=4085.9, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=2010.8, ups=0.26, wpb=7830.5, bsz=120, num_updates=59500, lr=4.85958e-07, gnorm=0.954, clip=20, loss_scale=64, train_wall=39, gb_free=29.9, wall=243663 2023-05-03 22:14:50 - progress_bar.py[line:274] - INFO: epoch 010: 5234 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7704.2, nsentences=120, sample_size=3771.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1943.3, ups=0.25, wpb=7704.2, bsz=120, num_updates=59510, lr=4.80676e-07, gnorm=0.998, clip=50, loss_scale=64, train_wall=40, gb_free=30.4, wall=243702 2023-05-03 22:15:30 - progress_bar.py[line:274] - INFO: epoch 010: 5244 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7909.9, nsentences=120, sample_size=4024.6, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1958.7, ups=0.25, wpb=7909.9, bsz=120, num_updates=59520, 
lr=4.75394e-07, gnorm=0.984, clip=40, loss_scale=64, train_wall=40, gb_free=28.4, wall=243743 2023-05-03 22:16:11 - progress_bar.py[line:274] - INFO: epoch 010: 5254 / 6042 loss=2.339, loss_v1=0, loss_v2=0, nll_loss=1.07, ntokens=7774.2, nsentences=120, sample_size=4216.3, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1894.8, ups=0.24, wpb=7774.2, bsz=120, num_updates=59530, lr=4.70112e-07, gnorm=0.979, clip=40, loss_scale=64, train_wall=41, gb_free=30.5, wall=243784 2023-05-03 22:16:51 - progress_bar.py[line:274] - INFO: epoch 010: 5264 / 6042 loss=2.323, loss_v1=0, loss_v2=0, nll_loss=1.056, ntokens=7609.2, nsentences=120, sample_size=4002.9, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1894.7, ups=0.25, wpb=7609.2, bsz=120, num_updates=59540, lr=4.6483e-07, gnorm=0.981, clip=20, loss_scale=64, train_wall=40, gb_free=30.6, wall=243824 2023-05-03 22:17:31 - progress_bar.py[line:274] - INFO: epoch 010: 5274 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7579.1, nsentences=120, sample_size=4032.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1903, ups=0.25, wpb=7579.1, bsz=120, num_updates=59550, lr=4.59547e-07, gnorm=0.99, clip=50, loss_scale=64, train_wall=40, gb_free=30, wall=243864 2023-05-03 22:18:11 - progress_bar.py[line:274] - INFO: epoch 010: 5284 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7684.3, nsentences=120, sample_size=4494.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1960.2, ups=0.26, wpb=7684.3, bsz=120, num_updates=59560, lr=4.54265e-07, gnorm=0.934, clip=0, loss_scale=64, train_wall=39, gb_free=30.2, wall=243903 2023-05-03 22:18:51 - progress_bar.py[line:274] - INFO: epoch 010: 5294 / 6042 loss=2.382, loss_v1=0, loss_v2=0, nll_loss=1.128, ntokens=7672.4, nsentences=120, sample_size=4121.9, sample_size_v1=0, sample_size_v2=0, ppl=2.18, wps=1900.3, ups=0.25, wpb=7672.4, bsz=120, num_updates=59570, lr=4.48983e-07, gnorm=0.947, clip=20, loss_scale=64, train_wall=40, gb_free=30.7, wall=243943 
2023-05-03 22:19:31 - progress_bar.py[line:274] - INFO: epoch 010: 5304 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7556.8, nsentences=120, sample_size=4064.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1902, ups=0.25, wpb=7556.8, bsz=120, num_updates=59580, lr=4.43701e-07, gnorm=0.998, clip=50, loss_scale=64, train_wall=40, gb_free=28.6, wall=243983 2023-05-03 22:20:11 - progress_bar.py[line:274] - INFO: epoch 010: 5314 / 6042 loss=2.386, loss_v1=0, loss_v2=0, nll_loss=1.142, ntokens=7833.3, nsentences=120, sample_size=4124.1, sample_size_v1=0, sample_size_v2=0, ppl=2.21, wps=1942.5, ups=0.25, wpb=7833.3, bsz=120, num_updates=59590, lr=4.38419e-07, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=28.5, wall=244023 2023-05-03 22:20:50 - progress_bar.py[line:274] - INFO: epoch 010: 5324 / 6042 loss=2.315, loss_v1=0, loss_v2=0, nll_loss=1.05, ntokens=7717.7, nsentences=120, sample_size=4213.5, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1964.3, ups=0.25, wpb=7717.7, bsz=120, num_updates=59600, lr=4.33137e-07, gnorm=0.961, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=244063 2023-05-03 22:21:30 - progress_bar.py[line:274] - INFO: epoch 010: 5334 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7820.9, nsentences=120, sample_size=4022.5, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1957.2, ups=0.25, wpb=7820.9, bsz=120, num_updates=59610, lr=4.27855e-07, gnorm=0.968, clip=40, loss_scale=64, train_wall=40, gb_free=30.2, wall=244103 2023-05-03 22:22:10 - progress_bar.py[line:274] - INFO: epoch 010: 5344 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.084, ntokens=7733.8, nsentences=120, sample_size=4123, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1945.8, ups=0.25, wpb=7733.8, bsz=120, num_updates=59620, lr=4.22572e-07, gnorm=0.979, clip=50, loss_scale=64, train_wall=40, gb_free=29, wall=244142 2023-05-03 22:22:50 - progress_bar.py[line:274] - INFO: epoch 010: 5354 / 6042 loss=2.34, 
loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7439.7, nsentences=120, sample_size=4127.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1848.7, ups=0.25, wpb=7439.7, bsz=120, num_updates=59630, lr=4.1729e-07, gnorm=0.997, clip=40, loss_scale=64, train_wall=40, gb_free=29.9, wall=244183 2023-05-03 22:23:29 - progress_bar.py[line:274] - INFO: epoch 010: 5364 / 6042 loss=2.34, loss_v1=0, loss_v2=0, nll_loss=1.086, ntokens=7652.6, nsentences=120, sample_size=3794.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1959.2, ups=0.26, wpb=7652.6, bsz=120, num_updates=59640, lr=4.12008e-07, gnorm=1.027, clip=60, loss_scale=64, train_wall=39, gb_free=26.2, wall=244222 2023-05-03 22:24:09 - progress_bar.py[line:274] - INFO: epoch 010: 5374 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7605.3, nsentences=120, sample_size=3752.2, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1903.9, ups=0.25, wpb=7605.3, bsz=120, num_updates=59650, lr=4.06726e-07, gnorm=1.016, clip=50, loss_scale=64, train_wall=40, gb_free=29.3, wall=244262 2023-05-03 22:24:51 - progress_bar.py[line:274] - INFO: epoch 010: 5384 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.105, ntokens=7782.5, nsentences=120, sample_size=4094.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1849.4, ups=0.24, wpb=7782.5, bsz=120, num_updates=59660, lr=4.01444e-07, gnorm=0.951, clip=10, loss_scale=64, train_wall=42, gb_free=30.6, wall=244304 2023-05-03 22:25:30 - progress_bar.py[line:274] - INFO: epoch 010: 5394 / 6042 loss=2.323, loss_v1=0, loss_v2=0, nll_loss=1.061, ntokens=7510.4, nsentences=120, sample_size=4254.6, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1949.8, ups=0.26, wpb=7510.4, bsz=120, num_updates=59670, lr=3.96162e-07, gnorm=0.965, clip=20, loss_scale=64, train_wall=38, gb_free=31.5, wall=244342 2023-05-03 22:26:09 - progress_bar.py[line:274] - INFO: epoch 010: 5404 / 6042 loss=2.323, loss_v1=0, loss_v2=0, nll_loss=1.053, ntokens=7366.4, nsentences=120, 
sample_size=4269.1, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1863.1, ups=0.25, wpb=7366.4, bsz=120, num_updates=59680, lr=3.90879e-07, gnorm=0.967, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=244382 2023-05-03 22:26:48 - progress_bar.py[line:274] - INFO: epoch 010: 5414 / 6042 loss=2.327, loss_v1=0, loss_v2=0, nll_loss=1.063, ntokens=7525.1, nsentences=120, sample_size=4057.3, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1949.2, ups=0.26, wpb=7525.1, bsz=120, num_updates=59690, lr=3.85597e-07, gnorm=0.982, clip=30, loss_scale=64, train_wall=39, gb_free=30.3, wall=244420 2023-05-03 22:27:27 - progress_bar.py[line:274] - INFO: epoch 010: 5424 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7386.9, nsentences=120, sample_size=3808.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1869.4, ups=0.25, wpb=7386.9, bsz=120, num_updates=59700, lr=3.80315e-07, gnorm=1.009, clip=40, loss_scale=64, train_wall=39, gb_free=29.1, wall=244460 2023-05-03 22:28:07 - progress_bar.py[line:274] - INFO: epoch 010: 5434 / 6042 loss=2.355, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7622.1, nsentences=120, sample_size=4253, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1905.1, ups=0.25, wpb=7622.1, bsz=120, num_updates=59710, lr=3.75033e-07, gnorm=0.957, clip=40, loss_scale=64, train_wall=40, gb_free=31, wall=244500 2023-05-03 22:28:47 - progress_bar.py[line:274] - INFO: epoch 010: 5444 / 6042 loss=2.37, loss_v1=0, loss_v2=0, nll_loss=1.13, ntokens=7663.3, nsentences=120, sample_size=4156.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1949.4, ups=0.25, wpb=7663.3, bsz=120, num_updates=59720, lr=3.69751e-07, gnorm=0.973, clip=40, loss_scale=64, train_wall=39, gb_free=28.4, wall=244539 2023-05-03 22:29:28 - progress_bar.py[line:274] - INFO: epoch 010: 5454 / 6042 loss=2.349, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7746.9, nsentences=120, sample_size=4074.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1881.9, ups=0.24, 
wpb=7746.9, bsz=120, num_updates=59730, lr=3.64469e-07, gnorm=0.979, clip=40, loss_scale=64, train_wall=41, gb_free=29.6, wall=244580 2023-05-03 22:30:07 - progress_bar.py[line:274] - INFO: epoch 010: 5464 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=7981.9, nsentences=120, sample_size=3817.6, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=2021, ups=0.25, wpb=7981.9, bsz=120, num_updates=59740, lr=3.59187e-07, gnorm=0.994, clip=40, loss_scale=64, train_wall=39, gb_free=29.5, wall=244620 2023-05-03 22:30:47 - progress_bar.py[line:274] - INFO: epoch 010: 5474 / 6042 loss=2.342, loss_v1=0, loss_v2=0, nll_loss=1.076, ntokens=7699.7, nsentences=120, sample_size=4117.2, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1925, ups=0.25, wpb=7699.7, bsz=120, num_updates=59750, lr=3.53904e-07, gnorm=0.993, clip=50, loss_scale=64, train_wall=40, gb_free=31, wall=244660 2023-05-03 22:31:27 - progress_bar.py[line:274] - INFO: epoch 010: 5484 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.109, ntokens=7949.6, nsentences=120, sample_size=4159.9, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=2031.9, ups=0.26, wpb=7949.6, bsz=120, num_updates=59760, lr=3.48622e-07, gnorm=0.98, clip=50, loss_scale=64, train_wall=39, gb_free=30.9, wall=244699 2023-05-03 22:32:06 - progress_bar.py[line:274] - INFO: epoch 010: 5494 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7426.1, nsentences=120, sample_size=4024.6, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1890.6, ups=0.25, wpb=7426.1, bsz=120, num_updates=59770, lr=3.4334e-07, gnorm=0.979, clip=40, loss_scale=64, train_wall=39, gb_free=24.4, wall=244738 2023-05-03 22:32:45 - progress_bar.py[line:274] - INFO: epoch 010: 5504 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.072, ntokens=7785, nsentences=120, sample_size=3656.7, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1968, ups=0.25, wpb=7785, bsz=120, num_updates=59780, lr=3.38058e-07, gnorm=1.053, clip=90, loss_scale=64, 
train_wall=39, gb_free=30.4, wall=244778 2023-05-03 22:33:25 - progress_bar.py[line:274] - INFO: epoch 010: 5514 / 6042 loss=2.324, loss_v1=0, loss_v2=0, nll_loss=1.063, ntokens=7775.6, nsentences=120, sample_size=3784.2, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1975.7, ups=0.25, wpb=7775.6, bsz=120, num_updates=59790, lr=3.32776e-07, gnorm=1.018, clip=60, loss_scale=64, train_wall=39, gb_free=29.9, wall=244817 2023-05-03 22:34:05 - progress_bar.py[line:274] - INFO: epoch 010: 5524 / 6042 loss=2.318, loss_v1=0, loss_v2=0, nll_loss=1.058, ntokens=7524.5, nsentences=120, sample_size=4021.6, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1869.4, ups=0.25, wpb=7524.5, bsz=120, num_updates=59800, lr=3.27494e-07, gnorm=0.962, clip=40, loss_scale=64, train_wall=40, gb_free=30, wall=244857 2023-05-03 22:34:45 - progress_bar.py[line:274] - INFO: epoch 010: 5534 / 6042 loss=2.359, loss_v1=0, loss_v2=0, nll_loss=1.094, ntokens=7514.9, nsentences=120, sample_size=4299.4, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1868.6, ups=0.25, wpb=7514.9, bsz=120, num_updates=59810, lr=3.22211e-07, gnorm=0.968, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, wall=244898 2023-05-03 22:35:25 - progress_bar.py[line:274] - INFO: epoch 010: 5544 / 6042 loss=2.329, loss_v1=0, loss_v2=0, nll_loss=1.067, ntokens=7706, nsentences=120, sample_size=3856.7, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1936.3, ups=0.25, wpb=7706, bsz=120, num_updates=59820, lr=3.16929e-07, gnorm=0.998, clip=60, loss_scale=64, train_wall=40, gb_free=30.1, wall=244938 2023-05-03 22:36:05 - progress_bar.py[line:274] - INFO: epoch 010: 5554 / 6042 loss=2.327, loss_v1=0, loss_v2=0, nll_loss=1.064, ntokens=7714.7, nsentences=120, sample_size=4049.8, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1954.8, ups=0.25, wpb=7714.7, bsz=120, num_updates=59830, lr=3.11647e-07, gnorm=0.996, clip=50, loss_scale=128, train_wall=39, gb_free=29.7, wall=244977 2023-05-03 22:36:25 - trainer.py[line:922] - 
INFO: NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0 2023-05-03 22:36:49 - progress_bar.py[line:274] - INFO: epoch 010: 5565 / 6042 loss=2.361, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7886.2, nsentences=120, sample_size=4081, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1773.4, ups=0.22, wpb=7886.2, bsz=120, num_updates=59840, lr=3.06365e-07, gnorm=0.977, clip=30, loss_scale=64, train_wall=44, gb_free=30.1, wall=245021 2023-05-03 22:37:28 - progress_bar.py[line:274] - INFO: epoch 010: 5575 / 6042 loss=2.343, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7567.3, nsentences=120, sample_size=4086.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1915.1, ups=0.25, wpb=7567.3, bsz=120, num_updates=59850, lr=3.01083e-07, gnorm=0.989, clip=30, loss_scale=64, train_wall=39, gb_free=30.2, wall=245061 2023-05-03 22:38:08 - progress_bar.py[line:274] - INFO: epoch 010: 5585 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7583.5, nsentences=120, sample_size=4216.6, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1898.1, ups=0.25, wpb=7583.5, bsz=120, num_updates=59860, lr=2.95801e-07, gnorm=0.949, clip=10, loss_scale=64, train_wall=40, gb_free=29.8, wall=245101 2023-05-03 22:38:49 - progress_bar.py[line:274] - INFO: epoch 010: 5595 / 6042 loss=2.368, loss_v1=0, loss_v2=0, nll_loss=1.111, ntokens=7912, nsentences=120, sample_size=4103.6, sample_size_v1=0, sample_size_v2=0, ppl=2.16, wps=1952.4, ups=0.25, wpb=7912, bsz=120, num_updates=59870, lr=2.90519e-07, gnorm=0.959, clip=30, loss_scale=64, train_wall=40, gb_free=28.3, wall=245141 2023-05-03 22:39:29 - progress_bar.py[line:274] - INFO: epoch 010: 5605 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7734.1, nsentences=120, sample_size=3877.3, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1938.3, ups=0.25, wpb=7734.1, bsz=120, num_updates=59880, lr=2.85236e-07, gnorm=0.994, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=245181 
2023-05-03 22:40:09 - progress_bar.py[line:274] - INFO: epoch 010: 5615 / 6042 loss=2.379, loss_v1=0, loss_v2=0, nll_loss=1.121, ntokens=7763, nsentences=120, sample_size=3861.9, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1938.1, ups=0.25, wpb=7763, bsz=120, num_updates=59890, lr=2.79954e-07, gnorm=1.018, clip=50, loss_scale=64, train_wall=40, gb_free=29.1, wall=245221 2023-05-03 22:40:49 - progress_bar.py[line:274] - INFO: epoch 010: 5625 / 6042 loss=2.378, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7756.5, nsentences=120, sample_size=4034, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1940.2, ups=0.25, wpb=7756.5, bsz=120, num_updates=59900, lr=2.74672e-07, gnorm=0.947, clip=20, loss_scale=64, train_wall=40, gb_free=29.3, wall=245261 2023-05-03 22:41:29 - progress_bar.py[line:274] - INFO: epoch 010: 5635 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.074, ntokens=7735.8, nsentences=120, sample_size=4076.1, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1951.6, ups=0.25, wpb=7735.8, bsz=120, num_updates=59910, lr=2.6939e-07, gnorm=0.999, clip=50, loss_scale=64, train_wall=40, gb_free=30.6, wall=245301 2023-05-03 22:42:08 - progress_bar.py[line:274] - INFO: epoch 010: 5645 / 6042 loss=2.363, loss_v1=0, loss_v2=0, nll_loss=1.101, ntokens=7576.5, nsentences=120, sample_size=4017.2, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1901.5, ups=0.25, wpb=7576.5, bsz=120, num_updates=59920, lr=2.64108e-07, gnorm=1, clip=40, loss_scale=64, train_wall=40, gb_free=30.5, wall=245341 2023-05-03 22:42:48 - progress_bar.py[line:274] - INFO: epoch 010: 5655 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7541.2, nsentences=120, sample_size=3953.5, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1903.4, ups=0.25, wpb=7541.2, bsz=120, num_updates=59930, lr=2.58826e-07, gnorm=1.001, clip=60, loss_scale=64, train_wall=40, gb_free=30.2, wall=245380 2023-05-03 22:43:29 - progress_bar.py[line:274] - INFO: epoch 010: 5665 / 6042 loss=2.387, 
loss_v1=0, loss_v2=0, nll_loss=1.129, ntokens=8019, nsentences=120, sample_size=4110.7, sample_size_v1=0, sample_size_v2=0, ppl=2.19, wps=1934.7, ups=0.24, wpb=8019, bsz=120, num_updates=59940, lr=2.53543e-07, gnorm=0.974, clip=50, loss_scale=64, train_wall=41, gb_free=29.8, wall=245422 2023-05-03 22:44:09 - progress_bar.py[line:274] - INFO: epoch 010: 5675 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.093, ntokens=7579.9, nsentences=120, sample_size=3906.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1932.4, ups=0.25, wpb=7579.9, bsz=120, num_updates=59950, lr=2.48261e-07, gnorm=0.984, clip=40, loss_scale=64, train_wall=39, gb_free=31.2, wall=245461 2023-05-03 22:44:48 - progress_bar.py[line:274] - INFO: epoch 010: 5685 / 6042 loss=2.319, loss_v1=0, loss_v2=0, nll_loss=1.054, ntokens=7583.1, nsentences=120, sample_size=3965.2, sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1909.4, ups=0.25, wpb=7583.1, bsz=120, num_updates=59960, lr=2.42979e-07, gnorm=0.975, clip=30, loss_scale=64, train_wall=40, gb_free=29.8, wall=245501 2023-05-03 22:45:28 - progress_bar.py[line:274] - INFO: epoch 010: 5695 / 6042 loss=2.345, loss_v1=0, loss_v2=0, nll_loss=1.08, ntokens=7710.3, nsentences=120, sample_size=3994, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1936.4, ups=0.25, wpb=7710.3, bsz=120, num_updates=59970, lr=2.37697e-07, gnorm=1.02, clip=70, loss_scale=64, train_wall=40, gb_free=31, wall=245541 2023-05-03 22:46:08 - progress_bar.py[line:274] - INFO: epoch 010: 5705 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.068, ntokens=7471.2, nsentences=120, sample_size=4211.2, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1870.9, ups=0.25, wpb=7471.2, bsz=120, num_updates=59980, lr=2.32415e-07, gnorm=0.961, clip=30, loss_scale=64, train_wall=40, gb_free=28.4, wall=245581 2023-05-03 22:46:47 - progress_bar.py[line:274] - INFO: epoch 010: 5715 / 6042 loss=2.323, loss_v1=0, loss_v2=0, nll_loss=1.058, ntokens=7609.2, nsentences=120, sample_size=4069.3, 
sample_size_v1=0, sample_size_v2=0, ppl=2.08, wps=1935.1, ups=0.25, wpb=7609.2, bsz=120, num_updates=59990, lr=2.27133e-07, gnorm=1.017, clip=60, loss_scale=64, train_wall=39, gb_free=29.2, wall=245620 2023-05-03 22:47:27 - progress_bar.py[line:274] - INFO: epoch 010: 5725 / 6042 loss=2.334, loss_v1=0, loss_v2=0, nll_loss=1.072, ntokens=7642.5, nsentences=120, sample_size=4333.6, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1913.9, ups=0.25, wpb=7642.5, bsz=120, num_updates=60000, lr=2.21851e-07, gnorm=0.948, clip=20, loss_scale=64, train_wall=40, gb_free=30.9, wall=245660 2023-05-03 22:47:27 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 22:47:30 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 22:47:30 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 22:47:31 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:31 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:33 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:33 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:38 - task_invig.py[line:181] - 
INFO: example hyp(gen): region: 2023-05-03 22:47:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:42 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:42 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:46 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:46 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:47 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 22:47:47 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 22:47:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:52 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:52 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:53 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:53 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:54 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:54 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:55 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:55 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:56 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:56 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:57 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:47:57 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:47:59 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 22:47:59 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 22:48:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:00 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:00 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:01 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:01 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:02 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:02 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:10 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 22:48:10 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 22:48:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:14 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 22:48:14 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 22:48:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:19 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 22:48:19 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 22:48:19 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 22:48:19 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 22:48:19 - progress_bar.py[line:282] - INFO: epoch 010 | valid on 'valid' subset | loss 3.267 | loss_v1 0 | loss_v2 0 | nll_loss 2.102 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.29 | score 0.7583 | wps 3283.4 | wpb 3202.1 | bsz 39.4 | num_updates 60000 | best_score 0.7627 2023-05-03 22:48:19 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 10 @ 60000 updates 2023-05-03 22:48:19 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_60000.pt 2023-05-03 22:48:44 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_60000.pt 2023-05-03 22:48:58 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint_10_60000.pt (epoch 10 @ 60000 updates, score 0.7583) (writing took 38.21468228800222 seconds) 2023-05-03 22:49:37 - progress_bar.py[line:274] - INFO: epoch 010: 5735 / 6042 loss=2.402, loss_v1=0, loss_v2=0, nll_loss=1.148, ntokens=7974.2, nsentences=120, sample_size=3854.8, sample_size_v1=0, sample_size_v2=0, ppl=2.22, wps=615.6, ups=0.08, wpb=7974.2, bsz=120, num_updates=60010, lr=2.16568e-07, gnorm=0.991, clip=30, loss_scale=64, train_wall=39, gb_free=30.9, wall=245789 2023-05-03 22:50:17 - progress_bar.py[line:274] - INFO: epoch 010: 5745 / 6042 loss=2.337, loss_v1=0, loss_v2=0, nll_loss=1.072, ntokens=7670.3, nsentences=120, sample_size=4053.2, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1919, ups=0.25, wpb=7670.3, bsz=120, num_updates=60020, lr=2.11286e-07, gnorm=0.992, clip=50, loss_scale=64, train_wall=40, 
gb_free=28.8, wall=245829 2023-05-03 22:50:57 - progress_bar.py[line:274] - INFO: epoch 010: 5755 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.115, ntokens=7691.3, nsentences=120, sample_size=3843.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=1943.5, ups=0.25, wpb=7691.3, bsz=120, num_updates=60030, lr=2.06004e-07, gnorm=1.003, clip=60, loss_scale=64, train_wall=39, gb_free=26.4, wall=245869 2023-05-03 22:51:36 - progress_bar.py[line:274] - INFO: epoch 010: 5765 / 6042 loss=2.377, loss_v1=0, loss_v2=0, nll_loss=1.12, ntokens=8039.7, nsentences=120, sample_size=3938.8, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2031.5, ups=0.25, wpb=8039.7, bsz=120, num_updates=60040, lr=2.00722e-07, gnorm=0.969, clip=40, loss_scale=64, train_wall=39, gb_free=30.4, wall=245909 2023-05-03 22:52:16 - progress_bar.py[line:274] - INFO: epoch 010: 5775 / 6042 loss=2.308, loss_v1=0, loss_v2=0, nll_loss=1.042, ntokens=7552.8, nsentences=120, sample_size=4293, sample_size_v1=0, sample_size_v2=0, ppl=2.06, wps=1885.6, ups=0.25, wpb=7552.8, bsz=120, num_updates=60050, lr=1.9544e-07, gnorm=0.933, clip=20, loss_scale=64, train_wall=40, gb_free=30.2, wall=245949 2023-05-03 22:52:56 - progress_bar.py[line:274] - INFO: epoch 010: 5785 / 6042 loss=2.357, loss_v1=0, loss_v2=0, nll_loss=1.099, ntokens=7767.5, nsentences=120, sample_size=3747.7, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1961.4, ups=0.25, wpb=7767.5, bsz=120, num_updates=60060, lr=1.90158e-07, gnorm=1.018, clip=50, loss_scale=64, train_wall=40, gb_free=30.4, wall=245988 2023-05-03 22:53:36 - progress_bar.py[line:274] - INFO: epoch 010: 5795 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.065, ntokens=7643.9, nsentences=120, sample_size=3735.9, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1878.4, ups=0.25, wpb=7643.9, bsz=120, num_updates=60070, lr=1.84875e-07, gnorm=1.025, clip=60, loss_scale=64, train_wall=41, gb_free=30, wall=246029 2023-05-03 22:54:15 - progress_bar.py[line:274] - INFO: epoch 
010: 5805 / 6042 loss=2.364, loss_v1=0, loss_v2=0, nll_loss=1.1, ntokens=7556, nsentences=120, sample_size=4112.6, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1948.9, ups=0.26, wpb=7556, bsz=120, num_updates=60080, lr=1.79593e-07, gnorm=0.967, clip=20, loss_scale=64, train_wall=39, gb_free=30.1, wall=246068 2023-05-03 22:54:54 - progress_bar.py[line:274] - INFO: epoch 010: 5815 / 6042 loss=2.352, loss_v1=0, loss_v2=0, nll_loss=1.095, ntokens=7625.4, nsentences=120, sample_size=3900.8, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1987.9, ups=0.26, wpb=7625.4, bsz=120, num_updates=60090, lr=1.74311e-07, gnorm=1.039, clip=70, loss_scale=64, train_wall=38, gb_free=30.1, wall=246106 2023-05-03 22:55:34 - progress_bar.py[line:274] - INFO: epoch 010: 5825 / 6042 loss=2.347, loss_v1=0, loss_v2=0, nll_loss=1.089, ntokens=7702.5, nsentences=120, sample_size=4214.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1926, ups=0.25, wpb=7702.5, bsz=120, num_updates=60100, lr=1.69029e-07, gnorm=0.992, clip=40, loss_scale=64, train_wall=40, gb_free=30.9, wall=246146 2023-05-03 22:56:13 - progress_bar.py[line:274] - INFO: epoch 010: 5835 / 6042 loss=2.371, loss_v1=0, loss_v2=0, nll_loss=1.118, ntokens=7918.8, nsentences=120, sample_size=3925.4, sample_size_v1=0, sample_size_v2=0, ppl=2.17, wps=2022.5, ups=0.26, wpb=7918.8, bsz=120, num_updates=60110, lr=1.63747e-07, gnorm=1.013, clip=50, loss_scale=64, train_wall=39, gb_free=29.2, wall=246185 2023-05-03 22:56:52 - progress_bar.py[line:274] - INFO: epoch 010: 5845 / 6042 loss=2.329, loss_v1=0, loss_v2=0, nll_loss=1.071, ntokens=7674.8, nsentences=120, sample_size=4334.5, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1949.8, ups=0.25, wpb=7674.8, bsz=120, num_updates=60120, lr=1.58465e-07, gnorm=0.971, clip=30, loss_scale=64, train_wall=39, gb_free=30.1, wall=246225 2023-05-03 22:57:32 - progress_bar.py[line:274] - INFO: epoch 010: 5855 / 6042 loss=2.353, loss_v1=0, loss_v2=0, nll_loss=1.088, ntokens=7684.3, 
nsentences=120, sample_size=4013.2, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1905.8, ups=0.25, wpb=7684.3, bsz=120, num_updates=60130, lr=1.53182e-07, gnorm=0.994, clip=50, loss_scale=64, train_wall=40, gb_free=30.7, wall=246265 2023-05-03 22:58:13 - progress_bar.py[line:274] - INFO: epoch 010: 5865 / 6042 loss=2.341, loss_v1=0, loss_v2=0, nll_loss=1.083, ntokens=7505.9, nsentences=120, sample_size=4080.5, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1860.8, ups=0.25, wpb=7505.9, bsz=120, num_updates=60140, lr=1.479e-07, gnorm=0.986, clip=40, loss_scale=64, train_wall=40, gb_free=29.7, wall=246305 2023-05-03 22:58:52 - progress_bar.py[line:274] - INFO: epoch 010: 5875 / 6042 loss=2.332, loss_v1=0, loss_v2=0, nll_loss=1.069, ntokens=7865.2, nsentences=120, sample_size=4236.9, sample_size_v1=0, sample_size_v2=0, ppl=2.1, wps=1999.4, ups=0.25, wpb=7865.2, bsz=120, num_updates=60150, lr=1.42618e-07, gnorm=0.943, clip=30, loss_scale=64, train_wall=39, gb_free=30, wall=246345 2023-05-03 22:59:32 - progress_bar.py[line:274] - INFO: epoch 010: 5885 / 6042 loss=2.344, loss_v1=0, loss_v2=0, nll_loss=1.082, ntokens=7762, nsentences=120, sample_size=4316.1, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1935.6, ups=0.25, wpb=7762, bsz=120, num_updates=60160, lr=1.37336e-07, gnorm=0.962, clip=20, loss_scale=64, train_wall=40, gb_free=27.4, wall=246385 2023-05-03 23:00:12 - progress_bar.py[line:274] - INFO: epoch 010: 5895 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.087, ntokens=7815.9, nsentences=120, sample_size=4021.4, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1950.9, ups=0.25, wpb=7815.9, bsz=120, num_updates=60170, lr=1.32054e-07, gnorm=0.977, clip=30, loss_scale=64, train_wall=40, gb_free=30.5, wall=246425 2023-05-03 23:00:53 - progress_bar.py[line:274] - INFO: epoch 010: 5905 / 6042 loss=2.356, loss_v1=0, loss_v2=0, nll_loss=1.096, ntokens=7457.1, nsentences=120, sample_size=3986.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1845.8, 
ups=0.25, wpb=7457.1, bsz=120, num_updates=60180, lr=1.26772e-07, gnorm=0.993, clip=60, loss_scale=64, train_wall=40, gb_free=30, wall=246465 2023-05-03 23:01:32 - progress_bar.py[line:274] - INFO: epoch 010: 5915 / 6042 loss=2.358, loss_v1=0, loss_v2=0, nll_loss=1.098, ntokens=7671.7, nsentences=120, sample_size=3805.3, sample_size_v1=0, sample_size_v2=0, ppl=2.14, wps=1945.1, ups=0.25, wpb=7671.7, bsz=120, num_updates=60190, lr=1.2149e-07, gnorm=0.991, clip=40, loss_scale=64, train_wall=39, gb_free=28.4, wall=246505 2023-05-03 23:02:13 - progress_bar.py[line:274] - INFO: epoch 010: 5925 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7761.9, nsentences=120, sample_size=3995.7, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1900, ups=0.24, wpb=7761.9, bsz=120, num_updates=60200, lr=1.16207e-07, gnorm=0.984, clip=40, loss_scale=64, train_wall=41, gb_free=29.2, wall=246545 2023-05-03 23:02:52 - progress_bar.py[line:274] - INFO: epoch 010: 5935 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.065, ntokens=7434.5, nsentences=120, sample_size=3966.1, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1896.3, ups=0.26, wpb=7434.5, bsz=120, num_updates=60210, lr=1.10925e-07, gnorm=1.004, clip=70, loss_scale=64, train_wall=39, gb_free=30.8, wall=246585 2023-05-03 23:03:31 - progress_bar.py[line:274] - INFO: epoch 010: 5945 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.075, ntokens=7432.9, nsentences=120, sample_size=3725.9, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1903.8, ups=0.26, wpb=7432.9, bsz=120, num_updates=60220, lr=1.05643e-07, gnorm=1.006, clip=50, loss_scale=64, train_wall=39, gb_free=29.8, wall=246624 2023-05-03 23:04:11 - progress_bar.py[line:274] - INFO: epoch 010: 5955 / 6042 loss=2.336, loss_v1=0, loss_v2=0, nll_loss=1.077, ntokens=7517.3, nsentences=120, sample_size=4027.6, sample_size_v1=0, sample_size_v2=0, ppl=2.11, wps=1896.8, ups=0.25, wpb=7517.3, bsz=120, num_updates=60230, lr=1.00361e-07, gnorm=0.986, clip=40, 
loss_scale=64, train_wall=40, gb_free=27.2, wall=246663 2023-05-03 23:04:51 - progress_bar.py[line:274] - INFO: epoch 010: 5965 / 6042 loss=2.346, loss_v1=0, loss_v2=0, nll_loss=1.085, ntokens=7585.5, nsentences=120, sample_size=3939.8, sample_size_v1=0, sample_size_v2=0, ppl=2.12, wps=1881.1, ups=0.25, wpb=7585.5, bsz=120, num_updates=60240, lr=9.50788e-08, gnorm=0.983, clip=40, loss_scale=64, train_wall=40, gb_free=31.2, wall=246704 2023-05-03 23:05:31 - progress_bar.py[line:274] - INFO: epoch 010: 5975 / 6042 loss=2.333, loss_v1=0, loss_v2=0, nll_loss=1.065, ntokens=7461.1, nsentences=120, sample_size=4115.8, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1849.9, ups=0.25, wpb=7461.1, bsz=120, num_updates=60250, lr=8.97966e-08, gnorm=0.988, clip=30, loss_scale=64, train_wall=40, gb_free=29.6, wall=246744 2023-05-03 23:06:12 - progress_bar.py[line:274] - INFO: epoch 010: 5985 / 6042 loss=2.365, loss_v1=0, loss_v2=0, nll_loss=1.107, ntokens=7682.2, nsentences=120, sample_size=4109.4, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1895.9, ups=0.25, wpb=7682.2, bsz=120, num_updates=60260, lr=8.45145e-08, gnorm=0.972, clip=50, loss_scale=64, train_wall=40, gb_free=29.2, wall=246784 2023-05-03 23:06:52 - progress_bar.py[line:274] - INFO: epoch 010: 5995 / 6042 loss=2.33, loss_v1=0, loss_v2=0, nll_loss=1.06, ntokens=7552.4, nsentences=120, sample_size=4152.2, sample_size_v1=0, sample_size_v2=0, ppl=2.09, wps=1898.6, ups=0.25, wpb=7552.4, bsz=120, num_updates=60270, lr=7.92323e-08, gnorm=0.99, clip=40, loss_scale=64, train_wall=40, gb_free=27.1, wall=246824 2023-05-03 23:07:31 - progress_bar.py[line:274] - INFO: epoch 010: 6005 / 6042 loss=2.35, loss_v1=0, loss_v2=0, nll_loss=1.092, ntokens=7591.9, nsentences=120, sample_size=4252.1, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1927.7, ups=0.25, wpb=7591.9, bsz=120, num_updates=60280, lr=7.39502e-08, gnorm=0.971, clip=20, loss_scale=64, train_wall=39, gb_free=30.3, wall=246864 2023-05-03 23:08:12 - 
progress_bar.py[line:274] - INFO: epoch 010: 6015 / 6042 loss=2.314, loss_v1=0, loss_v2=0, nll_loss=1.051, ntokens=7944.4, nsentences=120, sample_size=4135.2, sample_size_v1=0, sample_size_v2=0, ppl=2.07, wps=1943.4, ups=0.24, wpb=7944.4, bsz=120, num_updates=60290, lr=6.8668e-08, gnorm=0.953, clip=30, loss_scale=64, train_wall=41, gb_free=30.6, wall=246905 2023-05-03 23:08:53 - progress_bar.py[line:274] - INFO: epoch 010: 6025 / 6042 loss=2.36, loss_v1=0, loss_v2=0, nll_loss=1.103, ntokens=7974.1, nsentences=120, sample_size=3992.3, sample_size_v1=0, sample_size_v2=0, ppl=2.15, wps=1941.6, ups=0.24, wpb=7974.1, bsz=120, num_updates=60300, lr=6.33859e-08, gnorm=0.97, clip=30, loss_scale=64, train_wall=41, gb_free=28.5, wall=246946 2023-05-03 23:09:33 - progress_bar.py[line:274] - INFO: epoch 010: 6035 / 6042 loss=2.351, loss_v1=0, loss_v2=0, nll_loss=1.09, ntokens=7632.7, nsentences=120, sample_size=4039.8, sample_size_v1=0, sample_size_v2=0, ppl=2.13, wps=1913.1, ups=0.25, wpb=7632.7, bsz=120, num_updates=60310, lr=5.81037e-08, gnorm=0.972, clip=30, loss_scale=64, train_wall=40, gb_free=30, wall=246985 2023-05-03 23:10:00 - train.py[line:445] - INFO: begin validation on "valid" subset 2023-05-03 23:10:02 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 23:10:02 - task_invig.py[line:182] - INFO: example ref(gt): sure. 
region: 2023-05-03 23:10:03 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:03 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:04 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:04 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:05 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:05 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:06 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:06 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:07 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:07 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:08 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:08 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:09 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:09 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:10 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:10 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:11 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:11 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:12 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:12 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:13 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:13 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:14 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:14 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:15 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 23:10:15 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:16 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:16 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:17 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:17 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:18 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:18 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:19 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 23:10:19 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 23:10:20 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:20 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:21 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:21 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:22 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:22 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:23 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:23 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:24 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:24 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:25 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:25 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:26 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:26 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:27 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:27 - task_invig.py[line:182] - INFO: example ref(gt): region: 
2023-05-03 23:10:28 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:28 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:29 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:29 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:30 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:30 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:31 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 23:10:31 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 23:10:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:32 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:32 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:34 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:34 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:35 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:35 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:36 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:36 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:37 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:37 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:38 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:38 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:39 - task_invig.py[line:181] - INFO: example hyp(gen): region: 
2023-05-03 23:10:39 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:40 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:40 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:41 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:41 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:42 - task_invig.py[line:181] - INFO: example hyp(gen): sure. region: 2023-05-03 23:10:42 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 23:10:43 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:43 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:44 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:44 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:45 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:45 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:46 - task_invig.py[line:181] - INFO: example hyp(gen): not yet. is it the one on the left? 2023-05-03 23:10:46 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 23:10:47 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:47 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:48 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:48 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:49 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:49 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:50 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:50 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:51 - task_invig.py[line:181] - INFO: example hyp(gen): sure. 
region: 2023-05-03 23:10:51 - task_invig.py[line:182] - INFO: example ref(gt): sure. region: 2023-05-03 23:10:51 - task_invig.py[line:181] - INFO: example hyp(gen): region: 2023-05-03 23:10:51 - task_invig.py[line:182] - INFO: example ref(gt): region: 2023-05-03 23:10:51 - progress_bar.py[line:282] - INFO: epoch 010 | valid on 'valid' subset | loss 3.267 | loss_v1 0 | loss_v2 0 | nll_loss 2.102 | ntokens 3202.14 | nsentences 39.385 | sample_size 285.231 | sample_size_v1 0 | sample_size_v2 0 | ppl 4.29 | score 0.7617 | wps 3293.7 | wpb 3202.1 | bsz 39.4 | num_updates 60317 | best_score 0.7627 2023-05-03 23:10:51 - checkpoint_utils.py[line:64] - INFO: Preparing to save checkpoint for epoch 10 @ 60317 updates 2023-05-03 23:10:51 - trainer.py[line:431] - INFO: Saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint.best_score_0.7621.pt 2023-05-03 23:11:16 - trainer.py[line:441] - INFO: Finished saving checkpoint to /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint.best_score_0.7621.pt 2023-05-03 23:11:31 - checkpoint_utils.py[line:135] - INFO: Saved checkpoint /mnt/bn/ckpt-lq/vldd/invig_huge_grounding_checkpoints/10_3e-5_512_20230501-0232/checkpoint.best_score_0.7621.pt (epoch 10 @ 60317 updates, score 0.7617) (writing took 39.09266485692933 seconds) 2023-05-03 23:11:31 - train.py[line:332] - INFO: end of epoch 10 (average epoch stats below) 2023-05-03 23:11:31 - progress_bar.py[line:282] - INFO: epoch 010 | loss 2.352 | loss_v1 0 | loss_v2 0 | nll_loss 1.093 | ntokens 7722.89 | nsentences 119.992 | sample_size 4033.85 | sample_size_v1 0 | sample_size_v2 0 | ppl 2.13 | wps 1885.4 | ups 0.24 | wpb 7722.9 | bsz 120 | num_updates 60317 | lr 5.44062e-08 | gnorm 0.984 | clip 38.6 | loss_scale 64 | train_wall 24017 | gb_free 30.1 | wall 247104 2023-05-03 23:11:31 - trainer.py[line:639] - INFO: loading train data for epoch 11 2023-05-03 23:11:31 - dialog_dataset.py[line:647] - 
INFO: loading invig-train from /mnt/bn/hri-lq/datasets/hf-cache/invig
2023-05-03 23:11:32 - dialog_dataset.py[line:647] - INFO: loading guesswhat-train from /mnt/bn/hri-lq/datasets/hf-cache/guesswhat
2023-05-03 23:11:33 - dialog_dataset.py[line:647] - INFO: loading visdial-train from /mnt/bn/hri-lq/datasets/hf-cache/visdial
2023-05-03 23:11:35 - dialog_dataset.py[line:647] - INFO: loading refcoco-train from /mnt/bn/hri-lq/datasets/hf-cache/refcoco
2023-05-03 23:11:35 - dialog_dataset.py[line:647] - INFO: loading refcocog-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocog
2023-05-03 23:11:35 - dialog_dataset.py[line:647] - INFO: loading refcocoplus-train from /mnt/bn/hri-lq/datasets/hf-cache/refcocoplus
2023-05-03 23:11:36 - dialog_dataset.py[line:647] - INFO: loading cc_sbu_align-train from /mnt/bn/hri-lq/datasets/hf-cache/cc_sbu_align
2023-05-03 23:11:36 - dialog_dataset.py[line:647] - INFO: loading llava_instruct_150k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_instruct_150k
2023-05-03 23:11:37 - dialog_dataset.py[line:647] - INFO: loading llava_conversation_58k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_conversation_58k
2023-05-03 23:11:37 - dialog_dataset.py[line:647] - INFO: loading llava_complex_reasoning_77k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_complex_reasoning_77k
2023-05-03 23:11:37 - dialog_dataset.py[line:647] - INFO: loading llava_detail_23k-train from /mnt/bn/hri-lq/datasets/hf-cache/llava_detail_23k
2023-05-03 23:11:38 - dialog_dataset.py[line:647] - INFO: loading openimages-train from /mnt/bn/hri-lq/datasets/hf-cache/openimages_v1.2
2023-05-03 23:11:38 - dialog_dataset.py[line:671] - INFO: load train data: 18 (90624/724992 samples) dataset(s)
2023-05-03 23:11:38 - dialog_dataset.py[line:672] - INFO: Tasks: invig_question(17652), invig_answer(17652), invig_grounding(17652), guesswhat_question(68653), guesswhat_answer(68653), guesswhat_grounding(68653), visdial_question(103447), visdial_answer(103447), visdial_caption(20689), refcoco_grounding(9523), refcoco_grounding(9920), refcoco_grounding(9494), cc_sbu_align_caption(3439), llava_instruct_150k(90372), llava_conversation_58k(46965), llava_complex_reasoning_77k(44353), llava_detail_23k(12471), openimages_detection(11957)
2023-05-03 23:11:38 - train.py[line:214] - INFO: done training in 247073.8 seconds
2023-05-01 02:32:16.811 - byted-torch - WARNING - /usr/local/lib/python3.7/dist-packages/torch/utils/system/environment.py:44: tcc failed with exception: No module named 'bytedtcc'