[2024-09-15 07:21:25,368][00574] Saving configuration to /content/train_dir/default_experiment/config.json... [2024-09-15 07:21:25,370][00574] Rollout worker 0 uses device cpu [2024-09-15 07:21:25,372][00574] Rollout worker 1 uses device cpu [2024-09-15 07:21:25,373][00574] Rollout worker 2 uses device cpu [2024-09-15 07:21:25,375][00574] Rollout worker 3 uses device cpu [2024-09-15 07:21:25,376][00574] Rollout worker 4 uses device cpu [2024-09-15 07:21:25,377][00574] Rollout worker 5 uses device cpu [2024-09-15 07:21:25,378][00574] Rollout worker 6 uses device cpu [2024-09-15 07:21:25,379][00574] Rollout worker 7 uses device cpu [2024-09-15 07:21:25,532][00574] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-15 07:21:25,533][00574] InferenceWorker_p0-w0: min num requests: 2 [2024-09-15 07:21:25,565][00574] Starting all processes... [2024-09-15 07:21:25,567][00574] Starting process learner_proc0 [2024-09-15 07:21:26,213][00574] Starting all processes... [2024-09-15 07:21:26,223][00574] Starting process inference_proc0-0 [2024-09-15 07:21:26,223][00574] Starting process rollout_proc0 [2024-09-15 07:21:26,225][00574] Starting process rollout_proc1 [2024-09-15 07:21:26,225][00574] Starting process rollout_proc2 [2024-09-15 07:21:26,226][00574] Starting process rollout_proc3 [2024-09-15 07:21:26,226][00574] Starting process rollout_proc4 [2024-09-15 07:21:26,226][00574] Starting process rollout_proc5 [2024-09-15 07:21:26,226][00574] Starting process rollout_proc6 [2024-09-15 07:21:26,226][00574] Starting process rollout_proc7 [2024-09-15 07:21:43,297][02628] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-15 07:21:43,297][02628] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-09-15 07:21:43,355][02643] Worker 0 uses CPU cores [0] [2024-09-15 07:21:43,370][02628] Num visible devices: 1 [2024-09-15 07:21:43,416][02628] Starting seed is not provided [2024-09-15 07:21:43,417][02628] Using 
GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-15 07:21:43,417][02628] Initializing actor-critic model on device cuda:0 [2024-09-15 07:21:43,418][02628] RunningMeanStd input shape: (3, 72, 128) [2024-09-15 07:21:43,422][02628] RunningMeanStd input shape: (1,) [2024-09-15 07:21:43,468][02652] Worker 6 uses CPU cores [0] [2024-09-15 07:21:43,487][02651] Worker 5 uses CPU cores [1] [2024-09-15 07:21:43,520][02628] ConvEncoder: input_channels=3 [2024-09-15 07:21:43,587][02649] Worker 3 uses CPU cores [1] [2024-09-15 07:21:43,728][02650] Worker 4 uses CPU cores [0] [2024-09-15 07:21:43,799][02653] Worker 7 uses CPU cores [1] [2024-09-15 07:21:43,826][02641] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-15 07:21:43,827][02641] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-09-15 07:21:43,860][02642] Worker 1 uses CPU cores [1] [2024-09-15 07:21:43,881][02641] Num visible devices: 1 [2024-09-15 07:21:43,924][02644] Worker 2 uses CPU cores [0] [2024-09-15 07:21:44,018][02628] Conv encoder output size: 512 [2024-09-15 07:21:44,019][02628] Policy head output size: 512 [2024-09-15 07:21:44,093][02628] Created Actor Critic model with architecture: [2024-09-15 07:21:44,094][02628] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-15 07:21:44,617][02628] Using optimizer [2024-09-15 07:21:45,532][00574] Heartbeat connected on InferenceWorker_p0-w0 [2024-09-15 07:21:45,541][00574] Heartbeat connected on RolloutWorker_w0 [2024-09-15 07:21:45,544][00574] Heartbeat connected on RolloutWorker_w1 [2024-09-15 07:21:45,548][00574] Heartbeat connected on RolloutWorker_w2 [2024-09-15 07:21:45,552][00574] Heartbeat connected on RolloutWorker_w3 [2024-09-15 07:21:45,554][00574] Heartbeat connected on RolloutWorker_w4 [2024-09-15 07:21:45,559][00574] Heartbeat connected on RolloutWorker_w5 [2024-09-15 07:21:45,562][00574] Heartbeat connected on RolloutWorker_w6 [2024-09-15 07:21:45,566][00574] Heartbeat connected on RolloutWorker_w7 [2024-09-15 07:21:45,658][00574] Heartbeat connected on Batcher_0 [2024-09-15 07:21:45,954][02628] No checkpoints found [2024-09-15 07:21:45,956][02628] Did not load from checkpoint, starting from scratch! [2024-09-15 07:21:45,957][02628] Initialized policy 0 weights for model version 0 [2024-09-15 07:21:45,972][02628] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-15 07:21:45,979][02628] LearnerWorker_p0 finished initialization! 
[2024-09-15 07:21:45,979][00574] Heartbeat connected on LearnerWorker_p0 [2024-09-15 07:21:46,184][02641] RunningMeanStd input shape: (3, 72, 128) [2024-09-15 07:21:46,185][02641] RunningMeanStd input shape: (1,) [2024-09-15 07:21:46,222][02641] ConvEncoder: input_channels=3 [2024-09-15 07:21:46,467][02641] Conv encoder output size: 512 [2024-09-15 07:21:46,468][02641] Policy head output size: 512 [2024-09-15 07:21:46,715][00574] Inference worker 0-0 is ready! [2024-09-15 07:21:46,718][00574] All inference workers are ready! Signal rollout workers to start! [2024-09-15 07:21:47,095][02650] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-15 07:21:47,099][02643] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-15 07:21:47,103][02652] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-15 07:21:47,116][02644] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-15 07:21:47,619][02649] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-15 07:21:47,628][02651] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-15 07:21:47,634][02653] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-15 07:21:47,637][02642] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-15 07:21:49,564][00574] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-15 07:21:50,497][02650] Decorrelating experience for 0 frames... [2024-09-15 07:21:50,499][02643] Decorrelating experience for 0 frames... [2024-09-15 07:21:51,326][02649] Decorrelating experience for 0 frames... [2024-09-15 07:21:51,323][02653] Decorrelating experience for 0 frames... [2024-09-15 07:21:51,334][02642] Decorrelating experience for 0 frames... [2024-09-15 07:21:51,339][02651] Decorrelating experience for 0 frames... [2024-09-15 07:21:51,962][02643] Decorrelating experience for 32 frames... 
[2024-09-15 07:21:51,966][02650] Decorrelating experience for 32 frames... [2024-09-15 07:21:52,125][02653] Decorrelating experience for 32 frames... [2024-09-15 07:21:52,128][02651] Decorrelating experience for 32 frames... [2024-09-15 07:21:52,565][02644] Decorrelating experience for 0 frames... [2024-09-15 07:21:52,725][02649] Decorrelating experience for 32 frames... [2024-09-15 07:21:53,080][02652] Decorrelating experience for 0 frames... [2024-09-15 07:21:53,174][02650] Decorrelating experience for 64 frames... [2024-09-15 07:21:53,563][02649] Decorrelating experience for 64 frames... [2024-09-15 07:21:53,625][02644] Decorrelating experience for 32 frames... [2024-09-15 07:21:53,994][02653] Decorrelating experience for 64 frames... [2024-09-15 07:21:54,228][02652] Decorrelating experience for 32 frames... [2024-09-15 07:21:54,496][02650] Decorrelating experience for 96 frames... [2024-09-15 07:21:54,564][00574] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-15 07:21:54,793][02642] Decorrelating experience for 32 frames... [2024-09-15 07:21:55,017][02643] Decorrelating experience for 64 frames... [2024-09-15 07:21:55,608][02644] Decorrelating experience for 64 frames... [2024-09-15 07:21:56,185][02653] Decorrelating experience for 96 frames... [2024-09-15 07:21:56,977][02652] Decorrelating experience for 64 frames... [2024-09-15 07:21:56,985][02651] Decorrelating experience for 64 frames... [2024-09-15 07:21:57,751][02643] Decorrelating experience for 96 frames... [2024-09-15 07:21:57,836][02642] Decorrelating experience for 64 frames... [2024-09-15 07:21:58,128][02644] Decorrelating experience for 96 frames... [2024-09-15 07:21:59,155][02649] Decorrelating experience for 96 frames... [2024-09-15 07:21:59,564][00574] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 84.2. Samples: 842. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-15 07:21:59,568][00574] Avg episode reward: [(0, '2.687')] [2024-09-15 07:21:59,987][02651] Decorrelating experience for 96 frames... [2024-09-15 07:22:00,297][02642] Decorrelating experience for 96 frames... [2024-09-15 07:22:03,584][02628] Signal inference workers to stop experience collection... [2024-09-15 07:22:03,595][02641] InferenceWorker_p0-w0: stopping experience collection [2024-09-15 07:22:03,634][02652] Decorrelating experience for 96 frames... [2024-09-15 07:22:04,564][00574] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 164.7. Samples: 2470. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-15 07:22:04,567][00574] Avg episode reward: [(0, '3.325')] [2024-09-15 07:22:06,174][02628] Signal inference workers to resume experience collection... [2024-09-15 07:22:06,180][02641] InferenceWorker_p0-w0: resuming experience collection [2024-09-15 07:22:09,564][00574] Fps is (10 sec: 2457.5, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 217.1. Samples: 4342. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-09-15 07:22:09,567][00574] Avg episode reward: [(0, '3.401')] [2024-09-15 07:22:12,845][02641] Updated weights for policy 0, policy_version 10 (0.0227) [2024-09-15 07:22:14,565][00574] Fps is (10 sec: 4914.6, 60 sec: 1965.9, 300 sec: 1965.9). Total num frames: 49152. Throughput: 0: 466.2. Samples: 11656. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-15 07:22:14,573][00574] Avg episode reward: [(0, '4.213')] [2024-09-15 07:22:19,566][00574] Fps is (10 sec: 4095.4, 60 sec: 2184.4, 300 sec: 2184.4). Total num frames: 65536. Throughput: 0: 505.4. Samples: 15162. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:22:19,570][00574] Avg episode reward: [(0, '4.605')] [2024-09-15 07:22:24,256][02641] Updated weights for policy 0, policy_version 20 (0.0035) [2024-09-15 07:22:24,563][00574] Fps is (10 sec: 3277.4, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 557.9. Samples: 19526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:22:24,568][00574] Avg episode reward: [(0, '4.451')] [2024-09-15 07:22:29,564][00574] Fps is (10 sec: 4096.8, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 106496. Throughput: 0: 655.9. Samples: 26236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:22:29,566][00574] Avg episode reward: [(0, '4.259')] [2024-09-15 07:22:29,576][02628] Saving new best policy, reward=4.259! [2024-09-15 07:22:32,896][02641] Updated weights for policy 0, policy_version 30 (0.0043) [2024-09-15 07:22:34,564][00574] Fps is (10 sec: 4505.6, 60 sec: 2821.7, 300 sec: 2821.7). Total num frames: 126976. Throughput: 0: 662.1. Samples: 29794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-15 07:22:34,566][00574] Avg episode reward: [(0, '4.370')] [2024-09-15 07:22:34,568][02628] Saving new best policy, reward=4.370! [2024-09-15 07:22:39,568][00574] Fps is (10 sec: 3275.3, 60 sec: 2785.0, 300 sec: 2785.0). Total num frames: 139264. Throughput: 0: 775.3. Samples: 34892. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:22:39,572][00574] Avg episode reward: [(0, '4.440')] [2024-09-15 07:22:39,624][02628] Saving new best policy, reward=4.440! [2024-09-15 07:22:43,946][02641] Updated weights for policy 0, policy_version 40 (0.0022) [2024-09-15 07:22:44,564][00574] Fps is (10 sec: 3686.4, 60 sec: 2978.9, 300 sec: 2978.9). Total num frames: 163840. Throughput: 0: 894.2. Samples: 41082. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:22:44,566][00574] Avg episode reward: [(0, '4.500')] [2024-09-15 07:22:44,569][02628] Saving new best policy, reward=4.500! [2024-09-15 07:22:49,564][00574] Fps is (10 sec: 4917.5, 60 sec: 3140.3, 300 sec: 3140.3). Total num frames: 188416. Throughput: 0: 934.4. Samples: 44518. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-15 07:22:49,566][00574] Avg episode reward: [(0, '4.386')] [2024-09-15 07:22:54,565][00574] Fps is (10 sec: 3685.8, 60 sec: 3345.0, 300 sec: 3087.7). Total num frames: 200704. Throughput: 0: 1022.6. Samples: 50358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:22:54,572][00574] Avg episode reward: [(0, '4.391')] [2024-09-15 07:22:55,188][02641] Updated weights for policy 0, policy_version 50 (0.0019) [2024-09-15 07:22:59,564][00574] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3101.3). Total num frames: 217088. Throughput: 0: 959.1. Samples: 54812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:22:59,569][00574] Avg episode reward: [(0, '4.388')] [2024-09-15 07:23:04,564][00574] Fps is (10 sec: 4096.7, 60 sec: 4027.8, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 946.9. Samples: 57772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:23:04,571][00574] Avg episode reward: [(0, '4.564')] [2024-09-15 07:23:04,574][02628] Saving new best policy, reward=4.564! [2024-09-15 07:23:05,232][02641] Updated weights for policy 0, policy_version 60 (0.0045) [2024-09-15 07:23:09,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3225.6). Total num frames: 258048. Throughput: 0: 1000.6. Samples: 64552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:23:09,566][00574] Avg episode reward: [(0, '4.501')] [2024-09-15 07:23:14,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3228.6). Total num frames: 274432. Throughput: 0: 949.2. Samples: 68948. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:23:14,569][00574] Avg episode reward: [(0, '4.462')] [2024-09-15 07:23:16,803][02641] Updated weights for policy 0, policy_version 70 (0.0020) [2024-09-15 07:23:19,564][00574] Fps is (10 sec: 4095.9, 60 sec: 3891.3, 300 sec: 3322.3). Total num frames: 299008. Throughput: 0: 943.8. Samples: 72264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:23:19,566][00574] Avg episode reward: [(0, '4.446')] [2024-09-15 07:23:19,575][02628] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth... [2024-09-15 07:23:24,564][00574] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3363.0). Total num frames: 319488. Throughput: 0: 983.9. Samples: 79162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:23:24,565][00574] Avg episode reward: [(0, '4.358')] [2024-09-15 07:23:26,340][02641] Updated weights for policy 0, policy_version 80 (0.0030) [2024-09-15 07:23:29,564][00574] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3358.7). Total num frames: 335872. Throughput: 0: 956.2. Samples: 84110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:23:29,566][00574] Avg episode reward: [(0, '4.322')] [2024-09-15 07:23:34,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3393.8). Total num frames: 356352. Throughput: 0: 934.2. Samples: 86556. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:23:34,570][00574] Avg episode reward: [(0, '4.484')] [2024-09-15 07:23:37,054][02641] Updated weights for policy 0, policy_version 90 (0.0023) [2024-09-15 07:23:39,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3959.8, 300 sec: 3425.7). Total num frames: 376832. Throughput: 0: 959.7. Samples: 93542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:23:39,570][00574] Avg episode reward: [(0, '4.789')] [2024-09-15 07:23:39,580][02628] Saving new best policy, reward=4.789! 
[2024-09-15 07:23:44,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3419.3). Total num frames: 393216. Throughput: 0: 981.6. Samples: 98986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-15 07:23:44,566][00574] Avg episode reward: [(0, '4.978')] [2024-09-15 07:23:44,574][02628] Saving new best policy, reward=4.978! [2024-09-15 07:23:49,079][02641] Updated weights for policy 0, policy_version 100 (0.0032) [2024-09-15 07:23:49,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3413.3). Total num frames: 409600. Throughput: 0: 960.0. Samples: 100974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:23:49,566][00574] Avg episode reward: [(0, '4.888')] [2024-09-15 07:23:54,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3473.4). Total num frames: 434176. Throughput: 0: 947.1. Samples: 107170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-15 07:23:54,568][00574] Avg episode reward: [(0, '4.524')] [2024-09-15 07:23:57,894][02641] Updated weights for policy 0, policy_version 110 (0.0031) [2024-09-15 07:23:59,564][00574] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3497.4). Total num frames: 454656. Throughput: 0: 1000.4. Samples: 113966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:23:59,567][00574] Avg episode reward: [(0, '4.489')] [2024-09-15 07:24:04,564][00574] Fps is (10 sec: 3276.5, 60 sec: 3754.6, 300 sec: 3458.8). Total num frames: 466944. Throughput: 0: 973.1. Samples: 116056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:24:04,567][00574] Avg episode reward: [(0, '4.452')] [2024-09-15 07:24:09,284][02641] Updated weights for policy 0, policy_version 120 (0.0050) [2024-09-15 07:24:09,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3510.9). Total num frames: 491520. Throughput: 0: 944.5. Samples: 121664. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:24:09,570][00574] Avg episode reward: [(0, '4.404')] [2024-09-15 07:24:14,564][00574] Fps is (10 sec: 4915.3, 60 sec: 4027.7, 300 sec: 3559.3). Total num frames: 516096. Throughput: 0: 997.4. Samples: 128992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:24:14,570][00574] Avg episode reward: [(0, '4.681')] [2024-09-15 07:24:19,341][02641] Updated weights for policy 0, policy_version 130 (0.0027) [2024-09-15 07:24:19,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3549.9). Total num frames: 532480. Throughput: 0: 1006.3. Samples: 131838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:24:19,567][00574] Avg episode reward: [(0, '4.752')] [2024-09-15 07:24:24,563][00574] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3567.5). Total num frames: 552960. Throughput: 0: 957.0. Samples: 136606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:24:24,566][00574] Avg episode reward: [(0, '4.526')] [2024-09-15 07:24:28,789][02641] Updated weights for policy 0, policy_version 140 (0.0023) [2024-09-15 07:24:29,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3584.0). Total num frames: 573440. Throughput: 0: 997.9. Samples: 143890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:24:29,566][00574] Avg episode reward: [(0, '4.605')] [2024-09-15 07:24:34,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3599.5). Total num frames: 593920. Throughput: 0: 1029.8. Samples: 147314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:24:34,567][00574] Avg episode reward: [(0, '4.637')] [2024-09-15 07:24:39,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3590.0). Total num frames: 610304. Throughput: 0: 992.5. Samples: 151834. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-15 07:24:39,566][00574] Avg episode reward: [(0, '4.585')] [2024-09-15 07:24:39,956][02641] Updated weights for policy 0, policy_version 150 (0.0044) [2024-09-15 07:24:44,564][00574] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3627.9). Total num frames: 634880. Throughput: 0: 990.8. Samples: 158550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:24:44,568][00574] Avg episode reward: [(0, '4.706')] [2024-09-15 07:24:48,568][02641] Updated weights for policy 0, policy_version 160 (0.0034) [2024-09-15 07:24:49,564][00574] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3640.9). Total num frames: 655360. Throughput: 0: 1026.2. Samples: 162232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:24:49,566][00574] Avg episode reward: [(0, '4.731')] [2024-09-15 07:24:54,568][00574] Fps is (10 sec: 3685.0, 60 sec: 3959.2, 300 sec: 3631.0). Total num frames: 671744. Throughput: 0: 1020.2. Samples: 167578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:24:54,572][00574] Avg episode reward: [(0, '4.548')] [2024-09-15 07:24:59,563][00574] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3643.3). Total num frames: 692224. Throughput: 0: 990.5. Samples: 173564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:24:59,565][00574] Avg episode reward: [(0, '4.529')] [2024-09-15 07:24:59,666][02641] Updated weights for policy 0, policy_version 170 (0.0029) [2024-09-15 07:25:04,564][00574] Fps is (10 sec: 4507.5, 60 sec: 4164.3, 300 sec: 3675.9). Total num frames: 716800. Throughput: 0: 1004.9. Samples: 177058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:25:04,566][00574] Avg episode reward: [(0, '4.847')] [2024-09-15 07:25:09,564][00574] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3665.9). Total num frames: 733184. Throughput: 0: 1038.9. Samples: 183356. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:25:09,566][00574] Avg episode reward: [(0, '4.813')] [2024-09-15 07:25:09,748][02641] Updated weights for policy 0, policy_version 180 (0.0021) [2024-09-15 07:25:14,564][00574] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3676.4). Total num frames: 753664. Throughput: 0: 987.7. Samples: 188338. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-15 07:25:14,566][00574] Avg episode reward: [(0, '4.875')] [2024-09-15 07:25:19,378][02641] Updated weights for policy 0, policy_version 190 (0.0027) [2024-09-15 07:25:19,564][00574] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3705.9). Total num frames: 778240. Throughput: 0: 992.6. Samples: 191982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:25:19,566][00574] Avg episode reward: [(0, '4.992')] [2024-09-15 07:25:19,573][02628] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000190_778240.pth... [2024-09-15 07:25:19,687][02628] Saving new best policy, reward=4.992! [2024-09-15 07:25:24,564][00574] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3715.0). Total num frames: 798720. Throughput: 0: 1051.1. Samples: 199132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:25:24,566][00574] Avg episode reward: [(0, '5.054')] [2024-09-15 07:25:24,568][02628] Saving new best policy, reward=5.054! [2024-09-15 07:25:29,564][00574] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3686.4). Total num frames: 811008. Throughput: 0: 997.1. Samples: 203418. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:25:29,566][00574] Avg episode reward: [(0, '5.326')] [2024-09-15 07:25:29,581][02628] Saving new best policy, reward=5.326! [2024-09-15 07:25:30,620][02641] Updated weights for policy 0, policy_version 200 (0.0025) [2024-09-15 07:25:34,564][00574] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 3713.7). Total num frames: 835584. Throughput: 0: 984.4. Samples: 206528. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:25:34,570][00574] Avg episode reward: [(0, '5.145')] [2024-09-15 07:25:39,141][02641] Updated weights for policy 0, policy_version 210 (0.0026) [2024-09-15 07:25:39,564][00574] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3739.8). Total num frames: 860160. Throughput: 0: 1029.1. Samples: 213882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:25:39,566][00574] Avg episode reward: [(0, '4.807')] [2024-09-15 07:25:44,565][00574] Fps is (10 sec: 4095.4, 60 sec: 4027.6, 300 sec: 3729.9). Total num frames: 876544. Throughput: 0: 1012.6. Samples: 219134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:25:44,568][00574] Avg episode reward: [(0, '4.837')] [2024-09-15 07:25:49,564][00574] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3737.6). Total num frames: 897024. Throughput: 0: 991.6. Samples: 221678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:25:49,569][00574] Avg episode reward: [(0, '5.253')] [2024-09-15 07:25:50,189][02641] Updated weights for policy 0, policy_version 220 (0.0018) [2024-09-15 07:25:54,564][00574] Fps is (10 sec: 4506.3, 60 sec: 4164.6, 300 sec: 3761.6). Total num frames: 921600. Throughput: 0: 1013.0. Samples: 228942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:25:54,568][00574] Avg episode reward: [(0, '5.133')] [2024-09-15 07:25:59,564][00574] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3751.9). Total num frames: 937984. Throughput: 0: 1039.5. Samples: 235114. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-15 07:25:59,568][00574] Avg episode reward: [(0, '5.061')] [2024-09-15 07:26:00,422][02641] Updated weights for policy 0, policy_version 230 (0.0043) [2024-09-15 07:26:04,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3742.6). Total num frames: 954368. Throughput: 0: 1004.0. Samples: 237160. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:26:04,567][00574] Avg episode reward: [(0, '5.447')] [2024-09-15 07:26:04,571][02628] Saving new best policy, reward=5.447! [2024-09-15 07:26:09,564][00574] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3765.2). Total num frames: 978944. Throughput: 0: 989.7. Samples: 243670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:26:09,568][00574] Avg episode reward: [(0, '5.650')] [2024-09-15 07:26:09,582][02628] Saving new best policy, reward=5.650! [2024-09-15 07:26:10,094][02641] Updated weights for policy 0, policy_version 240 (0.0032) [2024-09-15 07:26:14,566][00574] Fps is (10 sec: 4504.7, 60 sec: 4095.9, 300 sec: 3771.4). Total num frames: 999424. Throughput: 0: 1050.1. Samples: 250676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:26:14,571][00574] Avg episode reward: [(0, '5.339')] [2024-09-15 07:26:19,565][00574] Fps is (10 sec: 3685.9, 60 sec: 3959.4, 300 sec: 3762.2). Total num frames: 1015808. Throughput: 0: 1029.3. Samples: 252846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:26:19,570][00574] Avg episode reward: [(0, '5.164')] [2024-09-15 07:26:21,217][02641] Updated weights for policy 0, policy_version 250 (0.0043) [2024-09-15 07:26:24,564][00574] Fps is (10 sec: 4096.9, 60 sec: 4027.7, 300 sec: 3783.2). Total num frames: 1040384. Throughput: 0: 992.5. Samples: 258544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-15 07:26:24,569][00574] Avg episode reward: [(0, '5.457')] [2024-09-15 07:26:29,564][00574] Fps is (10 sec: 4506.2, 60 sec: 4164.3, 300 sec: 3788.8). Total num frames: 1060864. Throughput: 0: 1038.2. Samples: 265850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:26:29,567][00574] Avg episode reward: [(0, '5.358')] [2024-09-15 07:26:29,625][02641] Updated weights for policy 0, policy_version 260 (0.0037) [2024-09-15 07:26:34,565][00574] Fps is (10 sec: 3685.7, 60 sec: 4027.6, 300 sec: 3779.8). 
Total num frames: 1077248. Throughput: 0: 1045.9. Samples: 268744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:26:34,572][00574] Avg episode reward: [(0, '5.100')] [2024-09-15 07:26:39,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3785.3). Total num frames: 1097728. Throughput: 0: 985.8. Samples: 273304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:26:39,566][00574] Avg episode reward: [(0, '5.045')] [2024-09-15 07:26:40,820][02641] Updated weights for policy 0, policy_version 270 (0.0055) [2024-09-15 07:26:44,564][00574] Fps is (10 sec: 4506.4, 60 sec: 4096.1, 300 sec: 3804.4). Total num frames: 1122304. Throughput: 0: 1009.7. Samples: 280552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:26:44,569][00574] Avg episode reward: [(0, '5.221')] [2024-09-15 07:26:49,564][00574] Fps is (10 sec: 4505.4, 60 sec: 4096.0, 300 sec: 3873.8). Total num frames: 1142784. Throughput: 0: 1046.5. Samples: 284254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:26:49,566][00574] Avg episode reward: [(0, '5.157')] [2024-09-15 07:26:50,694][02641] Updated weights for policy 0, policy_version 280 (0.0041) [2024-09-15 07:26:54,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1155072. Throughput: 0: 1004.9. Samples: 288890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:26:54,570][00574] Avg episode reward: [(0, '5.113')] [2024-09-15 07:26:59,568][00574] Fps is (10 sec: 3684.8, 60 sec: 4027.4, 300 sec: 3998.7). Total num frames: 1179648. Throughput: 0: 995.5. Samples: 295474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:26:59,572][00574] Avg episode reward: [(0, '5.342')] [2024-09-15 07:27:00,517][02641] Updated weights for policy 0, policy_version 290 (0.0032) [2024-09-15 07:27:04,564][00574] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3998.8). Total num frames: 1204224. Throughput: 0: 1027.3. Samples: 299074. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:27:04,566][00574] Avg episode reward: [(0, '5.537')] [2024-09-15 07:27:09,564][00574] Fps is (10 sec: 3688.1, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1216512. Throughput: 0: 1019.1. Samples: 304404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:27:09,566][00574] Avg episode reward: [(0, '6.072')] [2024-09-15 07:27:09,645][02628] Saving new best policy, reward=6.072! [2024-09-15 07:27:12,073][02641] Updated weights for policy 0, policy_version 300 (0.0051) [2024-09-15 07:27:14,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3959.6, 300 sec: 3971.1). Total num frames: 1236992. Throughput: 0: 974.1. Samples: 309684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:27:14,566][00574] Avg episode reward: [(0, '6.262')] [2024-09-15 07:27:14,571][02628] Saving new best policy, reward=6.262! [2024-09-15 07:27:19,564][00574] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3984.9). Total num frames: 1257472. Throughput: 0: 983.0. Samples: 312978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-15 07:27:19,566][00574] Avg episode reward: [(0, '6.769')] [2024-09-15 07:27:19,581][02628] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth... [2024-09-15 07:27:19,702][02628] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth [2024-09-15 07:27:19,711][02628] Saving new best policy, reward=6.769! [2024-09-15 07:27:21,631][02641] Updated weights for policy 0, policy_version 310 (0.0049) [2024-09-15 07:27:24,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1277952. Throughput: 0: 1015.1. Samples: 318982. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-15 07:27:24,568][00574] Avg episode reward: [(0, '7.078')] [2024-09-15 07:27:24,570][02628] Saving new best policy, reward=7.078! 
[2024-09-15 07:27:29,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1294336. Throughput: 0: 949.2. Samples: 323266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:27:29,565][00574] Avg episode reward: [(0, '7.367')] [2024-09-15 07:27:29,576][02628] Saving new best policy, reward=7.367! [2024-09-15 07:27:33,030][02641] Updated weights for policy 0, policy_version 320 (0.0021) [2024-09-15 07:27:34,563][00574] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3985.0). Total num frames: 1314816. Throughput: 0: 945.0. Samples: 326780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:27:34,566][00574] Avg episode reward: [(0, '7.234')] [2024-09-15 07:27:39,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1335296. Throughput: 0: 1001.0. Samples: 333934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:27:39,565][00574] Avg episode reward: [(0, '7.365')] [2024-09-15 07:27:43,917][02641] Updated weights for policy 0, policy_version 330 (0.0029) [2024-09-15 07:27:44,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 1351680. Throughput: 0: 955.0. Samples: 338444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:27:44,567][00574] Avg episode reward: [(0, '7.844')] [2024-09-15 07:27:44,573][02628] Saving new best policy, reward=7.844! [2024-09-15 07:27:49,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3971.1). Total num frames: 1372160. Throughput: 0: 938.4. Samples: 341304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:27:49,570][00574] Avg episode reward: [(0, '7.594')] [2024-09-15 07:27:53,047][02641] Updated weights for policy 0, policy_version 340 (0.0014) [2024-09-15 07:27:54,564][00574] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1396736. Throughput: 0: 973.7. Samples: 348220. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:27:54,570][00574] Avg episode reward: [(0, '8.217')] [2024-09-15 07:27:54,573][02628] Saving new best policy, reward=8.217! [2024-09-15 07:27:59,564][00574] Fps is (10 sec: 4095.7, 60 sec: 3891.5, 300 sec: 3971.0). Total num frames: 1413120. Throughput: 0: 974.4. Samples: 353532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-15 07:27:59,567][00574] Avg episode reward: [(0, '8.010')] [2024-09-15 07:28:04,550][02641] Updated weights for policy 0, policy_version 350 (0.0030) [2024-09-15 07:28:04,567][00574] Fps is (10 sec: 3685.3, 60 sec: 3822.7, 300 sec: 3984.9). Total num frames: 1433600. Throughput: 0: 950.5. Samples: 355752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-15 07:28:04,573][00574] Avg episode reward: [(0, '8.865')] [2024-09-15 07:28:04,579][02628] Saving new best policy, reward=8.865! [2024-09-15 07:28:09,564][00574] Fps is (10 sec: 4096.3, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1454080. Throughput: 0: 967.8. Samples: 362532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:28:09,571][00574] Avg episode reward: [(0, '8.682')] [2024-09-15 07:28:14,061][02641] Updated weights for policy 0, policy_version 360 (0.0022) [2024-09-15 07:28:14,566][00574] Fps is (10 sec: 4096.1, 60 sec: 3959.3, 300 sec: 3984.9). Total num frames: 1474560. Throughput: 0: 1015.0. Samples: 368944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:28:14,572][00574] Avg episode reward: [(0, '9.068')] [2024-09-15 07:28:14,575][02628] Saving new best policy, reward=9.068! [2024-09-15 07:28:19,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1490944. Throughput: 0: 984.5. Samples: 371082. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:28:19,569][00574] Avg episode reward: [(0, '8.914')] [2024-09-15 07:28:24,476][02641] Updated weights for policy 0, policy_version 370 (0.0025) [2024-09-15 07:28:24,564][00574] Fps is (10 sec: 4097.1, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1515520. Throughput: 0: 963.9. Samples: 377308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:28:24,565][00574] Avg episode reward: [(0, '9.219')] [2024-09-15 07:28:24,570][02628] Saving new best policy, reward=9.219! [2024-09-15 07:28:29,564][00574] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1536000. Throughput: 0: 1019.7. Samples: 384332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-15 07:28:29,567][00574] Avg episode reward: [(0, '10.013')] [2024-09-15 07:28:29,578][02628] Saving new best policy, reward=10.013! [2024-09-15 07:28:34,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1548288. Throughput: 0: 1005.0. Samples: 386528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:28:34,567][00574] Avg episode reward: [(0, '10.111')] [2024-09-15 07:28:34,573][02628] Saving new best policy, reward=10.111! [2024-09-15 07:28:36,400][02641] Updated weights for policy 0, policy_version 380 (0.0029) [2024-09-15 07:28:39,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 1568768. Throughput: 0: 957.4. Samples: 391302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-15 07:28:39,566][00574] Avg episode reward: [(0, '10.877')] [2024-09-15 07:28:39,574][02628] Saving new best policy, reward=10.877! [2024-09-15 07:28:44,564][00574] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1593344. Throughput: 0: 994.9. Samples: 398302. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-15 07:28:44,566][00574] Avg episode reward: [(0, '10.028')] [2024-09-15 07:28:45,213][02641] Updated weights for policy 0, policy_version 390 (0.0044) [2024-09-15 07:28:49,564][00574] Fps is (10 sec: 4095.8, 60 sec: 3959.4, 300 sec: 3984.9). Total num frames: 1609728. Throughput: 0: 1012.8. Samples: 401326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:28:49,569][00574] Avg episode reward: [(0, '10.842')] [2024-09-15 07:28:54,563][00574] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3971.0). Total num frames: 1626112. Throughput: 0: 949.3. Samples: 405252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:28:54,567][00574] Avg episode reward: [(0, '11.007')] [2024-09-15 07:28:54,570][02628] Saving new best policy, reward=11.007! [2024-09-15 07:28:57,288][02641] Updated weights for policy 0, policy_version 400 (0.0025) [2024-09-15 07:28:59,564][00574] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 1646592. Throughput: 0: 958.4. Samples: 412070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:28:59,567][00574] Avg episode reward: [(0, '11.347')] [2024-09-15 07:28:59,574][02628] Saving new best policy, reward=11.347! [2024-09-15 07:29:04,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3984.9). Total num frames: 1667072. Throughput: 0: 985.7. Samples: 415438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:29:04,570][00574] Avg episode reward: [(0, '11.423')] [2024-09-15 07:29:04,573][02628] Saving new best policy, reward=11.423! [2024-09-15 07:29:08,483][02641] Updated weights for policy 0, policy_version 410 (0.0027) [2024-09-15 07:29:09,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 1679360. Throughput: 0: 949.2. Samples: 420024. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:29:09,566][00574] Avg episode reward: [(0, '10.974')] [2024-09-15 07:29:14,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3971.0). Total num frames: 1703936. Throughput: 0: 925.2. Samples: 425968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:29:14,567][00574] Avg episode reward: [(0, '10.773')] [2024-09-15 07:29:17,760][02641] Updated weights for policy 0, policy_version 420 (0.0022) [2024-09-15 07:29:19,564][00574] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1724416. Throughput: 0: 955.8. Samples: 429538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:29:19,566][00574] Avg episode reward: [(0, '10.093')] [2024-09-15 07:29:19,576][02628] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000422_1728512.pth... [2024-09-15 07:29:19,708][02628] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000190_778240.pth [2024-09-15 07:29:24,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3957.2). Total num frames: 1740800. Throughput: 0: 970.4. Samples: 434968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:29:24,573][00574] Avg episode reward: [(0, '10.501')] [2024-09-15 07:29:29,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3943.3). Total num frames: 1757184. Throughput: 0: 912.3. Samples: 439356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:29:29,566][00574] Avg episode reward: [(0, '11.066')] [2024-09-15 07:29:30,264][02641] Updated weights for policy 0, policy_version 430 (0.0033) [2024-09-15 07:29:34,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 1777664. Throughput: 0: 915.0. Samples: 442500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:29:34,566][00574] Avg episode reward: [(0, '11.695')] [2024-09-15 07:29:34,573][02628] Saving new best policy, reward=11.695! 
[2024-09-15 07:29:39,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3929.4). Total num frames: 1794048. Throughput: 0: 956.2. Samples: 448282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:29:39,568][00574] Avg episode reward: [(0, '12.062')] [2024-09-15 07:29:39,580][02628] Saving new best policy, reward=12.062! [2024-09-15 07:29:42,022][02641] Updated weights for policy 0, policy_version 440 (0.0019) [2024-09-15 07:29:44,563][00574] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3901.6). Total num frames: 1806336. Throughput: 0: 894.6. Samples: 452326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:29:44,565][00574] Avg episode reward: [(0, '12.491')] [2024-09-15 07:29:44,568][02628] Saving new best policy, reward=12.491! [2024-09-15 07:29:49,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3929.4). Total num frames: 1830912. Throughput: 0: 893.6. Samples: 455652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:29:49,565][00574] Avg episode reward: [(0, '13.582')] [2024-09-15 07:29:49,577][02628] Saving new best policy, reward=13.582! [2024-09-15 07:29:51,960][02641] Updated weights for policy 0, policy_version 450 (0.0026) [2024-09-15 07:29:54,564][00574] Fps is (10 sec: 4505.2, 60 sec: 3754.6, 300 sec: 3929.4). Total num frames: 1851392. Throughput: 0: 937.9. Samples: 462228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:29:54,568][00574] Avg episode reward: [(0, '13.315')] [2024-09-15 07:29:59,564][00574] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 1867776. Throughput: 0: 915.5. Samples: 467164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:29:59,568][00574] Avg episode reward: [(0, '14.037')] [2024-09-15 07:29:59,584][02628] Saving new best policy, reward=14.037! 
[2024-09-15 07:30:03,584][02641] Updated weights for policy 0, policy_version 460 (0.0020) [2024-09-15 07:30:04,564][00574] Fps is (10 sec: 3686.7, 60 sec: 3686.4, 300 sec: 3915.5). Total num frames: 1888256. Throughput: 0: 885.9. Samples: 469404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:30:04,566][00574] Avg episode reward: [(0, '14.849')] [2024-09-15 07:30:04,576][02628] Saving new best policy, reward=14.849! [2024-09-15 07:30:09,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 1908736. Throughput: 0: 912.6. Samples: 476036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:30:09,567][00574] Avg episode reward: [(0, '15.805')] [2024-09-15 07:30:09,577][02628] Saving new best policy, reward=15.805! [2024-09-15 07:30:13,599][02641] Updated weights for policy 0, policy_version 470 (0.0020) [2024-09-15 07:30:14,564][00574] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 1925120. Throughput: 0: 945.3. Samples: 481894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:30:14,567][00574] Avg episode reward: [(0, '17.061')] [2024-09-15 07:30:14,568][02628] Saving new best policy, reward=17.061! [2024-09-15 07:30:19,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 1941504. Throughput: 0: 922.6. Samples: 484018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:30:19,566][00574] Avg episode reward: [(0, '18.078')] [2024-09-15 07:30:19,579][02628] Saving new best policy, reward=18.078! [2024-09-15 07:30:23,880][02641] Updated weights for policy 0, policy_version 480 (0.0024) [2024-09-15 07:30:24,563][00574] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 1966080. Throughput: 0: 942.7. Samples: 490702. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:30:24,566][00574] Avg episode reward: [(0, '16.221')] [2024-09-15 07:30:29,565][00574] Fps is (10 sec: 4505.0, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 1986560. Throughput: 0: 1005.1. Samples: 497558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:30:29,569][00574] Avg episode reward: [(0, '13.666')] [2024-09-15 07:30:34,564][00574] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3873.8). Total num frames: 2002944. Throughput: 0: 978.2. Samples: 499670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:30:34,571][00574] Avg episode reward: [(0, '13.775')] [2024-09-15 07:30:35,309][02641] Updated weights for policy 0, policy_version 490 (0.0023) [2024-09-15 07:30:39,564][00574] Fps is (10 sec: 3686.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 2023424. Throughput: 0: 949.2. Samples: 504940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:30:39,567][00574] Avg episode reward: [(0, '13.526')] [2024-09-15 07:30:44,060][02641] Updated weights for policy 0, policy_version 500 (0.0047) [2024-09-15 07:30:44,564][00574] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2048000. Throughput: 0: 1002.4. Samples: 512270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:30:44,566][00574] Avg episode reward: [(0, '14.830')] [2024-09-15 07:30:49,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2064384. Throughput: 0: 1013.5. Samples: 515010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:30:49,577][00574] Avg episode reward: [(0, '15.429')] [2024-09-15 07:30:54,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3887.7). Total num frames: 2084864. Throughput: 0: 974.5. Samples: 519890. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:30:54,570][00574] Avg episode reward: [(0, '15.393')] [2024-09-15 07:30:55,290][02641] Updated weights for policy 0, policy_version 510 (0.0031) [2024-09-15 07:30:59,565][00574] Fps is (10 sec: 4505.2, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2109440. Throughput: 0: 1005.6. Samples: 527146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:30:59,571][00574] Avg episode reward: [(0, '16.999')] [2024-09-15 07:31:04,565][00574] Fps is (10 sec: 4095.2, 60 sec: 3959.4, 300 sec: 3887.7). Total num frames: 2125824. Throughput: 0: 1034.7. Samples: 530580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-15 07:31:04,570][00574] Avg episode reward: [(0, '17.732')] [2024-09-15 07:31:05,253][02641] Updated weights for policy 0, policy_version 520 (0.0038) [2024-09-15 07:31:09,564][00574] Fps is (10 sec: 3277.1, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 2142208. Throughput: 0: 981.2. Samples: 534856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:31:09,566][00574] Avg episode reward: [(0, '17.587')] [2024-09-15 07:31:14,564][00574] Fps is (10 sec: 4096.8, 60 sec: 4027.8, 300 sec: 3901.6). Total num frames: 2166784. Throughput: 0: 977.5. Samples: 541542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-15 07:31:14,566][00574] Avg episode reward: [(0, '18.626')] [2024-09-15 07:31:14,568][02628] Saving new best policy, reward=18.626! [2024-09-15 07:31:15,339][02641] Updated weights for policy 0, policy_version 530 (0.0020) [2024-09-15 07:31:19,564][00574] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 2187264. Throughput: 0: 1009.1. Samples: 545078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:31:19,566][00574] Avg episode reward: [(0, '17.891')] [2024-09-15 07:31:19,579][02628] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000534_2187264.pth... 
[2024-09-15 07:31:19,731][02628] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth [2024-09-15 07:31:24,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2203648. Throughput: 0: 1007.4. Samples: 550274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:31:24,570][00574] Avg episode reward: [(0, '17.803')] [2024-09-15 07:31:26,874][02641] Updated weights for policy 0, policy_version 540 (0.0017) [2024-09-15 07:31:29,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3887.8). Total num frames: 2224128. Throughput: 0: 964.4. Samples: 555668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:31:29,568][00574] Avg episode reward: [(0, '16.929')] [2024-09-15 07:31:34,564][00574] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3901.6). Total num frames: 2248704. Throughput: 0: 984.5. Samples: 559314. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-15 07:31:34,568][00574] Avg episode reward: [(0, '15.925')] [2024-09-15 07:31:35,474][02641] Updated weights for policy 0, policy_version 550 (0.0038) [2024-09-15 07:31:39,564][00574] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3873.8). Total num frames: 2265088. Throughput: 0: 1018.5. Samples: 565724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:31:39,568][00574] Avg episode reward: [(0, '16.470')] [2024-09-15 07:31:44,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2281472. Throughput: 0: 959.1. Samples: 570304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:31:44,571][00574] Avg episode reward: [(0, '15.726')] [2024-09-15 07:31:46,943][02641] Updated weights for policy 0, policy_version 560 (0.0030) [2024-09-15 07:31:49,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2301952. Throughput: 0: 962.7. Samples: 573898. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-15 07:31:49,570][00574] Avg episode reward: [(0, '17.564')] [2024-09-15 07:31:54,565][00574] Fps is (10 sec: 4095.3, 60 sec: 3959.3, 300 sec: 3873.9). Total num frames: 2322432. Throughput: 0: 1014.9. Samples: 580530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:31:54,570][00574] Avg episode reward: [(0, '17.925')] [2024-09-15 07:31:58,186][02641] Updated weights for policy 0, policy_version 570 (0.0038) [2024-09-15 07:31:59,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2334720. Throughput: 0: 955.3. Samples: 584532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:31:59,568][00574] Avg episode reward: [(0, '17.230')] [2024-09-15 07:32:04,564][00574] Fps is (10 sec: 3687.1, 60 sec: 3891.3, 300 sec: 3873.8). Total num frames: 2359296. Throughput: 0: 946.3. Samples: 587662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:32:04,571][00574] Avg episode reward: [(0, '18.027')] [2024-09-15 07:32:07,549][02641] Updated weights for policy 0, policy_version 580 (0.0042) [2024-09-15 07:32:09,564][00574] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 2383872. Throughput: 0: 984.6. Samples: 594582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:32:09,570][00574] Avg episode reward: [(0, '17.159')] [2024-09-15 07:32:14,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2396160. Throughput: 0: 974.0. Samples: 599500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:32:14,569][00574] Avg episode reward: [(0, '16.875')] [2024-09-15 07:32:19,564][00574] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2412544. Throughput: 0: 937.0. Samples: 601478. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:32:19,571][00574] Avg episode reward: [(0, '17.416')] [2024-09-15 07:32:19,824][02641] Updated weights for policy 0, policy_version 590 (0.0033) [2024-09-15 07:32:24,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2433024. Throughput: 0: 930.8. Samples: 607612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:32:24,566][00574] Avg episode reward: [(0, '16.747')] [2024-09-15 07:32:29,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2449408. Throughput: 0: 951.9. Samples: 613138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:32:29,573][00574] Avg episode reward: [(0, '16.638')] [2024-09-15 07:32:31,540][02641] Updated weights for policy 0, policy_version 600 (0.0016) [2024-09-15 07:32:34,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 2465792. Throughput: 0: 913.4. Samples: 615002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:32:34,570][00574] Avg episode reward: [(0, '17.974')] [2024-09-15 07:32:39,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2490368. Throughput: 0: 901.3. Samples: 621088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-15 07:32:39,570][00574] Avg episode reward: [(0, '17.127')] [2024-09-15 07:32:41,373][02641] Updated weights for policy 0, policy_version 610 (0.0021) [2024-09-15 07:32:44,565][00574] Fps is (10 sec: 4505.2, 60 sec: 3822.9, 300 sec: 3859.9). Total num frames: 2510848. Throughput: 0: 962.2. Samples: 627834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:32:44,569][00574] Avg episode reward: [(0, '17.846')] [2024-09-15 07:32:49,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 2523136. Throughput: 0: 940.8. Samples: 629998. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:32:49,568][00574] Avg episode reward: [(0, '17.177')] [2024-09-15 07:32:52,865][02641] Updated weights for policy 0, policy_version 620 (0.0016) [2024-09-15 07:32:54,564][00574] Fps is (10 sec: 3277.1, 60 sec: 3686.5, 300 sec: 3832.2). Total num frames: 2543616. Throughput: 0: 901.6. Samples: 635156. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:32:54,569][00574] Avg episode reward: [(0, '16.976')] [2024-09-15 07:32:59,564][00574] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2568192. Throughput: 0: 942.0. Samples: 641890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:32:59,570][00574] Avg episode reward: [(0, '17.787')] [2024-09-15 07:33:02,593][02641] Updated weights for policy 0, policy_version 630 (0.0032) [2024-09-15 07:33:04,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2584576. Throughput: 0: 969.1. Samples: 645086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:33:04,569][00574] Avg episode reward: [(0, '16.934')] [2024-09-15 07:33:09,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 2600960. Throughput: 0: 933.2. Samples: 649608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:33:09,565][00574] Avg episode reward: [(0, '17.332')] [2024-09-15 07:33:13,119][02641] Updated weights for policy 0, policy_version 640 (0.0029) [2024-09-15 07:33:14,563][00574] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2625536. Throughput: 0: 969.3. Samples: 656756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:33:14,571][00574] Avg episode reward: [(0, '17.241')] [2024-09-15 07:33:19,564][00574] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2646016. Throughput: 0: 1008.6. Samples: 660388. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-15 07:33:19,566][00574] Avg episode reward: [(0, '18.949')] [2024-09-15 07:33:19,578][02628] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000646_2646016.pth... [2024-09-15 07:33:19,781][02628] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000422_1728512.pth [2024-09-15 07:33:19,819][02628] Saving new best policy, reward=18.949! [2024-09-15 07:33:24,377][02641] Updated weights for policy 0, policy_version 650 (0.0032) [2024-09-15 07:33:24,564][00574] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2662400. Throughput: 0: 975.6. Samples: 664990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:33:24,566][00574] Avg episode reward: [(0, '18.457')] [2024-09-15 07:33:29,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2686976. Throughput: 0: 970.5. Samples: 671504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:33:29,571][00574] Avg episode reward: [(0, '19.883')] [2024-09-15 07:33:29,580][02628] Saving new best policy, reward=19.883! [2024-09-15 07:33:32,877][02641] Updated weights for policy 0, policy_version 660 (0.0023) [2024-09-15 07:33:34,566][00574] Fps is (10 sec: 4504.4, 60 sec: 4027.5, 300 sec: 3859.9). Total num frames: 2707456. Throughput: 0: 999.5. Samples: 674980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:33:34,570][00574] Avg episode reward: [(0, '18.012')] [2024-09-15 07:33:39,564][00574] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2723840. Throughput: 0: 1011.3. Samples: 680664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:33:39,570][00574] Avg episode reward: [(0, '17.056')] [2024-09-15 07:33:44,146][02641] Updated weights for policy 0, policy_version 670 (0.0026) [2024-09-15 07:33:44,564][00574] Fps is (10 sec: 3687.4, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 2744320. 
Throughput: 0: 983.4. Samples: 686144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:33:44,565][00574] Avg episode reward: [(0, '16.045')] [2024-09-15 07:33:49,564][00574] Fps is (10 sec: 4505.8, 60 sec: 4096.0, 300 sec: 3873.8). Total num frames: 2768896. Throughput: 0: 993.2. Samples: 689782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:33:49,570][00574] Avg episode reward: [(0, '15.855')] [2024-09-15 07:33:53,106][02641] Updated weights for policy 0, policy_version 680 (0.0016) [2024-09-15 07:33:54,566][00574] Fps is (10 sec: 4504.7, 60 sec: 4095.9, 300 sec: 3873.8). Total num frames: 2789376. Throughput: 0: 1043.6. Samples: 696574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:33:54,568][00574] Avg episode reward: [(0, '15.949')] [2024-09-15 07:33:59,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2805760. Throughput: 0: 986.7. Samples: 701158. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-15 07:33:59,570][00574] Avg episode reward: [(0, '17.729')] [2024-09-15 07:34:03,626][02641] Updated weights for policy 0, policy_version 690 (0.0033) [2024-09-15 07:34:04,564][00574] Fps is (10 sec: 4096.8, 60 sec: 4096.0, 300 sec: 3901.6). Total num frames: 2830336. Throughput: 0: 987.1. Samples: 704808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:34:04,566][00574] Avg episode reward: [(0, '18.543')] [2024-09-15 07:34:09,564][00574] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3887.7). Total num frames: 2850816. Throughput: 0: 1047.0. Samples: 712104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:34:09,572][00574] Avg episode reward: [(0, '18.925')] [2024-09-15 07:34:14,568][00574] Fps is (10 sec: 3275.2, 60 sec: 3959.2, 300 sec: 3859.9). Total num frames: 2863104. Throughput: 0: 1003.9. Samples: 716684. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:34:14,574][00574] Avg episode reward: [(0, '19.812')] [2024-09-15 07:34:14,635][02641] Updated weights for policy 0, policy_version 700 (0.0025) [2024-09-15 07:34:19,564][00574] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 2887680. Throughput: 0: 992.8. Samples: 719654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:34:19,573][00574] Avg episode reward: [(0, '19.560')] [2024-09-15 07:34:23,314][02641] Updated weights for policy 0, policy_version 710 (0.0026) [2024-09-15 07:34:24,564][00574] Fps is (10 sec: 4917.4, 60 sec: 4164.3, 300 sec: 3915.5). Total num frames: 2912256. Throughput: 0: 1028.0. Samples: 726922. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-15 07:34:24,566][00574] Avg episode reward: [(0, '18.559')] [2024-09-15 07:34:29,564][00574] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2928640. Throughput: 0: 1033.6. Samples: 732658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:34:29,568][00574] Avg episode reward: [(0, '19.027')] [2024-09-15 07:34:34,467][02641] Updated weights for policy 0, policy_version 720 (0.0035) [2024-09-15 07:34:34,563][00574] Fps is (10 sec: 3686.5, 60 sec: 4027.9, 300 sec: 3915.5). Total num frames: 2949120. Throughput: 0: 1000.6. Samples: 734808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:34:34,570][00574] Avg episode reward: [(0, '19.302')] [2024-09-15 07:34:39,564][00574] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3957.2). Total num frames: 2973696. Throughput: 0: 1009.1. Samples: 741982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:34:39,566][00574] Avg episode reward: [(0, '18.096')] [2024-09-15 07:34:43,291][02641] Updated weights for policy 0, policy_version 730 (0.0037) [2024-09-15 07:34:44,564][00574] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3929.4). Total num frames: 2990080. Throughput: 0: 1048.8. 
Samples: 748356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:34:44,569][00574] Avg episode reward: [(0, '19.452')] [2024-09-15 07:34:49,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3006464. Throughput: 0: 1014.7. Samples: 750468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:34:49,568][00574] Avg episode reward: [(0, '20.967')] [2024-09-15 07:34:49,578][02628] Saving new best policy, reward=20.967! [2024-09-15 07:34:54,197][02641] Updated weights for policy 0, policy_version 740 (0.0046) [2024-09-15 07:34:54,564][00574] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3943.3). Total num frames: 3031040. Throughput: 0: 989.2. Samples: 756616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:34:54,565][00574] Avg episode reward: [(0, '20.953')] [2024-09-15 07:34:59,567][00574] Fps is (10 sec: 4503.8, 60 sec: 4095.7, 300 sec: 3943.2). Total num frames: 3051520. Throughput: 0: 1037.4. Samples: 763368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:34:59,571][00574] Avg episode reward: [(0, '20.558')] [2024-09-15 07:35:04,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3067904. Throughput: 0: 1021.4. Samples: 765616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:35:04,565][00574] Avg episode reward: [(0, '21.421')] [2024-09-15 07:35:04,577][02628] Saving new best policy, reward=21.421! [2024-09-15 07:35:05,708][02641] Updated weights for policy 0, policy_version 750 (0.0016) [2024-09-15 07:35:09,564][00574] Fps is (10 sec: 3687.8, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3088384. Throughput: 0: 973.6. Samples: 770736. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:35:09,572][00574] Avg episode reward: [(0, '22.517')] [2024-09-15 07:35:09,583][02628] Saving new best policy, reward=22.517! 
[2024-09-15 07:35:14,449][02641] Updated weights for policy 0, policy_version 760 (0.0026) [2024-09-15 07:35:14,563][00574] Fps is (10 sec: 4505.6, 60 sec: 4164.6, 300 sec: 3971.0). Total num frames: 3112960. Throughput: 0: 1004.4. Samples: 777858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:35:14,566][00574] Avg episode reward: [(0, '22.210')] [2024-09-15 07:35:19,564][00574] Fps is (10 sec: 4095.8, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3129344. Throughput: 0: 1032.0. Samples: 781250. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:35:19,574][00574] Avg episode reward: [(0, '22.794')] [2024-09-15 07:35:19,587][02628] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000764_3129344.pth... [2024-09-15 07:35:19,742][02628] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000534_2187264.pth [2024-09-15 07:35:19,768][02628] Saving new best policy, reward=22.794! [2024-09-15 07:35:24,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3145728. Throughput: 0: 961.5. Samples: 785250. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-15 07:35:24,565][00574] Avg episode reward: [(0, '24.109')] [2024-09-15 07:35:24,576][02628] Saving new best policy, reward=24.109! [2024-09-15 07:35:26,316][02641] Updated weights for policy 0, policy_version 770 (0.0033) [2024-09-15 07:35:29,564][00574] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3166208. Throughput: 0: 965.6. Samples: 791810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:35:29,565][00574] Avg episode reward: [(0, '23.070')] [2024-09-15 07:35:34,563][00574] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3186688. Throughput: 0: 996.2. Samples: 795298. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:35:34,568][00574] Avg episode reward: [(0, '21.821')] [2024-09-15 07:35:36,237][02641] Updated weights for policy 0, policy_version 780 (0.0045) [2024-09-15 07:35:39,564][00574] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3203072. Throughput: 0: 972.3. Samples: 800368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:35:39,567][00574] Avg episode reward: [(0, '20.713')] [2024-09-15 07:35:44,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3227648. Throughput: 0: 957.4. Samples: 806446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:35:44,566][00574] Avg episode reward: [(0, '19.518')] [2024-09-15 07:35:46,275][02641] Updated weights for policy 0, policy_version 790 (0.0020) [2024-09-15 07:35:49,564][00574] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3248128. Throughput: 0: 988.5. Samples: 810098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:35:49,565][00574] Avg episode reward: [(0, '19.513')] [2024-09-15 07:35:54,567][00574] Fps is (10 sec: 3685.1, 60 sec: 3891.0, 300 sec: 3915.5). Total num frames: 3264512. Throughput: 0: 1001.5. Samples: 815808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:35:54,571][00574] Avg episode reward: [(0, '20.248')] [2024-09-15 07:35:57,688][02641] Updated weights for policy 0, policy_version 800 (0.0046) [2024-09-15 07:35:59,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3891.5, 300 sec: 3929.4). Total num frames: 3284992. Throughput: 0: 959.9. Samples: 821052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:35:59,569][00574] Avg episode reward: [(0, '20.010')] [2024-09-15 07:36:04,564][00574] Fps is (10 sec: 4507.1, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3309568. Throughput: 0: 964.5. Samples: 824654. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:36:04,566][00574] Avg episode reward: [(0, '20.491')] [2024-09-15 07:36:06,195][02641] Updated weights for policy 0, policy_version 810 (0.0026) [2024-09-15 07:36:09,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3325952. Throughput: 0: 1022.9. Samples: 831282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:36:09,571][00574] Avg episode reward: [(0, '19.955')] [2024-09-15 07:36:14,564][00574] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 3338240. Throughput: 0: 955.5. Samples: 834806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:36:14,571][00574] Avg episode reward: [(0, '21.459')] [2024-09-15 07:36:19,245][02641] Updated weights for policy 0, policy_version 820 (0.0017) [2024-09-15 07:36:19,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3915.5). Total num frames: 3358720. Throughput: 0: 942.8. Samples: 837724. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:36:19,571][00574] Avg episode reward: [(0, '20.719')] [2024-09-15 07:36:24,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3375104. Throughput: 0: 962.1. Samples: 843662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-15 07:36:24,571][00574] Avg episode reward: [(0, '21.169')] [2024-09-15 07:36:29,564][00574] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 3387392. Throughput: 0: 913.0. Samples: 847530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:36:29,568][00574] Avg episode reward: [(0, '20.802')] [2024-09-15 07:36:32,386][02641] Updated weights for policy 0, policy_version 830 (0.0022) [2024-09-15 07:36:34,564][00574] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3860.0). Total num frames: 3403776. Throughput: 0: 880.6. Samples: 849724. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:36:34,571][00574] Avg episode reward: [(0, '20.589')] [2024-09-15 07:36:39,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 3424256. Throughput: 0: 884.6. Samples: 855610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-15 07:36:39,573][00574] Avg episode reward: [(0, '20.025')] [2024-09-15 07:36:44,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3846.1). Total num frames: 3436544. Throughput: 0: 869.6. Samples: 860182. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-15 07:36:44,568][00574] Avg episode reward: [(0, '19.495')] [2024-09-15 07:36:44,576][02641] Updated weights for policy 0, policy_version 840 (0.0025) [2024-09-15 07:36:49,564][00574] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3832.2). Total num frames: 3452928. Throughput: 0: 827.5. Samples: 861892. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:36:49,566][00574] Avg episode reward: [(0, '19.122')] [2024-09-15 07:36:54,564][00574] Fps is (10 sec: 3686.3, 60 sec: 3481.8, 300 sec: 3860.0). Total num frames: 3473408. Throughput: 0: 801.6. Samples: 867354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:36:54,566][00574] Avg episode reward: [(0, '19.640')] [2024-09-15 07:36:56,324][02641] Updated weights for policy 0, policy_version 850 (0.0038) [2024-09-15 07:36:59,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3832.2). Total num frames: 3489792. Throughput: 0: 846.7. Samples: 872906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:36:59,569][00574] Avg episode reward: [(0, '21.257')] [2024-09-15 07:37:04,564][00574] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3790.5). Total num frames: 3502080. Throughput: 0: 821.0. Samples: 874670. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:37:04,566][00574] Avg episode reward: [(0, '21.758')] [2024-09-15 07:37:09,527][02641] Updated weights for policy 0, policy_version 860 (0.0016) [2024-09-15 07:37:09,564][00574] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3818.3). Total num frames: 3522560. Throughput: 0: 793.5. Samples: 879368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:37:09,566][00574] Avg episode reward: [(0, '22.324')] [2024-09-15 07:37:14,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3818.3). Total num frames: 3538944. Throughput: 0: 839.3. Samples: 885300. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:37:14,567][00574] Avg episode reward: [(0, '22.090')] [2024-09-15 07:37:19,564][00574] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3790.5). Total num frames: 3551232. Throughput: 0: 839.4. Samples: 887496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-15 07:37:19,570][00574] Avg episode reward: [(0, '22.108')] [2024-09-15 07:37:19,582][02628] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000867_3551232.pth... [2024-09-15 07:37:19,789][02628] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000646_2646016.pth [2024-09-15 07:37:22,759][02641] Updated weights for policy 0, policy_version 870 (0.0023) [2024-09-15 07:37:24,564][00574] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3790.5). Total num frames: 3567616. Throughput: 0: 796.7. Samples: 891460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:37:24,573][00574] Avg episode reward: [(0, '21.971')] [2024-09-15 07:37:29,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3804.4). Total num frames: 3588096. Throughput: 0: 829.2. Samples: 897494. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:37:29,568][00574] Avg episode reward: [(0, '20.880')] [2024-09-15 07:37:33,722][02641] Updated weights for policy 0, policy_version 880 (0.0034) [2024-09-15 07:37:34,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3776.7). Total num frames: 3604480. Throughput: 0: 855.4. Samples: 900384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:37:34,566][00574] Avg episode reward: [(0, '21.265')] [2024-09-15 07:37:39,564][00574] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3748.9). Total num frames: 3616768. Throughput: 0: 812.9. Samples: 903932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:37:39,571][00574] Avg episode reward: [(0, '21.461')] [2024-09-15 07:37:44,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3776.7). Total num frames: 3637248. Throughput: 0: 813.1. Samples: 909496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:37:44,571][00574] Avg episode reward: [(0, '22.460')] [2024-09-15 07:37:46,171][02641] Updated weights for policy 0, policy_version 890 (0.0025) [2024-09-15 07:37:49,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3762.8). Total num frames: 3653632. Throughput: 0: 840.9. Samples: 912510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:37:49,569][00574] Avg episode reward: [(0, '23.266')] [2024-09-15 07:37:54,564][00574] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3721.1). Total num frames: 3665920. Throughput: 0: 830.6. Samples: 916746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:37:54,570][00574] Avg episode reward: [(0, '24.368')] [2024-09-15 07:37:54,574][02628] Saving new best policy, reward=24.368! [2024-09-15 07:37:59,207][02641] Updated weights for policy 0, policy_version 900 (0.0021) [2024-09-15 07:37:59,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3735.0). Total num frames: 3686400. Throughput: 0: 809.4. Samples: 921724. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:37:59,569][00574] Avg episode reward: [(0, '25.682')] [2024-09-15 07:37:59,578][02628] Saving new best policy, reward=25.682! [2024-09-15 07:38:04,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3748.9). Total num frames: 3706880. Throughput: 0: 828.3. Samples: 924768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:38:04,566][00574] Avg episode reward: [(0, '27.593')] [2024-09-15 07:38:04,577][02628] Saving new best policy, reward=27.593! [2024-09-15 07:38:09,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3707.2). Total num frames: 3719168. Throughput: 0: 855.2. Samples: 929942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:38:09,566][00574] Avg episode reward: [(0, '27.397')] [2024-09-15 07:38:11,665][02641] Updated weights for policy 0, policy_version 910 (0.0014) [2024-09-15 07:38:14,564][00574] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3693.3). Total num frames: 3735552. Throughput: 0: 814.3. Samples: 934138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:38:14,567][00574] Avg episode reward: [(0, '28.041')] [2024-09-15 07:38:14,571][02628] Saving new best policy, reward=28.041! [2024-09-15 07:38:19,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3707.2). Total num frames: 3756032. Throughput: 0: 817.5. Samples: 937170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:38:19,571][00574] Avg episode reward: [(0, '26.807')] [2024-09-15 07:38:22,107][02641] Updated weights for policy 0, policy_version 920 (0.0022) [2024-09-15 07:38:24,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3679.5). Total num frames: 3772416. Throughput: 0: 870.2. Samples: 943092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:38:24,566][00574] Avg episode reward: [(0, '25.559')] [2024-09-15 07:38:29,564][00574] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3651.7). Total num frames: 3784704. 
Throughput: 0: 826.4. Samples: 946682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:38:29,566][00574] Avg episode reward: [(0, '24.095')] [2024-09-15 07:38:34,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3665.6). Total num frames: 3805184. Throughput: 0: 820.5. Samples: 949434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:38:34,571][00574] Avg episode reward: [(0, '22.084')] [2024-09-15 07:38:35,292][02641] Updated weights for policy 0, policy_version 930 (0.0022) [2024-09-15 07:38:39,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3665.6). Total num frames: 3825664. Throughput: 0: 858.7. Samples: 955386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:38:39,569][00574] Avg episode reward: [(0, '22.434')] [2024-09-15 07:38:44,564][00574] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3610.0). Total num frames: 3833856. Throughput: 0: 838.0. Samples: 959436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:38:44,565][00574] Avg episode reward: [(0, '22.097')] [2024-09-15 07:38:48,655][02641] Updated weights for policy 0, policy_version 940 (0.0049) [2024-09-15 07:38:49,564][00574] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3596.2). Total num frames: 3850240. Throughput: 0: 811.0. Samples: 961264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:38:49,570][00574] Avg episode reward: [(0, '22.807')] [2024-09-15 07:38:54,564][00574] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3610.0). Total num frames: 3870720. Throughput: 0: 822.9. Samples: 966972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:38:54,572][00574] Avg episode reward: [(0, '23.072')] [2024-09-15 07:38:59,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3568.4). Total num frames: 3883008. Throughput: 0: 836.9. Samples: 971800. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:38:59,571][00574] Avg episode reward: [(0, '23.694')] [2024-09-15 07:39:01,705][02641] Updated weights for policy 0, policy_version 950 (0.0027) [2024-09-15 07:39:04,564][00574] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3540.6). Total num frames: 3895296. Throughput: 0: 805.1. Samples: 973398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:39:04,572][00574] Avg episode reward: [(0, '24.149')] [2024-09-15 07:39:09,564][00574] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3568.4). Total num frames: 3915776. Throughput: 0: 787.7. Samples: 978538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-15 07:39:09,571][00574] Avg episode reward: [(0, '24.773')] [2024-09-15 07:39:12,836][02641] Updated weights for policy 0, policy_version 960 (0.0033) [2024-09-15 07:39:14,564][00574] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3554.5). Total num frames: 3936256. Throughput: 0: 837.8. Samples: 984382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-15 07:39:14,568][00574] Avg episode reward: [(0, '24.100')] [2024-09-15 07:39:19,564][00574] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 3512.8). Total num frames: 3948544. Throughput: 0: 815.6. Samples: 986138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-15 07:39:19,570][00574] Avg episode reward: [(0, '23.534')] [2024-09-15 07:39:19,587][02628] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000964_3948544.pth... [2024-09-15 07:39:19,786][02628] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000764_3129344.pth [2024-09-15 07:39:24,564][00574] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3512.8). Total num frames: 3964928. Throughput: 0: 782.0. Samples: 990574. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-15 07:39:24,573][00574] Avg episode reward: [(0, '24.626')] [2024-09-15 07:39:26,003][02641] Updated weights for policy 0, policy_version 970 (0.0026) [2024-09-15 07:39:29,564][00574] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 3985408. Throughput: 0: 830.6. Samples: 996812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-15 07:39:29,572][00574] Avg episode reward: [(0, '24.627')] [2024-09-15 07:39:34,566][00574] Fps is (10 sec: 3685.6, 60 sec: 3276.7, 300 sec: 3485.0). Total num frames: 4001792. Throughput: 0: 847.8. Samples: 999416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-15 07:39:34,572][00574] Avg episode reward: [(0, '24.872')] [2024-09-15 07:39:36,219][02628] Stopping Batcher_0... [2024-09-15 07:39:36,220][02628] Loop batcher_evt_loop terminating... [2024-09-15 07:39:36,220][00574] Component Batcher_0 stopped! [2024-09-15 07:39:36,230][02628] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-15 07:39:36,349][02641] Weights refcount: 2 0 [2024-09-15 07:39:36,362][02641] Stopping InferenceWorker_p0-w0... [2024-09-15 07:39:36,363][02641] Loop inference_proc0-0_evt_loop terminating... [2024-09-15 07:39:36,362][00574] Component InferenceWorker_p0-w0 stopped! [2024-09-15 07:39:36,433][02628] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000867_3551232.pth [2024-09-15 07:39:36,455][02628] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-15 07:39:36,736][02628] Stopping LearnerWorker_p0... [2024-09-15 07:39:36,741][02628] Loop learner_proc0_evt_loop terminating... [2024-09-15 07:39:36,740][00574] Component LearnerWorker_p0 stopped! [2024-09-15 07:39:37,007][00574] Component RolloutWorker_w6 stopped! [2024-09-15 07:39:37,010][02652] Stopping RolloutWorker_w6... 
[2024-09-15 07:39:37,016][02652] Loop rollout_proc6_evt_loop terminating... [2024-09-15 07:39:37,092][00574] Component RolloutWorker_w0 stopped! [2024-09-15 07:39:37,094][02643] Stopping RolloutWorker_w0... [2024-09-15 07:39:37,099][02643] Loop rollout_proc0_evt_loop terminating... [2024-09-15 07:39:37,106][00574] Component RolloutWorker_w5 stopped! [2024-09-15 07:39:37,113][02651] Stopping RolloutWorker_w5... [2024-09-15 07:39:37,114][02651] Loop rollout_proc5_evt_loop terminating... [2024-09-15 07:39:37,115][00574] Component RolloutWorker_w1 stopped! [2024-09-15 07:39:37,121][02642] Stopping RolloutWorker_w1... [2024-09-15 07:39:37,122][02642] Loop rollout_proc1_evt_loop terminating... [2024-09-15 07:39:37,124][02644] Stopping RolloutWorker_w2... [2024-09-15 07:39:37,124][00574] Component RolloutWorker_w2 stopped! [2024-09-15 07:39:37,135][02650] Stopping RolloutWorker_w4... [2024-09-15 07:39:37,139][00574] Component RolloutWorker_w4 stopped! [2024-09-15 07:39:37,137][02644] Loop rollout_proc2_evt_loop terminating... [2024-09-15 07:39:37,136][02650] Loop rollout_proc4_evt_loop terminating... [2024-09-15 07:39:37,203][00574] Component RolloutWorker_w7 stopped! [2024-09-15 07:39:37,205][02653] Stopping RolloutWorker_w7... [2024-09-15 07:39:37,205][02653] Loop rollout_proc7_evt_loop terminating... [2024-09-15 07:39:37,223][00574] Component RolloutWorker_w3 stopped! [2024-09-15 07:39:37,229][02649] Stopping RolloutWorker_w3... [2024-09-15 07:39:37,225][00574] Waiting for process learner_proc0 to stop... [2024-09-15 07:39:37,232][02649] Loop rollout_proc3_evt_loop terminating... [2024-09-15 07:39:38,612][00574] Waiting for process inference_proc0-0 to join... [2024-09-15 07:39:38,617][00574] Waiting for process rollout_proc0 to join... [2024-09-15 07:39:40,899][00574] Waiting for process rollout_proc1 to join... [2024-09-15 07:39:40,930][00574] Waiting for process rollout_proc2 to join... [2024-09-15 07:39:40,936][00574] Waiting for process rollout_proc3 to join... 
[2024-09-15 07:39:40,939][00574] Waiting for process rollout_proc4 to join... [2024-09-15 07:39:40,942][00574] Waiting for process rollout_proc5 to join... [2024-09-15 07:39:40,945][00574] Waiting for process rollout_proc6 to join... [2024-09-15 07:39:40,950][00574] Waiting for process rollout_proc7 to join... [2024-09-15 07:39:40,954][00574] Batcher 0 profile tree view: batching: 28.3104, releasing_batches: 0.0295 [2024-09-15 07:39:40,957][00574] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0001 wait_policy_total: 386.8653 update_model: 9.8698 weight_update: 0.0024 one_step: 0.0086 handle_policy_step: 625.7522 deserialize: 15.5274, stack: 3.3945, obs_to_device_normalize: 127.0558, forward: 334.6049, send_messages: 30.8777 prepare_outputs: 83.7885 to_cpu: 47.4278 [2024-09-15 07:39:40,961][00574] Learner 0 profile tree view: misc: 0.0054, prepare_batch: 13.1903 train: 74.2293 epoch_init: 0.0079, minibatch_init: 0.0141, losses_postprocess: 0.6024, kl_divergence: 0.6824, after_optimizer: 32.9188 calculate_losses: 26.9322 losses_init: 0.0036, forward_head: 1.3269, bptt_initial: 18.3804, tail: 1.0815, advantages_returns: 0.2421, losses: 3.7359 bptt: 1.8583 bptt_forward_core: 1.7614 update: 12.4065 clip: 0.9392 [2024-09-15 07:39:40,963][00574] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.4282, enqueue_policy_requests: 93.5035, env_step: 828.6483, overhead: 14.0138, complete_rollouts: 6.9172 save_policy_outputs: 20.6893 split_output_tensors: 8.3578 [2024-09-15 07:39:40,966][00574] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3810, enqueue_policy_requests: 95.9075, env_step: 828.0321, overhead: 14.0744, complete_rollouts: 6.9114 save_policy_outputs: 20.7210 split_output_tensors: 8.4822 [2024-09-15 07:39:40,968][00574] Loop Runner_EvtLoop terminating... 
[2024-09-15 07:39:40,969][00574] Runner profile tree view: main_loop: 1095.4043 [2024-09-15 07:39:40,971][00574] Collected {0: 4005888}, FPS: 3657.0 [2024-09-15 07:39:40,998][00574] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-15 07:39:40,999][00574] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-15 07:39:41,001][00574] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-15 07:39:41,003][00574] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-15 07:39:41,005][00574] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-15 07:39:41,007][00574] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-15 07:39:41,008][00574] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-09-15 07:39:41,009][00574] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-15 07:39:41,010][00574] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-09-15 07:39:41,011][00574] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-09-15 07:39:41,012][00574] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-15 07:39:41,013][00574] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-15 07:39:41,014][00574] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-15 07:39:41,015][00574] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-09-15 07:39:41,016][00574] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-15 07:39:41,052][00574] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-15 07:39:41,055][00574] RunningMeanStd input shape: (3, 72, 128) [2024-09-15 07:39:41,057][00574] RunningMeanStd input shape: (1,) [2024-09-15 07:39:41,076][00574] ConvEncoder: input_channels=3 [2024-09-15 07:39:41,191][00574] Conv encoder output size: 512 [2024-09-15 07:39:41,193][00574] Policy head output size: 512 [2024-09-15 07:39:41,477][00574] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-15 07:39:42,346][00574] Num frames 100... [2024-09-15 07:39:42,476][00574] Num frames 200... [2024-09-15 07:39:42,612][00574] Num frames 300... [2024-09-15 07:39:42,738][00574] Num frames 400... [2024-09-15 07:39:42,867][00574] Num frames 500... [2024-09-15 07:39:42,993][00574] Num frames 600... [2024-09-15 07:39:43,135][00574] Avg episode rewards: #0: 11.720, true rewards: #0: 6.720 [2024-09-15 07:39:43,137][00574] Avg episode reward: 11.720, avg true_objective: 6.720 [2024-09-15 07:39:43,176][00574] Num frames 700... [2024-09-15 07:39:43,300][00574] Num frames 800... [2024-09-15 07:39:43,425][00574] Num frames 900... [2024-09-15 07:39:43,556][00574] Num frames 1000... [2024-09-15 07:39:43,690][00574] Num frames 1100... [2024-09-15 07:39:43,826][00574] Num frames 1200... [2024-09-15 07:39:43,954][00574] Num frames 1300... [2024-09-15 07:39:44,082][00574] Num frames 1400... [2024-09-15 07:39:44,208][00574] Num frames 1500... [2024-09-15 07:39:44,337][00574] Num frames 1600... [2024-09-15 07:39:44,466][00574] Num frames 1700... [2024-09-15 07:39:44,604][00574] Num frames 1800... [2024-09-15 07:39:44,770][00574] Num frames 1900... [2024-09-15 07:39:44,904][00574] Num frames 2000... [2024-09-15 07:39:45,035][00574] Num frames 2100... [2024-09-15 07:39:45,163][00574] Num frames 2200... 
[2024-09-15 07:39:45,293][00574] Num frames 2300... [2024-09-15 07:39:45,420][00574] Num frames 2400... [2024-09-15 07:39:45,551][00574] Num frames 2500... [2024-09-15 07:39:45,689][00574] Num frames 2600... [2024-09-15 07:39:45,825][00574] Num frames 2700... [2024-09-15 07:39:45,969][00574] Avg episode rewards: #0: 35.360, true rewards: #0: 13.860 [2024-09-15 07:39:45,971][00574] Avg episode reward: 35.360, avg true_objective: 13.860 [2024-09-15 07:39:46,010][00574] Num frames 2800... [2024-09-15 07:39:46,135][00574] Num frames 2900... [2024-09-15 07:39:46,259][00574] Num frames 3000... [2024-09-15 07:39:46,385][00574] Num frames 3100... [2024-09-15 07:39:46,515][00574] Num frames 3200... [2024-09-15 07:39:46,649][00574] Num frames 3300... [2024-09-15 07:39:46,798][00574] Num frames 3400... [2024-09-15 07:39:46,929][00574] Num frames 3500... [2024-09-15 07:39:47,060][00574] Num frames 3600... [2024-09-15 07:39:47,195][00574] Num frames 3700... [2024-09-15 07:39:47,375][00574] Avg episode rewards: #0: 29.320, true rewards: #0: 12.653 [2024-09-15 07:39:47,376][00574] Avg episode reward: 29.320, avg true_objective: 12.653 [2024-09-15 07:39:47,387][00574] Num frames 3800... [2024-09-15 07:39:47,578][00574] Num frames 3900... [2024-09-15 07:39:47,776][00574] Num frames 4000... [2024-09-15 07:39:47,960][00574] Num frames 4100... [2024-09-15 07:39:48,144][00574] Num frames 4200... [2024-09-15 07:39:48,329][00574] Num frames 4300... [2024-09-15 07:39:48,509][00574] Num frames 4400... [2024-09-15 07:39:48,691][00574] Num frames 4500... [2024-09-15 07:39:48,905][00574] Num frames 4600... [2024-09-15 07:39:49,054][00574] Avg episode rewards: #0: 26.867, true rewards: #0: 11.618 [2024-09-15 07:39:49,057][00574] Avg episode reward: 26.867, avg true_objective: 11.618 [2024-09-15 07:39:49,167][00574] Num frames 4700... [2024-09-15 07:39:49,367][00574] Num frames 4800... [2024-09-15 07:39:49,561][00574] Num frames 4900... [2024-09-15 07:39:49,755][00574] Num frames 5000... 
[2024-09-15 07:39:49,929][00574] Avg episode rewards: #0: 23.326, true rewards: #0: 10.126 [2024-09-15 07:39:49,931][00574] Avg episode reward: 23.326, avg true_objective: 10.126 [2024-09-15 07:39:49,984][00574] Num frames 5100... [2024-09-15 07:39:50,120][00574] Num frames 5200... [2024-09-15 07:39:50,249][00574] Num frames 5300... [2024-09-15 07:39:50,388][00574] Num frames 5400... [2024-09-15 07:39:50,527][00574] Num frames 5500... [2024-09-15 07:39:50,665][00574] Num frames 5600... [2024-09-15 07:39:50,813][00574] Num frames 5700... [2024-09-15 07:39:50,958][00574] Num frames 5800... [2024-09-15 07:39:51,095][00574] Num frames 5900... [2024-09-15 07:39:51,231][00574] Num frames 6000... [2024-09-15 07:39:51,367][00574] Num frames 6100... [2024-09-15 07:39:51,504][00574] Num frames 6200... [2024-09-15 07:39:51,586][00574] Avg episode rewards: #0: 23.525, true rewards: #0: 10.358 [2024-09-15 07:39:51,588][00574] Avg episode reward: 23.525, avg true_objective: 10.358 [2024-09-15 07:39:51,706][00574] Num frames 6300... [2024-09-15 07:39:51,851][00574] Num frames 6400... [2024-09-15 07:39:51,995][00574] Num frames 6500... [2024-09-15 07:39:52,136][00574] Num frames 6600... [2024-09-15 07:39:52,278][00574] Avg episode rewards: #0: 21.519, true rewards: #0: 9.519 [2024-09-15 07:39:52,280][00574] Avg episode reward: 21.519, avg true_objective: 9.519 [2024-09-15 07:39:52,338][00574] Num frames 6700... [2024-09-15 07:39:52,477][00574] Num frames 6800... [2024-09-15 07:39:52,624][00574] Num frames 6900... [2024-09-15 07:39:52,777][00574] Num frames 7000... [2024-09-15 07:39:52,924][00574] Num frames 7100... [2024-09-15 07:39:53,061][00574] Num frames 7200... [2024-09-15 07:39:53,198][00574] Num frames 7300... [2024-09-15 07:39:53,332][00574] Num frames 7400... 
[2024-09-15 07:39:53,434][00574] Avg episode rewards: #0: 21.289, true rewards: #0: 9.289 [2024-09-15 07:39:53,436][00574] Avg episode reward: 21.289, avg true_objective: 9.289 [2024-09-15 07:39:53,532][00574] Num frames 7500... [2024-09-15 07:39:53,666][00574] Num frames 7600... [2024-09-15 07:39:53,804][00574] Num frames 7700... [2024-09-15 07:39:53,931][00574] Num frames 7800... [2024-09-15 07:39:54,074][00574] Num frames 7900... [2024-09-15 07:39:54,216][00574] Num frames 8000... [2024-09-15 07:39:54,352][00574] Num frames 8100... [2024-09-15 07:39:54,454][00574] Avg episode rewards: #0: 20.699, true rewards: #0: 9.032 [2024-09-15 07:39:54,457][00574] Avg episode reward: 20.699, avg true_objective: 9.032 [2024-09-15 07:39:54,554][00574] Num frames 8200... [2024-09-15 07:39:54,689][00574] Num frames 8300... [2024-09-15 07:39:54,832][00574] Num frames 8400... [2024-09-15 07:39:54,971][00574] Num frames 8500... [2024-09-15 07:39:55,116][00574] Num frames 8600... [2024-09-15 07:39:55,252][00574] Num frames 8700... [2024-09-15 07:39:55,360][00574] Avg episode rewards: #0: 19.537, true rewards: #0: 8.737 [2024-09-15 07:39:55,361][00574] Avg episode reward: 19.537, avg true_objective: 8.737 [2024-09-15 07:40:46,434][00574] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-15 07:40:46,461][00574] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-15 07:40:46,463][00574] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-15 07:40:46,464][00574] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-15 07:40:46,466][00574] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-15 07:40:46,467][00574] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-15 07:40:46,468][00574] Adding new argument 'video_name'=None that is not in the saved config file! 
[2024-09-15 07:40:46,473][00574] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-15 07:40:46,474][00574] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-15 07:40:46,475][00574] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-15 07:40:46,476][00574] Adding new argument 'hf_repository'='PierrotLalune/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-15 07:40:46,481][00574] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-15 07:40:46,482][00574] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-15 07:40:46,483][00574] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-15 07:40:46,484][00574] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-15 07:40:46,486][00574] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-15 07:40:46,513][00574] RunningMeanStd input shape: (3, 72, 128) [2024-09-15 07:40:46,514][00574] RunningMeanStd input shape: (1,) [2024-09-15 07:40:46,528][00574] ConvEncoder: input_channels=3 [2024-09-15 07:40:46,566][00574] Conv encoder output size: 512 [2024-09-15 07:40:46,567][00574] Policy head output size: 512 [2024-09-15 07:40:46,588][00574] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-15 07:40:47,045][00574] Num frames 100... [2024-09-15 07:40:47,175][00574] Num frames 200... [2024-09-15 07:40:47,308][00574] Num frames 300... [2024-09-15 07:40:47,437][00574] Num frames 400... [2024-09-15 07:40:47,563][00574] Num frames 500... [2024-09-15 07:40:47,695][00574] Num frames 600... [2024-09-15 07:40:47,834][00574] Num frames 700... [2024-09-15 07:40:47,975][00574] Num frames 800... 
[2024-09-15 07:40:48,115][00574] Avg episode rewards: #0: 15.640, true rewards: #0: 8.640 [2024-09-15 07:40:48,117][00574] Avg episode reward: 15.640, avg true_objective: 8.640 [2024-09-15 07:40:48,167][00574] Num frames 900... [2024-09-15 07:40:48,297][00574] Num frames 1000... [2024-09-15 07:40:48,426][00574] Num frames 1100... [2024-09-15 07:40:48,556][00574] Num frames 1200... [2024-09-15 07:40:48,734][00574] Avg episode rewards: #0: 10.980, true rewards: #0: 6.480 [2024-09-15 07:40:48,736][00574] Avg episode reward: 10.980, avg true_objective: 6.480 [2024-09-15 07:40:48,746][00574] Num frames 1300... [2024-09-15 07:40:48,889][00574] Num frames 1400... [2024-09-15 07:40:49,034][00574] Num frames 1500... [2024-09-15 07:40:49,164][00574] Num frames 1600... [2024-09-15 07:40:49,293][00574] Num frames 1700... [2024-09-15 07:40:49,429][00574] Num frames 1800... [2024-09-15 07:40:49,565][00574] Num frames 1900... [2024-09-15 07:40:49,712][00574] Avg episode rewards: #0: 11.227, true rewards: #0: 6.560 [2024-09-15 07:40:49,714][00574] Avg episode reward: 11.227, avg true_objective: 6.560 [2024-09-15 07:40:49,762][00574] Num frames 2000... [2024-09-15 07:40:49,905][00574] Num frames 2100... [2024-09-15 07:40:50,053][00574] Num frames 2200... [2024-09-15 07:40:50,187][00574] Num frames 2300... [2024-09-15 07:40:50,321][00574] Num frames 2400... [2024-09-15 07:40:50,454][00574] Num frames 2500... [2024-09-15 07:40:50,594][00574] Num frames 2600... [2024-09-15 07:40:50,727][00574] Num frames 2700... [2024-09-15 07:40:50,873][00574] Num frames 2800... [2024-09-15 07:40:51,015][00574] Num frames 2900... [2024-09-15 07:40:51,159][00574] Num frames 3000... [2024-09-15 07:40:51,301][00574] Num frames 3100... [2024-09-15 07:40:51,442][00574] Num frames 3200... 
[2024-09-15 07:40:51,618][00574] Avg episode rewards: #0: 15.168, true rewards: #0: 8.167 [2024-09-15 07:40:51,620][00574] Avg episode reward: 15.168, avg true_objective: 8.167 [2024-09-15 07:40:51,682][00574] Num frames 3300... [2024-09-15 07:40:51,881][00574] Num frames 3400... [2024-09-15 07:40:52,071][00574] Num frames 3500... [2024-09-15 07:40:52,258][00574] Num frames 3600... [2024-09-15 07:40:52,444][00574] Num frames 3700... [2024-09-15 07:40:52,620][00574] Num frames 3800... [2024-09-15 07:40:52,806][00574] Num frames 3900... [2024-09-15 07:40:52,996][00574] Num frames 4000... [2024-09-15 07:40:53,189][00574] Num frames 4100... [2024-09-15 07:40:53,389][00574] Num frames 4200... [2024-09-15 07:40:53,587][00574] Num frames 4300... [2024-09-15 07:40:53,689][00574] Avg episode rewards: #0: 16.246, true rewards: #0: 8.646 [2024-09-15 07:40:53,692][00574] Avg episode reward: 16.246, avg true_objective: 8.646 [2024-09-15 07:40:53,857][00574] Num frames 4400... [2024-09-15 07:40:53,993][00574] Num frames 4500... [2024-09-15 07:40:54,127][00574] Num frames 4600... [2024-09-15 07:40:54,270][00574] Num frames 4700... [2024-09-15 07:40:54,406][00574] Num frames 4800... [2024-09-15 07:40:54,535][00574] Num frames 4900... [2024-09-15 07:40:54,662][00574] Num frames 5000... [2024-09-15 07:40:54,797][00574] Num frames 5100... [2024-09-15 07:40:54,928][00574] Num frames 5200... [2024-09-15 07:40:55,042][00574] Avg episode rewards: #0: 16.907, true rewards: #0: 8.740 [2024-09-15 07:40:55,044][00574] Avg episode reward: 16.907, avg true_objective: 8.740 [2024-09-15 07:40:55,119][00574] Num frames 5300... [2024-09-15 07:40:55,266][00574] Num frames 5400... [2024-09-15 07:40:55,401][00574] Num frames 5500... [2024-09-15 07:40:55,528][00574] Num frames 5600... [2024-09-15 07:40:55,660][00574] Num frames 5700... [2024-09-15 07:40:55,800][00574] Num frames 5800... [2024-09-15 07:40:55,934][00574] Num frames 5900... [2024-09-15 07:40:56,067][00574] Num frames 6000... 
[2024-09-15 07:40:56,203][00574] Num frames 6100... [2024-09-15 07:40:56,314][00574] Avg episode rewards: #0: 16.772, true rewards: #0: 8.771 [2024-09-15 07:40:56,317][00574] Avg episode reward: 16.772, avg true_objective: 8.771 [2024-09-15 07:40:56,398][00574] Num frames 6200... [2024-09-15 07:40:56,530][00574] Num frames 6300... [2024-09-15 07:40:56,658][00574] Num frames 6400... [2024-09-15 07:40:56,794][00574] Num frames 6500... [2024-09-15 07:40:56,933][00574] Num frames 6600... [2024-09-15 07:40:57,065][00574] Num frames 6700... [2024-09-15 07:40:57,196][00574] Num frames 6800... [2024-09-15 07:40:57,340][00574] Num frames 6900... [2024-09-15 07:40:57,471][00574] Num frames 7000... [2024-09-15 07:40:57,534][00574] Avg episode rewards: #0: 17.130, true rewards: #0: 8.755 [2024-09-15 07:40:57,536][00574] Avg episode reward: 17.130, avg true_objective: 8.755 [2024-09-15 07:40:57,665][00574] Num frames 7100... [2024-09-15 07:40:57,811][00574] Num frames 7200... [2024-09-15 07:40:57,953][00574] Num frames 7300... [2024-09-15 07:40:58,087][00574] Num frames 7400... [2024-09-15 07:40:58,232][00574] Num frames 7500... [2024-09-15 07:40:58,373][00574] Avg episode rewards: #0: 16.276, true rewards: #0: 8.387 [2024-09-15 07:40:58,374][00574] Avg episode reward: 16.276, avg true_objective: 8.387 [2024-09-15 07:40:58,447][00574] Num frames 7600... [2024-09-15 07:40:58,586][00574] Num frames 7700... [2024-09-15 07:40:58,718][00574] Num frames 7800... [2024-09-15 07:40:58,866][00574] Num frames 7900... [2024-09-15 07:40:59,000][00574] Num frames 8000... [2024-09-15 07:40:59,135][00574] Num frames 8100... [2024-09-15 07:40:59,269][00574] Num frames 8200... [2024-09-15 07:40:59,408][00574] Num frames 8300... [2024-09-15 07:40:59,536][00574] Num frames 8400... [2024-09-15 07:40:59,671][00574] Num frames 8500... [2024-09-15 07:40:59,807][00574] Num frames 8600... [2024-09-15 07:40:59,945][00574] Num frames 8700... 
[2024-09-15 07:41:00,062][00574] Avg episode rewards: #0: 17.648, true rewards: #0: 8.748 [2024-09-15 07:41:00,063][00574] Avg episode reward: 17.648, avg true_objective: 8.748 [2024-09-15 07:41:50,671][00574] Replay video saved to /content/train_dir/default_experiment/replay.mp4!