[2024-08-16 06:34:50,113][00769] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-08-16 06:34:50,115][00769] Rollout worker 0 uses device cpu
[2024-08-16 06:34:50,116][00769] Rollout worker 1 uses device cpu
[2024-08-16 06:34:50,117][00769] Rollout worker 2 uses device cpu
[2024-08-16 06:34:50,118][00769] Rollout worker 3 uses device cpu
[2024-08-16 06:34:50,120][00769] Rollout worker 4 uses device cpu
[2024-08-16 06:34:50,121][00769] Rollout worker 5 uses device cpu
[2024-08-16 06:34:50,122][00769] Rollout worker 6 uses device cpu
[2024-08-16 06:34:50,123][00769] Rollout worker 7 uses device cpu
[2024-08-16 06:34:50,269][00769] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-16 06:34:50,270][00769] InferenceWorker_p0-w0: min num requests: 2
[2024-08-16 06:34:50,305][00769] Starting all processes...
[2024-08-16 06:34:50,307][00769] Starting process learner_proc0
[2024-08-16 06:34:50,355][00769] Starting all processes...
[2024-08-16 06:34:50,364][00769] Starting process inference_proc0-0
[2024-08-16 06:34:50,364][00769] Starting process rollout_proc0
[2024-08-16 06:34:50,366][00769] Starting process rollout_proc1
[2024-08-16 06:34:50,367][00769] Starting process rollout_proc2
[2024-08-16 06:34:50,367][00769] Starting process rollout_proc3
[2024-08-16 06:34:50,367][00769] Starting process rollout_proc4
[2024-08-16 06:34:50,367][00769] Starting process rollout_proc5
[2024-08-16 06:34:50,367][00769] Starting process rollout_proc6
[2024-08-16 06:34:50,367][00769] Starting process rollout_proc7
[2024-08-16 06:35:01,737][08179] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-16 06:35:01,738][08179] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-08-16 06:35:01,822][08179] Num visible devices: 1
[2024-08-16 06:35:01,851][08179] Starting seed is not provided
[2024-08-16 06:35:01,851][08179] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-16 06:35:01,852][08179] Initializing actor-critic model on device cuda:0
[2024-08-16 06:35:01,852][08179] RunningMeanStd input shape: (3, 72, 128)
[2024-08-16 06:35:01,854][08179] RunningMeanStd input shape: (1,)
[2024-08-16 06:35:01,947][08193] Worker 0 uses CPU cores [0]
[2024-08-16 06:35:01,955][08179] ConvEncoder: input_channels=3
[2024-08-16 06:35:02,124][08195] Worker 2 uses CPU cores [0]
[2024-08-16 06:35:02,375][08194] Worker 1 uses CPU cores [1]
[2024-08-16 06:35:02,398][08200] Worker 6 uses CPU cores [0]
[2024-08-16 06:35:02,421][08199] Worker 4 uses CPU cores [0]
[2024-08-16 06:35:02,546][08192] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-16 06:35:02,549][08192] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-08-16 06:35:02,573][08201] Worker 5 uses CPU cores [1]
[2024-08-16 06:35:02,584][08202] Worker 7 uses CPU cores [1]
[2024-08-16 06:35:02,601][08192] Num visible devices: 1
[2024-08-16 06:35:02,611][08196] Worker 3 uses CPU cores [1]
[2024-08-16 06:35:02,624][08179] Conv encoder output size: 512
[2024-08-16 06:35:02,624][08179] Policy head output size: 512
[2024-08-16 06:35:02,639][08179] Created Actor Critic model with architecture:
[2024-08-16 06:35:02,639][08179] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-08-16 06:35:06,369][08179] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-08-16 06:35:06,370][08179] No checkpoints found
[2024-08-16 06:35:06,371][08179] Did not load from checkpoint, starting from scratch!
[2024-08-16 06:35:06,371][08179] Initialized policy 0 weights for model version 0
[2024-08-16 06:35:06,383][08179] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-16 06:35:06,391][08179] LearnerWorker_p0 finished initialization!
[2024-08-16 06:35:06,592][08192] RunningMeanStd input shape: (3, 72, 128)
[2024-08-16 06:35:06,594][08192] RunningMeanStd input shape: (1,)
[2024-08-16 06:35:06,612][08192] ConvEncoder: input_channels=3
[2024-08-16 06:35:06,774][08192] Conv encoder output size: 512
[2024-08-16 06:35:06,776][08192] Policy head output size: 512
[2024-08-16 06:35:06,864][00769] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-16 06:35:08,945][00769] Inference worker 0-0 is ready!
[2024-08-16 06:35:08,948][00769] All inference workers are ready! Signal rollout workers to start!
[2024-08-16 06:35:09,062][08199] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-16 06:35:09,097][08200] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-16 06:35:09,098][08195] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-16 06:35:09,100][08193] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-16 06:35:09,189][08202] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-16 06:35:09,197][08196] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-16 06:35:09,203][08201] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-16 06:35:09,225][08194] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-16 06:35:10,261][00769] Heartbeat connected on Batcher_0
[2024-08-16 06:35:10,267][00769] Heartbeat connected on LearnerWorker_p0
[2024-08-16 06:35:10,302][00769] Heartbeat connected on InferenceWorker_p0-w0
[2024-08-16 06:35:10,595][08194] Decorrelating experience for 0 frames...
[2024-08-16 06:35:10,596][08201] Decorrelating experience for 0 frames...
[2024-08-16 06:35:10,808][08199] Decorrelating experience for 0 frames...
[2024-08-16 06:35:10,810][08193] Decorrelating experience for 0 frames...
[2024-08-16 06:35:10,812][08195] Decorrelating experience for 0 frames...
[2024-08-16 06:35:10,825][08200] Decorrelating experience for 0 frames...
[2024-08-16 06:35:11,864][00769] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-16 06:35:11,957][08199] Decorrelating experience for 32 frames...
[2024-08-16 06:35:11,964][08193] Decorrelating experience for 32 frames...
[2024-08-16 06:35:11,967][08200] Decorrelating experience for 32 frames...
[2024-08-16 06:35:12,093][08194] Decorrelating experience for 32 frames...
[2024-08-16 06:35:12,109][08201] Decorrelating experience for 32 frames...
[2024-08-16 06:35:12,190][08196] Decorrelating experience for 0 frames...
[2024-08-16 06:35:12,346][08202] Decorrelating experience for 0 frames...
[2024-08-16 06:35:13,144][08195] Decorrelating experience for 32 frames...
[2024-08-16 06:35:13,152][08193] Decorrelating experience for 64 frames...
[2024-08-16 06:35:13,155][08200] Decorrelating experience for 64 frames...
[2024-08-16 06:35:13,157][08199] Decorrelating experience for 64 frames...
[2024-08-16 06:35:13,732][08193] Decorrelating experience for 96 frames...
[2024-08-16 06:35:13,815][00769] Heartbeat connected on RolloutWorker_w0
[2024-08-16 06:35:14,456][08202] Decorrelating experience for 32 frames...
[2024-08-16 06:35:14,455][08201] Decorrelating experience for 64 frames...
[2024-08-16 06:35:14,459][08196] Decorrelating experience for 32 frames...
[2024-08-16 06:35:14,462][08194] Decorrelating experience for 64 frames...
[2024-08-16 06:35:15,276][08202] Decorrelating experience for 64 frames...
[2024-08-16 06:35:15,807][08200] Decorrelating experience for 96 frames...
[2024-08-16 06:35:15,810][08199] Decorrelating experience for 96 frames...
[2024-08-16 06:35:15,856][08195] Decorrelating experience for 64 frames...
[2024-08-16 06:35:16,185][00769] Heartbeat connected on RolloutWorker_w6
[2024-08-16 06:35:16,203][00769] Heartbeat connected on RolloutWorker_w4
[2024-08-16 06:35:16,255][08202] Decorrelating experience for 96 frames...
[2024-08-16 06:35:16,580][00769] Heartbeat connected on RolloutWorker_w7
[2024-08-16 06:35:16,864][00769] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.0. Samples: 20. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-16 06:35:16,984][08195] Decorrelating experience for 96 frames...
[2024-08-16 06:35:17,757][00769] Heartbeat connected on RolloutWorker_w2
[2024-08-16 06:35:17,831][08196] Decorrelating experience for 64 frames...
[2024-08-16 06:35:17,945][08201] Decorrelating experience for 96 frames...
[2024-08-16 06:35:18,371][00769] Heartbeat connected on RolloutWorker_w5
[2024-08-16 06:35:21,866][00769] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 110.3. Samples: 1654. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-16 06:35:21,869][00769] Avg episode reward: [(0, '2.270')]
[2024-08-16 06:35:22,251][08194] Decorrelating experience for 96 frames...
[2024-08-16 06:35:22,550][08196] Decorrelating experience for 96 frames...
[2024-08-16 06:35:23,367][00769] Heartbeat connected on RolloutWorker_w1
[2024-08-16 06:35:23,805][00769] Heartbeat connected on RolloutWorker_w3
[2024-08-16 06:35:24,271][08179] Signal inference workers to stop experience collection...
[2024-08-16 06:35:24,295][08192] InferenceWorker_p0-w0: stopping experience collection
[2024-08-16 06:35:25,294][08179] Signal inference workers to resume experience collection...
[2024-08-16 06:35:25,295][08192] InferenceWorker_p0-w0: resuming experience collection
[2024-08-16 06:35:26,864][00769] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 123.3. Samples: 2466. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-08-16 06:35:26,870][00769] Avg episode reward: [(0, '3.014')]
[2024-08-16 06:35:31,864][00769] Fps is (10 sec: 2458.0, 60 sec: 983.0, 300 sec: 983.0). Total num frames: 24576. Throughput: 0: 192.3. Samples: 4808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 3.0)
[2024-08-16 06:35:31,870][00769] Avg episode reward: [(0, '3.771')]
[2024-08-16 06:35:35,470][08192] Updated weights for policy 0, policy_version 10 (0.0014)
[2024-08-16 06:35:36,864][00769] Fps is (10 sec: 4096.0, 60 sec: 1501.9, 300 sec: 1501.9). Total num frames: 45056. Throughput: 0: 375.3. Samples: 11258. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:35:36,867][00769] Avg episode reward: [(0, '4.151')]
[2024-08-16 06:35:41,864][00769] Fps is (10 sec: 4096.0, 60 sec: 1872.5, 300 sec: 1872.5). Total num frames: 65536. Throughput: 0: 415.4. Samples: 14540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:35:41,869][00769] Avg episode reward: [(0, '4.368')]
[2024-08-16 06:35:46,865][00769] Fps is (10 sec: 3276.7, 60 sec: 1945.6, 300 sec: 1945.6). Total num frames: 77824. Throughput: 0: 477.6. Samples: 19104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:35:46,872][00769] Avg episode reward: [(0, '4.541')]
[2024-08-16 06:35:47,822][08192] Updated weights for policy 0, policy_version 20 (0.0017)
[2024-08-16 06:35:51,864][00769] Fps is (10 sec: 3276.8, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 98304. Throughput: 0: 549.4. Samples: 24722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-08-16 06:35:51,872][00769] Avg episode reward: [(0, '4.699')]
[2024-08-16 06:35:56,864][00769] Fps is (10 sec: 4096.1, 60 sec: 2375.7, 300 sec: 2375.7). Total num frames: 118784. Throughput: 0: 622.8. Samples: 28028. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-16 06:35:56,866][00769] Avg episode reward: [(0, '4.492')]
[2024-08-16 06:35:56,876][08179] Saving new best policy, reward=4.492!
[2024-08-16 06:35:56,886][08192] Updated weights for policy 0, policy_version 30 (0.0025)
[2024-08-16 06:36:01,866][00769] Fps is (10 sec: 3685.8, 60 sec: 2457.5, 300 sec: 2457.5). Total num frames: 135168. Throughput: 0: 744.5. Samples: 33524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-08-16 06:36:01,869][00769] Avg episode reward: [(0, '4.595')]
[2024-08-16 06:36:01,877][08179] Saving new best policy, reward=4.595!
[2024-08-16 06:36:06,864][00769] Fps is (10 sec: 3276.8, 60 sec: 2525.9, 300 sec: 2525.9). Total num frames: 151552. Throughput: 0: 804.6. Samples: 37858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-08-16 06:36:06,867][00769] Avg episode reward: [(0, '4.573')]
[2024-08-16 06:36:09,304][08192] Updated weights for policy 0, policy_version 40 (0.0044)
[2024-08-16 06:36:11,864][00769] Fps is (10 sec: 3687.0, 60 sec: 2867.2, 300 sec: 2646.6). Total num frames: 172032. Throughput: 0: 861.4. Samples: 41230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:36:11,867][00769] Avg episode reward: [(0, '4.483')]
[2024-08-16 06:36:16,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 2750.2). Total num frames: 192512. Throughput: 0: 959.6. Samples: 47992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:36:16,869][00769] Avg episode reward: [(0, '4.476')]
[2024-08-16 06:36:20,826][08192] Updated weights for policy 0, policy_version 50 (0.0027)
[2024-08-16 06:36:21,867][00769] Fps is (10 sec: 3275.9, 60 sec: 3413.3, 300 sec: 2730.6). Total num frames: 204800. Throughput: 0: 907.5. Samples: 52100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:36:21,873][00769] Avg episode reward: [(0, '4.285')]
[2024-08-16 06:36:26,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 2816.0). Total num frames: 225280. Throughput: 0: 896.8. Samples: 54896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-08-16 06:36:26,866][00769] Avg episode reward: [(0, '4.326')]
[2024-08-16 06:36:30,693][08192] Updated weights for policy 0, policy_version 60 (0.0016)
[2024-08-16 06:36:31,864][00769] Fps is (10 sec: 4506.8, 60 sec: 3754.7, 300 sec: 2939.5). Total num frames: 249856. Throughput: 0: 937.3. Samples: 61284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:36:31,871][00769] Avg episode reward: [(0, '4.644')]
[2024-08-16 06:36:31,874][08179] Saving new best policy, reward=4.644!
[2024-08-16 06:36:36,865][00769] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 2912.7). Total num frames: 262144. Throughput: 0: 925.1. Samples: 66350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:36:36,868][00769] Avg episode reward: [(0, '4.710')]
[2024-08-16 06:36:36,875][08179] Saving new best policy, reward=4.710!
[2024-08-16 06:36:41,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 2975.0). Total num frames: 282624. Throughput: 0: 897.5. Samples: 68414. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-08-16 06:36:41,867][00769] Avg episode reward: [(0, '4.872')]
[2024-08-16 06:36:41,870][08179] Saving new best policy, reward=4.872!
[2024-08-16 06:36:42,790][08192] Updated weights for policy 0, policy_version 70 (0.0015)
[2024-08-16 06:36:46,864][00769] Fps is (10 sec: 4096.3, 60 sec: 3754.7, 300 sec: 3031.0). Total num frames: 303104. Throughput: 0: 921.6. Samples: 74996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:36:46,867][00769] Avg episode reward: [(0, '4.589')]
[2024-08-16 06:36:46,873][08179] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth...
[2024-08-16 06:36:51,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3081.8). Total num frames: 323584. Throughput: 0: 963.1. Samples: 81198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:36:51,867][00769] Avg episode reward: [(0, '4.624')]
[2024-08-16 06:36:53,016][08192] Updated weights for policy 0, policy_version 80 (0.0017)
[2024-08-16 06:36:56,874][00769] Fps is (10 sec: 3273.7, 60 sec: 3617.6, 300 sec: 3053.1). Total num frames: 335872. Throughput: 0: 935.6. Samples: 83342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:36:56,877][00769] Avg episode reward: [(0, '4.672')]
[2024-08-16 06:37:01,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3134.3). Total num frames: 360448. Throughput: 0: 906.8. Samples: 88800. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:37:01,867][00769] Avg episode reward: [(0, '4.457')]
[2024-08-16 06:37:03,567][08192] Updated weights for policy 0, policy_version 90 (0.0019)
[2024-08-16 06:37:06,864][00769] Fps is (10 sec: 4509.9, 60 sec: 3822.9, 300 sec: 3174.4). Total num frames: 380928. Throughput: 0: 969.3. Samples: 95714. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:37:06,869][00769] Avg episode reward: [(0, '4.510')]
[2024-08-16 06:37:11,865][00769] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3178.5). Total num frames: 397312. Throughput: 0: 967.7. Samples: 98442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:37:11,868][00769] Avg episode reward: [(0, '4.744')]
[2024-08-16 06:37:15,217][08192] Updated weights for policy 0, policy_version 100 (0.0022)
[2024-08-16 06:37:16,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3182.3). Total num frames: 413696. Throughput: 0: 922.8. Samples: 102808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:37:16,871][00769] Avg episode reward: [(0, '4.718')]
[2024-08-16 06:37:21,864][00769] Fps is (10 sec: 4096.4, 60 sec: 3891.4, 300 sec: 3246.5). Total num frames: 438272. Throughput: 0: 963.1. Samples: 109690. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:37:21,869][00769] Avg episode reward: [(0, '4.533')]
[2024-08-16 06:37:24,090][08192] Updated weights for policy 0, policy_version 110 (0.0012)
[2024-08-16 06:37:26,864][00769] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 458752. Throughput: 0: 992.0. Samples: 113054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:37:26,870][00769] Avg episode reward: [(0, '4.551')]
[2024-08-16 06:37:31,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3248.6). Total num frames: 471040. Throughput: 0: 942.3. Samples: 117398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:37:31,868][00769] Avg episode reward: [(0, '4.698')]
[2024-08-16 06:37:35,912][08192] Updated weights for policy 0, policy_version 120 (0.0021)
[2024-08-16 06:37:36,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3304.1). Total num frames: 495616. Throughput: 0: 938.4. Samples: 123424. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:37:36,873][00769] Avg episode reward: [(0, '4.800')]
[2024-08-16 06:37:41,864][00769] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3329.6). Total num frames: 516096. Throughput: 0: 967.6. Samples: 126876. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-16 06:37:41,870][00769] Avg episode reward: [(0, '4.977')]
[2024-08-16 06:37:41,873][08179] Saving new best policy, reward=4.977!
[2024-08-16 06:37:46,663][08192] Updated weights for policy 0, policy_version 130 (0.0029)
[2024-08-16 06:37:46,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3328.0). Total num frames: 532480. Throughput: 0: 973.1. Samples: 132588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:37:46,870][00769] Avg episode reward: [(0, '5.031')]
[2024-08-16 06:37:46,880][08179] Saving new best policy, reward=5.031!
[2024-08-16 06:37:51,864][00769] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3326.4). Total num frames: 548864. Throughput: 0: 924.2. Samples: 137304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:37:51,871][00769] Avg episode reward: [(0, '4.685')]
[2024-08-16 06:37:56,646][08192] Updated weights for policy 0, policy_version 140 (0.0018)
[2024-08-16 06:37:56,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3960.1, 300 sec: 3373.2). Total num frames: 573440. Throughput: 0: 940.9. Samples: 140780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:37:56,866][00769] Avg episode reward: [(0, '4.875')]
[2024-08-16 06:38:01,869][00769] Fps is (10 sec: 4094.1, 60 sec: 3822.6, 300 sec: 3370.3). Total num frames: 589824. Throughput: 0: 988.9. Samples: 147314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:38:01,872][00769] Avg episode reward: [(0, '4.986')]
[2024-08-16 06:38:06,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3367.8). Total num frames: 606208. Throughput: 0: 931.0. Samples: 151584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:38:06,867][00769] Avg episode reward: [(0, '4.864')]
[2024-08-16 06:38:08,606][08192] Updated weights for policy 0, policy_version 150 (0.0018)
[2024-08-16 06:38:11,864][00769] Fps is (10 sec: 3688.1, 60 sec: 3823.0, 300 sec: 3387.5). Total num frames: 626688. Throughput: 0: 925.4. Samples: 154696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:38:11,872][00769] Avg episode reward: [(0, '4.908')]
[2024-08-16 06:38:16,868][00769] Fps is (10 sec: 4504.0, 60 sec: 3959.2, 300 sec: 3427.6). Total num frames: 651264. Throughput: 0: 978.9. Samples: 161454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:38:16,870][00769] Avg episode reward: [(0, '5.125')]
[2024-08-16 06:38:16,883][08179] Saving new best policy, reward=5.125!
[2024-08-16 06:38:17,992][08192] Updated weights for policy 0, policy_version 160 (0.0012)
[2024-08-16 06:38:21,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3402.8). Total num frames: 663552. Throughput: 0: 953.1. Samples: 166314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:38:21,870][00769] Avg episode reward: [(0, '5.033')]
[2024-08-16 06:38:26,864][00769] Fps is (10 sec: 2458.4, 60 sec: 3618.1, 300 sec: 3379.2). Total num frames: 675840. Throughput: 0: 912.8. Samples: 167950. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:38:26,875][00769] Avg episode reward: [(0, '5.155')]
[2024-08-16 06:38:26,889][08179] Saving new best policy, reward=5.155!
[2024-08-16 06:38:31,865][00769] Fps is (10 sec: 2867.1, 60 sec: 3686.4, 300 sec: 3376.7). Total num frames: 692224. Throughput: 0: 875.0. Samples: 171962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:38:31,871][00769] Avg episode reward: [(0, '5.419')]
[2024-08-16 06:38:31,874][08179] Saving new best policy, reward=5.419!
[2024-08-16 06:38:32,282][08192] Updated weights for policy 0, policy_version 170 (0.0012)
[2024-08-16 06:38:36,865][00769] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3393.8). Total num frames: 712704. Throughput: 0: 922.3. Samples: 178806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:38:36,867][00769] Avg episode reward: [(0, '5.393')]
[2024-08-16 06:38:41,864][00769] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3391.1). Total num frames: 729088. Throughput: 0: 896.5. Samples: 181124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:38:41,867][00769] Avg episode reward: [(0, '5.275')]
[2024-08-16 06:38:44,172][08192] Updated weights for policy 0, policy_version 180 (0.0012)
[2024-08-16 06:38:46,864][00769] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3388.5). Total num frames: 745472. Throughput: 0: 856.1. Samples: 185834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:38:46,871][00769] Avg episode reward: [(0, '5.224')]
[2024-08-16 06:38:46,971][08179] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000183_749568.pth...
[2024-08-16 06:38:51,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3422.4). Total num frames: 770048. Throughput: 0: 909.7. Samples: 192522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:38:51,871][00769] Avg episode reward: [(0, '5.419')]
[2024-08-16 06:38:53,451][08192] Updated weights for policy 0, policy_version 190 (0.0022)
[2024-08-16 06:38:56,867][00769] Fps is (10 sec: 4095.1, 60 sec: 3549.7, 300 sec: 3419.2). Total num frames: 786432. Throughput: 0: 911.0. Samples: 195694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:38:56,869][00769] Avg episode reward: [(0, '5.343')]
[2024-08-16 06:39:01,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3416.2). Total num frames: 802816. Throughput: 0: 853.8. Samples: 199872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:39:01,872][00769] Avg episode reward: [(0, '5.386')]
[2024-08-16 06:39:05,284][08192] Updated weights for policy 0, policy_version 200 (0.0028)
[2024-08-16 06:39:06,864][00769] Fps is (10 sec: 3687.2, 60 sec: 3618.1, 300 sec: 3430.4). Total num frames: 823296. Throughput: 0: 881.8. Samples: 205994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:39:06,872][00769] Avg episode reward: [(0, '5.427')]
[2024-08-16 06:39:06,881][08179] Saving new best policy, reward=5.427!
[2024-08-16 06:39:11,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3444.0). Total num frames: 843776. Throughput: 0: 917.9. Samples: 209254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:39:11,869][00769] Avg episode reward: [(0, '5.139')]
[2024-08-16 06:39:16,638][08192] Updated weights for policy 0, policy_version 210 (0.0014)
[2024-08-16 06:39:16,868][00769] Fps is (10 sec: 3685.0, 60 sec: 3481.6, 300 sec: 3440.6). Total num frames: 860160. Throughput: 0: 941.4. Samples: 214326. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:39:16,871][00769] Avg episode reward: [(0, '5.276')]
[2024-08-16 06:39:21,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3437.4). Total num frames: 876544. Throughput: 0: 905.2. Samples: 219542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:39:21,871][00769] Avg episode reward: [(0, '5.517')]
[2024-08-16 06:39:21,958][08179] Saving new best policy, reward=5.517!
[2024-08-16 06:39:26,637][08192] Updated weights for policy 0, policy_version 220 (0.0019)
[2024-08-16 06:39:26,864][00769] Fps is (10 sec: 4097.5, 60 sec: 3754.7, 300 sec: 3465.8). Total num frames: 901120. Throughput: 0: 925.4. Samples: 222768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:39:26,867][00769] Avg episode reward: [(0, '5.734')]
[2024-08-16 06:39:26,880][08179] Saving new best policy, reward=5.734!
[2024-08-16 06:39:31,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3462.3). Total num frames: 917504. Throughput: 0: 953.8. Samples: 228756. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-16 06:39:31,867][00769] Avg episode reward: [(0, '5.486')]
[2024-08-16 06:39:36,865][00769] Fps is (10 sec: 2867.0, 60 sec: 3618.1, 300 sec: 3443.7). Total num frames: 929792. Throughput: 0: 897.0. Samples: 232888. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:39:36,869][00769] Avg episode reward: [(0, '5.749')]
[2024-08-16 06:39:36,955][08179] Saving new best policy, reward=5.749!
[2024-08-16 06:39:38,921][08192] Updated weights for policy 0, policy_version 230 (0.0013)
[2024-08-16 06:39:41,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3470.4). Total num frames: 954368. Throughput: 0: 898.1. Samples: 236106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:39:41,867][00769] Avg episode reward: [(0, '5.901')]
[2024-08-16 06:39:41,872][08179] Saving new best policy, reward=5.901!
[2024-08-16 06:39:46,865][00769] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3481.6). Total num frames: 974848. Throughput: 0: 951.5. Samples: 242690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:39:46,871][00769] Avg episode reward: [(0, '5.628')]
[2024-08-16 06:39:49,539][08192] Updated weights for policy 0, policy_version 240 (0.0014)
[2024-08-16 06:39:51,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3463.6). Total num frames: 987136. Throughput: 0: 919.4. Samples: 247366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:39:51,869][00769] Avg episode reward: [(0, '5.545')]
[2024-08-16 06:39:56,864][00769] Fps is (10 sec: 3277.0, 60 sec: 3686.5, 300 sec: 3474.5). Total num frames: 1007616. Throughput: 0: 898.8. Samples: 249702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:39:56,869][00769] Avg episode reward: [(0, '5.714')]
[2024-08-16 06:40:00,195][08192] Updated weights for policy 0, policy_version 250 (0.0015)
[2024-08-16 06:40:01,865][00769] Fps is (10 sec: 4095.7, 60 sec: 3754.6, 300 sec: 3485.1). Total num frames: 1028096. Throughput: 0: 933.9. Samples: 256348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:40:01,872][00769] Avg episode reward: [(0, '6.156')]
[2024-08-16 06:40:01,875][08179] Saving new best policy, reward=6.156!
[2024-08-16 06:40:06,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 1044480. Throughput: 0: 937.3. Samples: 261722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:40:06,867][00769] Avg episode reward: [(0, '6.229')]
[2024-08-16 06:40:06,882][08179] Saving new best policy, reward=6.229!
[2024-08-16 06:40:11,864][00769] Fps is (10 sec: 3277.0, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1060864. Throughput: 0: 909.0. Samples: 263674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:40:11,871][00769] Avg episode reward: [(0, '5.986')]
[2024-08-16 06:40:12,352][08192] Updated weights for policy 0, policy_version 260 (0.0013)
[2024-08-16 06:40:16,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3665.6). Total num frames: 1081344. Throughput: 0: 910.8. Samples: 269744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:40:16,867][00769] Avg episode reward: [(0, '5.930')]
[2024-08-16 06:40:21,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1101824. Throughput: 0: 964.3. Samples: 276282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:40:21,870][00769] Avg episode reward: [(0, '6.224')]
[2024-08-16 06:40:22,051][08192] Updated weights for policy 0, policy_version 270 (0.0024)
[2024-08-16 06:40:26,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 1118208. Throughput: 0: 937.0. Samples: 278270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:40:26,873][00769] Avg episode reward: [(0, '6.453')]
[2024-08-16 06:40:26,883][08179] Saving new best policy, reward=6.453!
[2024-08-16 06:40:31,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 1134592. Throughput: 0: 898.1. Samples: 283104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:40:31,866][00769] Avg episode reward: [(0, '6.735')]
[2024-08-16 06:40:31,875][08179] Saving new best policy, reward=6.735!
[2024-08-16 06:40:33,938][08192] Updated weights for policy 0, policy_version 280 (0.0018)
[2024-08-16 06:40:36,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3707.2). Total num frames: 1159168. Throughput: 0: 942.4. Samples: 289776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:40:36,871][00769] Avg episode reward: [(0, '6.854')]
[2024-08-16 06:40:36,880][08179] Saving new best policy, reward=6.854!
[2024-08-16 06:40:41,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1175552. Throughput: 0: 955.9. Samples: 292716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:40:41,871][00769] Avg episode reward: [(0, '7.629')]
[2024-08-16 06:40:41,872][08179] Saving new best policy, reward=7.629!
[2024-08-16 06:40:46,014][08192] Updated weights for policy 0, policy_version 290 (0.0019)
[2024-08-16 06:40:46,864][00769] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 1187840. Throughput: 0: 897.4. Samples: 296730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:40:46,867][00769] Avg episode reward: [(0, '7.627')]
[2024-08-16 06:40:46,904][08179] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000291_1191936.pth...
[2024-08-16 06:40:47,052][08179] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth
[2024-08-16 06:40:51,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1212416. Throughput: 0: 920.2. Samples: 303132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-16 06:40:51,868][00769] Avg episode reward: [(0, '7.321')]
[2024-08-16 06:40:55,228][08192] Updated weights for policy 0, policy_version 300 (0.0020)
[2024-08-16 06:40:56,864][00769] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1232896. Throughput: 0: 951.1. Samples: 306474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:40:56,867][00769] Avg episode reward: [(0, '7.027')]
[2024-08-16 06:41:01,866][00769] Fps is (10 sec: 3276.2, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 1245184. Throughput: 0: 925.5. Samples: 311394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:41:01,868][00769] Avg episode reward: [(0, '7.028')]
[2024-08-16 06:41:06,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1265664. Throughput: 0: 902.2. Samples: 316882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:41:06,871][00769] Avg episode reward: [(0, '7.111')]
[2024-08-16 06:41:07,036][08192] Updated weights for policy 0, policy_version 310 (0.0014)
[2024-08-16 06:41:11,864][00769] Fps is (10 sec: 4506.4, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1290240. Throughput: 0: 931.0. Samples: 320166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:41:11,872][00769] Avg episode reward: [(0, '7.786')]
[2024-08-16 06:41:11,875][08179] Saving new best policy, reward=7.786!
[2024-08-16 06:41:16,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1306624. Throughput: 0: 953.8. Samples: 326024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:41:16,873][00769] Avg episode reward: [(0, '8.331')]
[2024-08-16 06:41:16,889][08179] Saving new best policy, reward=8.331!
[2024-08-16 06:41:18,319][08192] Updated weights for policy 0, policy_version 320 (0.0020)
[2024-08-16 06:41:21,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1323008. Throughput: 0: 901.3. Samples: 330334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:41:21,867][00769] Avg episode reward: [(0, '8.633')]
[2024-08-16 06:41:21,875][08179] Saving new best policy, reward=8.633!
[2024-08-16 06:41:26,864][00769] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 1335296. Throughput: 0: 898.2. Samples: 333136. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:41:26,872][00769] Avg episode reward: [(0, '8.487')]
[2024-08-16 06:41:31,573][08192] Updated weights for policy 0, policy_version 330 (0.0024)
[2024-08-16 06:41:31,864][00769] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3693.4). Total num frames: 1351680. Throughput: 0: 901.2. Samples: 337282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:41:31,871][00769] Avg episode reward: [(0, '8.415')]
[2024-08-16 06:41:36,864][00769] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3665.6). Total num frames: 1363968. Throughput: 0: 849.2. Samples: 341344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:41:36,874][00769] Avg episode reward: [(0, '8.671')]
[2024-08-16 06:41:36,884][08179] Saving new best policy, reward=8.671!
[2024-08-16 06:41:41,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3665.6). Total num frames: 1384448. Throughput: 0: 831.4. Samples: 343886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:41:41,867][00769] Avg episode reward: [(0, '9.037')]
[2024-08-16 06:41:41,869][08179] Saving new best policy, reward=9.037!
[2024-08-16 06:41:43,355][08192] Updated weights for policy 0, policy_version 340 (0.0017)
[2024-08-16 06:41:46,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 1404928. Throughput: 0: 870.9. Samples: 350584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:41:46,872][00769] Avg episode reward: [(0, '9.480')]
[2024-08-16 06:41:46,881][08179] Saving new best policy, reward=9.480!
[2024-08-16 06:41:51,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3679.6). Total num frames: 1421312. Throughput: 0: 865.2. Samples: 355818. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-16 06:41:51,867][00769] Avg episode reward: [(0, '9.836')]
[2024-08-16 06:41:51,869][08179] Saving new best policy, reward=9.836!
[2024-08-16 06:41:55,364][08192] Updated weights for policy 0, policy_version 350 (0.0054)
[2024-08-16 06:41:56,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3651.7). Total num frames: 1437696. Throughput: 0: 836.7. Samples: 357816. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-16 06:41:56,871][00769] Avg episode reward: [(0, '10.612')]
[2024-08-16 06:41:56,885][08179] Saving new best policy, reward=10.612!
[2024-08-16 06:42:01,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3651.7). Total num frames: 1458176. Throughput: 0: 842.9. Samples: 363956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:42:01,871][00769] Avg episode reward: [(0, '10.846')]
[2024-08-16 06:42:01,927][08179] Saving new best policy, reward=10.846!
[2024-08-16 06:42:04,768][08192] Updated weights for policy 0, policy_version 360 (0.0017)
[2024-08-16 06:42:06,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 1478656. Throughput: 0: 887.2. Samples: 370260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:42:06,868][00769] Avg episode reward: [(0, '11.691')]
[2024-08-16 06:42:06,876][08179] Saving new best policy, reward=11.691!
[2024-08-16 06:42:11,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3651.7). Total num frames: 1490944. Throughput: 0: 866.5. Samples: 372130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:42:11,872][00769] Avg episode reward: [(0, '11.694')]
[2024-08-16 06:42:11,878][08179] Saving new best policy, reward=11.694!
[2024-08-16 06:42:16,865][00769] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3637.8). Total num frames: 1511424. Throughput: 0: 889.7. Samples: 377318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:42:16,868][00769] Avg episode reward: [(0, '10.993')]
[2024-08-16 06:42:16,932][08192] Updated weights for policy 0, policy_version 370 (0.0025)
[2024-08-16 06:42:21,864][00769] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 1536000. Throughput: 0: 949.5. Samples: 384070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:42:21,867][00769] Avg episode reward: [(0, '10.671')]
[2024-08-16 06:42:26,864][00769] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 1552384. Throughput: 0: 950.8. Samples: 386674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:42:26,869][00769] Avg episode reward: [(0, '11.007')]
[2024-08-16 06:42:28,359][08192] Updated weights for policy 0, policy_version 380 (0.0021)
[2024-08-16 06:42:31,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 1568768. Throughput: 0: 896.1. Samples: 390908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:42:31,867][00769] Avg episode reward: [(0, '11.915')]
[2024-08-16 06:42:31,869][08179] Saving new best policy, reward=11.915!
[2024-08-16 06:42:36,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3637.8). Total num frames: 1589248. Throughput: 0: 920.1. Samples: 397222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:42:36,867][00769] Avg episode reward: [(0, '11.915')]
[2024-08-16 06:42:38,613][08192] Updated weights for policy 0, policy_version 390 (0.0018)
[2024-08-16 06:42:41,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1605632. Throughput: 0: 947.6. Samples: 400456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:42:41,871][00769] Avg episode reward: [(0, '12.157')]
[2024-08-16 06:42:41,875][08179] Saving new best policy, reward=12.157!
[2024-08-16 06:42:46,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 1622016. Throughput: 0: 910.4. Samples: 404926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:42:46,868][00769] Avg episode reward: [(0, '12.249')]
[2024-08-16 06:42:46,878][08179] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000396_1622016.pth...
[2024-08-16 06:42:47,050][08179] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000183_749568.pth
[2024-08-16 06:42:47,090][08179] Saving new best policy, reward=12.249!
[2024-08-16 06:42:50,626][08192] Updated weights for policy 0, policy_version 400 (0.0018)
[2024-08-16 06:42:51,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 1642496. Throughput: 0: 891.6. Samples: 410382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:42:51,867][00769] Avg episode reward: [(0, '12.057')]
[2024-08-16 06:42:56,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3637.9). Total num frames: 1662976. Throughput: 0: 920.7. Samples: 413562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:42:56,869][00769] Avg episode reward: [(0, '13.909')]
[2024-08-16 06:42:56,877][08179] Saving new best policy, reward=13.909!
[2024-08-16 06:43:01,869][00769] Fps is (10 sec: 3275.2, 60 sec: 3617.8, 300 sec: 3623.9). Total num frames: 1675264. Throughput: 0: 922.7. Samples: 418846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:43:01,874][00769] Avg episode reward: [(0, '13.555')]
[2024-08-16 06:43:02,570][08192] Updated weights for policy 0, policy_version 410 (0.0021)
[2024-08-16 06:43:06,864][00769] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 1691648. Throughput: 0: 870.5. Samples: 423242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:43:06,867][00769] Avg episode reward: [(0, '13.674')]
[2024-08-16 06:43:11,864][00769] Fps is (10 sec: 4098.0, 60 sec: 3754.7, 300 sec: 3610.1). Total num frames: 1716224. Throughput: 0: 884.9. Samples: 426494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:43:11,870][00769] Avg episode reward: [(0, '13.450')]
[2024-08-16 06:43:12,719][08192] Updated weights for policy 0, policy_version 420 (0.0016)
[2024-08-16 06:43:16,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 1732608. Throughput: 0: 937.7. Samples: 433106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:43:16,871][00769] Avg episode reward: [(0, '14.141')]
[2024-08-16 06:43:16,890][08179] Saving new best policy, reward=14.141!
[2024-08-16 06:43:21,865][00769] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3623.9). Total num frames: 1744896. Throughput: 0: 883.9. Samples: 436998. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-16 06:43:21,869][00769] Avg episode reward: [(0, '14.681')]
[2024-08-16 06:43:21,877][08179] Saving new best policy, reward=14.681!
[2024-08-16 06:43:25,015][08192] Updated weights for policy 0, policy_version 430 (0.0031)
[2024-08-16 06:43:26,865][00769] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3637.8). Total num frames: 1765376. Throughput: 0: 873.9. Samples: 439780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:43:26,867][00769] Avg episode reward: [(0, '14.367')]
[2024-08-16 06:43:31,864][00769] Fps is (10 sec: 4505.7, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1789952. Throughput: 0: 919.2. Samples: 446290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:43:31,867][00769] Avg episode reward: [(0, '15.686')]
[2024-08-16 06:43:31,871][08179] Saving new best policy, reward=15.686!
[2024-08-16 06:43:36,132][08192] Updated weights for policy 0, policy_version 440 (0.0026)
[2024-08-16 06:43:36,869][00769] Fps is (10 sec: 3684.8, 60 sec: 3549.6, 300 sec: 3637.7). Total num frames: 1802240. Throughput: 0: 905.7. Samples: 451144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:43:36,876][00769] Avg episode reward: [(0, '16.473')]
[2024-08-16 06:43:36,891][08179] Saving new best policy, reward=16.473!
[2024-08-16 06:43:41,867][00769] Fps is (10 sec: 2866.4, 60 sec: 3549.7, 300 sec: 3637.8). Total num frames: 1818624. Throughput: 0: 879.1. Samples: 453122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:43:41,869][00769] Avg episode reward: [(0, '16.843')]
[2024-08-16 06:43:41,879][08179] Saving new best policy, reward=16.843!
[2024-08-16 06:43:46,731][08192] Updated weights for policy 0, policy_version 450 (0.0030)
[2024-08-16 06:43:46,864][00769] Fps is (10 sec: 4097.9, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1843200. Throughput: 0: 901.6. Samples: 459414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:43:46,867][00769] Avg episode reward: [(0, '16.172')]
[2024-08-16 06:43:51,864][00769] Fps is (10 sec: 4097.1, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 1859584. Throughput: 0: 940.4. Samples: 465562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:43:51,870][00769] Avg episode reward: [(0, '16.040')]
[2024-08-16 06:43:56,865][00769] Fps is (10 sec: 3276.5, 60 sec: 3549.8, 300 sec: 3637.8). Total num frames: 1875968. Throughput: 0: 911.9. Samples: 467530. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:43:56,873][00769] Avg episode reward: [(0, '15.016')]
[2024-08-16 06:43:58,720][08192] Updated weights for policy 0, policy_version 460 (0.0024)
[2024-08-16 06:44:01,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3637.8). Total num frames: 1896448. Throughput: 0: 885.1. Samples: 472934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:44:01,867][00769] Avg episode reward: [(0, '14.427')]
[2024-08-16 06:44:06,864][00769] Fps is (10 sec: 4096.4, 60 sec: 3754.7, 300 sec: 3637.8). Total num frames: 1916928. Throughput: 0: 943.7. Samples: 479462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:44:06,869][00769] Avg episode reward: [(0, '13.944')]
[2024-08-16 06:44:08,949][08192] Updated weights for policy 0, policy_version 470 (0.0026)
[2024-08-16 06:44:11,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3624.0). Total num frames: 1929216. Throughput: 0: 935.0. Samples: 481854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:44:11,866][00769] Avg episode reward: [(0, '14.381')]
[2024-08-16 06:44:16,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 1949696. Throughput: 0: 889.5. Samples: 486318. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:44:16,867][00769] Avg episode reward: [(0, '14.355')]
[2024-08-16 06:44:20,166][08192] Updated weights for policy 0, policy_version 480 (0.0013)
[2024-08-16 06:44:21,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 1970176. Throughput: 0: 927.6. Samples: 492880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:44:21,867][00769] Avg episode reward: [(0, '15.242')]
[2024-08-16 06:44:26,867][00769] Fps is (10 sec: 4094.9, 60 sec: 3754.5, 300 sec: 3637.8). Total num frames: 1990656. Throughput: 0: 959.1. Samples: 496282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:44:26,869][00769] Avg episode reward: [(0, '15.488')]
[2024-08-16 06:44:31,864][00769] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3623.9). Total num frames: 1998848. Throughput: 0: 899.4. Samples: 499886. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:44:31,870][00769] Avg episode reward: [(0, '15.635')]
[2024-08-16 06:44:34,456][08192] Updated weights for policy 0, policy_version 490 (0.0023)
[2024-08-16 06:44:36,864][00769] Fps is (10 sec: 2048.5, 60 sec: 3481.9, 300 sec: 3582.3). Total num frames: 2011136. Throughput: 0: 838.0. Samples: 503270. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:44:36,867][00769] Avg episode reward: [(0, '15.929')]
[2024-08-16 06:44:41,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3596.2). Total num frames: 2035712. Throughput: 0: 867.7. Samples: 506576. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-16 06:44:41,870][00769] Avg episode reward: [(0, '16.837')]
[2024-08-16 06:44:44,334][08192] Updated weights for policy 0, policy_version 500 (0.0024)
[2024-08-16 06:44:46,864][00769] Fps is (10 sec: 4505.5, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2056192. Throughput: 0: 897.6. Samples: 513326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:44:46,867][00769] Avg episode reward: [(0, '17.214')]
[2024-08-16 06:44:46,885][08179] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000502_2056192.pth...
[2024-08-16 06:44:47,041][08179] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000291_1191936.pth
[2024-08-16 06:44:47,053][08179] Saving new best policy, reward=17.214!
[2024-08-16 06:44:51,866][00769] Fps is (10 sec: 3276.2, 60 sec: 3481.5, 300 sec: 3596.1). Total num frames: 2068480. Throughput: 0: 842.2. Samples: 517364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:44:51,869][00769] Avg episode reward: [(0, '17.443')]
[2024-08-16 06:44:51,873][08179] Saving new best policy, reward=17.443!
[2024-08-16 06:44:56,455][08192] Updated weights for policy 0, policy_version 510 (0.0037)
[2024-08-16 06:44:56,864][00769] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 2088960. Throughput: 0: 847.7. Samples: 520000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:44:56,868][00769] Avg episode reward: [(0, '18.521')]
[2024-08-16 06:44:56,880][08179] Saving new best policy, reward=18.521!
[2024-08-16 06:45:01,864][00769] Fps is (10 sec: 4096.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 2109440. Throughput: 0: 895.8. Samples: 526628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:45:01,867][00769] Avg episode reward: [(0, '19.058')]
[2024-08-16 06:45:01,871][08179] Saving new best policy, reward=19.058!
[2024-08-16 06:45:06,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3610.0). Total num frames: 2125824. Throughput: 0: 861.5. Samples: 531648. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-16 06:45:06,873][00769] Avg episode reward: [(0, '19.304')]
[2024-08-16 06:45:06,891][08179] Saving new best policy, reward=19.304!
[2024-08-16 06:45:08,047][08192] Updated weights for policy 0, policy_version 520 (0.0019)
[2024-08-16 06:45:11,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 2142208. Throughput: 0: 830.5. Samples: 533652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:45:11,866][00769] Avg episode reward: [(0, '19.639')]
[2024-08-16 06:45:11,869][08179] Saving new best policy, reward=19.639!
[2024-08-16 06:45:16,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 2162688. Throughput: 0: 888.1. Samples: 539850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:45:16,867][00769] Avg episode reward: [(0, '20.686')]
[2024-08-16 06:45:16,876][08179] Saving new best policy, reward=20.686!
[2024-08-16 06:45:18,029][08192] Updated weights for policy 0, policy_version 530 (0.0017)
[2024-08-16 06:45:21,866][00769] Fps is (10 sec: 4095.3, 60 sec: 3549.8, 300 sec: 3610.0). Total num frames: 2183168. Throughput: 0: 954.4. Samples: 546218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:45:21,874][00769] Avg episode reward: [(0, '19.606')]
[2024-08-16 06:45:26,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3413.5, 300 sec: 3596.1). Total num frames: 2195456. Throughput: 0: 922.7. Samples: 548098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:45:26,869][00769] Avg episode reward: [(0, '19.629')]
[2024-08-16 06:45:30,011][08192] Updated weights for policy 0, policy_version 540 (0.0016)
[2024-08-16 06:45:31,864][00769] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 2220032. Throughput: 0: 893.6. Samples: 553538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:45:31,867][00769] Avg episode reward: [(0, '18.987')]
[2024-08-16 06:45:36,865][00769] Fps is (10 sec: 4505.4, 60 sec: 3822.9, 300 sec: 3610.0). Total num frames: 2240512. Throughput: 0: 949.5. Samples: 560090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:45:36,867][00769] Avg episode reward: [(0, '18.802')]
[2024-08-16 06:45:40,613][08192] Updated weights for policy 0, policy_version 550 (0.0026)
[2024-08-16 06:45:41,868][00769] Fps is (10 sec: 3275.6, 60 sec: 3617.9, 300 sec: 3610.0). Total num frames: 2252800. Throughput: 0: 949.5. Samples: 562732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:45:41,870][00769] Avg episode reward: [(0, '18.865')]
[2024-08-16 06:45:46,864][00769] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 2273280. Throughput: 0: 898.3. Samples: 567052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:45:46,866][00769] Avg episode reward: [(0, '19.848')]
[2024-08-16 06:45:51,238][08192] Updated weights for policy 0, policy_version 560 (0.0013)
[2024-08-16 06:45:51,864][00769] Fps is (10 sec: 4097.5, 60 sec: 3754.8, 300 sec: 3596.1). Total num frames: 2293760. Throughput: 0: 937.2. Samples: 573820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-08-16 06:45:51,867][00769] Avg episode reward: [(0, '19.983')]
[2024-08-16 06:45:56,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 2314240. Throughput: 0: 968.1. Samples: 577216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:45:56,871][00769] Avg episode reward: [(0, '19.657')]
[2024-08-16 06:46:01,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 2326528. Throughput: 0: 930.1. Samples: 581704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:46:01,866][00769] Avg episode reward: [(0, '19.546')]
[2024-08-16 06:46:03,084][08192] Updated weights for policy 0, policy_version 570 (0.0014)
[2024-08-16 06:46:06,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3596.1). Total num frames: 2351104. Throughput: 0: 919.2. Samples: 587580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-08-16 06:46:06,871][00769] Avg episode reward: [(0, '18.089')]
[2024-08-16 06:46:11,864][00769] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3610.0). Total num frames: 2371584. Throughput: 0: 955.1. Samples: 591076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:46:11,867][00769] Avg episode reward: [(0, '17.579')]
[2024-08-16 06:46:12,075][08192] Updated weights for policy 0, policy_version 580 (0.0018)
[2024-08-16 06:46:16,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 2387968. Throughput: 0: 954.0. Samples: 596466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:46:16,868][00769] Avg episode reward: [(0, '17.364')]
[2024-08-16 06:46:21,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3623.9). Total num frames: 2404352. Throughput: 0: 917.6. Samples: 601382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:46:21,867][00769] Avg episode reward: [(0, '17.216')]
[2024-08-16 06:46:23,959][08192] Updated weights for policy 0, policy_version 590 (0.0018)
[2024-08-16 06:46:26,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3651.7). Total num frames: 2428928. Throughput: 0: 932.7. Samples: 604702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:46:26,875][00769] Avg episode reward: [(0, '17.808')]
[2024-08-16 06:46:31,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2445312. Throughput: 0: 981.2. Samples: 611208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:46:31,868][00769] Avg episode reward: [(0, '18.314')]
[2024-08-16 06:46:35,492][08192] Updated weights for policy 0, policy_version 600 (0.0021)
[2024-08-16 06:46:36,864][00769] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3637.8). Total num frames: 2457600. Throughput: 0: 920.6. Samples: 615248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:46:36,867][00769] Avg episode reward: [(0, '17.836')]
[2024-08-16 06:46:41,866][00769] Fps is (10 sec: 3685.7, 60 sec: 3823.0, 300 sec: 3651.7). Total num frames: 2482176. Throughput: 0: 912.6. Samples: 618284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:46:41,869][00769] Avg episode reward: [(0, '18.430')]
[2024-08-16 06:46:45,212][08192] Updated weights for policy 0, policy_version 610 (0.0021)
[2024-08-16 06:46:46,865][00769] Fps is (10 sec: 4505.4, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 2502656. Throughput: 0: 963.5. Samples: 625060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:46:46,868][00769] Avg episode reward: [(0, '18.708')]
[2024-08-16 06:46:46,952][08179] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000612_2506752.pth...
[2024-08-16 06:46:47,158][08179] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000396_1622016.pth
[2024-08-16 06:46:51,864][00769] Fps is (10 sec: 3687.1, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2519040. Throughput: 0: 942.1. Samples: 629974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:46:51,867][00769] Avg episode reward: [(0, '19.059')]
[2024-08-16 06:46:56,864][00769] Fps is (10 sec: 3277.0, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2535424. Throughput: 0: 912.0. Samples: 632116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:46:56,871][00769] Avg episode reward: [(0, '19.773')]
[2024-08-16 06:46:57,228][08192] Updated weights for policy 0, policy_version 620 (0.0035)
[2024-08-16 06:47:01,864][00769] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3665.6). Total num frames: 2560000. Throughput: 0: 937.5. Samples: 638652. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:47:01,872][00769] Avg episode reward: [(0, '20.336')]
[2024-08-16 06:47:06,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2576384. Throughput: 0: 963.7. Samples: 644748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:47:06,871][00769] Avg episode reward: [(0, '21.887')]
[2024-08-16 06:47:06,885][08179] Saving new best policy, reward=21.887!
[2024-08-16 06:47:07,302][08192] Updated weights for policy 0, policy_version 630 (0.0021)
[2024-08-16 06:47:11,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2592768. Throughput: 0: 932.2. Samples: 646650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:47:11,871][00769] Avg episode reward: [(0, '22.773')]
[2024-08-16 06:47:11,876][08179] Saving new best policy, reward=22.773!
[2024-08-16 06:47:16,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 2613248. Throughput: 0: 909.0. Samples: 652114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:47:16,867][00769] Avg episode reward: [(0, '22.928')]
[2024-08-16 06:47:16,877][08179] Saving new best policy, reward=22.928!
[2024-08-16 06:47:18,479][08192] Updated weights for policy 0, policy_version 640 (0.0016)
[2024-08-16 06:47:21,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 2633728. Throughput: 0: 968.1. Samples: 658814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:47:21,867][00769] Avg episode reward: [(0, '21.974')]
[2024-08-16 06:47:26,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2650112. Throughput: 0: 954.9. Samples: 661252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:47:26,871][00769] Avg episode reward: [(0, '22.095')]
[2024-08-16 06:47:30,382][08192] Updated weights for policy 0, policy_version 650 (0.0037)
[2024-08-16 06:47:31,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2666496. Throughput: 0: 904.6. Samples: 665768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:47:31,871][00769] Avg episode reward: [(0, '21.720')]
[2024-08-16 06:47:36,864][00769] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2678784. Throughput: 0: 894.1. Samples: 670208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:47:36,871][00769] Avg episode reward: [(0, '21.230')]
[2024-08-16 06:47:41,864][00769] Fps is (10 sec: 2867.2, 60 sec: 3550.0, 300 sec: 3637.8). Total num frames: 2695168. Throughput: 0: 894.6. Samples: 672374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:47:41,867][00769] Avg episode reward: [(0, '22.070')]
[2024-08-16 06:47:44,858][08192] Updated weights for policy 0, policy_version 660 (0.0028)
[2024-08-16 06:47:46,864][00769] Fps is (10 sec: 2867.2, 60 sec: 3413.4, 300 sec: 3610.0). Total num frames: 2707456. Throughput: 0: 841.0. Samples: 676496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:47:46,867][00769] Avg episode reward: [(0, '21.751')]
[2024-08-16 06:47:51,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2732032. Throughput: 0: 840.8. Samples: 682582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:47:51,872][00769] Avg episode reward: [(0, '21.859')]
[2024-08-16 06:47:54,573][08192] Updated weights for policy 0, policy_version 670 (0.0022)
[2024-08-16 06:47:56,864][00769] Fps is (10 sec: 4505.7, 60 sec: 3618.1, 300 sec: 3651.8). Total num frames: 2752512. Throughput: 0: 872.8. Samples: 685926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:47:56,867][00769] Avg episode reward: [(0, '19.736')]
[2024-08-16 06:48:01,866][00769] Fps is (10 sec: 3276.3, 60 sec: 3413.2, 300 sec: 3637.8). Total num frames: 2764800. Throughput: 0: 869.7. Samples: 691252. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:48:01,869][00769] Avg episode reward: [(0, '18.751')]
[2024-08-16 06:48:06,560][08192] Updated weights for policy 0, policy_version 680 (0.0027)
[2024-08-16 06:48:06,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3623.9). Total num frames: 2785280. Throughput: 0: 833.6. Samples: 696328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:48:06,867][00769] Avg episode reward: [(0, '16.752')]
[2024-08-16 06:48:11,864][00769] Fps is (10 sec: 4096.7, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 2805760. Throughput: 0: 852.7. Samples: 699624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:48:11,871][00769] Avg episode reward: [(0, '16.998')]
[2024-08-16 06:48:16,803][08192] Updated weights for policy 0, policy_version 690 (0.0019)
[2024-08-16 06:48:16,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 2826240. Throughput: 0: 893.8. Samples: 705990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:48:16,871][00769] Avg episode reward: [(0, '17.996')]
[2024-08-16 06:48:21,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3637.8). Total num frames: 2838528. Throughput: 0: 887.3. Samples: 710136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:48:21,875][00769] Avg episode reward: [(0, '19.029')]
[2024-08-16 06:48:26,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 2863104. Throughput: 0: 913.0. Samples: 713460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:48:26,867][00769] Avg episode reward: [(0, '20.136')]
[2024-08-16 06:48:27,402][08192] Updated weights for policy 0, policy_version 700 (0.0014)
[2024-08-16 06:48:31,864][00769] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2883584. Throughput: 0: 971.3. Samples: 720206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:48:31,872][00769] Avg episode reward: [(0, '19.310')]
[2024-08-16 06:48:36,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2895872. Throughput: 0: 938.3. Samples: 724806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:48:36,871][00769] Avg episode reward: [(0, '19.277')]
[2024-08-16 06:48:39,458][08192] Updated weights for policy 0, policy_version 710 (0.0018)
[2024-08-16 06:48:41,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2916352. Throughput: 0: 913.7. Samples: 727042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:48:41,869][00769] Avg episode reward: [(0, '20.056')]
[2024-08-16 06:48:46,867][00769] Fps is (10 sec: 4504.4, 60 sec: 3891.0, 300 sec: 3665.5). Total num frames: 2940928. Throughput: 0: 945.0. Samples: 733776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:48:46,873][00769] Avg episode reward: [(0, '19.831')]
[2024-08-16 06:48:46,885][08179] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000718_2940928.pth...
[2024-08-16 06:48:47,011][08179] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000502_2056192.pth
[2024-08-16 06:48:48,512][08192] Updated weights for policy 0, policy_version 720 (0.0020)
[2024-08-16 06:48:51,866][00769] Fps is (10 sec: 4095.3, 60 sec: 3754.6, 300 sec: 3665.6). Total num frames: 2957312. Throughput: 0: 959.7. Samples: 739516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:48:51,869][00769] Avg episode reward: [(0, '20.021')]
[2024-08-16 06:48:56,865][00769] Fps is (10 sec: 2867.9, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2969600. Throughput: 0: 932.4. Samples: 741582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:48:56,872][00769] Avg episode reward: [(0, '20.098')]
[2024-08-16 06:49:00,563][08192] Updated weights for policy 0, policy_version 730 (0.0015)
[2024-08-16 06:49:01,864][00769] Fps is (10 sec: 3687.1, 60 sec: 3823.0, 300 sec: 3651.7). Total num frames: 2994176. Throughput: 0: 921.6. Samples: 747462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:49:01,869][00769] Avg episode reward: [(0, '21.418')]
[2024-08-16 06:49:06,864][00769] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 3014656. Throughput: 0: 978.4. Samples: 754166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:49:06,868][00769] Avg episode reward: [(0, '21.318')]
[2024-08-16 06:49:11,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3026944. Throughput: 0: 949.6. Samples: 756190. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:49:11,869][00769] Avg episode reward: [(0, '21.012')]
[2024-08-16 06:49:12,214][08192] Updated weights for policy 0, policy_version 740 (0.0018)
[2024-08-16 06:49:16,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3047424. Throughput: 0: 907.4. Samples: 761040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:49:16,872][00769] Avg episode reward: [(0, '21.268')]
[2024-08-16 06:49:21,579][08192] Updated weights for policy 0, policy_version 750 (0.0029)
[2024-08-16 06:49:21,864][00769] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3665.6). Total num frames: 3072000. Throughput: 0: 956.3. Samples: 767838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:49:21,868][00769] Avg episode reward: [(0, '21.620')]
[2024-08-16 06:49:26,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 3088384. Throughput: 0: 976.2. Samples: 770972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:49:26,868][00769] Avg episode reward: [(0, '20.921')]
[2024-08-16 06:49:31,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3104768. Throughput: 0: 919.4. Samples: 775146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:49:31,866][00769] Avg episode reward: [(0, '20.172')]
[2024-08-16 06:49:33,597][08192] Updated weights for policy 0, policy_version 760 (0.0024)
[2024-08-16 06:49:36,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 3125248. Throughput: 0: 935.7. Samples: 781622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:49:36,869][00769] Avg episode reward: [(0, '20.870')]
[2024-08-16 06:49:41,868][00769] Fps is (10 sec: 4094.5, 60 sec: 3822.7, 300 sec: 3693.3). Total num frames: 3145728. Throughput: 0: 963.0. Samples: 784920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:49:41,870][00769] Avg episode reward: [(0, '19.416')]
[2024-08-16 06:49:43,723][08192] Updated weights for policy 0, policy_version 770 (0.0022)
[2024-08-16 06:49:46,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3707.3). Total num frames: 3162112. Throughput: 0: 944.1. Samples: 789946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:49:46,873][00769] Avg episode reward: [(0, '21.516')]
[2024-08-16 06:49:51,864][00769] Fps is (10 sec: 3687.7, 60 sec: 3754.8, 300 sec: 3707.2). Total num frames: 3182592. Throughput: 0: 916.5. Samples: 795410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:49:51,867][00769] Avg episode reward: [(0, '22.658')]
[2024-08-16 06:49:54,390][08192] Updated weights for policy 0, policy_version 780 (0.0020)
[2024-08-16 06:49:56,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 3203072. Throughput: 0: 946.2. Samples: 798770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:49:56,867][00769] Avg episode reward: [(0, '22.022')]
[2024-08-16 06:50:01,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3219456. Throughput: 0: 969.1. Samples: 804648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:50:01,875][00769] Avg episode reward: [(0, '21.812')]
[2024-08-16 06:50:06,246][08192] Updated weights for policy 0, policy_version 790 (0.0034)
[2024-08-16 06:50:06,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3235840. Throughput: 0: 920.8. Samples: 809276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:50:06,872][00769] Avg episode reward: [(0, '24.386')]
[2024-08-16 06:50:06,886][08179] Saving new best policy, reward=24.386!
[2024-08-16 06:50:11,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 3256320. Throughput: 0: 920.8. Samples: 812408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:50:11,871][00769] Avg episode reward: [(0, '21.743')]
[2024-08-16 06:50:15,689][08192] Updated weights for policy 0, policy_version 800 (0.0028)
[2024-08-16 06:50:16,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 3276800. Throughput: 0: 976.9. Samples: 819106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:50:16,871][00769] Avg episode reward: [(0, '22.172')]
[2024-08-16 06:50:21,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3293184. Throughput: 0: 925.5. Samples: 823268. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-16 06:50:21,870][00769] Avg episode reward: [(0, '22.749')]
[2024-08-16 06:50:26,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3313664. Throughput: 0: 915.6. Samples: 826120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:50:26,867][00769] Avg episode reward: [(0, '22.788')]
[2024-08-16 06:50:27,282][08192] Updated weights for policy 0, policy_version 810 (0.0017)
[2024-08-16 06:50:31,864][00769] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 3338240. Throughput: 0: 954.4. Samples: 832894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:50:31,867][00769] Avg episode reward: [(0, '20.910')]
[2024-08-16 06:50:36,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.2). Total num frames: 3350528. Throughput: 0: 945.9. Samples: 837974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-16 06:50:36,872][00769] Avg episode reward: [(0, '21.976')]
[2024-08-16 06:50:39,377][08192] Updated weights for policy 0, policy_version 820 (0.0022)
[2024-08-16 06:50:41,868][00769] Fps is (10 sec: 2456.7, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 3362816. Throughput: 0: 913.5. Samples: 839882. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-08-16 06:50:41,871][00769] Avg episode reward: [(0, '21.947')]
[2024-08-16 06:50:46,864][00769] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3375104. Throughput: 0: 870.2. Samples: 843808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:50:46,870][00769] Avg episode reward: [(0, '22.158')]
[2024-08-16 06:50:46,894][08179] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000825_3379200.pth...
[2024-08-16 06:50:47,023][08179] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000612_2506752.pth
[2024-08-16 06:50:51,864][00769] Fps is (10 sec: 3278.0, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3395584. Throughput: 0: 893.8. Samples: 849496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:50:51,871][00769] Avg episode reward: [(0, '22.277')]
[2024-08-16 06:50:52,653][08192] Updated weights for policy 0, policy_version 830 (0.0035)
[2024-08-16 06:50:56,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3679.5). Total num frames: 3411968. Throughput: 0: 869.6. Samples: 851538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-08-16 06:50:56,868][00769] Avg episode reward: [(0, '22.689')]
[2024-08-16 06:51:01,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3432448. Throughput: 0: 848.5. Samples: 857290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:51:01,866][00769] Avg episode reward: [(0, '22.705')]
[2024-08-16 06:51:03,131][08192] Updated weights for policy 0, policy_version 840 (0.0020)
[2024-08-16 06:51:06,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3452928. Throughput: 0: 906.6. Samples: 864066. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:51:06,870][00769] Avg episode reward: [(0, '23.289')]
[2024-08-16 06:51:11,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3469312. Throughput: 0: 890.6. Samples: 866198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:51:11,872][00769] Avg episode reward: [(0, '23.298')]
[2024-08-16 06:51:15,018][08192] Updated weights for policy 0, policy_version 850 (0.0023)
[2024-08-16 06:51:16,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 3489792. Throughput: 0: 847.7. Samples: 871042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-16 06:51:16,867][00769] Avg episode reward: [(0, '22.637')]
[2024-08-16 06:51:21,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3510272. Throughput: 0: 882.0. Samples: 877664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:51:21,867][00769] Avg episode reward: [(0, '22.823')]
[2024-08-16 06:51:24,545][08192] Updated weights for policy 0, policy_version 860 (0.0016)
[2024-08-16 06:51:26,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3526656. Throughput: 0: 911.7. Samples: 880906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:51:26,867][00769] Avg episode reward: [(0, '23.112')]
[2024-08-16 06:51:31,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3679.5). Total num frames: 3543040. Throughput: 0: 912.7. Samples: 884880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:51:31,869][00769] Avg episode reward: [(0, '22.187')]
[2024-08-16 06:51:36,120][08192] Updated weights for policy 0, policy_version 870 (0.0016)
[2024-08-16 06:51:36,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3563520. Throughput: 0: 928.8. Samples: 891290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:51:36,869][00769] Avg episode reward: [(0, '22.695')]
[2024-08-16 06:51:41,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3686.6, 300 sec: 3665.6). Total num frames: 3584000. Throughput: 0: 955.0. Samples: 894512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:51:41,871][00769] Avg episode reward: [(0, '22.991')]
[2024-08-16 06:51:46,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3596288. Throughput: 0: 932.4. Samples: 899250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:51:46,866][00769] Avg episode reward: [(0, '23.134')]
[2024-08-16 06:51:48,154][08192] Updated weights for policy 0, policy_version 880 (0.0015)
[2024-08-16 06:51:51,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 3620864. Throughput: 0: 907.0. Samples: 904882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-16 06:51:51,873][00769] Avg episode reward: [(0, '21.210')]
[2024-08-16 06:51:56,864][00769] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 3641344. Throughput: 0: 934.4. Samples: 908244. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:51:56,869][00769] Avg episode reward: [(0, '19.512')]
[2024-08-16 06:51:57,335][08192] Updated weights for policy 0, policy_version 890 (0.0025)
[2024-08-16 06:52:01,866][00769] Fps is (10 sec: 3685.8, 60 sec: 3754.6, 300 sec: 3665.6). Total num frames: 3657728. Throughput: 0: 952.9. Samples: 913926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:52:01,871][00769] Avg episode reward: [(0, '20.098')]
[2024-08-16 06:52:06,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3674112. Throughput: 0: 914.2. Samples: 918804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:52:06,866][00769] Avg episode reward: [(0, '19.478')]
[2024-08-16 06:52:08,930][08192] Updated weights for policy 0, policy_version 900 (0.0012)
[2024-08-16 06:52:11,864][00769] Fps is (10 sec: 4096.7, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 3698688. Throughput: 0: 913.8. Samples: 922026. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-16 06:52:11,866][00769] Avg episode reward: [(0, '19.453')]
[2024-08-16 06:52:16,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 3715072. Throughput: 0: 969.8. Samples: 928520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:52:16,868][00769] Avg episode reward: [(0, '20.933')]
[2024-08-16 06:52:20,924][08192] Updated weights for policy 0, policy_version 910 (0.0019)
[2024-08-16 06:52:21,864][00769] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3727360. Throughput: 0: 914.3. Samples: 932432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:52:21,871][00769] Avg episode reward: [(0, '22.017')]
[2024-08-16 06:52:26,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 3751936. Throughput: 0: 913.3. Samples: 935612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:52:26,871][00769] Avg episode reward: [(0, '23.565')]
[2024-08-16 06:52:29,972][08192] Updated weights for policy 0, policy_version 920 (0.0014)
[2024-08-16 06:52:31,865][00769] Fps is (10 sec: 4915.0, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 3776512. Throughput: 0: 964.7. Samples: 942660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:52:31,867][00769] Avg episode reward: [(0, '24.848')]
[2024-08-16 06:52:31,869][08179] Saving new best policy, reward=24.848!
[2024-08-16 06:52:36,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3788800. Throughput: 0: 946.8. Samples: 947486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:52:36,868][00769] Avg episode reward: [(0, '25.332')]
[2024-08-16 06:52:36,883][08179] Saving new best policy, reward=25.332!
[2024-08-16 06:52:41,737][08192] Updated weights for policy 0, policy_version 930 (0.0031)
[2024-08-16 06:52:41,864][00769] Fps is (10 sec: 3277.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3809280. Throughput: 0: 923.5. Samples: 949800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:52:41,867][00769] Avg episode reward: [(0, '24.718')]
[2024-08-16 06:52:46,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 3829760. Throughput: 0: 948.8. Samples: 956622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:52:46,867][00769] Avg episode reward: [(0, '23.177')]
[2024-08-16 06:52:46,879][08179] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000935_3829760.pth...
[2024-08-16 06:52:47,010][08179] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000718_2940928.pth
[2024-08-16 06:52:51,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3846144. Throughput: 0: 964.4. Samples: 962202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:52:51,870][00769] Avg episode reward: [(0, '22.012')]
[2024-08-16 06:52:52,272][08192] Updated weights for policy 0, policy_version 940 (0.0025)
[2024-08-16 06:52:56,864][00769] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3862528. Throughput: 0: 940.4. Samples: 964346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:52:56,866][00769] Avg episode reward: [(0, '22.723')]
[2024-08-16 06:53:01,864][00769] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3735.0). Total num frames: 3887104. Throughput: 0: 934.1. Samples: 970554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:53:01,872][00769] Avg episode reward: [(0, '21.733')]
[2024-08-16 06:53:02,335][08192] Updated weights for policy 0, policy_version 950 (0.0013)
[2024-08-16 06:53:06,864][00769] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 3907584. Throughput: 0: 996.2. Samples: 977260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:53:06,868][00769] Avg episode reward: [(0, '21.882')]
[2024-08-16 06:53:11,867][00769] Fps is (10 sec: 3275.9, 60 sec: 3686.2, 300 sec: 3707.2). Total num frames: 3919872. Throughput: 0: 971.6. Samples: 979338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:53:11,869][00769] Avg episode reward: [(0, '21.727')]
[2024-08-16 06:53:14,181][08192] Updated weights for policy 0, policy_version 960 (0.0018)
[2024-08-16 06:53:16,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3944448. Throughput: 0: 930.6. Samples: 984538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:53:16,867][00769] Avg episode reward: [(0, '22.060')]
[2024-08-16 06:53:21,864][00769] Fps is (10 sec: 4506.8, 60 sec: 3959.5, 300 sec: 3735.0). Total num frames: 3964928. Throughput: 0: 980.3. Samples: 991600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-16 06:53:21,871][00769] Avg episode reward: [(0, '20.925')]
[2024-08-16 06:53:22,982][08192] Updated weights for policy 0, policy_version 970 (0.0014)
[2024-08-16 06:53:26,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3981312. Throughput: 0: 993.6. Samples: 994510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-16 06:53:26,867][00769] Avg episode reward: [(0, '21.165')]
[2024-08-16 06:53:31,864][00769] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 4001792. Throughput: 0: 942.3. Samples: 999026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-16 06:53:31,872][00769] Avg episode reward: [(0, '21.211')]
[2024-08-16 06:53:32,605][08179] Stopping Batcher_0...
[2024-08-16 06:53:32,606][08179] Loop batcher_evt_loop terminating...
[2024-08-16 06:53:32,607][00769] Component Batcher_0 stopped!
[2024-08-16 06:53:32,608][08179] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-08-16 06:53:32,651][00769] Component RolloutWorker_w2 stopped!
[2024-08-16 06:53:32,657][08195] Stopping RolloutWorker_w2...
[2024-08-16 06:53:32,667][08195] Loop rollout_proc2_evt_loop terminating...
[2024-08-16 06:53:32,676][00769] Component RolloutWorker_w3 stopped!
[2024-08-16 06:53:32,676][08196] Stopping RolloutWorker_w3...
[2024-08-16 06:53:32,685][08196] Loop rollout_proc3_evt_loop terminating...
[2024-08-16 06:53:32,691][00769] Component RolloutWorker_w4 stopped!
[2024-08-16 06:53:32,696][08199] Stopping RolloutWorker_w4...
[2024-08-16 06:53:32,685][08192] Weights refcount: 2 0
[2024-08-16 06:53:32,698][08199] Loop rollout_proc4_evt_loop terminating...
[2024-08-16 06:53:32,705][08194] Stopping RolloutWorker_w1...
[2024-08-16 06:53:32,700][08202] Stopping RolloutWorker_w7...
[2024-08-16 06:53:32,700][00769] Component RolloutWorker_w7 stopped!
[2024-08-16 06:53:32,707][00769] Component RolloutWorker_w1 stopped!
[2024-08-16 06:53:32,706][08194] Loop rollout_proc1_evt_loop terminating...
[2024-08-16 06:53:32,714][08192] Stopping InferenceWorker_p0-w0...
[2024-08-16 06:53:32,715][08201] Stopping RolloutWorker_w5...
[2024-08-16 06:53:32,708][08202] Loop rollout_proc7_evt_loop terminating...
[2024-08-16 06:53:32,715][00769] Component InferenceWorker_p0-w0 stopped!
[2024-08-16 06:53:32,717][00769] Component RolloutWorker_w5 stopped!
[2024-08-16 06:53:32,714][08192] Loop inference_proc0-0_evt_loop terminating...
[2024-08-16 06:53:32,724][08201] Loop rollout_proc5_evt_loop terminating...
[2024-08-16 06:53:32,735][00769] Component RolloutWorker_w0 stopped!
[2024-08-16 06:53:32,741][00769] Component RolloutWorker_w6 stopped!
[2024-08-16 06:53:32,746][08200] Stopping RolloutWorker_w6...
[2024-08-16 06:53:32,738][08193] Stopping RolloutWorker_w0...
[2024-08-16 06:53:32,747][08200] Loop rollout_proc6_evt_loop terminating...
[2024-08-16 06:53:32,754][08193] Loop rollout_proc0_evt_loop terminating...
[2024-08-16 06:53:32,773][08179] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000825_3379200.pth
[2024-08-16 06:53:32,795][08179] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-08-16 06:53:33,002][00769] Component LearnerWorker_p0 stopped!
[2024-08-16 06:53:33,012][00769] Waiting for process learner_proc0 to stop...
[2024-08-16 06:53:33,002][08179] Stopping LearnerWorker_p0...
[2024-08-16 06:53:33,019][08179] Loop learner_proc0_evt_loop terminating...
[2024-08-16 06:53:34,508][00769] Waiting for process inference_proc0-0 to join...
[2024-08-16 06:53:34,681][00769] Waiting for process rollout_proc0 to join...
[2024-08-16 06:53:35,870][00769] Waiting for process rollout_proc1 to join...
[2024-08-16 06:53:35,875][00769] Waiting for process rollout_proc2 to join...
[2024-08-16 06:53:35,879][00769] Waiting for process rollout_proc3 to join...
[2024-08-16 06:53:35,884][00769] Waiting for process rollout_proc4 to join...
[2024-08-16 06:53:35,888][00769] Waiting for process rollout_proc5 to join...
[2024-08-16 06:53:35,892][00769] Waiting for process rollout_proc6 to join...
[2024-08-16 06:53:35,896][00769] Waiting for process rollout_proc7 to join...
[2024-08-16 06:53:35,900][00769] Batcher 0 profile tree view:
batching: 26.9923, releasing_batches: 0.0228
[2024-08-16 06:53:35,904][00769] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 478.4115
update_model: 7.7665
  weight_update: 0.0013
one_step: 0.0052
  handle_policy_step: 573.0195
    deserialize: 14.9309, stack: 3.0899, obs_to_device_normalize: 117.8758, forward: 288.9587, send_messages: 28.9819
    prepare_outputs: 89.8958
      to_cpu: 56.2033
[2024-08-16 06:53:35,905][00769] Learner 0 profile tree view:
misc: 0.0053, prepare_batch: 18.2343
train: 76.4241
  epoch_init: 0.0060, minibatch_init: 0.0062, losses_postprocess: 0.6963, kl_divergence: 0.6246, after_optimizer: 33.3536
  calculate_losses: 26.6978
    losses_init: 0.0100, forward_head: 1.8522, bptt_initial: 17.5608, tail: 1.1508, advantages_returns: 0.3639, losses: 3.0313
    bptt: 2.4120
      bptt_forward_core: 2.3307
  update: 14.3898
    clip: 1.4463
[2024-08-16 06:53:35,907][00769] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4224, enqueue_policy_requests: 121.1035, env_step: 855.0159, overhead: 14.8277, complete_rollouts: 7.3332
save_policy_outputs: 26.7017
  split_output_tensors: 9.3441
[2024-08-16 06:53:35,909][00769] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4683, enqueue_policy_requests: 122.3764, env_step: 850.2403, overhead: 15.6263, complete_rollouts: 6.9698
save_policy_outputs: 26.1566
  split_output_tensors: 8.8983
[2024-08-16 06:53:35,911][00769] Loop Runner_EvtLoop terminating...
[2024-08-16 06:53:35,913][00769] Runner profile tree view:
main_loop: 1125.6084
[2024-08-16 06:53:35,914][00769] Collected {0: 4005888}, FPS: 3558.9
[2024-08-16 06:53:47,979][00769] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-08-16 06:53:47,981][00769] Overriding arg 'num_workers' with value 1 passed from command line
[2024-08-16 06:53:47,983][00769] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-08-16 06:53:47,985][00769] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-08-16 06:53:47,987][00769] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-08-16 06:53:47,991][00769] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-08-16 06:53:47,993][00769] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-08-16 06:53:47,995][00769] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-08-16 06:53:47,997][00769] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-08-16 06:53:47,998][00769] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-08-16 06:53:48,000][00769] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-08-16 06:53:48,001][00769] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-08-16 06:53:48,002][00769] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-08-16 06:53:48,003][00769] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-08-16 06:53:48,005][00769] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-08-16 06:53:48,025][00769] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-16 06:53:48,028][00769] RunningMeanStd input shape: (3, 72, 128)
[2024-08-16 06:53:48,031][00769] RunningMeanStd input shape: (1,)
[2024-08-16 06:53:48,053][00769] ConvEncoder: input_channels=3
[2024-08-16 06:53:48,256][00769] Conv encoder output size: 512
[2024-08-16 06:53:48,260][00769] Policy head output size: 512
[2024-08-16 06:53:49,934][00769] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-08-16 06:53:50,776][00769] Num frames 100...
[2024-08-16 06:53:50,893][00769] Num frames 200...
[2024-08-16 06:53:51,052][00769] Num frames 300...
[2024-08-16 06:53:51,221][00769] Num frames 400...
[2024-08-16 06:53:51,380][00769] Num frames 500...
[2024-08-16 06:53:51,545][00769] Num frames 600...
[2024-08-16 06:53:51,717][00769] Num frames 700...
[2024-08-16 06:53:51,873][00769] Num frames 800...
[2024-08-16 06:53:52,056][00769] Num frames 900...
[2024-08-16 06:53:52,124][00769] Avg episode rewards: #0: 19.060, true rewards: #0: 9.060
[2024-08-16 06:53:52,127][00769] Avg episode reward: 19.060, avg true_objective: 9.060
[2024-08-16 06:53:52,288][00769] Num frames 1000...
[2024-08-16 06:53:52,455][00769] Num frames 1100...
[2024-08-16 06:53:52,632][00769] Num frames 1200...
[2024-08-16 06:53:52,800][00769] Num frames 1300...
[2024-08-16 06:53:52,964][00769] Num frames 1400...
[2024-08-16 06:53:53,132][00769] Num frames 1500...
[2024-08-16 06:53:53,277][00769] Num frames 1600...
[2024-08-16 06:53:53,395][00769] Num frames 1700...
[2024-08-16 06:53:53,492][00769] Avg episode rewards: #0: 20.165, true rewards: #0: 8.665
[2024-08-16 06:53:53,494][00769] Avg episode reward: 20.165, avg true_objective: 8.665
[2024-08-16 06:53:53,575][00769] Num frames 1800...
[2024-08-16 06:53:53,700][00769] Num frames 1900...
[2024-08-16 06:53:53,817][00769] Num frames 2000...
[2024-08-16 06:53:53,947][00769] Num frames 2100...
[2024-08-16 06:53:54,069][00769] Num frames 2200...
[2024-08-16 06:53:54,189][00769] Num frames 2300...
[2024-08-16 06:53:54,305][00769] Num frames 2400...
[2024-08-16 06:53:54,423][00769] Num frames 2500...
[2024-08-16 06:53:54,547][00769] Avg episode rewards: #0: 19.193, true rewards: #0: 8.527
[2024-08-16 06:53:54,549][00769] Avg episode reward: 19.193, avg true_objective: 8.527
[2024-08-16 06:53:54,603][00769] Num frames 2600...
[2024-08-16 06:53:54,725][00769] Num frames 2700...
[2024-08-16 06:53:54,838][00769] Num frames 2800...
[2024-08-16 06:53:54,959][00769] Num frames 2900...
[2024-08-16 06:53:55,077][00769] Num frames 3000...
[2024-08-16 06:53:55,196][00769] Num frames 3100...
[2024-08-16 06:53:55,315][00769] Num frames 3200...
[2024-08-16 06:53:55,436][00769] Num frames 3300...
[2024-08-16 06:53:55,555][00769] Num frames 3400...
[2024-08-16 06:53:55,681][00769] Num frames 3500...
[2024-08-16 06:53:55,802][00769] Num frames 3600...
[2024-08-16 06:53:55,924][00769] Num frames 3700...
[2024-08-16 06:53:56,042][00769] Num frames 3800...
[2024-08-16 06:53:56,160][00769] Num frames 3900...
[2024-08-16 06:53:56,280][00769] Num frames 4000...
[2024-08-16 06:53:56,401][00769] Num frames 4100...
[2024-08-16 06:53:56,524][00769] Num frames 4200...
[2024-08-16 06:53:56,646][00769] Num frames 4300...
[2024-08-16 06:53:56,816][00769] Avg episode rewards: #0: 27.475, true rewards: #0: 10.975
[2024-08-16 06:53:56,818][00769] Avg episode reward: 27.475, avg true_objective: 10.975
[2024-08-16 06:53:56,832][00769] Num frames 4400...
[2024-08-16 06:53:56,956][00769] Num frames 4500...
[2024-08-16 06:53:57,073][00769] Num frames 4600...
[2024-08-16 06:53:57,195][00769] Num frames 4700...
[2024-08-16 06:53:57,315][00769] Num frames 4800...
[2024-08-16 06:53:57,435][00769] Num frames 4900...
[2024-08-16 06:53:57,555][00769] Num frames 5000...
[2024-08-16 06:53:57,671][00769] Num frames 5100...
[2024-08-16 06:53:57,797][00769] Num frames 5200...
[2024-08-16 06:53:57,921][00769] Num frames 5300...
[2024-08-16 06:53:58,073][00769] Avg episode rewards: #0: 26.364, true rewards: #0: 10.764
[2024-08-16 06:53:58,074][00769] Avg episode reward: 26.364, avg true_objective: 10.764
[2024-08-16 06:53:58,098][00769] Num frames 5400...
[2024-08-16 06:53:58,214][00769] Num frames 5500...
[2024-08-16 06:53:58,333][00769] Num frames 5600...
[2024-08-16 06:53:58,451][00769] Num frames 5700...
[2024-08-16 06:53:58,570][00769] Num frames 5800...
[2024-08-16 06:53:58,689][00769] Num frames 5900...
[2024-08-16 06:53:58,813][00769] Num frames 6000...
[2024-08-16 06:53:58,942][00769] Num frames 6100...
[2024-08-16 06:53:59,060][00769] Num frames 6200...
[2024-08-16 06:53:59,173][00769] Avg episode rewards: #0: 25.077, true rewards: #0: 10.410
[2024-08-16 06:53:59,174][00769] Avg episode reward: 25.077, avg true_objective: 10.410
[2024-08-16 06:53:59,241][00769] Num frames 6300...
[2024-08-16 06:53:59,358][00769] Num frames 6400...
[2024-08-16 06:53:59,480][00769] Num frames 6500...
[2024-08-16 06:53:59,601][00769] Num frames 6600...
[2024-08-16 06:53:59,721][00769] Num frames 6700...
[2024-08-16 06:53:59,847][00769] Num frames 6800...
[2024-08-16 06:53:59,979][00769] Num frames 6900...
[2024-08-16 06:54:00,066][00769] Avg episode rewards: #0: 23.753, true rewards: #0: 9.896
[2024-08-16 06:54:00,068][00769] Avg episode reward: 23.753, avg true_objective: 9.896
[2024-08-16 06:54:00,157][00769] Num frames 7000...
[2024-08-16 06:54:00,274][00769] Num frames 7100...
[2024-08-16 06:54:00,405][00769] Num frames 7200...
[2024-08-16 06:54:00,523][00769] Num frames 7300...
[2024-08-16 06:54:00,641][00769] Num frames 7400...
[2024-08-16 06:54:00,797][00769] Avg episode rewards: #0: 21.859, true rewards: #0: 9.359
[2024-08-16 06:54:00,798][00769] Avg episode reward: 21.859, avg true_objective: 9.359
[2024-08-16 06:54:00,819][00769] Num frames 7500...
[2024-08-16 06:54:00,945][00769] Num frames 7600...
[2024-08-16 06:54:01,064][00769] Num frames 7700...
[2024-08-16 06:54:01,180][00769] Num frames 7800...
[2024-08-16 06:54:01,297][00769] Num frames 7900...
[2024-08-16 06:54:01,416][00769] Num frames 8000...
[2024-08-16 06:54:01,536][00769] Num frames 8100...
[2024-08-16 06:54:01,708][00769] Avg episode rewards: #0: 20.999, true rewards: #0: 9.110
[2024-08-16 06:54:01,710][00769] Avg episode reward: 20.999, avg true_objective: 9.110
[2024-08-16 06:54:01,715][00769] Num frames 8200...
[2024-08-16 06:54:01,841][00769] Num frames 8300...
[2024-08-16 06:54:01,965][00769] Num frames 8400...
[2024-08-16 06:54:02,081][00769] Num frames 8500...
[2024-08-16 06:54:02,201][00769] Num frames 8600...
[2024-08-16 06:54:02,319][00769] Num frames 8700...
[2024-08-16 06:54:02,437][00769] Num frames 8800...
[2024-08-16 06:54:02,556][00769] Num frames 8900...
[2024-08-16 06:54:02,678][00769] Num frames 9000...
[2024-08-16 06:54:02,794][00769] Num frames 9100...
[2024-08-16 06:54:02,927][00769] Num frames 9200...
[2024-08-16 06:54:03,051][00769] Num frames 9300...
[2024-08-16 06:54:03,171][00769] Num frames 9400...
[2024-08-16 06:54:03,324][00769] Num frames 9500...
[2024-08-16 06:54:03,499][00769] Num frames 9600...
[2024-08-16 06:54:03,663][00769] Num frames 9700...
[2024-08-16 06:54:03,832][00769] Num frames 9800...
[2024-08-16 06:54:03,997][00769] Num frames 9900...
[2024-08-16 06:54:04,164][00769] Num frames 10000...
[2024-08-16 06:54:04,320][00769] Num frames 10100...
[2024-08-16 06:54:04,459][00769] Avg episode rewards: #0: 23.751, true rewards: #0: 10.151
[2024-08-16 06:54:04,461][00769] Avg episode reward: 23.751, avg true_objective: 10.151
[2024-08-16 06:55:05,517][00769] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-08-16 06:57:21,162][00769] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-08-16 06:57:21,168][00769] Overriding arg 'num_workers' with value 1 passed from command line
[2024-08-16 06:57:21,169][00769] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-08-16 06:57:21,172][00769] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-08-16 06:57:21,173][00769] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-08-16 06:57:21,175][00769] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-08-16 06:57:21,177][00769] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-08-16 06:57:21,178][00769] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-08-16 06:57:21,179][00769] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-08-16 06:57:21,180][00769] Adding new argument 'hf_repository'='gugaaa/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-08-16 06:57:21,181][00769] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-08-16 06:57:21,182][00769] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-08-16 06:57:21,183][00769] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-08-16 06:57:21,184][00769] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-08-16 06:57:21,185][00769] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-08-16 06:57:21,193][00769] RunningMeanStd input shape: (3, 72, 128)
[2024-08-16 06:57:21,202][00769] RunningMeanStd input shape: (1,)
[2024-08-16 06:57:21,215][00769] ConvEncoder: input_channels=3
[2024-08-16 06:57:21,252][00769] Conv encoder output size: 512
[2024-08-16 06:57:21,254][00769] Policy head output size: 512
[2024-08-16 06:57:21,272][00769] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-08-16 06:57:21,732][00769] Num frames 100...
[2024-08-16 06:57:21,847][00769] Num frames 200...
[2024-08-16 06:57:21,985][00769] Num frames 300...
[2024-08-16 06:57:22,114][00769] Num frames 400...
[2024-08-16 06:57:22,238][00769] Num frames 500...
[2024-08-16 06:57:22,353][00769] Num frames 600...
[2024-08-16 06:57:22,476][00769] Num frames 700...
[2024-08-16 06:57:22,594][00769] Num frames 800...
[2024-08-16 06:57:22,712][00769] Num frames 900...
[2024-08-16 06:57:22,840][00769] Num frames 1000...
[2024-08-16 06:57:23,007][00769] Num frames 1100...
[2024-08-16 06:57:23,178][00769] Num frames 1200...
[2024-08-16 06:57:23,348][00769] Num frames 1300...
[2024-08-16 06:57:23,510][00769] Num frames 1400...
[2024-08-16 06:57:23,670][00769] Num frames 1500...
[2024-08-16 06:57:23,829][00769] Num frames 1600...
[2024-08-16 06:57:23,988][00769] Num frames 1700...
[2024-08-16 06:57:24,154][00769] Num frames 1800...
[2024-08-16 06:57:24,335][00769] Num frames 1900...
[2024-08-16 06:57:24,505][00769] Num frames 2000...
[2024-08-16 06:57:24,679][00769] Num frames 2100...
[2024-08-16 06:57:24,735][00769] Avg episode rewards: #0: 57.999, true rewards: #0: 21.000
[2024-08-16 06:57:24,737][00769] Avg episode reward: 57.999, avg true_objective: 21.000
[2024-08-16 06:57:24,916][00769] Num frames 2200...
[2024-08-16 06:57:25,077][00769] Num frames 2300...
[2024-08-16 06:57:25,193][00769] Num frames 2400...
[2024-08-16 06:57:25,318][00769] Num frames 2500...
[2024-08-16 06:57:25,441][00769] Num frames 2600...
[2024-08-16 06:57:25,560][00769] Num frames 2700...
[2024-08-16 06:57:25,677][00769] Num frames 2800...
[2024-08-16 06:57:25,793][00769] Num frames 2900...
[2024-08-16 06:57:25,915][00769] Num frames 3000...
[2024-08-16 06:57:26,043][00769] Avg episode rewards: #0: 37.800, true rewards: #0: 15.300
[2024-08-16 06:57:26,046][00769] Avg episode reward: 37.800, avg true_objective: 15.300
[2024-08-16 06:57:26,093][00769] Num frames 3100...
[2024-08-16 06:57:26,206][00769] Num frames 3200...
[2024-08-16 06:57:26,340][00769] Num frames 3300...
[2024-08-16 06:57:26,456][00769] Num frames 3400...
[2024-08-16 06:57:26,573][00769] Num frames 3500...
[2024-08-16 06:57:26,692][00769] Num frames 3600...
[2024-08-16 06:57:26,811][00769] Num frames 3700...
[2024-08-16 06:57:26,934][00769] Num frames 3800...
[2024-08-16 06:57:27,053][00769] Num frames 3900...
[2024-08-16 06:57:27,137][00769] Avg episode rewards: #0: 31.746, true rewards: #0: 13.080
[2024-08-16 06:57:27,138][00769] Avg episode reward: 31.746, avg true_objective: 13.080
[2024-08-16 06:57:27,235][00769] Num frames 4000...
[2024-08-16 06:57:27,360][00769] Num frames 4100...
[2024-08-16 06:57:27,477][00769] Num frames 4200...
[2024-08-16 06:57:27,593][00769] Num frames 4300...
[2024-08-16 06:57:27,710][00769] Num frames 4400...
[2024-08-16 06:57:27,825][00769] Num frames 4500...
[2024-08-16 06:57:27,948][00769] Num frames 4600...
[2024-08-16 06:57:28,063][00769] Num frames 4700...
[2024-08-16 06:57:28,185][00769] Num frames 4800...
[2024-08-16 06:57:28,305][00769] Num frames 4900...
[2024-08-16 06:57:28,428][00769] Num frames 5000...
[2024-08-16 06:57:28,547][00769] Num frames 5100...
[2024-08-16 06:57:28,664][00769] Num frames 5200...
[2024-08-16 06:57:28,783][00769] Num frames 5300...
[2024-08-16 06:57:28,919][00769] Num frames 5400...
[2024-08-16 06:57:29,043][00769] Num frames 5500...
[2024-08-16 06:57:29,164][00769] Num frames 5600...
[2024-08-16 06:57:29,287][00769] Num frames 5700...
[2024-08-16 06:57:29,413][00769] Num frames 5800...
[2024-08-16 06:57:29,531][00769] Num frames 5900...
[2024-08-16 06:57:29,669][00769] Avg episode rewards: #0: 37.917, true rewards: #0: 14.918
[2024-08-16 06:57:29,670][00769] Avg episode reward: 37.917, avg true_objective: 14.918
[2024-08-16 06:57:29,711][00769] Num frames 6000...
[2024-08-16 06:57:29,828][00769] Num frames 6100...
[2024-08-16 06:57:29,950][00769] Num frames 6200...
[2024-08-16 06:57:30,069][00769] Num frames 6300...
[2024-08-16 06:57:30,186][00769] Num frames 6400...
[2024-08-16 06:57:30,306][00769] Num frames 6500...
[2024-08-16 06:57:30,430][00769] Num frames 6600...
[2024-08-16 06:57:30,553][00769] Num frames 6700...
[2024-08-16 06:57:30,669][00769] Num frames 6800...
[2024-08-16 06:57:30,784][00769] Num frames 6900...
[2024-08-16 06:57:30,900][00769] Num frames 7000...
[2024-08-16 06:57:31,020][00769] Num frames 7100...
[2024-08-16 06:57:31,138][00769] Avg episode rewards: #0: 35.502, true rewards: #0: 14.302
[2024-08-16 06:57:31,140][00769] Avg episode reward: 35.502, avg true_objective: 14.302
[2024-08-16 06:57:31,200][00769] Num frames 7200...
[2024-08-16 06:57:31,316][00769] Num frames 7300...
[2024-08-16 06:57:31,443][00769] Num frames 7400...
[2024-08-16 06:57:31,561][00769] Num frames 7500...
[2024-08-16 06:57:31,679][00769] Num frames 7600...
[2024-08-16 06:57:31,793][00769] Num frames 7700...
[2024-08-16 06:57:31,914][00769] Num frames 7800...
[2024-08-16 06:57:32,041][00769] Num frames 7900...
[2024-08-16 06:57:32,162][00769] Num frames 8000...
[2024-08-16 06:57:32,281][00769] Num frames 8100...
[2024-08-16 06:57:32,401][00769] Num frames 8200...
[2024-08-16 06:57:32,531][00769] Num frames 8300...
[2024-08-16 06:57:32,650][00769] Num frames 8400...
[2024-08-16 06:57:32,769][00769] Num frames 8500...
[2024-08-16 06:57:32,886][00769] Num frames 8600...
[2024-08-16 06:57:33,012][00769] Num frames 8700...
[2024-08-16 06:57:33,131][00769] Num frames 8800...
[2024-08-16 06:57:33,239][00769] Avg episode rewards: #0: 37.078, true rewards: #0: 14.745
[2024-08-16 06:57:33,242][00769] Avg episode reward: 37.078, avg true_objective: 14.745
[2024-08-16 06:57:33,305][00769] Num frames 8900...
[2024-08-16 06:57:33,422][00769] Num frames 9000...
[2024-08-16 06:57:33,550][00769] Num frames 9100...
[2024-08-16 06:57:33,663][00769] Num frames 9200...
[2024-08-16 06:57:33,779][00769] Num frames 9300...
[2024-08-16 06:57:33,896][00769] Num frames 9400...
[2024-08-16 06:57:34,016][00769] Num frames 9500...
[2024-08-16 06:57:34,133][00769] Num frames 9600...
[2024-08-16 06:57:34,251][00769] Num frames 9700...
[2024-08-16 06:57:34,366][00769] Num frames 9800...
[2024-08-16 06:57:34,483][00769] Avg episode rewards: #0: 35.077, true rewards: #0: 14.077
[2024-08-16 06:57:34,485][00769] Avg episode reward: 35.077, avg true_objective: 14.077
[2024-08-16 06:57:34,543][00769] Num frames 9900...
[2024-08-16 06:57:34,662][00769] Num frames 10000...
[2024-08-16 06:57:34,782][00769] Num frames 10100...
[2024-08-16 06:57:34,902][00769] Num frames 10200...
[2024-08-16 06:57:35,026][00769] Num frames 10300...
[2024-08-16 06:57:35,189][00769] Num frames 10400...
[2024-08-16 06:57:35,353][00769] Num frames 10500...
[2024-08-16 06:57:35,519][00769] Num frames 10600...
[2024-08-16 06:57:35,681][00769] Num frames 10700...
[2024-08-16 06:57:35,838][00769] Num frames 10800...
[2024-08-16 06:57:35,919][00769] Avg episode rewards: #0: 33.267, true rewards: #0: 13.517
[2024-08-16 06:57:35,921][00769] Avg episode reward: 33.267, avg true_objective: 13.517
[2024-08-16 06:57:36,059][00769] Num frames 10900...
[2024-08-16 06:57:36,220][00769] Num frames 11000...
[2024-08-16 06:57:36,388][00769] Num frames 11100...
[2024-08-16 06:57:36,552][00769] Num frames 11200...
[2024-08-16 06:57:36,719][00769] Num frames 11300...
[2024-08-16 06:57:36,883][00769] Num frames 11400...
[2024-08-16 06:57:37,050][00769] Num frames 11500...
[2024-08-16 06:57:37,224][00769] Num frames 11600...
[2024-08-16 06:57:37,375][00769] Num frames 11700...
[2024-08-16 06:57:37,495][00769] Num frames 11800...
[2024-08-16 06:57:37,623][00769] Num frames 11900...
[2024-08-16 06:57:37,742][00769] Num frames 12000...
[2024-08-16 06:57:37,858][00769] Num frames 12100...
[2024-08-16 06:57:37,980][00769] Num frames 12200...
[2024-08-16 06:57:38,104][00769] Num frames 12300...
[2024-08-16 06:57:38,222][00769] Num frames 12400...
[2024-08-16 06:57:38,340][00769] Num frames 12500...
[2024-08-16 06:57:38,452][00769] Avg episode rewards: #0: 34.720, true rewards: #0: 13.942
[2024-08-16 06:57:38,453][00769] Avg episode reward: 34.720, avg true_objective: 13.942
[2024-08-16 06:57:38,521][00769] Num frames 12600...
[2024-08-16 06:57:38,648][00769] Num frames 12700...
[2024-08-16 06:57:38,764][00769] Num frames 12800...
[2024-08-16 06:57:38,881][00769] Num frames 12900...
[2024-08-16 06:57:39,009][00769] Num frames 13000...
[2024-08-16 06:57:39,127][00769] Num frames 13100...
[2024-08-16 06:57:39,246][00769] Num frames 13200...
[2024-08-16 06:57:39,364][00769] Avg episode rewards: #0: 32.852, true rewards: #0: 13.252
[2024-08-16 06:57:39,366][00769] Avg episode reward: 32.852, avg true_objective: 13.252
[2024-08-16 06:58:56,488][00769] Replay video saved to /content/train_dir/default_experiment/replay.mp4!