output-7b-26k-lora/wandb/debug-internal.log
Training in progress, epoch 1
ca0cb2e verified
2024-02-08 18:53:11,296 INFO StreamThr :1516 [internal.py:wandb_internal():86] W&B internal server running at pid: 1516, started at: 2024-02-08 18:53:11.296140
2024-02-08 18:53:11,300 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: status
2024-02-08 18:53:11,301 INFO WriterThread:1516 [datastore.py:open_for_write():85] open: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/run-5uym8l7w.wandb
2024-02-08 18:53:11,302 DEBUG SenderThread:1516 [sender.py:send():382] send: header
2024-02-08 18:53:11,302 DEBUG SenderThread:1516 [sender.py:send():382] send: run
2024-02-08 18:53:11,515 INFO SenderThread:1516 [dir_watcher.py:__init__():211] watching files in: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files
2024-02-08 18:53:11,515 INFO SenderThread:1516 [sender.py:_start_run_threads():1136] run started: 5uym8l7w with start time 1707418391.295585
2024-02-08 18:53:11,519 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: check_version
2024-02-08 18:53:11,520 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: check_version
2024-02-08 18:53:11,603 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: run_start
2024-02-08 18:53:11,632 DEBUG HandlerThread:1516 [system_info.py:__init__():32] System info init
2024-02-08 18:53:11,633 DEBUG HandlerThread:1516 [system_info.py:__init__():47] System info init done
2024-02-08 18:53:11,633 INFO HandlerThread:1516 [system_monitor.py:start():194] Starting system monitor
2024-02-08 18:53:11,633 INFO SystemMonitor:1516 [system_monitor.py:_start():158] Starting system asset monitoring threads
2024-02-08 18:53:11,633 INFO HandlerThread:1516 [system_monitor.py:probe():214] Collecting system info
2024-02-08 18:53:11,634 INFO SystemMonitor:1516 [interfaces.py:start():190] Started cpu monitoring
2024-02-08 18:53:11,635 INFO SystemMonitor:1516 [interfaces.py:start():190] Started disk monitoring
2024-02-08 18:53:11,636 INFO SystemMonitor:1516 [interfaces.py:start():190] Started gpu monitoring
2024-02-08 18:53:11,637 INFO SystemMonitor:1516 [interfaces.py:start():190] Started memory monitoring
2024-02-08 18:53:11,637 INFO SystemMonitor:1516 [interfaces.py:start():190] Started network monitoring
2024-02-08 18:53:11,692 DEBUG HandlerThread:1516 [system_info.py:probe():196] Probing system
2024-02-08 18:53:11,694 DEBUG HandlerThread:1516 [gitlib.py:_init_repo():56] git repository is invalid
2024-02-08 18:53:11,694 DEBUG HandlerThread:1516 [system_info.py:probe():244] Probing system done
2024-02-08 18:53:11,694 DEBUG HandlerThread:1516 [system_monitor.py:probe():223] {'os': 'Linux-4.14.336-253.554.amzn2.x86_64-x86_64-with-glibc2.35', 'python': '3.10.13', 'heartbeatAt': '2024-02-08T18:53:11.692683', 'startedAt': '2024-02-08T18:53:11.292051', 'docker': None, 'cuda': None, 'args': (), 'state': 'running', 'program': '/home/sagemaker-user/output-7b-26k-lora/../lora_finetuning_push_to_hub_save_local_latest.py', 'codePathLocal': None, 'host': 'default', 'username': 'sagemaker-user', 'executable': '/opt/conda/bin/python3', 'cpu_count': 96, 'cpu_count_logical': 192, 'cpu_freq': {'current': 3259.3440989583337, 'min': 0.0, 'max': 0.0}, 'cpu_freq_per_core': [{'current': 3299.457, 'min': 0.0, 'max': 0.0}, {'current': 3299.806, 'min': 0.0, 'max': 0.0}, {'current': 3205.787, 'min': 0.0, 'max': 0.0}, {'current': 3178.002, 'min': 0.0, 'max': 0.0}, {'current': 3200.706, 'min': 0.0, 'max': 0.0}, {'current': 3196.209, 'min': 0.0, 'max': 0.0}, {'current': 3188.711, 'min': 0.0, 'max': 0.0}, {'current': 3189.097, 'min': 0.0, 'max': 0.0}, {'current': 3206.992, 'min': 0.0, 'max': 0.0}, {'current': 3200.764, 'min': 0.0, 'max': 0.0}, {'current': 2810.939, 'min': 0.0, 'max': 0.0}, {'current': 2847.942, 'min': 0.0, 'max': 0.0}, {'current': 2923.114, 'min': 0.0, 'max': 0.0}, {'current': 3028.23, 'min': 0.0, 'max': 0.0}, {'current': 3012.472, 'min': 0.0, 'max': 0.0}, {'current': 3044.914, 'min': 0.0, 'max': 0.0}, {'current': 2953.37, 'min': 0.0, 'max': 0.0}, {'current': 2957.586, 'min': 0.0, 'max': 0.0}, {'current': 2984.294, 'min': 0.0, 'max': 0.0}, {'current': 2961.352, 'min': 0.0, 'max': 0.0}, {'current': 2901.559, 'min': 0.0, 'max': 0.0}, {'current': 2801.726, 'min': 0.0, 'max': 0.0}, {'current': 2985.17, 'min': 0.0, 'max': 0.0}, {'current': 2963.11, 'min': 0.0, 'max': 0.0}, {'current': 2912.001, 'min': 0.0, 'max': 0.0}, {'current': 2965.712, 'min': 0.0, 'max': 0.0}, {'current': 2966.821, 'min': 0.0, 'max': 0.0}, {'current': 2871.172, 'min': 0.0, 'max': 0.0}, {'current': 
2974.758, 'min': 0.0, 'max': 0.0}, {'current': 2989.099, 'min': 0.0, 'max': 0.0}, {'current': 2948.999, 'min': 0.0, 'max': 0.0}, {'current': 2895.266, 'min': 0.0, 'max': 0.0}, {'current': 3299.988, 'min': 0.0, 'max': 0.0}, {'current': 2924.435, 'min': 0.0, 'max': 0.0}, {'current': 2919.839, 'min': 0.0, 'max': 0.0}, {'current': 2875.943, 'min': 0.0, 'max': 0.0}, {'current': 3300.697, 'min': 0.0, 'max': 0.0}, {'current': 2805.016, 'min': 0.0, 'max': 0.0}, {'current': 3298.583, 'min': 0.0, 'max': 0.0}, {'current': 3298.604, 'min': 0.0, 'max': 0.0}, {'current': 2673.256, 'min': 0.0, 'max': 0.0}, {'current': 3296.503, 'min': 0.0, 'max': 0.0}, {'current': 3139.11, 'min': 0.0, 'max': 0.0}, {'current': 3137.942, 'min': 0.0, 'max': 0.0}, {'current': 2833.969, 'min': 0.0, 'max': 0.0}, {'current': 3153.277, 'min': 0.0, 'max': 0.0}, {'current': 3178.769, 'min': 0.0, 'max': 0.0}, {'current': 3207.604, 'min': 0.0, 'max': 0.0}, {'current': 2892.532, 'min': 0.0, 'max': 0.0}, {'current': 3299.772, 'min': 0.0, 'max': 0.0}, {'current': 3299.641, 'min': 0.0, 'max': 0.0}, {'current': 3300.096, 'min': 0.0, 'max': 0.0}, {'current': 3298.515, 'min': 0.0, 'max': 0.0}, {'current': 3298.26, 'min': 0.0, 'max': 0.0}, {'current': 3299.084, 'min': 0.0, 'max': 0.0}, {'current': 3298.903, 'min': 0.0, 'max': 0.0}, {'current': 3298.866, 'min': 0.0, 'max': 0.0}, {'current': 3300.276, 'min': 0.0, 'max': 0.0}, {'current': 3298.704, 'min': 0.0, 'max': 0.0}, {'current': 3299.342, 'min': 0.0, 'max': 0.0}, {'current': 3297.795, 'min': 0.0, 'max': 0.0}, {'current': 3297.923, 'min': 0.0, 'max': 0.0}, {'current': 3298.013, 'min': 0.0, 'max': 0.0}, {'current': 3297.43, 'min': 0.0, 'max': 0.0}, {'current': 3299.564, 'min': 0.0, 'max': 0.0}, {'current': 3300.54, 'min': 0.0, 'max': 0.0}, {'current': 2747.263, 'min': 0.0, 'max': 0.0}, {'current': 3299.353, 'min': 0.0, 'max': 0.0}, {'current': 3297.896, 'min': 0.0, 'max': 0.0}, {'current': 2533.725, 'min': 0.0, 'max': 0.0}, {'current': 3299.656, 'min': 0.0, 'max': 
0.0}, {'current': 3293.031, 'min': 0.0, 'max': 0.0}, {'current': 3027.834, 'min': 0.0, 'max': 0.0}, {'current': 3024.556, 'min': 0.0, 'max': 0.0}, {'current': 3067.379, 'min': 0.0, 'max': 0.0}, {'current': 3010.826, 'min': 0.0, 'max': 0.0}, {'current': 3101.81, 'min': 0.0, 'max': 0.0}, {'current': 2973.599, 'min': 0.0, 'max': 0.0}, {'current': 3061.27, 'min': 0.0, 'max': 0.0}, {'current': 3291.322, 'min': 0.0, 'max': 0.0}, {'current': 3017.723, 'min': 0.0, 'max': 0.0}, {'current': 2660.496, 'min': 0.0, 'max': 0.0}, {'current': 3004.775, 'min': 0.0, 'max': 0.0}, {'current': 3021.086, 'min': 0.0, 'max': 0.0}, {'current': 3027.592, 'min': 0.0, 'max': 0.0}, {'current': 3059.589, 'min': 0.0, 'max': 0.0}, {'current': 3019.568, 'min': 0.0, 'max': 0.0}, {'current': 3029.623, 'min': 0.0, 'max': 0.0}, {'current': 3080.312, 'min': 0.0, 'max': 0.0}, {'current': 3066.263, 'min': 0.0, 'max': 0.0}, {'current': 2998.37, 'min': 0.0, 'max': 0.0}, {'current': 2949.133, 'min': 0.0, 'max': 0.0}, {'current': 2964.0, 'min': 0.0, 'max': 0.0}, {'current': 3222.788, 'min': 0.0, 'max': 0.0}, {'current': 3299.63, 'min': 0.0, 'max': 0.0}, {'current': 2916.281, 'min': 0.0, 'max': 0.0}, {'current': 2825.282, 'min': 0.0, 'max': 0.0}, {'current': 3038.106, 'min': 0.0, 'max': 0.0}, {'current': 2895.235, 'min': 0.0, 'max': 0.0}, {'current': 3092.874, 'min': 0.0, 'max': 0.0}, {'current': 2924.994, 'min': 0.0, 'max': 0.0}, {'current': 2913.404, 'min': 0.0, 'max': 0.0}, {'current': 2935.638, 'min': 0.0, 'max': 0.0}, {'current': 2583.261, 'min': 0.0, 'max': 0.0}, {'current': 3101.162, 'min': 0.0, 'max': 0.0}, {'current': 3063.704, 'min': 0.0, 'max': 0.0}, {'current': 3093.23, 'min': 0.0, 'max': 0.0}, {'current': 3095.386, 'min': 0.0, 'max': 0.0}, {'current': 2925.773, 'min': 0.0, 'max': 0.0}, {'current': 2920.019, 'min': 0.0, 'max': 0.0}, {'current': 2916.15, 'min': 0.0, 'max': 0.0}, {'current': 2944.025, 'min': 0.0, 'max': 0.0}, {'current': 3259.667, 'min': 0.0, 'max': 0.0}, {'current': 3049.572, 
'min': 0.0, 'max': 0.0}, {'current': 3263.675, 'min': 0.0, 'max': 0.0}, {'current': 3074.497, 'min': 0.0, 'max': 0.0}, {'current': 2923.985, 'min': 0.0, 'max': 0.0}, {'current': 2910.425, 'min': 0.0, 'max': 0.0}, {'current': 2812.861, 'min': 0.0, 'max': 0.0}, {'current': 2874.988, 'min': 0.0, 'max': 0.0}, {'current': 3120.953, 'min': 0.0, 'max': 0.0}, {'current': 3124.25, 'min': 0.0, 'max': 0.0}, {'current': 3113.753, 'min': 0.0, 'max': 0.0}, {'current': 3119.282, 'min': 0.0, 'max': 0.0}, {'current': 2982.281, 'min': 0.0, 'max': 0.0}, {'current': 3048.291, 'min': 0.0, 'max': 0.0}, {'current': 2987.986, 'min': 0.0, 'max': 0.0}, {'current': 2733.968, 'min': 0.0, 'max': 0.0}, {'current': 3274.202, 'min': 0.0, 'max': 0.0}, {'current': 3120.154, 'min': 0.0, 'max': 0.0}, {'current': 3122.388, 'min': 0.0, 'max': 0.0}, {'current': 2592.46, 'min': 0.0, 'max': 0.0}, {'current': 3121.448, 'min': 0.0, 'max': 0.0}, {'current': 3085.363, 'min': 0.0, 'max': 0.0}, {'current': 3176.23, 'min': 0.0, 'max': 0.0}, {'current': 3098.413, 'min': 0.0, 'max': 0.0}, {'current': 3131.838, 'min': 0.0, 'max': 0.0}, {'current': 3297.418, 'min': 0.0, 'max': 0.0}, {'current': 3144.573, 'min': 0.0, 'max': 0.0}, {'current': 3142.177, 'min': 0.0, 'max': 0.0}, {'current': 3135.089, 'min': 0.0, 'max': 0.0}, {'current': 3124.315, 'min': 0.0, 'max': 0.0}, {'current': 3206.745, 'min': 0.0, 'max': 0.0}, {'current': 3197.608, 'min': 0.0, 'max': 0.0}, {'current': 3271.659, 'min': 0.0, 'max': 0.0}, {'current': 3055.483, 'min': 0.0, 'max': 0.0}, {'current': 3299.813, 'min': 0.0, 'max': 0.0}, {'current': 3299.316, 'min': 0.0, 'max': 0.0}, {'current': 3298.471, 'min': 0.0, 'max': 0.0}, {'current': 3275.344, 'min': 0.0, 'max': 0.0}, {'current': 3298.318, 'min': 0.0, 'max': 0.0}, {'current': 3272.185, 'min': 0.0, 'max': 0.0}, {'current': 3299.032, 'min': 0.0, 'max': 0.0}, {'current': 3273.055, 'min': 0.0, 'max': 0.0}, {'current': 3277.573, 'min': 0.0, 'max': 0.0}, {'current': 3274.44, 'min': 0.0, 'max': 0.0}, 
{'current': 3275.925, 'min': 0.0, 'max': 0.0}, {'current': 3279.092, 'min': 0.0, 'max': 0.0}, {'current': 3275.089, 'min': 0.0, 'max': 0.0}, {'current': 3277.671, 'min': 0.0, 'max': 0.0}, {'current': 3299.135, 'min': 0.0, 'max': 0.0}, {'current': 3299.31, 'min': 0.0, 'max': 0.0}, {'current': 3298.038, 'min': 0.0, 'max': 0.0}, {'current': 3218.557, 'min': 0.0, 'max': 0.0}, {'current': 3298.859, 'min': 0.0, 'max': 0.0}, {'current': 3298.545, 'min': 0.0, 'max': 0.0}, {'current': 3027.843, 'min': 0.0, 'max': 0.0}, {'current': 3299.687, 'min': 0.0, 'max': 0.0}, {'current': 3053.229, 'min': 0.0, 'max': 0.0}, {'current': 3299.26, 'min': 0.0, 'max': 0.0}, {'current': 3059.862, 'min': 0.0, 'max': 0.0}, {'current': 3090.937, 'min': 0.0, 'max': 0.0}, {'current': 3094.897, 'min': 0.0, 'max': 0.0}, {'current': 3083.774, 'min': 0.0, 'max': 0.0}, {'current': 3027.722, 'min': 0.0, 'max': 0.0}, {'current': 3303.02, 'min': 0.0, 'max': 0.0}, {'current': 3069.951, 'min': 0.0, 'max': 0.0}, {'current': 3049.694, 'min': 0.0, 'max': 0.0}, {'current': 2814.624, 'min': 0.0, 'max': 0.0}, {'current': 3097.913, 'min': 0.0, 'max': 0.0}, {'current': 2788.423, 'min': 0.0, 'max': 0.0}, {'current': 3299.195, 'min': 0.0, 'max': 0.0}, {'current': 3069.533, 'min': 0.0, 'max': 0.0}, {'current': 3074.679, 'min': 0.0, 'max': 0.0}, {'current': 3066.308, 'min': 0.0, 'max': 0.0}, {'current': 2598.471, 'min': 0.0, 'max': 0.0}, {'current': 3299.109, 'min': 0.0, 'max': 0.0}, {'current': 3299.455, 'min': 0.0, 'max': 0.0}, {'current': 3298.945, 'min': 0.0, 'max': 0.0}, {'current': 3298.926, 'min': 0.0, 'max': 0.0}, {'current': 3299.003, 'min': 0.0, 'max': 0.0}, {'current': 3299.535, 'min': 0.0, 'max': 0.0}], 'disk': {'/': {'total': 32.0, 'used': 0.01256561279296875}}, 'gpu': 'NVIDIA A10G', 'gpu_count': 8, 'gpu_devices': [{'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 
'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}], 'memory': {'total': 747.9597625732422}}
2024-02-08 18:53:11,694 INFO HandlerThread:1516 [system_monitor.py:probe():224] Finished collecting system info
2024-02-08 18:53:11,694 INFO HandlerThread:1516 [system_monitor.py:probe():227] Publishing system info
2024-02-08 18:53:11,694 DEBUG HandlerThread:1516 [system_info.py:_save_pip():52] Saving list of pip packages installed into the current environment
2024-02-08 18:53:11,695 DEBUG HandlerThread:1516 [system_info.py:_save_pip():68] Saving pip packages done
2024-02-08 18:53:11,695 DEBUG HandlerThread:1516 [system_info.py:_save_conda():75] Saving list of conda packages installed into the current environment
2024-02-08 18:53:12,517 INFO Thread-12 :1516 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/conda-environment.yaml
2024-02-08 18:53:12,517 INFO Thread-12 :1516 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/requirements.txt
2024-02-08 18:53:25,997 DEBUG HandlerThread:1516 [system_info.py:_save_conda():87] Saving conda packages done
2024-02-08 18:53:25,999 INFO HandlerThread:1516 [system_monitor.py:probe():229] Finished publishing system info
2024-02-08 18:53:26,003 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 18:53:26,003 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: keepalive
2024-02-08 18:53:26,003 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 18:53:26,003 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: keepalive
2024-02-08 18:53:26,003 DEBUG SenderThread:1516 [sender.py:send():382] send: files
2024-02-08 18:53:26,004 INFO SenderThread:1516 [sender.py:_save_file():1392] saving file wandb-metadata.json with policy now
2024-02-08 18:53:26,008 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: stop_status
2024-02-08 18:53:26,008 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: stop_status
2024-02-08 18:53:26,010 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: internal_messages
2024-02-08 18:53:26,169 DEBUG SenderThread:1516 [sender.py:send():382] send: telemetry
2024-02-08 18:53:26,169 DEBUG SenderThread:1516 [sender.py:send():382] send: config
2024-02-08 18:53:26,170 DEBUG SenderThread:1516 [sender.py:send():382] send: metric
2024-02-08 18:53:26,170 DEBUG SenderThread:1516 [sender.py:send():382] send: telemetry
2024-02-08 18:53:26,170 DEBUG SenderThread:1516 [sender.py:send():382] send: metric
2024-02-08 18:53:26,170 WARNING SenderThread:1516 [sender.py:send_metric():1343] Seen metric with glob (shouldn't happen)
2024-02-08 18:53:26,382 INFO wandb-upload_0:1516 [upload_job.py:push():131] Uploaded file /tmp/tmpo2kahburwandb/uswugwyb-wandb-metadata.json
2024-02-08 18:53:26,518 INFO Thread-12 :1516 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/conda-environment.yaml
2024-02-08 18:53:26,518 INFO Thread-12 :1516 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/output.log
2024-02-08 18:53:26,518 INFO Thread-12 :1516 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/wandb-metadata.json
2024-02-08 18:53:26,821 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 18:53:28,518 INFO Thread-12 :1516 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/output.log
2024-02-08 18:53:28,894 DEBUG SenderThread:1516 [sender.py:send():382] send: exit
2024-02-08 18:53:28,894 INFO SenderThread:1516 [sender.py:send_exit():589] handling exit code: 1
2024-02-08 18:53:28,894 INFO SenderThread:1516 [sender.py:send_exit():591] handling runtime: 17
2024-02-08 18:53:28,894 INFO SenderThread:1516 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
2024-02-08 18:53:28,894 INFO SenderThread:1516 [sender.py:send_exit():597] send defer
2024-02-08 18:53:28,894 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:28,895 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 0
2024-02-08 18:53:28,895 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:28,895 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 0
2024-02-08 18:53:28,895 INFO SenderThread:1516 [sender.py:transition_state():617] send defer: 1
2024-02-08 18:53:28,895 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:28,895 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 1
2024-02-08 18:53:28,895 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:28,895 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 1
2024-02-08 18:53:28,895 INFO SenderThread:1516 [sender.py:transition_state():617] send defer: 2
2024-02-08 18:53:28,895 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:28,895 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 2
2024-02-08 18:53:28,895 INFO HandlerThread:1516 [system_monitor.py:finish():203] Stopping system monitor
2024-02-08 18:53:28,897 DEBUG SystemMonitor:1516 [system_monitor.py:_start():172] Starting system metrics aggregation loop
2024-02-08 18:53:28,897 DEBUG SystemMonitor:1516 [system_monitor.py:_start():179] Finished system metrics aggregation loop
2024-02-08 18:53:28,897 DEBUG SystemMonitor:1516 [system_monitor.py:_start():183] Publishing last batch of metrics
2024-02-08 18:53:28,899 INFO HandlerThread:1516 [interfaces.py:finish():202] Joined cpu monitor
2024-02-08 18:53:28,899 INFO HandlerThread:1516 [interfaces.py:finish():202] Joined disk monitor
2024-02-08 18:53:28,936 INFO HandlerThread:1516 [interfaces.py:finish():202] Joined gpu monitor
2024-02-08 18:53:28,936 INFO HandlerThread:1516 [interfaces.py:finish():202] Joined memory monitor
2024-02-08 18:53:28,936 INFO HandlerThread:1516 [interfaces.py:finish():202] Joined network monitor
2024-02-08 18:53:28,936 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:28,936 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 2
2024-02-08 18:53:28,936 INFO SenderThread:1516 [sender.py:transition_state():617] send defer: 3
2024-02-08 18:53:28,937 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:28,937 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 3
2024-02-08 18:53:28,937 DEBUG SenderThread:1516 [sender.py:send():382] send: stats
2024-02-08 18:53:28,938 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:28,938 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 3
2024-02-08 18:53:28,938 INFO SenderThread:1516 [sender.py:transition_state():617] send defer: 4
2024-02-08 18:53:28,938 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:28,938 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 4
2024-02-08 18:53:28,938 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:28,938 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 4
2024-02-08 18:53:28,938 INFO SenderThread:1516 [sender.py:transition_state():617] send defer: 5
2024-02-08 18:53:28,938 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:28,938 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 5
2024-02-08 18:53:28,938 DEBUG SenderThread:1516 [sender.py:send():382] send: summary
2024-02-08 18:53:28,939 INFO SenderThread:1516 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
2024-02-08 18:53:28,940 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:28,940 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 5
2024-02-08 18:53:28,940 INFO SenderThread:1516 [sender.py:transition_state():617] send defer: 6
2024-02-08 18:53:28,940 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:28,940 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 6
2024-02-08 18:53:28,940 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:28,940 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 6
2024-02-08 18:53:28,944 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 18:53:29,088 INFO SenderThread:1516 [sender.py:transition_state():617] send defer: 7
2024-02-08 18:53:29,089 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:29,089 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 7
2024-02-08 18:53:29,089 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:29,089 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 7
2024-02-08 18:53:29,519 INFO Thread-12 :1516 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/config.yaml
2024-02-08 18:53:29,519 INFO Thread-12 :1516 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/wandb-summary.json
2024-02-08 18:53:29,894 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 18:53:30,180 INFO SenderThread:1516 [sender.py:transition_state():617] send defer: 8
2024-02-08 18:53:30,180 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 18:53:30,181 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:30,181 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 8
2024-02-08 18:53:30,181 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:30,181 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 8
2024-02-08 18:53:30,181 INFO SenderThread:1516 [job_builder.py:build():298] Attempting to build job artifact
2024-02-08 18:53:30,182 INFO SenderThread:1516 [job_builder.py:_get_source_type():439] no source found
2024-02-08 18:53:30,182 INFO SenderThread:1516 [sender.py:transition_state():617] send defer: 9
2024-02-08 18:53:30,182 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:30,182 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 9
2024-02-08 18:53:30,183 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:30,183 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 9
2024-02-08 18:53:30,183 INFO SenderThread:1516 [dir_watcher.py:finish():358] shutting down directory watcher
2024-02-08 18:53:30,519 INFO Thread-12 :1516 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/output.log
2024-02-08 18:53:30,519 INFO SenderThread:1516 [dir_watcher.py:finish():388] scan: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files
2024-02-08 18:53:30,520 INFO SenderThread:1516 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/config.yaml config.yaml
2024-02-08 18:53:30,520 INFO SenderThread:1516 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/requirements.txt requirements.txt
2024-02-08 18:53:30,520 INFO SenderThread:1516 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/conda-environment.yaml conda-environment.yaml
2024-02-08 18:53:30,520 INFO SenderThread:1516 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/wandb-metadata.json wandb-metadata.json
2024-02-08 18:53:30,520 INFO SenderThread:1516 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/output.log output.log
2024-02-08 18:53:30,522 INFO SenderThread:1516 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/wandb-summary.json wandb-summary.json
2024-02-08 18:53:30,524 INFO SenderThread:1516 [sender.py:transition_state():617] send defer: 10
2024-02-08 18:53:30,524 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:30,524 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 10
2024-02-08 18:53:30,525 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:30,525 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 10
2024-02-08 18:53:30,525 INFO SenderThread:1516 [file_pusher.py:finish():175] shutting down file pusher
2024-02-08 18:53:30,748 INFO wandb-upload_0:1516 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/config.yaml
2024-02-08 18:53:30,807 INFO wandb-upload_1:1516 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/requirements.txt
2024-02-08 18:53:30,838 INFO wandb-upload_4:1516 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/wandb-summary.json
2024-02-08 18:53:30,856 INFO wandb-upload_3:1516 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/output.log
2024-02-08 18:53:30,857 INFO wandb-upload_2:1516 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/files/conda-environment.yaml
2024-02-08 18:53:30,895 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 18:53:30,895 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 18:53:31,057 INFO Thread-11 (_thread_body):1516 [sender.py:transition_state():617] send defer: 11
2024-02-08 18:53:31,057 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:31,057 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 11
2024-02-08 18:53:31,058 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:31,058 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 11
2024-02-08 18:53:31,058 INFO SenderThread:1516 [file_pusher.py:join():181] waiting for file pusher
2024-02-08 18:53:31,058 INFO SenderThread:1516 [sender.py:transition_state():617] send defer: 12
2024-02-08 18:53:31,058 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:31,058 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 12
2024-02-08 18:53:31,058 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:31,058 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 12
2024-02-08 18:53:31,058 INFO SenderThread:1516 [file_stream.py:finish():595] file stream finish called
2024-02-08 18:53:31,123 INFO SenderThread:1516 [file_stream.py:finish():599] file stream finish is done
2024-02-08 18:53:31,123 INFO SenderThread:1516 [sender.py:transition_state():617] send defer: 13
2024-02-08 18:53:31,124 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:31,124 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 13
2024-02-08 18:53:31,124 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:31,124 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 13
2024-02-08 18:53:31,124 INFO SenderThread:1516 [sender.py:transition_state():617] send defer: 14
2024-02-08 18:53:31,124 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:53:31,124 INFO HandlerThread:1516 [handler.py:handle_request_defer():172] handle defer: 14
2024-02-08 18:53:31,124 DEBUG SenderThread:1516 [sender.py:send():382] send: final
2024-02-08 18:53:31,124 DEBUG SenderThread:1516 [sender.py:send():382] send: footer
2024-02-08 18:53:31,124 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: defer
2024-02-08 18:53:31,124 INFO SenderThread:1516 [sender.py:send_request_defer():613] handle sender defer: 14
2024-02-08 18:53:31,125 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 18:53:31,125 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 18:53:31,126 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 18:53:31,126 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: server_info
2024-02-08 18:53:31,126 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 18:53:31,126 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: server_info
2024-02-08 18:53:31,128 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: get_summary
2024-02-08 18:53:31,128 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: sampled_history
2024-02-08 18:53:31,129 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: internal_messages
2024-02-08 18:53:31,129 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: job_info
2024-02-08 18:53:31,179 DEBUG SenderThread:1516 [sender.py:send_request():409] send_request: job_info
2024-02-08 18:53:31,179 INFO MainThread:1516 [wandb_run.py:_footer_history_summary_info():3837] rendering history
2024-02-08 18:53:31,180 INFO MainThread:1516 [wandb_run.py:_footer_history_summary_info():3869] rendering summary
2024-02-08 18:53:31,180 INFO MainThread:1516 [wandb_run.py:_footer_sync_info():3796] logging synced files
2024-02-08 18:53:31,180 DEBUG HandlerThread:1516 [handler.py:handle_request():146] handle_request: shutdown
2024-02-08 18:53:31,180 INFO HandlerThread:1516 [handler.py:finish():866] shutting down handler
2024-02-08 18:53:32,129 INFO WriterThread:1516 [datastore.py:close():294] close: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_185311-5uym8l7w/run-5uym8l7w.wandb
2024-02-08 18:53:32,179 INFO SenderThread:1516 [sender.py:finish():1548] shutting down sender
2024-02-08 18:53:32,180 INFO SenderThread:1516 [file_pusher.py:finish():175] shutting down file pusher
2024-02-08 18:53:32,180 INFO SenderThread:1516 [file_pusher.py:join():181] waiting for file pusher