2024-02-08 18:00:20,766 INFO StreamThr :1998 [internal.py:wandb_internal():86] W&B internal server running at pid: 1998, started at: 2024-02-08 18:00:20.765271
2024-02-08 18:00:20,767 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: status
2024-02-08 18:00:20,770 INFO WriterThread:1998 [datastore.py:open_for_write():85] open: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/run-3wdew33h.wandb
2024-02-08 18:00:20,771 DEBUG SenderThread:1998 [sender.py:send():382] send: header
2024-02-08 18:00:20,771 DEBUG SenderThread:1998 [sender.py:send():382] send: run
2024-02-08 18:00:21,104 INFO SenderThread:1998 [dir_watcher.py:__init__():211] watching files in: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files
2024-02-08 18:00:21,104 INFO SenderThread:1998 [sender.py:_start_run_threads():1136] run started: 3wdew33h with start time 1707415220.764703
2024-02-08 18:00:21,108 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: check_version
2024-02-08 18:00:21,108 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: check_version
2024-02-08 18:00:21,151 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: run_start
2024-02-08 18:00:21,179 DEBUG HandlerThread:1998 [system_info.py:__init__():32] System info init
2024-02-08 18:00:21,179 DEBUG HandlerThread:1998 [system_info.py:__init__():47] System info init done
2024-02-08 18:00:21,179 INFO HandlerThread:1998 [system_monitor.py:start():194] Starting system monitor
2024-02-08 18:00:21,180 INFO SystemMonitor:1998 [system_monitor.py:_start():158] Starting system asset monitoring threads
2024-02-08 18:00:21,181 INFO HandlerThread:1998 [system_monitor.py:probe():214] Collecting system info
2024-02-08 18:00:21,181 INFO SystemMonitor:1998 [interfaces.py:start():190] Started cpu monitoring
2024-02-08 18:00:21,182 INFO SystemMonitor:1998 [interfaces.py:start():190] Started disk monitoring
2024-02-08 18:00:21,182 INFO SystemMonitor:1998 [interfaces.py:start():190] Started gpu monitoring
2024-02-08 18:00:21,184 INFO SystemMonitor:1998 [interfaces.py:start():190] Started memory monitoring
2024-02-08 18:00:21,184 INFO SystemMonitor:1998 [interfaces.py:start():190] Started network monitoring
2024-02-08 18:00:21,239 DEBUG HandlerThread:1998 [system_info.py:probe():196] Probing system
2024-02-08 18:00:21,241 DEBUG HandlerThread:1998 [gitlib.py:_init_repo():56] git repository is invalid
2024-02-08 18:00:21,241 DEBUG HandlerThread:1998 [system_info.py:probe():244] Probing system done
2024-02-08 18:00:21,241 DEBUG HandlerThread:1998 [system_monitor.py:probe():223] {'os': 'Linux-4.14.336-253.554.amzn2.x86_64-x86_64-with-glibc2.35', 'python': '3.10.13', 'heartbeatAt': '2024-02-08T18:00:21.239172', 'startedAt': '2024-02-08T18:00:20.760995', 'docker': None, 'cuda': None, 'args': (), 'state': 'running', 'program': '/home/sagemaker-user/output-7b-26k-lora/../lora_finetuning_push_to_hub_save_local.py', 'codePathLocal': None, 'host': 'default', 'username': 'sagemaker-user', 'executable': '/opt/conda/bin/python3', 'cpu_count': 96, 'cpu_count_logical': 192, 'cpu_freq': {'current': 3011.898104166666, 'min': 0.0, 'max': 0.0}, 'cpu_freq_per_core': [{'current': 2708.532, 'min': 0.0, 'max': 0.0}, {'current': 2549.917, 'min': 0.0, 'max': 0.0}, {'current': 2649.342, 'min': 0.0, 'max': 0.0}, {'current': 2617.578, 'min': 0.0, 'max': 0.0}, {'current': 2514.64, 'min': 0.0, 'max': 0.0}, {'current': 3300.018, 'min': 0.0, 'max': 0.0}, {'current': 2514.814, 'min': 0.0, 'max': 0.0}, {'current': 2823.715, 'min': 0.0, 'max': 0.0}, {'current': 2444.923, 'min': 0.0, 'max': 0.0}, {'current': 2450.425, 'min': 0.0, 'max': 0.0}, {'current': 2405.045, 'min': 0.0, 'max': 0.0}, {'current': 3300.611, 'min': 0.0, 'max': 0.0}, {'current': 2521.7, 'min': 0.0, 'max': 0.0}, {'current': 2536.269, 'min': 0.0, 'max': 0.0}, {'current': 2454.76, 'min': 0.0, 'max': 0.0}, {'current': 2452.125, 'min': 0.0, 'max': 0.0}, {'current': 2578.243, 'min': 0.0, 'max': 0.0}, {'current': 2588.493, 'min': 0.0, 'max': 0.0}, {'current': 2597.93, 'min': 0.0, 'max': 0.0}, {'current': 2362.191, 'min': 0.0, 'max': 0.0}, {'current': 2603.048, 'min': 0.0, 'max': 0.0}, {'current': 1726.583, 'min': 0.0, 'max': 0.0}, {'current': 2010.457, 'min': 0.0, 'max': 0.0}, {'current': 1983.584, 'min': 0.0, 'max': 0.0}, {'current': 2585.543, 'min': 0.0, 'max': 0.0}, {'current': 2448.628, 'min': 0.0, 'max': 0.0}, {'current': 2473.293, 'min': 0.0, 'max': 0.0}, {'current': 2326.422, 'min': 0.0, 'max': 0.0}, {'current': 2469.471, 
'min': 0.0, 'max': 0.0}, {'current': 1679.725, 'min': 0.0, 'max': 0.0}, {'current': 1707.891, 'min': 0.0, 'max': 0.0}, {'current': 2608.243, 'min': 0.0, 'max': 0.0}, {'current': 1825.706, 'min': 0.0, 'max': 0.0}, {'current': 1852.851, 'min': 0.0, 'max': 0.0}, {'current': 2741.597, 'min': 0.0, 'max': 0.0}, {'current': 2763.617, 'min': 0.0, 'max': 0.0}, {'current': 2668.973, 'min': 0.0, 'max': 0.0}, {'current': 1848.906, 'min': 0.0, 'max': 0.0}, {'current': 1859.662, 'min': 0.0, 'max': 0.0}, {'current': 1872.243, 'min': 0.0, 'max': 0.0}, {'current': 2750.258, 'min': 0.0, 'max': 0.0}, {'current': 2171.724, 'min': 0.0, 'max': 0.0}, {'current': 2673.366, 'min': 0.0, 'max': 0.0}, {'current': 2686.6, 'min': 0.0, 'max': 0.0}, {'current': 2254.628, 'min': 0.0, 'max': 0.0}, {'current': 1922.174, 'min': 0.0, 'max': 0.0}, {'current': 1919.649, 'min': 0.0, 'max': 0.0}, {'current': 1914.156, 'min': 0.0, 'max': 0.0}, {'current': 3299.749, 'min': 0.0, 'max': 0.0}, {'current': 3299.574, 'min': 0.0, 'max': 0.0}, {'current': 3299.179, 'min': 0.0, 'max': 0.0}, {'current': 3300.775, 'min': 0.0, 'max': 0.0}, {'current': 3299.087, 'min': 0.0, 'max': 0.0}, {'current': 3299.329, 'min': 0.0, 'max': 0.0}, {'current': 2200.612, 'min': 0.0, 'max': 0.0}, {'current': 2744.304, 'min': 0.0, 'max': 0.0}, {'current': 3300.722, 'min': 0.0, 'max': 0.0}, {'current': 3299.792, 'min': 0.0, 'max': 0.0}, {'current': 3299.956, 'min': 0.0, 'max': 0.0}, {'current': 3301.117, 'min': 0.0, 'max': 0.0}, {'current': 2226.489, 'min': 0.0, 'max': 0.0}, {'current': 3070.859, 'min': 0.0, 'max': 0.0}, {'current': 1992.465, 'min': 0.0, 'max': 0.0}, {'current': 3048.294, 'min': 0.0, 'max': 0.0}, {'current': 2694.481, 'min': 0.0, 'max': 0.0}, {'current': 3299.499, 'min': 0.0, 'max': 0.0}, {'current': 2794.422, 'min': 0.0, 'max': 0.0}, {'current': 3292.924, 'min': 0.0, 'max': 0.0}, {'current': 3299.232, 'min': 0.0, 'max': 0.0}, {'current': 3297.378, 'min': 0.0, 'max': 0.0}, {'current': 3300.519, 'min': 0.0, 'max': 0.0}, 
{'current': 3300.987, 'min': 0.0, 'max': 0.0}, {'current': 3298.537, 'min': 0.0, 'max': 0.0}, {'current': 2474.441, 'min': 0.0, 'max': 0.0}, {'current': 2795.47, 'min': 0.0, 'max': 0.0}, {'current': 2412.968, 'min': 0.0, 'max': 0.0}, {'current': 2550.34, 'min': 0.0, 'max': 0.0}, {'current': 3009.464, 'min': 0.0, 'max': 0.0}, {'current': 2578.135, 'min': 0.0, 'max': 0.0}, {'current': 2404.379, 'min': 0.0, 'max': 0.0}, {'current': 3020.541, 'min': 0.0, 'max': 0.0}, {'current': 3058.728, 'min': 0.0, 'max': 0.0}, {'current': 2989.512, 'min': 0.0, 'max': 0.0}, {'current': 3059.601, 'min': 0.0, 'max': 0.0}, {'current': 2714.737, 'min': 0.0, 'max': 0.0}, {'current': 2694.293, 'min': 0.0, 'max': 0.0}, {'current': 3261.941, 'min': 0.0, 'max': 0.0}, {'current': 2586.965, 'min': 0.0, 'max': 0.0}, {'current': 3294.605, 'min': 0.0, 'max': 0.0}, {'current': 3260.796, 'min': 0.0, 'max': 0.0}, {'current': 3295.156, 'min': 0.0, 'max': 0.0}, {'current': 3253.521, 'min': 0.0, 'max': 0.0}, {'current': 2832.733, 'min': 0.0, 'max': 0.0}, {'current': 3264.153, 'min': 0.0, 'max': 0.0}, {'current': 2767.538, 'min': 0.0, 'max': 0.0}, {'current': 3300.784, 'min': 0.0, 'max': 0.0}, {'current': 2342.793, 'min': 0.0, 'max': 0.0}, {'current': 2291.218, 'min': 0.0, 'max': 0.0}, {'current': 2343.318, 'min': 0.0, 'max': 0.0}, {'current': 2377.449, 'min': 0.0, 'max': 0.0}, {'current': 2182.822, 'min': 0.0, 'max': 0.0}, {'current': 3300.271, 'min': 0.0, 'max': 0.0}, {'current': 2180.405, 'min': 0.0, 'max': 0.0}, {'current': 2708.15, 'min': 0.0, 'max': 0.0}, {'current': 2217.841, 'min': 0.0, 'max': 0.0}, {'current': 2223.228, 'min': 0.0, 'max': 0.0}, {'current': 2332.486, 'min': 0.0, 'max': 0.0}, {'current': 2585.795, 'min': 0.0, 'max': 0.0}, {'current': 2332.572, 'min': 0.0, 'max': 0.0}, {'current': 2256.902, 'min': 0.0, 'max': 0.0}, {'current': 2334.544, 'min': 0.0, 'max': 0.0}, {'current': 2350.832, 'min': 0.0, 'max': 0.0}, {'current': 2424.323, 'min': 0.0, 'max': 0.0}, {'current': 2456.004, 'min': 
0.0, 'max': 0.0}, {'current': 2457.45, 'min': 0.0, 'max': 0.0}, {'current': 2579.468, 'min': 0.0, 'max': 0.0}, {'current': 2458.733, 'min': 0.0, 'max': 0.0}, {'current': 1860.983, 'min': 0.0, 'max': 0.0}, {'current': 2198.204, 'min': 0.0, 'max': 0.0}, {'current': 2131.955, 'min': 0.0, 'max': 0.0}, {'current': 2416.616, 'min': 0.0, 'max': 0.0}, {'current': 2498.284, 'min': 0.0, 'max': 0.0}, {'current': 2409.271, 'min': 0.0, 'max': 0.0}, {'current': 2442.917, 'min': 0.0, 'max': 0.0}, {'current': 2387.494, 'min': 0.0, 'max': 0.0}, {'current': 1840.589, 'min': 0.0, 'max': 0.0}, {'current': 1851.316, 'min': 0.0, 'max': 0.0}, {'current': 2572.071, 'min': 0.0, 'max': 0.0}, {'current': 1846.514, 'min': 0.0, 'max': 0.0}, {'current': 1838.129, 'min': 0.0, 'max': 0.0}, {'current': 2458.548, 'min': 0.0, 'max': 0.0}, {'current': 2468.97, 'min': 0.0, 'max': 0.0}, {'current': 2573.521, 'min': 0.0, 'max': 0.0}, {'current': 1854.263, 'min': 0.0, 'max': 0.0}, {'current': 1852.806, 'min': 0.0, 'max': 0.0}, {'current': 1826.095, 'min': 0.0, 'max': 0.0}, {'current': 2445.646, 'min': 0.0, 'max': 0.0}, {'current': 2189.591, 'min': 0.0, 'max': 0.0}, {'current': 2264.349, 'min': 0.0, 'max': 0.0}, {'current': 2377.441, 'min': 0.0, 'max': 0.0}, {'current': 2012.723, 'min': 0.0, 'max': 0.0}, {'current': 1906.662, 'min': 0.0, 'max': 0.0}, {'current': 2124.071, 'min': 0.0, 'max': 0.0}, {'current': 1895.988, 'min': 0.0, 'max': 0.0}, {'current': 3300.029, 'min': 0.0, 'max': 0.0}, {'current': 2525.577, 'min': 0.0, 'max': 0.0}, {'current': 3299.997, 'min': 0.0, 'max': 0.0}, {'current': 2670.847, 'min': 0.0, 'max': 0.0}, {'current': 3299.699, 'min': 0.0, 'max': 0.0}, {'current': 3299.175, 'min': 0.0, 'max': 0.0}, {'current': 2197.86, 'min': 0.0, 'max': 0.0}, {'current': 2528.763, 'min': 0.0, 'max': 0.0}, {'current': 3197.634, 'min': 0.0, 'max': 0.0}, {'current': 3211.437, 'min': 0.0, 'max': 0.0}, {'current': 3251.423, 'min': 0.0, 'max': 0.0}, {'current': 2959.085, 'min': 0.0, 'max': 0.0}, 
{'current': 2065.145, 'min': 0.0, 'max': 0.0}, {'current': 2974.061, 'min': 0.0, 'max': 0.0}, {'current': 2068.626, 'min': 0.0, 'max': 0.0}, {'current': 2986.901, 'min': 0.0, 'max': 0.0}, {'current': 2340.617, 'min': 0.0, 'max': 0.0}, {'current': 3032.214, 'min': 0.0, 'max': 0.0}, {'current': 2431.142, 'min': 0.0, 'max': 0.0}, {'current': 3302.396, 'min': 0.0, 'max': 0.0}, {'current': 2944.793, 'min': 0.0, 'max': 0.0}, {'current': 3299.677, 'min': 0.0, 'max': 0.0}, {'current': 3298.69, 'min': 0.0, 'max': 0.0}, {'current': 2920.67, 'min': 0.0, 'max': 0.0}, {'current': 3090.867, 'min': 0.0, 'max': 0.0}, {'current': 2503.478, 'min': 0.0, 'max': 0.0}, {'current': 2473.681, 'min': 0.0, 'max': 0.0}, {'current': 2485.275, 'min': 0.0, 'max': 0.0}, {'current': 2488.179, 'min': 0.0, 'max': 0.0}, {'current': 2990.001, 'min': 0.0, 'max': 0.0}, {'current': 2452.206, 'min': 0.0, 'max': 0.0}, {'current': 2729.116, 'min': 0.0, 'max': 0.0}, {'current': 3001.225, 'min': 0.0, 'max': 0.0}, {'current': 3299.389, 'min': 0.0, 'max': 0.0}, {'current': 3299.765, 'min': 0.0, 'max': 0.0}, {'current': 3299.125, 'min': 0.0, 'max': 0.0}, {'current': 2002.363, 'min': 0.0, 'max': 0.0}, {'current': 2302.227, 'min': 0.0, 'max': 0.0}, {'current': 3132.468, 'min': 0.0, 'max': 0.0}, {'current': 2770.449, 'min': 0.0, 'max': 0.0}, {'current': 3288.126, 'min': 0.0, 'max': 0.0}, {'current': 3298.022, 'min': 0.0, 'max': 0.0}, {'current': 3298.691, 'min': 0.0, 'max': 0.0}, {'current': 3297.894, 'min': 0.0, 'max': 0.0}, {'current': 2828.863, 'min': 0.0, 'max': 0.0}, {'current': 3299.914, 'min': 0.0, 'max': 0.0}, {'current': 2721.595, 'min': 0.0, 'max': 0.0}, {'current': 3299.427, 'min': 0.0, 'max': 0.0}], 'disk': {'/': {'total': 32.0, 'used': 0.01256561279296875}}, 'gpu': 'NVIDIA A10G', 'gpu_count': 8, 'gpu_devices': [{'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 
'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}], 'memory': {'total': 747.9597625732422}}
2024-02-08 18:00:21,241 INFO HandlerThread:1998 [system_monitor.py:probe():224] Finished collecting system info
2024-02-08 18:00:21,241 INFO HandlerThread:1998 [system_monitor.py:probe():227] Publishing system info
2024-02-08 18:00:21,241 DEBUG HandlerThread:1998 [system_info.py:_save_pip():52] Saving list of pip packages installed into the current environment
2024-02-08 18:00:21,242 DEBUG HandlerThread:1998 [system_info.py:_save_pip():68] Saving pip packages done
2024-02-08 18:00:21,242 DEBUG HandlerThread:1998 [system_info.py:_save_conda():75] Saving list of conda packages installed into the current environment
2024-02-08 18:00:22,105 INFO Thread-12 :1998 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/requirements.txt
2024-02-08 18:00:22,105 INFO Thread-12 :1998 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/conda-environment.yaml
2024-02-08 18:00:35,540 DEBUG HandlerThread:1998 [system_info.py:_save_conda():87] Saving conda packages done
2024-02-08 18:00:35,542 INFO HandlerThread:1998 [system_monitor.py:probe():229] Finished publishing system info
2024-02-08 18:00:35,545 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 18:00:35,545 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: keepalive
2024-02-08 18:00:35,545 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 18:00:35,545 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: keepalive
2024-02-08 18:00:35,546 DEBUG SenderThread:1998 [sender.py:send():382] send: files
2024-02-08 18:00:35,546 INFO SenderThread:1998 [sender.py:_save_file():1392] saving file wandb-metadata.json with policy now
2024-02-08 18:00:35,550 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: stop_status
2024-02-08 18:00:35,550 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: stop_status
2024-02-08 18:00:35,557 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: internal_messages
2024-02-08 18:00:35,673 DEBUG SenderThread:1998 [sender.py:send():382] send: telemetry
2024-02-08 18:00:35,673 DEBUG SenderThread:1998 [sender.py:send():382] send: config
2024-02-08 18:00:35,674 DEBUG SenderThread:1998 [sender.py:send():382] send: metric
2024-02-08 18:00:35,674 DEBUG SenderThread:1998 [sender.py:send():382] send: telemetry
2024-02-08 18:00:35,674 DEBUG SenderThread:1998 [sender.py:send():382] send: metric
2024-02-08 18:00:35,674 WARNING SenderThread:1998 [sender.py:send_metric():1343] Seen metric with glob (shouldn't happen)
2024-02-08 18:00:35,893 INFO wandb-upload_0:1998 [upload_job.py:push():131] Uploaded file /tmp/tmp_9e35467wandb/s8xuyfes-wandb-metadata.json
2024-02-08 18:00:36,107 INFO Thread-12 :1998 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/conda-environment.yaml
2024-02-08 18:00:36,107 INFO Thread-12 :1998 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/output.log
2024-02-08 18:00:36,108 INFO Thread-12 :1998 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/wandb-metadata.json
2024-02-08 18:00:36,534 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 18:00:38,108 INFO Thread-12 :1998 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/output.log
2024-02-08 18:00:38,575 DEBUG SenderThread:1998 [sender.py:send():382] send: exit
2024-02-08 18:00:38,575 INFO SenderThread:1998 [sender.py:send_exit():589] handling exit code: 1
2024-02-08 18:00:38,575 INFO SenderThread:1998 [sender.py:send_exit():591] handling runtime: 17
2024-02-08 18:00:38,575 INFO SenderThread:1998 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
2024-02-08 18:00:38,575 INFO SenderThread:1998 [sender.py:send_exit():597] send defer
2024-02-08 18:00:38,575 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:38,576 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 0
2024-02-08 18:00:38,576 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:38,576 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 0
2024-02-08 18:00:38,576 INFO SenderThread:1998 [sender.py:transition_state():617] send defer: 1
2024-02-08 18:00:38,576 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:38,576 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 1
2024-02-08 18:00:38,576 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:38,576 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 1
2024-02-08 18:00:38,576 INFO SenderThread:1998 [sender.py:transition_state():617] send defer: 2
2024-02-08 18:00:38,576 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:38,576 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 2
2024-02-08 18:00:38,576 INFO HandlerThread:1998 [system_monitor.py:finish():203] Stopping system monitor
2024-02-08 18:00:38,578 INFO HandlerThread:1998 [interfaces.py:finish():202] Joined cpu monitor
2024-02-08 18:00:38,578 INFO HandlerThread:1998 [interfaces.py:finish():202] Joined disk monitor
2024-02-08 18:00:38,578 DEBUG SystemMonitor:1998 [system_monitor.py:_start():172] Starting system metrics aggregation loop
2024-02-08 18:00:38,578 DEBUG SystemMonitor:1998 [system_monitor.py:_start():179] Finished system metrics aggregation loop
2024-02-08 18:00:38,578 DEBUG SystemMonitor:1998 [system_monitor.py:_start():183] Publishing last batch of metrics
2024-02-08 18:00:38,618 INFO HandlerThread:1998 [interfaces.py:finish():202] Joined gpu monitor
2024-02-08 18:00:38,618 INFO HandlerThread:1998 [interfaces.py:finish():202] Joined memory monitor
2024-02-08 18:00:38,618 INFO HandlerThread:1998 [interfaces.py:finish():202] Joined network monitor
2024-02-08 18:00:38,618 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:38,619 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 2
2024-02-08 18:00:38,619 INFO SenderThread:1998 [sender.py:transition_state():617] send defer: 3
2024-02-08 18:00:38,619 DEBUG SenderThread:1998 [sender.py:send():382] send: stats
2024-02-08 18:00:38,620 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:38,620 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 3
2024-02-08 18:00:38,620 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:38,620 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 3
2024-02-08 18:00:38,620 INFO SenderThread:1998 [sender.py:transition_state():617] send defer: 4
2024-02-08 18:00:38,620 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:38,620 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 4
2024-02-08 18:00:38,620 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:38,620 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 4
2024-02-08 18:00:38,620 INFO SenderThread:1998 [sender.py:transition_state():617] send defer: 5
2024-02-08 18:00:38,620 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:38,621 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 5
2024-02-08 18:00:38,621 DEBUG SenderThread:1998 [sender.py:send():382] send: summary
2024-02-08 18:00:38,633 INFO SenderThread:1998 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
2024-02-08 18:00:38,633 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:38,633 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 5
2024-02-08 18:00:38,633 INFO SenderThread:1998 [sender.py:transition_state():617] send defer: 6
2024-02-08 18:00:38,634 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:38,634 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 6
2024-02-08 18:00:38,634 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:38,634 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 6
2024-02-08 18:00:38,638 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 18:00:38,773 INFO SenderThread:1998 [sender.py:transition_state():617] send defer: 7
2024-02-08 18:00:38,773 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:38,773 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 7
2024-02-08 18:00:38,774 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:38,774 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 7
2024-02-08 18:00:39,108 INFO Thread-12 :1998 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/config.yaml
2024-02-08 18:00:39,108 INFO Thread-12 :1998 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/wandb-summary.json
2024-02-08 18:00:39,575 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 18:00:39,685 INFO SenderThread:1998 [sender.py:transition_state():617] send defer: 8
2024-02-08 18:00:39,685 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 18:00:39,686 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:39,686 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 8
2024-02-08 18:00:39,686 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:39,686 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 8
2024-02-08 18:00:39,686 INFO SenderThread:1998 [job_builder.py:build():298] Attempting to build job artifact
2024-02-08 18:00:39,688 INFO SenderThread:1998 [job_builder.py:_get_source_type():439] no source found
2024-02-08 18:00:39,688 INFO SenderThread:1998 [sender.py:transition_state():617] send defer: 9
2024-02-08 18:00:39,688 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:39,688 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 9
2024-02-08 18:00:39,688 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:39,688 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 9
2024-02-08 18:00:39,688 INFO SenderThread:1998 [dir_watcher.py:finish():358] shutting down directory watcher
2024-02-08 18:00:40,109 INFO Thread-12 :1998 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/output.log
2024-02-08 18:00:40,109 INFO SenderThread:1998 [dir_watcher.py:finish():388] scan: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files
2024-02-08 18:00:40,109 INFO SenderThread:1998 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/config.yaml config.yaml
2024-02-08 18:00:40,109 INFO SenderThread:1998 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/requirements.txt requirements.txt
2024-02-08 18:00:40,109 INFO SenderThread:1998 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/conda-environment.yaml conda-environment.yaml
2024-02-08 18:00:40,109 INFO SenderThread:1998 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/wandb-metadata.json wandb-metadata.json
2024-02-08 18:00:40,110 INFO SenderThread:1998 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/output.log output.log
2024-02-08 18:00:40,111 INFO SenderThread:1998 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/wandb-summary.json wandb-summary.json
2024-02-08 18:00:40,111 INFO SenderThread:1998 [sender.py:transition_state():617] send defer: 10
2024-02-08 18:00:40,112 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:40,112 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 10
2024-02-08 18:00:40,114 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:40,114 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 10
2024-02-08 18:00:40,114 INFO SenderThread:1998 [file_pusher.py:finish():175] shutting down file pusher
2024-02-08 18:00:40,385 INFO wandb-upload_0:1998 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/config.yaml
2024-02-08 18:00:40,437 INFO wandb-upload_3:1998 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/output.log
2024-02-08 18:00:40,470 INFO wandb-upload_2:1998 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/conda-environment.yaml
2024-02-08 18:00:40,560 INFO wandb-upload_1:1998 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/requirements.txt
2024-02-08 18:00:40,575 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 18:00:40,576 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 18:00:40,674 INFO wandb-upload_4:1998 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/files/wandb-summary.json
2024-02-08 18:00:40,875 INFO Thread-11 (_thread_body):1998 [sender.py:transition_state():617] send defer: 11
2024-02-08 18:00:40,875 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:40,875 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 11
2024-02-08 18:00:40,875 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:40,875 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 11
2024-02-08 18:00:40,875 INFO SenderThread:1998 [file_pusher.py:join():181] waiting for file pusher
2024-02-08 18:00:40,876 INFO SenderThread:1998 [sender.py:transition_state():617] send defer: 12
2024-02-08 18:00:40,876 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:40,876 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 12
2024-02-08 18:00:40,876 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:40,876 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 12
2024-02-08 18:00:40,876 INFO SenderThread:1998 [file_stream.py:finish():595] file stream finish called
2024-02-08 18:00:40,954 INFO SenderThread:1998 [file_stream.py:finish():599] file stream finish is done
2024-02-08 18:00:40,954 INFO SenderThread:1998 [sender.py:transition_state():617] send defer: 13
2024-02-08 18:00:40,954 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:40,954 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 13
2024-02-08 18:00:40,954 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:40,954 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 13
2024-02-08 18:00:40,954 INFO SenderThread:1998 [sender.py:transition_state():617] send defer: 14
2024-02-08 18:00:40,955 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: defer
2024-02-08 18:00:40,955 INFO HandlerThread:1998 [handler.py:handle_request_defer():172] handle defer: 14
2024-02-08 18:00:40,955 DEBUG SenderThread:1998 [sender.py:send():382] send: final
2024-02-08 18:00:40,955 DEBUG SenderThread:1998 [sender.py:send():382] send: footer
2024-02-08 18:00:40,955 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: defer
2024-02-08 18:00:40,955 INFO SenderThread:1998 [sender.py:send_request_defer():613] handle sender defer: 14
2024-02-08 18:00:40,955 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 18:00:40,956 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 18:00:40,956 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 18:00:40,956 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: server_info
2024-02-08 18:00:40,956 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 18:00:40,957 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: server_info
2024-02-08 18:00:40,959 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: get_summary
2024-02-08 18:00:40,959 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: sampled_history
2024-02-08 18:00:40,959 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: internal_messages
2024-02-08 18:00:40,959 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: job_info
2024-02-08 18:00:41,013 DEBUG SenderThread:1998 [sender.py:send_request():409] send_request: job_info
2024-02-08 18:00:41,013 INFO MainThread:1998 [wandb_run.py:_footer_history_summary_info():3837] rendering history
2024-02-08 18:00:41,013 INFO MainThread:1998 [wandb_run.py:_footer_history_summary_info():3869] rendering summary
2024-02-08 18:00:41,013 INFO MainThread:1998 [wandb_run.py:_footer_sync_info():3796] logging synced files
2024-02-08 18:00:41,014 DEBUG HandlerThread:1998 [handler.py:handle_request():146] handle_request: shutdown
2024-02-08 18:00:41,014 INFO HandlerThread:1998 [handler.py:finish():866] shutting down handler
2024-02-08 18:00:41,960 INFO WriterThread:1998 [datastore.py:close():294] close: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_180020-3wdew33h/run-3wdew33h.wandb
2024-02-08 18:00:42,013 INFO SenderThread:1998 [sender.py:finish():1548] shutting down sender
2024-02-08 18:00:42,013 INFO SenderThread:1998 [file_pusher.py:finish():175] shutting down file pusher
2024-02-08 18:00:42,013 INFO SenderThread:1998 [file_pusher.py:join():181] waiting for file pusher