---
language:
- en
tags:
- open-assistant
- falcon
license: "unknown"
datasets:
- toanbku/oa-df
---
- Datasets: https://huggingface.co/datasets/toanbku/oa-df
- Training log: https://wandb.ai/toanbku/supervised-finetuning/runs/w1l8j7n6/overview
### Command
```
export BS=8
deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 61000 trainer_sft.py \
--config defaults oa-falcon-7b-top1 oasst_df \
--cache_dir /home/ubuntu/OA/model/model_training/.cache \
--per_device_eval_batch_size $BS --per_device_train_batch_size $BS \
--deepspeed
```
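For reference, the effective global batch size configured by this launch (8 GPUs, `BS=8` per device, and `gradient_accumulation_steps: 2` from the config below) works out as follows, assuming the usual Hugging Face Trainer semantics under DeepSpeed:
```
8 GPUs × 8 samples per GPU × 2 accumulation steps = 128 samples per optimizer step
```
Since the training split only contains 32 examples (see the training log below), each epoch collapses to a single optimizer step in practice.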
### Config
```
oa-falcon-7b-top1:
  dtype: bf16
  log_dir: "falcon_log_7b"
  learning_rate: 1e-5
  model_name: "OpenAssistant/falcon-7b-sft-top1-696"
  deepspeed_config: configs/zero_config.json
  output_dir: falcon
  weight_decay: 0.0
  max_length: 2048
  save_strategy: steps
  eval_steps: 80
  save_steps: 80
  warmup_steps: 4
  gradient_checkpointing: true
  gradient_accumulation_steps: 2
  per_device_train_batch_size: 2
  per_device_eval_batch_size: 4
  num_train_epochs: 4
  save_total_limit: 2
  residual_dropout: 0.2
  residual_dropout_lima: true

oasst_df:
  save_strategy: epoch
  datasets:
    - oasst_export:
        lang: "en"
        hf_dataset_name: toanbku/oa-df
        val_split: 0.05
```
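The `--config defaults oa-falcon-7b-top1 oasst_df` arguments layer these blocks on top of the trainer defaults, with later blocks and explicit command-line flags overriding earlier values; that is why the run below reports `per_device_train_batch_size=8` (from the CLI) and `save_strategy='epoch'` (from `oasst_df`). The sketch below illustrates that merge order; the inline YAML and the `merge_configs` helper are illustrative stand-ins, not the actual trainer code:
```
import yaml

# Illustrative stand-in for the config file: each top-level key is one named block.
CONFIG_YAML = """
defaults:
  learning_rate: 1e-5
  save_strategy: steps
oa-falcon-7b-top1:
  model_name: OpenAssistant/falcon-7b-sft-top1-696
  per_device_train_batch_size: 2
oasst_df:
  save_strategy: epoch
"""

def merge_configs(blocks, names, cli_overrides=None):
    """Apply the named blocks left to right, then CLI overrides last."""
    merged = {}
    for name in names:
        merged.update(blocks[name])
    merged.update(cli_overrides or {})
    return merged

blocks = yaml.safe_load(CONFIG_YAML)
conf = merge_configs(
    blocks,
    ["defaults", "oa-falcon-7b-top1", "oasst_df"],
    cli_overrides={"per_device_train_batch_size": 8},
)
print(conf)  # per_device_train_batch_size -> 8, save_strategy -> "epoch"
```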
### Demo
- **input_text:** <|prompter|>Provide information about Dwarves Foundation company<|endoftext|><|assistant|>
- **output:**
```
Dwarves Foundation is a blockchain-based, decentralized platform that aims to create a global community based on equality and inclusion. It was founded by Alex Mashinsky, an entrepreneur and investor. The platform uses blockchain technology to create a decentralized and transparent system for the creation and management of projects.
Dwarves Foundation allows users to create and participate in projects that promote equality and inclusion. Users can contribute time, money, or other resources to projects, and they are rewarded with DVA, the platform's native cryptocurrency. This cryptocurrency can be used to participate in future projects or to purchase goods and services within the Dwarves Foundation ecosystem.
The platform aims to create a community that is diverse and inclusive, and it encourages people from all walks of life to participate. It believes that collaboration and teamwork are key to solving the world's problems, and it aims to create a platform where people can work together to make the world a better place.
So far, Dwarves Foundation has been successful in creating a community and launching several projects. One of its most successful projects, the AION network, is a decentralized platform that allows users to create and participate in projects using the DVA cryptocurrency. The platform has already attracted thousands of users and has several projects in development.
Overall, Dwarves Foundation is a unique platform that aims to create a global community based on equality and inclusion. It uses blockchain technology to create a transparent and decentralized system, and it encourages people from all walks of life to participate and collaborate towards a better future.
```
- **log:**
```
python ./test.py
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[2023-07-17 11:32:36,837] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|██████████| 2/2 [00:11<00:00, 5.98s/it]
The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/transformers/generation/utils.py:1219: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
```
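The demo above comes from a small local script (`python ./test.py`, not included in this repo). A minimal sketch of what such a script might look like, assuming the fine-tuned weights are in the local `falcon` output directory (or a Hub model id) and using illustrative sampling parameters:
```
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "falcon"  # assumption: output_dir from the training config above
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Falcon's RWForCausalLM is shipped as custom code
    device_map="auto",
)

# Same prompt template used during SFT: <|prompter|> ... <|endoftext|> <|assistant|>
prompt = "<|prompter|>Provide information about Dwarves Foundation company<|endoftext|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
# Strip the prompt tokens and print only the generated continuation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```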
----
### Training log
```
(cuda118) [RedmondAI] ubuntu@oa-server-8:~/OA/model/model_training$ deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 61000 trainer_sft.py --config defaults oa-falcon-7b-top1 oasst_df --cache_dir /home/ubuntu/OA/model/model_training/.cache --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --deepspeed
[2023-07-17 16:21:13,138] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:16,536] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-07-17 16:21:16,536] [INFO] [runner.py:555:main] cmd = /home/ubuntu/mambaforge/envs/cuda118/bin/python3.10 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=61000 --enable_each_rank_log=None trainer_sft.py --config defaults oa-falcon-7b-top1 oasst_df --cache_dir /home/ubuntu/OA/model/model_training/.cache --per_device_eval_batch_size 8 --per_device_train_batch_size 8 --deepspeed
[2023-07-17 16:21:17,929] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:20,292] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-07-17 16:21:20,292] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-07-17 16:21:20,292] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-07-17 16:21:20,292] [INFO] [launch.py:163:main] dist_world_size=8
[2023-07-17 16:21:20,292] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2023-07-17 16:21:24,714] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:24,805] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,000] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,151] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,228] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,251] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,295] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,299] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
trainig_conf = Namespace(rng_seed=2703368087, learning_rate='1e-5', gradient_checkpointing=True, gradient_accumulation_steps=2, per_device_train_batch_size=8, per_device_eval_batch_size=8, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon='1e-12', weight_decay=0.0, warmup_steps=4, eval_steps=80, save_strategy='epoch', save_steps=80, max_length=2048, val_max_length=None, num_train_epochs=4, logging_steps=10, max_grad_norm=2.0, save_total_limit=2, dtype='bf16', eval_accumulation_steps=None, freeze_layer=None, datasets=[{'oasst_export': {'lang': 'en', 'hf_dataset_name': 'toanbku/oa-df', 'val_split': 0.05}}], datasets_extra=[], cache_dir='/home/ubuntu/OA/model/model_training/.cache', loss_fn='CrossEntropyLoss', eval_size=None, log_dir='falcon_log_7b', quantization=False, seq2seqmodel=False, poly_eps=1.0, fuse_gelu=True, log_wandb=True, samples_mixing=False, verbose=False, output_dir='falcon', use_custom_sampler=False, random_offset_probability=0.8, label_masking=True, residual_dropout=0.2, use_flash_attention=False, sort_by_length=False, use_system_prefix=False, system_prefix='You are Joi, a large language model trained by Open-Assistant. Answer as concisely as possible.\nKnowledge cutoff: 2021-09-01\nCurrent date: 2023-03-12', use_system_tag=False, system_property_dropout=0.5, system_add_length=False, per_digit_tokens=False, is_reward_model=False, residual_dropout_lima=True, deepspeed_config='configs/zero_config.json', peft_model=False, peft_type='lora', model_name='OpenAssistant/falcon-7b-sft-top1-696', wandb_entity='toanbku', local_rank=0, deepspeed=True, resume_from_checkpoint=False, show_dataset_stats=False, world_size=8)
[2023-07-17 16:21:25,864] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:25,864] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:25,864] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-07-17 16:21:25,952] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:25,952] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,311] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,312] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,320] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,320] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,407] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,407] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,511] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,512] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,558] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,558] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,618] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,619] [INFO] [comm.py:594:init_distributed] cdb=None
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
Tokenizer sanity check:
Type: PreTrainedTokenizerFast
special_tokens_map: {'eos_token': '<|endoftext|>', 'sep_token': '<|endoftext|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|prompter|>', '>>SUFFIX<<', '<|prefix_begin|>', '>>INTRODUCTION<<', '>>QUESTION<<', '>>SUMMARY<<', '<|prefix_end|>', '>>DOMAIN<<', '<|assistant|>', '<|system|>', '>>TITLE<<', '>>COMMENT<<', '>>MIDDLE<<', '>>PREFIX<<', '>>ANSWER<<', '>>ABSTRACT<<']}
Using bos_token, but it is not set yet.
bos_token='None', bos_token_id=None
eos_token='<|endoftext|>', eos_token_id=11
prompter_token_id=65028, assistant_token_id=65025
encoding result: {'input_ids': [65028, 60, 28, 11, 65024, 13318, 37, 445, 193, 7055, 37, 204, 28, 193, 11723, 37, 20906, 193, 11, 65025, 44, 28, 11, 65028, 60, 29, 11, 65024, 7055, 37, 204, 28, 193, 13318, 37, 445, 193, 11723, 37, 20906, 193, 11, 65025, 44, 29, 11], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
0: 65028 -> "<|prompter|>"
1: 60 -> "Q"
2: 28 -> "1"
3: 11 -> "<|endoftext|>"
4: 65024 -> "<|system|>"
5: 13318 -> "lang"
6: 37 -> ":"
7: 445 -> " en"
8: 193 -> "
"
9: 7055 -> "length"
10: 37 -> ":"
11: 204 -> " "
12: 28 -> "1"
13: 193 -> "
"
14: 11723 -> "context"
15: 37 -> ":"
16: 20906 -> " ctx"
17: 193 -> "
"
18: 11 -> "<|endoftext|>"
19: 65025 -> "<|assistant|>"
20: 44 -> "A"
21: 28 -> "1"
22: 11 -> "<|endoftext|>"
23: 65028 -> "<|prompter|>"
24: 60 -> "Q"
25: 29 -> "2"
26: 11 -> "<|endoftext|>"
27: 65024 -> "<|system|>"
28: 7055 -> "length"
29: 37 -> ":"
30: 204 -> " "
31: 28 -> "1"
32: 193 -> "
"
33: 13318 -> "lang"
34: 37 -> ":"
35: 445 -> " en"
36: 193 -> "
"
37: 11723 -> "context"
38: 37 -> ":"
39: 20906 -> " ctx"
40: 193 -> "
"
41: 11 -> "<|endoftext|>"
42: 65025 -> "<|assistant|>"
43: 44 -> "A"
44: 29 -> "2"
45: 11 -> "<|endoftext|>"
message_indices: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3]
Downloading and preparing dataset json/toanbku--oa-df to /home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96...
Downloading data: 100%|██████████| 50.4k/50.4k [00:00<00:00, 45.8MB/s]
Downloading data: 100%|██████████| 11.5k/11.5k [00:00<00:00, 38.3MB/s]
Downloading data files: 100%|██████████| 2/2 [00:00<00:00, 11.19it/s]
Extracting data files: 100%|██████████| 2/2 [00:00<00:00, 1782.53it/s]
Dataset json downloaded and prepared to /home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96. Subsequent calls will reuse this data.
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 11.1MB/s]
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 10.9MB/s]
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 9.35MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 10.4MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 14.0MB/s]
Downloading shards: 0%| | 0/8 [00:00<?, ?it/s]Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 9.62MB/s]
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 0%| | 0.00/4.20k [00:00<?, ?B/s]Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 10.7MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 10.6MB/s]
Downloading shards: 0%| | 0/8 [00:00<?, ?it/s]Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision. | 10.5M/1.92G [00:00<00:20, 91.9MB/s]
Downloading shards: 0%| | 0/8 [00:00<?, ?it/s]Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision. | 21.0M/1.92G [00:00<00:19, 96.0MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading (…)l-00001-of-00008.bin: 100%|██████████| 1.92G/1.92G [00:33<00:00, 57.1MB/s]
Downloading (…)l-00002-of-00008.bin: 100%|██████████| 1.99G/1.99G [00:36<00:00, 54.4MB/s]
Downloading (…)l-00003-of-00008.bin: 100%|██████████| 1.91G/1.91G [00:37<00:00, 50.7MB/s]
Downloading (…)l-00004-of-00008.bin: 100%|██████████| 1.91G/1.91G [00:35<00:00, 53.1MB/s]
Downloading (…)l-00005-of-00008.bin: 100%|██████████| 1.99G/1.99G [00:36<00:00, 53.9MB/s]
Downloading (…)l-00006-of-00008.bin: 100%|██████████| 1.91G/1.91G [00:35<00:00, 54.3MB/s]
Downloading (…)l-00007-of-00008.bin: 100%|██████████| 1.91G/1.91G [00:37<00:00, 50.4MB/s]
Downloading (…)l-00008-of-00008.bin: 100%|██████████| 921M/921M [00:18<00:00, 50.7MB/s]
Downloading shards: 100%|██████████| 8/8 [04:32<00:00, 34.10s/it]
Downloading shards: 100%|██████████| 8/8 [04:33<00:00, 34.14s/it]
Downloading shards: 100%|██████████| 8/8 [04:32<00:00, 34.11s/it]
Downloading shards: 100%|██████████| 8/8 [04:33<00:00, 34.17s/it]
Downloading shards: 100%|██████████| 8/8 [04:33<00:00, 34.13s/it]
Downloading shards: 100%|██████████| 8/8 [04:33<00:00, 34.16s/it]
Downloading shards: 100%|██████████| 8/8 [04:33<00:00, 34.14s/it]
Downloading shards: 100%|██████████| 8/8 [04:33<00:00, 34.20s/it]
Loading checkpoint shards: 100%|██████████| 8/8 [00:15<00:00, 1.92s/it]
Downloading (…)neration_config.json: 100%|██████████| 116/116 [00:00<00:00, 638kB/s]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|██████████| 8/8 [00:16<00:00, 2.03s/it]
Loading checkpoint shards: 100%|██████████| 8/8 [00:15<00:00, 2.00s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 75%|███████▌  | 6/8 [00:16<00:05, 2.63s/it]Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|██████████| 8/8 [00:18<00:00, 2.29s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|██████████| 8/8 [00:19<00:00, 2.42s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|██████████| 8/8 [00:19<00:00, 2.47s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|██████████| 8/8 [00:19<00:00, 2.48s/it]
Loading checkpoint shards: 100%|██████████| 8/8 [00:19<00:00, 2.47s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Resizing embeddings to 65040
Number of trainable parameters: 6921M
wandb: Currently logged in as: toanbku. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /home/ubuntu/OA/model/model_training/wandb/run-20230717_162731-w1l8j7n6
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run OpenAssistant/falcon-7b-sft-top1-696-falcon_log_7b-finetuned
wandb: ⭐️ View project at https://wandb.ai/toanbku/supervised-finetuning
wandb: 🚀 View run at https://wandb.ai/toanbku/supervised-finetuning/runs/w1l8j7n6
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Creating extension directory /home/ubuntu/.cache/torch_extensions/py310_cu118/fused_adam...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py310_cu118/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
[1/3] /home/ubuntu/mambaforge/envs/cuda118/bin/nvcc -ccbin /home/ubuntu/mambaforge/envs/cuda118/bin/x86_64-conda-linux-gnu-cc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/TH -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/THC -isystem /home/ubuntu/mambaforge/envs/cuda118/include -isystem /home/ubuntu/mambaforge/envs/cuda118/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -std=c++17 -c /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
[2/3] /home/ubuntu/mambaforge/envs/cuda118/bin/x86_64-conda-linux-gnu-c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/TH -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/THC -isystem /home/ubuntu/mambaforge/envs/cuda118/include -isystem /home/ubuntu/mambaforge/envs/cuda118/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DBF16_AVAILABLE -c /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
[3/3] /home/ubuntu/mambaforge/envs/cuda118/bin/x86_64-conda-linux-gnu-c++ fused_adam_frontend.o multi_tensor_adam.cuda.o -shared -L/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/home/ubuntu/mambaforge/envs/cuda118/lib64 -lcudart -o fused_adam.so
Loading extension module fused_adam...
Time to load fused_adam op: 29.23199462890625 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.244072675704956 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.24746584892273 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.145082473754883 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.245840072631836 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.245012760162354 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.24890160560608 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.146907329559326 seconds
Rank: 6 partition count [8] and sizes[(865224176, False)]
Rank: 4 partition count [8] and sizes[(865224176, False)]
Rank: 3 partition count [8] and sizes[(865224176, False)]
Rank: 7 partition count [8] and sizes[(865224176, False)]
Rank: 5 partition count [8] and sizes[(865224176, False)]
Rank: 0 partition count [8] and sizes[(865224176, False)]
Rank: 2 partition count [8] and sizes[(865224176, False)]
Rank: 1 partition count [8] and sizes[(865224176, False)]
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
0%| | 0/4 [00:00<?, ?it/s]You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
25%|██▌       | 1/4 [00:00<00:02, 1.05it/s/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
{'train_runtime': 603.8881, 'train_samples_per_second': 0.212, 'train_steps_per_second': 0.007, 'train_loss': 1.333984375, 'epoch': 4.0}
100%|██████████| 4/4 [10:03<00:00, 150.97s/it]
[2023-07-17 16:38:44,427] [INFO] [launch.py:347:main] Process 46522 exits successfully.
[2023-07-17 16:38:44,428] [INFO] [launch.py:347:main] Process 46521 exits successfully.
[2023-07-17 16:38:44,428] [INFO] [launch.py:347:main] Process 46520 exits successfully.
[2023-07-17 16:38:44,429] [INFO] [launch.py:347:main] Process 46518 exits successfully.
[2023-07-17 16:38:44,429] [INFO] [launch.py:347:main] Process 46523 exits successfully.
[2023-07-17 16:38:45,431] [INFO] [launch.py:347:main] Process 46524 exits successfully.
[2023-07-17 16:38:45,432] [INFO] [launch.py:347:main] Process 46519 exits successfully.
wandb: Waiting for W&B process to finish... (success).
wandb:
wandb: Run history:
wandb: train/epoch ▁
wandb: train/global_step ▁
wandb: train/total_flos ▁
wandb: train/train_loss ▁
wandb: train/train_runtime ▁
wandb: train/train_samples_per_second ▁
wandb: train/train_steps_per_second ▁
wandb:
wandb: Run summary:
wandb: train/epoch 4.0
wandb: train/global_step 4
wandb: train/total_flos 1903271472005120.0
wandb: train/train_loss 1.33398
wandb: train/train_runtime 603.8881
wandb: train/train_samples_per_second 0.212
wandb: train/train_steps_per_second 0.007
wandb:
wandb: 🚀 View run OpenAssistant/falcon-7b-sft-top1-696-falcon_log_7b-finetuned at: https://wandb.ai/toanbku/supervised-finetuning/runs/w1l8j7n6
wandb: Synced 6 W&B file(s), 0 media file(s), 2 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230717_162731-w1l8j7n6/logs
[2023-07-17 16:40:50,566] [INFO] [launch.py:347:main] Process 46517 exits successfully.
``` |