---
language:
- en
tags:
- open-assistant
- falcon
license: "unknown"
datasets:
- toanbku/oa-df
---
- Datasets: https://huggingface.co/datasets/toanbku/oa-df
- Training log: https://wandb.ai/toanbku/supervised-finetuning/runs/w1l8j7n6/overview
### Command
```
export BS=8
deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 61000 trainer_sft.py \
--config defaults oa-falcon-7b-top1 oasst_df \
--cache_dir /home/ubuntu/OA/model/model_training/.cache \
--per_device_eval_batch_size $BS --per_device_train_batch_size $BS \
--deepspeed
```
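For reference, the effective global batch size configured by this launch (8 GPUs, `BS=8` per device, and `gradient_accumulation_steps: 2` from the config below) works out as follows, assuming the usual Hugging Face Trainer semantics under DeepSpeed:
```
8 GPUs × 8 samples per GPU × 2 accumulation steps = 128 samples per optimizer step
```
Since the training split only contains 32 examples (see the training log below), each epoch collapses to a single optimizer step in practice.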
### Config
```
oa-falcon-7b-top1:
  dtype: bf16
  log_dir: "falcon_log_7b"
  learning_rate: 1e-5
  model_name: "OpenAssistant/falcon-7b-sft-top1-696"
  deepspeed_config: configs/zero_config.json
  output_dir: falcon
  weight_decay: 0.0
  max_length: 2048
  save_strategy: steps
  eval_steps: 80
  save_steps: 80
  warmup_steps: 4
  gradient_checkpointing: true
  gradient_accumulation_steps: 2
  per_device_train_batch_size: 2
  per_device_eval_batch_size: 4
  num_train_epochs: 4
  save_total_limit: 2
  residual_dropout: 0.2
  residual_dropout_lima: true

oasst_df:
  save_strategy: epoch
  datasets:
    - oasst_export:
        lang: "en"
        hf_dataset_name: toanbku/oa-df
        val_split: 0.05
```
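The `--config defaults oa-falcon-7b-top1 oasst_df` arguments layer these blocks on top of the trainer defaults, with later blocks and explicit command-line flags overriding earlier values; that is why the run below reports `per_device_train_batch_size=8` (from the CLI) and `save_strategy='epoch'` (from `oasst_df`). The sketch below illustrates that merge order; the inline YAML and the `merge_configs` helper are illustrative stand-ins, not the actual trainer code:
```
import yaml

# Illustrative stand-in for the config file: each top-level key is one named block.
CONFIG_YAML = """
defaults:
  learning_rate: 1e-5
  save_strategy: steps
oa-falcon-7b-top1:
  model_name: OpenAssistant/falcon-7b-sft-top1-696
  per_device_train_batch_size: 2
oasst_df:
  save_strategy: epoch
"""

def merge_configs(blocks, names, cli_overrides=None):
    """Apply the named blocks left to right, then CLI overrides last."""
    merged = {}
    for name in names:
        merged.update(blocks[name])
    merged.update(cli_overrides or {})
    return merged

blocks = yaml.safe_load(CONFIG_YAML)
conf = merge_configs(
    blocks,
    ["defaults", "oa-falcon-7b-top1", "oasst_df"],
    cli_overrides={"per_device_train_batch_size": 8},
)
print(conf)  # per_device_train_batch_size -> 8, save_strategy -> "epoch"
```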
### Demo
- **input_text:** <|prompter|>Provide information about Dwarves Foundation company<|endoftext|><|assistant|>
- **output:**
```
Dwarves Foundation is a blockchain-based, decentralized platform that aims to create a global community based on equality and inclusion. It was founded by Alex Mashinsky, an entrepreneur and investor. The platform uses blockchain technology to create a decentralized and transparent system for the creation and management of projects.
Dwarves Foundation allows users to create and participate in projects that promote equality and inclusion. Users can contribute time, money, or other resources to projects, and they are rewarded with DVA, the platform's native cryptocurrency. This cryptocurrency can be used to participate in future projects or to purchase goods and services within the Dwarves Foundation ecosystem.
The platform aims to create a community that is diverse and inclusive, and it encourages people from all walks of life to participate. It believes that collaboration and teamwork are key to solving the world's problems, and it aims to create a platform where people can work together to make the world a better place.
So far, Dwarves Foundation has been successful in creating a community and launching several projects. One of its most successful projects, the AION network, is a decentralized platform that allows users to create and participate in projects using the DVA cryptocurrency. The platform has already attracted thousands of users and has several projects in development.
Overall, Dwarves Foundation is a unique platform that aims to create a global community based on equality and inclusion. It uses blockchain technology to create a transparent and decentralized system, and it encourages people from all walks of life to participate and collaborate towards a better future.
```
- **log:**
```
python ./test.py
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[2023-07-17 11:32:36,837] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|██████████| 2/2 [00:11<00:00, 5.98s/it]
The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/transformers/generation/utils.py:1219: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
```
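The demo above comes from a small local script (`python ./test.py`, not included in this repo). A minimal sketch of what such a script might look like, assuming the fine-tuned weights are in the local `falcon` output directory (or a Hub model id) and using illustrative sampling parameters:
```
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "falcon"  # assumption: output_dir from the training config above
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Falcon's RWForCausalLM is shipped as custom code
    device_map="auto",
)

# Same prompt template used during SFT: <|prompter|> ... <|endoftext|> <|assistant|>
prompt = "<|prompter|>Provide information about Dwarves Foundation company<|endoftext|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
# Strip the prompt tokens and print only the generated continuation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```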
----
### Training log
```
(cuda118) [RedmondAI] ubuntu@oa-server-8:~/OA/model/model_training$ deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 61000 trainer_sft.py --config defaults oa-falcon-7b-top1 oasst_df --cache_dir /home/ubuntu/OA/model/model_training/.cache --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --deepspeed
[2023-07-17 16:21:13,138] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:16,536] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-07-17 16:21:16,536] [INFO] [runner.py:555:main] cmd = /home/ubuntu/mambaforge/envs/cuda118/bin/python3.10 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=61000 --enable_each_rank_log=None trainer_sft.py --config defaults oa-falcon-7b-top1 oasst_df --cache_dir /home/ubuntu/OA/model/model_training/.cache --per_device_eval_batch_size 8 --per_device_train_batch_size 8 --deepspeed
[2023-07-17 16:21:17,929] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:20,292] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-07-17 16:21:20,292] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-07-17 16:21:20,292] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-07-17 16:21:20,292] [INFO] [launch.py:163:main] dist_world_size=8
[2023-07-17 16:21:20,292] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2023-07-17 16:21:24,714] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:24,805] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,000] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,151] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,228] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,251] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,295] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,299] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
trainig_conf = Namespace(rng_seed=2703368087, learning_rate='1e-5', gradient_checkpointing=True, gradient_accumulation_steps=2, per_device_train_batch_size=8, per_device_eval_batch_size=8, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon='1e-12', weight_decay=0.0, warmup_steps=4, eval_steps=80, save_strategy='epoch', save_steps=80, max_length=2048, val_max_length=None, num_train_epochs=4, logging_steps=10, max_grad_norm=2.0, save_total_limit=2, dtype='bf16', eval_accumulation_steps=None, freeze_layer=None, datasets=[{'oasst_export': {'lang': 'en', 'hf_dataset_name': 'toanbku/oa-df', 'val_split': 0.05}}], datasets_extra=[], cache_dir='/home/ubuntu/OA/model/model_training/.cache', loss_fn='CrossEntropyLoss', eval_size=None, log_dir='falcon_log_7b', quantization=False, seq2seqmodel=False, poly_eps=1.0, fuse_gelu=True, log_wandb=True, samples_mixing=False, verbose=False, output_dir='falcon', use_custom_sampler=False, random_offset_probability=0.8, label_masking=True, residual_dropout=0.2, use_flash_attention=False, sort_by_length=False, use_system_prefix=False, system_prefix='You are Joi, a large language model trained by Open-Assistant. Answer as concisely as possible.\nKnowledge cutoff: 2021-09-01\nCurrent date: 2023-03-12', use_system_tag=False, system_property_dropout=0.5, system_add_length=False, per_digit_tokens=False, is_reward_model=False, residual_dropout_lima=True, deepspeed_config='configs/zero_config.json', peft_model=False, peft_type='lora', model_name='OpenAssistant/falcon-7b-sft-top1-696', wandb_entity='toanbku', local_rank=0, deepspeed=True, resume_from_checkpoint=False, show_dataset_stats=False, world_size=8)
[2023-07-17 16:21:25,864] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:25,864] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:25,864] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-07-17 16:21:25,952] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:25,952] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,311] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,312] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,320] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,320] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,407] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,407] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,511] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,512] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,558] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,558] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,618] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,619] [INFO] [comm.py:594:init_distributed] cdb=None
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
Tokenizer sanity check:
Type: PreTrainedTokenizerFast
special_tokens_map: {'eos_token': '<|endoftext|>', 'sep_token': '<|endoftext|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|prompter|>', '>>SUFFIX<<', '<|prefix_begin|>', '>>INTRODUCTION<<', '>>QUESTION<<', '>>SUMMARY<<', '<|prefix_end|>', '>>DOMAIN<<', '<|assistant|>', '<|system|>', '>>TITLE<<', '>>COMMENT<<', '>>MIDDLE<<', '>>PREFIX<<', '>>ANSWER<<', '>>ABSTRACT<<']}
Using bos_token, but it is not set yet.
bos_token='None', bos_token_id=None
eos_token='<|endoftext|>', eos_token_id=11
prompter_token_id=65028, assistant_token_id=65025
encoding result: {'input_ids': [65028, 60, 28, 11, 65024, 13318, 37, 445, 193, 7055, 37, 204, 28, 193, 11723, 37, 20906, 193, 11, 65025, 44, 28, 11, 65028, 60, 29, 11, 65024, 7055, 37, 204, 28, 193, 13318, 37, 445, 193, 11723, 37, 20906, 193, 11, 65025, 44, 29, 11], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
0: 65028 -> "<|prompter|>"
1: 60 -> "Q"
2: 28 -> "1"
3: 11 -> "<|endoftext|>"
4: 65024 -> "<|system|>"
5: 13318 -> "lang"
6: 37 -> ":"
7: 445 -> " en"
8: 193 -> "
"
9: 7055 -> "length"
10: 37 -> ":"
11: 204 -> " "
12: 28 -> "1"
13: 193 -> "
"
14: 11723 -> "context"
15: 37 -> ":"
16: 20906 -> " ctx"
17: 193 -> "
"
18: 11 -> "<|endoftext|>"
19: 65025 -> "<|assistant|>"
20: 44 -> "A"
21: 28 -> "1"
22: 11 -> "<|endoftext|>"
23: 65028 -> "<|prompter|>"
24: 60 -> "Q"
25: 29 -> "2"
26: 11 -> "<|endoftext|>"
27: 65024 -> "<|system|>"
28: 7055 -> "length"
29: 37 -> ":"
30: 204 -> " "
31: 28 -> "1"
32: 193 -> "
"
33: 13318 -> "lang"
34: 37 -> ":"
35: 445 -> " en"
36: 193 -> "
"
37: 11723 -> "context"
38: 37 -> ":"
39: 20906 -> " ctx"
40: 193 -> "
"
41: 11 -> "<|endoftext|>"
42: 65025 -> "<|assistant|>"
43: 44 -> "A"
44: 29 -> "2"
45: 11 -> "<|endoftext|>"
message_indices: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3]
Downloading and preparing dataset json/toanbku--oa-df to /home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96...
Downloading data: 100%|██████████| 50.4k/50.4k [00:00<00:00, 45.8MB/s]
Downloading data: 100%|██████████| 11.5k/11.5k [00:00<00:00, 38.3MB/s]
Downloading data files: 100%|██████████| 2/2 [00:00<00:00, 11.19it/s]
Extracting data files: 100%|██████████| 2/2 [00:00<00:00, 1782.53it/s]
Dataset json downloaded and prepared to /home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96. Subsequent calls will reuse this data.
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 11.1MB/s]
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 10.9MB/s]
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 9.35MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 10.4MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 14.0MB/s]
Downloading shards: 0%| | 0/8 [00:00<?, ?it/s]Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 9.62MB/s]
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 0%| | 0.00/4.20k [00:00<?, ?B/s]Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 10.7MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 10.6MB/s]
Downloading shards: 0%| | 0/8 [00:00<?, ?it/s]Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision. | 10.5M/1.92G [00:00<00:20, 91.9MB/s]
Downloading shards: 0%| | 0/8 [00:00<?, ?it/s]Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision. | 21.0M/1.92G [00:00<00:19, 96.0MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading (…)l-00001-of-00008.bin: 100%|██████████| 1.92G/1.92G [00:33<00:00, 57.1MB/s]
Downloading (…)l-00002-of-00008.bin: 100%|██████████| 1.99G/1.99G [00:36<00:00, 54.4MB/s]
Downloading (…)l-00003-of-00008.bin: 100%|██████████| 1.91G/1.91G [00:37<00:00, 50.7MB/s]
Downloading (…)l-00004-of-00008.bin: 100%|██████████| 1.91G/1.91G [00:35<00:00, 53.1MB/s]
Downloading (…)l-00005-of-00008.bin: 100%|██████████| 1.99G/1.99G [00:36<00:00, 53.9MB/s]
Downloading (…)l-00006-of-00008.bin: 100%|██████████| 1.91G/1.91G [00:35<00:00, 54.3MB/s]
Downloading (…)l-00007-of-00008.bin: 100%|██████████| 1.91G/1.91G [00:37<00:00, 50.4MB/s]
Downloading (…)l-00008-of-00008.bin: 100%|██████████| 921M/921M [00:18<00:00, 50.7MB/s]
Downloading shards: 100%|██████████| 8/8 [04:32<00:00, 34.10s/it]
Downloading shards: 100%|██████████| 8/8 [04:33<00:00, 34.14s/it]
Downloading shards: 100%|██████████| 8/8 [04:32<00:00, 34.11s/it]
Downloading shards: 100%|██████████| 8/8 [04:33<00:00, 34.17s/it]
Downloading shards: 100%|██████████| 8/8 [04:33<00:00, 34.13s/it]
Downloading shards: 100%|██████████| 8/8 [04:33<00:00, 34.16s/it]
Downloading shards: 100%|██████████| 8/8 [04:33<00:00, 34.14s/it]
Downloading shards: 100%|██████████| 8/8 [04:33<00:00, 34.20s/it]
Loading checkpoint shards: 100%|██████████| 8/8 [00:15<00:00, 1.92s/it]
Downloading (…)neration_config.json: 100%|██████████| 116/116 [00:00<00:00, 638kB/s]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|██████████| 8/8 [00:16<00:00, 2.03s/it]
Loading checkpoint shards: 100%|██████████| 8/8 [00:15<00:00, 2.00s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 75%|███████▌  | 6/8 [00:16<00:05, 2.63s/it]Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|██████████| 8/8 [00:18<00:00, 2.29s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|██████████| 8/8 [00:19<00:00, 2.42s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|██████████| 8/8 [00:19<00:00, 2.47s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|██████████| 8/8 [00:19<00:00, 2.48s/it]
Loading checkpoint shards: 100%|██████████| 8/8 [00:19<00:00, 2.47s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Resizing embeddings to 65040
Number of trainable parameters: 6921M
wandb: Currently logged in as: toanbku. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /home/ubuntu/OA/model/model_training/wandb/run-20230717_162731-w1l8j7n6
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run OpenAssistant/falcon-7b-sft-top1-696-falcon_log_7b-finetuned
wandb: ⭐️ View project at https://wandb.ai/toanbku/supervised-finetuning
wandb: 🚀 View run at https://wandb.ai/toanbku/supervised-finetuning/runs/w1l8j7n6
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Creating extension directory /home/ubuntu/.cache/torch_extensions/py310_cu118/fused_adam...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py310_cu118/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
[1/3] /home/ubuntu/mambaforge/envs/cuda118/bin/nvcc -ccbin /home/ubuntu/mambaforge/envs/cuda118/bin/x86_64-conda-linux-gnu-cc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/TH -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/THC -isystem /home/ubuntu/mambaforge/envs/cuda118/include -isystem /home/ubuntu/mambaforge/envs/cuda118/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -std=c++17 -c /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
[2/3] /home/ubuntu/mambaforge/envs/cuda118/bin/x86_64-conda-linux-gnu-c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/TH -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/THC -isystem /home/ubuntu/mambaforge/envs/cuda118/include -isystem /home/ubuntu/mambaforge/envs/cuda118/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DBF16_AVAILABLE -c /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
[3/3] /home/ubuntu/mambaforge/envs/cuda118/bin/x86_64-conda-linux-gnu-c++ fused_adam_frontend.o multi_tensor_adam.cuda.o -shared -L/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/home/ubuntu/mambaforge/envs/cuda118/lib64 -lcudart -o fused_adam.so
Loading extension module fused_adam...
Time to load fused_adam op: 29.23199462890625 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.244072675704956 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.24746584892273 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.145082473754883 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.245840072631836 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.245012760162354 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.24890160560608 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.146907329559326 seconds
Rank: 6 partition count [8] and sizes[(865224176, False)]
Rank: 4 partition count [8] and sizes[(865224176, False)]
Rank: 3 partition count [8] and sizes[(865224176, False)]
Rank: 7 partition count [8] and sizes[(865224176, False)]
Rank: 5 partition count [8] and sizes[(865224176, False)]
Rank: 0 partition count [8] and sizes[(865224176, False)]
Rank: 2 partition count [8] and sizes[(865224176, False)]
Rank: 1 partition count [8] and sizes[(865224176, False)]
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
0%| | 0/4 [00:00<?, ?it/s]You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
25%|██▌       | 1/4 [00:00<00:02, 1.05it/s/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
{'train_runtime': 603.8881, 'train_samples_per_second': 0.212, 'train_steps_per_second': 0.007, 'train_loss': 1.333984375, 'epoch': 4.0}
100%|██████████| 4/4 [10:03<00:00, 150.97s/it]
[2023-07-17 16:38:44,427] [INFO] [launch.py:347:main] Process 46522 exits successfully.
[2023-07-17 16:38:44,428] [INFO] [launch.py:347:main] Process 46521 exits successfully.
[2023-07-17 16:38:44,428] [INFO] [launch.py:347:main] Process 46520 exits successfully.
[2023-07-17 16:38:44,429] [INFO] [launch.py:347:main] Process 46518 exits successfully.
[2023-07-17 16:38:44,429] [INFO] [launch.py:347:main] Process 46523 exits successfully.
[2023-07-17 16:38:45,431] [INFO] [launch.py:347:main] Process 46524 exits successfully.
[2023-07-17 16:38:45,432] [INFO] [launch.py:347:main] Process 46519 exits successfully.
wandb: Waiting for W&B process to finish... (success).
wandb:
wandb: Run history:
wandb: train/epoch ▁
wandb: train/global_step ▁
wandb: train/total_flos ▁
wandb: train/train_loss ▁
wandb: train/train_runtime ▁
wandb: train/train_samples_per_second ▁
wandb: train/train_steps_per_second ▁
wandb:
wandb: Run summary:
wandb: train/epoch 4.0
wandb: train/global_step 4
wandb: train/total_flos 1903271472005120.0
wandb: train/train_loss 1.33398
wandb: train/train_runtime 603.8881
wandb: train/train_samples_per_second 0.212
wandb: train/train_steps_per_second 0.007
wandb:
wandb: 🚀 View run OpenAssistant/falcon-7b-sft-top1-696-falcon_log_7b-finetuned at: https://wandb.ai/toanbku/supervised-finetuning/runs/w1l8j7n6
wandb: Synced 6 W&B file(s), 0 media file(s), 2 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230717_162731-w1l8j7n6/logs
[2023-07-17 16:40:50,566] [INFO] [launch.py:347:main] Process 46517 exits successfully.
``` |