---
language:
- en
tags:
- open-assitant
- falcon
license: "unknown"
datasets:
- toanbku/oa-df
---

Datasets: https://huggingface.co/datasets/toanbku/oa-df
Training log: https://wandb.ai/toanbku/supervised-finetuning/runs/w1l8j7n6/overview

Command
```
export BS=8
deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 61000 trainer_sft.py \
  --config defaults oa-falcon-7b-top1 oasst_df \
  --cache_dir /home/ubuntu/OA/model/model_training/.cache \
  --per_device_eval_batch_size $BS --per_device_train_batch_size $BS \
  --deepspeed
```
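The command-line batch size overrides the value in the config, so the number of sequences per optimizer step is worth working out. A minimal sketch of that arithmetic, assuming the usual data-parallel definition (per-device batch × number of GPUs × gradient accumulation steps); `effective_batch_size` is a hypothetical helper, not part of `trainer_sft.py`:

```python
def effective_batch_size(per_device: int, num_gpus: int, grad_accum: int) -> int:
    """Sequences consumed per optimizer step under plain data parallelism."""
    return per_device * num_gpus * grad_accum


# Values taken from this card: BS=8 on the command line, 8 GPUs listed in
# --include=localhost:0..7, and gradient_accumulation_steps: 2 in the config.
total = effective_batch_size(per_device=8, num_gpus=8, grad_accum=2)
print(total)  # 128
```

With the config's own `per_device_train_batch_size: 2`, the same formula would give 32, so the `$BS` override quadruples the step size.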

Config
```
oa-falcon-7b-top1:
  dtype: bf16
  log_dir: "falcon_log_7b"
  learning_rate: 1e-5
  model_name: "OpenAssistant/falcon-7b-sft-top1-696"
  deepspeed_config: configs/zero_config.json
  output_dir: falcon
  weight_decay: 0.0
  max_length: 2048
  save_strategy: steps
  eval_steps: 80
  save_steps: 80
  warmup_steps: 4
  gradient_checkpointing: true
  gradient_accumulation_steps: 2
  per_device_train_batch_size: 2
  per_device_eval_batch_size: 4
  num_train_epochs: 4
  save_total_limit: 2
  residual_dropout: 0.2
  residual_dropout_lima: true

oasst_df:
  save_strategy: epoch
  datasets:
    - oasst_export:
        lang: "en"
        hf_dataset_name: toanbku/oa-df
        val_split: 0.05
```
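The `--config defaults oa-falcon-7b-top1 oasst_df` argument names several sections, and the logged `Namespace` shows `save_strategy='epoch'` and `per_device_train_batch_size=8`. That is consistent with later sections overriding earlier ones and command-line flags overriding both. The sketch below illustrates that precedence; it is an assumption about the behaviour, not the actual `trainer_sft.py` code, and `merge_configs` is a hypothetical name:

```python
def merge_configs(sections, names, cli_overrides):
    """Merge named config sections in order, then apply CLI overrides."""
    merged = {}
    for name in names:
        merged.update(sections[name])  # later sections win on key clashes
    merged.update(cli_overrides)       # command-line flags win over all sections
    return merged


# Only the keys needed for the illustration. "defaults" is left empty because
# its contents are not shown in this card.
sections = {
    "defaults": {},
    "oa-falcon-7b-top1": {
        "save_strategy": "steps",
        "save_steps": 80,
        "per_device_train_batch_size": 2,
        "per_device_eval_batch_size": 4,
    },
    "oasst_df": {"save_strategy": "epoch"},
}
cli = {"per_device_train_batch_size": 8, "per_device_eval_batch_size": 8}

merged = merge_configs(sections, ["defaults", "oa-falcon-7b-top1", "oasst_df"], cli)
print(merged)
```

Under this reading, `save_steps: 80` is still present in the merged result but has no effect once `save_strategy` becomes `epoch`.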

### Demo
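The prompt in this demo wraps the user message in the Open-Assistant special tokens that also appear in the tokenizer sanity check of the training log. A minimal sketch of building such a single-turn prompt; `build_prompt` is a hypothetical helper introduced here for illustration:

```python
PROMPTER = "<|prompter|>"
ASSISTANT = "<|assistant|>"
EOS = "<|endoftext|>"


def build_prompt(user_message: str) -> str:
    """Wrap one user turn and leave the assistant turn open for generation."""
    return f"{PROMPTER}{user_message}{EOS}{ASSISTANT}"


prompt = build_prompt("Provide information about Dwarves Foundation company")
print(prompt)
```
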
- **input_text:** <|prompter|>Provide information about Dwarves Foundation company<|endoftext|><|assistant|>
- **output:**
```
Dwarves Foundation is a blockchain-based, decentralized platform that aims to create a global community based on equality and inclusion. It was founded by Alex Mashinsky, an entrepreneur and investor. The platform uses blockchain technology to create a decentralized and transparent system for the creation and management of projects. Dwarves Foundation allows users to create and participate in projects that promote equality and inclusion. Users can contribute time, money, or other resources to projects, and they are rewarded with DVA, the platform's native cryptocurrency. This cryptocurrency can be used to participate in future projects or to purchase goods and services within the Dwarves Foundation ecosystem. The platform aims to create a community that is diverse and inclusive, and it encourages people from all walks of life to participate. It believes that collaboration and teamwork are key to solving the world's problems, and it aims to create a platform where people can work together to make the world a better place. So far, Dwarves Foundation has been successful in creating a community and launching several projects. One of its most successful projects, the AION network, is a decentralized platform that allows users to create and participate in projects using the DVA cryptocurrency. The platform has already attracted thousands of users and has several projects in development. Overall, Dwarves Foundation is a unique platform that aims to create a global community based on equality and inclusion. It uses blockchain technology to create a transparent and decentralized system, and it encourages people from all walks of life to participate and collaborate towards a better future.
```
- **log:**
```
python ./test.py
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
```

----

### Training log
```
(cuda118) [RedmondAI] ubuntu@oa-server-8:~/OA/model/model_training$ deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 61000 trainer_sft.py --config defaults oa-falcon-7b-top1 oasst_df --cache_dir /home/ubuntu/OA/model/model_training/.cache --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --deepspeed
[2023-07-17 16:21:13,138] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:16,536] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-07-17 16:21:16,536] [INFO] [runner.py:555:main] cmd = /home/ubuntu/mambaforge/envs/cuda118/bin/python3.10 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr= --master_port=61000 --enable_each_rank_log=None trainer_sft.py --config defaults oa-falcon-7b-top1 oasst_df --cache_dir /home/ubuntu/OA/model/model_training/.cache --per_device_eval_batch_size 8 --per_device_train_batch_size 8 --deepspeed
[2023-07-17 16:21:17,929] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:20,292] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-07-17 16:21:20,292] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-07-17 16:21:20,292] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-07-17 16:21:20,292] [INFO] [launch.py:163:main] dist_world_size=8
[2023-07-17 16:21:20,292] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2023-07-17 16:21:24,714] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:24,805] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,000] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,151] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,228] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,251] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,295] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,299] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
trainig_conf = Namespace(rng_seed=2703368087, learning_rate='1e-5', gradient_checkpointing=True, gradient_accumulation_steps=2, per_device_train_batch_size=8, per_device_eval_batch_size=8, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon='1e-12', weight_decay=0.0, warmup_steps=4, eval_steps=80, save_strategy='epoch', save_steps=80, max_length=2048, val_max_length=None, num_train_epochs=4, logging_steps=10, max_grad_norm=2.0, save_total_limit=2, dtype='bf16', eval_accumulation_steps=None, freeze_layer=None, datasets=[{'oasst_export': {'lang': 'en', 'hf_dataset_name': 'toanbku/oa-df', 'val_split': 0.05}}], datasets_extra=[], cache_dir='/home/ubuntu/OA/model/model_training/.cache', loss_fn='CrossEntropyLoss', eval_size=None, log_dir='falcon_log_7b', quantization=False, seq2seqmodel=False, poly_eps=1.0, fuse_gelu=True, log_wandb=True, samples_mixing=False, verbose=False, output_dir='falcon', use_custom_sampler=False, random_offset_probability=0.8, label_masking=True, residual_dropout=0.2, use_flash_attention=False, sort_by_length=False, use_system_prefix=False, system_prefix='You are Joi, a large language model trained by Open-Assistant. Answer as concisely as possible.\nKnowledge cutoff: 2021-09-01\nCurrent date: 2023-03-12', use_system_tag=False, system_property_dropout=0.5, system_add_length=False, per_digit_tokens=False, is_reward_model=False, residual_dropout_lima=True, deepspeed_config='configs/zero_config.json', peft_model=False, peft_type='lora', model_name='OpenAssistant/falcon-7b-sft-top1-696', wandb_entity='toanbku', local_rank=0, deepspeed=True, resume_from_checkpoint=False, show_dataset_stats=False, world_size=8)
[2023-07-17 16:21:25,864] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:25,864] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:25,864] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-07-17 16:21:25,952] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:25,952] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,311] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,312] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,320] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,320] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,407] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,407] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,511] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,512] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,558] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,558] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,618] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,619] [INFO] [comm.py:594:init_distributed] cdb=None
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
Tokenizer sanity check:
Type: PreTrainedTokenizerFast
special_tokens_map: {'eos_token': '<|endoftext|>', 'sep_token': '<|endoftext|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|prompter|>', '>>SUFFIX<<', '<|prefix_begin|>', '>>INTRODUCTION<<', '>>QUESTION<<', '>>SUMMARY<<', '<|prefix_end|>', '>>DOMAIN<<', '<|assistant|>', '<|system|>', '>>TITLE<<', '>>COMMENT<<', '>>MIDDLE<<', '>>PREFIX<<', '>>ANSWER<<', '>>ABSTRACT<<']}
Using bos_token, but it is not set yet.
bos_token='None', bos_token_id=None
eos_token='<|endoftext|>', eos_token_id=11
prompter_token_id=65028, assistant_token_id=65025
encoding result: {'input_ids': [65028, 60, 28, 11, 65024, 13318, 37, 445, 193, 7055, 37, 204, 28, 193, 11723, 37, 20906, 193, 11, 65025, 44, 28, 11, 65028, 60, 29, 11, 65024, 7055, 37, 204, 28, 193, 13318, 37, 445, 193, 11723, 37, 20906, 193, 11, 65025, 44, 29, 11], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
0: 65028 -> "<|prompter|>"
1: 60 -> "Q"
2: 28 -> "1"
3: 11 -> "<|endoftext|>"
4: 65024 -> "<|system|>"
5: 13318 -> "lang"
6: 37 -> ":"
7: 445 -> " en"
8: 193 -> " "
9: 7055 -> "length"
10: 37 -> ":"
11: 204 -> " "
12: 28 -> "1"
13: 193 -> " "
14: 11723 -> "context"
15: 37 -> ":"
16: 20906 -> " ctx"
17: 193 -> " "
18: 11 -> "<|endoftext|>"
19: 65025 -> "<|assistant|>"
20: 44 -> "A"
21: 28 -> "1"
22: 11 -> "<|endoftext|>"
23: 65028 -> "<|prompter|>"
24: 60 -> "Q"
25: 29 -> "2"
26: 11 -> "<|endoftext|>"
27: 65024 -> "<|system|>"
28: 7055 -> "length"
29: 37 -> ":"
30: 204 -> " "
31: 28 -> "1"
32: 193 -> " "
33: 13318 -> "lang"
34: 37 -> ":"
35: 445 -> " en"
36: 193 -> " "
37: 11723 -> "context"
38: 37 -> ":"
39: 20906 -> " ctx"
40: 193 -> " "
41: 11 -> "<|endoftext|>"
42: 65025 -> "<|assistant|>"
43: 44 -> "A"
44: 29 -> "2"
45: 11 -> "<|endoftext|>"
message_indices: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3]
Downloading and preparing dataset json/toanbku--oa-df to /home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96...