Fine-Tuning

#7 opened by prvns

I have been trying to fine-tune this model, and I am getting the following error:
" total_len = int(target.ne(tokenizer.pad_token_id).sum())
TypeError: ne() received an invalid combination of arguments - got (NoneType), but expected one of:

  • (Tensor other)"

From what I found online, Llama 3 originally did not have a pad token, but one has been added in the latest version: https://github.com/meta-llama/llama3/issues/42
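
For context, this is the kind of check I mean (just a sketch, assuming the tokenizer of this checkpoint loads directly with transformers' AutoTokenizer):

from transformers import AutoTokenizer

# Inspect whether the tokenizer shipped with this checkpoint defines a pad token;
# if pad_token_id is None, target.ne(tokenizer.pad_token_id) fails exactly as above.
tokenizer = AutoTokenizer.from_pretrained("lmms-lab/llama3-llava-next-8b")
print(tokenizer.pad_token, tokenizer.pad_token_id)
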
This is my training command.
Could you let me know where I am making a mistake?
%%sh
deepspeed LLaVA/llava/train/train_mem.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed LLaVA/scripts/zero3.json \
    --model_name_or_path lmms-lab/llama3-llava-next-8b \
    --version llava_llama_3 \
    --data_path train_val_new.json \
    --cache_dir ./ \
    --image_folder ./data/ \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/Boreholellm \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "epoch" \
    --save_strategy "steps" \
    --save_steps 3000 \
    --save_total_limit 10 \
    --learning_rate 1e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 8192 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb
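
In case it narrows things down: would it be a reasonable workaround to patch the tokenizer of a local copy of the checkpoint and point --model_name_or_path at that copy? A rough sketch of what I mean (the local directory name is just a placeholder, and I have not verified this against the LLaVA training code):

from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

# Download the checkpoint into a local folder (placeholder path).
local_dir = snapshot_download("lmms-lab/llama3-llava-next-8b",
                              local_dir="./llama3-llava-next-8b-padfix")

# Define a pad token if the tokenizer has none, then save it back over the
# local copy so the training script picks it up.
tokenizer = AutoTokenizer.from_pretrained(local_dir)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.save_pretrained(local_dir)

# Then launch training with --model_name_or_path ./llama3-llava-next-8b-padfix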
