Error when fine-tuning on Windows
Hi, I am using this finetune script with some modifications to fine-tune StarCoder.
What I changed:
Load_data: it now loads local parquet files saved on disk.
The model is also saved on disk, and I am loading it from there.
I am using Windows 10 Enterprise with 2 NVIDIA Quadro RTX 8000 GPUs, 48 GB each.
When I run Trainer.train(), the process gives a CUDA error, so I started debugging the training part.
The dataset and dataloaders work fine, but I get a CUDA error in the model inference part.
Using this code:
training_args = TrainingArguments(
    output_dir=args.output_dir,
    dataloader_drop_last=True,
    dataloader_num_workers=4,
    evaluation_strategy="steps",
    max_steps=args.max_steps,
    eval_steps=args.eval_freq,
    save_steps=args.save_freq,
    logging_steps=args.log_freq,
    per_device_train_batch_size=args.batch_size,
    per_device_eval_batch_size=args.batch_size,
    learning_rate=args.learning_rate,
    lr_scheduler_type=args.lr_scheduler_type,
    warmup_steps=args.num_warmup_steps,
    gradient_accumulation_steps=args.gradient_accumulation_steps,
    gradient_checkpointing=args.no_gradient_checkpointing,
    fp16=not args.no_fp16,
    bf16=args.bf16,
    weight_decay=args.weight_decay,
    report_to="tensorboard",
    run_name=f"santacoder-{args.subset}",
)
# model.to("cpu")
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
    callbacks=[SavePeftModelCallback, LoadBestPeftModelCallback],
)
print("Training...")
# some tests: grab a single batch and run one forward pass outside Trainer.train()
for batch in trainer.get_train_dataloader():
    # batch["input_ids"] = batch["input_ids"].cpu()
    # batch["labels"] = batch["labels"].cpu()
    break
outputs = trainer.model(**batch)
I get an error from bitsandbytes (I am using version 0.37.35):
Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
  File "C:\Users\rachel_shalom.conda\envs\codegen\Lib\site-packages\bitsandbytes\functional.py", line 1634, in double_quant
    nnz = nnz_row_ptr[-1].item()
          ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rachel_shalom.conda\envs\codegen\Lib\site-packages\bitsandbytes\autograd_functions.py", line 303, in forward
    CA, CAt, SCA, SCAt, coo_tensorA = F.double_quant(A.to(torch.float16), threshold=state.threshold)
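Since the message itself warns that the stack trace may point at the wrong call, one way to narrow this down is to make CUDA launches synchronous before anything touches the GPU. A minimal sketch, assuming the variable is set before CUDA is initialized (safest is before `import torch`); the helper name `enable_sync_cuda_errors` is made up for illustration:

```python
import os


def enable_sync_cuda_errors(env=os.environ):
    """Set CUDA_LAUNCH_BLOCKING=1 so kernel errors are reported at the failing call.

    This slows training down, so it is only meant for debugging runs.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Call this at the very top of the script, before CUDA is initialized.
enable_sync_cuda_errors()
```

With this in place the traceback should point at the kernel that actually faulted, which may or may not still be `double_quant`.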
I tried testing this on CPU only by moving the model and inputs to the CPU.
But I get a very weird error in the forward pass saying that the model and inputs are not on the same device. When I checked, the inputs in the forward pass were on CUDA, even though they were on the CPU before being fed to the model. The model was on the CPU.
Any ideas on how to continue debugging this from here?
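To pin down where the devices diverge, it may help to print them right before the forward call. A small sketch; `device_summary` is a hypothetical helper, not part of transformers, and it only relies on `model.parameters()` and the `.device` attribute that PyTorch modules and tensors expose. One possible explanation (an assumption, not verified for this setup) is that the Trainer or its dataloader moves things to `args.device` on its own.

```python
from collections import Counter


def device_summary(model, batch):
    """Return ({device: parameter count}, {batch key: device}) as plain strings."""
    param_devices = Counter(str(p.device) for p in model.parameters())
    # Skip batch entries that are not tensors (they have no .device attribute).
    batch_devices = {
        key: str(value.device)
        for key, value in batch.items()
        if hasattr(value, "device")
    }
    return dict(param_devices), batch_devices


# Intended usage with the objects from the post above:
#   print(device_summary(trainer.model, batch))
```

If the parameter summary shows more than one device, or the batch devices differ from it, that is the mismatch the forward pass is complaining about.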
It seems that bitsandbytes does not work well on Windows, even though I downloaded "bitsandbytes-windows". According to this: https://huggingface.co/blog/hf-bitsandbytes-integration
"8-bit tensor cores are not supported on the CPU. bitsandbytes can be run on 8-bit tensor core-supported hardware, which are Turing and Ampere GPUs (RTX 20s, RTX 30s, A40-A100, T4+). For example, Google Colab GPUs are usually NVIDIA T4 GPUs, and their latest generation of GPUs does support 8-bit tensor cores."
My GPUs are Quadro RTX 8000s, and it looks like they do not support 8-bit tensor cores, so I loaded the model in fp16 and started training.
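Rather than going by the product name, the compute capability of each card can be queried directly. A rough sketch: Turing corresponds to compute capability 7.5, and treating 7.5 as the cutoff is my reading of the "Turing and Ampere" list quoted above, not a threshold taken from the bitsandbytes documentation. `supports_int8_tensor_cores` and `report_gpus` are hypothetical names; `report_gpus` needs a CUDA build of PyTorch.

```python
def supports_int8_tensor_cores(capability):
    """Return True for compute capability >= 7.5 (assumed cutoff: Turing and newer)."""
    major, minor = capability
    return (major, minor) >= (7, 5)


def report_gpus():
    """Print name, compute capability and the check above for every visible GPU."""
    import torch  # imported here so the pure check above works without PyTorch

    for index in range(torch.cuda.device_count()):
        capability = torch.cuda.get_device_capability(index)
        name = torch.cuda.get_device_name(index)
        print(name, capability, supports_int8_tensor_cores(capability))
```

If the cards report 7.5 or higher, the hardware itself is unlikely to be the reason the 8-bit path fails, and the Windows build of bitsandbytes becomes the more likely suspect.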
Hi Rachel, did you find any workaround for the above issue?
I am fine-tuning with LoRA but without bitsandbytes, so do not load your model in 8-bit.
I am trying to do that on my Windows laptop with an RTX 4060 GPU, but when I load the model, my laptop hangs.
Below is the command:
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, torch_dtype='auto')
I have 12 GB of GPU RAM, 32 GB of CPU RAM, and more than 600 GB of hard disk storage.
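A quick back-of-the-envelope check of whether the weights can fit at all may explain the hang. This sketch assumes the model is StarCoder at roughly 15.5B parameters (the post does not say which checkpoint is being loaded) and counts weights only, ignoring activations, gradients and optimizer state. With `torch_dtype='auto'` the dtype comes from the checkpoint, so which row applies depends on how the checkpoint was saved.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


STARCODER_PARAMS = 15.5e9  # assumption: about 15.5B parameters

for dtype in ("float32", "float16", "int8"):
    print(f"{dtype}: {weight_memory_gib(STARCODER_PARAMS, dtype):.1f} GiB")
```

Under these assumptions even the fp16 weights come close to the 32 GB of CPU RAM and are well beyond 12 GB of GPU RAM, so the machine may simply be swapping heavily rather than truly hanging.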