Issues with Fine-Tuning
I'm trying to fine-tune this model as well as the base one, but neither is learning. Both generate nonsense text during and after training. I suspected it was related to the chat template configured in the tokenizer, but even after fixing that, the result is exactly the same.
Has anyone else experienced similar problems with this model?
Can you provide more details on how you load the model and your LoRA settings? I am also trying to fine-tune but failed with the tensor size issues mentioned in this link.
Solved by `pip install git+https://github.com/huggingface/transformers.git`. It seems some of the required changes are only in the dev version of transformers, not in the latest release.
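In case it helps, a quick way to confirm that the source build is the one your environment actually imports (this is only a generic check, not something from the thread):

```python
# Confirm the source install of transformers is the one being imported;
# a source build reports a version string ending in ".dev0".
import transformers

print(transformers.__version__)
print(transformers.__file__)  # path should point at the freshly installed package
```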
I managed to start fine-tuning after installing transformers from source. However, the model is not learning at all. It seems related to the tokenizer configuration, but despite trying various settings, the resulting model only generates nonsensical output. The issue persists regardless of the LoRA or tokenizer configuration used. Here are the current LoRA and tokenizer parameters (a code sketch of how they map onto peft/transformers follows the list):
LoRA:
- bits: 4
- lora_r: 256
- lora_alpha: 128
- lora_dropout: 0.05
- bias: "none"
- target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
- task_type: "CAUSAL_LM"
Tokenizer:
- padding: True
- padding_side: 'right'
- add_bos_token: False
- add_eos_token: True
- trust_remote_code: True
- use_auth_token: True
- eos_token: </s>
- pad_token: </s>
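For context, a minimal sketch of how these settings might map onto `transformers` + `peft`. The model id is a placeholder, and whether `add_bos_token` / `add_eos_token` are honored as init kwargs depends on the tokenizer class, so treat this as an illustration of the configuration above, not an exact training script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "your-org/your-model"  # placeholder for the model being fine-tuned

# bits: 4 -> 4-bit NF4 quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Tokenizer settings from the list above; padding=True is applied at call time.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    padding_side="right",
    add_bos_token=False,   # may be ignored by tokenizers that don't expose this kwarg
    add_eos_token=True,
    trust_remote_code=True,
    token=True,            # replaces the deprecated use_auth_token=True
)
tokenizer.pad_token = tokenizer.eos_token  # pad_token: </s> (same as eos)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    token=True,
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=256,
    lora_alpha=128,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# padding: True is a call-time argument, e.g.:
batch = tokenizer(["example text"], padding=True, return_tensors="pt")
```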
Why are you NOT training the lm_head?
Do you mean base_layer in LoRA?
Yes bro...

target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"]

With this you will be able to train the model fully, and it will make a big difference!
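Note that the module is named `lm_head` (with an underscore) in the transformers model. A sketch of how one might confirm the exact name and extend the config, assuming `model` is the base model loaded as in the earlier snippet:

```python
# Confirm the exact module name before building the LoraConfig.
from peft import LoraConfig, get_peft_model

print([name for name, _ in model.named_modules() if "lm_head" in name])  # e.g. ['lm_head']

lora_config = LoraConfig(
    r=256,
    lora_alpha=128,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "lm_head"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # trainable parameter count should grow noticeably
```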
It seems that this was the problem! I'll report back soon.
> With this you will be able to train the model fully!

Except for the embedding layer.
> target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"]
Thanks for the tip!
Quick question... the Unsloth Colab examples also include "embed_tokens" for full training. Is that also important for NeMo CPT, or should we stick to only the 8 modules you suggested?
Source: https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing#scrollTo=6bZsfBuZDeCL
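For reference, a hedged sketch of what adding `embed_tokens` would look like with plain peft — peft can apply LoRA to both `nn.Embedding` (`embed_tokens`) and `nn.Linear` (`lm_head`) layers; whether it actually helps for NeMo CPT is exactly the open question here:

```python
# Sketch only: mirrors the Unsloth CPT notebook's choice of also adapting the
# input embeddings alongside the usual projection layers and the lm_head.
from peft import LoraConfig

cpt_lora_config = LoraConfig(
    r=256,
    lora_alpha=128,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "lm_head", "embed_tokens",  # the two extra modules trained in the Unsloth example
    ],
)
```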