Is it possible to merge MiniCPM-Llama3-V-2.5 with a Llama-3.1-based model using MoE?
Is it possible to merge MiniCPM-Llama3-V-2.5 with a Llama-3.1-based model using MoE? I have a fine-tuned MiniCPM-Llama3-V-2.5-based model, and I would like to merge it with one of our domain fine-tuned Llama-3.1-8B text generation models, but mergekit throws an error.
This is becoming a show-stopper for putting our vision fine-tuned model to use. Any guidance would help.
Alternatively, please confirm whether we can use merge_and_unload to merge the MiniCPM-V-2.5 adapters with a Llama-3.1-based model.
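To be concrete, the flow I am asking about is the standard PEFT merge, roughly as below (the model and adapter paths are placeholders for our fine-tuned checkpoints):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder: our domain fine-tuned Llama-3.1-8B text model
base = AutoModelForCausalLM.from_pretrained(
    "our-org/llama-3.1-8b-domain",
    torch_dtype="auto",
)

# Placeholder: the adapter exported from our MiniCPM-V-2.5 fine-tune
model = PeftModel.from_pretrained(base, "our-org/minicpm-v2.5-adapter")
model = model.merge_and_unload()   # folds the LoRA deltas into the base weights
model.save_pretrained("merged-llama-3.1-domain-vision")
```

Whether this can work at all when the adapter was trained on MiniCPM-V-2.5's backbone rather than on plain Llama-3.1 is exactly what I would like to confirm.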
I haven't tried it. Most of the errors with this strategy come from the fact that the Llama-3.1 and Llama-3 architectures are not identical; based on my reading of the Llama-3.1 paper, they made very minor changes to the architecture, and this may be the reason.
Thanks @Cuiunbo for your reply. Could you please try merging a MiniCPM-Llama3-V-2.5-based model/adapter with a Llama-3.1 / 3.0 text generation model and guide me if it is possible? Thanks in advance for your assistance.
I had issues merging the adapter of a MiniCPM-Llama3-V-2.5 model with a Llama-3-based text model. If you can help make this work, it would remove a major hurdle and let us move forward.
Try loading the weights and replacing llm.model.layers.x with yours.
I will try it later, maybe in the next two weeks.
@Cuiunbo
Additional info: when I try to merge the MiniCPM-V-2.5 adapter with a Llama-3 base model, I see the error below
"ValueError: Target modules llm..*layers.\d+.self_attn.(q_proj|k_proj|v_proj|o_proj) not found in the base model. Please check the target modules and try again."
with the statement
model = PeftModel.from_pretrained(model, new_model_name)
Here, model is a Llama-3 base model and new_model_name points to an adapter from MiniCPM-V-2.5.
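My guess is that the module names do not line up: in MiniCPM-Llama3-V-2.5 the Llama backbone sits under an llm attribute, so the adapter targets llm.model.layers..., while a plain Llama-3 model exposes model.layers..., and PEFT therefore finds no match. If renaming the adapter's module references is a reasonable direction, an untested sketch would be:

```python
# Untested sketch: strip the "llm." prefix from the adapter so its module
# names match a plain Llama checkpoint. The folder name is a placeholder,
# and older PEFT versions store weights in adapter_model.bin instead of
# adapter_model.safetensors.
import json
from safetensors.torch import load_file, save_file

adapter_dir = "minicpm-v2.5-adapter"

# 1) Drop the "llm." prefix from the target_modules pattern.
with open(f"{adapter_dir}/adapter_config.json") as f:
    cfg = json.load(f)
cfg["target_modules"] = r".*layers.\d+.self_attn.(q_proj|k_proj|v_proj|o_proj)"
with open(f"{adapter_dir}/adapter_config.json", "w") as f:
    json.dump(cfg, f, indent=2)

# 2) Rename the LoRA weight keys accordingly.
weights = load_file(f"{adapter_dir}/adapter_model.safetensors")
weights = {k.replace("llm.model.layers", "model.layers"): v for k, v in weights.items()}
save_file(weights, f"{adapter_dir}/adapter_model.safetensors")
```

Even if the adapter loads this way, its LoRA deltas were trained against MiniCPM-V-2.5's backbone, so I am not sure the result would be meaningful on a different base.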
@Cuiunbo I tried what you suggested: I loaded the weights from our domain fine-tuned text Llama-3 model into the MiniCPM-V-2.5-Llama model folder and edited the model.safetensors.index.json file there so that only the "llm.*" entries point to our text model weights, leaving the other entries untouched.
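Roughly, the edit to the index was along these lines (the shard filenames are illustrative, not the real ones):

```python
# Illustrative only: repoint every "llm.*" entry of the weight map at the
# shards copied over from our text model, leaving vision/resampler entries alone.
import json

index_path = "MiniCPM-Llama3-V-2_5/model.safetensors.index.json"
with open(index_path) as f:
    index = json.load(f)

for name, shard in index["weight_map"].items():
    if name.startswith("llm."):
        index["weight_map"][name] = shard.replace("model-", "llama3-domain-")

with open(index_path, "w") as f:
    json.dump(index, f, indent=2)
```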
When I try to load the model for inference, I get the error below.
ERROR Stack
File "C:\ProgramData\anaconda3\envs\llava\lib\site-packages\accelerate\utils\modeling.py", line 354, in set_module_tensor_to_device
    raise ValueError(f"{tensor_name} is on the meta device, we need a value to put in on {device}.")
ValueError: weight is on the meta device, we need a value to put in on 0.
Can you please guide me?
You may try first loading both models' state dicts fully onto the GPU (your model and MiniCPM-V-2.5), then replacing every LLM layer.
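Something like the following untested sketch (the text-model name is a placeholder):

```python
# Materialize both models on the GPU so no parameter stays on the meta device,
# then copy the Llama-3 decoder layers from the text model into the llm
# backbone of MiniCPM-Llama3-V-2.5.
import torch
from transformers import AutoModel, AutoModelForCausalLM

vlm = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5",
    trust_remote_code=True,
    torch_dtype=torch.float16,
).cuda()

text_model = AutoModelForCausalLM.from_pretrained(
    "our-org/llama-3-8b-domain",   # placeholder: your domain text model
    torch_dtype=torch.float16,
).cuda()

# Replace every decoder layer of the VLM's Llama backbone with yours.
vlm.llm.model.layers.load_state_dict(text_model.model.layers.state_dict())

# Embeddings / lm_head can be copied the same way, but check shapes first:
# the VLM tokenizer may include extra special tokens, so vocab sizes can differ.

vlm.save_pretrained("minicpm-v2.5-with-domain-llm")
```

Loading each checkpoint whole like this should avoid the meta-device error, since every parameter gets a real value before anything is moved.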