Is it possible to merge MiniCPM-Llama3-V-2.5 with a Llama-3.1-based model using MoE?
Is it possible to merge MiniCPM-Llama3-V-2.5 with a Llama-3.1-based model using MoE? I have a fine-tuned MiniCPM-Llama3-V-2.5-based model, and I would like to merge it with one of our domain fine-tuned Llama-3.1-8B text generation models, but mergekit throws an error.
This is becoming a show-stopper for putting our vision fine-tuned model to use. Any guidance would help.
Alternatively, please confirm whether we can use merge_and_unload to merge the MiniCPM-V-2.5 adapters with a Llama-3.1-based model.
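To be concrete, the flow I am asking about is the standard PEFT merge, roughly as below (the model and adapter paths are placeholders for our fine-tuned checkpoints):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder: our domain fine-tuned Llama-3.1-8B text model
base = AutoModelForCausalLM.from_pretrained(
    "our-org/llama-3.1-8b-domain",
    torch_dtype="auto",
)

# Placeholder: the adapter exported from our MiniCPM-V-2.5 fine-tune
model = PeftModel.from_pretrained(base, "our-org/minicpm-v2.5-adapter")
model = model.merge_and_unload()   # folds the LoRA deltas into the base weights
model.save_pretrained("merged-llama-3.1-domain-vision")
```

Whether this can work at all when the adapter was trained on MiniCPM-V-2.5's backbone rather than on plain Llama-3.1 is exactly what I would like to confirm.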
I haven't tried it. Most of the errors with this strategy come from the fact that the Llama-3.1 and Llama-3 architectures are not identical; based on my reading of the Llama-3.1 paper, they made very minor changes to the architecture, and this may be the reason.
Thanks @Cuiunbo for your reply. Could you please try merging a MiniCPM-Llama3-V-2.5-based model/adapter with a Llama-3.1 / 3.0 text generation model and guide me if it is possible? Thanks in advance for your assistance.
I had issues merging the adapter of a MiniCPM-Llama3-V-2.5 model with a Llama-3-based text model. If you can help make this work, it would remove a major hurdle and let us move forward.
Try loading the weights and replacing llm.model.layers.x with yours.
I will try it later, maybe in the next two weeks.
@Cuiunbo
Additional info: when I try to merge the MiniCPM-V-2.5 adapter with a Llama-3 base model, I see the error below
"ValueError: Target modules llm..*layers.\d+.self_attn.(q_proj|k_proj|v_proj|o_proj) not found in the base model. Please check the target modules and try again."
with the statement
model = PeftModel.from_pretrained(model, new_model_name)
Here, model is a Llama-3 base model and new_model_name points to an adapter from MiniCPM-V-2.5.
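My guess is that the module names do not line up: in MiniCPM-Llama3-V-2.5 the Llama backbone sits under an llm attribute, so the adapter targets llm.model.layers..., while a plain Llama-3 model exposes model.layers..., and PEFT therefore finds no match. If renaming the adapter's module references is a reasonable direction, an untested sketch would be:

```python
# Untested sketch: strip the "llm." prefix from the adapter so its module
# names match a plain Llama checkpoint. The folder name is a placeholder,
# and older PEFT versions store weights in adapter_model.bin instead of
# adapter_model.safetensors.
import json
from safetensors.torch import load_file, save_file

adapter_dir = "minicpm-v2.5-adapter"

# 1) Drop the "llm." prefix from the target_modules pattern.
with open(f"{adapter_dir}/adapter_config.json") as f:
    cfg = json.load(f)
cfg["target_modules"] = r".*layers.\d+.self_attn.(q_proj|k_proj|v_proj|o_proj)"
with open(f"{adapter_dir}/adapter_config.json", "w") as f:
    json.dump(cfg, f, indent=2)

# 2) Rename the LoRA weight keys accordingly.
weights = load_file(f"{adapter_dir}/adapter_model.safetensors")
weights = {k.replace("llm.model.layers", "model.layers"): v for k, v in weights.items()}
save_file(weights, f"{adapter_dir}/adapter_model.safetensors")
```

Even if the adapter loads this way, its LoRA deltas were trained against MiniCPM-V-2.5's backbone, so I am not sure the result would be meaningful on a different base.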
@Cuiunbo I tried what you suggested: I loaded the weights from our domain fine-tuned text Llama-3 model into the MiniCPM-V-2.5-Llama model folder and edited the model.safetensors.index.json file there so that only the "llm.*" entries point to our text model weights, leaving the other entries untouched.
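Roughly, the edit to the index was along these lines (the shard filenames are illustrative, not the real ones):

```python
# Illustrative only: repoint every "llm.*" entry of the weight map at the
# shards copied over from our text model, leaving vision/resampler entries alone.
import json

index_path = "MiniCPM-Llama3-V-2_5/model.safetensors.index.json"
with open(index_path) as f:
    index = json.load(f)

for name, shard in index["weight_map"].items():
    if name.startswith("llm."):
        index["weight_map"][name] = shard.replace("model-", "llama3-domain-")

with open(index_path, "w") as f:
    json.dump(index, f, indent=2)
```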
When I try to load the model for inference, I get the error below.
ERROR Stack
File "C:\ProgramData\anaconda3\envs\llava\lib\site-packages\accelerate\utils\modeling.py", line 354, in set_module_tensor_to_device
    raise ValueError(f"{tensor_name} is on the meta device, we need a value to put in on {device}.")
ValueError: weight is on the meta device, we need a value to put in on 0.
Can you please guide me?
You may try first loading both models' state dicts fully onto the GPU (your model and MiniCPM-V-2.5), then replacing every LLM layer.
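Something like the following untested sketch (the text-model name is a placeholder):

```python
# Materialize both models on the GPU so no parameter stays on the meta device,
# then copy the Llama-3 decoder layers from the text model into the llm
# backbone of MiniCPM-Llama3-V-2.5.
import torch
from transformers import AutoModel, AutoModelForCausalLM

vlm = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5",
    trust_remote_code=True,
    torch_dtype=torch.float16,
).cuda()

text_model = AutoModelForCausalLM.from_pretrained(
    "our-org/llama-3-8b-domain",   # placeholder: your domain text model
    torch_dtype=torch.float16,
).cuda()

# Replace every decoder layer of the VLM's Llama backbone with yours.
vlm.llm.model.layers.load_state_dict(text_model.model.layers.state_dict())

# Embeddings / lm_head can be copied the same way, but check shapes first:
# the VLM tokenizer may include extra special tokens, so vocab sizes can differ.

vlm.save_pretrained("minicpm-v2.5-with-domain-llm")
```

Loading each checkpoint whole like this should avoid the meta-device error, since every parameter gets a real value before anything is moved.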