🍭 Fine-tuning support for Qwen2-VL-7B-Instruct
The open-source release of Qwen2-VL is truly exciting 😊. We have added support for VQA, OCR, grounding, and video fine-tuning for Qwen2-VL.
English fine-tuning document:
https://swift.readthedocs.io/en/latest/Multi-Modal/qwen2-vl-best-practice.html
Nice! Thank you!
Thank you, @study-hjt!
Is there a way to lock the encoder and fine-tune only the LM decoder, so the model still handles typical multi-turn conversations with images/video as part of the conversation? See the sketch below.
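A minimal sketch of one way to do this with plain PyTorch, assuming the Hugging Face implementation exposes the vision tower as model.visual (the attribute name is an assumption; check it on your transformers version). The idea is to freeze the vision encoder's parameters and then run a normal SFT loop so only the LM decoder receives gradient updates:

import torch
from transformers import Qwen2VLForConditionalGeneration

# Load the model (float16 shown here; see the V100 note later in this thread).
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen2-VL-7B-Instruct", torch_dtype=torch.float16, device_map="auto"
)

# Freeze the vision encoder so gradients only flow through the LM decoder.
for param in model.visual.parameters():
    param.requires_grad = False

# Sanity check: report how many parameters remain trainable.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable:,} / {total:,}")

Whether ms-swift exposes a dedicated option for freezing the vision encoder is not stated here; check the linked best-practice doc.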
Error occurred on V100:
RuntimeError: CUDA error: too many resources requested for launch
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Try removing torch_dtype="auto"!
Using the following code works; not sure why torch_dtype="auto" failed.

import torch
from transformers import Qwen2VLForConditionalGeneration

# Load with an explicit float16 dtype instead of torch_dtype="auto".
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen2-VL-7B-Instruct", torch_dtype=torch.float16, device_map="auto"
)
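A likely explanation (not confirmed in the thread): torch_dtype="auto" picks up bfloat16 from the Qwen2-VL config, and V100 GPUs have no native bfloat16 support, so forcing float16 avoids the failing kernels.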