tvl-mini
Description
This is a fine-tune of Qwen2-VL-2B for the Russian language.
tvl was trained in bf16.
Data
The training dataset contains:
- GrandMaster-PRO-MAX dataset (60k samples)
- A subset of GQA, translated, humanized, and merged by image (TODO)
Benchmarks
TODO
Quickstart
You can simply run this notebook or run the code below.
First, install qwen-vl-utils and a dev version of transformers:
pip install qwen-vl-utils
pip install --no-cache-dir git+https://github.com/huggingface/transformers@19e6e80e10118f855137b90740936c0b11ac397f
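After installing, it can help to confirm which versions ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of either package.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages installed by the pip commands above.
for name in ("transformers", "qwen-vl-utils"):
    print(name, installed_version(name) or "not installed")
```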
And then run:
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the model in bf16 and let accelerate place it on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "2Vasabi/tvl-mini-0.1", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("2Vasabi/tvl-mini-0.1")

# A single user turn: one image plus a Russian prompt
# ("Briefly describe what you see in the image")
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://i.ibb.co/d0QL8s6/images.jpg",
            },
            {"type": "text", "text": "Кратко опиши что ты видишь на изображении"},
        ],
    }
]

# Render the chat template to text and collect the image/video inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move inputs to the model's device instead of hard-coding "cuda"
inputs = inputs.to(model.device)

# Generate, then drop the prompt tokens so only the answer is decoded
generated_ids = model.generate(**inputs, max_new_tokens=1000)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
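The trimming step in the snippet above is needed because `generate` returns the prompt tokens followed by the newly generated ones, so decoding the raw output would repeat the prompt. Below is a minimal sketch of that slicing on plain Python lists. The helper name `trim_prompt` and the token ids are made up for illustration and are not real tokenizer output.

```python
def trim_prompt(input_ids, generated_ids):
    """Drop the leading prompt tokens from each generated sequence in a batch."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Toy batch: each generated sequence starts with its (equal-length) prompt.
prompt_batch = [[101, 7, 8], [101, 9, 10]]
generated_batch = [[101, 7, 8, 42, 43], [101, 9, 10, 44]]

print(trim_prompt(prompt_batch, generated_batch))  # [[42, 43], [44]]
```

The same slicing works on the tensors in the quickstart, because slicing a tensor row by `len(in_ids)` removes the same leading positions.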
Model tree for 2Vasabi/tvl-mini-0.1
Base model
Qwen/Qwen2-VL-2B-Instruct