
tvl-mini

Description

This is a finetune of Qwen2-VL-2B for the Russian language.

tvl-mini was trained in bf16.

Data

Train dataset contains:

  • GrandMaster-PRO-MAX dataset (60k samples)
  • A subset of GQA, translated, humanized, and merged by image (TODO)

Benchmarks

TODO

Quickstart

You can simply run this notebook or use the code below.

First, install qwen-vl-utils and a dev version of transformers:

pip install qwen-vl-utils
pip install --no-cache-dir git+https://github.com/huggingface/transformers@19e6e80e10118f855137b90740936c0b11ac397f

And then run:

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "2Vasabi/tvl-mini-0.1", torch_dtype=torch.bfloat16, device_map="auto"
)


processor = AutoProcessor.from_pretrained("2Vasabi/tvl-mini-0.1")
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://i.ibb.co/d0QL8s6/images.jpg",
            },
            {"type": "text", "text": "Кратко опиши что ты видишь на изображении"},
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)

inputs = inputs.to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=1000)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
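The trimming step above (`generated_ids_trimmed`) strips the echoed prompt from each generated sequence: `model.generate` returns the prompt tokens followed by the new tokens, so dropping the first `len(in_ids)` tokens leaves only the model's answer. In isolation it is plain list slicing, shown here with toy token IDs (the values are illustrative, not real vocabulary IDs):

```python
# Toy illustration of the prompt-stripping step.
# generate() output = prompt tokens + newly generated tokens,
# so slicing off len(prompt) keeps only the generated part.
input_ids = [[101, 102, 103]]                 # one prompt in the batch
generated = [[101, 102, 103, 7, 8, 9]]        # prompt echoed, then 3 new tokens
trimmed = [out[len(inp):] for inp, out in zip(input_ids, generated)]
print(trimmed)  # [[7, 8, 9]]
```

The same comprehension works for any batch size, since `zip` pairs each prompt with its own output row.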