Add pipeline tag, link to paper #2
by nielsr (HF staff) - opened

README.md CHANGED
```diff
@@ -4,6 +4,7 @@ datasets:
 - shenxq/VideoChat2
 base_model:
 - Vision-CAIR/LongVU_Qwen2_7B_img
+pipeline_tag: video-text-to-text
 model-index:
 - name: llava-onevision-qwen-7b-ov
   results:
@@ -50,6 +51,8 @@ model-index:
 ---
 # LongVU
 
+This repository contains the model based on Qwen2-7B as presented in [LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding](https://huggingface.co/papers/2410.17434).
+
 Play with the model on the [HF demo](https://huggingface.co/spaces/Vision-CAIR/LongVU).
 
 <div align="left">
```
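For reference, once this change is merged the `pipeline_tag` becomes part of the repo's card metadata and is queryable through the Hub API. A minimal sketch using `huggingface_hub` (the tag-filter call and `pipeline_tag` field are assumed from the current Hub API; the exact repos returned will vary):

```python
from huggingface_hub import HfApi

api = HfApi()

# Models carrying the "video-text-to-text" pipeline tag become discoverable
# via the tag filter; this repo would appear here once the PR is merged.
for model in api.list_models(filter="video-text-to-text", limit=5):
    print(model.id, model.pipeline_tag)
```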