ermu2001/pllava-13b

ermu2001 committed on Apr 29

Commit eb3aa65 • 1 Parent(s): dbdf958

Create README.md

Files changed (1)

README.md +39 -0

README.md ADDED

	@@ -0,0 +1,39 @@

+---
+license: apache-2.0
+tags:
+- video LLM
+datasets:
+- OpenGVLab/VideoChat2-IT
+---
+# PLLaVA Model Card
+## Model details
+**Model type:**
+PLLaVA-13B is an open-source video-language chatbot trained by fine-tuning an image LLM on video instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: llava-hf/llava-v1.6-vicuna-13b-hf
+**Model date:**
+PLLaVA-13B was trained in April 2024.
+**Paper or resources for more information:**
+- GitHub repo: https://github.com/magic-research/PLLaVA
+- project page: https://pllava.github.io/
+- paper link: https://arxiv.org/abs/2404.16994
+## License
+PLLaVA-13B is subject to the license of its base model, llava-hf/llava-v1.6-vicuna-13b-hf.
+**Where to send questions or comments about the model:**
+https://github.com/magic-research/PLLaVA/issues
+## Intended use
+**Primary intended uses:**
+The primary use of PLLaVA is research on large multimodal models and chatbots.
+**Primary intended users:**
+The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
+## Training dataset
+The video instruction-tuning data from OpenGVLab/VideoChat2-IT.
+## Evaluation dataset
+A collection of 6 benchmarks, including 5 VQA benchmarks and 1 recent benchmark specifically proposed for Video-LMMs.