---
license: apache-2.0
datasets:
- Reself/AuroraCap-trainset
base_model:
- lmsys/vicuna-7b-v1.5-16k
tags:
- caption
model-index:
- name: AuroraCap-7B
  results:
  - task:
      type: video detailed caption
    dataset:
      type: VDC
      name: VDC
    metrics:
    - type: Acc
      value: 38.21
      name: VDCScore
    - type: Acc
      value: 48.33
      name: VDD
    - type: cider
      value: 9.51
    - type: bleu
      value: 30.90
      name: bleu@1
    - type: bleu
      value: 4.06
      name: bleu@4
    - type: meteor
      value: 19.09
    - type: rouge
      value: 21.58
      name: rouge-l
  - task:
      type: video caption
    dataset:
      type: MSR-VTT
      name: MSR-VTT
    metrics:
    - type: cider
      value: 33.1
    - type: bleu
      value: 58.6
      name: bleu@1
    - type: bleu
      value: 21.0
      name: bleu@4
    - type: meteor
      value: 23.9
    - type: rouge
      value: 49.5
      name: rouge-l
  - task:
      type: video caption
    dataset:
      type: VATEX
      name: VATEX
    metrics:
    - type: cider
      value: 33.8
    - type: bleu
      value: 57.1
      name: bleu@1
    - type: bleu
      value: 18.4
      name: bleu@4
    - type: meteor
      value: 19.0
    - type: rouge
      value: 40.8
      name: rouge-l
  - task:
      type: video question answering
    dataset:
      type: ActivityNet
      name: ActivityNet
    metrics:
    - type: Acc
      value: 61.8
  - task:
      type: video question answering
    dataset:
      type: MSVD
      name: MSVD
    metrics:
    - type: Acc
      value: 62.6
  - task:
      type: video question answering
    dataset:
      type: MSR-VTT
      name: MSR-VTT
    metrics:
    - type: Acc
      value: 43.5
  - task:
      type: video question answering
    dataset:
      type: iVQA
      name: iVQA
    metrics:
    - type: Acc
      value: 55.2
---
<img src="assets/teaser.png" align="center">
## Features
<img src="assets/vdc_baseline.png" align="center">
AuroraCap is a multimodal large language model for image and video captioning.
## Resources
- [Website](https://rese1f.github.io/aurora-web/)
- [arXiv: Paper]()
- [GitHub: Code](https://github.com/rese1f/aurora)
- [Hugging Face: AuroraCap Model](https://huggingface.co/collections/Reself/auroracap-66d117ffe13bedda96702013)
- [Hugging Face: VDC Benchmark](https://huggingface.co/collections/Reself/auroracap-66d117ffe13bedda96702013)