zR committed

Commit • 2e5cbb2
Parent(s): 4c3068a

readme

- README.md +25 -8
- README_zh.md +10 -13
README.md CHANGED

@@ -1,12 +1,28 @@
+---
+license: other
+language:
+- en
+base_model:
+- meta-llama/Meta-Llama-3.1-8B-Instruct
+pipeline_tag: video-text-to-text
+inference: false
+---
+
+[中文阅读](README_zh.md)
+
 # CogVLM2-Llama3-Caption
 
 <div align="center">
 <img src=https://raw.githubusercontent.com/THUDM/CogVLM2/cf9cb3c60a871e0c8e5bde7feaf642e3021153e6/resources/logo.svg>
 </div>
 
-
+# Introduction
+
+Typically, most video data does not come with corresponding descriptive text, so it is necessary to convert the video
+data into textual descriptions to provide the essential training data for text-to-video models.
+
+## Usage
 
-## 使用方式
 ```python
 import io
 import numpy as np
@@ -119,12 +135,14 @@ if __name__ == '__main__':
 
 ```
 
-##
+## License
 
-
-[
+This model is released under the
+CogVLM2 [LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LICENSE&status=0).
+For models built with Meta Llama 3, please also adhere to
+the [LLAMA3_LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LLAMA3_LICENSE&status=0).
 
-##
+## Citation
 
 🌟 If you find our work helpful, please leave us a star and cite our paper.
 
@@ -134,5 +152,4 @@ if __name__ == '__main__':
 author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
 journal={arXiv preprint arXiv:2408.06072},
 year={2024}
-}
-```
+}
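The README's full usage script is elided in the hunks above. As a rough orientation only, a minimal sketch of loading the captioning model with Hugging Face transformers might look like the following; the model ID, dtype, and device placement are illustrative assumptions, not the repository's exact code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model path; adjust to the actual repository ID (assumption).
MODEL_PATH = "THUDM/cogvlm2-llama3-caption"

# CogVLM2 ships custom modeling code, so trust_remote_code must be enabled.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,  # assumption: a GPU with bf16 support
    trust_remote_code=True,
).eval().to("cuda")

# Video frame extraction and prompt construction follow the repository's
# full usage example and are omitted here.
```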
README_zh.md CHANGED

@@ -1,16 +1,14 @@
+[Read This in English](README_en.md)
+
 # CogVLM2-Llama3-Caption
 
 <div align="center">
 <img src=https://raw.githubusercontent.com/THUDM/CogVLM2/cf9cb3c60a871e0c8e5bde7feaf642e3021153e6/resources/logo.svg>
 </div>
 
-
-
-Typically, most video data does not come with corresponding descriptive text, so it is necessary to convert the video
-data into textual descriptions to provide the essential training data for text-to-video models.
-
-## Usage
+通常情况下，大部分视频数据并没有附带相应的描述性文本，因此有必要将视频数据转换成文本描述，以提供文本到视频模型所需的必要训练数据。
 
+## 使用方式
 ```python
 import io
 import numpy as np
@@ -123,14 +121,12 @@ if __name__ == '__main__':
 
 ```
 
-##
+## 模型协议
 
-
-
-For models built with Meta Llama 3, please also adhere to
-the [LLAMA3_LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LLAMA3_LICENSE&status=0).
+此模型根据 CogVLM2 [LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LICENSE&status=0) 发布。对于使用 Meta Llama 3 构建的模型，还请遵守
+[LLAMA3_LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LLAMA3_LICENSE&status=0)。
 
-##
+## 引用
 
 🌟 If you find our work helpful, please leave us a star and cite our paper.
 
@@ -140,4 +136,5 @@ the [LLAMA3_LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-b
 author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
 journal={arXiv preprint arXiv:2408.06072},
 year={2024}
-}
+}
+```