zR committed on
Commit 2e5cbb2
1 Parent(s): 4c3068a
Files changed (2)
  1. README.md +25 -8
  2. README_zh.md +10 -13
README.md CHANGED
@@ -1,12 +1,28 @@
# CogVLM2-Llama3-Caption

<div align="center">
<img src=https://raw.githubusercontent.com/THUDM/CogVLM2/cf9cb3c60a871e0c8e5bde7feaf642e3021153e6/resources/logo.svg>
</div>

- 通常情况下,大部分视频数据并没有附带相应的描述性文本,因此有必要将视频数据转换成文本描述,以提供文本到视频模型所需的必要训练数据。

- ## 使用方式
```python
import io
import numpy as np
@@ -119,12 +135,14 @@ if __name__ == '__main__':
```

- ## 模型协议

- 此模型根据 CogVLM2 [LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LICENSE&status=0) 发布。对于使用 Meta Llama 3 构建的模型,还请遵守
- [LLAMA3_LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LLAMA3_LICENSE&status=0)

- ## 引用

🌟 If you find our work helpful, please leave us a star and cite our paper.
@@ -134,5 +152,4 @@ if __name__ == '__main__':
author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
journal={arXiv preprint arXiv:2408.06072},
year={2024}
- }
- ```
 
+ ---
+ license: other
+ language:
+ - en
+ base_model:
+ - meta-llama/Meta-Llama-3.1-8B-Instruct
+ pipeline_tag: video-text-to-text
+ inference: false
+ ---
+
+ [中文阅读](README_zh.md)
+
# CogVLM2-Llama3-Caption

<div align="center">
<img src=https://raw.githubusercontent.com/THUDM/CogVLM2/cf9cb3c60a871e0c8e5bde7feaf642e3021153e6/resources/logo.svg>
</div>

+ # Introduction
+
+ Typically, most video data does not come with corresponding descriptive text, so it is necessary to convert the video
+ data into textual descriptions to provide the essential training data for text-to-video models.
+
+ ## Usage

```python
import io
import numpy as np

  ```
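For orientation only: the complete runnable script is the usage block above, which this diff view truncates. Below is a minimal sketch of the loading step, assuming the standard `transformers` remote-code path; `MODEL_PATH` is a placeholder, and the frame sampling and prompt construction are handled by the model's own code as in the full script.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder; point this at the downloaded checkpoint or the hub/ModelScope ID.
MODEL_PATH = "THUDM/cogvlm2-llama3-caption"

# The captioning model ships its own modeling code, so trust_remote_code is required.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval().to("cuda")

# The full script above additionally samples frames from the input video,
# builds the multimodal prompt with the model's remote-code helpers, and
# calls model.generate() to produce the caption text.
```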

+ ## License

+ This model is released under the
+ CogVLM2 [LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LICENSE&status=0).
+ For models built with Meta Llama 3, please also adhere to
+ the [LLAMA3_LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LLAMA3_LICENSE&status=0).

+ ## Citation

🌟 If you find our work helpful, please leave us a star and cite our paper.

author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
journal={arXiv preprint arXiv:2408.06072},
year={2024}
+ }
 
README_zh.md CHANGED
@@ -1,16 +1,14 @@
# CogVLM2-Llama3-Caption

<div align="center">
<img src=https://raw.githubusercontent.com/THUDM/CogVLM2/cf9cb3c60a871e0c8e5bde7feaf642e3021153e6/resources/logo.svg>
</div>

- # Introduction
-
- Typically, most video data does not come with corresponding descriptive text, so it is necessary to convert the video
- data into textual descriptions to provide the essential training data for text-to-video models.
-
- ## Usage

```python
import io
import numpy as np
@@ -123,14 +121,12 @@ if __name__ == '__main__':
```

- ## License

- This model is released under the
- CogVLM2 [LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LICENSE&status=0).
- For models built with Meta Llama 3, please also adhere to
- the [LLAMA3_LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LLAMA3_LICENSE&status=0).

- ## Citation

🌟 If you find our work helpful, please leave us a star and cite our paper.
@@ -140,4 +136,5 @@ the [LLAMA3_LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-b
author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
journal={arXiv preprint arXiv:2408.06072},
year={2024}
- }
 
+ [Read This in English](README_en.md)
+
# CogVLM2-Llama3-Caption

<div align="center">
<img src=https://raw.githubusercontent.com/THUDM/CogVLM2/cf9cb3c60a871e0c8e5bde7feaf642e3021153e6/resources/logo.svg>
</div>

+ 通常情况下,大部分视频数据并没有附带相应的描述性文本,因此有必要将视频数据转换成文本描述,以提供文本到视频模型所需的必要训练数据。

+ ## 使用方式
```python
import io
import numpy as np

```

+ ## 模型协议

+ 此模型根据 CogVLM2 [LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LICENSE&status=0) 发布。对于使用 Meta Llama 3 构建的模型,还请遵守
+ [LLAMA3_LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LLAMA3_LICENSE&status=0)

+ ## 引用

🌟 If you find our work helpful, please leave us a star and cite our paper.

author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
journal={arXiv preprint arXiv:2408.06072},
year={2024}
+ }
+ ```