Update README.md
Commit 648e171 (1 parent: 15edae3), committed by ziqingyang

README.md CHANGED
@@ -1,3 +1,13 @@
 ---
 license: apache-2.0
+language:
+- en
 ---
+
+**VLE** (**V**isual-**L**anguage **E**ncoder) is an image-text multimodal understanding model built on the pre-trained text and image encoders.
+It can be used for multimodal discriminative tasks such as visual question answering and image-text retrieval.
+Especially on the visual commonsense reasoning (VCR) task, which requires high-level language understanding and reasoning skills, VLE achieves significant improvements.
+
+For more details see [https://github.com/iflytek/VLE](https://github.com/iflytek/VLE).
+
+Online VLE demo on Visual Question Answering: [https://huggingface.co/spaces/hfl/VQA_VLE_LLM](https://huggingface.co/spaces/hfl/VQA_VLE_LLM)
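
The metadata part of this change (the new `language` entry in the YAML front matter) can be sanity-checked with a short script. The following is a minimal sketch, not part of the commit: `parse_front_matter` is a hypothetical helper that handles only the tiny YAML subset used here (scalar `key: value` pairs and flat `- item` lists), and the `card` string is a stand-in for the updated README.md. A real project would more likely use a full YAML parser.

```python
def parse_front_matter(text):
    """Split a model card into (metadata dict, body).

    Hypothetical helper: supports only `key: value` scalars and flat
    `- item` lists between the opening and closing `---` lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter at all

    # Locate the closing `---` delimiter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text  # unterminated front matter, treat as plain body

    meta = {}
    current_key = None
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            # A list item belongs to the most recently seen key.
            if not isinstance(meta[current_key], list):
                meta[current_key] = []
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            meta[current_key] = value.strip()  # stays "" until list items arrive

    body = "\n".join(lines[end + 1:])
    return meta, body


# Stand-in for the README.md after this commit (body shortened).
card = """---
license: apache-2.0
language:
- en
---

Model description goes here.
"""

meta, body = parse_front_matter(card)
print(meta)
```

Under these assumptions, `meta` holds the license as a string and the language as a one-element list, which matches the front matter shown in the diff above.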