hfl/vle-base


	---
	license: apache-2.0
	language:
	- en
	---

	VLE (Visual-Language Encoder) is an image-text multimodal understanding model built on pre-trained text and image encoders.
	It can be used for multimodal discriminative tasks such as visual question answering and image-text retrieval.
	VLE achieves especially significant improvements on the visual commonsense reasoning (VCR) task, which requires high-level language understanding and reasoning skills.
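
As a rough illustration of the image-text retrieval use mentioned above, the sketch below ranks candidate captions for one image by a pairwise matching score. It does not use VLE: `rank_candidates` and `toy_score` are hypothetical names, the image is stood in for by a string of tags so the example is self-contained, and the word-overlap score is only a placeholder for whatever matching score a real model would produce.

```python
from typing import Callable, List, Tuple


def rank_candidates(
    image: str,
    captions: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every (image, caption) pair and return captions sorted best-first."""
    scored = [(caption, score_fn(image, caption)) for caption in captions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(image_tags: str, caption: str) -> float:
    """Hypothetical stand-in for a model's image-text matching score.

    Returns the fraction of caption words that also appear in the image tags.
    """
    tags = set(image_tags.lower().split())
    words = caption.lower().split()
    if not words:
        return 0.0
    return sum(word in tags for word in words) / len(words)


# The "image" is represented by tags purely for illustration.
ranking = rank_candidates(
    "dog grass ball",
    ["a dog chasing a ball", "a cat on a sofa", "dog on grass"],
    toy_score,
)

for caption, score in ranking:
    print(f"{score:.2f}  {caption}")
```

With a real model, `score_fn` would be replaced by a call that returns the model's matching score for an actual image and a caption; the ranking logic stays the same.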

	For more details, see [https://github.com/iflytek/VLE](https://github.com/iflytek/VLE).

	Online VLE demo on Visual Question Answering: [https://huggingface.co/spaces/hfl/VQA_VLE_LLM](https://huggingface.co/spaces/hfl/VQA_VLE_LLM)