	---
	datasets:
	- MMInstruction/VLFeedback
	---
	# Model Card for Silkie

	<!-- Provide a quick summary of what the model is/does. -->

	Silkie is a visual language model trained by preference distillation on GPT-4V-annotated AI feedback. It is a fine-tuned version of [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat), trained on our [MMInstruction/VLFeedback](https://huggingface.co/datasets/MMInstruction/VLFeedback) dataset with direct preference optimization (DPO). Compared with the original model, Silkie achieves 6.9% and 9.5% relative improvements on the MME benchmark in perception and cognition, respectively. Silkie also sets a new state-of-the-art score of 3.02 on MMHal-Bench for hallucination evaluation. Please refer to our [project page](https://vlf-silkie.github.io/) for more details.
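
For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair in plain Python. It is an illustration of the general objective only, not Silkie's training code: the function names, the example log-probabilities, and the `beta=0.1` default are assumptions made for this example.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, chosen only for illustration
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled margin between the preferred and dispreferred response.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


# Hypothetical log-probabilities, not taken from any real model.
loss_at_init = dpo_loss(-5.0, -7.0, -5.0, -7.0)   # policy == reference
loss_improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)  # policy favors the chosen response
print(loss_at_init, loss_improved)
```

When the policy matches the reference model the margin is zero, so the loss starts at log 2; it falls as the policy assigns relatively more probability to the preferred response.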

	## Model Sources

	<!-- Provide the basic links for the model. -->

	- Project page: https://vlf-silkie.github.io/
	- Dataset: https://huggingface.co/datasets/MMInstruction/VLFeedback
	- Paper: Coming soon.
	- Repository: Coming soon.

	## Uses

	<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

	Silkie is intended for research purposes, particularly for alignment research in multimodal models.

	## How to Get Started

	Below is a simple Python code snippet to get started with the model.

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# trust_remote_code=True is needed because the tokenizer and model
	# classes are defined in the model repository itself.
	tokenizer = AutoTokenizer.from_pretrained(
	    "MMInstruction/Silkie", trust_remote_code=True
	)
	model = AutoModelForCausalLM.from_pretrained(
	    "MMInstruction/Silkie", device_map="cuda", trust_remote_code=True
	).eval()

	# Build a multimodal query from an image URL and a text prompt.
	query = tokenizer.from_list_format(
	    [
	        {"image": "https://farm8.staticflickr.com/137/383965780_db4815011c_o.jpg"},
	        {"text": "Which wooden stool has a vase with red flower on it?"},
	    ]
	)

	# history=None starts a new conversation.
	response, history = model.chat(tokenizer, query=query, history=None)
	print(response)
	```

	## Citation

	```
	Coming soon.
	```