---
datasets:
- MMInstruction/VLFeedback
---
# Model Card for Silkie

<!-- Provide a quick summary of what the model is/does. -->

Silkie is a visual language model trained by preference distillation on GPT-4V-annotated AI feedback. It is a fine-tuned version of [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat), trained on our [MMInstruction/VLFeedback](https://huggingface.co/datasets/MMInstruction/VLFeedback) dataset with direct preference optimization (DPO). Compared with the original model, Silkie achieves 6.9% and 9.5% relative improvements on the perception and cognition subsets of the MME benchmark, respectively. It also sets a new state-of-the-art score of 3.02 on MMHal-Bench for hallucination evaluation. Please refer to our [project page](https://vlf-silkie.github.io/) for more details.
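
For background, DPO optimizes the policy directly on preference pairs without training a separate reward model. The standard objective (Rafailov et al., 2023) is reproduced below for reference, with Qwen-VL-Chat acting as the frozen reference policy; the specific $\beta$ and training configuration used for Silkie are covered on the project page rather than here:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$

where $x$ is the multimodal prompt, $y_w$ and $y_l$ are the preferred and rejected responses from the GPT-4V annotations, and $\beta$ controls the strength of the KL penalty toward the reference model.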

## Model Sources

<!-- Provide the basic links for the model. -->

- **Project page:** https://vlf-silkie.github.io/
- **Dataset:** https://huggingface.co/datasets/MMInstruction/VLFeedback
- **Paper:** Coming soon.
- **Repository:** Coming soon.

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Silkie is intended for research purposes, particularly for alignment research in multimodal models.

## How to Get Started

Below is a simple Python code snippet to get started with the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model. trust_remote_code is required because
# Qwen-VL-based checkpoints ship custom modeling code.
tokenizer = AutoTokenizer.from_pretrained(
    "MMInstruction/Silkie", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "MMInstruction/Silkie", device_map="cuda", trust_remote_code=True
).eval()

# Compose a multimodal query from an image URL and a text question.
query = tokenizer.from_list_format(
    [
        {"image": "https://farm8.staticflickr.com/137/383965780_db4815011c_o.jpg"},
        {"text": "Which wooden stool has a vase with a red flower on it?"},
    ]
)
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```
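
To inspect the preference data used for DPO training, the [VLFeedback](https://huggingface.co/datasets/MMInstruction/VLFeedback) dataset can be loaded with the `datasets` library. This is a minimal sketch; the `"train"` split name is an assumption, so consult the dataset card for the exact splits and fields.

```python
from datasets import load_dataset

# Load the VLFeedback preference dataset used to train Silkie.
# The "train" split name is assumed; see the dataset card for actual splits.
vlfeedback = load_dataset("MMInstruction/VLFeedback", split="train")
print(vlfeedback[0])  # inspect one preference example
```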

## Citation

```
Coming soon.
```