Update README.md

README.md
Authors: Jainaveen Sundaram, Ravishankar Iyer

### Training details and Evaluation
We follow the two-step training pipeline outlined in the LLaVa1.5 paper, consisting of two phases: (1) a pre-training phase for feature alignment, followed by (2) end-to-end instruction fine-tuning.

The pre-training phase involves 1 epoch on a filtered subset of 595K Conceptual Captions [2], with only the projection layer weights updated. For instruction fine-tuning, we use 1 epoch of the LLaVa-Instruct-150K dataset, with both the projection layer and LLM weights updated.

For model evaluation, please refer to the linked technical report (coming soon!).
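The two-phase freezing schedule above can be sketched as follows; the parameter-name prefixes and the helper function are purely illustrative, not taken from the actual training code:

```python
# Illustrative sketch of which weights are unfrozen in each training phase.
# The prefixes "vision", "projection", and "llm" are hypothetical names.
def trainable_params(phase, param_names):
    """Return the subset of parameters updated in the given training phase."""
    if phase == "pretrain":
        # Phase 1 (feature alignment): only projection layer weights are updated.
        return [n for n in param_names if n.startswith("projection")]
    if phase == "finetune":
        # Phase 2 (instruction fine-tuning): projection layer and LLM weights.
        return [n for n in param_names if n.startswith(("projection", "llm"))]
    raise ValueError(f"unknown phase: {phase!r}")

params = ["vision.encoder.0", "projection.weight", "projection.bias", "llm.block.0"]
print(trainable_params("pretrain", params))  # ['projection.weight', 'projection.bias']
print(trainable_params("finetune", params))
```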
### How to use

Start off by cloning the repository:

```shell
git clone https://huggingface.co/IntelLabs/LlavaOLMoBitnet1B
cd LlavaOLMoBitnet1B
```
Install all the requirements by following the instructions in requirements.txt (typically `pip install -r requirements.txt`).

You are all set! Run inference by calling:

```shell
python llava_olmo.py
```
To pass in your own query, modify the following lines within the file:

```python
# Define Image and Text inputs
text = "Be concise. What are the four major tournaments of the sport shown in the image?"
url = "https://farm3.staticflickr.com/2157/2439959136_d932f4e816_z.jpg"
```
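For instance, to ask about your own image, those same two lines might become (values are purely illustrative):

```python
# Hypothetical replacement query and image URL -- substitute your own.
text = "Be concise. What animal is shown in the image?"
url = "https://example.com/my_image.jpg"
```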
## Model Sources

arXiv link for the technical report coming soon!
## Ethical Considerations

Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights.

| Ethical Considerations | Description |
| ----------- | ----------- |
| Data | The model was trained using the LLaVA-v1.5 data mixture as described above. |
| Human life | The model is not intended to inform decisions central to human life or flourishing. |
| Mitigations | No additional risk mitigation strategies were considered during model development. |
| Risks and harms | This model has not been assessed for harm or biases, and should not be used for sensitive applications where it may cause harm. |
| Use cases | - |
## Citation

Coming soon.

## License