naveensp committed on
Commit 5a23ab0
1 Parent(s): 44add6b

Update README.md

Files changed (1): README.md +21 -13
README.md CHANGED
@@ -10,22 +10,30 @@ Multimodal Large Language Models (MM-LLMs) have seen significant advancements in
Authors: Jainaveen Sundaram, Ravishankar Iyer


- ### Training Data
+ ### Training details and Evaluation
We follow the two-step training pipeline outlined in the LLaVA-1.5 paper: (1) a pre-training phase for feature alignment, followed by (2) end-to-end instruction fine-tuning.
The pre-training phase involves one epoch on a filtered subset of 595K Conceptual Captions pairs [2], with only the projection-layer weights updated. For instruction fine-tuning, we train for one epoch on the LLaVA-Instruct-150K dataset, with both the projection-layer and LLM weights updated.
+ For model evaluation, please refer to the linked technical report (coming soon!).
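The update schedule described above amounts to toggling which weight groups receive gradients. A minimal sketch, assuming a PyTorch-style model with hypothetical `mm_projector` and `language_model` submodules (the names are illustrative, not taken from this repository):

```python
import torch.nn as nn

def configure_phase(model: nn.Module, phase: int) -> None:
    """Freeze all weights, then unfreeze the groups trained in the given phase."""
    for param in model.parameters():
        param.requires_grad = False
    # Both phases update the projection layer.
    for param in model.mm_projector.parameters():
        param.requires_grad = True
    # Phase 2 (instruction fine-tuning) also updates the LLM weights;
    # the vision encoder stays frozen throughout.
    if phase == 2:
        for param in model.language_model.parameters():
            param.requires_grad = True
```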
 
- ### Evaluation Data
- TODO - Add info on Eval Data (if applicable)

- ## How to use
- TODO - Add clear instructions on how to use the model

- ## Evaluation Results
- TODO - Add results data

+ ### How to use
+ Start by cloning the repository:
+ git clone https://huggingface.co/IntelLabs/LlavaOLMoBitnet1B
+ cd LlavaOLMoBitnet1B
+
+ Install the dependencies listed in requirements.txt (for example, pip install -r requirements.txt).
+
+ You are all set! Run inference by calling:
+ python llava_olmo.py
+
+ To pass in your own query, modify the following lines within the file:
+
+ # Define image and text inputs
+ text = "Be concise. What are the four major tournaments of the sport shown in the image?"
+ url = "https://farm3.staticflickr.com/2157/2439959136_d932f4e816_z.jpg"
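For context, here is a hedged sketch of how a `text`/`url` pair like the one above is typically turned into model inputs in LLaVA-style pipelines. The actual preprocessing lives in llava_olmo.py; the image-placeholder prompt template below is an assumption, not taken from the repository:

```python
import requests
from PIL import Image

text = "Be concise. What are the four major tournaments of the sport shown in the image?"
url = "https://farm3.staticflickr.com/2157/2439959136_d932f4e816_z.jpg"

# Download the image and normalize it to RGB for the vision encoder.
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# LLaVA-style prompt with an image placeholder token (template is an assumption).
prompt = f"<image>\n{text}"
```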

## Model Sources
- TODO - Add links to github or arxiv
+ ArXiv link for the technical report coming soon!

## Ethical Considerations

@@ -33,15 +41,15 @@ Intel is committed to respecting human rights and avoiding causing or contributi

| Ethical Considerations | Description |
| ----------- | ----------- |
- | Data | **TODO** |
- | Human life | **TODO** |
- | Mitigations | **TODO** |
- | Risks and harms | **TODO** |
+ | Data | The model was trained using the LLaVA-v1.5 data mixture described above. |
+ | Human life | The model is not intended to inform decisions central to human life or flourishing. |
+ | Mitigations | No additional risk mitigation strategies were considered during model development. |
+ | Risks and harms | This model has not been assessed for harms or biases, and should not be used for sensitive applications where it may cause harm. |
| Use cases | - |

## Citation

- **TODO** Add citation information (if applicable)
+ Coming soon

## License
 