tehilap committed
Commit 0bff6ed
1 Parent(s): 079da9e

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +179 -193
README.md CHANGED
@@ -1,202 +1,188 @@
  ---
  base_model: mistralai/Mistral-7B-Instruct-v0.1
  library_name: peft
  ---
 
- # Model Card for Model ID
 
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
  ### Framework versions
 
  - PEFT 0.12.0
 
  ---
  base_model: mistralai/Mistral-7B-Instruct-v0.1
  library_name: peft
+ license: apache-2.0
+ tags:
+ - finetuned
+ - multimodal
+ dataset: /workspace/multi_token/new/multi_token/data/sentence-voxel-pretrain
+ inference: false
  ---
 
+ These are weights for a version of `mistralai/Mistral-7B-Instruct-v0.1` finetuned for multimodal applications.
+
+ ### Modalities
+
+ * VoxelModality (use `<voxel>` in the prompt text and provide the matching `voxel_data` vector; each voxel input is encoded as 2 tokens, as in the record sketch below)
+
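+ A single record pairs one `<voxel>` placeholder in the chat text with one `voxel_data` vector, as in the dataset sample below. A minimal sketch of the record shape (hypothetical values; the 217-dim input size is inferred from the projector's `in_features=217` in the module tree at the end of this card):
+
+ ```
+ # Illustrative record only; field names follow the dataset sample below.
+ record = {
+     "voxel_data": [0.0] * 217,  # one voxel activation vector (placeholder zeros)
+     "messages": [
+         {"role": "user",
+          "content": "What scene might have been seen to cause these voxel activations? <voxel> "},
+         {"role": "assistant",
+          "content": "A description of the viewed scene."},
+     ],
+ }
+ ```
+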
+ ### Usage
+
+ GitHub: https://github.com/sshh12/multi_token (includes the training scripts and a basic inference server)
+
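+ This repository holds the PEFT adapter and projector weights; the custom `MistralLMMForCausalLM` wrapper and the `<voxel>` token handling live in the multi_token codebase, so use its scripts for actual inference. As a hedged sketch of the generic PEFT side only (this alone does not wire up the voxel projector, and the checkpoint path is hypothetical):
+
+ ```
+ # Sketch: standard PEFT adapter loading on top of the base model.
+ # The voxel projector and <voxel> logic still require the multi_token code.
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
+ model = PeftModel.from_pretrained(base, "path/to/this/checkpoint")  # hypothetical path
+ tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
+ ```
+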
+ ### Dataset
+
+ /workspace/multi_token/new/multi_token/data/sentence-voxel-pretrain (202 examples)
+
+ ```
+ {'voxel_data': [-1.2669486999511719, -4.342422008514404, 0.08342710882425308, -1.1121463775634766, -1.7241164445877075, 0.8711026906967163, 1.6187070608139038, 2.1467154026031494, 1.55600106716156, 2.7908051013946533, 2.6149775981903076, 0.48798438906669617, -1.8658868074417114, -0.9153737425804138, 1.0539007186889648, 2.9938547611236572, -1.4584662914276123, 0.06789205223321915, 0.7774376273155212, 0.21760278940200806, -1.8041378259658813, 2.964979648590088, -1.1315451860427856, 0.17553456127643585, -0.30490806698799133, -0.2574838697910309, 0.46714287996292114, -1.0232142210006714, -0.8084980845451355, -1.2524477243423462, -3.438807487487793, 1.2044878005981445, -1.3203097581863403, -1.5149697065353394, 1.3110711574554443, -0.6502295136451721, 0.2924231290817261, -1.8042508363723755, 1.156070351600647, 3.68827748298645, -1.2678762674331665, -0.48739099502563477, -1.9123613834381104, -0.5652288794517517, 0.30757156014442444, -2.6405975818634033, -0.5657948851585388, 0.1962834596633911, 0.4952268898487091, -1.7487742900848389, 1.7829053401947021, -1.7034624814987183, -0.5107262134552002, -0.3320123553276062, -0.06942156702280045, 0.4950488209724426, 2.344041109085083, -1.5664364099502563, 0.19259212911128998, -3.1398189067840576, 0.04002213105559349, -1.2993210554122925, -1.6680536270141602, -1.251158595085144, 1.8072421550750732, -1.0329501628875732, 0.9539159536361694, 1.3106855154037476, -2.569223165512085, -1.2958600521087646, 0.126902237534523, 0.5233652591705322, 0.5843154788017273, -0.5259942412376404, -0.6380230784416199, -0.6816728115081787, -1.121833324432373, 0.3703728914260864, 1.237956166267395, 0.5594802498817444, -0.5233862996101379, -0.13332879543304443, 0.675186276435852, -1.2282785177230835, -3.3140101432800293, 0.7235065698623657, -0.35910749435424805, -2.077662467956543, 0.25364214181900024, -0.04129992425441742, -1.2904301881790161, -1.616705060005188, -1.6876271963119507, -0.7963595390319824, 0.030134305357933044, 1.8337446451187134, -0.7175531983375549, -1.975988745689392, 2.4509336948394775, 0.7048704028129578, 1.4666917324066162, 1.7357171773910522, -2.5205185413360596, 0.3177747130393982, 3.1697638034820557, -0.9803237915039062, 0.2490101158618927, 0.685883104801178, -0.5148935317993164, -0.6637391448020935, 1.1980229616165161, -2.6742348670959473, -0.3336712718009949, 0.7613745927810669, 0.4145558178424835, -0.39548221230506897, -0.8612095713615417, 0.47160154581069946, 1.5164895057678223, -0.7074841260910034, -1.4712883234024048, 0.9962572455406189, -1.2678629159927368, -0.37773820757865906, -1.8931519985198975, -0.05409574508666992, 2.9137215614318848, -0.8817853331565857, 0.6903612613677979, 0.4531203806400299, -1.6106483936309814, 0.23891609907150269, -0.7575222253799438, -0.8597385883331299, -0.4505012631416321, -1.0164486169815063, -2.209623336791992, -0.4585776627063751, -0.8505887389183044, 2.003972291946411, -1.3250545263290405, 3.2319674491882324, 2.2695298194885254, -0.8775315880775452, -0.628717303276062, -0.43926355242729187, 1.9588313102722168, -0.93973308801651, 0.12314625084400177, -0.33370646834373474, 0.07034939527511597, -2.8057355880737305, 1.337593674659729, -0.555436372756958, -2.6099681854248047, -0.712677538394928, 1.286773920059204, 0.38860979676246643, 0.8785397410392761, -1.712486743927002, -0.24093347787857056, 0.1924627721309662, -0.0006318278610706329, -1.6611075401306152, 0.2844694256782532, -1.7149747610092163, -0.5365468859672546, 0.13996855914592743, -0.056381598114967346, 1.8396815061569214, 0.8105614185333252, -1.2487802505493164, 0.4743833541870117, 0.1982801854610443, -0.15110887587070465, 1.4873329401016235, 0.5023205280303955, 0.1126936599612236, 1.627712607383728, -1.4724937677383423, 1.760959267616272, 0.17591479420661926, -1.152338981628418, -0.9325122833251953, 1.3554235696792603, 0.8807990550994873, 0.19217203557491302, -0.3776297867298126, 0.6159052848815918, -0.8186436891555786, 0.2990851104259491, 0.09922473132610321, 0.2839311957359314, 0.3771292567253113, -0.12268450111150742, -1.2299126386642456, 0.5846585631370544, -0.3947390019893646, 1.7231228351593018, 0.33239540457725525, -1.3260372877120972, 0.4368828535079956, 0.2650435268878937, 0.5281450152397156, -1.058358073234558, 0.6126224994659424, -0.688051700592041, 0.8823887705802917, -0.9234603047370911, -0.18388473987579346, -1.1497560739517212, -0.10189923644065857, -1.4299086332321167, 0.4046390950679779, -0.3188319206237793, 1.111311912536621, -1.0168960094451904], 'messages': [{'content': 'What scene might have been seen to cause these voxel activations? <voxel> ', 'role': 'user'}, {'content': 'Plate of spaghetti with basil, peppers, tomatoes, and bananas background.', 'role': 'assistant'}]}
+ ```
+
+ ### Training Device(s)
+
+ ```
+ name, pci.bus_id, vbios_version
+ NVIDIA H100 80GB HBM3, 00000000:66:00.0, 96.00.99.00.01
+ ```
+
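+ The listing above has the shape of `nvidia-smi` query output; a command along the lines of `nvidia-smi --query-gpu=name,pci.bus_id,vbios_version --format=csv` reproduces that header plus one row per GPU.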
+
+ ### Model
+
+ ```
+ MistralLMMForCausalLM.model =
+
+ PeftModelForCausalLM(
+   (base_model): LoraModel(
+     (model): MistralLMMForCausalLM(
+       (model): MistralLMMModel(
+         (embed_tokens): Embedding(32000, 4096)
+         (layers): ModuleList(
+           (0-31): 32 x MistralDecoderLayer(
+             (self_attn): MistralSdpaAttention(
+               (q_proj): lora.Linear(
+                 (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=4096, out_features=64, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=64, out_features=4096, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (lora_magnitude_vector): ModuleDict()
+               )
+               (k_proj): lora.Linear(
+                 (base_layer): Linear(in_features=4096, out_features=1024, bias=False)
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=4096, out_features=64, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=64, out_features=1024, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (lora_magnitude_vector): ModuleDict()
+               )
+               (v_proj): lora.Linear(
+                 (base_layer): Linear(in_features=4096, out_features=1024, bias=False)
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=4096, out_features=64, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=64, out_features=1024, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (lora_magnitude_vector): ModuleDict()
+               )
+               (o_proj): lora.Linear(
+                 (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=4096, out_features=64, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=64, out_features=4096, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (lora_magnitude_vector): ModuleDict()
+               )
+               (rotary_emb): MistralRotaryEmbedding()
+             )
+             (mlp): MistralMLP(
+               (gate_proj): lora.Linear(
+                 (base_layer): Linear(in_features=4096, out_features=14336, bias=False)
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=4096, out_features=64, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=64, out_features=14336, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (lora_magnitude_vector): ModuleDict()
+               )
+               (up_proj): lora.Linear(
+                 (base_layer): Linear(in_features=4096, out_features=14336, bias=False)
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=4096, out_features=64, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=64, out_features=14336, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (lora_magnitude_vector): ModuleDict()
+               )
+               (down_proj): lora.Linear(
+                 (base_layer): Linear(in_features=14336, out_features=4096, bias=False)
+                 (lora_dropout): ModuleDict(
+                   (default): Dropout(p=0.05, inplace=False)
+                 )
+                 (lora_A): ModuleDict(
+                   (default): Linear(in_features=14336, out_features=64, bias=False)
+                 )
+                 (lora_B): ModuleDict(
+                   (default): Linear(in_features=64, out_features=4096, bias=False)
+                 )
+                 (lora_embedding_A): ParameterDict()
+                 (lora_embedding_B): ParameterDict()
+                 (lora_magnitude_vector): ModuleDict()
+               )
+               (act_fn): SiLU()
+             )
+             (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
+             (post_attention_layernorm): MistralRMSNorm((4096,), eps=1e-05)
+           )
+         )
+         (norm): MistralRMSNorm((4096,), eps=1e-05)
+         (voxel_lmm_projector): _MLPVectorProjector(
+           (mlps): ModuleList(
+             (0-1): 2 x Sequential(
+               (0): Linear(in_features=217, out_features=4096, bias=True)
+               (1): GELU(approximate='none')
+               (2): Linear(in_features=4096, out_features=4096, bias=True)
+               (3): GELU(approximate='none')
+               (4): Linear(in_features=4096, out_features=4096, bias=True)
+               (5): GELU(approximate='none')
+               (6): Linear(in_features=4096, out_features=4096, bias=True)
+             )
+           )
+         )
+       )
+       (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
+     )
+   )
+ )
+ ```
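+
+ Reading the tree above: rank-64 LoRA adapters (dropout 0.05) sit on every attention and MLP projection, and `voxel_lmm_projector` maps one 217-dim voxel vector to 2 token embeddings via two independent MLPs (four Linear layers with GELU between them). A shape-only reconstruction of that projector, purely illustrative (the real `_MLPVectorProjector` lives in the multi_token codebase):
+
+ ```
+ # Shape-only sketch of the printed _MLPVectorProjector: each MLP emits one
+ # 4096-dim embedding, giving the "2 tokens" noted under Modalities.
+ import torch
+ import torch.nn as nn
+
+ mlps = nn.ModuleList([
+     nn.Sequential(
+         nn.Linear(217, 4096), nn.GELU(),
+         nn.Linear(4096, 4096), nn.GELU(),
+         nn.Linear(4096, 4096), nn.GELU(),
+         nn.Linear(4096, 4096),
+     )
+     for _ in range(2)
+ ])
+
+ voxels = torch.randn(1, 217)                                # one voxel_data vector
+ tokens = torch.stack([mlp(voxels) for mlp in mlps], dim=1)  # shape (1, 2, 4096)
+ ```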
 
  ### Framework versions
 
  - PEFT 0.12.0