---
base_model: mistralai/Mistral-7B-Instruct-v0.1
library_name: peft
license: apache-2.0
tags:
- finetuned
- multimodal
dataset: /workspace/multi_token/new/multi_token/data/sentence-voxel-pretrain
inference: false
---

These are weights for a version of `mistralai/Mistral-7B-Instruct-v0.1` finetuned for multimodal applications.

### Modalities

* VoxelModality (use `<voxel>` in text and provide `voxel_data`, encoded as 2 tokens)

### Usage

GitHub: https://github.com/sshh12/multi_token (includes training scripts and a basic inference server)
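The `<voxel>` placeholder stands in for 2 soft tokens at inference time. As a rough illustration of the placeholder mechanics only (the names below are hypothetical; the real tokenizer-side logic lives in the multi_token repository):

```python
# Illustrative sketch: expand each <voxel> placeholder into the number of
# slots the modality projector fills (2 tokens for VoxelModality, per this card).
# Function and sentinel names are invented for this example.

VOXEL_PLACEHOLDER = "<voxel>"
VOXEL_NUM_TOKENS = 2  # each voxel input is encoded as 2 soft tokens


def expand_modality_placeholders(prompt: str) -> list:
    """Split a prompt into text chunks and modality slots.

    Returns a list where plain strings are text segments and the sentinel
    ("voxel", i) marks the i-th soft-token slot to be filled with a
    projected voxel embedding.
    """
    parts = []
    for i, chunk in enumerate(prompt.split(VOXEL_PLACEHOLDER)):
        if i > 0:  # a placeholder preceded this chunk
            parts.extend(("voxel", j) for j in range(VOXEL_NUM_TOKENS))
        if chunk:
            parts.append(chunk)
    return parts


sequence = expand_modality_placeholders(
    "What scene might have been seen to cause these voxel activations? <voxel> "
)
```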
### Dataset

/workspace/multi_token/new/multi_token/data/sentence-voxel-pretrain (202 examples)

```
{'voxel_data': [-1.2669486999511719, -4.342422008514404, 0.08342710882425308, -1.1121463775634766, -1.7241164445877075, 0.8711026906967163, 1.6187070608139038, 2.1467154026031494, 1.55600106716156, 2.7908051013946533, 2.6149775981903076, 0.48798438906669617, -1.8658868074417114, -0.9153737425804138, 1.0539007186889648, 2.9938547611236572, -1.4584662914276123, 0.06789205223321915, 0.7774376273155212, 0.21760278940200806, -1.8041378259658813, 2.964979648590088, -1.1315451860427856, 0.17553456127643585, -0.30490806698799133, -0.2574838697910309, 0.46714287996292114, -1.0232142210006714, -0.8084980845451355, -1.2524477243423462, -3.438807487487793, 1.2044878005981445, -1.3203097581863403, -1.5149697065353394, 1.3110711574554443, -0.6502295136451721, 0.2924231290817261, -1.8042508363723755, 1.156070351600647, 3.68827748298645, -1.2678762674331665, -0.48739099502563477, -1.9123613834381104, -0.5652288794517517, 0.30757156014442444, -2.6405975818634033, -0.5657948851585388, 0.1962834596633911, 0.4952268898487091, -1.7487742900848389, 1.7829053401947021, -1.7034624814987183, -0.5107262134552002, -0.3320123553276062, -0.06942156702280045, 0.4950488209724426, 2.344041109085083, -1.5664364099502563, 0.19259212911128998, -3.1398189067840576, 0.04002213105559349, -1.2993210554122925, -1.6680536270141602, -1.251158595085144, 1.8072421550750732, -1.0329501628875732, 0.9539159536361694, 1.3106855154037476, -2.569223165512085, -1.2958600521087646, 0.126902237534523, 0.5233652591705322, 0.5843154788017273, -0.5259942412376404, -0.6380230784416199, -0.6816728115081787, -1.121833324432373, 0.3703728914260864, 1.237956166267395, 0.5594802498817444, -0.5233862996101379, -0.13332879543304443, 0.675186276435852, -1.2282785177230835, -3.3140101432800293, 0.7235065698623657, -0.35910749435424805, -2.077662467956543, 0.25364214181900024, -0.04129992425441742, -1.2904301881790161, -1.616705060005188, -1.6876271963119507, -0.7963595390319824, 0.030134305357933044, 1.8337446451187134, 
-0.7175531983375549, -1.975988745689392, 2.4509336948394775, 0.7048704028129578, 1.4666917324066162, 1.7357171773910522, -2.5205185413360596, 0.3177747130393982, 3.1697638034820557, -0.9803237915039062, 0.2490101158618927, 0.685883104801178, -0.5148935317993164, -0.6637391448020935, 1.1980229616165161, -2.6742348670959473, -0.3336712718009949, 0.7613745927810669, 0.4145558178424835, -0.39548221230506897, -0.8612095713615417, 0.47160154581069946, 1.5164895057678223, -0.7074841260910034, -1.4712883234024048, 0.9962572455406189, -1.2678629159927368, -0.37773820757865906, -1.8931519985198975, -0.05409574508666992, 2.9137215614318848, -0.8817853331565857, 0.6903612613677979, 0.4531203806400299, -1.6106483936309814, 0.23891609907150269, -0.7575222253799438, -0.8597385883331299, -0.4505012631416321, -1.0164486169815063, -2.209623336791992, -0.4585776627063751, -0.8505887389183044, 2.003972291946411, -1.3250545263290405, 3.2319674491882324, 2.2695298194885254, -0.8775315880775452, -0.628717303276062, -0.43926355242729187, 1.9588313102722168, -0.93973308801651, 0.12314625084400177, -0.33370646834373474, 0.07034939527511597, -2.8057355880737305, 1.337593674659729, -0.555436372756958, -2.6099681854248047, -0.712677538394928, 1.286773920059204, 0.38860979676246643, 0.8785397410392761, -1.712486743927002, -0.24093347787857056, 0.1924627721309662, -0.0006318278610706329, -1.6611075401306152, 0.2844694256782532, -1.7149747610092163, -0.5365468859672546, 0.13996855914592743, -0.056381598114967346, 1.8396815061569214, 0.8105614185333252, -1.2487802505493164, 0.4743833541870117, 0.1982801854610443, -0.15110887587070465, 1.4873329401016235, 0.5023205280303955, 0.1126936599612236, 1.627712607383728, -1.4724937677383423, 1.760959267616272, 0.17591479420661926, -1.152338981628418, -0.9325122833251953, 1.3554235696792603, 0.8807990550994873, 0.19217203557491302, -0.3776297867298126, 0.6159052848815918, -0.8186436891555786, 0.2990851104259491, 0.09922473132610321, 0.2839311957359314, 
0.3771292567253113, -0.12268450111150742, -1.2299126386642456, 0.5846585631370544, -0.3947390019893646, 1.7231228351593018, 0.33239540457725525, -1.3260372877120972, 0.4368828535079956, 0.2650435268878937, 0.5281450152397156, -1.058358073234558, 0.6126224994659424, -0.688051700592041, 0.8823887705802917, -0.9234603047370911, -0.18388473987579346, -1.1497560739517212, -0.10189923644065857, -1.4299086332321167, 0.4046390950679779, -0.3188319206237793, 1.111311912536621, -1.0168960094451904], 'messages': [{'content': 'What scene might have been seen to cause these voxel activations? <voxel> ', 'role': 'user'}, {'content': 'Plate of spaghetti with basil, peppers, tomatoes, and bananas background.', 'role': 'assistant'}]}
```
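Each record pairs a flat `voxel_data` float vector with a user/assistant `messages` pair, where the user turn carries the `<voxel>` placeholder. A minimal structural check, assuming only the field names visible in the example record above (the helper itself is illustrative, not part of the repo):

```python
def validate_record(record: dict) -> bool:
    """Minimal structural check for a sentence-voxel training record.

    Expects a flat list of floats under 'voxel_data' and a user/assistant
    message pair under 'messages', with the <voxel> placeholder in the
    user turn. Field names follow the example record in this card.
    """
    if set(record) != {"voxel_data", "messages"}:
        return False
    if not all(isinstance(x, float) for x in record["voxel_data"]):
        return False
    roles = [m["role"] for m in record["messages"]]
    if roles != ["user", "assistant"]:
        return False
    return "<voxel>" in record["messages"][0]["content"]


record = {
    "voxel_data": [0.1, -0.2, 1.5],
    "messages": [
        {"role": "user", "content": "Describe the scene. <voxel> "},
        {"role": "assistant", "content": "A plate of spaghetti."},
    ],
}
```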
### Training Device(s)

```
name, pci.bus_id, vbios_version
NVIDIA H100 80GB HBM3, 00000000:66:00.0, 96.00.99.00.01
```

### Model

```
MistralLMMForCausalLM.model =

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralLMMForCausalLM(
      (model): MistralLMMModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralSdpaAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=1024, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (v_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=1024, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (o_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (rotary_emb): MistralRotaryEmbedding()
            )
            (mlp): MistralMLP(
              (gate_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=14336, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=14336, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (up_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=14336, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=14336, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (down_proj): lora.Linear(
                (base_layer): Linear(in_features=14336, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=14336, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
            (post_attention_layernorm): MistralRMSNorm((4096,), eps=1e-05)
          )
        )
        (norm): MistralRMSNorm((4096,), eps=1e-05)
        (voxel_lmm_projector): _MLPVectorProjector(
          (mlps): ModuleList(
            (0-1): 2 x Sequential(
              (0): Linear(in_features=217, out_features=4096, bias=True)
              (1): GELU(approximate='none')
              (2): Linear(in_features=4096, out_features=4096, bias=True)
              (3): GELU(approximate='none')
              (4): Linear(in_features=4096, out_features=4096, bias=True)
              (5): GELU(approximate='none')
              (6): Linear(in_features=4096, out_features=4096, bias=True)
            )
          )
        )
      )
      (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
    )
  )
)
```
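The printed shapes are enough for a back-of-the-envelope count of the trainable parameters: rank-64 LoRA adapters over the seven listed projections in all 32 layers, plus the two-headed voxel projector (217 → 4096 → 4096 → 4096 → 4096, with biases). A sketch using only dimensions read from the printout:

```python
# Back-of-the-envelope parameter count from the module printout above.

def linear_params(n_in, n_out, bias=True):
    """Parameters of a dense layer: weight matrix plus optional bias."""
    return n_in * n_out + (n_out if bias else 0)

def lora_params(n_in, n_out, r=64):
    """LoRA adapter: lora_A (n_in -> r) and lora_B (r -> n_out), no biases."""
    return linear_params(n_in, r, bias=False) + linear_params(r, n_out, bias=False)

HIDDEN, KV, FFN, LAYERS = 4096, 1024, 14336, 32

per_layer = (
    lora_params(HIDDEN, HIDDEN)   # q_proj
    + lora_params(HIDDEN, KV)     # k_proj
    + lora_params(HIDDEN, KV)     # v_proj
    + lora_params(HIDDEN, HIDDEN) # o_proj
    + lora_params(HIDDEN, FFN)    # gate_proj
    + lora_params(HIDDEN, FFN)    # up_proj
    + lora_params(FFN, HIDDEN)    # down_proj
)
lora_total = LAYERS * per_layer  # 167,772,160 adapter parameters

# Voxel projector: 2 parallel MLPs, each 217 -> 4096 -> 4096 -> 4096 -> 4096.
per_mlp = linear_params(217, HIDDEN) + 3 * linear_params(HIDDEN, HIDDEN)
projector_total = 2 * per_mlp  # 102,473,728 projector parameters
```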
### Framework versions

- PEFT 0.12.0