juanquivilla committed on
Commit
26ec789
1 Parent(s): 5055f5c

End of training

Files changed (5)
  1. README.md +255 -196
  2. config.json +28 -0
  3. model.safetensors +3 -0
  4. tokenizer_config.json +1 -0
  5. training_args.bin +3 -0
README.md CHANGED
@@ -1,199 +1,258 @@
  ---
- library_name: transformers
- tags: []
  ---
 
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
  ---
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: deberta-base-en-wiki
+   results: []
  ---
 
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # deberta-base-en-wiki
+
+ This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.1310
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 16
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 256
+ - total_eval_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 1250.0
+ - num_epochs: 10
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:------:|:---------------:|
+ | 6.4739 | 0.0504 | 1250 | 6.4351 |
+ | 3.4566 | 0.1009 | 2500 | 3.2703 |
+ | 2.7823 | 0.1513 | 3750 | 2.6563 |
+ | 2.5242 | 0.2018 | 5000 | 2.4202 |
+ | 2.3816 | 0.2522 | 6250 | 2.2763 |
+ | 2.2723 | 0.3027 | 7500 | 2.1715 |
+ | 2.182 | 0.3531 | 8750 | 2.0933 |
+ | 2.1325 | 0.4035 | 10000 | 2.0317 |
+ | 2.0584 | 0.4540 | 11250 | 1.9770 |
+ | 2.0333 | 0.5044 | 12500 | 1.9288 |
+ | 1.9898 | 0.5549 | 13750 | 1.8972 |
+ | 1.9574 | 0.6053 | 15000 | 1.8557 |
+ | 1.9184 | 0.6558 | 16250 | 1.8324 |
+ | 1.8899 | 0.7062 | 17500 | 1.8049 |
+ | 1.8909 | 0.7567 | 18750 | 1.7788 |
+ | 1.8371 | 0.8071 | 20000 | 1.7558 |
+ | 1.8343 | 0.8575 | 21250 | 1.7374 |
+ | 1.8341 | 0.9080 | 22500 | 1.7256 |
+ | 1.7976 | 0.9584 | 23750 | 1.7011 |
+ | 1.7777 | 1.0089 | 25000 | 1.6865 |
+ | 1.7523 | 1.0593 | 26250 | 1.6715 |
+ | 1.7476 | 1.1098 | 27500 | 1.6581 |
+ | 1.7291 | 1.1602 | 28750 | 1.6432 |
+ | 1.7108 | 1.2106 | 30000 | 1.6333 |
+ | 1.7195 | 1.2611 | 31250 | 1.6198 |
+ | 1.6969 | 1.3115 | 32500 | 1.6109 |
+ | 1.6927 | 1.3620 | 33750 | 1.5965 |
+ | 1.6818 | 1.4124 | 35000 | 1.5917 |
+ | 1.6647 | 1.4629 | 36250 | 1.5827 |
+ | 1.6635 | 1.5133 | 37500 | 1.5704 |
+ | 1.6561 | 1.5637 | 38750 | 1.5593 |
+ | 1.6404 | 1.6142 | 40000 | 1.5527 |
+ | 1.627 | 1.6646 | 41250 | 1.5470 |
+ | 1.6292 | 1.7151 | 42500 | 1.5391 |
+ | 1.6111 | 1.7655 | 43750 | 1.5288 |
+ | 1.6154 | 1.8160 | 45000 | 1.5217 |
+ | 1.5993 | 1.8664 | 46250 | 1.5191 |
+ | 1.6028 | 1.9168 | 47500 | 1.5077 |
+ | 1.5861 | 1.9673 | 48750 | 1.5019 |
+ | 1.5793 | 2.0177 | 50000 | 1.4954 |
+ | 1.5664 | 2.0682 | 51250 | 1.4887 |
+ | 1.5723 | 2.1186 | 52500 | 1.4839 |
+ | 1.5715 | 2.1691 | 53750 | 1.4786 |
+ | 1.5612 | 2.2195 | 55000 | 1.4757 |
+ | 1.5499 | 2.2700 | 56250 | 1.4648 |
+ | 1.5542 | 2.3204 | 57500 | 1.4632 |
+ | 1.5531 | 2.3708 | 58750 | 1.4558 |
+ | 1.5329 | 2.4213 | 60000 | 1.4507 |
+ | 1.5481 | 2.4717 | 61250 | 1.4472 |
+ | 1.5336 | 2.5222 | 62500 | 1.4431 |
+ | 1.526 | 2.5726 | 63750 | 1.4405 |
+ | 1.518 | 2.6231 | 65000 | 1.4345 |
+ | 1.5135 | 2.6735 | 66250 | 1.4264 |
+ | 1.4987 | 2.7239 | 67500 | 1.4226 |
+ | 1.5007 | 2.7744 | 68750 | 1.4176 |
+ | 1.4921 | 2.8248 | 70000 | 1.4179 |
+ | 1.5031 | 2.8753 | 71250 | 1.4146 |
+ | 1.4848 | 2.9257 | 72500 | 1.4098 |
+ | 1.4702 | 2.9762 | 73750 | 1.4023 |
+ | 1.4861 | 3.0266 | 75000 | 1.4010 |
+ | 1.487 | 3.0770 | 76250 | 1.3963 |
+ | 1.4736 | 3.1275 | 77500 | 1.3923 |
+ | 1.4751 | 3.1779 | 78750 | 1.3879 |
+ | 1.4783 | 3.2284 | 80000 | 1.3858 |
+ | 1.4843 | 3.2788 | 81250 | 1.3795 |
+ | 1.4722 | 3.3293 | 82500 | 1.3771 |
+ | 1.4551 | 3.3797 | 83750 | 1.3754 |
+ | 1.4539 | 3.4302 | 85000 | 1.3729 |
+ | 1.4723 | 3.4806 | 86250 | 1.3646 |
+ | 1.4493 | 3.5310 | 87500 | 1.3658 |
+ | 1.4455 | 3.5815 | 88750 | 1.3610 |
+ | 1.4442 | 3.6319 | 90000 | 1.3573 |
+ | 1.4457 | 3.6824 | 91250 | 1.3540 |
+ | 1.4259 | 3.7328 | 92500 | 1.3534 |
+ | 1.4355 | 3.7833 | 93750 | 1.3470 |
+ | 1.4184 | 3.8337 | 95000 | 1.3435 |
+ | 1.4437 | 3.8841 | 96250 | 1.3416 |
+ | 1.4255 | 3.9346 | 97500 | 1.3377 |
+ | 1.4115 | 3.9850 | 98750 | 1.3358 |
+ | 1.4196 | 4.0355 | 100000 | 1.3351 |
+ | 1.4159 | 4.0859 | 101250 | 1.3292 |
+ | 1.4227 | 4.1364 | 102500 | 1.3302 |
+ | 1.4122 | 4.1868 | 103750 | 1.3270 |
+ | 1.3996 | 4.2372 | 105000 | 1.3207 |
+ | 1.4041 | 4.2877 | 106250 | 1.3210 |
+ | 1.3956 | 4.3381 | 107500 | 1.3187 |
+ | 1.392 | 4.3886 | 108750 | 1.3170 |
+ | 1.3943 | 4.4390 | 110000 | 1.3125 |
+ | 1.4143 | 4.4895 | 111250 | 1.3095 |
+ | 1.3939 | 4.5399 | 112500 | 1.3063 |
+ | 1.3802 | 4.5903 | 113750 | 1.3067 |
+ | 1.3908 | 4.6408 | 115000 | 1.3020 |
+ | 1.3841 | 4.6912 | 116250 | 1.3025 |
+ | 1.3821 | 4.7417 | 117500 | 1.3007 |
+ | 1.3774 | 4.7921 | 118750 | 1.2989 |
+ | 1.3807 | 4.8426 | 120000 | 1.2907 |
+ | 1.3643 | 4.8930 | 121250 | 1.2946 |
+ | 1.3704 | 4.9435 | 122500 | 1.2920 |
+ | 1.3685 | 4.9939 | 123750 | 1.2868 |
+ | 1.3794 | 5.0443 | 125000 | 1.2812 |
+ | 1.3646 | 5.0948 | 126250 | 1.2809 |
+ | 1.356 | 5.1452 | 127500 | 1.2803 |
+ | 1.3696 | 5.1957 | 128750 | 1.2784 |
+ | 1.3544 | 5.2461 | 130000 | 1.2741 |
+ | 1.3618 | 5.2966 | 131250 | 1.2736 |
+ | 1.3471 | 5.3470 | 132500 | 1.2695 |
+ | 1.3444 | 5.3974 | 133750 | 1.2648 |
+ | 1.3524 | 5.4479 | 135000 | 1.2658 |
+ | 1.354 | 5.4983 | 136250 | 1.2643 |
+ | 1.3438 | 5.5488 | 137500 | 1.2639 |
+ | 1.357 | 5.5992 | 138750 | 1.2599 |
+ | 1.3473 | 5.6497 | 140000 | 1.2617 |
+ | 1.3309 | 5.7001 | 141250 | 1.2568 |
+ | 1.3328 | 5.7505 | 142500 | 1.2511 |
+ | 1.3236 | 5.8010 | 143750 | 1.2511 |
+ | 1.3276 | 5.8514 | 145000 | 1.2507 |
+ | 1.3288 | 5.9019 | 146250 | 1.2466 |
+ | 1.3238 | 5.9523 | 147500 | 1.2456 |
+ | 1.3327 | 6.0028 | 148750 | 1.2484 |
+ | 1.3329 | 6.0532 | 150000 | 1.2424 |
+ | 1.3328 | 6.1037 | 151250 | 1.2361 |
+ | 1.307 | 6.1541 | 152500 | 1.2407 |
+ | 1.3285 | 6.2045 | 153750 | 1.2374 |
+ | 1.3097 | 6.2550 | 155000 | 1.2339 |
+ | 1.3115 | 6.3054 | 156250 | 1.2354 |
+ | 1.304 | 6.3559 | 157500 | 1.2294 |
+ | 1.3132 | 6.4063 | 158750 | 1.2290 |
+ | 1.303 | 6.4568 | 160000 | 1.2276 |
+ | 1.3029 | 6.5072 | 161250 | 1.2270 |
+ | 1.3048 | 6.5576 | 162500 | 1.2229 |
+ | 1.3085 | 6.6081 | 163750 | 1.2226 |
+ | 1.2887 | 6.6585 | 165000 | 1.2209 |
+ | 1.3055 | 6.7090 | 166250 | 1.2206 |
+ | 1.2902 | 6.7594 | 167500 | 1.2178 |
+ | 1.2892 | 6.8099 | 168750 | 1.2149 |
+ | 1.3049 | 6.8603 | 170000 | 1.2125 |
+ | 1.2935 | 6.9107 | 171250 | 1.2115 |
+ | 1.2888 | 6.9612 | 172500 | 1.2091 |
+ | 1.2856 | 7.0116 | 173750 | 1.2082 |
+ | 1.2762 | 7.0621 | 175000 | 1.2085 |
+ | 1.2883 | 7.1125 | 176250 | 1.2055 |
+ | 1.2906 | 7.1630 | 177500 | 1.2019 |
+ | 1.2831 | 7.2134 | 178750 | 1.2047 |
+ | 1.2654 | 7.2638 | 180000 | 1.1995 |
+ | 1.2759 | 7.3143 | 181250 | 1.1994 |
+ | 1.276 | 7.3647 | 182500 | 1.1992 |
+ | 1.2692 | 7.4152 | 183750 | 1.1974 |
+ | 1.2791 | 7.4656 | 185000 | 1.1940 |
+ | 1.2697 | 7.5161 | 186250 | 1.1930 |
+ | 1.2635 | 7.5665 | 187500 | 1.1889 |
+ | 1.2656 | 7.6170 | 188750 | 1.1926 |
+ | 1.2615 | 7.6675 | 190000 | 1.1828 |
+ | 1.2641 | 7.7179 | 191250 | 1.1852 |
+ | 1.2578 | 7.7684 | 192500 | 1.1791 |
+ | 1.2647 | 7.8188 | 193750 | 1.1782 |
+ | 1.2644 | 7.8692 | 195000 | 1.1777 |
+ | 1.2638 | 7.9197 | 196250 | 1.1752 |
+ | 1.2528 | 7.9701 | 197500 | 1.1748 |
+ | 1.2554 | 8.0206 | 198750 | 1.1746 |
+ | 1.2548 | 8.0710 | 200000 | 1.1726 |
+ | 1.2546 | 8.1215 | 201250 | 1.1698 |
+ | 1.247 | 8.1719 | 202500 | 1.1689 |
+ | 1.2478 | 8.2223 | 203750 | 1.1698 |
+ | 1.2578 | 8.2728 | 205000 | 1.1650 |
+ | 1.2527 | 8.3232 | 206250 | 1.1650 |
+ | 1.2612 | 8.3737 | 207500 | 1.1639 |
+ | 1.2339 | 8.4241 | 208750 | 1.1635 |
+ | 1.2422 | 8.4746 | 210000 | 1.1633 |
+ | 1.2311 | 8.5250 | 211250 | 1.1617 |
+ | 1.2552 | 8.5754 | 212500 | 1.1585 |
+ | 1.2383 | 8.6259 | 213750 | 1.1561 |
+ | 1.2406 | 8.6763 | 215000 | 1.1555 |
+ | 1.2329 | 8.7268 | 216250 | 1.1551 |
+ | 1.2392 | 8.7772 | 217500 | 1.1552 |
+ | 1.2301 | 8.8277 | 218750 | 1.1536 |
+ | 1.2262 | 8.8781 | 220000 | 1.1483 |
+ | 1.2284 | 8.9286 | 221250 | 1.1509 |
+ | 1.2259 | 8.9790 | 222500 | 1.1529 |
+ | 1.2204 | 9.0294 | 223750 | 1.1474 |
+ | 1.237 | 9.0799 | 225000 | 1.1471 |
+ | 1.2432 | 9.1303 | 226250 | 1.1439 |
+ | 1.2145 | 9.1808 | 227500 | 1.1473 |
+ | 1.2132 | 9.2312 | 228750 | 1.1428 |
+ | 1.2178 | 9.2817 | 230000 | 1.1426 |
+ | 1.2138 | 9.3321 | 231250 | 1.1416 |
+ | 1.2204 | 9.3825 | 232500 | 1.1422 |
+ | 1.2233 | 9.4330 | 233750 | 1.1402 |
+ | 1.2048 | 9.4834 | 235000 | 1.1370 |
+ | 1.2203 | 9.5339 | 236250 | 1.1389 |
+ | 1.2156 | 9.5843 | 237500 | 1.1375 |
+ | 1.2131 | 9.6348 | 238750 | 1.1367 |
+ | 1.2215 | 9.6852 | 240000 | 1.1387 |
+ | 1.2152 | 9.7356 | 241250 | 1.1347 |
+ | 1.2179 | 9.7861 | 242500 | 1.1321 |
+ | 1.2166 | 9.8365 | 243750 | 1.1359 |
+ | 1.2171 | 9.8870 | 245000 | 1.1343 |
+ | 1.208 | 9.9374 | 246250 | 1.1321 |
+ | 1.2105 | 9.9879 | 247500 | 1.1332 |
+
+
+ ### Framework versions
+
+ - Transformers 4.41.2
+ - Pytorch 2.3.1+cu121
+ - Datasets 2.20.0
+ - Tokenizers 0.19.1
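The batch-size figures in the hyperparameter list are internally consistent; a quick check in plain Python, using only the values reported above:

```python
# Effective (total) train batch size implied by the reported hyperparameters:
# per-device batch size x number of GPUs x gradient accumulation steps.
per_device_train_batch_size = 16
num_devices = 2
gradient_accumulation_steps = 8

total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
print(total_train_batch_size)  # 256, matching total_train_batch_size above
```

This also explains the evaluation cadence: with 256 sequences per optimizer step, the first logged checkpoint at step 1250 coincides with the end of the 1250-step linear warmup.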
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "architectures": [
+     "DebertaForMaskedLM"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-07,
+   "max_position_embeddings": 512,
+   "max_relative_positions": -1,
+   "model_type": "deberta",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "pooler_dropout": 0,
+   "pooler_hidden_act": "gelu",
+   "pooler_hidden_size": 768,
+   "pos_att_type": null,
+   "position_biased_input": true,
+   "relative_attention": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "vocab_size": 50265
+ }
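As a sanity check on this config (a back-of-the-envelope estimate only; it counts the embeddings and the 12 encoder layers but omits the masked-LM head), the parameter count works out to roughly 124M, consistent with the ~499 MB float32 `model.safetensors` file added below (about 124.7M 4-byte values, the gap being the MLM head):

```python
# Rough float32 parameter count implied by config.json above.
vocab_size, hidden, num_layers, intermediate, max_pos = 50265, 768, 12, 3072, 512

embeddings = (
    vocab_size * hidden   # word embeddings
    + max_pos * hidden    # absolute position embeddings (position_biased_input)
    + 2 * hidden          # embedding LayerNorm weight + bias
)
per_layer = (
    4 * (hidden * hidden + hidden)            # Q, K, V, and output projections
    + (hidden * intermediate + intermediate)  # FFN up-projection
    + (intermediate * hidden + hidden)        # FFN down-projection
    + 2 * (2 * hidden)                        # two LayerNorms
)
total = embeddings + num_layers * per_layer
print(f"{total / 1e6:.1f}M")  # ~124.1M parameters before the MLM head
```

Note that with `relative_attention: false` and `position_biased_input: true`, this config describes a BERT-style absolute-position encoder rather than DeBERTa's signature disentangled attention.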
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a3517d0dc452a1cac0943738643dfaca60190a70d2fb4c18568bc91e10eec64a
+ size 498763380
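What is committed here is a Git LFS pointer, not the weights: the three lines above are the entire file, and the blob itself lives in LFS storage keyed by the SHA-256 oid. A small sketch parsing that pointer format (the pointer text is copied verbatim from the diff):

```python
# Parse the git-lfs pointer file committed for model.safetensors.
pointer_text = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:a3517d0dc452a1cac0943738643dfaca60190a70d2fb4c18568bc91e10eec64a\n"
    "size 498763380\n"
)
# Each line is "key value"; split on the first space only.
fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
size_mb = int(fields["size"]) / 1e6
print(f"{size_mb:.1f} MB")  # ~498.8 MB checkpoint behind this pointer
```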
tokenizer_config.json CHANGED
@@ -50,6 +50,7 @@
    "eos_token": "[SEP]",
    "errors": "replace",
    "mask_token": "[MASK]",
+   "max_len": 512,
    "model_max_length": 1000000000000000019884624838656,
    "pad_token": "[PAD]",
    "sep_token": "[SEP]",
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:54c89b225753bbb5ee5bd45008c30a99f50fc99170186b5d871e8dcc7b8f597c
+ size 5176