HiTZ

Text Generation
Transformers
PyTorch
Basque
English
llama
text-generation-inference
Inference Endpoints
OSainz committed
Commit dc57277
1 Parent(s): 1a38944

Update README.md

Files changed (1)
  1. README.md +404 -135
README.md CHANGED
@@ -9,199 +9,468 @@ metrics:
  - accuracy
  - f1
  - perplexity
  ---

- # Model Card for Model ID
- <!-- Provide a quick summary of what the model is/does. -->
- This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
- ## Model Details
- ### Model Description
- <!-- Provide a longer summary of what this model is. -->
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
- ### Model Sources [optional]
- <!-- Provide the basic links for the model. -->
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
- ## Uses
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
- ### Direct Use
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
- [More Information Needed]
- ### Downstream Use [optional]
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
- [More Information Needed]
- ### Out-of-Scope Use
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
- [More Information Needed]
- ## Bias, Risks, and Limitations
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
- [More Information Needed]
- ### Recommendations
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
- ## How to Get Started with the Model
  Use the code below to get started with the model.
- [More Information Needed]
- ## Training Details
- ### Training Data
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
- [More Information Needed]
- ### Training Procedure
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
- #### Preprocessing [optional]
- [More Information Needed]
- #### Training Hyperparameters
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
- #### Speeds, Sizes, Times [optional]
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
- [More Information Needed]
- ## Evaluation
- <!-- This section describes the evaluation protocols and provides the results. -->
- ### Testing Data, Factors & Metrics
- #### Testing Data
- <!-- This should link to a Dataset Card if possible. -->
- [More Information Needed]
- #### Factors
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
- [More Information Needed]
- #### Metrics
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
- [More Information Needed]
- ### Results
- [More Information Needed]
- #### Summary
- ## Model Examination [optional]
- <!-- Relevant interpretability work for the model goes here -->
- [More Information Needed]
- ## Environmental Impact
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
- ## Technical Specifications [optional]
- ### Model Architecture and Objective
- [More Information Needed]
- ### Compute Infrastructure
- [More Information Needed]
- #### Hardware
- [More Information Needed]
- #### Software
- [More Information Needed]
- ## Citation [optional]
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
- **BibTeX:**
- [More Information Needed]
- **APA:**
- [More Information Needed]
- ## Glossary [optional]
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
- [More Information Needed]
- ## More Information [optional]
- [More Information Needed]
- ## Model Card Authors [optional]
- [More Information Needed]
- ## Model Card Contact
- [More Information Needed]

  - accuracy
  - f1
  - perplexity
+ pipeline_tag: text-generation
  ---

+ # **Model Card for Basque Llama 13b**

+ Basque LLaMA is a collection of foundation models specifically tuned for Basque. Based on Meta’s LLaMA 2 model family, these models were further trained on EusCrawl, a highly curated Basque corpus ([Artetxe et al., 2022](https://aclanthology.org/2022.emnlp-main.499/)). Ranging from 7 billion to 70 billion parameters, these models are currently the biggest and best-performing LLMs built for Basque. This is the 13B repository; links to the other models can be found in the index at the bottom.

+ # **Model Details**

+ ## **Model Description**

+ Basque LLaMA is a family of Large Language Models (LLMs) based on Meta’s [LLaMA models](https://huggingface.co/meta-llama). Current LLMs exhibit incredible performance for high-resource languages such as English, but, for Basque and other low-resource languages, their performance is close to that of a random guesser. These limitations widen the gap between high- and low-resource languages when it comes to digital development. We present Basque LLaMA to overcome these limitations and to promote the development of LLM-based technology and research for the Basque language. Basque LLaMA models follow the same architecture as their original counterparts and were further trained on EusCrawl v1 ([Artetxe et al., 2022](https://aclanthology.org/2022.emnlp-main.499/)), a high-quality Basque corpus.

+ The models are released in three sizes: 7B, 13B and 70B.

+ * **Developed by:** HiTZ Research Center & IXA Research group (University of the Basque Country UPV/EHU)
+ * **Model type:** Language model
+ * **Language(s) (NLP):** en, eu
+ * **License:** llama2
+ * **Parent Model:** meta-llama/Llama-2-13b
+ * **Contact:** [email protected]

+ ## **Getting started**

  Use the code below to get started with the model.
+ ```python
+ # Load the model with the Transformers text-generation pipeline and generate a continuation.
+ from transformers import pipeline
+
+ pipe = pipeline("text-generation", model="HiTZ/basque-llama-2-7b-v1")
+
+ text = "Euskara adimen artifizialera iritsi da!"
+
+ pipe(text, max_new_tokens=50, num_beams=5)
+ >> [
+     {
+         'generated_text': 'Euskara adimen artifizialera iritsi da!\nEuskararen eta adimen artifizialaren arteko harremana aspaldikoa da,'
+                           ' baina azken urteotan aurrerapauso handiak eman dira arlo horretan'
+     }
+ ]
+ ```

+ # **Uses**

+ Basque LLaMA models are intended to be used with Basque data; for any other language the performance is not guaranteed. Like the original models, Basque LLaMA inherits the [LLaMA-2 License](https://ai.meta.com/llama/license/), which allows for commercial and research use.

+ ## **Direct Use**

+ Basque LLaMA family models are pre-trained LLMs without any task-specific or instruction fine-tuning. That is, the model can either be prompted to perform a specific task or further fine-tuned for specific use cases.
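
+ As an illustration of the prompting route, here is a minimal few-shot sketch built on the same `pipeline` call shown above; the prompt format, the sentiment labels and the example sentences are our own illustrative choices, not a format prescribed by the model.

+ ```python
+ from transformers import pipeline
+
+ pipe = pipeline("text-generation", model="HiTZ/basque-llama-2-7b-v1")
+
+ # A hand-written prompt with two labelled examples; the model is expected to complete the last label.
+ prompt = (
+     "Esaldia: Eguraldi ederra dago gaur.\nSentimendua: positiboa\n\n"
+     "Esaldia: Oso haserre nago zerbitzuarekin.\nSentimendua: negatiboa\n\n"
+     "Esaldia: Film hau ikaragarri gustatu zait.\nSentimendua:"
+ )
+ print(pipe(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"])
+ ```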

+ ## **Out-of-Scope Use**

+ The model was not fine-tuned to follow instructions or to work as a chat assistant; therefore, this kind of usage is neither tested nor recommended.

+ # **Bias, Risks, and Limitations**

+ In an effort to alleviate potentially disturbing or harmful content, Basque LLaMA has been trained on carefully selected and processed data, which comes mainly from local media, national/regional newspapers, encyclopedias and blogs (see EusCrawl below). Still, the model is based on the LLaMA models and can potentially carry the same biases, risks and limitations.

+ Please see LLaMA’s _Ethical Considerations and Limitations_ for further information.

+ # **Training Details**

+ ## **Training Data**

+ The models were trained on EusCrawl v1, a high-quality corpus for Basque comprising 1.72M documents and 288M words, totalling 2.1GiB of uncompressed text. EusCrawl was built using ad-hoc scrapers to extract text from 33 Basque websites with high-quality content, resulting in cleaner text than general-purpose approaches.

+ See more details in the [EusCrawl](https://huggingface.co/datasets/HiTZ/euscrawl) dataset card.

+ Additionally, 100K documents of English data randomly selected from the [Pile](https://huggingface.co/datasets/EleutherAI/pile) dataset were included to avoid catastrophic forgetting.
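
+ As a quick way to inspect the pretraining corpus, the sketch below loads EusCrawl with the `datasets` library; the split name and column layout are assumed to follow standard Hugging Face dataset conventions and are not taken from this card.

+ ```python
+ from datasets import load_dataset
+
+ # Load the EusCrawl corpus referenced above (assumes a standard "train" split).
+ euscrawl = load_dataset("HiTZ/euscrawl", split="train")
+
+ print(euscrawl.num_rows)       # number of documents
+ print(euscrawl.column_names)   # available fields
+ print(euscrawl[0])             # first document
+ ```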

+ ## **Training Procedure**

+ The models were trained using the [GPT-NeoX](https://github.com/EleutherAI/gpt-neox) library on the CINECA HPC computing cluster. All models were trained with an effective batch size of approximately 2M tokens for 1000 to 2000 steps.

+ | Model | Steps | Sequence length | Effective batch size | Total tokens | GPU hours |
+ |---|---:|---:|---:|---:|---:|
+ | Basque LLaMA 7B | 2000 | 4096 | 2M tokens/step | 4B | 359.2h |
+ | Basque LLaMA 13B | 1000 | 4096 | 2M tokens/step | 2B | 468.8h |
+ | Basque LLaMA 70B | 1680 | 4096 | 2M tokens/step | 3.4B | 6475.52h* |

+ \* The time reported for the 70B model covers the entire training run (2000 steps); the released weights are from step 1680, the best checkpoint according to validation loss.
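
+ As a sanity check, the "Total tokens" column follows directly from steps × effective batch size (2M tokens/step):

+ ```python
+ # Back-of-the-envelope check of the table above: total tokens = steps x 2M tokens/step.
+ steps = {"Basque LLaMA 7B": 2000, "Basque LLaMA 13B": 1000, "Basque LLaMA 70B": 1680}
+ tokens_per_step = 2_000_000
+ for name, n in steps.items():
+     print(f"{name}: {n * tokens_per_step / 1e9:.2f}B tokens")
+ # Basque LLaMA 7B: 4.00B, 13B: 2.00B, 70B: 3.36B (reported as 3.4B)
+ ```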

+ # **Evaluation**

+ We evaluated the models in zero-shot and few-shot settings on generative, multiple-choice and classification tasks, using the Basque partitions of each dataset.

+ ## **Testing Data, Factors & Metrics**

+ ### **Testing Data**
+ * **Belebele** ([Bandarkar et al.](https://arxiv.org/abs/2308.16884)): Belebele is a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. We evaluated the model in a 5-shot fashion (see the loading sketch after this list).
+   * Data card: [https://huggingface.co/datasets/facebook/belebele](https://huggingface.co/datasets/facebook/belebele)
+ * **X-StoryCloze**: XStoryCloze consists of professionally translated versions of the English StoryCloze dataset into 10 non-English languages. StoryCloze is a commonsense reasoning task that consists of choosing the correct ending to a four-sentence story. We evaluated the model in a 0-shot fashion.
+   * Data card: [https://huggingface.co/datasets/juletxara/xstory_cloze](https://huggingface.co/datasets/juletxara/xstory_cloze)
+ * **BasqueGLUE** ([Urbizu et al.](https://aclanthology.org/2022.lrec-1.172.pdf)): BasqueGLUE is an NLU benchmark for Basque. We evaluated the model in a 5-shot fashion on the following tasks:
+   * Data card: [https://huggingface.co/datasets/orai-nlp/basqueGLUE](https://huggingface.co/datasets/orai-nlp/basqueGLUE)
+   * Tasks:
+     * **BEC2016eu**: Sentiment analysis on tweets about the 2016 Basque election campaign.
+     * **VaxxStance**: Stance detection on tweets around the anti-vaccine movement.
+     * **BHTCv2**: Topic classification of news extracts with 12 categories.
+     * **EpecKorrefBin**: Coreference detection task similar to WSC.
+     * **QNLIeu**: Q&A NLI built from the Basque Wikipedia.
+     * **WiCeu**: Basque Word-in-Context task.
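
+ The evaluation datasets are available on the Hugging Face Hub; as a sketch, the snippet below loads the Basque split of Belebele with `datasets`. The configuration name `eus_Latn` and the `test` split are assumptions based on Belebele's FLORES-style language codes; check the data card linked above for the exact names.

+ ```python
+ from datasets import load_dataset
+
+ # Basque Belebele split (config and split names assumed; see the facebook/belebele data card).
+ belebele_eu = load_dataset("facebook/belebele", "eus_Latn", split="test")
+
+ print(belebele_eu.num_rows)
+ print(belebele_eu[0])   # passage, question and four answer options
+ ```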

+ ### **Metrics**
+ * **Accuracy**: Belebele, X-StoryCloze, EpecKorrefBin, QNLIeu and WiCeu
+ * **Micro F1**: BEC2016eu and BHTCv2
+ * **Macro F1**: VaxxStance (favor & against)
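
+ For reference, the sketch below illustrates how these metrics are computed (with scikit-learn, on toy labels); it is not the evaluation code itself.

+ ```python
+ from sklearn.metrics import accuracy_score, f1_score
+
+ y_true = [0, 1, 2, 1, 0, 2]   # toy gold labels
+ y_pred = [0, 1, 1, 1, 0, 2]   # toy predictions
+
+ print(accuracy_score(y_true, y_pred))                            # accuracy (Belebele, X-StoryCloze, ...)
+ print(f1_score(y_true, y_pred, average="micro"))                 # micro F1 (BEC2016eu, BHTCv2)
+ print(f1_score(y_true, y_pred, average="macro", labels=[0, 1]))  # macro F1 over a label subset (VaxxStance favor & against)
+ ```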

+ ## **Results**

+ The model was evaluated using the LM Evaluation Harness library from EleutherAI. To reproduce our results, please refer to our [fork](https://github.com/naiarapm/lm-evaluation-harness/tree/basqueglue), which includes the implementation of the datasets mentioned above.

+ | Model | Belebele | X-StoryCloze | BEC | Vaxx | BHTC | coref | QNLI | WiC | Average |
+ |---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
+ | Random | 25.00 | 50.00 | 33.33 | 33.33 | 8.33 | 50.00 | 50.00 | 50.00 | 37.50 |
+ | LLaMA 2 7B | 26.22 | 50.43 | 41.63 | 18.60 | 20.06 | 50.94 | 48.32 | 49.64 | 38.23 |
+ | LLaMA 2 13B | 32.00 | 50.63 | 41.09 | 18.25 | 27.35 | 49.23 | 48.74 | 49.21 | 39.56 |
+ | LLaMA 2 70B | 33.56 | 51.62 | 47.47 | 21.01 | 31.01 | 52.98 | 51.26 | 51.57 | 42.56 |
+ | BLOOM 7B | 27.00 | 57.18 | 37.94 | 20.72 | 39.10 | 48.21 | 47.48 | 47.57 | 40.65 |
+ | XGLM 7B | 23.88 | 57.71 | 39.94 | 21.58 | 36.73 | 50.94 | 50.42 | 49.21 | 41.30 |
+ | **Basque LLaMA 7B** | 35.67 | 63.13 | 55.61 | 45.93 | 44.44 | 50.43 | 55.04 | 50.14 | 50.05 |
+ | **Basque LLaMA 13B** | 53.56 | 65.85 | 53.23 | 48.66 | **53.61** | 62.52 | 57.14 | 54.21 | 56.10 |
+ | **Basque LLaMA 70B** | **71.78** | **67.57** | **63.52** | **48.95** | 49.51 | **79.90** | **58.82** | **55.50** | **61.94** |

+ # **Environmental Impact**

+ Carbon emissions are estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

+ * **Hardware Type:** HPC cluster, 4x A100 64GB nodes
+ * **Hours used:** 359.2h + 468.8h + 6475.52h = 7303.52h
+ * **Compute cluster:** CINECA HPC
+ * **Compute Region:** Italy
+ * **Carbon Emitted:** 673.75 kg CO<sub>2</sub> eq
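
+ For context, the reported figure is roughly what the GPU-hours × power × grid-carbon-intensity formula behind the calculator gives; the power draw and carbon-intensity values below are our own assumptions, not numbers taken from this card.

+ ```python
+ # Rough reconstruction of the estimate above; 0.4 kW (A100 SXM board power) and
+ # 0.23 kg CO2eq/kWh (approximate Italian grid intensity) are assumed values.
+ gpu_hours = 359.2 + 468.8 + 6475.52            # = 7303.52 h, as listed above
+ power_kw = 0.4
+ kg_co2_per_kwh = 0.23
+ print(gpu_hours * power_kw * kg_co2_per_kwh)   # ~672 kg CO2eq (reported: 673.75 kg)
+ ```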

+ # **Acknowledgements**

+ This work has been partially supported by the Basque Government (IKER-GAITU project). The models were trained on the Leonardo supercomputer at CINECA under the EuroHPC Joint Undertaking, project EHPC-EXT-2023E01-013.