---
language: sk
license: mit
datasets:
- oscar
---

# SlovakT5-small
This model was trained with a slightly adapted version of [run_t5_mlm_flax.py](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling), the Flax T5 masked language modeling example from Hugging Face Transformers.
For training details and evaluation results, see [SlovakT5_report.pdf](https://huggingface.co/ApoTro/slovak-t5-small/resolve/main/SlovakT5_report.pdf). To run the evaluation yourself, use [SlovakT5_eval.ipynb](https://colab.research.google.com/github/richardcepka/notebooks/blob/main/SlovakT5_eval.ipynb).

### How to use
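SlovakT5-small can be fine-tuned for many downstream tasks. For example, named entity recognition (NER) can be framed as a text-to-text task, and the loss for a single training example computed like this: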
```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("ApoTro/slovak-t5-small")
model = T5ForConditionalGeneration.from_pretrained("ApoTro/slovak-t5-small")

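# Source text: a task prefix ("ner veta:", i.e. "ner sentence:") followed by the Slovak sentence
input_ids = tokenizer("ner veta: Do druhého kola postúpili Robert Fico a Andrej Kiska s rozdielom 4,0%.", return_tensors="pt").input_ids
# Target text: the entities to extract, written as "label: text" pairs separated by " | "
labels = tokenizer("per: Robert Fico | per: Andrej Kiska", return_tensors="pt").input_ids

# the forward function automatically creates the correct decoder_input_ids
loss = model(input_ids=input_ids, labels=labels).loss
print(loss.item())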
```
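
The NER example above encodes entities as plain text, so a fine-tuning pipeline needs some way to build those strings and to read them back from generated output. The following is a minimal sketch of that idea, not part of the original training code. The helper names `build_example` and `parse_entities` are hypothetical, and the `"label: text | label: text"` layout is inferred from the single example above; the model card does not specify the full label set or how entities containing `|` should be handled.

```python
from typing import List, Tuple

# Assumptions taken from the single example in this model card.
TASK_PREFIX = "ner veta: "  # task prefix prepended to the input sentence
ENTITY_SEP = " | "          # separator between entities in the target text


def build_example(sentence: str, entities: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Return (source, target) strings for one text-to-text NER example."""
    source = TASK_PREFIX + sentence
    target = ENTITY_SEP.join(f"{label}: {text}" for label, text in entities)
    return source, target


def parse_entities(decoded: str) -> List[Tuple[str, str]]:
    """Parse 'label: text | label: text' back into (label, text) pairs."""
    entities = []
    for chunk in decoded.split("|"):
        # Split on the first colon only, so entity text may contain colons.
        label, sep, text = chunk.partition(":")
        label, text = label.strip(), text.strip()
        # Skip malformed chunks instead of raising, since generated text can be noisy.
        if sep and label and text:
            entities.append((label, text))
    return entities


source, target = build_example(
    "Do druhého kola postúpili Robert Fico a Andrej Kiska s rozdielom 4,0%.",
    [("per", "Robert Fico"), ("per", "Andrej Kiska")],
)
print(source)
print(target)
print(parse_entities(target))
```

The `source` and `target` strings can be passed to the tokenizer in place of the literals used above, and `parse_entities` can be applied to text decoded from a fine-tuned model's output.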