--- |
|
license: mit |
|
base_model: FacebookAI/xlm-roberta-large |
|
model-index: |
|
- name: xlm-roberta-large-finetuned-wikiner-fr |
|
results: [] |
|
datasets: |
|
- Alizee/wikiner_fr_mixed_caps |
|
pipeline_tag: token-classification |
|
language: |
|
- fr |
|
library_name: transformers |
|
--- |
|
|
|
|
|
|
# xlm-roberta-large-finetuned-wikiner-fr |
|
|
|
This model is a fine-tuned version of [xlm-roberta-large](https://huggingface.co/xlm-roberta-large) on the [Alizee/wikiner_fr_mixed_caps](https://huggingface.co/datasets/Alizee/wikiner_fr_mixed_caps) dataset.
|
|
|
|
|
## Why this model? |
|
|
|
Credit to [Jean-Baptiste](https://huggingface.co/Jean-Baptiste) for building [camembert-ner](https://huggingface.co/Jean-Baptiste/camembert-ner), the current reference model for French NER, trained on the wikiNER dataset ([Jean-Baptiste/wikiner_fr](https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr)).
|
|
|
The xlm-roberta-large models fine-tuned on CoNLL-03 [English](https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-english) and especially [German](https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-german) outperformed camembert-ner on my own tasks. This inspired me to fine-tune xlm-roberta-large on the wikiNER dataset, in the hope of establishing a slightly improved standard for French 4-entity NER.
|
|
|
|
|
## Intended uses & limitations |
|
|
|
4-entity NER for French, with the following tags: |
|
|
|
| Abbreviation | Description               |
|:------------:|:--------------------------|
| O            | Outside of a named entity |
| MISC         | Miscellaneous entity      |
| PER          | Person’s name             |
| ORG          | Organization              |
| LOC          | Location                  |
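
The snippet below is a minimal usage sketch with the `transformers` pipeline API. The repo id passed to `model=` is illustrative and should be replaced with the actual Hub path of this model; `aggregation_strategy="simple"` merges sub-word tokens back into whole entities.

```python
from transformers import pipeline

# Illustrative repo id -- replace with the actual Hub path of this model.
ner = pipeline(
    "token-classification",
    model="xlm-roberta-large-finetuned-wikiner-fr",
    aggregation_strategy="simple",  # merge sub-word tokens into whole entities
)

print(ner("Emmanuel Macron a visité le siège de l'ONU à New York."))
# Each result is a dict with entity_group (PER/ORG/LOC/MISC), score,
# word, and the start/end character offsets in the input string.
```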
|
|
|
## Performance |
|
|
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.0518 |
|
- Precision: 0.8881 |
|
- Recall: 0.9014 |
|
- F1: 0.8947 |
|
- Accuracy: 0.9855 |
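
Entity-level precision, recall, and F1 such as those above are conventionally computed with `seqeval`; the sketch below illustrates the idea on toy IOB2 sequences (assumption: the reported numbers come from a seqeval-style evaluation, as is standard for Trainer-based NER runs).

```python
from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy gold and predicted tag sequences in IOB2 format, one list per sentence.
y_true = [["B-PER", "I-PER", "O", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O", "B-ORG"]]

# Precision/recall/F1 are entity-level: a prediction counts only if both
# the span and the type match. Accuracy is token-level.
print(precision_score(y_true, y_pred))  # 0.5
print(recall_score(y_true, y_pred))     # 0.5
print(f1_score(y_true, y_pred))         # 0.5
print(accuracy_score(y_true, y_pred))   # 0.8
```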
|
|
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 1.5e-05 |
|
- train_batch_size: 32 |
|
- eval_batch_size: 32 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_ratio: 0.02 |
|
- num_epochs: 3 |
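
For reference, here is how these values map onto `TrainingArguments` in a standard Trainer-based setup. This is a sketch, not the exact training script, and `output_dir` is illustrative; the Adam betas and epsilon listed above are the optimizer defaults.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="xlm-roberta-large-finetuned-wikiner-fr",  # illustrative
    learning_rate=1.5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.02,
    num_train_epochs=3,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the library defaults.
)
```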
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:------:|:------:|:--------:|
| 0.1032 | 0.1 | 374 | 0.0853 | 0.7645 | 0.8170 | 0.7899 | 0.9742 |
| 0.0767 | 0.2 | 748 | 0.0721 | 0.8111 | 0.8423 | 0.8264 | 0.9785 |
| 0.074 | 0.3 | 1122 | 0.0655 | 0.8252 | 0.8502 | 0.8375 | 0.9797 |
| 0.0634 | 0.4 | 1496 | 0.0629 | 0.8423 | 0.8694 | 0.8556 | 0.9809 |
| 0.0605 | 0.5 | 1870 | 0.0610 | 0.8515 | 0.8711 | 0.8612 | 0.9808 |
| 0.0578 | 0.6 | 2244 | 0.0594 | 0.8633 | 0.8744 | 0.8688 | 0.9822 |
| 0.0592 | 0.7 | 2618 | 0.0555 | 0.8624 | 0.8833 | 0.8727 | 0.9825 |
| 0.0567 | 0.8 | 2992 | 0.0534 | 0.8626 | 0.8838 | 0.8731 | 0.9830 |
| 0.0522 | 0.9 | 3366 | 0.0563 | 0.8560 | 0.8771 | 0.8664 | 0.9818 |
| 0.0516 | 1.0 | 3739 | 0.0556 | 0.8702 | 0.8869 | 0.8785 | 0.9831 |
| 0.0438 | 1.0 | 3740 | 0.0558 | 0.8712 | 0.8873 | 0.8792 | 0.9831 |
| 0.0395 | 1.1 | 4114 | 0.0565 | 0.8696 | 0.8856 | 0.8775 | 0.9830 |
| 0.0371 | 1.2 | 4488 | 0.0536 | 0.8762 | 0.8910 | 0.8835 | 0.9838 |
| 0.0403 | 1.3 | 4862 | 0.0531 | 0.8709 | 0.8887 | 0.8797 | 0.9835 |
| 0.0366 | 1.4 | 5236 | 0.0517 | 0.8791 | 0.8912 | 0.8851 | 0.9843 |
| 0.037 | 1.5 | 5610 | 0.0510 | 0.8830 | 0.8936 | 0.8883 | 0.9847 |
| 0.0368 | 1.6 | 5984 | 0.0492 | 0.8795 | 0.8940 | 0.8867 | 0.9845 |
| 0.0359 | 1.7 | 6358 | 0.0501 | 0.8833 | 0.8986 | 0.8909 | 0.9850 |
| 0.034 | 1.8 | 6732 | 0.0496 | 0.8852 | 0.8986 | 0.8918 | 0.9852 |
| 0.0327 | 1.9 | 7106 | 0.0512 | 0.8762 | 0.8948 | 0.8854 | 0.9843 |
| 0.0325 | 2.0 | 7478 | 0.0512 | 0.8829 | 0.8945 | 0.8887 | 0.9844 |
| 0.01 | 2.0 | 7480 | 0.0512 | 0.8836 | 0.8945 | 0.8890 | 0.9843 |
| 0.0232 | 2.1 | 7854 | 0.0526 | 0.8870 | 0.9002 | 0.8936 | 0.9852 |
| 0.0235 | 2.2 | 8228 | 0.0530 | 0.8841 | 0.8983 | 0.8911 | 0.9848 |
| 0.0211 | 2.3 | 8602 | 0.0542 | 0.8875 | 0.9008 | 0.8941 | 0.9852 |
| 0.0235 | 2.4 | 8976 | 0.0525 | 0.8883 | 0.9008 | 0.8945 | 0.9855 |
| 0.0232 | 2.5 | 9350 | 0.0525 | 0.8874 | 0.9013 | 0.8943 | 0.9855 |
| 0.0238 | 2.6 | 9724 | 0.0517 | 0.8861 | 0.9011 | 0.8935 | 0.9854 |
| 0.0223 | 2.7 | 10098 | 0.0513 | 0.8893 | 0.9016 | 0.8954 | 0.9856 |
| 0.0226 | 2.8 | 10472 | 0.0517 | 0.8892 | 0.9017 | 0.8954 | 0.9856 |
| 0.0228 | 2.9 | 10846 | 0.0517 | 0.8879 | 0.9013 | 0.8945 | 0.9855 |
| 0.0235 | 3.0 | 11217 | 0.0518 | 0.8881 | 0.9014 | 0.8947 | 0.9855 |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.36.2 |
|
- Pytorch 2.0.1 |
|
- Datasets 2.16.1 |
|
- Tokenizers 0.15.0 |