metadata
language: es
license: gpl-3.0
tags:
- PyTorch
- Transformers
- Token Classification
- xlm-roberta
- xlm-roberta-large
widget:
- text: Fue antes de llegar a Sigüeiro, en el Camino de Santiago.
- text: Si te metes en el Franco desde la Alameda, vas hacia la Catedral.
- text: Y allí precisamente es Santiago el patrón del pueblo.
model-index:
- name: es_trf_ner_cds_xlm-large
  results: []
Introduction
This model is a fine-tuned version of xlm-roberta-large for Named Entity Recognition (NER) in the domain of tourism along the Way of Saint James (Camino de Santiago). It recognizes four entity types: location (LOC), organization (ORG), person (PER), and miscellaneous (MISC).
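Token-classification models like this one usually predict one tag per token, and those tags are then merged into entity spans. The sketch below shows that merging step, assuming an IOB2 label scheme ("B-LOC", "I-LOC", ..., "O"); this card does not state the model's exact label names, so treat the scheme and the helper name `decode_iob2` as hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: IOB2 tags over the four entity types listed in this card.
ENTITY_TYPES = {"LOC", "ORG", "PER", "MISC"}


def decode_iob2(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level IOB2 tags into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open span (if any) and reset the buffer.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, ent_type = tag.partition("-")
        if ent_type not in ENTITY_TYPES:
            raise ValueError(f"unexpected tag: {tag}")
        # A B- tag, or an I- tag of a different type, starts a new span.
        if prefix == "B" or ent_type != current_type:
            flush()
            current_type = ent_type
        current_tokens.append(token)
    flush()
    return spans


# Hand-written tags for illustration only (not actual model output).
tokens = ["Fue", "antes", "de", "llegar", "a", "Sigüeiro", ",",
          "en", "el", "Camino", "de", "Santiago", "."]
tags = ["O", "O", "O", "O", "O", "B-LOC", "O",
        "O", "O", "B-LOC", "I-LOC", "I-LOC", "O"]
print(decode_iob2(tokens, tags))
```

With `aggregation_strategy="simple"`, the Transformers pipeline performs a comparable grouping for you, so this is only meant to make the idea concrete.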
Usage
You can use this model with the Transformers NER pipeline:
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("es_trf_ner_cds_xlm-large")
model = AutoModelForTokenClassification.from_pretrained("es_trf_ner_cds_xlm-large")
example = "Fue antes de llegar a Sigüeiro, en el Camino de Santiago. Si te metes en el Franco desde la Alameda, vas hacia la Catedral. Y allí precisamente es Santiago el patrón del pueblo."
ner_pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
for ent in ner_pipe(example):
    print(ent)
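With `aggregation_strategy="simple"`, each result is a dictionary with keys such as `entity_group`, `score`, `word`, `start` and `end`. As a minimal post-processing sketch, the hypothetical helper below collects entity strings per type and drops low-confidence hits; the `min_score` threshold is an illustrative choice, not something this card recommends, and the sample dictionaries are simplified, hand-written stand-ins for real pipeline output.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(results: List[dict], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Collect entity strings per entity type from aggregated NER output."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in results:
        # Keep only predictions at or above the confidence threshold.
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Simplified sample in the shape of pipeline output (not real predictions).
sample = [
    {"entity_group": "LOC", "score": 0.99, "word": "Sigüeiro"},
    {"entity_group": "LOC", "score": 0.98, "word": "Camino de Santiago"},
    {"entity_group": "PER", "score": 0.62, "word": "Santiago"},
]
print(group_entities(sample, min_score=0.9))
```

In practice you would pass `ner_pipe(example)` instead of `sample`.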
Dataset
ToDo
Model performance
Entity | Precision | Recall | F1 |
---|---|---|---|
LOC | 0.973 | 0.983 | 0.978 |
MISC | 0.760 | 0.788 | 0.773 |
ORG | 0.885 | 0.701 | 0.783 |
PER | 0.937 | 0.878 | 0.906 |
micro avg | 0.953 | 0.958 | 0.955 |
macro avg | 0.889 | 0.838 | 0.860 |
weighted avg | 0.953 | 0.958 | 0.955 |
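The per-entity F1 values are the harmonic mean of precision and recall, and the macro average is the unweighted mean over the four entity types. The sketch below recomputes both from the rounded numbers in the table; because the original scores were presumably computed from unrounded values, recomputed figures can differ in the third decimal (e.g. MISC, ORG and PER). The micro and weighted averages cannot be reproduced here, since they depend on per-entity counts that this card does not report.

```python
from statistics import mean
from typing import Dict, Tuple

# Per-entity (precision, recall), copied from the table above.
PER_ENTITY: Dict[str, Tuple[float, float]] = {
    "LOC": (0.973, 0.983),
    "MISC": (0.760, 0.788),
    "ORG": (0.885, 0.701),
    "PER": (0.937, 0.878),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(scores: Dict[str, Tuple[float, float]]) -> Tuple[float, float, float]:
    """Unweighted mean of precision, recall and per-entity F1 over entity types."""
    precisions = [p for p, _ in scores.values()]
    recalls = [r for _, r in scores.values()]
    f1s = [f1(p, r) for p, r in scores.values()]
    return mean(precisions), mean(recalls), mean(f1s)


for name, (p, r) in PER_ENTITY.items():
    print(f"{name}: F1 = {f1(p, r):.3f}")
print("macro (P, R, F1):", [round(x, 3) for x in macro_average(PER_ENTITY)])
```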
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
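To make the schedule concrete, here is a minimal sketch of how a linear learning-rate schedule evolves with these settings. It assumes zero warmup steps (the usual default; the card lists no warmup) and no gradient accumulation, and it uses a hypothetical dataset size because the training set size is not given. The formula mirrors the linear warmup/decay schedule used by Transformers, but this is an illustration, not the training code used for this model.

```python
import math


def total_training_steps(num_examples: int, batch_size: int = 32, epochs: float = 3.0) -> int:
    """Optimizer steps for the run, assuming no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return math.ceil(steps_per_epoch * epochs)


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup (if any), then linear decay to 0."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical dataset of 10,000 training sentences (not the real size).
total = total_training_steps(10_000)
for step in (0, total // 2, total):
    print(step, linear_lr(step, total))
```

Under these assumptions the rate starts at 5e-05 and falls linearly to 0 at the final step.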
Framework versions
- Transformers 4.28.1
- Pytorch 2.0.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3