---
language: es
license: gpl-3.0
tags:
  - PyTorch
  - Transformers
  - Token Classification
  - xlm-roberta
  - xlm-roberta-large
widget:
  - text: Fue antes de llegar a Sigüeiro, en el Camino de Santiago.
  - text: Si te metes en el Franco desde la Alameda, vas hacia la Catedral.
  - text: Y allí precisamente es Santiago el patrón del pueblo.
model-index:
  - name: es_trf_ner_cds_xlm-large
    results: []
---

## Introduction

This model is a fine-tuned version of xlm-roberta-large for Named Entity Recognition (NER) in the domain of tourism related to the Way of Saint James (Camino de Santiago). It recognizes four types of entities: locations (LOC), organizations (ORG), persons (PER), and miscellaneous (MISC).

## Usage

You can use this model with the Transformers pipeline for NER.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("es_trf_ner_cds_xlm-large")
model = AutoModelForTokenClassification.from_pretrained("es_trf_ner_cds_xlm-large")

example = "Fue antes de llegar a Sigüeiro, en el Camino de Santiago. Si te metes en el Franco desde la Alameda, vas hacia la Catedral. Y allí precisamente es Santiago el patrón del pueblo."

# "simple" aggregation merges sub-word tokens back into whole-entity spans
ner_pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

for ent in ner_pipe(example):
    print(ent)
```
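
Each entry yielded by the pipeline is a dictionary with `entity_group`, `score`, `word`, `start`, and `end` keys. As an illustration only (the score shown is hypothetical, not actual model output), the first entity of the example above would look roughly like:

```python
{'entity_group': 'LOC', 'score': 0.99, 'word': 'Sigüeiro', 'start': 22, 'end': 30}
```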

## Dataset

ToDo

## Model performance

| entity       | precision | recall | f1    |
|:-------------|----------:|-------:|------:|
| LOC          |     0.973 |  0.983 | 0.978 |
| MISC         |     0.760 |  0.788 | 0.773 |
| ORG          |     0.885 |  0.701 | 0.783 |
| PER          |     0.937 |  0.878 | 0.906 |
| micro avg    |     0.953 |  0.958 | 0.955 |
| macro avg    |     0.889 |  0.838 | 0.860 |
| weighted avg |     0.953 |  0.958 | 0.955 |
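
The card does not state which tool produced these figures; per-entity precision/recall/F1 for NER is commonly computed with `seqeval` over BIO-tagged sequences. A minimal sketch, assuming hypothetical gold and predicted tag sequences:

```python
from seqeval.metrics import classification_report

# Hypothetical gold and predicted BIO sequences, one list per sentence;
# in practice these would come from the evaluation split of the dataset.
y_true = [["B-LOC", "I-LOC", "O", "B-PER", "O"]]
y_pred = [["B-LOC", "I-LOC", "O", "B-PER", "O"]]

# Prints per-entity and averaged precision/recall/F1, as in the table above
print(classification_report(y_true, y_pred, digits=3))
```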

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they map onto `TrainingArguments`):

- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
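
For reference, a minimal sketch of how these settings map onto the standard Hugging Face `TrainingArguments` (the output path is a hypothetical placeholder; the actual training script is not published with this card):

```python
from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters listed above; this is an
# assumption about the setup, not the author's actual training script.
args = TrainingArguments(
    output_dir="es_trf_ner_cds_xlm-large",  # hypothetical output path
    learning_rate=5e-05,
    per_device_train_batch_size=32,  # assuming per-device batch sizes
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
)
```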

### Framework versions

- Transformers 4.28.1
- Pytorch 2.0.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3