---
license: apache-2.0
library_name: span-marker
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
pipeline_tag: token-classification
widget:
- text: >-
    Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic
    to Paris.
  example_title: Amelia Earhart
- text: >-
    Leonardo di ser Piero da Vinci painted the Mona Lisa based on Italian noblewoman
    Lisa del Giocondo.
  example_title: Leonardo da Vinci
- text: >-
    On June 13th, 2014, at 4:44 pm during the 2014 World Cup held in Salvador, Brazil,
    the legendary soccer player, Robin van Persie, representing the Dutch national team,
    scored a remarkable goal in the 44th minute.
  example_title: Robin van Persie
model-index:
- name: >-
    SpanMarker w. roberta-large on OntoNotes v5.0 by Tom Aarsen
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      type: tner/ontonotes5
      name: OntoNotes v5.0
      split: test
      revision: cf9ef57ad260810be1298ba795d83c09a915e959
    metrics:
    - type: f1
      value: 0.9153
      name: F1
    - type: precision
      value: 0.9116
      name: Precision
    - type: recall
      value: 0.9191
      name: Recall
datasets:
- tner/ontonotes5
language:
- en
metrics:
- f1
- recall
- precision
---

# SpanMarker for Named Entity Recognition

This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model for Named Entity Recognition (NER), trained on OntoNotes v5.0. It uses [roberta-large](https://huggingface.co/roberta-large) as the underlying encoder. See [train.py](train.py) for the training script.

## Usage

To use this model for inference, first install the `span_marker` library:

```bash
pip install span_marker
```

You can then run inference with this model like so:

```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-ontonotes5")
# Run inference
entities = model.predict("Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris.")
```

### Limitations

**Warning**: This model works best when punctuation is separated from the preceding word by a space:

```python
# ✅
model.predict("He plays J. Robert Oppenheimer , an American theoretical physicist .")
# ❌
model.predict("He plays J. Robert Oppenheimer, an American theoretical physicist.")

# You can also supply a list of words directly: ✅
model.predict(["He", "plays", "J.", "Robert", "Oppenheimer", ",", "an", "American", "theoretical", "physicist", "."])
```
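If your input is raw text, a small pre-tokenization step can produce the list-of-words form. The helper below is a minimal sketch under our own assumptions and is not part of `span_marker`. The name `split_punctuation` is hypothetical. It only detaches *trailing* punctuation from whitespace-separated words, and it treats a single capital letter followed by a period (such as `J.`) as an initial that stays intact. On the example sentence it reproduces the word list passed to `model.predict`.

```python
import re

# Hypothetical helper, not part of span_marker.
# A single capital letter followed by a period (e.g. "J.") is kept as an initial.
_INITIAL = re.compile(r"^[A-Z]\.$")
# A word followed by one or more trailing punctuation characters.
_TRAILING = re.compile(r"^(.*?)([.,;:!?)\]]+)$")


def split_punctuation(text: str) -> list[str]:
    """Split text on whitespace and detach trailing punctuation from each word."""
    words = []
    for token in text.split():
        if _INITIAL.match(token):
            words.append(token)  # keep initials such as "J." intact
            continue
        match = _TRAILING.match(token)
        if match and match.group(1):
            words.append(match.group(1))  # the word itself
            words.extend(match.group(2))  # each punctuation character as its own token
        else:
            words.append(token)  # no trailing punctuation, or punctuation only
    return words


print(split_punctuation("He plays J. Robert Oppenheimer, an American theoretical physicist."))
```

The resulting list can be passed straight to the model, e.g. `model.predict(split_punctuation(text))`. A dedicated tokenizer such as spaCy or NLTK will handle more edge cases, for example leading brackets or multi-period abbreviations like `U.S.`, which this sketch splits incorrectly.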

Separating tokens in the same way may also help for some languages, for example splitting `"l'ocean Atlantique"` into `"l' ocean Atlantique"`.
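For elided forms, a regular expression can insert the space automatically. This is again a hypothetical sketch: `split_elisions` is not a `span_marker` function. It assumes French-style elisions (`c'`, `d'`, `j'`, `l'`, `m'`, `n'`, `s'`, `t'`, `qu'`, with a straight or curly apostrophe) directly followed by a word. English contractions such as `I'm` or `don't` are left untouched.

```python
import re

# Hypothetical helper, not part of span_marker.
# Matches an elided article/pronoun at a word boundary, followed directly by a word.
_ELISION = re.compile(r"\b(c|d|j|l|m|n|s|t|qu)(['’])(?=\w)", re.IGNORECASE)


def split_elisions(text: str) -> str:
    """Insert a space after French-style elisions, e.g. "l'ocean" -> "l' ocean"."""
    return _ELISION.sub(r"\1\2 ", text)


print(split_elisions("l'ocean Atlantique"))
```

The output string can then be tokenized further, or passed to `model.predict` as usual.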

See the [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) repository for documentation and additional information on this library.