tomaarsen/span-marker-xlm-roberta-large-conllpp-doc-context

Token Classification

named-entity-recognition


tomaarsen (HF staff) committed on Aug 7, 2023

Commit c4cd982 (1 parent: 85dde0f)

Add limitation due to RoBERTa

Files changed (1)

README.md +15 -1


@@ -11,7 +11,7 @@ pipeline_tag: token-classification
 widget:
   - text: >-
       Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic
-      to Paris.
+      to Paris .
     example_title: Amelia Earhart
 model-index:
   - name: >-
@@ -71,4 +71,18 @@ model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-xlm-roberta-large
 entities = model.predict("Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris.")
 ```
+### Limitations
+**Warning**: This model works best when punctuation is separated from the prior words, so
+```python
+# ✅
+model.predict("He plays J. Robert Oppenheimer , an American theoretical physicist .")
+# ❌
+model.predict("He plays J. Robert Oppenheimer, an American theoretical physicist.")
+# You can also supply a list of words directly: ✅
+model.predict(["He", "plays", "J.", "Robert", "Oppenheimer", ",", "an", "American", "theoretical", "physicist", "."])
+```
+The same may be beneficial for some languages, such as splitting `"l'ocean Atlantique"` into `"l' ocean Atlantique"`.
 See the [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) repository for documentation and additional information on this library.
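
The limitation added in this commit means that raw text should be pre-split before it reaches `model.predict`. The following is a minimal sketch of one way to do that, not part of SpanMarker itself: `split_punctuation` is a hypothetical helper that peels trailing `, ; : ! ?` off each whitespace-separated chunk and treats a period as sentence-final only on the last chunk, so abbreviations such as `J.` stay intact. It is a naive heuristic and will miss cases such as multi-sentence input or quotes.

```python
import re

# Hypothetical helper, not part of the SpanMarker library.
# Matches a chunk that ends in one or more of , ; : ! ?
TRAILING_PUNCT = re.compile(r"^(.+?)([,;:!?]+)$")


def split_punctuation(text: str) -> list[str]:
    """Split text into words, separating trailing punctuation from words."""
    chunks = text.split()
    words = []
    for index, chunk in enumerate(chunks):
        is_last = index == len(chunks) - 1
        # Treat a period as sentence-final only on the last chunk,
        # so abbreviations like "J." in the middle are left alone.
        if is_last and len(chunk) > 1 and chunk.endswith("."):
            words.extend([chunk[:-1], "."])
            continue
        match = TRAILING_PUNCT.match(chunk)
        if match:
            words.append(match.group(1))
            # Each punctuation character becomes its own token.
            words.extend(match.group(2))
        else:
            words.append(chunk)
    return words


words = split_punctuation(
    "He plays J. Robert Oppenheimer, an American theoretical physicist."
)
print(words)
```

The resulting list has the same shape as the word list shown in the README example, so it could be passed as `model.predict(words)` once the model is loaded.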