Model and entities
roberta_classics_ner is a domain-specific RoBERTa-based model for named entity recognition in Classical Studies. It recognises bibliographical entities, such as:
id | label | description | example |
---|---|---|---|
0 | 'O' | Out of entity | |
1 | 'B-AAUTHOR' | Ancient authors | Herodotus |
2 | 'I-AAUTHOR' | | |
3 | 'B-AWORK' | The title of an ancient work | Symposium, Aeneid |
4 | 'I-AWORK' | | |
5 | 'B-REFAUWORK' | A structured reference to an ancient work | Homer, Il. |
6 | 'I-REFAUWORK' | | |
7 | 'B-REFSCOPE' | The scope of a reference | II.1.993a30–b11 |
8 | 'I-REFSCOPE' | | |
9 | 'B-FRAGREF' | A reference to fragmentary texts or scholia | Frag. 19. West |
10 | 'I-FRAGREF' | | |
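The label inventory above corresponds to the kind of id-to-label mapping used in token-classification configurations. The following is a minimal sketch of that mapping; the names `ID2LABEL`, `LABEL2ID` and `entity_type` are illustrative and are not taken from the model's own code.

```python
from typing import Optional

# Label inventory copied from the table above (ids 0 to 10).
ID2LABEL = {
    0: "O",
    1: "B-AAUTHOR",
    2: "I-AAUTHOR",
    3: "B-AWORK",
    4: "I-AWORK",
    5: "B-REFAUWORK",
    6: "I-REFAUWORK",
    7: "B-REFSCOPE",
    8: "I-REFSCOPE",
    9: "B-FRAGREF",
    10: "I-FRAGREF",
}

# Reverse lookup, built from the mapping above.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def entity_type(label: str) -> Optional[str]:
    """Strip the BIO prefix: 'B-AWORK' gives 'AWORK', 'O' gives None."""
    if label == "O":
        return None
    return label.split("-", 1)[1]
```

Each entity type has a `B-` (beginning) and an `I-` (inside) label, so a multi-token entity is tagged with one `B-` label followed by `I-` labels of the same type.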
Example
B-AAUTHOR B-AWORK B-REFSCOPE
Homer 's Iliad opens with an invocation to the muse ( 1. 1).
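To make the tagging scheme concrete, the sketch below groups BIO tags into entity spans for the example sentence. Only the three `B-` tags come from the example; the tokenisation, the `O` tags and the `I-REFSCOPE` tag on the second part of the scope are assumptions filled in for illustration, and `bio_to_spans` is a hypothetical helper, not part of the model.

```python
def bio_to_spans(tokens, tags):
    """Group BIO tags into (entity_type, start, end, text) spans; end is exclusive."""
    spans = []
    start, current = None, None
    # A trailing "O" sentinel flushes a span that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and current == tag[2:]:
            continue  # still inside the current entity
        if current is not None:
            spans.append((current, start, i, " ".join(tokens[start:i])))
            start, current = None, None
        if tag.startswith("B-"):
            start, current = i, tag[2:]
        # A stray I- tag with no matching B- is ignored in this sketch.
    return spans


# Assumed tokenisation and tags for the example sentence.
tokens = ["Homer", "'s", "Iliad", "opens", "with", "an", "invocation",
          "to", "the", "muse", "(", "1.", "1", ")", "."]
tags = (["B-AAUTHOR", "O", "B-AWORK"] + ["O"] * 8
        + ["B-REFSCOPE", "I-REFSCOPE", "O", "O"])

for span in bio_to_spans(tokens, tags):
    print(span)
```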
Dataset
roberta_classics_ner was fine-tuned and evaluated on EpiBau, a dataset that has not yet been released publicly. It is composed of four volumes of Structures of Epic Poetry, a compendium on the narrative patterns and structural elements in ancient epic.
Entity counts for the EpiBau dataset are as follows:
| | train-set | dev-set | test-set |
---|---|---|---|
word count | 712462 | 125729 | 122324 |
AAUTHOR | 4436 | 1368 | 1511 |
AWORK | 3145 | 780 | 670 |
REFAUWORK | 5102 | 988 | 1209 |
REFSCOPE | 14768 | 3193 | 2847 |
FRAGREF | 266 | 29 | 33 |
total entities | 13822 | 1415 | 2419 |
Results
The model was developed in the context of experiments reported here. Trained and tested on EpiBau with an 85-15 split, the model yields an overall F1 score of .82 (micro-average). Detailed scores are displayed below. Evaluation was performed with the CLEF-HIPE-scorer in strict mode.
metric | AAUTHOR | AWORK | REFSCOPE | REFAUWORK |
---|---|---|---|---|
F1 | .819 | .796 | .863 | .756 |
Precision | .842 | .818 | .860 | .755 |
Recall | .797 | .766 | .756 | .866 |
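As a rough illustration of what micro-averaged precision, recall and F1 mean under strict matching, the sketch below counts a predicted entity as correct only if its boundaries and its type both match a gold entity exactly. This is a simplified reading of strict mode, not the CLEF-HIPE-scorer's implementation; `strict_micro_prf` and the toy spans are hypothetical.

```python
def strict_micro_prf(gold, pred):
    """Micro precision, recall and F1 over sets of (start, end, label) spans.

    A prediction is a true positive only if the same (start, end, label)
    tuple is present in the gold set.
    """
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data: the third prediction has the right type but a wrong end boundary,
# so it does not count under strict matching.
gold = {(0, 1, "AAUTHOR"), (2, 3, "AWORK"), (11, 13, "REFSCOPE")}
pred = {(0, 1, "AAUTHOR"), (2, 3, "AWORK"), (11, 12, "REFSCOPE")}

precision, recall, f1 = strict_micro_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Micro-averaging pools true positives, predictions and gold entities across all entity types before computing the scores, so frequent types weigh more heavily in the overall figure than rare ones.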
Questions, remarks, help or contributions? Get in touch here, we'll be happy to chat!