mariasierro/flair-ner-echr-fr-rev

This is a Flair sequence tagger trained on a corpus of 32 case reports from the European Court of Human Rights (ECHR) in French, using pre-trained embeddings from the flair/ner-french model.

This corpus was built and annotated for anonymization as part of the work presented in the Master's thesis "Anonymization of case reports from the ECHR in Spanish and French: exploration of two alternative annotation approaches".

The corpus was annotated by projecting onto the French texts the annotations of the parallel English texts from the corpus built by Pilán et al. (2022); human reviewers then checked the projected annotations.

The tagger predicts 8 tags: DATETIME, CODE, PER, DEM, MISC, ORG, LOC, QUANTITY.
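
As an illustration of how these tags might be used for anonymization, here is a minimal sketch that loads the tagger with Flair's standard `SequenceTagger.load` and replaces each predicted span with its tag. The helper names `predict_spans` and `mask_spans`, the `[TAG]` placeholder format, and the example sentence are assumptions made for this sketch and do not come from the thesis code. It also assumes a recent Flair release in which `tagger.label_type` and the span offsets `start_position` / `end_position` are available.

```python
from typing import List, Tuple

MODEL_ID = "mariasierro/flair-ner-echr-fr-rev"

# (start offset, end offset, tag); a hypothetical helper type for this sketch.
TaggedSpan = Tuple[int, int, str]


def predict_spans(text: str, model_id: str = MODEL_ID) -> List[TaggedSpan]:
    """Run the tagger on `text` and return character-offset spans with tags."""
    # Imported lazily so mask_spans() stays usable without Flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_id)  # downloads the model on first use
    sentence = Sentence(text)
    tagger.predict(sentence)

    # Ask the tagger which label type it predicts instead of hard-coding it.
    label_type = tagger.label_type
    return [
        (span.start_position, span.end_position, span.get_label(label_type).value)
        for span in sentence.get_spans(label_type)
    ]


def mask_spans(text: str, spans: List[TaggedSpan]) -> str:
    """Replace each span in `text` with its tag in square brackets."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, tag in sorted(spans, reverse=True):
        text = text[:start] + f"[{tag}]" + text[end:]
    return text


if __name__ == "__main__":
    example = "M. Dupont est né à Lyon le 3 mai 1970."  # made-up sentence
    print(mask_spans(example, predict_spans(example)))
```

Keeping the masking step separate from prediction means the placeholder format can be changed, or the predicted spans reviewed by hand, before any text is rewritten.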

The corpus and the code used for training this sequence tagger are available on GitHub: https://github.com/mariasierro/automatic-anonymization-ECHR-French-Spanish.

References

Pilán, I., Lison, P., Øvrelid, L., Papadopoulou, A., Sánchez, D. & Batet, M. (2022). The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization. Computational Linguistics, 48(4), pp. 1053–1101. Cambridge, MA: MIT Press. doi: 10.1162/coli_a_00458.