--- |
|
license: mit |
|
base_model: FacebookAI/xlm-roberta-large |
|
model-index: |
|
- name: xlm-roberta-large-finetuned-wikiner-fr |
|
results: [] |
|
datasets: |
|
- Alizee/wikiner_fr_mixed_caps |
|
pipeline_tag: token-classification |
|
language: |
|
- fr |
|
library_name: transformers |
|
--- |
|
|
|
|
|
|
# xlm-roberta-large-finetuned-wikiner-fr |
|
|
|
This model is a fine-tuned version of [xlm-roberta-large](https://huggingface.co/xlm-roberta-large) on the [Alizee/wikiner_fr_mixed_caps](https://huggingface.co/datasets/Alizee/wikiner_fr_mixed_caps) dataset.
|
|
|
|
|
## Why this model? |
|
|
|
Credit to [Jean-Baptiste](https://huggingface.co/Jean-Baptiste) for building [camembert-ner](https://huggingface.co/Jean-Baptiste/camembert-ner), the current reference model for French NER, trained on the wikiNER dataset ([Jean-Baptiste/wikiner_fr](https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr)).
|
|
|
The xlm-roberta-large models fine-tuned on CoNLL-03 [English](https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-english) and especially [German](https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-german) outperformed camembert-ner on my own tasks. This inspired me to fine-tune xlm-roberta-large on the wikiNER dataset, in the hope of establishing a slightly improved standard for French 4-entity NER.
|
|
|
|
|
## Intended uses & limitations |
|
|
|
4-entity NER for French, with the following tags: |
|
|
|
| Abbreviation | Description               |
|:------------:|:--------------------------|
| O            | Outside of a named entity |
| MISC         | Miscellaneous entity      |
| PER          | Person’s name             |
| ORG          | Organization              |
| LOC          | Location                  |
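
The snippet below is a minimal usage sketch with the `transformers` pipeline API. The repo id passed to `model=` is illustrative and should be replaced with the actual Hub path of this model; `aggregation_strategy="simple"` merges sub-word tokens back into whole entities.

```python
from transformers import pipeline

# Illustrative repo id -- replace with the actual Hub path of this model.
ner = pipeline(
    "token-classification",
    model="xlm-roberta-large-finetuned-wikiner-fr",
    aggregation_strategy="simple",  # merge sub-word tokens into whole entities
)

print(ner("Emmanuel Macron a visité le siège de l'ONU à New York."))
# Each result is a dict with entity_group (PER/ORG/LOC/MISC), score,
# word, and the start/end character offsets in the input string.
```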
|
|
|
## Performance |
|
|
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.0518 |
|
- Precision: 0.8881 |
|
- Recall: 0.9014 |
|
- F1: 0.8947 |
|
- Accuracy: 0.9855 |
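
Entity-level precision, recall, and F1 such as those above are conventionally computed with `seqeval`; the sketch below illustrates the idea on toy IOB2 sequences (assumption: the reported numbers come from a seqeval-style evaluation, as is standard for Trainer-based NER runs).

```python
from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy gold and predicted tag sequences in IOB2 format, one list per sentence.
y_true = [["B-PER", "I-PER", "O", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O", "B-ORG"]]

# Precision/recall/F1 are entity-level: a prediction counts only if both
# the span and the type match. Accuracy is token-level.
print(precision_score(y_true, y_pred))  # 0.5
print(recall_score(y_true, y_pred))     # 0.5
print(f1_score(y_true, y_pred))         # 0.5
print(accuracy_score(y_true, y_pred))   # 0.8
```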
|
|
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 1.5e-05 |
|
- train_batch_size: 32 |
|
- eval_batch_size: 32 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_ratio: 0.02 |
|
- num_epochs: 3 |
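
For reference, here is how these values map onto `TrainingArguments` in a standard Trainer-based setup. This is a sketch, not the exact training script, and `output_dir` is illustrative; the Adam betas and epsilon listed above are the optimizer defaults.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="xlm-roberta-large-finetuned-wikiner-fr",  # illustrative
    learning_rate=1.5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.02,
    num_train_epochs=3,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the library defaults.
)
```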
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:------:|:------:|:--------:|
| 0.1032 | 0.1 | 374 | 0.0853 | 0.7645 | 0.8170 | 0.7899 | 0.9742 |
| 0.0767 | 0.2 | 748 | 0.0721 | 0.8111 | 0.8423 | 0.8264 | 0.9785 |
| 0.074 | 0.3 | 1122 | 0.0655 | 0.8252 | 0.8502 | 0.8375 | 0.9797 |
| 0.0634 | 0.4 | 1496 | 0.0629 | 0.8423 | 0.8694 | 0.8556 | 0.9809 |
| 0.0605 | 0.5 | 1870 | 0.0610 | 0.8515 | 0.8711 | 0.8612 | 0.9808 |
| 0.0578 | 0.6 | 2244 | 0.0594 | 0.8633 | 0.8744 | 0.8688 | 0.9822 |
| 0.0592 | 0.7 | 2618 | 0.0555 | 0.8624 | 0.8833 | 0.8727 | 0.9825 |
| 0.0567 | 0.8 | 2992 | 0.0534 | 0.8626 | 0.8838 | 0.8731 | 0.9830 |
| 0.0522 | 0.9 | 3366 | 0.0563 | 0.8560 | 0.8771 | 0.8664 | 0.9818 |
| 0.0516 | 1.0 | 3739 | 0.0556 | 0.8702 | 0.8869 | 0.8785 | 0.9831 |
| 0.0438 | 1.0 | 3740 | 0.0558 | 0.8712 | 0.8873 | 0.8792 | 0.9831 |
| 0.0395 | 1.1 | 4114 | 0.0565 | 0.8696 | 0.8856 | 0.8775 | 0.9830 |
| 0.0371 | 1.2 | 4488 | 0.0536 | 0.8762 | 0.8910 | 0.8835 | 0.9838 |
| 0.0403 | 1.3 | 4862 | 0.0531 | 0.8709 | 0.8887 | 0.8797 | 0.9835 |
| 0.0366 | 1.4 | 5236 | 0.0517 | 0.8791 | 0.8912 | 0.8851 | 0.9843 |
| 0.037 | 1.5 | 5610 | 0.0510 | 0.8830 | 0.8936 | 0.8883 | 0.9847 |
| 0.0368 | 1.6 | 5984 | 0.0492 | 0.8795 | 0.8940 | 0.8867 | 0.9845 |
| 0.0359 | 1.7 | 6358 | 0.0501 | 0.8833 | 0.8986 | 0.8909 | 0.9850 |
| 0.034 | 1.8 | 6732 | 0.0496 | 0.8852 | 0.8986 | 0.8918 | 0.9852 |
| 0.0327 | 1.9 | 7106 | 0.0512 | 0.8762 | 0.8948 | 0.8854 | 0.9843 |
| 0.0325 | 2.0 | 7478 | 0.0512 | 0.8829 | 0.8945 | 0.8887 | 0.9844 |
| 0.01 | 2.0 | 7480 | 0.0512 | 0.8836 | 0.8945 | 0.8890 | 0.9843 |
| 0.0232 | 2.1 | 7854 | 0.0526 | 0.8870 | 0.9002 | 0.8936 | 0.9852 |
| 0.0235 | 2.2 | 8228 | 0.0530 | 0.8841 | 0.8983 | 0.8911 | 0.9848 |
| 0.0211 | 2.3 | 8602 | 0.0542 | 0.8875 | 0.9008 | 0.8941 | 0.9852 |
| 0.0235 | 2.4 | 8976 | 0.0525 | 0.8883 | 0.9008 | 0.8945 | 0.9855 |
| 0.0232 | 2.5 | 9350 | 0.0525 | 0.8874 | 0.9013 | 0.8943 | 0.9855 |
| 0.0238 | 2.6 | 9724 | 0.0517 | 0.8861 | 0.9011 | 0.8935 | 0.9854 |
| 0.0223 | 2.7 | 10098 | 0.0513 | 0.8893 | 0.9016 | 0.8954 | 0.9856 |
| 0.0226 | 2.8 | 10472 | 0.0517 | 0.8892 | 0.9017 | 0.8954 | 0.9856 |
| 0.0228 | 2.9 | 10846 | 0.0517 | 0.8879 | 0.9013 | 0.8945 | 0.9855 |
| 0.0235 | 3.0 | 11217 | 0.0518 | 0.8881 | 0.9014 | 0.8947 | 0.9855 |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.36.2 |
|
- Pytorch 2.0.1 |
|
- Datasets 2.16.1 |
|
- Tokenizers 0.15.0 |