---
title: relation_extraction
datasets:
- none
tags:
- evaluate
- metric
description: >-
  This metric is used for evaluating the F1 accuracy of input references and
  predictions.
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
license: apache-2.0
---
# Metric Card for relation_extraction

This metric evaluates the quality of relation extraction output by computing micro and macro F1 scores of the predicted relations against the reference relations.
## Metric Description

This metric computes precision, recall, and F1 scores for a relation extraction model based on the specified evaluation mode. It compares the model's predicted relations against the provided reference relations, both overall and per relation type.
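As background for how the reported numbers relate to each other, the following is a minimal sketch of the standard micro and macro F1 formulas from per-type true positive / false positive / false negative counts, reported as percentages like this metric's output. It is an illustration of the formulas, not the metric's internal implementation; the counts used at the end match Example 2 further down and reproduce its aggregate numbers.

```python
# Sketch of standard micro / macro F1 from per-type tp/fp/fn counts.
from typing import Dict, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def micro_macro_f1(counts: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """counts maps relation type -> {"tp": ..., "fp": ..., "fn": ...}."""
    # Micro: pool the counts over all relation types, then compute P/R/F1 once.
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    micro_p, micro_r, micro_f1 = prf(tp, fp, fn)

    # Macro: compute P/R/F1 per relation type, then average the per-type scores.
    per_type = [prf(c["tp"], c["fp"], c["fn"]) for c in counts.values()]
    macro_p = sum(s[0] for s in per_type) / len(per_type)
    macro_r = sum(s[1] for s in per_type) / len(per_type)
    macro_f1 = sum(s[2] for s in per_type) / len(per_type)

    return {
        "p": micro_p * 100, "r": micro_r * 100, "f1": micro_f1 * 100,
        "Macro_p": macro_p * 100, "Macro_r": macro_r * 100, "Macro_f1": macro_f1 * 100,
    }


# The per-type counts from Example 2 below reproduce its aggregate numbers.
print(micro_macro_f1({
    "sell": {"tp": 3, "fp": 1, "fn": 0},
    "belongs_to": {"tp": 0, "fp": 0, "fn": 1},
}))
```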
## How to Use

```python
import evaluate

metric = evaluate.load("Ikala-allen/relation_extraction")

references = [
    [
        {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ]
]

predictions = [
    [
        {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ]
]

scores = metric.compute(predictions=predictions, references=references, mode="strict", detailed_scores=False, relation_types=[])
```
### Inputs
- **predictions** (`list` of `list` of `dict`): Predicted relations, one inner list of relation dictionaries per example.
- **references** (`list` of `list` of `dict`): Ground-truth (reference) relations to compare the predictions against, in the same format as `predictions`.
- **mode** (`str`, optional): Evaluation mode, either `strict` or `boundaries`. Defaults to `strict`. `strict` mode requires both the entity types and the entity spans of a relation to match, while `boundaries` mode only considers the entity spans (see the sketch after this list).
- **detailed_scores** (`bool`, optional): Defaults to `False`. If `True`, returns scores for each relation type separately; if `False`, returns only the overall scores.
- **relation_types** (`list`, optional): Defaults to `[]`. A list of relation types to consider during evaluation. If not provided, the relation types are collected from the reference data.
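To make the difference between the two modes concrete, here is a minimal sketch using the same loading call as above. The prediction's `head_type` disagrees with the reference, so per the mode descriptions it only counts as a match under `boundaries` mode.

```python
import evaluate

metric = evaluate.load("Ikala-allen/relation_extraction")

references = [[
    {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
]]
predictions = [[
    # Same head/tail spans and relation type, but the head_type is wrong.
    {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
]]

# strict: entity types must also match, so this counts as a false positive and a false negative.
print(metric.compute(predictions=predictions, references=references, mode="strict"))

# boundaries: entity types are ignored, so the same relation counts as a true positive.
print(metric.compute(predictions=predictions, references=references, mode="boundaries"))
```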
### Output Values

**output** (`dict` of `dict`): When `detailed_scores=True`, a dictionary mapping each relation type (plus the aggregate key `ALL`) to its scoring metrics such as precision, recall, and F1 score; when `detailed_scores=False`, only the aggregate scores are returned as a single flat dictionary.

- **ALL** (`dict`): aggregate scores over all relation types
  - **tp** : true positive count
  - **fp** : false positive count
  - **fn** : false negative count
  - **p** : precision
  - **r** : recall
  - **f1** : micro F1 score
  - **Macro_f1** : macro F1 score
  - **Macro_p** : macro precision
  - **Macro_r** : macro recall
- **{selected relation type}** (`dict`): scores for that relation type (returned when `detailed_scores=True`)
  - **tp** : true positive count
  - **fp** : false positive count
  - **fn** : false negative count
  - **p** : precision
  - **r** : recall
  - **f1** : micro F1 score
Output Example:

```python
{'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}
```

Note: `p`, `r`, `f1`, `Macro_p`, `Macro_r`, and `Macro_f1` are reported as percentages between 0 and 100. The values of `tp`, `fp`, and `fn` depend on the number of input relations.
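For reference, the snippet below shows how to read the aggregate and per-type scores out of a detailed result; the dictionary literal mirrors the Example 2 output further down.

```python
# Shape of a detailed result (detailed_scores=True); values copied from Example 2 below.
scores = {
    "sell": {"tp": 3, "fp": 1, "fn": 0, "p": 75.0, "r": 100.0, "f1": 85.71428571428571},
    "belongs_to": {"tp": 0, "fp": 0, "fn": 1, "p": 0, "r": 0, "f1": 0},
    "ALL": {"tp": 3, "fp": 1, "fn": 1, "p": 75.0, "r": 75.0, "f1": 75.0,
            "Macro_f1": 42.857142857142854, "Macro_p": 37.5, "Macro_r": 50.0},
}

# Aggregate scores live under the "ALL" key; per-type scores under the relation type name.
print(f"micro F1: {scores['ALL']['f1']:.2f}")        # 75.00
print(f"macro F1: {scores['ALL']['Macro_f1']:.2f}")  # 42.86
print(f"sell F1:  {scores['sell']['f1']:.2f}")       # 85.71
```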
### Examples

Example 1: Only one prediction and reference list.

```python
import evaluate

metric = evaluate.load("Ikala-allen/relation_extraction")

references = [
    [
        {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {'head': 'A醛賦活緊緻精華', 'tail': 'Serum', 'head_type': 'product', 'tail_type': 'category', 'type': 'belongs_to'},
    ]
]

predictions = [
    [
        {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ]
]

scores = metric.compute(predictions=predictions, references=references, mode="strict", detailed_scores=False, relation_types=[])
print(scores)
>>> {'tp': 1, 'fp': 1, 'fn': 2, 'p': 50.0, 'r': 33.333333333333336, 'f1': 40.0, 'Macro_f1': 25.0, 'Macro_p': 25.0, 'Macro_r': 25.0}
```
Example 2: Two or more predictions and references. Returns scores for every relation type (`detailed_scores=True`).

```python
import evaluate

metric = evaluate.load("Ikala-allen/relation_extraction")

references = [
    [
        {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ],
    [
        {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
        {'head': 'A醛賦活緊緻精華', 'tail': 'Serum', 'head_type': 'product', 'tail_type': 'category', 'type': 'belongs_to'},
    ]
]

predictions = [
    [
        {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ],
    [
        {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
        {'head': 'SNTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'}
    ]
]

scores = metric.compute(predictions=predictions, references=references, mode="boundaries", detailed_scores=True, relation_types=[])
print(scores)
>>> {'sell': {'tp': 3, 'fp': 1, 'fn': 0, 'p': 75.0, 'r': 100.0, 'f1': 85.71428571428571}, 'belongs_to': {'tp': 0, 'fp': 0, 'fn': 1, 'p': 0, 'r': 0, 'f1': 0}, 'ALL': {'tp': 3, 'fp': 1, 'fn': 1, 'p': 75.0, 'r': 75.0, 'f1': 75.0, 'Macro_f1': 42.857142857142854, 'Macro_p': 37.5, 'Macro_r': 50.0}}
```
Example 3: Two or more predictions and references. Returns detailed scores, but only the `belongs_to` relation type is evaluated (`relation_types=["belongs_to"]`).

```python
import evaluate

metric = evaluate.load("Ikala-allen/relation_extraction")

references = [
    [
        {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ],
    [
        {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
        {'head': 'A醛賦活緊緻精華', 'tail': 'Serum', 'head_type': 'product', 'tail_type': 'category', 'type': 'belongs_to'},
    ]
]

predictions = [
    [
        {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ],
    [
        {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
        {'head': 'SNTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'}
    ]
]

scores = metric.compute(predictions=predictions, references=references, mode="boundaries", detailed_scores=True, relation_types=["belongs_to"])
print(scores)
>>> {'belongs_to': {'tp': 0, 'fp': 0, 'fn': 1, 'p': 0, 'r': 0, 'f1': 0}, 'ALL': {'tp': 0, 'fp': 0, 'fn': 1, 'p': 0, 'r': 0, 'f1': 0, 'Macro_f1': 0.0, 'Macro_p': 0.0, 'Macro_r': 0.0}}
```
## Limitations and Bias

This metric has two modes, `strict` and `boundaries`, and lets you restrict the evaluation to selected `relation_types`. Choose these parameters carefully, as they can significantly affect the resulting F1 score.

The entity fields (`head`, `tail`, `head_type`, `tail_type`) of a predicted relation must match the reference exactly, disregarding case and spaces. A prediction that does not match any reference is counted as a false positive (`fp`), and an unmatched reference is counted as a false negative (`fn`).
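As an illustration of the matching rule above (a sketch of the documented behaviour, not a separate normalization option), a prediction that differs only by case or spacing should still count as a true positive:

```python
import evaluate

metric = evaluate.load("Ikala-allen/relation_extraction")

references = [[
    {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
]]
predictions = [[
    # Differs only in case and spacing, which the matching disregards,
    # so this is still counted as a true positive.
    {"head": "Phi Pigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
]]

print(metric.compute(predictions=predictions, references=references, mode="strict"))
```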
## Citation

```bibtex
@misc{taille2020incorrectcomparisons,
  author = {Bruno Taillé and Vincent Guigue and Geoffrey Scoutheeten and Patrick Gallinari},
  title  = {Let's Stop Incorrect Comparisons in End-to-end Relation Extraction!},
  year   = {2020},
  url    = {https://arxiv.org/abs/2009.10684}
}
```
## Further References

This evaluation metric is adapted from
*https://github.com/btaille/sincere/blob/master/code/utils/evaluation.py*