---
title: relation_extraction
datasets:
- none
tags:
- evaluate
- metric
description: >-
  This metric is used for evaluating the F1 accuracy of input references and
  predictions.
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
license: apache-2.0
---
# Metric Card for relation_extraction
This metric evaluates the quality of relation extraction output by computing micro and macro F1 scores over the extracted relations.
## Metric Description
This metric compares predicted relations against reference (ground-truth) relations and reports per-type and overall precision, recall, and F1.
## How to Use
This metric takes two inputs, predictions and references (ground truth). Each is a list of lists of dictionaries, where each dictionary describes one relation by its entity names and entity types:
```python
>>> import evaluate
>>> metric_path = "Ikala-allen/relation_extraction"
>>> module = evaluate.load(metric_path)
>>> references = [
... [
... {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
... {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
... ]
... ]
>>> predictions = [
... [
... {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
... {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
... ]
... ]
>>> evaluation_scores = module.compute(predictions=predictions, references=references)
>>> print(evaluation_scores)
{'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
```
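In this example, the first prediction differs from its reference (`"phipigments"` vs. `"phip igments"` for `head`, and `"product"` vs. `"brand"` for `head_type`), so it counts as one false positive and leaves one reference unmatched (one false negative); only the second prediction is an exact match, giving `tp = 1`.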
### Inputs
- **predictions** (`list` of `list`s of `dict`s): predicted relations, each with its entity names, entity types, and relation type
- **references** (`list` of `list`s of `dict`s): reference (ground-truth) relations in the same format
### Output Values
**output** (`dict` of `dict`s) with one key per relation type plus an `ALL` key:
- **sell** (`dict`): scores for the relation type `sell`
  - **tp**: true positive count
  - **fp**: false positive count
  - **fn**: false negative count
  - **p**: precision
  - **r**: recall
  - **f1**: micro F1 score
- **ALL** (`dict`): scores aggregated over all relation types present in the input
  - **tp**: true positive count
  - **fp**: false positive count
  - **fn**: false negative count
  - **p**: precision
  - **r**: recall
  - **f1**: micro F1 score
  - **Macro_f1**: macro F1 score
  - **Macro_p**: macro precision
  - **Macro_r**: macro recall
Output Example:
```python
{'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
```
Note: `Macro_f1`, `Macro_p`, `Macro_r`, `p`, `r`, and `f1` are reported as percentages between 0 and 100, while `tp`, `fp`, and `fn` are counts that depend on the number of input relations.
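For reference, the sketch below shows how these scores can be derived from the raw counts. It is illustrative only (the per-type counts are taken from the example output above), not the metric's actual implementation:
```python
# Illustrative sketch: deriving precision/recall/F1 (as percentages)
# from tp/fp/fn counts, and macro scores by averaging over types.
def prf1(tp, fp, fn):
    p = 100 * tp / (tp + fp) if tp + fp else 0.0
    r = 100 * tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

per_type = {"sell": {"tp": 1, "fp": 1, "fn": 1}}  # counts from the example above
scores = {t: prf1(**c) for t, c in per_type.items()}
macro_p = sum(p for p, _, _ in scores.values()) / len(scores)   # 50.0
macro_r = sum(r for _, r, _ in scores.values()) / len(scores)   # 50.0
macro_f1 = sum(f for _, _, f in scores.values()) / len(scores)  # 50.0
```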
### Examples
Example with a single prediction and reference list:
```python
>>> import evaluate
>>> metric_path = "Ikala-allen/relation_extraction"
>>> module = evaluate.load(metric_path)
>>> references = [
... [
... {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
... {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
... ]
... ]
>>> predictions = [
... [
... {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
... {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
... ]
... ]
>>> evaluation_scores = module.compute(predictions=predictions, references=references)
>>> print(evaluation_scores)
{'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
```
Example with two or more prediction and reference lists:
```python
>>> import evaluate
>>> metric_path = "Ikala-allen/relation_extraction"
>>> module = evaluate.load(metric_path)
>>> references = [
... [
... {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
... {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
... ],[
... {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'}
... ]
... ]
>>> predictions = [
... [
... {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
... {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
... ],[
... {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
... {'head': 'SNTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'}
... ]
... ]
>>> evaluation_scores = module.compute(predictions=predictions, references=references)
>>> print(evaluation_scores)
{'sell': {'tp': 2, 'fp': 2, 'fn': 1, 'p': 50.0, 'r': 66.66666666666667, 'f1': 57.142857142857146}, 'ALL': {'tp': 2, 'fp': 2, 'fn': 1, 'p': 50.0, 'r': 66.66666666666667, 'f1': 57.142857142857146, 'Macro_f1': 57.142857142857146, 'Macro_p': 50.0, 'Macro_r': 66.66666666666667}}
```
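In this example, two of the four predictions exactly match a reference (`tp = 2`), the other two have no exact match (`fp = 2`), and one of the three references goes unmatched (`fn = 1`), giving `p = 2/4 = 50.0`, `r = 2/3 ≈ 66.7`, and `f1 ≈ 57.1`.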
## Limitations and Bias
This metric uses a strict matching mechanism: a predicted relation counts as a true positive only if every field (head, head_type, type, tail, tail_type) exactly matches a reference relation. Any mismatch, however small, makes the prediction a false positive and leaves the reference as a false negative.
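A minimal sketch of this strict matching behaviour (a hypothetical helper, not the metric's actual code):
```python
# Hypothetical helper illustrating strict matching: a prediction matches
# a reference only if all five fields are identical.
def as_tuple(rel):
    return (rel["head"], rel["head_type"], rel["type"], rel["tail"], rel["tail_type"])

reference = {"head": "phip igments", "head_type": "brand", "type": "sell",
             "tail": "國際認證之色乳", "tail_type": "product"}
prediction = {"head": "phipigments", "head_type": "product", "type": "sell",
              "tail": "國際認證之色乳", "tail_type": "product"}

# The head differs by a single space and the head_type differs, so this
# prediction is counted as a false positive and the reference as a false negative.
print(as_tuple(prediction) == as_tuple(reference))  # False
```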
## Citation
```bibtex
@misc{taille2020sincere,
  author = {Taillé, Bruno and Guigue, Vincent and Scoutheeten, Geoffrey and Gallinari, Patrick},
  title = {Let's Stop Incorrect Comparisons in End-to-end Relation Extraction!},
  year = {2020},
  url = {https://arxiv.org/abs/2009.10684}
}
```
## Further References
This implementation adapts the evaluation code from
*https://github.com/btaille/sincere/blob/master/code/utils/evaluation.py*