license: apache-2.0
---

# Metric Card for relation_extraction evaluation

This metric evaluates the quality of relation extraction output by calculating the micro and macro F1 scores of the extracted relations.

This metric can be used in relation extraction evaluation.

## How to Use

```python
import evaluate

metric = evaluate.load("Ikala-allen/relation_extraction")
references = [
    [
        {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        # ... (further reference relations elided)
    ]
]
predictions = [
    [
        # ... (further predicted relations elided)
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ]
]
scores = metric.compute(predictions=predictions, references=references, mode="strict", detailed_scores=False, relation_types=[])
```

### Inputs
- **predictions** (`list` of `list` of `dictionary`): the predicted relations from the model.
- **references** (`list` of `list` of `dictionary`): the ground-truth or reference relations to compare the predictions against.
- **mode** (`str`): evaluation mode, either `'strict'` or `'boundaries'`. `'strict'` takes both the entity types and the entity spans of a relation into account, while `'boundaries'` only considers the entity spans (see the sketch after this list). Default `'strict'`.
- **detailed_scores** (`bool`): if `True`, returns scores for each relation type separately; if `False`, returns only the overall scores. Default `False`.
- **relation_types** (`list`): the relation types to consider while evaluating. If not provided, relation types are constructed from the reference data. Default `[]`.
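
To make the `mode` difference concrete, here is a minimal sketch that reuses the `compute` signature shown above on a prediction whose entity spans are correct but whose `head_type` is wrong; the expected behaviour in the comments follows from the mode definitions and from Example 1 below.

```python
import evaluate

metric = evaluate.load("Ikala-allen/relation_extraction")

# Reference and prediction agree on every field except head_type.
references = [[{"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"}]]
predictions = [[{"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"}]]

# 'strict' also compares head_type/tail_type, so this prediction should count as a miss (one fp, one fn) ...
strict = metric.compute(predictions=predictions, references=references, mode="strict", detailed_scores=False, relation_types=[])

# ... while 'boundaries' ignores the entity types and should score it as a true positive.
boundaries = metric.compute(predictions=predictions, references=references, mode="boundaries", detailed_scores=False, relation_types=[])
```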

### Output Values

**output** (`dictionary` of `dictionaries`): a dictionary mapping each relation type to its respective scoring metrics, such as precision, recall, and F1 score.
- **ALL** (`dictionary`): scores over all relation types combined
  - **tp** : true positive count
  - **fp** : false positive count
  - **fn** : false negative count
  - **p** : precision
  - **r** : recall
  - **f1** : F1 score
  - **Macro_f1** / **Macro_p** / **Macro_r** : macro-averaged F1 score, precision, and recall across relation types

Output Example:
```
{'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}
```

Note: `Macro_f1`, `Macro_p`, `Macro_r`, `p`, `r`, and `f1` are percentages between 0 and 100, while `tp`, `fp`, and `fn` are counts that depend on the number of input relations.
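
For reference, the overall scores follow standard micro and macro averaging over the counts above. The following sketch of that arithmetic is illustrative (it is not the metric's internal code), and it reproduces the documented outputs:

```python
# Micro scores: pool tp/fp/fn over all relation types, then compute once.
def micro_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    p = 100 * tp / (tp + fp) if tp + fp else 0.0  # precision
    r = 100 * tp / (tp + fn) if tp + fn else 0.0  # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0    # harmonic mean of p and r
    return p, r, f1

# Macro scores: average the per-relation-type p/r/f1 instead of pooling counts.
def macro_scores(per_type: dict) -> tuple[float, float, float]:
    n = len(per_type)
    return (sum(s["p"] for s in per_type.values()) / n,
            sum(s["r"] for s in per_type.values()) / n,
            sum(s["f1"] for s in per_type.values()) / n)

print(micro_scores(3, 1, 1))  # (75.0, 75.0, 75.0), matching the 'ALL' entry of Example 2
```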

### Examples

Example 1: a single prediction and reference list; `mode="strict"` with `detailed_scores=False` returns only the overall scores.
```python
metric = evaluate.load("Ikala-allen/relation_extraction")
references = [
    [
        {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {'head': 'A醛賦活緊緻精華', 'tail': 'Serum', 'head_type': 'product', 'tail_type': 'category', 'type': 'belongs_to'},
    ]
]
predictions = [
    [
        {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ]
]
scores = metric.compute(predictions=predictions, references=references, mode="strict", detailed_scores=False, relation_types=[])
print(scores)
>>> {'tp': 1, 'fp': 1, 'fn': 2, 'p': 50.0, 'r': 33.333333333333336, 'f1': 40.0, 'Macro_f1': 25.0, 'Macro_p': 25.0, 'Macro_r': 25.0}
```

Example 2: multiple predicted and reference relations; `mode="boundaries"` with `detailed_scores=True` outputs a score for every relation type.
```python
metric = evaluate.load("Ikala-allen/relation_extraction")
references = [
    [
        {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        # ... (further reference relations elided)
    ]
]
predictions = [
    [
        # ... (further predicted relations elided)
        {'head': 'SNTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'}
    ]
]
scores = metric.compute(predictions=predictions, references=references, mode="boundaries", detailed_scores=True, relation_types=[])
print(scores)
>>> {'sell': {'tp': 3, 'fp': 1, 'fn': 0, 'p': 75.0, 'r': 100.0, 'f1': 85.71428571428571}, 'belongs_to': {'tp': 0, 'fp': 0, 'fn': 1, 'p': 0, 'r': 0, 'f1': 0}, 'ALL': {'tp': 3, 'fp': 1, 'fn': 1, 'p': 75.0, 'r': 75.0, 'f1': 75.0, 'Macro_f1': 42.857142857142854, 'Macro_p': 37.5, 'Macro_r': 50.0}}
```

Example 3: the same setting, but only the `"belongs_to"` relation type is scored via `relation_types=["belongs_to"]`.
```python
metric = evaluate.load("Ikala-allen/relation_extraction")
references = [
    [
        {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        # ... (further reference relations elided)
    ]
]
predictions = [
    [
        # ... (further predicted relations elided)
        {'head': 'SNTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'}
    ]
]
scores = metric.compute(predictions=predictions, references=references, mode="boundaries", detailed_scores=True, relation_types=["belongs_to"])
print(scores)
>>> {'belongs_to': {'tp': 0, 'fp': 0, 'fn': 1, 'p': 0, 'r': 0, 'f1': 0}, 'ALL': {'tp': 0, 'fp': 0, 'fn': 1, 'p': 0, 'r': 0, 'f1': 0, 'Macro_f1': 0.0, 'Macro_p': 0.0, 'Macro_r': 0.0}}
```

## Limitations and Bias

This metric has two modes, `strict` and `boundaries`, and accepts a `relation_types` filter. Choose these evaluation parameters carefully, as they can significantly impact the resulting F1 score.

The `entity_name` in the prediction and the reference should match exactly, disregarding case and spaces. A prediction that does not match the reference is counted as either a false positive (fp) or a false negative (fn).
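
One plausible reading of "disregarding case and spaces" is that names are normalized before comparison; the `normalize` helper below is a hypothetical illustration of that reading, not the metric's actual code.

```python
def normalize(entity_name: str) -> str:
    # Hypothetical normalization: drop spaces and lowercase before comparing.
    return entity_name.replace(" ", "").lower()

# Under this reading, these two spellings would refer to the same entity.
assert normalize("Phi Pigments") == normalize("phipigments")
```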

## Citation
```bibtex
% ... (BibTeX entry elided)
}
```
## Further References
This evaluation metric is revised from *https://github.com/btaille/sincere/blob/master/code/utils/evaluation.py*.