Spaces: Ikala-allen/relation_extraction



Ikala-allen committed on Oct 4, 2023

Commit f6db68b


1 Parent(s): d772cf1

Update README.md


Files changed (1)

README.md +113 -35

README.md CHANGED

@@ -5,7 +5,7 @@ datasets:
 tags:
 - evaluate
 - metric
-description: "TODO: add a description here"
 sdk: gradio
 sdk_version: 3.19.1
 app_file: app.py
@@ -13,7 +13,7 @@ pinned: false
 ---
 # Metric Card for relation_extraction evaluation
-This metric is for evaluating the quality of relation extraction output. It calculates the micro and macro F1 scores of the relation extraction outputs to assess their quality.
 ## Metric Description
@@ -24,31 +24,31 @@ This metric takes 2 inputs, prediction and references(ground truth). Both of the
 ```
 import evaluate
-# load metric
-metric_path = "Ikala-allen/relation_extraction"
-module = evaluate.load(metric_path)
-# Define your predictions and references
-references = [
-    [
-        {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
-        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
-    ]
-]
-# Example references (ground truth)
-predictions = [
-    [
-        {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
-        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
-    ]
-]
-# Calculate evaluation scores using the loaded metric
-evaluation_scores = module.compute(predictions=predictions, references=references)
-print(evaluation_scores)
->>> {'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
 ```
@@ -58,18 +58,96 @@ print(evaluation_scores)
 -
 ### Output Values
-*Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*
-*State the range of possible values that the metric's output can take, as well as what in that range is considered good. For example: "This metric can take on any value between 0 and 100, inclusive. Higher scores are better."*
-#### Values from Popular Papers
-*Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
 ### Examples
-*Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
 ## Limitations and Bias
-*Note any known limitations or biases that the metric has, with links and references if possible.*
 ## Citation
 *Cite the source where this metric was introduced.*

 tags:
 - evaluate
 - metric
+description: "This metric evaluates relation extraction predictions against references using micro and macro F1 scores."
 sdk: gradio
 sdk_version: 3.19.1
 app_file: app.py
 ---
 # Metric Card for relation_extraction evaluation
+This metric evaluates the quality of relation extraction output by calculating the micro and macro F1 scores over all extracted relations.
 ## Metric Description
 ```
 >>> import evaluate
+>>> # Load the metric
+>>> metric_path = "Ikala-allen/relation_extraction"
+>>> module = evaluate.load(metric_path)
+>>> # Define your predictions and references
+>>> # Example references (ground truth)
+>>> references = [
+...     [
+...         {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+...     ]
+... ]
+>>> # Example predictions
+>>> predictions = [
+...     [
+...         {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+...     ]
+... ]
+>>> # Calculate evaluation scores using the loaded metric
+>>> evaluation_scores = module.compute(predictions=predictions, references=references)
+>>> print(evaluation_scores)
+{'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
 ```
 -
 ### Output Values
+**output** (`dictionary` of `dictionary`s): one entry per relation type, plus an aggregate `ALL` entry
+- **sell** (`dictionary`): scores for the `sell` relation type
+  - **tp** : true positive count
+  - **fp** : false positive count
+  - **fn** : false negative count
+  - **p** : precision
+  - **r** : recall
+  - **f1** : F1 score
+- **ALL** (`dictionary`): scores over all relation types (e.g. `sell` and `belongs to`)
+  - **tp** : true positive count
+  - **fp** : false positive count
+  - **fn** : false negative count
+  - **p** : micro precision
+  - **r** : micro recall
+  - **f1** : micro F1 score
+  - **Macro_f1** : macro F1 score
+  - **Macro_p** : macro precision
+  - **Macro_r** : macro recall
+-
+Output Example:
+```python
+{'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
+```
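+`p`, `r`, `f1`, `Macro_p`, `Macro_r` and `Macro_f1` are percentages between 0 and 100 (in the output example, 50.0 means 50%); higher is better. `tp`, `fp` and `fn` are non-negative integer counts that depend on how many relations the inputs contain.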
 ### Examples
+Example with a single sample (one list of predictions and one list of references):
+```python
+>>> import evaluate
+>>> metric_path = "Ikala-allen/relation_extraction"
+>>> module = evaluate.load(metric_path)
+>>> # Define your predictions and references
+>>> # Example references (ground truth)
+>>> references = [
+...     [
+...         {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+...     ]
+... ]
+>>> # Example predictions
+>>> predictions = [
+...     [
+...         {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+...     ]
+... ]
+>>> # Calculate evaluation scores using the loaded metric
+>>> evaluation_scores = module.compute(predictions=predictions, references=references)
+>>> print(evaluation_scores)
+{'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
+```
+Example with multiple samples (here, two lists of predictions and two lists of references):
+```python
+>>> import evaluate
+>>> metric_path = "Ikala-allen/relation_extraction"
+>>> module = evaluate.load(metric_path)
+>>> # Define your predictions and references
+>>> # Example references (ground truth)
+>>> references = [
+...     [
+...         {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+...     ],
+...     [
+...         {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
+...         {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
+...     ],
+... ]
+>>> # Example predictions
+>>> predictions = [
+...     [
+...         {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+...     ],
+...     [
+...         {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
+...         {'head': 'SNTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
+...     ],
+... ]
+>>> # Calculate evaluation scores using the loaded metric
+>>> evaluation_scores = module.compute(predictions=predictions, references=references)
+>>> print(evaluation_scores)
+{'sell': {'tp': 2, 'fp': 2, 'fn': 1, 'p': 50.0, 'r': 66.66666666666667, 'f1': 57.142857142857146}, 'ALL': {'tp': 2, 'fp': 2, 'fn': 1, 'p': 50.0, 'r': 66.66666666666667, 'f1': 57.142857142857146, 'Macro_f1': 57.142857142857146, 'Macro_p': 50.0, 'Macro_r': 66.66666666666667}}
+```
 ## Limitations and Bias
+This metric has multiple known limitations:
 ## Citation
 *Cite the source where this metric was introduced.*
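The card does not spell out how the counts and the micro/macro averages are derived. The sketch below is a hypothetical re-implementation, not the `Ikala-allen/relation_extraction` module's code. It assumes that a relation counts as a true positive only when all five fields (`head`, `head_type`, `type`, `tail`, `tail_type`) match exactly, that duplicate relations within one sample are counted once, and that macro scores are unweighted means of the per-type scores (macro F1 could instead be defined as the F1 of macro precision and macro recall). The names `count_by_type`, `prf` and `relation_scores` are illustrative only.

```python
from collections import defaultdict

# Assumption: a relation is identified by these five fields, as in the card's examples.
KEYS = ("head", "head_type", "type", "tail", "tail_type")


def count_by_type(predictions, references):
    """Count tp/fp/fn per relation type.

    Assumes exact matching on all five fields and that duplicate relations
    within one sample are counted once (both are assumptions).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for pred_sample, ref_sample in zip(predictions, references):
        pred = {tuple(rel[k] for k in KEYS) for rel in pred_sample}
        ref = {tuple(rel[k] for k in KEYS) for rel in ref_sample}
        for rel in pred & ref:  # predicted and present in the ground truth
            counts[rel[2]]["tp"] += 1
        for rel in pred - ref:  # predicted but absent from the ground truth
            counts[rel[2]]["fp"] += 1
        for rel in ref - pred:  # in the ground truth but not predicted
            counts[rel[2]]["fn"] += 1
    return counts


def prf(tp, fp, fn):
    """Precision, recall and F1 as percentages (0-100); 0.0 when undefined."""
    p = 100 * tp / (tp + fp) if tp + fp else 0.0
    r = 100 * tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def relation_scores(predictions, references):
    """Hypothetical re-implementation of the card's output dictionary."""
    counts = count_by_type(predictions, references)
    scores = {}
    for rel_type, c in counts.items():
        p, r, f1 = prf(c["tp"], c["fp"], c["fn"])
        scores[rel_type] = {**c, "p": p, "r": r, "f1": f1}

    per_type = list(scores.values())
    n = len(per_type) or 1
    # Micro: pool the counts of every type, then compute P/R/F1 once.
    total = {k: sum(c[k] for c in counts.values()) for k in ("tp", "fp", "fn")}
    p, r, f1 = prf(total["tp"], total["fp"], total["fn"])
    scores["ALL"] = {
        **total, "p": p, "r": r, "f1": f1,
        # Macro: unweighted mean of the per-type scores (an assumption).
        "Macro_f1": sum(s["f1"] for s in per_type) / n,
        "Macro_p": sum(s["p"] for s in per_type) / n,
        "Macro_r": sum(s["r"] for s in per_type) / n,
    }
    return scores


# The two-sample example from the card.
references = [
    [
        {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ],
    [
        {"head": "SABONTAIWAN", "tail": "大馬士革玫瑰有機光燦系列", "head_type": "brand", "tail_type": "product", "type": "sell"},
        {"head": "SABONTAIWAN", "tail": "大馬士革玫瑰有機光燦系列", "head_type": "brand", "tail_type": "product", "type": "sell"},
    ],
]
predictions = [
    [
        {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ],
    [
        {"head": "SABONTAIWAN", "tail": "大馬士革玫瑰有機光燦系列", "head_type": "brand", "tail_type": "product", "type": "sell"},
        {"head": "SNTAIWAN", "tail": "大馬士革玫瑰有機光燦系列", "head_type": "brand", "tail_type": "product", "type": "sell"},
    ],
]
print(relation_scores(predictions, references))
```

Under these assumptions the `sell` entry has tp 2, fp 2 and fn 1, with p 50.0, r ≈ 66.67 and f1 ≈ 57.14, which agrees with the card's two-sample output; the last digits of the floats may differ with the order of operations. Because only one relation type occurs, the macro and micro scores coincide here.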