Ikala-allen committed on
Commit 5306dec
1 Parent(s): 0aa8c44

Update README.md

Files changed (1)
  1. README.md +24 -51
README.md CHANGED
@@ -15,7 +15,7 @@ pinned: false
 license: apache-2.0
 ---
 
-# Metric Card for relation_extraction evalutation
+# Metric Card for relation_extraction evaluation
 This metric evaluates the quality of relation extraction output by computing the micro and macro F1 scores of the extracted relations.
 
 
@@ -23,11 +23,9 @@ This metric is used for evaluating the quality of relation extraction output. By
 This metric can be used in relation extraction evaluation.
 
 ## How to Use
-This metric takes 3 inputs, prediction, references(ground truth) and mode. Predictions and references are a list of list of dictionary of entity's name and entity's type. And mode define the evaluation type:
 ```python
 import evaluate
-metric_path = "Ikala-allen/relation_extraction"
-module = evaluate.load(metric_path)
+metric = evaluate.load("Ikala-allen/relation_extraction")
 references = [
 [
 {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
@@ -40,19 +38,19 @@ predictions = [
 {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
 ]
 ]
-evaluation_scores = module.compute(predictions=predictions, references=references, mode="strict")
+scores = metric.compute(predictions=predictions, references=references, mode="strict", detailed_scores=False, relation_types=[])
 ```
 
 ### Inputs
 - **predictions** (`list` of `list` of `dictionary`): A list of lists of dictionaries of predicted relations from the model.
 - **references** (`list` of `list` of `dictionary`): A list of lists of dictionaries of ground-truth (reference) relations to compare the predictions against.
-- **mode** (`str`): Evaluation mode - 'strict' or 'boundaries'. 'strict' mode takes into account both entities type and their relationships, while 'boundaries' mode only considers the entity spans of the relationships.
-- **detailed_scores** (`bool`): If True it returns scores for each relation type specifically, if False it returns the overall scores.
-- **relation_types** (`list`): A list of relation types to consider while evaluating. If not provided, relation types will be constructed from the ground truth or reference data.
+- **mode** (`str`): Evaluation mode, either 'strict' or 'boundaries'. 'strict' mode takes both entity types and entity spans into account, while 'boundaries' mode only considers the entity spans of the relations. Default `strict`.
+- **detailed_scores** (`bool`): If True, returns scores for each relation type separately; if False, returns only the overall scores. Default `False`.
+- **relation_types** (`list`): A list of relation types to consider while evaluating. If not provided, the relation types are constructed from the ground-truth (reference) data. Default `[]`.
 
 ### Output Values
 
-**output** (`dictionary` of `dictionary`s) A dictionary mapping each entity type to its respective scoring metrics such as Precision, Recall, F1 Score.
+**output** (`dictionary` of `dictionaries`): A dictionary mapping each relation type to its scoring metrics, such as precision, recall and F1 score.
 - **ALL** (`dictionary`): overall scores across all relation types
 - **tp** : true positive count
 - **fp** : false positive count
@@ -76,35 +74,12 @@ Output Example:
 {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}
 ```
 
-Remind : Macro_f1Macro_pMacro_rprf1 are always a number between 0 and 1. And tpfpfn depend on how many data inputs.
+Note : `p`, `r`, `f1`, `Macro_p`, `Macro_r` and `Macro_f1` are percentages between 0 and 100, while `tp`, `fp` and `fn` are counts that depend on the number of input relations.
 
 ### Examples
-Example1 : only one prediction and reference, mode=strict, detailed_scores=False, only output total relation score
-```python
-metric_path = "Ikala-allen/relation_extraction"
-module = evaluate.load(metric_path)
-references = [
-[
-{"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
-{"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
-{'head': 'A醛賦活緊緻精華', 'tail': 'Serum', 'head_type': 'product', 'tail_type': 'category', 'type': 'belongs_to'},
-]
-]
-predictions = [
-[
-{"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
-{"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
-]
-]
-evaluation_scores = module.compute(predictions=predictions, references=references, mode="strict", detailed_scores=False, relation_types=[])
-print(evaluation_scores)
->>> {'tp': 1, 'fp': 1, 'fn': 2, 'p': 50.0, 'r': 33.333333333333336, 'f1': 40.0, 'Macro_f1': 25.0, 'Macro_p': 25.0, 'Macro_r': 25.0}
-```
-
-Example2 : only one prediction and reference, mode=boundaries, detailed_scores=False, only output total relation score
-```python
-metric_path = "Ikala-allen/relation_extraction"
-module = evaluate.load(metric_path)
+Example 1 : Only one prediction and reference.
+```python
+metric = evaluate.load("Ikala-allen/relation_extraction")
 references = [
 [
 {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
@@ -118,15 +93,14 @@ predictions = [
 {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
 ]
 ]
-evaluation_scores = module.compute(predictions=predictions, references=references, mode="strict", detailed_scores=False, relation_types=[])
-print(evaluation_scores)
+scores = metric.compute(predictions=predictions, references=references, mode="strict", detailed_scores=False, relation_types=[])
+print(scores)
 >>> {'tp': 1, 'fp': 1, 'fn': 2, 'p': 50.0, 'r': 33.333333333333336, 'f1': 40.0, 'Macro_f1': 25.0, 'Macro_p': 25.0, 'Macro_r': 25.0}
 ```
 
-Example3 : two or more prediction and reference, mode=boundaries, detailed_scores=True, output all relation type score
+Example 2 : Two or more predictions and references; output scores for every relation type.
 ```python
-metric_path = "Ikala-allen/relation_extraction"
-module = evaluate.load(metric_path)
+metric = evaluate.load("Ikala-allen/relation_extraction")
 references = [
 [
 {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
@@ -147,15 +121,14 @@ predictions = [
 {'head': 'SNTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'}
 ]
 ]
-evaluation_scores = module.compute(predictions=predictions, references=references, mode="boundaries", detailed_scores=True, relation_types=[])
-print(evaluation_scores)
+scores = metric.compute(predictions=predictions, references=references, mode="boundaries", detailed_scores=True, relation_types=[])
+print(scores)
 >>> {'sell': {'tp': 3, 'fp': 1, 'fn': 0, 'p': 75.0, 'r': 100.0, 'f1': 85.71428571428571}, 'belongs_to': {'tp': 0, 'fp': 0, 'fn': 1, 'p': 0, 'r': 0, 'f1': 0}, 'ALL': {'tp': 3, 'fp': 1, 'fn': 1, 'p': 75.0, 'r': 75.0, 'f1': 75.0, 'Macro_f1': 42.857142857142854, 'Macro_p': 37.5, 'Macro_r': 50.0}}
 ```
 
-Example 4 : two or more prediction and reference, mode=boundaries, detailed_scores=True, output all relation type score, relation_types = ["belongs_to"], only consider belongs_to type score
+Example 3 : Two or more predictions and references; output scores for every relation type, considering only the "belongs_to" type.
 ```python
-metric_path = "Ikala-allen/relation_extraction"
-module = evaluate.load(metric_path)
+metric = evaluate.load("Ikala-allen/relation_extraction")
 references = [
 [
 {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
@@ -176,14 +149,14 @@ predictions = [
 {'head': 'SNTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'}
 ]
 ]
-evaluation_scores = module.compute(predictions=predictions, references=references, mode="boundaries", detailed_scores=True, relation_types=["belongs_to"])
-print(evaluation_scores)
+scores = metric.compute(predictions=predictions, references=references, mode="boundaries", detailed_scores=True, relation_types=["belongs_to"])
+print(scores)
 >>> {'belongs_to': {'tp': 0, 'fp': 0, 'fn': 1, 'p': 0, 'r': 0, 'f1': 0}, 'ALL': {'tp': 0, 'fp': 0, 'fn': 1, 'p': 0, 'r': 0, 'f1': 0, 'Macro_f1': 0.0, 'Macro_p': 0.0, 'Macro_r': 0.0}}
 ```
 
 ## Limitations and Bias
-This metric has two mode : strict and boundaries. It also has multiple relation_types to choose from. Make sure to select suitable evaluation parameters. F1 score may be totally different.
-Prediction and reference entity_name should be exactly the same regardless of case and spaces. If prediction is not exactly the same as the reference one. It will count as fp or fn.
+This metric has two modes, `strict` and `boundaries`, and accepts a `relation_types` filter. Choose the evaluation parameters carefully, as they can significantly change the F1 score.
+The `entity_name` in both the prediction and reference should match exactly, disregarding case and spaces. A prediction that does not match its reference exactly is counted as a false positive (fp) or a false negative (fn).
 
 ## Citation
 ```bibtex
@@ -195,5 +168,5 @@ Prediction and reference entity_name should be exactly the same regardless of ca
 }
 ```
 ## Further References
-This evaluation metric implementation uses
+This evaluation metric's implementation is adapted from
 *https://github.com/btaille/sincere/blob/master/code/utils/evaluation.py*
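
The micro scores (`p`, `r`, `f1`) reported by the metric pool the `tp`/`fp`/`fn` counts over every relation type, while the macro scores (`Macro_p`, `Macro_r`, `Macro_f1`) average the per-type scores. The snippet below is a minimal sketch, not the metric's internal code: it only illustrates, under standard precision/recall definitions, how the totals printed in Example 1 of the card follow from per-type counts. The per-type counts used here (`sell`: tp=1, fp=1, fn=1; `belongs_to`: tp=0, fp=0, fn=1) are an assumption worked back from Example 1's data under strict matching.

```python
# Per-relation-type counts assumed from Example 1 of the metric card.
counts = {
    "sell": {"tp": 1, "fp": 1, "fn": 1},
    "belongs_to": {"tp": 0, "fp": 0, "fn": 1},
}

def prf(tp, fp, fn):
    """Precision, recall and F1 as percentages; 0 when undefined."""
    p = 100 * tp / (tp + fp) if tp + fp else 0
    r = 100 * tp / (tp + fn) if tp + fn else 0
    f1 = 2 * p * r / (p + r) if p + r else 0
    return p, r, f1

# Micro scores: pool the counts over all relation types, then score once.
tp = sum(c["tp"] for c in counts.values())
fp = sum(c["fp"] for c in counts.values())
fn = sum(c["fn"] for c in counts.values())
p, r, f1 = prf(tp, fp, fn)

# Macro scores: score each relation type separately, then average.
per_type = [prf(**c) for c in counts.values()]
macro_p = sum(s[0] for s in per_type) / len(per_type)
macro_r = sum(s[1] for s in per_type) / len(per_type)
macro_f1 = sum(s[2] for s in per_type) / len(per_type)

print(p, r, f1)                    # 50.0 33.33... 40.0, matching Example 1
print(macro_p, macro_r, macro_f1)  # 25.0 25.0 25.0, matching Example 1
```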
 
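As the Limitations and Bias section notes, the same predictions can score very differently under the two modes. The sketch below uses only the load path and `compute` arguments documented in the card above; it does not assert any output numbers, since those depend on the metric's implementation.

```python
import evaluate

# Load the metric from the Hub, as documented in the card above.
metric = evaluate.load("Ikala-allen/relation_extraction")

references = [[
    {"head": "phipigments", "head_type": "brand", "type": "sell",
     "tail": "國際認證之色乳", "tail_type": "product"},
]]
# Same entity spans and relation type as the reference, but the head_type is wrong.
predictions = [[
    {"head": "phipigments", "head_type": "product", "type": "sell",
     "tail": "國際認證之色乳", "tail_type": "product"},
]]

# 'strict' also checks entity types, so the mismatched head_type should hurt the scores...
strict_scores = metric.compute(predictions=predictions, references=references, mode="strict")
# ...while 'boundaries' only compares entity spans, so the same prediction can still match.
boundaries_scores = metric.compute(predictions=predictions, references=references, mode="boundaries")

print(strict_scores)
print(boundaries_scores)
```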