Ikala-allen committed on
Commit f6db68b
1 Parent(s): d772cf1

Update README.md

Files changed (1)
  1. README.md +113 -35
README.md CHANGED
@@ -5,7 +5,7 @@ datasets:
  tags:
  - evaluate
  - metric
- description: "TODO: add a description here"
  sdk: gradio
  sdk_version: 3.19.1
  app_file: app.py
@@ -13,7 +13,7 @@ pinned: false
  ---

  # Metric Card for relation_extraction evaluation
- This metric is for evaluating the quality of relation extraction output. By calculating the Micro and Macro F1 score of every relation extraction outputs to ensure the quality.


  ## Metric Description
@@ -24,31 +24,31 @@ This metric takes 2 inputs, prediction and references(ground truth). Both of the
  ```
  import evaluate

- # load metric
- metric_path = "Ikala-allen/relation_extraction"
- module = evaluate.load(metric_path)
-
- # Define your predictions and references
- references = [
-     [
-         {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
-         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
-     ]
- ]
-
- # Example references (ground truth)
- predictions = [
-     [
-         {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
-         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
-     ]
- ]
-
- # Calculate evaluation scores using the loaded metric
- evaluation_scores = module.compute(predictions=predictions, references=references)
-
- print(evaluation_scores)
- >>> {'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
  ```
 
@@ -58,18 +58,96 @@ print(evaluation_scores)
  -
  ### Output Values

- *Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*
-
- *State the range of possible values that the metric's output can take, as well as what in that range is considered good. For example: "This metric can take on any value between 0 and 100, inclusive. Higher scores are better."*
-
- #### Values from Popular Papers
- *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*

  ### Examples
- *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*

  ## Limitations and Bias
- *Note any known limitations or biases that the metric has, with links and references if possible.*

  ## Citation
  *Cite the source where this metric was introduced.*
 
  tags:
  - evaluate
  - metric
+ description: "This metric evaluates relation extraction predictions against reference relations by computing micro and macro F1 scores."
  sdk: gradio
  sdk_version: 3.19.1
  app_file: app.py
 
  ---

  # Metric Card for relation_extraction evaluation
+ This metric evaluates the quality of relation extraction output by computing micro and macro F1 scores of the predicted relations against the reference relations.


  ## Metric Description
 
  ```
  import evaluate

+ # Load the metric
+ >>> metric_path = "Ikala-allen/relation_extraction"
+ >>> module = evaluate.load(metric_path)
+
+ # Example references (ground truth)
+ >>> references = [
+ ...     [
+ ...         {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+ ...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+ ...     ]
+ ... ]
+
+ # Example predictions
+ >>> predictions = [
+ ...     [
+ ...         {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+ ...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+ ...     ]
+ ... ]
+
+ # Calculate evaluation scores using the loaded metric
+ >>> evaluation_scores = module.compute(predictions=predictions, references=references)
+ >>> print(evaluation_scores)
+ {'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
  ```
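
Both `predictions` and `references` are lists of lists: one inner list of relation dictionaries per evaluated sample, each dictionary carrying the `head`, `head_type`, `type`, `tail`, and `tail_type` fields shown above. If your relations are stored in another structure, a small helper can build this format; the helper below is purely illustrative (its name and the tuple layout are not part of the metric).

```python
# Hypothetical convenience helper (not part of the metric): turn
# (head, head_type, type, tail, tail_type) tuples into the expected dictionaries.
def to_relation_dicts(samples):
    keys = ("head", "head_type", "type", "tail", "tail_type")
    return [[dict(zip(keys, relation)) for relation in sample] for sample in samples]

references = to_relation_dicts([
    [("phip igments", "brand", "sell", "國際認證之色乳", "product")],
])
predictions = to_relation_dicts([
    [("phipigments", "product", "sell", "國際認證之色乳", "product")],
])
# references and predictions now have the nested list-of-dicts layout shown above.
```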
 
 
  -
  ### Output Values

+ **output** (`dictionary` of `dictionary`s): the metric returns one dictionary of scores per relation type, plus an `ALL` entry with aggregated scores.
+ - **sell** (`dictionary`): scores for the relation type `sell` (one such entry appears for every relation type present in the data)
+   - **tp** : true positive count
+   - **fp** : false positive count
+   - **fn** : false negative count
+   - **p** : precision
+   - **r** : recall
+   - **f1** : micro F1 score
+ - **ALL** (`dictionary`): scores aggregated over all relation types (e.g. `sell` and `belongs to`)
+   - **tp** : true positive count
+   - **fp** : false positive count
+   - **fn** : false negative count
+   - **p** : precision
+   - **r** : recall
+   - **f1** : micro F1 score
+   - **Macro_f1** : macro F1 score
+   - **Macro_p** : macro precision
+   - **Macro_r** : macro recall
+
+ Output example:
+ ```python
+ {'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
+ ```
+ `Macro_f1`, `Macro_p`, `Macro_r`, `p`, `r`, and `f1` always lie between 0 and 100, with higher values being better; `tp`, `fp`, and `fn` are counts that depend on how many relations are passed in.
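
The per-type `p`, `r`, and `f1` values are consistent with the usual precision/recall/F1 definitions applied to the `tp`/`fp`/`fn` counts and reported as percentages, and the `Macro_*` fields are the macro-averaged counterparts (with a single relation type, as above, micro and macro values coincide). The sketch below recomputes the example output from its counts and shows how individual scores are read from the returned dictionary; it illustrates the relationships and is not the module's implementation.

```python
# Recompute the example output above from its counts (illustration only).
scores = {
    "sell": {"tp": 1, "fp": 1, "fn": 1, "p": 50.0, "r": 50.0, "f1": 50.0},
    "ALL": {"tp": 1, "fp": 1, "fn": 1, "p": 50.0, "r": 50.0, "f1": 50.0,
            "Macro_f1": 50.0, "Macro_p": 50.0, "Macro_r": 50.0},
}

tp, fp, fn = scores["sell"]["tp"], scores["sell"]["fp"], scores["sell"]["fn"]
precision = 100 * tp / (tp + fp)                    # 50.0, matches scores["sell"]["p"]
recall = 100 * tp / (tp + fn)                       # 50.0, matches scores["sell"]["r"]
f1 = 2 * precision * recall / (precision + recall)  # 50.0, matches scores["sell"]["f1"]

# Individual scores are plain dictionary lookups on the returned value.
micro_f1 = scores["ALL"]["f1"]
macro_f1 = scores["ALL"]["Macro_f1"]
```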
 
  ### Examples
+ Example with a single list of predictions and references:
+ ```python
+ >>> import evaluate
+ >>> metric_path = "Ikala-allen/relation_extraction"
+ >>> module = evaluate.load(metric_path)
+
+ # Example references (ground truth)
+ >>> references = [
+ ...     [
+ ...         {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+ ...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+ ...     ]
+ ... ]
+
+ # Example predictions
+ >>> predictions = [
+ ...     [
+ ...         {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+ ...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+ ...     ]
+ ... ]
+
+ # Calculate evaluation scores using the loaded metric
+ >>> evaluation_scores = module.compute(predictions=predictions, references=references)
+ >>> print(evaluation_scores)
+ {'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
+ ```
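
The counts above follow from exact matching: the first predicted relation differs from its reference (an extra space in `head`, a different `head_type`), so it contributes one false positive and one false negative, while the second prediction matches exactly and counts as a true positive. The sketch below reproduces those counts under the assumption that a prediction is correct only when every field matches a reference relation exactly; it is an illustration consistent with the documented examples, not the module's internal code.

```python
# Sketch only: exact-match counting that reproduces the tp/fp/fn above, assuming a
# prediction is a true positive only when all of its fields match a reference relation.
def count_exact_matches(predicted_relations, reference_relations):
    pred = {tuple(sorted(r.items())) for r in predicted_relations}
    ref = {tuple(sorted(r.items())) for r in reference_relations}
    return len(pred & ref), len(pred - ref), len(ref - pred)  # tp, fp, fn

refs = [
    {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
]
preds = [
    {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
]
print(count_exact_matches(preds, refs))  # (1, 1, 1) -> p = r = f1 = 50.0 for 'sell'
```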
+
+ Example with two or more lists of predictions and references:
+ ```python
+ >>> import evaluate
+ >>> metric_path = "Ikala-allen/relation_extraction"
+ >>> module = evaluate.load(metric_path)
+
+ # Example references (ground truth)
+ >>> references = [
+ ...     [
+ ...         {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+ ...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+ ...     ],
+ ...     [
+ ...         {"head": "SABONTAIWAN", "tail": "大馬士革玫瑰有機光燦系列", "head_type": "brand", "tail_type": "product", "type": "sell"},
+ ...         {"head": "SABONTAIWAN", "tail": "大馬士革玫瑰有機光燦系列", "head_type": "brand", "tail_type": "product", "type": "sell"}
+ ...     ]
+ ... ]
+
+ # Example predictions
+ >>> predictions = [
+ ...     [
+ ...         {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+ ...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+ ...     ],
+ ...     [
+ ...         {"head": "SABONTAIWAN", "tail": "大馬士革玫瑰有機光燦系列", "head_type": "brand", "tail_type": "product", "type": "sell"},
+ ...         {"head": "SNTAIWAN", "tail": "大馬士革玫瑰有機光燦系列", "head_type": "brand", "tail_type": "product", "type": "sell"}
+ ...     ]
+ ... ]
+
+ # Calculate evaluation scores using the loaded metric
+ >>> evaluation_scores = module.compute(predictions=predictions, references=references)
+ >>> print(evaluation_scores)
+ {'sell': {'tp': 2, 'fp': 2, 'fn': 1, 'p': 50.0, 'r': 66.66666666666667, 'f1': 57.142857142857146}, 'ALL': {'tp': 2, 'fp': 2, 'fn': 1, 'p': 50.0, 'r': 66.66666666666667, 'f1': 57.142857142857146, 'Macro_f1': 57.142857142857146, 'Macro_p': 50.0, 'Macro_r': 66.66666666666667}}
+ ```
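
Both examples above contain only the `sell` relation type, so the per-type and macro scores coincide. When the data mixes several relation types (the output description above also mentions `belongs to`), the returned dictionary is expected to hold one entry per type alongside `ALL`. The call below is a purely hypothetical sketch of such a mixed-type input; the `belongs to` relations are invented for illustration and no particular score values are claimed.

```python
>>> import evaluate
>>> module = evaluate.load("Ikala-allen/relation_extraction")
>>> references = [
...     [
...         {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
...         {"head": "國際認證之色乳", "head_type": "product", "type": "belongs to", "tail": "phipigments", "tail_type": "brand"},
...     ]
... ]
>>> predictions = [
...     [
...         {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
...         {"head": "國際認證之色乳", "head_type": "product", "type": "belongs to", "tail": "SNTAIWAN", "tail_type": "brand"},
...     ]
... ]
>>> evaluation_scores = module.compute(predictions=predictions, references=references)
>>> sorted(evaluation_scores.keys())  # expected: one entry per relation type plus 'ALL'
```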

  ## Limitations and Bias
+ This metric relies on exact matching of relation fields: as the examples above show, a predicted relation is only counted as correct when it matches a reference relation exactly, so near-misses (e.g. "phipigments" predicted where the reference contains "phip igments", or a differing `head_type`) are scored as both a false positive and a false negative, with no partial credit.

  ## Citation
  *Cite the source where this metric was introduced.*