title: action_generation
datasets:
- none
tags:
- evaluate
- metric
description: 'Evaluate the result of the action generation task.'
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
Metric Card for action_generation
Metric Description
Evaluates the output of the action generation task.
Each action is a string in the format /class/phrase. The metric scores the /class part and the phrase part separately, computing precision, recall, and F1 for each, and then combines the two sets of scores with a weighted sum.
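The scoring described above can be sketched in plain Python. This is not the metric's source code. It is a minimal re-implementation that rests on several assumptions. Each action is split at its last "/" into class and phrase. Each part is scored with micro-averaged precision, recall, and F1, counting matches within each example and summing over examples. The weighted sum uses a hypothetical `class_weight` of 0.8, which gives 0.2 to the phrase. These weights are inferred from the example output in How to Use and are not documented. Scores are rounded to four decimals before weighting, because that reproduces the printed 0.8666 recall. The helpers `split_action`, `micro_prf`, and `action_scores` are illustrative names. The sketch ignores `valid_labels` and `detailed_scores`, because the card does not describe how the metric handles them.

```python
from collections import Counter


def split_action(action):
    """Split "/class/phrase" at the last "/" into ("/class", "phrase").

    Assumption: the phrase never contains "/", so everything before the
    last slash is the (possibly multi-level) class label.
    """
    cls, _, phrase = action.rpartition("/")
    return cls, phrase


def micro_prf(pred_rows, ref_rows):
    """Micro-averaged precision, recall, and F1 over per-example overlaps.

    Items are matched as multisets within each example, then the counts
    are summed over all examples. Scores are rounded to four decimals.
    """
    tp = n_pred = n_ref = 0
    for pred, ref in zip(pred_rows, ref_rows):
        tp += sum((Counter(pred) & Counter(ref)).values())
        n_pred += len(pred)
        n_ref += len(ref)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_ref if n_ref else 0.0
    # With micro counts, 2PR / (P + R) simplifies to 2TP / (#pred + #ref).
    f1 = 2 * tp / (n_pred + n_ref) if n_pred + n_ref else 0.0
    return {"precision": round(precision, 4),
            "recall": round(recall, 4),
            "f1": round(f1, 4)}


def action_scores(predictions, references, class_weight=0.8):
    """Hypothetical re-implementation; class_weight=0.8 is an inferred guess."""
    pred_parts = [[split_action(a) for a in row] for row in predictions]
    ref_parts = [[split_action(a) for a in row] for row in references]
    class_scores = micro_prf([[c for c, _ in row] for row in pred_parts],
                             [[c for c, _ in row] for row in ref_parts])
    phrase_scores = micro_prf([[p for _, p in row] for row in pred_parts],
                              [[p for _, p in row] for row in ref_parts])
    # Weighted sum of the already-rounded class and phrase scores.
    weighted = {k: round(class_weight * class_scores[k]
                         + (1 - class_weight) * phrase_scores[k], 4)
                for k in class_scores}
    return {"class": class_scores, "phrase": phrase_scores,
            "weighted_sum": weighted}


predictions = [
    ["/開箱/xxx", "/教學/yyy", "/表達/zzz"],
    ["/分享/外部資訊/aaa", "/教學/yyy", "/表達/zzz", "/分享/個人資訊/bbb"],
]
references = [
    ["/開箱/xxx", "/教學/yyy", "/表達/zzz"],
    ["/推薦/產品/bbb", "/教學/yyy", "/表達/zzz"],
]
result = action_scores(predictions, references)
print(result)
```

On the example data from How to Use, this sketch prints the same scores as the metric output shown there. That agreement does not confirm that the real metric uses these exact rules.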
How to Use
import evaluate

# Allowed /class labels (English glosses in the comments)
valid_labels = [
    "/開箱",           # unboxing
    "/教學",           # tutorial
    "/表達",           # expression
    "/分享/外部資訊",   # sharing / external information
    "/分享/個人資訊",   # sharing / personal information
    "/推薦/產品",       # recommendation / product
    "/推薦/服務",       # recommendation / service
    "/推薦/其他",       # recommendation / other
    "",                # empty label
]

# One list of "/class/phrase" actions per example
predictions = [
    ["/開箱/xxx", "/教學/yyy", "/表達/zzz"],
    ["/分享/外部資訊/aaa", "/教學/yyy", "/表達/zzz", "/分享/個人資訊/bbb"],
]
references = [
    ["/開箱/xxx", "/教學/yyy", "/表達/zzz"],
    ["/推薦/產品/bbb", "/教學/yyy", "/表達/zzz"],
]

metric = evaluate.load("DarrenChensformer/action_generation")
result = metric.compute(
    predictions=predictions,
    references=references,
    valid_labels=valid_labels,
    detailed_scores=True,
)
print(result)
# {'class': {'precision': 0.7143, 'recall': 0.8333, 'f1': 0.7692}, 'phrase': {'precision': 0.8571, 'recall': 1.0, 'f1': 0.9231}, 'weighted_sum': {'precision': 0.7429, 'recall': 0.8666, 'f1': 0.8}}
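The printed scores can be checked by hand. The first example matches all three actions on both class and phrase. In the second example, only /教學/yyy and /表達/zzz match on class, while yyy, zzz, and bbb match on phrase. Across both examples there are 7 predicted and 6 reference actions. The weights 0.8 and 0.2 in the last line are inferred from the printed values and are not documented by the metric.

```latex
\begin{aligned}
\text{class:}\quad & P = \tfrac{5}{7} \approx 0.7143,\quad R = \tfrac{5}{6} \approx 0.8333,\quad F_1 = \tfrac{2 \cdot 5}{7 + 6} = \tfrac{10}{13} \approx 0.7692\\
\text{phrase:}\quad & P = \tfrac{6}{7} \approx 0.8571,\quad R = \tfrac{6}{6} = 1,\quad F_1 = \tfrac{2 \cdot 6}{7 + 6} = \tfrac{12}{13} \approx 0.9231\\
\text{weighted sum:}\quad & S = 0.8\,S_{\text{class}} + 0.2\,S_{\text{phrase}},\ \text{e.g. } P = 0.8(0.7143) + 0.2(0.8571) \approx 0.7429
\end{aligned}
```

The same weights give a recall of 0.8(0.8333) + 0.2(1) ≈ 0.8666 and an F1 of 0.8(0.7692) + 0.2(0.9231) ≈ 0.8, both matching the output.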
Inputs
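- predictions (list of list of str): the predicted actions, one list per example. Each action has the format /class/phrase, e.g. "/分享/外部資訊/aaa".
- references (list of list of str): the reference actions for each example, in the same format.
- valid_labels (list of str): the allowed /class labels. The example above also includes the empty string "".
- detailed_scores (bool): set to True in the example above, where the result includes the separate class and phrase scores in addition to weighted_sum.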
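Before calling compute, it can help to check inputs against the format of the predictions, references, and valid_labels arguments. The helper below, `check_actions`, is hypothetical and not part of the metric. It confirms that there are as many prediction lists as reference lists, and it reports any predicted action whose class is not in valid_labels. It assumes that the class is everything before the last "/". How the metric itself treats classes outside valid_labels is not described in this card.

```python
def check_actions(predictions, references, valid_labels):
    """Hypothetical pre-flight check; not part of the action_generation metric.

    Raises ValueError if the number of examples differs, and returns
    (example_index, action) pairs whose class is not in valid_labels.
    Assumption: the class is everything before the last "/".
    """
    if len(predictions) != len(references):
        raise ValueError(
            f"{len(predictions)} prediction lists vs {len(references)} reference lists"
        )
    allowed = set(valid_labels)
    unknown = []
    for i, row in enumerate(predictions):
        for action in row:
            cls = action.rpartition("/")[0]
            if cls not in allowed:
                unknown.append((i, action))
    return unknown


# "/其他/yyy" has a class ("/其他") that is not in the allowed list.
print(check_actions([["/開箱/xxx", "/其他/yyy"]], [["/開箱/xxx"]], ["/開箱", ""]))
# [(0, '/其他/yyy')]
```

With the data from How to Use, every predicted class appears in valid_labels, so the helper returns an empty list.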
Output Values
The metric returns a dictionary. With detailed_scores=True, as in the example above, it has three keys: class, phrase, and weighted_sum. Each maps to a dictionary with precision, recall, and f1, shown rounded to four decimal places.
The class and phrase precision, recall, and F1 each range from 0 to 1, inclusive, and higher scores are better.
Examples
Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.
Limitations and Bias
Note any known limitations or biases that the metric has, with links and references if possible.
Citation
Cite the source where this metric was introduced.
Further References
Add any useful further references.