---
license: mit
---
 
# QA-Evaluation-Metrics

[![PyPI version qa-metrics](https://img.shields.io/pypi/v/qa-metrics.svg)](https://pypi.org/project/qa-metrics/)

QA-Evaluation-Metrics is a fast and lightweight Python package for evaluating question-answering models. It provides several basic metrics for assessing the performance of QA models. Check out our **CFMatcher**, a matching method that goes beyond token-level matching: it is more efficient than LLM-based matching while retaining evaluation performance competitive with transformer LLM models.

## Installation

To install the package, run the following command:

```bash
pip install qa-metrics
```
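
To sanity-check the install, a minimal smoke test (it just imports one of the metrics shown below; the trivial self-match is expected to count as an exact match):

```python
# Verify the package imports and runs end to end.
from qa_metrics.em import em_match

print(em_match(["Paris"], "Paris"))  # trivial self-match; expected to match
```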

## Usage

The Python package currently provides four QA evaluation metrics.

#### Exact Match
```python
from qa_metrics.em import em_match

reference_answer = ["Charles , Prince of Wales"]
candidate_answer = "Prince Charles"
match_result = em_match(reference_answer, candidate_answer)
print("Exact Match: ", match_result)
```

A surface mismatch like the one above ("Prince Charles" vs. "Charles , Prince of Wales") will not exact-match, which is exactly the gap the softer metrics below are meant to close.

#### F1 Score
```python
from qa_metrics.f1 import f1_match, f1_score_with_precision_recall

f1_stats = f1_score_with_precision_recall(reference_answer[0], candidate_answer)
print("F1 stats: ", f1_stats)

match_result = f1_match(reference_answer, candidate_answer, threshold=0.5)
print("F1 Match: ", match_result)
```
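
For intuition, token-level F1 treats both answers as bags of tokens and takes the harmonic mean of precision and recall over their overlap. Here is a minimal sketch of the standard SQuAD-style formula (this package's implementation may normalize text differently):

```python
from collections import Counter

def token_f1(reference: str, candidate: str) -> float:
    """SQuAD-style token F1 over lowercased whitespace tokens."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    # Multiset intersection counts shared tokens, with repetition.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("Charles , Prince of Wales", "Prince Charles"))  # -> ~0.57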

#### CFMatch
```python
from qa_metrics.cfm import CFMatcher

question = "who will take the throne after the queen dies"
cfm = CFMatcher()
scores = cfm.get_scores(reference_answer, candidate_answer, question)
match_result = cfm.cf_match(reference_answer, candidate_answer, question)
print("Score: %s; CF Match: %s" % (scores, match_result))
```

#### Transformer Match
Our fine-tuned BERT model is on 🤗 [Huggingface](https://huggingface.co/Zongxia/answer_equivalence_bert?text=The+goal+of+life+is+%5BMASK%5D.). Our package also supports downloading and matching directly. More matching transformer models will be available 🔥🔥🔥

```python
from qa_metrics.transformerMatcher import TransformerMatcher

question = "who will take the throne after the queen dies"
tm = TransformerMatcher("bert")
scores = tm.get_scores(reference_answer, candidate_answer, question)
match_result = tm.transformer_match(reference_answer, candidate_answer, question)
print("Score: %s; Transformer Match: %s" % (scores, match_result))
```
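
If you prefer to call the checkpoint directly rather than through `TransformerMatcher`, something like the following should work with the `transformers` library (a sketch only: the pair-input format and label order are assumptions, not documented behavior of this model; `TransformerMatcher` above is the supported path):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Model id taken from the model card linked above.
model_id = "Zongxia/answer_equivalence_bert"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Assumed input format: candidate answer paired with a reference answer.
inputs = tokenizer("Prince Charles", "Charles , Prince of Wales",
                   return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs)  # assumed order: P(not equivalent), P(equivalent)
```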

## Datasets
Our training dataset is adapted and augmented from [Bulian et al.](https://github.com/google-research-datasets/answer-equivalence-dataset). Our [dataset repo](https://github.com/zli12321/Answer_Equivalence_Dataset.git) includes the augmented training set and the QA evaluation test sets discussed in our paper.

## License

This project is licensed under the [MIT License](LICENSE.md) - see the LICENSE file for details.

## Contact

For any additional questions or comments, please contact [[email protected]].