oliverguhr committed
Commit 16fd186
1 Parent(s): 8eeec9d

updated readme

Files changed (1)
  1. README.md +51 -22
README.md CHANGED
@@ -8,40 +8,27 @@ datasets: sonar
  license: mit
  widget:
  - text: "hervatting van de zitting ik verklaar de zitting van het europees parlement die op vrijdag 17 december werd onderbroken te zijn hervat"
- example_title: "EuroParl Sample"
  metrics:
  - f1
  ---
 
- ## Model
 
- Trained on Sonar corpus
 
- ## Performance
 
- Evaluated on dutch SoNaR data set
- ```
-               precision    recall  f1-score   support
-
-            ,   0.754384  0.687349  0.719308   3127454
-            -   0.848480  0.628337  0.722000    331849
-            .   0.856989  0.851786  0.854380   4941897
-            0   0.982454  0.989201  0.985816  73926815
-            :   0.738974  0.657906  0.696088    590946
-            ?   0.730301  0.643325  0.684060    410416
 
-     accuracy                       0.964233  83329377
-    macro avg   0.818597  0.742984  0.776942  83329377
- weighted avg   0.962951  0.964233  0.963427  83329377
-
- ```
-
- Usage:
 
  ```bash
  pip install deepmultilingualpunctuation
  ```
-
  ```python
  from deepmultilingualpunctuation import PunctuationModel
 
@@ -50,3 +37,45 @@ text = "hervatting van de zitting ik verklaar de zitting van het europees parlem
  result = model.restore_punctuation(text)
  print(result)
  ```
 
  license: mit
  widget:
  - text: "hervatting van de zitting ik verklaar de zitting van het europees parlement die op vrijdag 17 december werd onderbroken te zijn hervat"
+ example_title: "Dutch Sample"
  metrics:
  - f1
  ---
 
+ This model predicts the punctuation of Dutch texts. We developed it to restore the punctuation of transcribed spoken language.
 
+ This multilanguage model was trained on the [SoNaR Dataset](http://hdl.handle.net/10032/tm-a2-h5).
 
+ The model restores the following punctuation markers: **"." "," "?" "-" ":"**
+ ## Sample Code
+ We provide a simple Python package that allows you to process text of any length.
 
+ ## Install
 
+ To get started, install the package from [PyPI](https://pypi.org/project/deepmultilingualpunctuation/):
 
  ```bash
  pip install deepmultilingualpunctuation
  ```
+ ### Restore Punctuation
  ```python
  from deepmultilingualpunctuation import PunctuationModel
 
@@ -50,3 +37,45 @@ text = "hervatting van de zitting ik verklaar de zitting van het europees parlem
  result = model.restore_punctuation(text)
  print(result)
  ```
+
+ **output**
+ > hervatting van de zitting. ik verklaar de zitting van het europees parlement, die op vrijdag 17 december werd onderbroken, te zijn hervat.
+
+
+ ### Predict Labels
+ ```python
+ from deepmultilingualpunctuation import PunctuationModel
+
+ model = PunctuationModel()
+ text = "hervatting van de zitting ik verklaar de zitting van het europees parlement die op vrijdag 17 december werd onderbroken te zijn hervat"
+ clean_text = model.preprocess(text)
+ labeled_words = model.predict(clean_text)
+ print(labeled_words)
+ ```
+
+ **output**
+
+ > [['hervatting', '0', 0.99998724], ['van', '0', 0.9999784], ['de', '0', 0.99991274], ['zitting', '.', 0.6771242], ['ik', '0', 0.9999466], ['verklaar', '0', 0.9998566], ['de', '0', 0.9999783], ['zitting', '0', 0.9999809], ['van', '0', 0.99996245], ['het', '0', 0.99997795], ['europees', '0', 0.9999783], ['parlement', ',', 0.9908242], ['die', '0', 0.999985], ['op', '0', 0.99998224], ['vrijdag', '0', 0.9999831], ['17', '0', 0.99997985], ['december', '0', 0.9999827], ['werd', '0', 0.999982], ['onderbroken', ',', 0.9951485], ['te', '0', 0.9999677], ['zijn', '0', 0.99997723], ['hervat', '.', 0.9957053]]
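Each entry of the Predict Labels output above is a `[word, label, score]` triple. The label is the punctuation mark that follows the word (`'0'` means none), and the third value appears to be the model's confidence. The following is a minimal sketch of turning the triples back into text, assuming labels are simply attached to the end of their word; it is not necessarily how `restore_punctuation` is implemented. The `apply_labels` helper and its `min_score` cut-off are hypothetical, added only for illustration.

```python
def apply_labels(labeled_words, min_score=0.0):
    """Rebuild text from [word, label, score] triples (hypothetical helper).

    The predicted mark is appended directly to its word unless the label is
    '0' (no punctuation) or the score is below the hypothetical `min_score`.
    """
    tokens = []
    for word, label, score in labeled_words:
        if label != "0" and score >= min_score:
            word += label
        tokens.append(word)
    return " ".join(tokens)


# Triples copied from the Predict Labels output above.
labeled_words = [
    ["hervatting", "0", 0.99998724], ["van", "0", 0.9999784], ["de", "0", 0.99991274],
    ["zitting", ".", 0.6771242], ["ik", "0", 0.9999466], ["verklaar", "0", 0.9998566],
    ["de", "0", 0.9999783], ["zitting", "0", 0.9999809], ["van", "0", 0.99996245],
    ["het", "0", 0.99997795], ["europees", "0", 0.9999783], ["parlement", ",", 0.9908242],
    ["die", "0", 0.999985], ["op", "0", 0.99998224], ["vrijdag", "0", 0.9999831],
    ["17", "0", 0.99997985], ["december", "0", 0.9999827], ["werd", "0", 0.999982],
    ["onderbroken", ",", 0.9951485], ["te", "0", 0.9999677], ["zijn", "0", 0.99997723],
    ["hervat", ".", 0.9957053],
]

print(apply_labels(labeled_words))
print(apply_labels(labeled_words, min_score=0.9))
```

With the default `min_score=0.0` the first call reproduces the **output** of the Restore Punctuation example. Raising the cut-off to `0.9` drops only the low-confidence full stop after the first `zitting` (score 0.677).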
+
+
+
+
+ ## Results
+
+ Performance differs across the individual punctuation markers because hyphens and colons are, in many cases, optional and can be replaced by either a comma or a full stop. The model achieves the following F1 scores:
+
+ | Label | F1 Score |
+ | ------------- | -------- |
+ | 0 (no punctuation) | 0.985816 |
+ | . | 0.854380 |
+ | ? | 0.684060 |
+ | , | 0.719308 |
+ | : | 0.696088 |
+ | - | 0.722000 |
+ | macro average | 0.776942 |
+ | weighted average | 0.963427 |
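The two averages at the bottom of the table can be reproduced from the per-label scores. The macro average is the unweighted mean of the six F1 scores. The last row is the mean weighted by each label's support (the number of words carrying that label), which is why it is dominated by the `0` class. For a single-label task like this one, a micro-averaged F1 would instead equal the accuracy (0.964233 in the evaluation report removed by this commit). The sketch below takes F1 and support from that removed report; the `REPORT` dictionary and helper names are illustrative choices, not part of the package.

```python
# Per-label (F1 score, support), copied from the SoNaR evaluation report in this commit.
REPORT = {
    ",": (0.719308, 3127454),
    "-": (0.722000, 331849),
    ".": (0.854380, 4941897),
    "0": (0.985816, 73926815),
    ":": (0.696088, 590946),
    "?": (0.684060, 410416),
}


def macro_f1(report):
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1 for f1, _ in report.values()) / len(report)


def weighted_f1(report):
    """Mean of the per-label F1 scores, weighted by each label's support."""
    total = sum(support for _, support in report.values())
    return sum(f1 * support for f1, support in report.values()) / total


print(f"macro average:    {macro_f1(REPORT):.6f}")
print(f"weighted average: {weighted_f1(REPORT):.6f}")
```

This prints 0.776942 and 0.963427, matching the last two rows of the table.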
+
+ ## References
+
+ TBD
+