1-800-BAD-CODE committed on
Commit d44eb1a
Parent(s): f204db5

Update README.md
README.md
CHANGED
````diff
@@ -71,6 +71,12 @@ $ pip install punctuators
 
 Though this is just an ONNX and SentencePiece model, so you may run it as you wish.
 
+The input to the `punctuators` API is a list (batch) of strings.
+Each string will be punctuated, true-cased, and segmented on predicted full stops.
+The output will therefore be a list of list of strings: one list of segmented sentences per input text.
+To disable full stops, use `m.infer(texts, apply_sbd=False)`.
+The output will then be a list of strings: one punctuated, true-cased string per input text.
+
 <details open>
 
 <summary>Example Usage</summary>
@@ -85,35 +91,15 @@ m: PunctCapSegModelONNX = PunctCapSegModelONNX.from_pretrained(
 "1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase"
 )
 
-input_texts: List[str] = [
-# "hello world how's it going did you see the game last night my favorite team was playing and i got to go to "
-# "the game it went into overtime and i got home late i like most sports but some are kind of boring especially "
-# "baseball most of the time they aren't really playing they're just standing around waiting for something to "
-# "happen i wish it were more exiting like football or hockey in those sports you have practically non stop play "
-# "and everyone is involved in the game at all times unlike in baseball where it's only one person at a time",
-# "hola mundo cómo estás estamos bajo el sol y hace mucho calor santa coloma abre los huertos urbanos a las escuelas "
-# "de la ciudad",
-"hello friend how's it going it's snowing outside right now in connecticut a large storm is moving in",
-# "未來疫苗將有望覆蓋3歲以上全年齡段美國與北約軍隊已全部撤離還有鐵路公路在內的各項基建的來源都將枯竭",
-# "በባለፈው ሳምንት ኢትዮጵያ ከሶማሊያ 3 ሺህ ወታደሮቿንም እንዳስወጣች የሶማሊያው ዳልሳን ሬድዮ ዘግቦ ነበር ጸጥታ ሃይሉና ህዝቡ ተቀናጅቶ "
-# "በመስራቱ በመዲናዋ ላይ የታቀደው የጥፋት ሴራ ከሽፏል",
-# "all human beings are born free and equal in dignity and rights they are endowed with reason and conscience and "
-# "should act towards one another in a spirit of brotherhood",
-# "सभी मनुष्य जन्म से मर्यादा और अधिकारों में स्वतंत्र और समान होते हैं वे तर्क और विवेक से संपन्न हैं तथा उन्हें भ्रातृत्व की भावना से परस्पर के प्रति कार्य करना चाहिए",
-# "wszyscy ludzie rodzą się wolni i równi pod względem swej godności i swych praw są oni obdarzeni rozumem i "
-# "sumieniem i powinni postępować wobec innych w duchu braterstwa",
-# "tous les êtres humains naissent libres et égaux en dignité et en droits ils sont doués de raison et de conscience "
-# "et doivent agir les uns envers les autres dans un esprit de fraternité",
-]
 input_texts: List[str] = [
 "hola mundo cómo estás estamos bajo el sol y hace mucho calor santa coloma abre los huertos urbanos a las escuelas de la ciudad",
 "hello friend how's it going it's snowing outside right now in connecticut a large storm is moving in",
 "未來疫苗將有望覆蓋3歲以上全年齡段美國與北約軍隊已全部撤離還有鐵路公路在內的各項基建的來源都將枯竭",
 "በባለፈው ሳምንት ኢትዮጵያ ከሶማሊያ 3 ሺህ ወታደሮቿንም እንዳስወጣች የሶማሊያው ዳልሳን ሬድዮ ዘግቦ ነበር ጸጥታ ሃይሉና ህዝቡ ተቀናጅቶ በመስራቱ በመዲናዋ ላይ የታቀደው የጥፋት ሴራ ከሽፏል",
-"
-"
-"
-"
+"こんにちは友人" "調子はどう" "今日は雨の日でしたね" "乾いた状態を保つために一日中室内で過ごしました",
+"hallo freund wie geht's es war heute ein regnerischer tag nicht wahr ich verbrachte den tag drinnen um trocken zu bleiben",
+"हैलो दोस्त ये कैसा चल रहा है आज बारिश का दिन था न मैंने सूखा रहने के लिए दिन घर के अंदर बिताया",
+"كيف تجري الامور كان يومًا ممطرًا اليوم أليس كذلك قضيت اليوم في الداخل لأظل جافًا",
 ]
 
 results: List[List[str]] = m.infer(
@@ -136,6 +122,51 @@ for input_text, output_texts in zip(input_texts, results):
 <summary>Expected output</summary>
 
 ```text
+Input: hola mundo cómo estás estamos bajo el sol y hace mucho calor santa coloma abre los huertos urbanos a las escuelas de la ciudad
+Outputs:
+Hola mundo, ¿cómo estás?
+Estamos bajo el sol y hace mucho calor.
+Santa Coloma abre los huertos urbanos a las escuelas de la ciudad.
+
+Input: hello friend how's it going it's snowing outside right now in connecticut a large storm is moving in
+Outputs:
+Hello friend, how's it going?
+It's snowing outside right now.
+In Connecticut, a large storm is moving in.
+
+Input: 未來疫苗將有望覆蓋3歲以上全年齡段美國與北約軍隊已全部撤離還有鐵路公路在內的各項基建的來源都將枯竭
+Outputs:
+未來,疫苗將有望覆蓋3歲以上全年齡段。
+美國與北約軍隊已全部撤離。
+還有,鐵路,公路在內的各項基建的來源都將枯竭。
+
+Input: በባለፈው ሳምንት ኢትዮጵያ ከሶማሊያ 3 ሺህ ወታደሮቿንም እንዳስወጣች የሶማሊያው ዳልሳን ሬድዮ ዘግቦ ነበር ጸጥታ ሃይሉና ህዝቡ ተቀናጅቶ በመስራቱ በመዲናዋ ላይ የታቀደው የጥፋት ሴራ ከሽፏል
+Outputs:
+በባለፈው ሳምንት ኢትዮጵያ ከሶማሊያ 3 ሺህ ወታደሮቿንም እንዳስወጣች የሶማሊያው ዳልሳን ሬድዮ ዘግቦ ነበር።
+ጸጥታ ሃይሉና ህዝቡ ተቀናጅቶ በመስራቱ በመዲናዋ ላይ የታቀደው የጥፋት ሴራ ከሽፏል።
+
+Input: こんにちは友人調子はどう今日は雨の日でしたね乾いた状態を保つために一日中室内で過ごしました
+Outputs:
+こんにちは、友人、調子はどう?
+今日は雨の日でしたね。
+乾いた状態を保つために、一日中、室内で過ごしました。
+
+Input: hallo freund wie geht's es war heute ein regnerischer tag nicht wahr ich verbrachte den tag drinnen um trocken zu bleiben
+Outputs:
+Hallo Freund, wie geht's?
+Es war heute ein regnerischer Tag, nicht wahr?
+Ich verbrachte den Tag drinnen, um trocken zu bleiben.
+
+Input: हैलो दोस्त ये कैसा चल रहा है आज बारिश का दिन था न मैंने सूखा रहने के लिए दिन घर के अंदर बिताया
+Outputs:
+हैलो दोस्त, ये कैसा चल रहा है?
+आज बारिश का दिन था न, मैंने सूखा रहने के लिए दिन घर के अंदर बिताया।
+
+Input: كيف تجري الامور كان يومًا ممطرًا اليوم أليس كذلك قضيت اليوم في الداخل لأظل جافًا
+Outputs:
+كيف تجري الامور؟
+كان يومًا ممطرًا اليوم، أليس كذلك؟
+قضيت اليوم في الداخل لأظل جافًا.
 
 ```
 
````
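The README text added in this commit describes two output shapes for `infer`. With sentence boundary detection (the default), it returns a `List[List[str]]`: one list of sentences per input text. With `apply_sbd=False`, it returns a `List[str]`: one punctuated, true-cased string per input text. The sketch below does not load or call the model. It assumes a hypothetical helper, `format_results` (not part of `punctuators`), that accepts either shape and renders it in the same "Input / Outputs" layout as the README's expected output. The sample result is copied from that expected output.

```python
from typing import List, Sequence, Union


# Hypothetical helper, NOT part of the `punctuators` package. It mirrors the
# README's `for input_text, output_texts in zip(input_texts, results):` loop
# and renders results in the "Input / Outputs" layout of the expected output.
def format_results(
    input_texts: Sequence[str],
    results: Sequence[Union[str, List[str]]],
) -> str:
    """Render one block per input text.

    With sentence boundary detection on, each result is a list of sentences.
    With `apply_sbd=False`, each result is a single string; this sketch treats
    that string as a one-sentence list so the same layout works for both shapes.
    """
    if len(input_texts) != len(results):
        raise ValueError("expected exactly one result per input text")
    blocks: List[str] = []
    for input_text, output in zip(input_texts, results):
        # Normalize both output shapes to a list of strings.
        sentences = [output] if isinstance(output, str) else list(output)
        lines = [f"Input: {input_text}", "Outputs:", *sentences]
        blocks.append("\n".join(lines))
    # A blank line separates consecutive inputs, as in the expected output.
    return "\n\n".join(blocks)


# Sample data copied from the README's expected output (no model is run here).
input_texts: List[str] = [
    "hello friend how's it going it's snowing outside right now in connecticut a large storm is moving in",
]
results_sbd: List[List[str]] = [
    [
        "Hello friend, how's it going?",
        "It's snowing outside right now.",
        "In Connecticut, a large storm is moving in.",
    ],
]
print(format_results(input_texts, results_sbd))
```

Run as-is, it prints the same five lines as the Connecticut block of the expected output. Treating a bare string as a one-sentence list is a design choice of this sketch, so the same printing loop works whether or not `apply_sbd` is disabled.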