1-800-BAD-CODE committed
Commit 5548a75 (1 parent: cad4273)
Update README.md

README.md CHANGED
@@ -178,6 +178,10 @@ This model was trained on news data, and may not perform well on conversational
 Further, this model is unlikely to be of production quality.
 It was trained with "only" 1M lines per language, and the dev sets may have been noisy due to the nature of web-scraped news data.

+This model over-predicts the inverted Spanish question mark, `¿`. Since `¿` is a rare token, especially in the
+context of a 47-language model, Spanish questions were over-sampled by selecting more of these sentences from
+additional training data that was not used. However, this seems to have "over-corrected" the problem and a lot
+of Spanish question marks are predicted.


 # Evaluation
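
The added README text describes over-sampling Spanish questions from otherwise unused data. The commit does not show how that sampling was done, so the following is only a minimal sketch of the general idea, assuming plain-text sentences held in Python lists. The names `oversample_with_token`, `token_rate`, `unused_pool`, and `max_extra` are hypothetical and are not taken from the model's training code.

```python
import random
from typing import List, Sequence


def oversample_with_token(
    train: Sequence[str],
    unused_pool: Sequence[str],
    token: str = "¿",
    max_extra: int = 1000,
    seed: int = 0,
) -> List[str]:
    """Append up to `max_extra` sentences containing `token` from `unused_pool`.

    Hypothetical helper: illustrates over-sampling a rare token, not the
    procedure actually used to train the model.
    """
    # Keep only pool sentences that contain the rare token at least once.
    candidates = [s for s in unused_pool if token in s]
    # Shuffle with a fixed seed so the selection is reproducible.
    rng = random.Random(seed)
    rng.shuffle(candidates)
    return list(train) + candidates[:max_extra]


def token_rate(sentences: Sequence[str], token: str = "¿") -> float:
    """Fraction of sentences that contain `token` at least once."""
    if not sentences:
        return 0.0
    return sum(token in s for s in sentences) / len(sentences)


# Toy data, invented for illustration only.
train = ["Hola.", "¿Qué tal?", "Adiós.", "Gracias."]
unused_pool = ["¿Dónde estás?", "Nos vemos.", "¿Quién es?", "Vale."]

boosted = oversample_with_token(train, unused_pool, max_extra=2)
print(token_rate(train), token_rate(boosted))
```

Comparing `token_rate` before and after sampling is one way to see how far the training distribution has shifted. A large jump in the share of sentences containing `¿` would be consistent with the over-correction the README describes, and lowering `max_extra` is the obvious knob in this sketch.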