Commit 8d75e5b by 1-800-BAD-CODE (parent: 932cc97)
Update README.md

README.md changed:
```diff
@@ -285,11 +285,8 @@ This model predicts the following set of punctuation tokens before each subword:
 
 
 # Training Details
-This model was trained in the NeMo framework.
+This model was trained in the NeMo framework on an A100 for approximately 9 hours.
 
-This model was trained on an A100 for approximately 9 hours.
-
-## Training Data
 This model was trained with News Crawl data from WMT.
 1M lines of text for each language was used, except for a few low-resource languages which may have used less.
 Languages were chosen based on whether the News Crawl corpus contained enough reliable-quality data as judged by the author.
@@ -308,6 +305,8 @@ by selecting more of these sentences from additional training data that was not
 
 The model may also over-predict commas.
 
+If you find any general limitations not mentioned here, let me know so all limitations can be addressed in the
+next fine-tuning.
 
 # Evaluation
 In these metrics, keep in mind that
```