peterhung/vietnamese-accent-marker-xlm-roberta

Token Classification

accents inserter

peterhung committed on May 16

Commit

cfe60a2

1 Parent(s): e711ddf

Update README.md

Files changed (1)

README.md +31 -3

README.md CHANGED

@@ -1,3 +1,31 @@
----
-license: afl-3.0
----

+---
+license: afl-3.0
+language:
+- vi
+pipeline_tag: token-classification
+tags:
+- vietnamese
+- accents inserter
+---
+# A Transformer model for inserting Vietnamese accent marks
+This model is fine-tuned from XLM-RoBERTa Large.
+Example input: Toi di hoc.
+Target output: Tôi đi học.
+## Model training
+This problem was modelled as a token classification problem. For each input token, the goal is to assign a "tag" that transforms it
+into the accented token.
+For more details on the training process, please refer to this [blog post](https://peterhung.org/tech/insert-vietnamese-accent-transformer-model/).
+## How to use this model
+There are 4 main steps:
+- Load the model as a token classification model (*AutoModelForTokenClassification*).
+- Run the input through the model to obtain the tag index for each input token.
+- Use each tag index to retrieve the actual tag from the file *selected_tags_names.txt*.
+- Apply the transformation to each token to obtain accented tokens.
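
The four steps listed in the README can be sketched roughly as follows. This is a minimal illustration, not the author's code, and it rests on several assumptions: that *selected_tags_names.txt* holds one tag per line with the line number as the tag index, that each tag has the form `raw-accented` (for example `o-ô`), and that taking the prediction of a word's first sub-word token is good enough. The helper names (`load_tags`, `predict_tag_indices`, `apply_tag`, `insert_accents`) are hypothetical.

```python
from typing import List

# Repository name as shown on the model page.
MODEL_ID = "peterhung/vietnamese-accent-marker-xlm-roberta"


def load_tags(path: str) -> List[str]:
    """Step 3 helper: read the tag names.

    Assumption: one tag per line, line number == tag index.
    """
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def predict_tag_indices(words: List[str]) -> List[int]:
    """Steps 1-2: load the model and predict one tag index per word.

    Simplification (not from the README): each word takes the
    prediction of its first sub-word token.
    """
    # Imported lazily so the pure-Python helpers work without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, num_tags)
    token_preds = logits.argmax(dim=-1)[0].tolist()

    # Map each sub-word position back to its source word (None = special token).
    first_pred = {}
    for pos, word_id in enumerate(inputs.word_ids(batch_index=0)):
        if word_id is not None and word_id not in first_pred:
            first_pred[word_id] = token_preds[pos]
    return [first_pred[i] for i in range(len(words))]


def apply_tag(word: str, tag: str) -> str:
    """Step 4: apply one tag to one word.

    Assumption: a tag looks like 'raw-accented'; anything else,
    or a tag whose raw part is absent from the word, is a no-op.
    """
    raw, sep, accented = tag.partition("-")
    if not sep or not raw or raw not in word:
        return word
    return word.replace(raw, accented, 1)


def insert_accents(text: str, tags_path: str) -> str:
    """Run all four steps on a whitespace-separated sentence."""
    words = text.split()
    tags = load_tags(tags_path)
    indices = predict_tag_indices(words)
    return " ".join(apply_tag(w, tags[i]) for w, i in zip(words, indices))
```

Under these assumptions, `insert_accents("Toi di hoc.", "selected_tags_names.txt")` would download the model on first use and return the accented sentence. The blog post linked in the README describes the actual tag format and post-processing.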