---
license: afl-3.0
language:
- vi
pipeline_tag: token-classification
tags:
- vietnamese
- accents inserter
---

# A Transformer model for inserting Vietnamese accent marks

This model is fine-tuned from XLM-RoBERTa Large.

- Example input: Toi di hoc.
- Target output: Tôi đi học.

## Model training
This problem was modelled as a token classification problem: for each input token, the goal is to assign a "tag" that transforms it into the accented token.
For more details on the training process, please refer to this [blog post](https://peterhung.org/tech/insert-vietnamese-accent-transformer-model/).

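To make the framing concrete, here is a small sketch of how a raw/accented word pair could be reduced to a single class label. This is an illustration under assumptions, not the author's labeling code. The `raw-accented` tag format (for example `o-ô`) and the helper name `derive_tag` are both hypothetical. The real tag set is the one listed in *selected_tags_names.txt*.

```python
import unicodedata


def derive_tag(raw: str, accented: str) -> str:
    """Hypothetical labeling scheme (an assumption, not the model's real one):
    describe the minimal edit turning `raw` into `accented` as 'raw-accented'.
    """
    # Compare precomposed (NFC) characters so that 'ô' counts as one character.
    raw = unicodedata.normalize("NFC", raw)
    accented = unicodedata.normalize("NFC", accented)
    shortest = min(len(raw), len(accented))

    # Length of the shared prefix.
    p = 0
    while p < shortest and raw[p] == accented[p]:
        p += 1

    # Length of the shared suffix, kept from overlapping the prefix.
    s = 0
    while s < shortest - p and raw[-1 - s] == accented[-1 - s]:
        s += 1

    # Unchanged words produce the "no-op" tag '-'.
    return f"{raw[p:len(raw) - s]}-{accented[p:len(accented) - s]}"


pairs = zip("Toi di hoc.".split(), "Tôi đi học.".split())
print([derive_tag(r, a) for r, a in pairs])  # ['o-ô', 'd-đ', 'o-ọ']
```

Adding Vietnamese diacritics never changes the number of NFC characters in a word. Each tag is therefore a same-length substring substitution, and the distinct tags seen in training form a fixed set of classes.
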
## How to use this model
There are four main steps:
- Load the model as a token classification model (*AutoModelForTokenClassification*).
- Run the input through the model to obtain the tag index for each input token.
- Use the tag indices to retrieve the actual tags from the file *selected_tags_names.txt*.
- Apply the transformation to each token to obtain the accented tokens.
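
The four steps can be sketched end to end with Hugging Face `transformers`. This is a minimal sketch, not the author's code, and it rests on several assumptions:

- `MODEL_ID` is a hypothetical placeholder for this model's repo id or local path.
- *selected_tags_names.txt* holds one tag per line, in label-index order.
- Tags have a hypothetical `raw-accented` form (e.g. `o-ô`).
- Each word takes the label of its first sub-token.

Check these against the tag file and the blog post before relying on the code.

```python
from pathlib import Path
from typing import List, Optional, Sequence

# Hypothetical placeholders: point these at this model and its tag file.
MODEL_ID = "path/to/this-model"
TAGS_FILE = "selected_tags_names.txt"


def load_tags(path: str) -> List[str]:
    """Step 3 helper. Assumes one tag per line, where line i is label index i."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def apply_tag(word: str, tag: str) -> str:
    """Step 4. Assumes a hypothetical 'raw-accented' tag (e.g. 'o-ô') and
    replaces the first occurrence of `raw` in `word` with `accented`."""
    raw, _, accented = tag.partition("-")
    if not raw or raw not in word:
        return word  # no-op tag, or the tag does not fit this word
    return word.replace(raw, accented, 1)


def first_subword_labels(word_ids: Sequence[Optional[int]], preds: Sequence[int]) -> List[int]:
    """Keep the prediction of each word's first sub-token (a common convention;
    the original code may aggregate sub-token predictions differently)."""
    labels: List[int] = []
    previous = None
    for wid, pred in zip(word_ids, preds):
        if wid is not None and wid != previous:
            labels.append(pred)
        previous = wid
    return labels


def insert_accents(text: str) -> str:
    # Imported here so the helpers above can be used without torch/transformers.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    # Step 1: load as a token classification model (reloaded per call for brevity).
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Step 2: one predicted tag index per sub-token.
    words = text.split()
    enc = tokenizer(words, is_split_into_words=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    preds = logits.argmax(dim=-1)[0].tolist()
    word_labels = first_subword_labels(enc.word_ids(0), preds)

    # Steps 3 and 4: map indices to tag names, then transform each word.
    tags = load_tags(TAGS_FILE)
    return " ".join(apply_tag(w, tags[i]) for w, i in zip(words, word_labels))
```

For the README's example input `Toi di hoc.`, the target output is `Tôi đi học.`. Using `word_ids` (available on fast tokenizers) maps sub-tokens back to input words. This avoids handling the tokenizer's word-prefix marker by hand.
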