---
license: afl-3.0
language:
- vi
pipeline_tag: token-classification
tags:
- vietnamese
- accents inserter
---
|
|
|
# A Transformer model for inserting Vietnamese accent marks
|
|
|
This model is fine-tuned from XLM-RoBERTa Large.
|
|
|
Example input: Toi di hoc.

Target output: Tôi đi học.
|
|
|
## Model training
|
This problem is modelled as a token classification problem: for each input token, the goal is to assign a "tag" that transforms it into the accented token.
|
For more details on the training process, please refer to this [blog post](https://peterhung.org/tech/insert-vietnamese-accent-transformer-model/).
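As a minimal sketch of this tagging idea, assume (hypothetically) that a tag has the form `<unaccented>-<accented>` and means "replace the first occurrence of the unaccented substring". The real tag set lives in *selected_tags_names.txt* and is described in the blog post, so its format may differ. Here `derive_tag` shows how a training label could be built from an (unaccented, accented) word pair, and `apply_tag` reverses it at inference time. All function names are illustrative and are not part of this model's code.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Remove Vietnamese diacritics, e.g. 'học' -> 'hoc', 'đi' -> 'di'."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 'đ'/'Đ' are standalone letters, not a base letter plus a combining mark.
    return unicodedata.normalize("NFC", base).replace("đ", "d").replace("Đ", "D")


def derive_tag(plain: str, accented: str) -> str:
    """Build a hypothetical 'raw-accented' tag from the span where the words differ."""
    accented = unicodedata.normalize("NFC", accented)
    if len(plain) != len(accented):
        raise ValueError("expected words of equal length (NFC, diacritics stripped)")
    start = 0
    while start < len(plain) and plain[start] == accented[start]:
        start += 1
    end = len(plain)
    while end > start and plain[end - 1] == accented[end - 1]:
        end -= 1
    # Identical words yield "-", i.e. "no change".
    return f"{plain[start:end]}-{accented[start:end]}"


def apply_tag(word: str, tag: str) -> str:
    """Apply a hypothetical 'raw-accented' tag to the first occurrence of raw."""
    raw, _, accented = tag.partition("-")
    if not raw or raw not in word:
        return word  # no-op tag, or the tag does not fit this word
    return word.replace(raw, accented, 1)


for word in ["Tôi", "đi", "học"]:
    plain = strip_accents(word)
    tag = derive_tag(plain, word)
    print(plain, tag, apply_tag(plain, tag))
# Toi o-ô Tôi
# di d-đ đi
# hoc o-ọ học
```

Replacing only the first occurrence is a simplification; a word in which the unaccented substring appears more than once would need a position-aware tag.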
|
|
|
## How to use this model
|
There are four main steps:
|
- Load the model as a token classification model (*AutoModelForTokenClassification*).

- Run the input through the model to obtain the tag index for each input token.

- Use the tag indices to look up the actual tag names in the file *selected_tags_names.txt*.

- Apply each tag's transformation to its token to obtain the accented tokens.
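The four steps above can be sketched in Python with the Hugging Face `transformers` library. This is a minimal sketch, not the author's reference code, and it rests on several assumptions: `model_dir` and `tags_path` are placeholders for a local copy of this model and its *selected_tags_names.txt*; the file is assumed to hold one tag name per line, in the same order as the model's label indices; each tag is assumed to have the hypothetical form `<unaccented>-<accented>` (e.g. `o-ô`); and sub-word predictions are reduced to one tag per word by keeping the first sub-token's prediction. The helper names are illustrative. Check the blog post for the actual tag format.

```python
from pathlib import Path
from typing import List, Optional, Sequence


def load_tag_names(path: str) -> List[str]:
    """Assumed layout: one tag name per non-empty line, in label-index order."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def word_level_labels(word_ids: Sequence[Optional[int]], predictions: Sequence[int]) -> List[int]:
    """Keep the tag index of each word's first sub-token; skip special tokens (None)."""
    labels: List[int] = []
    previous: Optional[int] = None
    for word_id, label in zip(word_ids, predictions):
        if word_id is not None and word_id != previous:
            labels.append(int(label))
        previous = word_id
    return labels


def apply_tag(word: str, tag: str) -> str:
    """Apply a tag assumed to look like 'raw-accented'; '-' means no change."""
    raw, _, accented = tag.partition("-")
    if not raw or raw not in word:
        return word
    return word.replace(raw, accented, 1)


def insert_accents(text: str, model_dir: str, tags_path: str) -> str:
    # Imported lazily so the helpers above work without torch/transformers installed.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    # Step 1: load the model as a token classification model.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForTokenClassification.from_pretrained(model_dir)
    model.eval()

    # Step 2: run the input through the model to get a tag index per token.
    words = text.split()
    encoding = tokenizer(words, is_split_into_words=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits  # shape: (1, sequence_length, num_tags)
    predictions = logits.argmax(dim=-1)[0].tolist()
    labels = word_level_labels(encoding.word_ids(0), predictions)

    # Step 3: map tag indices to tag names.
    tag_names = load_tag_names(tags_path)

    # Step 4: apply each word's tag; words dropped by truncation stay unaccented.
    accented = [apply_tag(word, tag_names[label]) for word, label in zip(words, labels)]
    return " ".join(accented + words[len(accented):])
```

A call would look like `insert_accents("Toi di hoc.", model_dir, tags_path)`. In practice, load the tokenizer, model, and tag list once and reuse them instead of reloading them on every call.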
|
|
|
|
|
|
|
|