---
license: afl-3.0
language:
- vi
pipeline_tag: token-classification
tags:
- vietnamese
- accents inserter
---

# A Transformer model for inserting Vietnamese accent marks

This model is fine-tuned from XLM-RoBERTa Large.

Example input: Toi di hoc.  
Target output: Tôi đi học. 

## Model training
The task is modelled as token classification: for each input token, the model assigns a "tag" that transforms it
into its accented form.  
For more details on the training process, please refer to this [blog post](https://peterhung.org/tech/insert-vietnamese-accent-transformer-model/). 
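
To make the tagging idea concrete, here is a minimal sketch. It assumes, purely for illustration, that a tag is written as `raw-accented` (for example `oi-ôi`) and is applied by substring replacement. The tag strings and the `apply_tag` helper are hypothetical and are not taken from the model's actual tag set.

```python
def apply_tag(token: str, tag: str) -> str:
    """Apply a hypothetical 'raw-accented' tag to one unaccented token."""
    raw, _, accented = tag.partition("-")
    # An empty raw part (e.g. the tag "-") means "leave the token unchanged".
    if raw and raw in token:
        return token.replace(raw, accented)
    return token


tokens = ["Toi", "di", "hoc."]
tags = ["oi-ôi", "d-đ", "o-ọ"]  # illustrative tags, one per token
print(" ".join(apply_tag(tok, tag) for tok, tag in zip(tokens, tags)))
```

Framing the task this way means the model only has to pick one label per token from a fixed tag list, instead of generating the accented text character by character.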

## How to use this model
There are 4 main steps: 
- Load the model as a token classification model (*AutoModelForTokenClassification*).
- Run the input through the model to obtain the tag index for each input token.
- Use the tag indices to retrieve the actual tags from the file *selected_tags_names.txt*.
- Apply the transformation to each token to obtain accented tokens.
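
The steps above can be sketched roughly as follows. This is a hedged outline, not the author's reference code. It assumes that *selected_tags_names.txt* holds one tag per line with the line position as the tag index, that each tag has a hypothetical `raw-accented` form, and that the prediction for the first subword of a word is used for the whole word. All function names and the model path are placeholders.

```python
def load_model(model_path):
    """Step 1: load the tokenizer and the token classification model."""
    # Imported lazily so the pure-Python helpers below work on their own.
    from transformers import AutoModelForTokenClassification, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForTokenClassification.from_pretrained(model_path)
    model.eval()
    return tokenizer, model


def first_subword_preds(word_ids, preds, n_words):
    """Keep the prediction of the first subword of each word (a simplification)."""
    result = [None] * n_words
    for wid, pred in zip(word_ids, preds):
        if wid is not None and result[wid] is None:
            result[wid] = pred
    return result


def predict_tag_indices(words, tokenizer, model):
    """Step 2: return one predicted tag index per input word."""
    import torch
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits[0]  # shape: (sequence length, number of tags)
    preds = logits.argmax(dim=-1).tolist()
    return first_subword_preds(enc.word_ids(0), preds, len(words))


def load_tags(path="selected_tags_names.txt"):
    """Step 3: read the tag list; assumes one tag per line, index = line position."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]


def accent_words(words, tag_indices, tags):
    """Step 4: apply each tag, assuming the hypothetical 'raw-accented' format."""
    out = []
    for word, idx in zip(words, tag_indices):
        tag = tags[idx] if idx is not None else "-"
        raw, _, accented = tag.partition("-")
        out.append(word.replace(raw, accented) if raw and raw in word else word)
    return out


# Example usage (the model path is a placeholder for wherever this model is stored):
# tokenizer, model = load_model("path/to/this/model")
# tags = load_tags()
# words = "Toi di hoc.".split()
# indices = predict_tag_indices(words, tokenizer, model)
# print(" ".join(accent_words(words, indices, tags)))
```

Using only the first subword's prediction keeps the sketch short. If the tags are defined differently, for example if a later subword can carry the relevant tag, the merging and transformation steps need to be adapted accordingly.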