metadata

license: afl-3.0
language:
  - vi
pipeline_tag: token-classification
tags:
  - vietnamese
  - accents inserter

A Transformer model for inserting Vietnamese accent marks

This model is fine-tuned from XLM-RoBERTa Large.

Example input: Toi di hoc.
Target output: Tôi đi học.

Model training

This problem was modelled as a token classification problem. For each input token, the goal is to assign a "tag" that transforms it into the accented token.
For more details on the training process, please refer to this blog post.
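To make the tagging formulation concrete, here is a minimal sketch of how a training pair could be turned into per-token tags. It is an illustration only, not the author's preprocessing code: the tag format ("raw-accented", with "-" meaning "no change") and the helper names strip_accents and derive_tag are assumptions made for this example.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove Vietnamese accent marks to produce the unaccented model input."""
    # "đ"/"Đ" have no Unicode decomposition, so map them by hand.
    text = text.replace("\u0111", "d").replace("\u0110", "D")
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (category Mn): tone marks, circumflex, breve, horn.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def derive_tag(plain: str, accented: str) -> str:
    """Return a hypothetical 'raw-accented' tag mapping plain to accented.

    ASSUMPTION: the tag is the smallest differing substring pair, and "-"
    alone means the token is left unchanged.
    """
    accented = unicodedata.normalize("NFC", accented)
    if plain == accented:
        return "-"
    limit = min(len(plain), len(accented))
    start = 0
    while start < limit and plain[start] == accented[start]:
        start += 1  # skip the common prefix
    end = 0
    while end < limit - start and plain[-1 - end] == accented[-1 - end]:
        end += 1  # skip the common suffix
    return f"{plain[start:len(plain) - end]}-{accented[start:len(accented) - end]}"


def tags_for_sentence(accented_sentence: str) -> list[tuple[str, str]]:
    """Pair each unaccented token with the tag that restores its accents."""
    pairs = []
    for word in accented_sentence.split():
        plain = strip_accents(word)
        pairs.append((plain, derive_tag(plain, word)))
    return pairs
```

Under these assumptions, the example sentence from this card yields one tag per whitespace-separated token, and a classifier only has to choose among a fixed set of such tags rather than generate text.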

How to use this model

There are 4 main steps:

Load the model as a token classification model (AutoModelForTokenClassification).
Run the input through the model to obtain the tag index for each input token.
Use the tag indices to retrieve the actual tags from the file selected_tags_names.txt.
Apply the transformation to each token to obtain accented tokens.
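The four steps above can be sketched roughly as follows. This is a hedged outline rather than the official usage code: model_path is a placeholder for wherever the model is stored, the tag format ("raw-accented", with "-" for "no change") is assumed, and taking the prediction of the first subword of each word is a simplification chosen for this example. All function names are hypothetical.

```python
def load_model(model_path: str):
    """Step 1: load the tokenizer and the token classification model."""
    # Imported lazily so the pure helpers below work without these packages.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForTokenClassification.from_pretrained(model_path)
    model.eval()
    return tokenizer, model


def predict_tag_ids(words, tokenizer, model):
    """Step 2: run the words through the model, one tag index per subword."""
    import torch

    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    tag_ids = logits.argmax(dim=-1)[0].tolist()
    # word_ids maps each subword position to its source word (None = special).
    return enc.word_ids(0), tag_ids


def load_tags(path: str) -> list[str]:
    """Step 3: read the tag list; line i holds the tag for index i."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def apply_tag(word: str, tag: str) -> str:
    """Step 4: apply one tag. ASSUMPTION: tags look like 'raw-accented'."""
    raw, _, accented = tag.partition("-")
    if raw and accented:
        return word.replace(raw, accented)
    return word  # "-" (or anything unparseable) leaves the word unchanged


def accent_words(words, word_ids, tag_ids, tags):
    """Combine steps 3 and 4, using the first subword's tag for each word."""
    result = list(words)
    seen = set()
    for position, word_index in enumerate(word_ids):
        if word_index is None or word_index in seen:
            continue  # skip special tokens and non-initial subwords
        seen.add(word_index)
        result[word_index] = apply_tag(words[word_index], tags[tag_ids[position]])
    return result
```

A typical call sequence would be load_model(model_path), then predict_tag_ids("Toi di hoc.".split(), tokenizer, model), then load_tags("selected_tags_names.txt"), and finally " ".join(accent_words(...)). Splitting on whitespace is itself an assumption; the original tokenization scheme may differ.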