license: afl-3.0
language:
  - vi
pipeline_tag: token-classification
tags:
  - vietnamese
  - accents inserter

A Transformer model for inserting Vietnamese accent marks

This model is fine-tuned from XLM-RoBERTa Large.

Example input: Toi di hoc.
Target output: Tôi đi học.

Model training

This problem was modelled as a token classification problem. For each input token, the goal is to assign a "tag" that transforms it into the accented token.
For more details on the training process, please refer to this blog post.
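To make the tagging idea concrete, here is a minimal sketch of how a tag could be derived from a pair of unaccented and accented tokens. The tag format used here (`raw-accented` for the smallest differing substring, and `-` for "no change") and the helper names `strip_accents` and `derive_tag` are assumptions made for illustration; the format actually used in training may differ, so check selected_tags_names.txt and the blog post.

```python
import unicodedata


def strip_accents(text):
    """Remove Vietnamese accent marks, e.g. 'Tôi đi học.' -> 'Toi di hoc.'."""
    # 'đ' has no Unicode decomposition, so it is mapped by hand.
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def derive_tag(raw, accented):
    """Return a hypothetical 'raw-accented' tag for one token pair.

    Assumes both strings have the same length, which holds when the
    accented token is NFC-normalized and raw == strip_accents(accented).
    """
    if raw == accented:
        return "-"  # assumed "no change" tag
    # Skip the common prefix.
    start = 0
    while start < len(raw) and raw[start] == accented[start]:
        start += 1
    # Skip the common suffix.
    end = len(raw)
    while end > start and raw[end - 1] == accented[end - 1]:
        end -= 1
    return f"{raw[start:end]}-{accented[start:end]}"


accented_tokens = unicodedata.normalize("NFC", "Tôi đi học.").split()
raw_tokens = [strip_accents(token) for token in accented_tokens]
tags = [derive_tag(r, a) for r, a in zip(raw_tokens, accented_tokens)]
```

With this scheme the classifier only needs to choose among a fixed set of tags per token, rather than generate characters.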

How to use this model

There are 4 main steps:

  • Load the model as a token classification model (AutoModelForTokenClassification).
  • Run the input through the model to obtain the tag index for each input token.
  • Use the tag indices to retrieve the actual tags from the file selected_tags_names.txt.
  • Apply the transformation to each token to obtain accented tokens.
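
The steps above can be sketched roughly as follows. This is not the author's reference implementation. It assumes that selected_tags_names.txt holds one tag per line, that a tag has the form `raw-accented` with `-` meaning "leave the token unchanged", and that the prediction for the first subword of each word is used for the whole word. The function names and the `model_path` parameter are hypothetical; pass the model's repository ID or a local directory.

```python
def load_tags(path="selected_tags_names.txt"):
    """Read one tag per line; the line position is taken as the tag index."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def predict_tag_indices(model_path, words):
    """Return one predicted tag index per non-empty input word."""
    # Imported here so the tag helpers work without the heavy dependencies.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForTokenClassification.from_pretrained(model_path)
    model.eval()

    inputs = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    predictions = logits.argmax(dim=-1)[0].tolist()

    # Assumption: keep the prediction of the first subword of each word.
    first_subword = {}
    for position, word_id in enumerate(inputs.word_ids(0)):
        if word_id is not None and word_id not in first_subword:
            first_subword[word_id] = predictions[position]
    return [first_subword[i] for i in range(len(words))]


def apply_tag(token, tag):
    """Apply an assumed 'raw-accented' tag to a single token."""
    raw, sep, accented = tag.partition("-")
    if not sep or not raw or raw not in token:
        return token  # "-" or a tag that does not match: keep the token
    return token.replace(raw, accented, 1)


def insert_accents(words, tags, tag_indices):
    """Look up each word's tag by index and apply it."""
    return " ".join(apply_tag(w, tags[i]) for w, i in zip(words, tag_indices))


# Usage with hand-written tags and indices, so no model download is needed:
demo_tags = ["-", "o-ô", "d-đ", "o-ọ"]
demo_output = insert_accents(["Toi", "di", "hoc."], demo_tags, [1, 2, 3])
```

With a real model, the indices would instead come from `predict_tag_indices(model_path, words)` and the tags from `load_tags()`.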