peterhung/vietnamese-accent-marker-xlm-roberta

	---
	license: afl-3.0
	language:
	- vi
	pipeline_tag: token-classification
	tags:
	- vietnamese
	- accents inserter
	---

	# A Transformer model for inserting Vietnamese accent marks

	This model is fine-tuned from XLM-RoBERTa Large.

	Example input: Toi di hoc.
	Target output: Tôi đi học.

	## Model training
	This problem was modelled as a token classification problem. For each input token, the goal is to assign a "tag" that transforms it
	into the accented token.
	For more details on the training process, please refer to this [blog post](https://peterhung.org/tech/insert-vietnamese-accent-transformer-model/).
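To make the idea of a "tag" concrete, here is a minimal sketch of how a tag could turn an unaccented token into an accented one. The tag format used here (`raw-accented`, for example `o-ô`, with a bare `-` meaning "leave the token unchanged") is an assumption for illustration only; the authoritative format is whatever selected_tags_names.txt contains. The helper name `apply_tag` is hypothetical.

```python
def apply_tag(token: str, tag: str) -> str:
    """Apply one tag to one token.

    Assumption: a tag looks like "raw-accented" (e.g. "o-ô"), and a tag with
    an empty left-hand side (e.g. "-") means "no change".
    """
    raw, _, accented = tag.partition("-")
    if not raw or raw not in token:
        # No-op tag, or the tag does not apply to this token.
        return token
    # Replace only the first occurrence of the unaccented substring.
    return token.replace(raw, accented, 1)


# Tags chosen by hand for the example sentence; a real run would take them
# from the model's predictions.
tokens = ["Toi", "di", "hoc."]
tags = ["o-ô", "d-đ", "o-ọ"]
print(" ".join(apply_tag(t, g) for t, g in zip(tokens, tags)))  # Tôi đi học.
```

Framing the task this way keeps the output space small: the model only has to pick one of a fixed set of tags per token rather than generate text.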

	## How to use this model
	There are 4 main steps:
	- Load the model as a token classification model (AutoModelForTokenClassification).
	- Run the input through the model to obtain the tag index for each input token.
	- Use the tag indices to retrieve the actual tags from the file selected_tags_names.txt.
	- Apply the transformation to each token to obtain accented tokens.
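The steps above can be sketched roughly as follows. This is an illustrative outline, not the author's reference code. It assumes that the repository id is `peterhung/vietnamese-accent-marker-xlm-roberta`, that selected_tags_names.txt holds one tag per line in label-index order, and that tags have the form `raw-accented` (with `-` meaning "no change"). The helper names (`group_by_word`, `apply_first_matching_tag`, `insert_accents`) are hypothetical, and the rule "use the first sub-token tag that changes the word" is one simple choice among several.

```python
from collections import defaultdict

MODEL_ID = "peterhung/vietnamese-accent-marker-xlm-roberta"  # assumed repo id
TAGS_FILE = "selected_tags_names.txt"


def group_by_word(word_ids, tag_ids):
    """Collect the predicted tag indices of the sub-tokens of each word."""
    grouped = defaultdict(list)
    for word_id, tag_id in zip(word_ids, tag_ids):
        if word_id is not None:  # None marks special tokens such as <s>, </s>
            grouped[word_id].append(tag_id)
    return grouped


def apply_first_matching_tag(word, tags):
    """Apply the first tag that changes the word (assumed 'raw-accented' format)."""
    for tag in tags:
        raw, _, accented = tag.partition("-")
        if raw and raw in word:
            return word.replace(raw, accented, 1)
    return word


def insert_accents(text):
    # Heavy dependencies are imported here so the helpers above stay lightweight.
    import torch
    from huggingface_hub import hf_hub_download
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    # Step 1: load the model as a token classification model.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Step 2: run the input through the model to get one tag index per sub-token.
    words = text.split()
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits
    tag_ids = logits.argmax(dim=-1)[0].tolist()

    # Step 3: map tag indices to tag names (assumes one tag per line, in index order).
    tags_path = hf_hub_download(repo_id=MODEL_ID, filename=TAGS_FILE)
    with open(tags_path, encoding="utf-8") as f:
        tag_names = [line.strip() for line in f if line.strip()]

    # Step 4: apply the tags word by word.
    grouped = group_by_word(encoding.word_ids(0), tag_ids)
    return " ".join(
        apply_first_matching_tag(word, [tag_names[i] for i in grouped[idx]])
        for idx, word in enumerate(words)
    )
```

Calling `insert_accents("Toi di hoc.")` downloads the model weights on first use, so it needs network access and the `torch`, `transformers`, and `huggingface_hub` packages installed.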