---
base_model:
- Kansallisarkisto/multicentury-htr-model-onnx
pipeline_tag: image-to-text
license: mit
---

## Handwritten text recognition for table cell images

The model performs handwritten text recognition from text line images. It was trained by fine-tuning the National Archives' Multicentury HTR model, which is itself based on Microsoft's TrOCR model, on text line images taken from Finnish death record and census record tables from the 1930s.

## Intended uses & limitations

The model has been trained to recognize handwritten text from a specific type of table cell data, and may generalize poorly to other datasets. The model takes text line images as input; using other types of input is not recommended.

## How to use

The model can be used to predict the text content of text line images with the code below. Using a GPU for inference is recommended if one is available.

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

# Use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Model location in the Hugging Face Hub
model_checkpoint = "Kansallisarkisto/tablecell-htr"

# Path to a text line image
line_image_path = "/path/to/textline_image.jpg"

# Initialize processor and model
processor = TrOCRProcessor.from_pretrained(model_checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(model_checkpoint).to(device)

# Open the image file and extract pixel values
image = Image.open(line_image_path).convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Use the model to generate predictions
generated_ids = model.generate(pixel_values.to(device))

# Use the processor to decode ids to text
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
```

The model downloaded from the Hugging Face Hub is cached locally in `~/.cache/huggingface/hub/`. A batch-inference variant of the example above is sketched at the end of this card.

## Training data

The model was trained on 6704 text line images, while the validation dataset contained 836 text line images.

## Training procedure

This model was trained on an NVIDIA RTX A6000 GPU with the following hyperparameters:

- train batch size: 16
- epochs: 15
- optimizer: AdamW
- maximum length of text sequence: 64

For other parameters, the default values were used (see the [TrOCR documentation](https://huggingface.co/docs/transformers/model_doc/trocr) for details). The training code is available in the `train_trocr.py` file.

## Evaluation results

Evaluation results on the validation dataset are listed below:

| Validation loss | Validation CER | Validation WER |
| :-------------- | :------------- | :------------- |
| 0.903 | 0.107 | 0.237 |

The metrics were calculated using the [Evaluate](https://huggingface.co/docs/evaluate/index) library; a short example is given below. More information is available on the metric pages for [CER](https://huggingface.co/spaces/evaluate-metric/cer) and [WER](https://huggingface.co/spaces/evaluate-metric/wer).
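For reference, the character and word error rates of a set of predictions can be computed with Evaluate as in the minimal sketch below. The prediction and reference strings are hypothetical placeholders, not taken from the model's actual evaluation data (both metric implementations also depend on the `jiwer` package):

```python
import evaluate

# Load the CER and WER metrics from the Evaluate library
cer_metric = evaluate.load("cer")
wer_metric = evaluate.load("wer")

# Hypothetical model outputs and ground-truth transcriptions
predictions = ["Matti Virtanen", "helsinki 1934"]
references = ["Matti Virtanen", "Helsinki 1934"]

# Both metrics compare predictions against references:
# CER operates on characters, WER on whitespace-separated words
print("CER:", cer_metric.compute(predictions=predictions, references=references))
print("WER:", wer_metric.compute(predictions=predictions, references=references))
```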
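Finally, as a complement to the single-image example in "How to use", the processor and `generate` also accept batches of images. The following is a minimal sketch under assumptions not stated in this card: the image directory, file extension, and batch size are placeholders to adapt to your own data and hardware.

```python
from pathlib import Path

import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_checkpoint = "Kansallisarkisto/tablecell-htr"
processor = TrOCRProcessor.from_pretrained(model_checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(model_checkpoint).to(device)

# Placeholder: a directory containing text line images
image_dir = Path("/path/to/textline_images")
image_paths = sorted(image_dir.glob("*.jpg"))

batch_size = 8  # placeholder; adjust to the available GPU memory
for start in range(0, len(image_paths), batch_size):
    paths = image_paths[start:start + batch_size]
    images = [Image.open(p).convert("RGB") for p in paths]
    # The processor resizes and normalizes the whole batch in one call
    pixel_values = processor(images, return_tensors="pt").pixel_values.to(device)
    generated_ids = model.generate(pixel_values)
    texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
    for path, text in zip(paths, texts):
        print(path.name, text)
```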