rhovhannisyan committed on
Commit
74fde2a
1 Parent(s): c5f2d70

Upload README.md

  1. README.md +43 -1
README.md CHANGED
@@ -1,3 +1,45 @@
  ---
- license: apache-2.0
  ---
  ---
+ license: cc-by-nc-sa-4.0
+ tags:
+ - donut
+ - image-to-text
+ - vision
+ - invoices
  ---
+
+ # Donut finetuned on invoices
+
+ Based on the Donut base model, introduced in the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim et al. and first released in [this repository](https://github.com/clovaai/donut).
+
+ The model was trained on a few thousand annotated invoices and non-invoices (for the latter, the DocType is 'Other'). The documents span different countries and languages, and each is a single page. Unfortunately, the dataset is proprietary. The model's input resolution is set to 1280x1920 pixels, so scanning a sample at more than 150 dpi adds no value.
+ It was trained for about 4 hours on an NVIDIA RTX A4000 for 20k steps, ending with a val_metric of 0.03413819904382196.
+ The following fields (indexes) were included in the training set:
+
+ - DocType
+ - Currency
+ - DocumentDate
+ - GrossAmount
+ - InvoiceNumber
+ - NetAmount
+ - TaxAmount
+ - OrderNumber
+ - CreditorCountry
+
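The resolution note above can be checked with a little arithmetic. The sketch below assumes an A4 page (210 x 297 mm) in portrait orientation and reads 1280x1920 as width x height; both are assumptions, not values taken from the model configuration. `scan_size_px` and `exceeds_model_input` are hypothetical helper names used only for illustration.

```python
# Assumption: 1280x1920 is read as width x height (portrait).
MODEL_INPUT_PX = (1280, 1920)
# Assumption: the invoice is an A4 page, 210 x 297 mm.
A4_MM = (210.0, 297.0)
MM_PER_INCH = 25.4


def scan_size_px(dpi, page_mm=A4_MM):
    """Return the (width, height) in pixels of a page scanned at `dpi`."""
    width_mm, height_mm = page_mm
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def exceeds_model_input(dpi, page_mm=A4_MM, model_px=MODEL_INPUT_PX):
    """True if the scan has more pixels than the model input on either axis."""
    width, height = scan_size_px(dpi, page_mm)
    return width > model_px[0] or height > model_px[1]


if __name__ == "__main__":
    for dpi in (100, 150, 200, 300):
        print(dpi, scan_size_px(dpi), exceeds_model_input(dpi))
```

Under these assumptions, an A4 page at 150 dpi is about 1240 x 1754 pixels, which already fits within 1280 x 1920; a higher-dpi scan would presumably just be downscaled before it reaches the encoder.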
+ A demo Space is available [here](https://huggingface.co/spaces/to-be/invoice_document_headers_extraction_with_donut).
+
+ ## Model description
+
+ Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape `(batch_size, seq_len, hidden_size)`, after which the decoder autoregressively generates text, conditioned on the encoder's output.
+
+ ![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/donut_architecture.jpg)
+
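To make the shape of the encoder output concrete, here is a rough sketch of how `seq_len` and `hidden_size` follow from the input resolution. It assumes the Swin settings of the public Donut base checkpoint (patch size 4, three 2x patch-merging stages, embedding dimension 128); these values are assumptions here and should be checked against the model's `config.json`. `encoder_output_shape` is a hypothetical helper, not part of any library.

```python
def encoder_output_shape(height, width, batch_size=1,
                         patch_size=4, num_merges=3, embed_dim=128):
    """Estimate (batch_size, seq_len, hidden_size) for a Swin-style encoder.

    Assumed defaults: patch_size=4, three 2x patch-merging stages and
    embed_dim=128, as in the public Donut base configuration.
    """
    # Total spatial downsampling: 4 * 2**3 = 32 with the assumed defaults.
    reduction = patch_size * 2 ** num_merges
    if height % reduction or width % reduction:
        # Simplification for this sketch: only handle exact multiples.
        raise ValueError(f"height and width must be multiples of {reduction}")
    seq_len = (height // reduction) * (width // reduction)
    # The channel dimension doubles at each patch-merging stage.
    hidden_size = embed_dim * 2 ** num_merges
    return batch_size, seq_len, hidden_size


if __name__ == "__main__":
    print(encoder_output_shape(1920, 1280))
```

With these assumed settings, a 1280x1920 input yields a 40 x 60 grid of patches, so the decoder attends over 2400 embeddings of size 1024 per image.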
+ ## Intended uses & limitations
+
+ This model is a research experiment to see how well Donut handles multilingual invoices.
+ See my observations in the [demo space](https://huggingface.co/spaces/to-be/invoice_document_headers_extraction_with_donut).
+
+ ### How to use
+
+ See the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/donut), which includes code examples.
+
+
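For convenience, here is a minimal inference sketch that follows the general pattern shown in the Transformers Donut documentation. The `model_id` and `task_prompt` arguments are placeholders: this card does not state the repository id or the task start token the model was finetuned with, so the caller must supply both. `strip_task_token` and `extract_fields` are hypothetical helper names, and the heavy imports are deferred so the small helper can be used without loading a model.

```python
import re


def strip_task_token(sequence):
    """Remove the first <...> tag (the task start token) from a decoded sequence."""
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def extract_fields(image_path, model_id, task_prompt):
    """Run Donut on one single-page image and return the parsed fields as a dict.

    model_id and task_prompt are caller-supplied placeholders (assumptions).
    """
    # Imported lazily: requires `transformers`, `torch` and `Pillow`.
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    # Encode the page image and the task prompt that starts decoding.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    # Autoregressive generation, conditioned on the encoder output.
    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    # Drop special tokens, then convert the tagged sequence to a dict.
    tokenizer = processor.tokenizer
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = sequence.replace(tokenizer.eos_token, "").replace(tokenizer.pad_token, "")
    return processor.token2json(strip_task_token(sequence))
```

If the model behaves as described above, the returned dict should contain keys such as `DocType` or `InvoiceNumber`, but the exact output structure is not documented in this card and should be verified against the demo space.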