Commit 13ae63a
Author: Amir Tahmasbi
Parent(s): 68adae7

First version of tf layoutlm large

- README.md +30 -0
- config.json +23 -0
- tf_model.h5 +3 -0
README.md
ADDED
@@ -0,0 +1,30 @@
+# LayoutLM
+
+## Model description
+
+LayoutLM is a simple but effective pre-training method of text and layout for document image understanding and information extraction tasks, such as form understanding and receipt understanding. LayoutLM achieves state-of-the-art (SOTA) results on multiple datasets. For more details, please refer to our paper:
+
+[LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318)
+Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, [KDD 2020](https://www.kdd.org/kdd2020/accepted-papers)
+
+## Training data
+
+We pre-train LayoutLM on the IIT-CDIP Test Collection 1.0\* dataset with two settings.
+
+* LayoutLM-Base, Uncased (11M documents, 2 epochs): 12-layer, 768-hidden, 12-heads, 113M parameters
+* LayoutLM-Large, Uncased (11M documents, 2 epochs): 24-layer, 1024-hidden, 16-heads, 343M parameters **(This Model)**
+
+## Citation
+
+If you find LayoutLM useful in your research, please cite the following paper:
+
+``` latex
+@misc{xu2019layoutlm,
+title={LayoutLM: Pre-training of Text and Layout for Document Image Understanding},
+author={Yiheng Xu and Minghao Li and Lei Cui and Shaohan Huang and Furu Wei and Ming Zhou},
+year={2019},
+eprint={1912.13318},
+archivePrefix={arXiv},
+primaryClass={cs.CL}
+}
+```
config.json
ADDED
@@ -0,0 +1,23 @@
+{
+  "_name_or_path": "microsoft/layoutlm-large-uncased",
+  "attention_probs_dropout_prob": 0.1,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 1024,
+  "initializer_range": 0.02,
+  "intermediate_size": 4096,
+  "layer_norm_eps": 1e-12,
+  "max_2d_position_embeddings": 1024,
+  "max_position_embeddings": 512,
+  "model_type": "layoutlm",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 24,
+  "output_past": true,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "transformers_version": "4.4.0.dev0",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
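
Note on config.json: the sizes in this config can be sanity-checked against the parameter count quoted in the README. The sketch below is a rough estimate only. It assumes a BERT-style encoder with a pooler and four 2D position embedding tables (x, y, height, width), each of size `max_2d_position_embeddings` × `hidden_size`. That layout is an assumption about the architecture, not something stated in this commit, and `estimate_parameters` is a hypothetical helper name.

```python
# Values copied from the config.json hunk in this commit.
config = {
    "hidden_size": 1024,
    "intermediate_size": 4096,
    "max_2d_position_embeddings": 1024,
    "max_position_embeddings": 512,
    "num_hidden_layers": 24,
    "type_vocab_size": 2,
    "vocab_size": 30522,
}


def estimate_parameters(cfg):
    """Rough parameter count under the assumed BERT-style layout."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    layer_norm = 2 * h  # one scale and one bias vector

    embeddings = (
        cfg["vocab_size"] * h                       # word embeddings
        + cfg["max_position_embeddings"] * h        # 1D position embeddings
        + 4 * cfg["max_2d_position_embeddings"] * h  # assumed: x, y, height, width tables
        + cfg["type_vocab_size"] * h                # token type embeddings
        + layer_norm
    )
    # Query, key, value, and output projections, each with a bias.
    attention = 4 * (h * h + h) + layer_norm
    # Two dense layers (h -> i and i -> h), each with a bias.
    feed_forward = (h * i + i) + (i * h + h) + layer_norm
    pooler = h * h + h

    per_layer = attention + feed_forward
    return embeddings + cfg["num_hidden_layers"] * per_layer + pooler


total = estimate_parameters(config)
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
print(f"{total * 4:,} bytes if stored as float32")
```

Under these assumptions the estimate comes out near 339M, slightly below the 343M quoted in the README. The gap may come from task heads or a different counting convention. At 4 bytes per parameter the estimate is also a little under the `size` recorded for tf_model.h5, which is consistent with float32 weights plus file overhead, though this commit does not confirm the stored dtype.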
tf_model.h5
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8c9960d6fe76b4296918713e0e9c2b4b9c815176dce9b137d59b33378504620d
+size 1357858920
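
Note on tf_model.h5: the three added lines are a Git LFS pointer, not the weights themselves. The pointer records the SHA-256 digest and byte size of the real file, so a downloaded copy can be checked against it. The following is a minimal sketch using only the Python standard library. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path in the usage comment is an assumption.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the tf_model.h5 hunk in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8c9960d6fe76b4296918713e0e9c2b4b9c815176dce9b137d59b33378504620d
size 1357858920
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a pointer file into a dict of fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest in `pointer`."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in chunks so a file of over 1 GB is never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])

# Usage, assuming the weights were downloaded to ./tf_model.h5:
# print(matches_pointer("tf_model.h5", pointer))
```

Comparing the size before hashing is a deliberate choice: an interrupted download fails immediately without reading the whole file.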