Anja Reusch
committed on
Commit 31dd346
Parent(s):
f11f5d3
added model and tokenizer files
- README.md +41 -0
- Screenshot 2022-09-02 at 18.06.04.png +0 -0
- added_tokens.json +1 -0
- config.json +26 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +1 -0
- tokenizer.json +0 -0
- tokenizer_config.json +1 -0
- vocab.txt +0 -0
README.md
ADDED
@@ -0,0 +1,41 @@
---
language:
- en
tags:
- mathematics
- math-aware
datasets:
- MathematicalStackExchange
---

# Math-aware BERT

This repository contains our pre-trained BERT-based model. It was initialised from BERT-base-cased and further pre-trained on Math StackExchange in three different stages. We also added more LaTeX tokens to the tokenizer to enable a better tokenization of mathematical formulas. This model is not yet fine-tuned on a specific task.
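
As a minimal sketch of what the extended tokenizer does (the model id below is only a placeholder for this repository's id), the added LaTeX commands are kept as single tokens instead of being split into word pieces:

```python
from transformers import AutoTokenizer

# Placeholder id -- replace it with the id of this model repository.
tokenizer = AutoTokenizer.from_pretrained("AnReu/math-aware-bert")

# Added LaTeX commands such as "\frac" and "\int" are tokenized as single
# tokens rather than being broken into word pieces.
print(tokenizer.tokenize(r"\int_0^1 \frac{x}{2} dx"))
```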

# Training Details

The model was instantiated from BERT-base-cased weights and further pre-trained in three stages, each using different data for the sentence order prediction (SOP) task. During all three stages, the masked language modelling task was trained simultaneously. In addition, we added around 500 LaTeX tokens to the tokenizer to better cope with mathematical formulas.

The image below illustrates the three pre-training stages. First, we train on mathematical formulas only: the SOP classifier predicts which segment contains the left-hand side of the formula and which one contains the right-hand side. This way, we model inter-formula coherence. The second stage models formula-sentence coherence, i.e., whether the formula or the natural-language part comes first in the original document. Finally, we add the inter-sentence coherence stage that is the default for ALBERT. In this stage, sentences are split by a sentence separator.

![Image](https://huggingface.co/AnReu/math_albert/resolve/main/Screenshot%202022-09-02%20at%2018.06.04.png)

It is trained in exactly the same way as our ALBERT model, which was our best-performing model in ARQMath 3 (2022). Details about our ALBERT model can be found [here](https://huggingface.co/AnReu/math-albert).
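
To make the first stage more concrete, here is a rough, hypothetical sketch of a single SOP-style training step with the `BertForPreTraining` architecture listed in this repository's config. This is not the original training code: the formula pair, the label convention, the reuse of the NSP head as the SOP classifier, and the omission of input masking are all simplifying assumptions.

```python
import torch
from transformers import AutoTokenizer, BertForPreTraining

# Placeholder id -- replace it with the id of this model repository.
model_id = "AnReu/math-aware-bert"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = BertForPreTraining.from_pretrained(model_id)

# Stage-1-style pair: segment A holds the left-hand side of a formula and
# segment B the right-hand side; label 0 is assumed to mean "original order".
inputs = tokenizer(r"\frac{a}{b}", r"\frac{ac}{bc}", return_tensors="pt")
mlm_labels = inputs["input_ids"].clone()  # masking of input tokens omitted here
sop_label = torch.tensor([0])

outputs = model(**inputs, labels=mlm_labels, next_sentence_label=sop_label)
print(outputs.loss)  # combined masked-LM + segment-order loss
```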

# Usage

You can fine-tune this model on any math-aware task you have in mind, e.g., classification, question answering, etc. Please note that the model in this repository is only pre-trained and not fine-tuned.
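
As an illustration, a fine-tuning setup could start from the pre-trained checkpoint like this (a minimal sketch only; the model id is a placeholder for this repository's id, and the classification head is freshly initialised by `transformers`, not part of this checkpoint):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder id -- replace it with the id of this model repository.
model_id = "AnReu/math-aware-bert"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Loads the pre-trained encoder and adds a randomly initialised classification
# head, which still has to be fine-tuned on your downstream task.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

inputs = tokenizer(r"Prove that \sum_{i=1}^n i = \frac{n(n+1)}{2}.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2]) -- logits of the not-yet-trained head
```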

# Citation

If you find this model useful, consider citing our paper, which describes how the pre-training was performed:

```
@article{reusch2022transformer,
  title={Transformer-Encoder and Decoder Models for Questions on Math},
  author={Reusch, Anja and Thiele, Maik and Lehner, Wolfgang},
  year={2022},
  organization={CLEF}
}
```
Screenshot 2022-09-02 at 18.06.04.png
ADDED
added_tokens.json
ADDED
@@ -0,0 +1 @@
{"\\nmid": 29290, "\\Longleftrightarrow": 29017, "\\mathbb": 29256, "\\int": 29198, "\\dotso": 29135, "\\prec": 29326, "\\varphi": 29456, "\\sqsupset": 29386, "\\lceil": 29209, "\\sqcup": 29382, "\\over": 29312, "\\leftarrow": 29215, "\\parallel": 29318, "\\leadsto": 29213, "\\tan": 29410, "\\nRightarrow": 29270, "\\longrightarrow": 29247, "\\bigcup": 29072, "\\varPsi": 29445, "\\textstyle": 29415, "\\ominus": 29308, "\\infty": 29196, "\\nearrow": 29274, "\\oplus": 29309, "\\leftleftarrows": 29219, "\\hom": 29182, "\\ngtr": 29282, "\\vdots": 29470, "\\owns": 29317, "\\nleftrightarrow": 29285, "\\Cup": 29002, "\\buildrel": 29088, "\\prod": 29335, "\\rightharpoonup": 29354, "\\nLeftarrow": 29268, "\\pmod": 29324, "\\gtrless": 29179, "\\Nu": 29021, "\\ddots": 29120, "\\triangle": 29423, "\\mp": 29265, "\\ge": 29162, "\\tau": 29412, "\\dim": 29125, "\\sqsubset": 29384, "\\Kappa": 29011, "\\emptyset": 29144, "\\end": 29145, "\\varlimsup": 29454, "\\biguplus": 29079, "\\frown": 29159, "\\bigoplus": 29074, "\\mod": 29264, "\\supseteq": 29405, "\\curlyvee": 29111, "\\langle": 29207, "\\eqcirc": 29148, "\\injlim": 29197, "\\csc": 29107, "\\gamma": 29160, "\\intop": 29199, "\\varinjlim": 29451, "\\VarOmega": 29042, "\\right": 29350, "\\not": 29291, "\\quad": 29340, "\\bigsqcup": 29076, "\\empty": 29143, "\\trianglelefteq": 29426, "\\nleq": 29286, "\\precapprox": 29327, "\\pm": 29323, "pmatrix": 29493, "\\circ": 29095, "\\curvearrowleft": 29113, "\\dbinom": 29116, "\\bot": 29084, "\\rightsquigarrow": 29358, "\\atop": 29060, "\\iiiint": 29188, "\\frac": 29158, "\\Delta": 29003, "\\eqsim": 29149, "\\nexists": 29278, "\\lVert": 29204, "\\overrightarrow": 29315, "\\Chi": 29001, "\\nsubseteq": 29297, "\\succ": 29394, "\\Psi": 29027, "\\leftharpoonup": 29218, "\\succcurlyeq": 29396, "\\min": 29263, "\\varGamma": 29442, "\\bigwedge": 29081, "\\eta": 29153, "\\varsigma": 29461, "\\varXi": 29449, "\\trianglerighteq": 29429, "\\smallsmile": 29379, "\\downharpoonleft": 29140, "\\preceq": 29329, "\\implies": 29193, "\\colon": 29099, "\\subseteq": 29390, "\\Lambda": 29012, "\\subset": 29389, "\\smallint": 29377, "\\rangle": 29343, "\\Pr": 29026, "\\thickapprox": 29418, "\\mid": 29262, "\\stackrel": 29388, "\\neg": 29275, "\\div": 29127, "\\dddot": 29118, "\\varpi": 29457, "\\qvar": 29341, "\\rfloor": 29347, "\\iff": 29187, "\\curlyeqsucc": 29110, "\\updownarrow": 29435, "\\dots": 29130, "\\root": 29360, "\\longleftarrow": 29244, "\\nsucc": 29298, "\\Theta": 29037, "\\nrightarrow": 29295, "\\wedge": 29475, "\\limsup": 29235, "\\circeq": 29096, "multline": 29492, "\\ln": 29238, "\\det": 29123, "\\dotplus": 29129, "\\theta": 29417, "\\ddot": 29119, "\\sphericalangle": 29380, "\\multimap": 29267, "\\lessapprox": 29227, "\\Longleftarrow": 29016, "\\gvertneqq": 29181, "\\ni": 29283, "\\Omicron": 29023, "\\ast": 29058, "\\bigtriangledown": 29077, "\\thicksim": 29419, "\\scriptscriptstyle": 29363, "\\coth": 29105, "\\above": 29046, "\\iota": 29200, "\\smallfrown": 29376, "\\tfrac": 29416, "\\iddots": 29185, "\\bowtie": 29085, "\\gtrsim": 29180, "\\bigodot": 29073, "\\sgn": 29368, "\\Sigma": 29033, "\\gg": 29167, "\\ne": 29273, "\\Gamma": 29008, "\\upuparrows": 29440, "\\supset": 29404, "\\curlyeqprec": 29109, "\\backsim": 29062, "\\propto": 29337, "\\binom": 29082, "\\supsetneqq": 29408, "\\enspace": 29146, "\\Rho": 29029, "\\asymp": 29059, "\\gneq": 29171, "\\dfrac": 29124, "\\Cap": 29000, "\\vartriangleleft": 29468, "\\bigcap": 29070, "\\cdots": 29091, "\\liminf": 29234, "\\looparrowright": 29249, "\\lim": 29233, 
"\\arg": 29056, "\\rightleftharpoons": 29356, "\\cr": 29106, "\\nprec": 29293, "\\eqslantgtr": 29150, "\\leftrightarrows": 29221, "\\subsetneqq": 29393, "\\Epsilon": 29006, "\\lgroup": 29232, "\\upsilon": 29439, "\\bigcirc": 29071, "\\succsim": 29401, "bmatrix": 29490, "\\iiint": 29189, "\\deg": 29121, "\\displaystyle": 29126, "\\curvearrowright": 29114, "\\Leftrightarrow": 29014, "\\dashv": 29115, "\\precneqq": 29331, "\\bracevert": 29086, "\\beth": 29067, "\\in": 29194, "\\ltimes": 29252, "\\leftharpoondown": 29217, ":=": 29484, "\\supseteqq": 29406, "\\exp": 29156, "\\cap": 29089, "\\oint": 29305, "\\Rightarrow": 29030, "\\underline": 29430, "\\sin": 29374, "\\subseteqq": 29391, "\\Alpha": 28996, "\\triangleleft": 29425, "\\upharpoonright": 29437, "\\triangleq": 29427, "\\between": 29068, "\\leftarrowtail": 29216, "\\triangledown": 29424, "\\overline": 29314, "\\barwedge": 29065, "\\amalg": 29049, "\\otimes": 29311, "\\Omega": 29022, "\\geqslant": 29165, "\\vert": 29474, "\\operatorname": 29310, "\\Rsh": 29032, "\\precnsim": 29332, "\\rho": 29349, "\\sqsupseteq": 29387, "\\rightrightarrows": 29357, "\\ncong": 29272, "\\npreceq": 29294, "\\partial": 29319, "\\sup": 29403, "\\log": 29243, "\\pi": 29322, "\\lessdot": 29228, "\\text": 29414, "\\angle": 29050, "\\downdownarrows": 29139, "\\eth": 29154, "\\Subset": 29034, "\\Arrowvert": 28998, "\\precsim": 29333, "\\rtimes": 29361, "\\sigma": 29369, "\\eqslantless": 29151, "\\Upsilon": 29040, "\\doteq": 29136, "\\gtrapprox": 29175, "\\And": 28997, "\\mapsto": 29255, "\\choose": 29094, "\\xi": 29477, "\\backsimeq": 29063, "\\varliminf": 29453, "\\overset": 29316, "\\nsupseteq": 29300, "align": 29487, "\\nabla": 29271, "\\succeq": 29397, "\\Rrightarrow": 29031, "\\varTheta": 29447, "\\neq": 29276, "\\divideontimes": 29128, "\\varUpsilon": 29448, "\\rVert": 29342, "\\top": 29422, "\\lambda": 29205, "\\brack": 29087, "\\rceil": 29346, "\\varSigma": 29446, "\\uplus": 29438, "Vmatrix": 29486, "\\sign": 29370, "\\supsetneq": 29407, "\\smallsetminus": 29378, "\\Eta": 29007, "\\setminus": 29367, "\\varkappa": 29452, "\\delta": 29122, "\\Xi": 29044, "\\nless": 29289, "\\coprod": 29101, "\\lnapprox": 29239, "\\leftrightsquigarrow": 29223, "\\ker": 29203, "\\leftrightharpoons": 29222, "\\varDelta": 29441, "\\land": 29206, "smallmatrix": 29494, "\\approxeq": 29052, "\\precnapprox": 29330, "\\Updownarrow": 29039, "\\sum": 29402, "\\unlhd": 29432, "\\mathcal": 29257, "\\leftrightarrow": 29220, "\\tbinom": 29413, "\\veebar": 29473, "\\vartriangleright": 29469, "alignat": 29488, "\\prime": 29334, "\\succnsim": 29400, "\\subsetneq": 29392, "\\scriptstyle": 29364, "\\gnsim": 29173, "\\unrhd": 29433, "\\rightharpoondown": 29353, "\\sqrt": 29383, "\\imath": 29191, "\\varsupsetneq": 29464, "\\gneqq": 29172, "\\longleftrightarrow": 29245, "\\lneq": 29240, "\\aleph": 29047, "\\left": 29214, "\\succapprox": 29395, "\\lt": 29251, "\\hookrightarrow": 29184, "\\downharpoonright": 29141, "\\Doteq": 29004, "\\qquad": 29339, "\\searrow": 29365, "\\xleftarrow": 29478, "\\begin": 29069, "\\\\": 29483, "\\beta": 29066, "\\bigotimes": 29075, "\\cos": 29102, "\\forall": 29157, "\\bigvee": 29080, "\\idotsint": 29186, "\\cong": 29100, "\\varPi": 29444, "\\varnothing": 29455, "\\succneqq": 29399, "\\nwarrow": 29302, "\\cfrac": 29092, "\\omega": 29306, "\\overleftarrow": 29313, "\\rvert": 29362, "\\dotsb": 29131, "\\max": 29259, "\\gt": 29174, "\\varsubsetneqq": 29463, "\\sec": 29366, "\\sinh": 29375, "\\wr": 29476, "\\underset": 29431, "\\longmapsto": 29246, "\\simeq": 29373, 
"\\gtrdot": 29176, "\\varrho": 29460, "\\cup": 29108, "\\arctan": 29055, "\\Downarrow": 29005, "\\ngeq": 29279, "\\odot": 29303, "\\vee": 29472, "\\dot=": 29137, "\\lfloor": 29230, "\\lor": 29250, "\\times": 29420, "\\backslash": 29064, "\\geqq": 29164, "\\lg": 29231, "\\projlim": 29336, "subarray": 29495, "\\Vert": 29043, "\\rbrace": 29344, "\\looparrowleft": 29248, "\\Re": 29028, "\\Lleftarrow": 29015, "\\nleqslant": 29288, "\\perp": 29320, "\\nsucceq": 29299, "\\ell": 29142, "\\mho": 29261, "\\ldots": 29211, "\\cosh": 29103, "\\rgroup": 29348, "\\leq": 29224, "\\equiv": 29152, "\\%": 29481, "\\Mu": 29020, "\\Supset": 29035, "\\circlearrowright": 29098, "\\phi": 29321, "\\swarrow": 29409, "\\chi": 29093, "\\Beta": 28999, "\\rightarrowtail": 29352, "\\preccurlyeq": 29328, "\\psi": 29338, "\\VarLambda": 29041, "\\Longrightarrow": 29018, "\\gcd": 29161, "Bmatrix": 29485, "alignedat": 29489, "\\nparallel": 29292, "\\lesssim": 29229, "\\vec": 29471, "\\zeta": 29480, "\\leqslant": 29226, "\\varpropto": 29459, "\\rmoustache": 29359, "\\lmoustache": 29237, "\\triangleright": 29428, "\\Leftarrow": 29013, "\\gggtr": 29169, "\\rightarrow": 29351, "\\mathrm": 29258, "\\lbrace": 29208, "\\to": 29421, "\\le": 29212, "\\nLeftrightarrow": 29269, "\\dotsi": 29133, "\\nu": 29301, "\\lvert": 29253, "\\Tau": 29036, "\\inf": 29195, "\\jmath": 29201, "vmatrix": 29496, "\\sqcap": 29381, "\\ll": 29236, "\\leqq": 29225, "\\measuredangle": 29260, "\\Join": 29010, "\\bmod": 29083, "\\epsilon": 29147, "\\arccos": 29053, "\\^": 29482, "\\mu": 29266, "eqnarray": 29491, "\\geq": 29163, "\\backepsilon": 29061, "\\impliedby": 29192, "\\Pi": 29025, "\\uparrow": 29434, "\\Iota": 29009, "\\exists": 29155, "\\varsupsetneqq": 29465, "\\arrowvert": 29057, "\\lnot": 29242, "\\alpha": 29048, "\\varepsilon": 29450, "\\approx": 29051, "\\gtreqless": 29177, "\\sqsubseteq": 29385, "\\gtreqqless": 29178, "\\ngeqq": 29280, "\\varPhi": 29443, "\\omicron": 29307, "\\Zeta": 29045, "\\ggg": 29168, "\\ngeqslant": 29281, "\\circlearrowleft": 29097, "\\nsim": 29296, "\\vartriangle": 29467, "\\rightleftarrows": 29355, "\\kappa": 29202, "\\varprojlim": 29458, "\\dotsc": 29132, "\\upharpoonleft": 29436, "\\lneqq": 29241, "\\of": 29304, "\\dotsm": 29134, "\\rbrack": 29345, "\\lbrack": 29210, "\\succnapprox": 29398, "\\cot": 29104, "\\pod": 29325, "\\gets": 29166, "\\iint": 29190, "\\Phi": 29024, "\\bigtriangleup": 29078, "\\Lsh": 29019, "\\newline": 29277, "\\arcsin": 29054, "\\tanh": 29411, "\\nleftarrow": 29284, "\\lvertneqq": 29254, "\\signum": 29371, "\\sim": 29372, "\\ddddot": 29117, "\\Uparrow": 29038, "\\downarrow": 29138, "\\varsubsetneq": 29462, "\\hookleftarrow": 29183, "\\cdot": 29090, "\\curlywedge": 29112, "\\xrightarrow": 29479, "\\vartheta": 29466, "\\gnapprox": 29170, "\\nleqq": 29287}
config.json
ADDED
@@ -0,0 +1,26 @@
{
  "_name_or_path": "/scratch/ws/1/s8252120-polbert/Slurm-for-ALBERT_Math/ALBERT-for-Math-AR/untrained_models/model_bert-base-cased_with_latex",
  "architectures": [
    "BertForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.9.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 29497
}
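
A small sanity check of the configuration (again with a placeholder model id): the vocabulary size of 29497 corresponds to the 28996 word pieces of bert-base-cased plus the added LaTeX tokens (ids 28996-29496 in added_tokens.json).

```python
from transformers import AutoConfig

# Placeholder id -- replace it with the id of this model repository.
config = AutoConfig.from_pretrained("AnReu/math-aware-bert")
# 29497 = 28996 word pieces from bert-base-cased + 501 added LaTeX tokens
print(config.vocab_size)
```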
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7c65144bc6cdec8283a468f3683846f83297d3046654598fbd5c7e54accce06f
size 437359011
special_tokens_map.json
ADDED
@@ -0,0 +1 @@
{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
{"do_lower_case": false, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "bert-base-cased", "tokenizer_class": "BertTokenizer"}
vocab.txt
ADDED
The diff for this file is too large to render.