---
language: fa
tags:
- albert-persian
- persian-lm
license: apache-2.0
datasets:
- Persian Wikidumps
- MirasText
- BigBang Page
- Chetor
- Eligasht
- DigiMag
- Ted Talks
- Books (Novels, ...)
---

# ALBERT-Persian

## ALBERT-Persian: A Lite BERT for Self-supervised Learning of Language Representations for the Persian Language

## Introduction

ALBERT-Persian was trained on a massive amount of public corpora ([Persian Wikidumps](https://dumps.wikimedia.org/fawiki/), [MirasText](https://github.com/miras-tech/MirasText)) and six manually crawled text corpora from various types of websites ([BigBang Page](https://bigbangpage.com/) `scientific`, [Chetor](https://www.chetor.com/) `lifestyle`, [Eligasht](https://www.eligasht.com/Blog/) `itinerary`, [Digikala](https://www.digikala.com/mag/) `digital magazine`, [Ted Talks](https://www.ted.com/talks) `general conversational`, Books `novels, storybooks, and short stories from the old to the contemporary era`).

## Intended uses &amp; limitations

You can use the raw model for either masked language modeling or sentence order prediction, but it is mostly intended to be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?search=albert-fa) to look for fine-tuned versions on a task that interests you. A minimal masked-language-modeling sketch follows the loading snippets below.

### How to use

#### TensorFlow 2.0

```python
from transformers import AutoConfig, AutoTokenizer, TFAutoModel

config = AutoConfig.from_pretrained("m3hrdadfi/albert-fa-base-v2")
tokenizer = AutoTokenizer.from_pretrained("m3hrdadfi/albert-fa-base-v2")
model = TFAutoModel.from_pretrained("m3hrdadfi/albert-fa-base-v2")

# "We at Hooshvare believe that with the proper transfer of knowledge and
# awareness, everyone can use intelligent tools. Our slogan is AI for everyone."
text = "ما در هوشواره معتقدیم با انتقال صحیح دانش و آگاهی، همه افراد میتوانند از ابزارهای هوشمند استفاده کنند. شعار ما هوش مصنوعی برای همه است."
print(tokenizer.tokenize(text))
# ['▁ما', '▁در', '▁هوش', 'واره', '▁معتقد', 'یم', '▁با', '▁انتقال', '▁صحیح', '▁دانش', '▁و', '▁اگاه', 'ی', '،', '▁همه', '▁افراد', '▁می', '▁توانند', '▁از', '▁ابزارهای', '▁هوشمند', '▁استفاده', '▁کنند', '.', '▁شعار', '▁ما', '▁هوش', '▁مصنوعی', '▁برای', '▁همه', '▁است', '.']
```

#### PyTorch

```python
from transformers import AutoConfig, AutoTokenizer, AutoModel

config = AutoConfig.from_pretrained("m3hrdadfi/albert-fa-base-v2")
tokenizer = AutoTokenizer.from_pretrained("m3hrdadfi/albert-fa-base-v2")
model = AutoModel.from_pretrained("m3hrdadfi/albert-fa-base-v2")
```
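
Since the raw model was pretrained with a masked-language-modeling objective, a quick way to sanity-check it is the `fill-mask` pipeline. The snippet below is a minimal sketch rather than part of the original card; the Persian prompt is only illustrative, and the exact predictions and scores will vary.

```python
from transformers import pipeline

# Minimal sketch: query the pretrained MLM head directly.
fill_mask = pipeline("fill-mask", model="m3hrdadfi/albert-fa-base-v2")

# "Artificial intelligence is for [MASK]." (illustrative prompt)
for prediction in fill_mask("هوش مصنوعی برای [MASK] است."):
    print(prediction["token_str"], prediction["score"])
```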

## Training

ALBERT-Persian is the first attempt at ALBERT for the Persian language. The model was trained based on Google's ALBERT BASE Version 2.0 configuration over various writing styles from numerous subjects (e.g., scientific, novels, news), with more than `3.9M` documents, `73M` sentences, and `1.3B` words, following the same procedure we used for [ParsBERT](https://github.com/hooshvare/parsbert).

## Goals

Training objective results after 140K steps are as follows:

```bash
***** Eval results *****
global_step = 140000
loss = 2.0080082
masked_lm_accuracy = 0.6141017
masked_lm_loss = 1.9963315
sentence_order_accuracy = 0.985
sentence_order_loss = 0.06908702
```

## Derivative models

A usage sketch for loading these fine-tuned checkpoints follows the lists below.

### Base Config

#### Albert Model
- [m3hrdadfi/albert-fa-base-v2](https://huggingface.co/m3hrdadfi/albert-fa-base-v2)

#### Albert Sentiment Analysis
- [m3hrdadfi/albert-fa-base-v2-sentiment-digikala](https://huggingface.co/m3hrdadfi/albert-fa-base-v2-sentiment-digikala)
- [m3hrdadfi/albert-fa-base-v2-sentiment-snappfood](https://huggingface.co/m3hrdadfi/albert-fa-base-v2-sentiment-snappfood)
- [m3hrdadfi/albert-fa-base-v2-sentiment-deepsentipers-binary](https://huggingface.co/m3hrdadfi/albert-fa-base-v2-sentiment-deepsentipers-binary)
- [m3hrdadfi/albert-fa-base-v2-sentiment-deepsentipers-multi](https://huggingface.co/m3hrdadfi/albert-fa-base-v2-sentiment-deepsentipers-multi)
- [m3hrdadfi/albert-fa-base-v2-sentiment-binary](https://huggingface.co/m3hrdadfi/albert-fa-base-v2-sentiment-binary)
- [m3hrdadfi/albert-fa-base-v2-sentiment-multi](https://huggingface.co/m3hrdadfi/albert-fa-base-v2-sentiment-multi)

#### Albert Text Classification
- [m3hrdadfi/albert-fa-base-v2-clf-digimag](https://huggingface.co/m3hrdadfi/albert-fa-base-v2-clf-digimag)
- [m3hrdadfi/albert-fa-base-v2-clf-persiannews](https://huggingface.co/m3hrdadfi/albert-fa-base-v2-clf-persiannews)

#### Albert NER
- [m3hrdadfi/albert-fa-base-v2-ner](https://huggingface.co/m3hrdadfi/albert-fa-base-v2-ner)
- [m3hrdadfi/albert-fa-base-v2-ner-arman](https://huggingface.co/m3hrdadfi/albert-fa-base-v2-ner-arman)
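
As mentioned above, any of these checkpoints can be loaded through the `pipeline` API. This is a minimal sketch, not part of the original card: the SnappFood sentiment checkpoint is used as an example, the Persian input is illustrative, and the label names depend on that checkpoint's configuration.

```python
from transformers import pipeline

# Minimal sketch: load one of the fine-tuned sentiment checkpoints listed above.
sentiment = pipeline(
    "sentiment-analysis",
    model="m3hrdadfi/albert-fa-base-v2-sentiment-snappfood",
)

# "The food was very tasty." (illustrative input)
print(sentiment("غذا بسیار خوشمزه بود."))
```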

## Eval results

The following tables summarize the F1 scores (%) obtained by ALBERT-Persian compared to other models and architectures.

### Sentiment Analysis (SA) Task

| Dataset | ALBERT-fa-base-v2 | ParsBERT-v1 | mBERT | DeepSentiPers |
|:------------------------:|:-----------------:|:-----------:|:-----:|:-------------:|
| Digikala User Comments | 81.12 | 81.74 | 80.74 | - |
| SnappFood User Comments | 85.79 | 88.12 | 87.87 | - |
| SentiPers (Multi Class) | 66.12 | 71.11 | - | 69.33 |
| SentiPers (Binary Class) | 91.09 | 92.13 | - | 91.98 |

### Text Classification (TC) Task

| Dataset | ALBERT-fa-base-v2 | ParsBERT-v1 | mBERT |
|:-----------------:|:-----------------:|:-----------:|:-----:|
| Digikala Magazine | 92.33 | 93.59 | 90.72 |
| Persian News | 97.01 | 97.19 | 95.79 |

### Named Entity Recognition (NER) Task

| Dataset | ALBERT-fa-base-v2 | ParsBERT-v1 | mBERT | MorphoBERT | Beheshti-NER | LSTM-CRF | Rule-Based CRF | BiLSTM-CRF |
|:-------:|:-----------------:|:-----------:|:-----:|:----------:|:------------:|:--------:|:--------------:|:----------:|
| PEYMA | 88.99 | 93.10 | 86.64 | - | 90.59 | - | 84.00 | - |
| ARMAN | 97.43 | 98.79 | 95.89 | 89.9 | 84.03 | 86.55 | - | 77.45 |

### BibTeX entry and citation info

Please cite the following in publications:

```bibtex
@misc{ALBERT-Persian,
  author = {Mehrdad Farahani},
  title = {ALBERT-Persian: A Lite BERT for Self-supervised Learning of Language Representations for the Persian Language},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/m3hrdadfi/albert-persian}},
}

@article{ParsBERT,
  title={ParsBERT: Transformer-based Model for Persian Language Understanding},
  author={Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.12515}
}
```

## Questions?

Post a GitHub issue on the [ALBERT-Persian](https://github.com/m3hrdadfi/albert-persian) repo.