metadata
license: mit
tags:
- generated_from_trainer
base_model: prajjwal1/bert-small
model-index:
- name: bert-small-aze
results: []
bert-small-aze
This model is a trained (not fine-tuned) version of prajjwal1/bert-small architecture on the allmalab/DOLLMA dataset.
Citation
If you use the dataset, please cite the following paper:
@inproceedings{isbarov-etal-2024-open,
title = "Open foundation models for {A}zerbaijani language",
author = "Isbarov, Jafar and
Huseynova, Kavsar and
Mammadov, Elvin and
Hajili, Mammad and
Ataman, Duygu",
editor = {Ataman, Duygu and
Derin, Mehmet Oguz and
Ivanova, Sardana and
K{\"o}ksal, Abdullatif and
S{\"a}lev{\"a}, Jonne and
Zeyrek, Deniz},
booktitle = "Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024)",
month = aug,
year = "2024",
address = "Bangkok, Thailand and Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.sigturk-1.2",
pages = "18--28",
abstract = "The emergence of multilingual large language models has enabled the development of language understanding and generation systems in Azerbaijani. However, most of the production-grade systems rely on cloud solutions, such as GPT-4. While there have been several attempts to develop open foundation models for Azerbaijani, these works have not found their way into common use due to a lack of systemic benchmarking. This paper encompasses several lines of work that promote open-source foundation models for Azerbaijani. We introduce (1) a large text corpus for Azerbaijani, (2) a family of encoder-only language models trained on this dataset, (3) labeled datasets for evaluating these models, and (4) extensive evaluation that covers all major open-source models with Azerbaijani support.",
}
https://arxiv.org/abs/2407.02337
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10000
- num_epochs: 10
- mixed_precision_training: Native AMP
Framework versions
- Transformers 4.37.1
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1