atsuki-yamaguchi/Llama-2-7b-hf-el-30K-align-2x2ls-512

Llama2 7B for Greek: 100 target vocabulary size + Align target vocabulary initialization + 2x2LS/512 training

This model is built on top of Llama2 7B and adapted for Greek using 30K target-language sentences sampled from CC-100.

Model Details

Vocabulary: This model has 100 additional target-language vocabulary items.
Target vocabulary initialization: The target weights of the embedding and LM head were initialized using Align initialization.
Training: This model was additionally pre-trained on 30K target language sentences sampled from CC-100. The training was conducted with the 2x2LS/512 strategies introduced in the paper.
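
Expanding the vocabulary appends new rows to the embedding matrix and LM head, and those rows need starting values before training. The sketch below is not Align initialization (the paper defines that method); it shows a simpler, widely used heuristic, averaging the embeddings of the source-tokenizer pieces that a new token splits into, only to illustrate what "initializing target weights" means. The function name, the toy segmentation, and the two-dimensional vectors are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mean_init(source_pieces: List[str], source_embeddings: Dict[str, Vector]) -> Vector:
    """Return the element-wise mean of the embeddings of the given source pieces.

    NOTE: this is a stand-in heuristic for illustration, not the Align method.
    """
    vectors = [source_embeddings[piece] for piece in source_pieces]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy source embeddings (hypothetical values, 2 dimensions for readability).
toy_embeddings = {"κα": [1.0, 0.0], "λη": [0.0, 1.0]}

# Assume the original tokenizer splits the new token "καλη" into ["κα", "λη"].
new_row = mean_init(["κα", "λη"], toy_embeddings)
```

In a real model the same averaging would be applied to rows of the embedding and LM head tensors rather than to Python lists.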

Model Description

Language: Greek
License: Llama 2 Community License Agreement
Fine-tuned from model: meta-llama/Llama-2-7b-hf

Model Sources

Repository: https://github.com/gucci-j/lowres-cve
Paper: https://arxiv.org/abs/2406.11477

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "atsuki-yamaguchi/Llama-2-7b-hf-el-30K-align-2x2ls-512"
)
tokenizer = AutoTokenizer.from_pretrained(
    "atsuki-yamaguchi/Llama-2-7b-hf-el-30K-align-2x2ls-512"
)
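
Once the model and tokenizer load, text generation follows the usual Transformers pattern. The following is a minimal sketch rather than an official recipe: the helper names `generate_greek` and `estimate_weight_gb`, the prompt, and the generation settings are assumptions, and half precision is chosen on GPU only to reduce memory relative to the F32 checkpoint.

```python
MODEL_ID = "atsuki-yamaguchi/Llama-2-7b-hf-el-30K-align-2x2ls-512"


def estimate_weight_gb(n_params: float, bytes_per_param: int) -> float:
    """Rough size of the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


def generate_greek(prompt: str, max_new_tokens: int = 32) -> str:
    """Load the model and greedily continue a Greek prompt (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Use float16 on GPU to halve memory; stay in float32 on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # The decoded string contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_greek("Η Αθήνα είναι")` returns the prompt followed by the model's greedy continuation. By the estimate above, 6.74B parameters at 4 bytes each come to roughly 27 GB of weights in F32, or about half that in float16.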

Citation

@article{yamaguchi-etal-2024-effectively,
    title={How Can We Effectively Expand the Vocabulary of LLMs with 0.01GB of Target Language Text?}, 
    author={Atsuki Yamaguchi and Aline Villavicencio and Nikolaos Aletras},
    year={2024},
    journal={ArXiv},
    volume={abs/2406.11477},
    url={https://arxiv.org/abs/2406.11477}, 
}

Model size: 6.74B params
Tensor type: F32 (Safetensors)
