# Introducing GenZ Infinite
The model is a finetuned version of Genz-13B-v2 with a context size of 16K. The architecture is updated with the lambda-shaped attention from the LM-Infinite paper, which enables sequence lengths of 120K+ tokens without degrading perplexity.
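LM-Infinite's Λ-shaped ("lambda") attention keeps a handful of global tokens at the start of the sequence plus a sliding local window, so each token attends to a bounded number of keys at any length. Below is a minimal, hypothetical sketch of that mask pattern; `lambda_attention_mask` is an illustrative name, not part of the repo (the actual implementation lives in `convert_llama_model`, shown later):

```python
import torch

def lambda_attention_mask(seq_len: int, global_branch: int = 10,
                          local_branch: int = 2048) -> torch.Tensor:
    """Boolean [seq_len, seq_len] mask: True where attention is allowed."""
    q = torch.arange(seq_len).unsqueeze(1)  # query positions (rows)
    k = torch.arange(seq_len).unsqueeze(0)  # key positions (columns)
    causal = k <= q                          # standard causal constraint
    global_arm = k < global_branch           # always attend to the first tokens
    local_arm = (q - k) < local_branch       # attend to a recent sliding window
    return causal & (global_arm | local_arm)

# Each token attends to at most global_branch + local_branch keys,
# so attention cost stays bounded as the sequence grows.
mask = lambda_attention_mask(1024)
```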
## Generate responses
Use the `generate.py` script from the GitHub repo:

```bash
python generate.py --base_model budecosystem/genz-13b-infinite
```
You can integrate the model into your own code by loading the `convert_llama_model` function:
```python
import torch
from transformers import GenerationConfig, AutoModelForCausalLM, AutoTokenizer
from model.llama import convert_llama_model  # from the GenZ Infinite GitHub repo

# Lambda-attention settings: every token attends to the first `global_branch`
# tokens plus the most recent `local_branch` tokens; `limit_distance` caps the
# relative distance used for position encoding.
local_branch = 2048
global_branch = 10
limit_distance = 2048

model = AutoModelForCausalLM.from_pretrained(
    "budecosystem/genz-13b-infinite",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = convert_llama_model(model, local_branch, global_branch)
```
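Once converted, the model behaves like a regular transformers causal LM, so the usual `generate` API applies. A minimal sketch (the prompt and sampling settings here are illustrative, not from the repo):

```python
from transformers import AutoTokenizer, GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("budecosystem/genz-13b-infinite")

prompt = "Summarize the key ideas of the LM-Infinite paper in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

config = GenerationConfig(max_new_tokens=256, do_sample=True,
                          temperature=0.7, top_p=0.9)
output_ids = model.generate(**inputs, generation_config=config)

# Decode only the newly generated tokens, skipping the prompt.
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```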
## Evaluation
Accuracy on the passkey retrieval task at increasing context lengths (in tokens, values in %):

| Task | 4096 | 5120 | 8192 | 16384 |
|---|---|---|---|---|
| Passkey retrieval | 100 | 75 | 48 | 30 |
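For reference, passkey retrieval hides a short random key inside long filler text and asks the model to repeat it back; accuracy is the fraction of keys recovered exactly. A hypothetical sketch of such a prompt builder (the exact filler text and template behind the numbers above are not specified in this card):

```python
import random

def build_passkey_prompt(target_tokens: int = 4096) -> tuple[str, str]:
    """Hide a random 5-digit passkey in the middle of filler text."""
    passkey = str(random.randint(10000, 99999))
    filler_unit = "The grass is green. The sky is blue. The sun is yellow. "
    # Rough sizing: repeat the filler until the prompt nears the target length.
    filler = filler_unit * (target_tokens // 8)
    needle = f" The pass key is {passkey}. Remember it, {passkey} is the pass key. "
    mid = len(filler) // 2
    prompt = filler[:mid] + needle + filler[mid:] + "\nWhat is the pass key?"
    return prompt, passkey
```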
## Training details
The model was trained on 4× A100 80GB GPUs for approximately 55 hours.
| Hyperparameter | Value |
|---|---|
| per_device_train_batch_size | 1 |
| gradient_accumulation_steps | 1 |
| epochs | 3 |
| steps | 8550 |
| learning_rate | 2e-4 |
| lr_scheduler_type | cosine |
| warmup_steps | 1000 |
| optimizer | AdamW |
| fp16 | True |
| GPU | 4× A100 80GB |
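These hyperparameters map directly onto `transformers.TrainingArguments`. A sketch of that configuration, assuming a standard `Trainer` setup (dataset, model, and `Trainer` wiring are omitted; `output_dir` is an assumed path):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="genz-13b-infinite",   # assumed output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    optim="adamw_torch",              # AdamW optimizer
    fp16=True,                        # mixed-precision training
)
```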
## Acknowledgments
We'd like to thank the open-source community and the researchers whose foundational work made this model possible. Special thanks to the authors of the LM-Infinite paper and its accompanying GitHub repo.