ruGPT-3.5-13B / Saiga2
LoRA adapter for ruGPT3.5-13B, trained on the Saiga collection of datasets.
Configuration: https://github.com/EvilFreelancer/impruver/blob/main/configs/ruGPT35_13B_lora.yml
The adapter was trained on 1x RTX 4090; training required approximately 18.2 GB of VRAM and took 16h 58m.
output_dir: ./models/ruGPT35_13B_lora
train_path: ./train.ruGPT35_13B.jsonl
val_path: ./val.ruGPT35_13B.jsonl
datasets:
  - name: IlyaGusev/ru_turbo_alpaca
    converter: impruver.instruction_to_messages
  - name: IlyaGusev/ru_turbo_alpaca_evol_instruct
    converter: impruver.instruction_to_messages
  - name: IlyaGusev/ru_turbo_saiga
    converter: impruver.dialog_to_messages
  - name: IlyaGusev/ru_sharegpt_cleaned
    converter: impruver.dialog_to_messages
  - name: IlyaGusev/oasst1_ru_main_branch
    converter: impruver.dialog_to_messages
  - name: lksy/ru_instruct_gpt4
    converter: impruver.converters.instruction_to_messages
model:
  class: transformers.AutoModelForCausalLM
  name: ai-forever/ruGPT-3.5-13B
  load_in_4bit: true
  load_in_8bit: false
  dtype: bf16
lora:
  r: 16
  lora_alpha: 16
  lora_dropout: 0.05
  bias: none
  target_modules: [ c_attn ]
  task_type: CAUSAL_LM
tokenizer:
  class: transformers.AutoTokenizer
  name: ai-forever/ruGPT-3.5-13B
  max_tokens_count: 1024
trainer:
  eval_strategy: steps
  save_strategy: steps
  eval_steps: 100
  save_steps: 100
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 1
  gradient_accumulation_steps: 128
  logging_steps: 1
  learning_rate: 0.0002
  num_train_epochs: 2
  lr_scheduler_type: cosine
  warmup_steps: 16
  optim: adamw_8bit
  metric_for_best_model: eval_loss
  load_best_model_at_end: true
  save_total_limit: 2
  seed: 42
  remove_unused_columns: false
  max_grad_norm: 1.0
  weight_decay: 0.08
  torch_compile: false
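A few quantities follow directly from the values in this configuration. The sketch below works them out in plain Python; it assumes the standard LoRA scaling of lora_alpha / r and a single GPU (as stated for this training run), and the variable names are illustrative rather than taken from the impruver code.

```python
# Values copied from the trainer, lora and tokenizer sections of the config.
per_device_train_batch_size = 1
gradient_accumulation_steps = 128
num_gpus = 1  # the adapter was trained on a single RTX 4090
max_tokens_count = 1024

lora_r = 16
lora_alpha = 16

# Number of sequences that contribute to each optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)

# Upper bound on tokens per optimizer step (every sequence at full length).
max_tokens_per_step = effective_batch_size * max_tokens_count

# Assumption: standard LoRA scaling, where the low-rank update is
# multiplied by lora_alpha / r before being added to the frozen weights.
lora_scaling = lora_alpha / lora_r

print(effective_batch_size)  # 128
print(max_tokens_per_step)   # 131072
print(lora_scaling)          # 1.0
```

With a per-device batch of 1, gradient accumulation is what gives an effective batch of 128 sequences while keeping memory use within a single 24 GB card.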
Model tree for evilfreelancer/ruGPT3.5-13B-lora-saiga2
Base model: ai-forever/ruGPT-3.5-13B
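Since this repository holds only a LoRA adapter, it has to be attached to the base model at load time. The following is a minimal sketch of one way to do that with the transformers and peft libraries. It assumes the same 4-bit quantization and bf16 compute dtype that were used for training, and that the tokenizer is taken from the base model. The helper name load_model_with_adapter is hypothetical and not part of any library.

```python
BASE_MODEL = "ai-forever/ruGPT-3.5-13B"
ADAPTER = "evilfreelancer/ruGPT3.5-13B-lora-saiga2"


def load_model_with_adapter(base_model: str = BASE_MODEL, adapter: str = ADAPTER):
    """Load the quantized base model and attach the LoRA adapter (hypothetical helper)."""
    # Heavy imports are deferred so this file can be imported without a GPU stack.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Assumption: mirror the training settings (load_in_4bit: true, dtype: bf16).
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    # Assumption: the tokenizer is loaded from the base model repository.
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        quantization_config=quant_config,
        device_map="auto",
    )

    # Wrap the frozen base model with the trained LoRA weights.
    model = PeftModel.from_pretrained(model, adapter)
    model.eval()
    return model, tokenizer
```

Calling load_model_with_adapter() downloads the base weights and needs a CUDA GPU with bitsandbytes installed. The prompt format expected by the adapter is not described here, so it should be taken from the linked configuration and converters rather than guessed.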