---
tags:
- generated_from_trainer
model-index:
- name: Qra-7b-dolly-instruction-0.1
results: []
datasets:
- s3nh/alpaca-dolly-instruction-only-polish
language:
- pl
inference: true
license: llama2
pipeline_tag: text-generation
---
# Qra-7b-dolly-instruction-0.1
This model is a fine-tuned version of [OPI-PG/Qra-7b](https://huggingface.co/OPI-PG/Qra-7b) on the [s3nh/alpaca-dolly-instruction-only-polish](https://huggingface.co/datasets/s3nh/alpaca-dolly-instruction-only-polish) dataset.
## Model description
This model was fine-tuned from the Polish base model [OPI-PG/Qra-7b](https://huggingface.co/OPI-PG/Qra-7b).
## Intended uses & limitations
This model has been fine-tuned for the question-answering task. It can also be used as a chatbot, but it does not perform well in that role because the training dataset does not contain multi-turn conversations.
```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "nie3e/Qra-7b-dolly-instruction-0.1"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline(
    "text-generation", model=model, tokenizer=tokenizer, device=device
)

def get_answer(system_prompt: str, user_prompt: str) -> str:
    # Render the conversation with the model's chat template
    input_msg = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
    prompt = pipe.tokenizer.apply_chat_template(
        input_msg, tokenize=False,
        add_generation_prompt=True
    )
    outputs = pipe(
        prompt, max_new_tokens=512, do_sample=False, temperature=0.1, top_k=50,
        top_p=0.1, eos_token_id=pipe.tokenizer.eos_token_id,
        pad_token_id=pipe.tokenizer.pad_token_id
    )
    # Return only the newly generated text, without the echoed prompt
    return outputs[0]['generated_text'][len(prompt):].strip()

print(
    get_answer(
        system_prompt="Jesteś przyjaznym chatbotem",
        user_prompt="Napisz czym jest dokument architectural decision record."
    )
)
```
## Training and evaluation data
Dataset: [s3nh/alpaca-dolly-instruction-only-polish](https://huggingface.co/datasets/s3nh/alpaca-dolly-instruction-only-polish)
Each row has been converted into a conversation using this function:
```py
system_message = """Jesteś przyjaznym chatbotem"""

def create_conversation(sample) -> dict:
    # Strip surrounding quote characters from the dataset fields
    strip_characters = "\"'"
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user",
             "content": f"{sample['instruction'].strip(strip_characters)} "
                        f"{sample['input'].strip(strip_characters)}"},
            {"role": "assistant",
             "content": f"{sample['output'].strip(strip_characters)}"}
        ]
    }
```
Train/test split: 90%/10%
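The exact conversion and splitting code is not included in this card; a minimal sketch, assuming the standard `datasets` API and the column names used by `create_conversation` above, could look like this:
```py
from datasets import load_dataset

# Load the instruction dataset and convert each row into a chat conversation
dataset = load_dataset("s3nh/alpaca-dolly-instruction-only-polish", split="train")
dataset = dataset.map(create_conversation, remove_columns=list(dataset.features))

# 90%/10% train/test split
dataset = dataset.train_test_split(test_size=0.1)
```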
## Training procedure
GPU: 2x RTX 4060Ti 16GB
Training time: ~13 hours
The model was loaded with `device_map="auto"` to split it across both GPUs.
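The loading code itself is not shown in the card; a minimal sketch, assuming `accelerate` is installed so that `device_map="auto"` can shard the weights, and assuming bfloat16 weights to match the `bf16=True` training setting below:
```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Shard the base model across the available GPUs automatically
model = AutoModelForCausalLM.from_pretrained(
    "OPI-PG/Qra-7b",
    torch_dtype=torch.bfloat16,  # assumption, matching bf16 training
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("OPI-PG/Qra-7b")
```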
### Training hyperparameters
Lora config:
```py
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=128,
    lora_dropout=0.05,
    r=256,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM"
)
```
Training arguments:
```py
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="Qra-7b-dolly-instruction-0.1",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=6,
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
    logging_steps=10,
    save_strategy="epoch",
    learning_rate=2e-4,
    bf16=True,
    tf32=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    push_to_hub=False,
    report_to=["tensorboard"],
)
```
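How these pieces were wired together is not shown in the card; one plausible sketch, assuming TRL's `SFTTrainer` was used (an assumption, TRL is not mentioned here) together with the `model`, `tokenizer`, and `dataset` objects from the sketches above:
```py
from trl import SFTTrainer

# Hypothetical wiring of the LoRA config, training arguments, and dataset
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    peft_config=peft_config,
    tokenizer=tokenizer,
)
trainer.train()
```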
### Framework versions
- Transformers 4.39.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2