---
license: cc-by-nc-2.0
language:
- en
- zh
- ja
tags:
- sft
pipeline_tag: text-generation
widget:
- text: >-
<|prompter|>What is a meme, and what's the history behind this
word?<|endoftext|><|assistant|>
- text: <|prompter|>What's the Earth total population<|endoftext|><|assistant|>
- text: >-
<|prompter|>Write a story about future of AI
development<|endoftext|><|assistant|>
datasets:
- OpenAssistant/oasst1
- databricks/databricks-dolly-15k
- anon8231489123/ShareGPT_Vicuna_unfiltered
- LIUM/tedlium
- theblackcat102/joke_explaination
---
# Redpajama-3B SFT model
![](https://huggingface.co/ikala/redpajama-3b-chat/resolve/main/redpajama-example.png)
This model is based on RedPajama's 3B base model, fine-tuned on human demonstrations
of assistant conversations collected through the
[https://open-assistant.io/](https://open-assistant.io/) human feedback web
app before April 12, 2023.
Supervised fine-tuning was performed with a sequence length of 5120.
## Model Details
- **Developed by:** [Open-Assistant Contributors](https://open-assistant.io/team) and [iKala](https://ikala.ai/)
- **Model type:** Transformer-based Language Model
- **Language:** English, Chinese, Japanese
- **Finetuned from:** [togethercomputer/RedPajama-INCITE-Base-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1)
- **Code:** [Open-Assistant/model/model_training](https://github.com/LAION-AI/Open-Assistant/tree/main/model/model_training)
- **License:** Non-commercial (CC BY-NC 2.0)
## Prompting
Two special tokens are used to mark the beginning of user and assistant turns:
`<|prompter|>` and `<|assistant|>`. Each turn ends with a `<|endoftext|>` token.
Input prompt example:
```
<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>
```
The input ends with the `<|assistant|>` token to signal that the model should
start generating the assistant reply.
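A minimal inference sketch using the 🤗 Transformers library is shown below. The model id comes from this card; the sampling parameters are illustrative assumptions, not recommended settings.
```python
# Minimal sketch: generate a reply using the prompt format described above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ikala/redpajama-3b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the user message in the prompter/assistant special tokens.
prompt = "<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.9,           # assumed sampling parameters
    temperature=0.7,
    eos_token_id=tokenizer.eos_token_id,  # generation stops at <|endoftext|>
)

# Decode only the newly generated assistant reply, skipping the prompt tokens.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```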
## Benchmark
| model | MMLU | BBH | HumanEval@10 |
|---|---|---|---|
| [ikala/redpajama-3b-chat](https://huggingface.co/ikala/redpajama-3b-chat) | 24.6 | 29.3 | 4.8 |
| [ikala/bloom-zh-3b-chat](https://huggingface.co/ikala/bloom-zh-3b-chat) | 31.4 | 30.2 | 0.0 |
| llama-7b (reference) | 30.9 | 27.6 | 10.3 |
## Dev Details
- base model: [togethercomputer/RedPajama-INCITE-Base-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1)
- checkpoint: 1 epoch (6000 steps)
- hardware: NVIDIA RTX A6000 x 4
- command: `deepspeed trainer_sft.py --configs defaults redpajama-3b datasets --num_train_epochs 2 --deepspeed`
- data:
```
datasets:
- wmt2019_zh-en:
max_val_set: 1000
max_train_set: 20000
- ted_trans_en-ja:
max_val_set: 1000
max_train_set: 20000
- ted_trans_zh-ja:
max_val_set: 1000
max_train_set: 20000
- ikala:
input_file_path: export_conversation_v4.4.jsonl
val_split: 0.05
- dolly15k:
val_split: 0.05
- oasst_export:
lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk,zh,ja,th,ko"
input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
val_split: 0.05
- joke
- gsm8k
- webgpt
```
The `ikala` dataset is an internal iKala dataset, so remove it from the config above if you want to reproduce this run.
Training config (`redpajama-3b`):
```
redpajama-3b:
dtype: fp16
log_dir: "redpajama_3b"
learning_rate: 1e-5
model_name: saved_models/RedPajama-INCITE-Base-3B-v1
output_dir: ikala_v4_3b
weight_decay: 0.0
max_length: 8196
warmup_steps: 2000
gradient_checkpointing: true
gradient_accumulation_steps: 32
per_device_train_batch_size: 1
per_device_eval_batch_size: 2
eval_steps: 500
save_steps: 1000
num_train_epochs: 8
save_total_limit: 2
deepspeed_config: configs/zero3_config_sft.json
```
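As a quick sanity check (my arithmetic, not from the original card), the effective global batch size implied by this config and the 4-GPU setup noted above works out as follows:
```python
# Effective global batch size = per-device batch * gradient accumulation * number of GPUs.
# Values taken from the config above and the "NVIDIA RTX A6000 x 4" hardware note.
per_device_train_batch_size = 1
gradient_accumulation_steps = 32
num_gpus = 4

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 128 sequences per optimizer step
```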
DeepSpeed ZeRO stage 3 config (`configs/zero3_config_sft.json`):
```
{
"fp16": {
"enabled": "auto",
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1
},
"bf16": {
"enabled": "auto"
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": "auto",
"betas": "auto",
"eps": "auto",
"weight_decay": "auto"
}
},
"scheduler": {
"type": "WarmupDecayLR",
"params": {
"warmup_min_lr": "auto",
"warmup_max_lr": "auto",
"warmup_num_steps": "auto",
"warmup_type": "linear",
"total_num_steps": "auto"
}
},
"zero_optimization": {
"stage": 3,
"overlap_comm": true,
"contiguous_gradients": true,
"sub_group_size": 1e9,
"reduce_bucket_size": "auto",
"stage3_prefetch_bucket_size": "auto",
"stage3_param_persistence_threshold": "auto",
"stage3_max_live_parameters": 1e9,
"stage3_max_reuse_distance": 1e9,
"stage3_gather_16bit_weights_on_model_save": true
},
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"steps_per_print": 2000,
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"wall_clock_breakdown": false
}
```