---
tags:
- generated_from_trainer
model-index:
- name: myBit-Llama2-jp-127M-4
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# myBit-Llama2-jp-127M-4
This is a 127M-parameter Bit-Llama2 model, pre-trained for only one epoch on a Japanese dataset, [range3/wiki40b-ja](https://huggingface.co/datasets/range3/wiki40b-ja).
It achieves the following result on the evaluation set:
- Loss: 2.9790
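
After loading the model as shown under "How to use", the 127M figure can be checked by counting parameter elements. This is a minimal sketch: `count_parameters` is a hypothetical helper, and whether the total lands exactly on 127M depends on how the author counted (for example, whether embeddings are included), which the card does not state.

```python
def count_parameters(model, trainable_only: bool = False) -> int:
    """Sum of element counts over the model's parameters.

    torch.nn.Module.parameters() yields each shared (tied) tensor once,
    so tied input/output embeddings are not double-counted.
    """
    return sum(
        p.numel()
        for p in model.parameters()
        if not trainable_only or p.requires_grad
    )


# Usage, after loading `model` as in "How to use":
#   print(f"{count_parameters(model) / 1e6:.1f}M parameters")
```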
## Model description
GitHub: [BitNet-b158](https://github.com/Hajime-Y/BitNet-b158)
More details on this model are in the following articles (in Japanese):
- [Implementing BitNet & BitNet b158, Part 1](https://note.com/hatti8/n/nc6890e79a19a)
- [Implementing BitNet & BitNet b158, Part 2](https://note.com/hatti8/n/ne94f7a7d46df)
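
Assuming this checkpoint follows the BitNet b1.58 recipe (the repository name suggests so, but the card does not say which of the two variants in the linked articles it uses), each linear layer stores ternary weights in {-1, 0, +1}, about log2(3) ≈ 1.58 bits per weight. The pure-Python sketch below illustrates the absmean weight quantization and absmax 8-bit activation quantization described in the BitNet b1.58 paper. It is not the `mybitnet` implementation, which may differ (for example, per-token activation scales, a straight-through estimator during training, or normalization before quantization); `absmean_ternary`, `absmax_int8`, and `bitlinear` are hypothetical names.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def absmean_ternary(weights: Matrix, eps: float = 1e-5) -> Tuple[List[List[int]], float]:
    """Quantize weights to {-1, 0, +1} with one per-tensor scale (absmean)."""
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat) + eps  # gamma = mean |W|, eps avoids division by 0
    q = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return q, scale


def absmax_int8(x: List[float], eps: float = 1e-5) -> Tuple[List[int], float]:
    """Quantize activations to the symmetric 8-bit range [-127, 127] (absmax)."""
    scale = 127 / max(max(abs(v) for v in x), eps)
    return [max(-127, min(127, round(v * scale))) for v in x], scale


def bitlinear(x: List[float], weights: Matrix) -> List[float]:
    """Approximate y = W x using ternary weights and int8 activations."""
    qw, w_scale = absmean_ternary(weights)
    qx, x_scale = absmax_int8(x)
    out = []
    for row in qw:
        # Ternary weights turn each dot product into additions and subtractions.
        acc = sum(xi if w == 1 else -xi if w == -1 else 0 for w, xi in zip(row, qx))
        out.append(acc * w_scale / x_scale)  # undo both quantization scales
    return out
```

Because every weight is -1, 0, or +1, the inner loop needs no multiplications; the only floating-point work per output is the final rescale.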
## How to use
1. Install the libraries (the `!` prefix runs shell commands in Jupyter/Colab; omit it in a terminal):
```
!pip install mybitnet
!pip install -U accelerate transformers==4.38.2
!pip install torch
```
2. Load the model and tokenizer:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "HachiML/myBit-Llama2-jp-127M-4"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True)
print(model)
```
3. Run inference:
```python
prompt = "昔々あるところに、"
input_ids = tokenizer.encode(
prompt,
return_tensors="pt"
)
tokens = model.generate(
input_ids.to(device=model.device),
max_new_tokens=128,
)
out = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(out)
```
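
Step 3 decodes with the model's default generation settings. To sample instead, the three calls can be wrapped in one helper and given sampling options. This is a sketch: `generate_text` and `SAMPLING_KWARGS` are hypothetical names, and the values are illustrative rather than tuned for this model. `max_new_tokens`, `do_sample`, `temperature`, `top_p`, and `repetition_penalty` are standard `generate` arguments in transformers.

```python
from typing import Any, Dict

# Illustrative sampling settings (assumed values, not tuned for this model).
SAMPLING_KWARGS: Dict[str, Any] = {
    "max_new_tokens": 128,
    "do_sample": True,          # sample from the distribution instead of decoding deterministically
    "temperature": 0.8,         # <1 sharpens the next-token distribution
    "top_p": 0.95,              # nucleus sampling cutoff
    "repetition_penalty": 1.1,  # discourages repeating earlier tokens
}


def generate_text(model, tokenizer, prompt: str, **gen_kwargs: Any) -> str:
    """Encode a prompt, generate a continuation, and decode it."""
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
    tokens = model.generate(input_ids, **gen_kwargs)
    return tokenizer.decode(tokens[0], skip_special_tokens=True)


# Usage, with `model`, `tokenizer`, and `prompt` from the steps above:
#   import torch
#   torch.manual_seed(0)  # fixes the sampling RNG for repeatable runs on the same setup
#   print(generate_text(model, tokenizer, prompt, **SAMPLING_KWARGS))
```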
## Intended uses & limitations
More information needed
## Training and evaluation data
- [range3/wiki40b-ja](https://huggingface.co/datasets/range3/wiki40b-ja)
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0024
- train_batch_size: 96
- eval_batch_size: 96
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 5000
- num_epochs: 1
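
The card lists a polynomial scheduler with 5,000 warmup steps but gives neither the total number of steps nor the decay power. The sketch below mirrors the formula of transformers' `get_polynomial_decay_schedule_with_warmup`, assuming its defaults (`power=1.0`, `lr_end=1e-7`) were not overridden. The total of 40,800 steps is a hypothetical estimate from the results table (step 40000 falls at epoch 0.98), not a logged value.

```python
def polynomial_lr(
    step: int,
    lr_init: float = 2.4e-3,    # learning_rate from the card
    warmup_steps: int = 5000,   # lr_scheduler_warmup_steps from the card
    total_steps: int = 40_800,  # hypothetical: estimated from the results table
    lr_end: float = 1e-7,       # assumed transformers default
    power: float = 1.0,         # assumed transformers default (linear decay)
) -> float:
    """Learning rate at `step`: linear warmup, then polynomial decay to lr_end."""
    if step < warmup_steps:
        return lr_init * step / max(1, warmup_steps)  # ramps up from 0
    if step > total_steps:
        return lr_end
    decay_steps = total_steps - warmup_steps
    pct_remaining = 1 - (step - warmup_steps) / decay_steps
    return (lr_init - lr_end) * pct_remaining**power + lr_end
```

With `power=1.0`, the "polynomial" schedule is simply a linear decay from the peak rate to `lr_end`.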
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 4.8696 | 0.05 | 2000 | 3.8588 |
| 3.7027 | 0.1 | 4000 | 3.6106 |
| 3.5648 | 0.15 | 6000 | 3.5014 |
| 3.448 | 0.2 | 8000 | 3.4153 |
| 3.3884 | 0.25 | 10000 | 3.3650 |
| 3.3462 | 0.29 | 12000 | 3.3280 |
| 3.3155 | 0.34 | 14000 | 3.3053 |
| 3.2932 | 0.39 | 16000 | 3.2891 |
| 3.2762 | 0.44 | 18000 | 3.2673 |
| 3.2594 | 0.49 | 20000 | 3.2533 |
| 3.2432 | 0.54 | 22000 | 3.2398 |
| 3.2286 | 0.59 | 24000 | 3.2186 |
| 3.2083 | 0.64 | 26000 | 3.1957 |
| 3.1867 | 0.69 | 28000 | 3.1769 |
| 3.1676 | 0.74 | 30000 | 3.1568 |
| 3.14 | 0.79 | 32000 | 3.1286 |
| 3.114 | 0.83 | 34000 | 3.1006 |
| 3.0848 | 0.88 | 36000 | 3.0696 |
| 3.0511 | 0.93 | 38000 | 3.0301 |
| 3.005 | 0.98 | 40000 | 2.9790 |
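
Assuming the reported losses are mean per-token cross-entropy in nats (the standard causal-LM loss in transformers), perplexity is their exponential; the final validation loss of 2.9790 corresponds to a perplexity of about 19.7. The values below are copied from the table; `perplexity` is a hypothetical helper. Because perplexity is measured per token of this model's tokenizer, it is not directly comparable with models that use a different tokenizer.

```python
import math

# Validation losses copied from the table above (step -> loss).
VAL_LOSS = {2000: 3.8588, 10000: 3.3650, 20000: 3.2533, 30000: 3.1568, 40000: 2.9790}


def perplexity(mean_ce_loss: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy in nats)."""
    return math.exp(mean_ce_loss)


for step, loss in VAL_LOSS.items():
    print(f"step {step:>5}: loss {loss:.4f}  perplexity {perplexity(loss):.2f}")
```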
### Framework versions
- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2