japanese-mistral-300m-base
Overview
Welcome to my model card!
This model's features are:
- Suppression of unknown-word generation, by using byte fallback in the SentencePiece tokenizer and converting it to the Hugging Face Tokenizers format (see the tokenizer sketch below)
- Pretrained on the Wikipedia and CC-100 datasets
- Uses the Mistral architecture at roughly 300M parameters
Yukkuri shite ittene! (Take it easy!)
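As a rough sketch of the tokenizer feature above: train SentencePiece with byte fallback, then wrap the resulting model file in a Hugging Face slow tokenizer. The corpus path, vocabulary size, and the choice of `LlamaTokenizer` as the wrapper class are assumptions for illustration, not necessarily the exact settings used for this model.

```python
import sentencepiece as spm
from transformers import LlamaTokenizer

# Train a SentencePiece model with byte fallback so that characters missing
# from the vocabulary are decomposed into UTF-8 byte pieces instead of <unk>.
spm.SentencePieceTrainer.train(
    input="corpus.txt",            # hypothetical path to the pretraining text
    model_prefix="spm_tokenizer",
    vocab_size=32000,              # illustrative value
    character_coverage=0.9995,
    byte_fallback=True,
)

# Wrap the trained .model file in a slow (SentencePiece-backed) tokenizer and
# save it in the Hugging Face format so that AutoTokenizer can load it.
tokenizer = LlamaTokenizer(vocab_file="spm_tokenizer.model")
tokenizer.save_pretrained("japanese-mistral-300m-base-tokenizer")
```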
How to use the model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

MODEL_NAME = "ce-lery/japanese-mistral-300m-base"
torch.set_float32_matmul_precision('high')

# Select the device: use the GPU if one is available, otherwise fall back to CPU.
if torch.cuda.is_available():
    print("cuda")
    DEVICE = "cuda"
else:
    print("cpu")
    DEVICE = "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
).to(DEVICE)
# streamer = TextStreamer(tokenizer)

prompt = "大規模言語モデルとは、"
inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

# Generate a continuation of the prompt with sampling plus beam search,
# blocking repeated 2-grams to reduce repetition.
with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        max_new_tokens=256,
        do_sample=True,
        early_stopping=False,
        top_p=0.95,
        top_k=50,
        temperature=0.9,
        # streamer=streamer,
        no_repeat_ngram_size=2,
        num_beams=3,
    )

print(outputs.tolist()[0])
outputs_txt = tokenizer.decode(outputs[0])
print(outputs_txt)
```
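If you want tokens printed as they are generated, uncomment the two `streamer` lines above. Note that, in recent transformers releases, a streamer cannot be combined with beam search, so you would also need to drop `num_beams=3` (or set it to 1) when streaming.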
Recipe
If you want to reproduce this model, you can refer to this GitHub repository.
I wrote the recipe for building this model there. For example:
- Preprocessing with SentencePiece
- Pretraining with FlashAttention-2, torch.compile, and DeepSpeed (see the sketch after this list)
- Fine-tuning with databricks-dolly-15k-ja
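As a rough sketch of that pretraining setup (an illustration under assumed settings, not the repository's exact script): build a small Mistral configuration, load it with FlashAttention-2 enabled, and compile the model with torch.compile. The architecture sizes below are guesses for a ~300M-parameter model.

```python
import torch
from transformers import AutoModelForCausalLM, MistralConfig

# A small Mistral configuration in the ~300M-parameter range. The sizes are
# illustrative guesses, not the exact architecture of this checkpoint.
config = MistralConfig(
    vocab_size=32000,
    hidden_size=1024,
    intermediate_size=4096,
    num_hidden_layers=12,
    num_attention_heads=16,
    num_key_value_heads=8,
)

# FlashAttention-2 needs the flash-attn package and fp16/bf16 weights.
# (Recent transformers releases use `attn_implementation="flash_attention_2"`;
# around 4.35 the equivalent flag was `use_flash_attention_2=True`.)
model = AutoModelForCausalLM.from_config(
    config,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# PyTorch 2.x graph compilation; DeepSpeed is enabled separately by passing a
# JSON config to the Trainer (see the TrainingArguments sketch further below).
model = torch.compile(model)
```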
If you find any mistakes or errors, please create an issue. If you open a pull request, I'll be very happy!
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0006
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 64
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.95) and epsilon=0.0001
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 1
- mixed_precision_training: Native AMP
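For reference, here is roughly how those hyperparameters map onto `transformers.TrainingArguments`. The output directory and DeepSpeed config path are hypothetical placeholders, and fp16 is assumed for the "Native AMP" mixed precision.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="checkpoints",        # hypothetical output path
    learning_rate=6e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=64,  # 4 per device x 64 steps = 256 total batch
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-4,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=1,
    fp16=True,                       # mixed precision via native AMP (assumed fp16)
    deepspeed="ds_config.json",      # hypothetical DeepSpeed config for multi-GPU
)
```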
Training results
| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 4.2911        | 0.12  | 5000  | 4.2914          |
| 3.9709        | 0.24  | 10000 | 3.9900          |
| 3.8229        | 0.36  | 15000 | 3.8388          |
| 3.7197        | 0.47  | 20000 | 3.7454          |
| 3.652         | 0.59  | 25000 | 3.6739          |
| 3.597         | 0.71  | 30000 | 3.6177          |
| 3.5554        | 0.83  | 35000 | 3.5770          |
| 3.536         | 0.95  | 40000 | 3.5582          |
Framework versions
- Transformers 4.35.2
- Pytorch 2.1.1+cu121
- Datasets 2.14.5
- Tokenizers 0.14.1