How to use
We write our prompts in the ChatML format.
With vLLM (recommended for much faster inference)
```python
from vllm import LLM, SamplingParams

model_name = "lightblue/jod"
llm = LLM(model=model_name)

SYSTEM_MESSAGE = "You are a helpful assistant."


def process_chat_history(next_user_msg, text_chat_history=()):
    """Build a ChatML prompt from earlier (user_msg, ai_msg) pairs plus the next user message."""
    prompt_text = "<|im_start|>system\n"
    prompt_text += SYSTEM_MESSAGE
    prompt_text += "<|im_end|>\n\n"
    # Replay earlier turns in order.
    for user_msg, ai_msg in text_chat_history:
        prompt_text += "<|im_start|>user\n"
        prompt_text += user_msg
        prompt_text += "<|im_end|>\n\n"
        prompt_text += "<|im_start|>assistant\n"
        prompt_text += ai_msg
        prompt_text += "<|im_end|>\n\n"
    prompt_text += "<|im_start|>user\n"
    prompt_text += next_user_msg
    prompt_text += "<|im_end|>\n\n"
    # Leave the assistant turn open so the model writes the reply.
    prompt_text += "<|im_start|>assistant\n"
    return prompt_text


user_prompt = "日本の一番高い山は?"  # "What is the highest mountain in Japan?"
prompt = process_chat_history(user_prompt)

# temperature=0 gives greedy (deterministic) decoding.
sampling_params = SamplingParams(temperature=0, max_tokens=528)
outputs = llm.generate(prompt, sampling_params)
bot_message = outputs[0].outputs[0].text.strip()
print(bot_message)
```
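`process_chat_history` also accepts earlier turns through `text_chat_history` as `(user_msg, ai_msg)` pairs, although the example only sends a single message. The sketch below restates the same prompt layout (system block, alternating turns, a blank line after each `<|im_end|>`, an open assistant turn) as a hypothetical, self-contained `build_chatml_prompt` helper, so you can inspect how a follow-up question is rendered before passing it to `llm.generate`. The earlier assistant answer and the follow-up question are illustrative placeholders, not real model output.

```python
SYSTEM_MESSAGE = "You are a helpful assistant."


def build_chatml_prompt(next_user_msg, history=(), system=SYSTEM_MESSAGE):
    """Hypothetical helper that mirrors process_chat_history's ChatML layout.

    history is a sequence of (user_msg, ai_msg) pairs from earlier turns.
    """
    blocks = [("system", system)]
    for user_msg, ai_msg in history:
        blocks.append(("user", user_msg))
        blocks.append(("assistant", ai_msg))
    blocks.append(("user", next_user_msg))
    # Each block is "<|im_start|>{role}\n{content}<|im_end|>\n\n", as in the card.
    prompt = "".join(f"<|im_start|>{role}\n{text}<|im_end|>\n\n" for role, text in blocks)
    # Leave the assistant turn open so the model generates the next reply.
    return prompt + "<|im_start|>assistant\n"


# Placeholder history: one earlier question and an assumed answer.
history = [("日本の一番高い山は?", "富士山です。")]
prompt = build_chatml_prompt("その高さは?", history)  # "How tall is it?"
print(prompt)
```

The resulting string can be passed to `llm.generate(prompt, sampling_params)` exactly as in the single-turn example.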
With Hugging Face transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_name = "lightblue/jod"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# 4-bit loading needs the bitsandbytes package; device_map="auto" needs accelerate.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

SYSTEM_MESSAGE = "You are a helpful assistant."


def process_chat_history(next_user_msg, text_chat_history=()):
    """Build a ChatML prompt from earlier (user_msg, ai_msg) pairs plus the next user message."""
    prompt_text = "<|im_start|>system\n"
    prompt_text += SYSTEM_MESSAGE
    prompt_text += "<|im_end|>\n\n"
    # Replay earlier turns in order.
    for user_msg, ai_msg in text_chat_history:
        prompt_text += "<|im_start|>user\n"
        prompt_text += user_msg
        prompt_text += "<|im_end|>\n\n"
        prompt_text += "<|im_start|>assistant\n"
        prompt_text += ai_msg
        prompt_text += "<|im_end|>\n\n"
    prompt_text += "<|im_start|>user\n"
    prompt_text += next_user_msg
    prompt_text += "<|im_end|>\n\n"
    # Leave the assistant turn open so the model writes the reply.
    prompt_text += "<|im_start|>assistant\n"
    return prompt_text


user_prompt = "日本の一番高い山は?"  # "What is the highest mountain in Japan?"
prompt = process_chat_history(user_prompt)
# do_sample=False gives greedy decoding; return_full_text=False drops the prompt from the output.
result = pipe(prompt, max_new_tokens=128, do_sample=False, return_full_text=False)
bot_message = result[0]["generated_text"].strip()
print(bot_message)
```
Training details
We trained on the following three datasets, using Open-Orca/Mistral-7B-SlimOrca as our base checkpoint:
- (J) - JASTER
- (O) - kunishou/oasst1-89k-ja
- (D) - kunishou/databricks-dolly-15k-ja
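The Axolotl config below reads a single file, `./data/jaster_plus.jsonl`, with `type: sharegpt` and `conversation: chatml`, so the training data was stored as ShareGPT-style `conversations` records. The card does not describe the preprocessing itself; the sketch below is one hypothetical way to convert an instruction-style record into that shape. The field names `instruction`, `input`, and `output` are assumptions about the source schema, and `to_sharegpt` / `write_jsonl` are illustrative helpers, not part of any released code.

```python
import json


def to_sharegpt(record):
    """Convert a hypothetical instruction/input/output record to ShareGPT form."""
    # Assumed schema: "instruction", an optional "input" (context), and "output".
    prompt = record["instruction"]
    if record.get("input"):
        prompt += "\n\n" + record["input"]
    return {
        "conversations": [
            {"from": "human", "value": prompt},
            {"from": "gpt", "value": record["output"]},
        ]
    }


def write_jsonl(records, path):
    # One JSON object per line; ensure_ascii=False keeps Japanese text readable.
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(to_sharegpt(record), ensure_ascii=False) + "\n")


example = {"instruction": "日本の一番高い山は?", "input": "", "output": "富士山です。"}
print(json.dumps(to_sharegpt(example), ensure_ascii=False))
```

A system turn could be added as a leading `{"from": "system", ...}` entry if the source data carries one.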
This model was trained on prompts in the ChatML format, so prompts at inference time should use ChatML as well. We chose this format because the base model (Open-Orca/Mistral-7B-SlimOrca) was trained with it, and because we find a multi-turn chatbot format more practical than the Alpaca-style instruction format.
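To make the contrast concrete, the sketch below renders the same question in both styles. The ChatML layout follows `process_chat_history` from the usage examples; the Alpaca template is the commonly used "### Instruction: / ### Response:" layout and is included only for comparison, since this model was not trained on it.

```python
QUESTION = "日本の一番高い山は?"


def chatml_prompt(question, system="You are a helpful assistant."):
    # Same layout as process_chat_history: role-tagged blocks, open assistant turn.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n\n"
        f"<|im_start|>user\n{question}<|im_end|>\n\n"
        "<|im_start|>assistant\n"
    )


def alpaca_prompt(question):
    # Common Alpaca-style template (no input field); NOT what this model expects.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{question}\n\n### Response:\n"
    )


print(chatml_prompt(QUESTION))
print(alpaca_prompt(QUESTION))
```

ChatML tags every turn with an explicit role, so earlier turns are simply appended as more blocks; the Alpaca template has a single instruction slot and a single response slot, so a conversation history has no natural place to go.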
We trained for one epoch using the following Axolotl config. Early stopping was not performed during training.
Axolotl config (.yaml)
```yaml
base_model: Open-Orca/Mistral-7B-SlimOrca
base_model_config: Open-Orca/Mistral-7B-SlimOrca
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
  - path: ./data/jaster_plus.jsonl
    ds_type: json # see other options below
    type: sharegpt
    conversation: chatml
dataset_prepared_path: false
val_set_size: 0.002
output_dir: ./train_output/openorca-mistral-jaster-1epoch
use_wandb: true
wandb_project: <HIDDEN>
wandb_entity: <HIDDEN>
debug:
adapter: qlora
lora_model_dir:
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
gradient_accumulation_steps: 1
micro_batch_size: 10
eval_batch_size: 4
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience: 10
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
eval_steps: 10
eval_table_size: 5
eval_table_max_new_tokens: 128
save_steps: 10
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
```
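A few quantities implied by the config can be computed directly. With `sample_packing: true` and `pad_to_sequence_len: true`, each micro-batch row is a packed sequence padded to `sequence_len` tokens, and each optimizer step sees `micro_batch_size x gradient_accumulation_steps x number of GPUs` rows. The card does not state the GPU count, so the sketch treats it as an assumed parameter (1 by default). The LoRA update is scaled by `lora_alpha / lora_r`, and the trainable-parameter count assumes the standard Mistral-7B dimensions (hidden size 4096, MLP size 14336, 8 key/value heads of size 128, 32 layers) for the base model.

```python
# Values copied from the Axolotl config above.
sequence_len = 4096
micro_batch_size = 10
gradient_accumulation_steps = 1
lora_r = 32
lora_alpha = 16

# Assumption: the GPU count is not stated in the card; change as needed.
num_gpus = 1

# Assumption: Mistral-7B dimensions for the base model.
hidden, mlp, kv_dim, num_layers = 4096, 14336, 8 * 128, 32


def lora_params(d_in, d_out, r):
    # LoRA adds A (r x d_in) and B (d_out x r) to each adapted linear layer.
    return r * (d_in + d_out)


sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_gpus
# Every row is padded to sequence_len, so this bounds the non-padding tokens per step.
max_tokens_per_step = sequences_per_step * sequence_len
lora_scale = lora_alpha / lora_r

# (d_in, d_out) of the seven target modules in one decoder layer.
target_shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, mlp),
    "up_proj": (hidden, mlp),
    "down_proj": (mlp, hidden),
}
per_layer = sum(lora_params(d_in, d_out, lora_r) for d_in, d_out in target_shapes.values())
trainable = per_layer * num_layers

print(f"sequences per optimizer step: {sequences_per_step}")
print(f"max tokens per optimizer step: {max_tokens_per_step}")
print(f"LoRA scale (alpha / r): {lora_scale}")
print(f"trainable LoRA parameters: {trainable:,}")
```

Under these assumptions a single GPU processes at most 40,960 tokens per optimizer step, the adapter output is scaled by 0.5, and the adapters add roughly 84M trainable parameters on top of the frozen 4-bit base model.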