T3Q-LLM-sft1.0-dpo1.0

This model is a version of T3Q-LLM/T3Q-LLM-solar10.8-sft-v1.0 that has been further fine-tuned with Direct Preference Optimization (DPO).

Model Developers: Chihoon Lee (chihoonlee10), T3Q

Prompt Template

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: {prompt}
Assistant:
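
As a minimal sketch, the template above can be filled in with a small helper. The names `SYSTEM_PROMPT` and `build_prompt` are illustrative assumptions and are not part of the model repository; the trailing newline after `Assistant:` follows the usage example in this card.

```python
# Illustrative helper (hypothetical names), not part of the model repository.
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(user_message: str) -> str:
    """Wrap a single user message in the Human/Assistant template."""
    return f"{SYSTEM_PROMPT}\nHuman: {user_message}\nAssistant:\n"


print(build_prompt("What is the capital of Korea?"))
```

Keeping the template in one function avoids small mismatches (a missing newline or colon) between the prompt used at inference time and the one shown here.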

How to Use It

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("T3Q-LLM/T3Q-LLM-sft1.0-dpo1.0")
tokenizer = AutoTokenizer.from_pretrained("T3Q-LLM/T3Q-LLM-sft1.0-dpo1.0")

# Wrap the user question in the prompt template.
prompt_template = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\nHuman: {prompt}\nAssistant:\n"
# Korean multiple-choice question: "What is the capital of Korea?" with options (A) to (E).
text = '한국의 수도는 어디인가요? 아래 선택지 중 골라주세요.\n\n(A) 경성\n(B) 부산\n(C) 평양\n(D) 서울\n(E) 전주'
model_inputs = tokenizer(prompt_template.format(prompt=text), return_tensors='pt')

# Generate up to 256 new tokens; the decoded text contains the prompt followed by the reply.
outputs = model.generate(**model_inputs, max_new_tokens=256)
output_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(output_text)
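
Because the decoded text contains the prompt as well as the reply, it can be useful to keep only the part after the `Assistant:` marker. The following is a rough sketch; `extract_reply` is a hypothetical helper name, and it assumes the marker first appears at the end of the template (that is, the user message does not itself contain `Assistant:`).

```python
def extract_reply(decoded: str, marker: str = "Assistant:") -> str:
    """Return the text after the first occurrence of `marker`, stripped."""
    _, found, reply = decoded.partition(marker)
    # If the marker is missing, fall back to the full decoded text.
    return reply.strip() if found else decoded.strip()


sample = "A chat between ...\nHuman: 1+1?\nAssistant:\n2"
print(extract_reply(sample))
```

An alternative that does not depend on string matching is to decode only the tokens generated after the input, by slicing the output ids at the length of the input ids before decoding.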

Example Output

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: 한국의 수도는 어디인가요? 아래 선택지 중 골라주세요.

(A) 경성
(B) 부산
(C) 평양
(D) 서울
(E) 전주
Assistant:
(D) 서울이 한국의 수도입니다. 서울은 나라의 북동부에 위치해 있으며, 정치, 경제, 문화의 중심지입니다. 약 1,000만 명이 넘는 인구를 가진 세계에서 가장 큰 도시 중 하나입니다. 서울은 높은 빌딩, 현대적인 인프라, 활기 문화 장면으로 유명합니다. 또한, 많은 역사적 명소와 박물관이 있어 방문객들에게 풍부한 문화 체험을 제공합니다.

English translation of the exchange above. Question: "What is the capital of Korea? Please choose from the options below. (A) Gyeongseong (B) Busan (C) Pyongyang (D) Seoul (E) Jeonju". Reply: "(D) Seoul is the capital of Korea. Seoul is located in the northeastern part of the country and is the center of politics, economy, and culture. With a population of over about 10 million, it is one of the largest cities in the world. Seoul is known for its tall buildings, modern infrastructure, and lively cultural scene. It also has many historic sites and museums, offering visitors a rich cultural experience."

Task	Version	Metric	Value		Stderr
kobest_boolq	0	acc	0.9387	±	0.0064
		macro_f1	0.9387	±	0.0064
kobest_copa	0	acc	0.7590	±	0.0135
		macro_f1	0.7585	±	0.0135
kobest_hellaswag	0	acc	0.5080	±	0.0224
		acc_norm	0.5580	±	0.0222
		macro_f1	0.5049	±	0.0224
kobest_sentineg	0	acc	0.8489	±	0.0180
		macro_f1	0.8483	±	0.0180

hf-causal-experimental (pretrained=nlpai-lab/KULLM3,use_accelerate=true,trust_remote_code=true), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8

Task	Version	Metric	Value		Stderr
kobest_boolq	0	acc	0.8896	±	0.0084
		macro_f1	0.8888	±	0.0084
kobest_copa	0	acc	0.6930	±	0.0146
		macro_f1	0.6925	±	0.0147
kobest_hellaswag	0	acc	0.4640	±	0.0223
		acc_norm	0.5240	±	0.0224
		macro_f1	0.4612	±	0.0223
kobest_sentineg	0	acc	0.6297	±	0.0243
		macro_f1	0.6255	±	0.0244
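
To read the two tables side by side, the sketch below copies the reported accuracies and standard errors and derives approximate 95% intervals as value ± 1.96 × stderr. The normal approximation is an assumption made here, not something the evaluation output states. The first table carries no label in the text, so it is referred to as `first_table`; the second is the one labeled `nlpai-lab/KULLM3`. The function names are illustrative.

```python
# Accuracy and stderr copied from the two tables above: (acc, stderr).
first_table = {
    "kobest_boolq": (0.9387, 0.0064),
    "kobest_copa": (0.7590, 0.0135),
    "kobest_hellaswag": (0.5080, 0.0224),
    "kobest_sentineg": (0.8489, 0.0180),
}
kullm3 = {
    "kobest_boolq": (0.8896, 0.0084),
    "kobest_copa": (0.6930, 0.0146),
    "kobest_hellaswag": (0.4640, 0.0223),
    "kobest_sentineg": (0.6297, 0.0243),
}


def interval(acc, stderr, z=1.96):
    """Approximate 95% interval under a normal approximation (assumption)."""
    return acc - z * stderr, acc + z * stderr


def compare(a, b):
    """Per task: (accuracy difference a - b, whether the two intervals overlap)."""
    rows = {}
    for task in a:
        lo_a, hi_a = interval(*a[task])
        lo_b, hi_b = interval(*b[task])
        diff = round(a[task][0] - b[task][0], 4)
        rows[task] = (diff, lo_a <= hi_b and lo_b <= hi_a)
    return rows


for task, (diff, overlap) in compare(first_table, kullm3).items():
    print(f"{task}: diff={diff:+.4f}, intervals overlap={overlap}")
```

Under this approximation, only the kobest_hellaswag intervals overlap; on the other three tasks the first table's accuracy is higher by more than the combined interval widths.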
