hyunjae committed
Commit 08df47a • 1 Parent(s): a601499

Update README.md

Files changed (1)
  1. README.md +31 -1
README.md CHANGED
@@ -10,4 +10,34 @@ pipeline_tag: text-generation
  A simple instruction fine-tuning model trained for educational purposes
 
  - Pretrained model: skt/kogpt2-base-v2 (https://github.com/SKT-AI/KoGPT2)
- - Training data: kullm-v2 (https://huggingface.co/datasets/nlpai-lab/kullm-v2)
+ - Training data: kullm-v2 (https://huggingface.co/datasets/nlpai-lab/kullm-v2)
+
+ ```python
+ from transformers import AutoModelForCausalLM
+ from transformers import PreTrainedTokenizerFast
+
+ tokenizer = PreTrainedTokenizerFast.from_pretrained("hyunjae/skt-kogpt2-kullm-v2", padding_side="right", model_max_length=512)
+ model = AutoModelForCausalLM.from_pretrained('hyunjae/skt-kogpt2-kullm-v2').to('cuda')
+
+ PROMPT = "### system:사용자의 질문에 맞는 적절한 응답을 생성하세요.\n### 사용자:{instruction}\n### 응답:"
+ text = PROMPT.format_map({'instruction': "안녕? 너가 할 수 있는게 뭐야?"})
+ input_ids = tokenizer.encode(text, return_tensors='pt').to(model.device)
+
+ gen_ids = model.generate(input_ids,
+                          repetition_penalty=2.0,
+                          pad_token_id=tokenizer.pad_token_id,
+                          eos_token_id=tokenizer.eos_token_id,
+                          bos_token_id=tokenizer.bos_token_id,
+                          num_beams=4,
+                          no_repeat_ngram_size=4,
+                          max_new_tokens=128,
+                          do_sample=True,
+                          top_k=50,
+                          early_stopping=True,
+                          use_cache=True)
+
+
+ generated = tokenizer.decode(gen_ids[0])
+ print(generated)
+
+ ```
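
Note: `tokenizer.decode(gen_ids[0])` in the committed example decodes the whole sequence, so the printed text includes the prompt as well as the completion. A minimal, hypothetical follow-up sketch (not part of this commit) that reuses `gen_ids`, `input_ids`, and `tokenizer` from the snippet above to print only the model's response:

```python
# Hypothetical follow-up: keep only the tokens generated after the prompt.
# Assumes gen_ids, input_ids, and tokenizer from the README example are already defined.
response_ids = gen_ids[0][input_ids.shape[-1]:]
response = tokenizer.decode(response_ids, skip_special_tokens=True)
print(response)
```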