Datascience-Lab committed on
Commit
276eb8e
•
1 Parent(s): cf35f18

Update README.md

Files changed (1)
  1. README.md +38 -0
README.md CHANGED
@@ -1,3 +1,41 @@
  ---
  license: apache-2.0
+ tags: [gpt2]
+ language: ko
  ---
+
+ # KoGPT2-small
+
+ | Model | Batch Size | Tokenizer | Vocab Size | Max Length | Parameter Size |
+ |:---: | :------: | :-----: | :------: | :----: | :------: |
+ |GPT2 | 64 | BPE | 30,000 | 1024 | 108M |
+
+
+ # Dataset
+ - AIhub - Web data-based Korean corpus data (웹데이터 기반 한국어 말뭉치 데이터) (4.8M)
+ - KoWiki dump 230701 (1.4M)
+
+
+ # Inference Example
+
+ ```python
+ from transformers import AutoTokenizer, GPT2LMHeadModel
+
+ text = "출근이 힘들면"  # Korean prompt: "If going to work is hard"
+
+ tokenizer = AutoTokenizer.from_pretrained('Datascience-Lab/GPT2-small')
+ model = GPT2LMHeadModel.from_pretrained('Datascience-Lab/GPT2-small')
+
+ inputs = tokenizer.encode_plus(text, return_tensors='pt', add_special_tokens=False)
+
+ outputs = model.generate(inputs['input_ids'], max_length=128,
+                          repetition_penalty=2.0,
+                          pad_token_id=tokenizer.pad_token_id,
+                          eos_token_id=tokenizer.eos_token_id,
+                          bos_token_id=tokenizer.bos_token_id,
+                          use_cache=True,
+                          temperature=0.5)  # only takes effect when sampling (do_sample=True) is enabled
+ outputs = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+ # Output: '출근이 힘들면 출근을 하지 않는 것이 좋다. 하지만 출퇴근 시간을 늦추는 것은 오히려 건강에 좋지 않다.. 특히나 장시간의 업무로 인해 피로가 쌓이고 면역력이 떨어지면, 피로감이 심해져서 잠들기 어려운 경우가 많다. 이런 경우라면 평소보다 더 많은 양으로 과식을 하거나 무리한 다이어트를 할 수 있다. 따라서 식단 조절과 함께 영양 보충에 신경 써야 한다. 또한 과도한 음식이 체중 감량에 도움을 주므로 적절한 운동량을 유지하는 것도 중요하다.'
+ ```
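
The `generate` call above passes `temperature=0.5`. In Transformers, temperature rescales the next-token logits only in sampling modes (`do_sample=True`); with greedy decoding the highest-scoring token is picked either way. The following is a minimal plain-Python sketch of that effect, not code from this repository: the logits are made-up values and `softmax_with_temperature` is a hypothetical helper written for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token logits for a four-token vocabulary (assumed values).
logits = [2.0, 1.0, 0.5, -1.0]

probs_default = softmax_with_temperature(logits, temperature=1.0)
probs_sharp = softmax_with_temperature(logits, temperature=0.5)

# Greedy decoding takes the argmax, which a positive temperature never changes.
greedy_default = max(range(len(logits)), key=lambda i: probs_default[i])
greedy_sharp = max(range(len(logits)), key=lambda i: probs_sharp[i])

print("T=1.0:", [round(p, 3) for p in probs_default])
print("T=0.5:", [round(p, 3) for p in probs_sharp])
print("greedy pick unchanged:", greedy_default == greedy_sharp)
```

A temperature below 1 concentrates probability mass on the top token, so it matters only when tokens are drawn at random. For the temperature to influence the example above, one would also pass `do_sample=True` to `generate`, at the cost of non-deterministic output.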