---
license: apache-2.0
tags: [gpt2]
language: ko
---

# KoGPT2-small

| Model | Batch Size | Tokenizer | Vocab Size | Max Length | Parameter Size |
|:---:  | :------:   |  :-----:    |    :------:  |    :----:    |     :------:    |
|GPT2  |   64        | BPE       |    30,000  | 1024      |     108M       |
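
As a rough sanity check on the table, the parameter count can be recomputed from the vocabulary size and maximum length. The sketch below assumes the standard GPT-2 small layout (hidden size 768, 12 layers, 4x MLP expansion, tied input/output embeddings); the card does not state these dimensions, and `gpt2_param_count` is a hypothetical helper written only for this illustration.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd=768, n_layer=12):
    """Count parameters of a GPT-2 style decoder with tied embeddings.

    n_embd and n_layer default to the usual GPT-2 small values, which are
    an assumption here; the model card does not list them.
    """
    token_emb = vocab_size * n_embd            # token embedding matrix
    pos_emb = n_positions * n_embd             # learned position embeddings
    layer_norm = 2 * n_embd                    # weight + bias
    # Attention: fused QKV projection plus output projection (with biases).
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4 * n_embd, then project back (with biases).
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    block = 2 * layer_norm + attn + mlp        # one transformer block
    # The final layer norm is counted once; the LM head shares token_emb.
    return token_emb + pos_emb + n_layer * block + layer_norm


total = gpt2_param_count(vocab_size=30_000, n_positions=1024)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# 108,882,432 parameters (~108.9M)
```

Under these assumptions the count lands close to the 108M listed above; most of the difference from the original GPT-2 small comes from the smaller 30,000-token vocabulary.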


# DataSet
 - AIhub - Web Data-Based Korean Corpus (웹데이터 기반 한국어 말뭉치 데이터) (4.8M)
 - KoWiki dump 230701 (1.4M)


# Inference Example

```python
from transformers import AutoTokenizer, GPT2LMHeadModel

text = "출근이 힘들면"  # "If going to work is hard"

tokenizer = AutoTokenizer.from_pretrained('Datascience-Lab/GPT2-small')
model = GPT2LMHeadModel.from_pretrained('Datascience-Lab/GPT2-small')

inputs = tokenizer.encode_plus(text, return_tensors='pt', add_special_tokens=False)

# Note: `temperature` only takes effect when sampling is enabled (do_sample=True).
outputs = model.generate(inputs['input_ids'], max_length=128,
                         repetition_penalty=2.0,
                         pad_token_id=tokenizer.pad_token_id,
                         eos_token_id=tokenizer.eos_token_id,
                         bos_token_id=tokenizer.bos_token_id,
                         use_cache=True,
                         temperature=0.5)
outputs = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Output:
# '출근이 힘들면 출근을 하지 않는 것이 좋다. 하지만 출퇴근 시간을 늦추는 것은 오히려 건강에 좋지 않다..
# 특히나 장시간의 업무로 인해 피로가 쌓이고 면역력이 떨어지면, 피로감이 심해져서 잠들기 어려운 경우가 많다.
# 이런 경우라면 평소보다 더 많은 양으로 과식을 하거나 무리한 다이어트를 할 수 있다.
# 따라서 식단 조절과 함께 영양 보충에 신경 써야 한다.
# 또한 과도한 음식이 체중 감량에 도움을 주므로 적절한 운동량을 유지하는 것도 중요하다.'
#
# English translation of the output:
# 'If going to work is hard, it is better not to go to work. However, delaying your commute is actually bad for your health..
# In particular, when fatigue builds up from long working hours and immunity drops, the fatigue often worsens and makes it hard to fall asleep.
# In such cases, you may overeat more than usual or go on an excessive diet.
# Therefore, you should pay attention to nutritional supplementation along with diet control.
# Also, since excessive food helps with weight loss, it is important to maintain an appropriate amount of exercise.'
```