vocab exp
Browse files
README.md
CHANGED
@@ -14,8 +14,22 @@ base_model:
|
|
14 |
|
15 |
# Bllossom | [Demo]() | [Homepage](https://www.bllossom.ai/) | [Github](https://github.com/MLP-Lab/Bllossom) | [Colab-tutorial](https://colab.research.google.com/drive/1fBOzUVZ6NRKk_ugeoTbAOokWKqSN47IG?usp=sharing) |
|
16 |
|
17 |
-
|
18 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
19 |
|
20 |
The Bllossom language model is a Korean-English bilingual language model based on the open-source LLama3. It enhances the connection of knowledge between Korean and English. It has the following features:
|
21 |
|
@@ -50,28 +64,13 @@ The Bllossom language model is a Korean-English bilingual language model based o
|
|
50 |
|
51 |
|
52 |
## NEWS
|
53 |
-
* [2024
|
|
|
54 |
* [2023/12] We released Bllossom-Vision v1.0, based on Bllossom
|
55 |
* [2023/08] We released Bllossom v1.0, based on llama-2.
|
56 |
* [2023/07] We released Bllossom v0.7, based on polyglot-ko.
|
57 |
|
58 |
|
59 |
-
```bash
|
60 |
-
์ ํฌ ์์ธ๊ณผ๊ธฐ๋ MLP์ฐ๊ตฌ์ค์์ ํ๊ตญ์ด-์์ด ์ด์ค ์ธ์ด๋ชจ๋ธ์ธ Bllossom์ ๊ณต๊ฐํ์ต๋๋ค!
|
61 |
-
- LLama3-8B ๊ธฐ๋ฐ์ ๊ฒฝ๋ํ๋ ์ฌ์ด์ฆ
|
62 |
-
- ํ๊ตญ์ด-์์ด ์ง์์ฐ๊ฒฐ์ ํตํ ํ๊ตญ์ด ์ง์ ๊ฐํ
|
63 |
-
- ํ๊ตญ์ด ์ดํ์ถ๊ฐ
|
64 |
-
- ํ๊ตญ์ด ๋ฌธํ, ์ธ์ด๋ฅผ ๊ณ ๋ คํ ์์ฒด์ ์ ๋ฐ์ดํฐ ๊ธฐ๋ฐ ๋ฏธ์ธ์กฐ์
|
65 |
-
- ๊ฐํํ์ต (DPO)
|
66 |
-
- ์๊ฐ-์ธ์ด ๋ชจ๋ธํ์ฅ
|
67 |
-
|
68 |
-
1. Bllossom์ ์์ธ๊ณผ๊ธฐ๋, ํ
๋์ธ, ์ฐ์ธ๋ ์ธ์ด์์ ์ฐ๊ตฌ์ค์ ์ธ์ดํ์์ ํ์
ํด ๋ง๋ ์ค์ฉ์ฃผ์๊ธฐ๋ฐ ์ธ์ด๋ชจ๋ธ์
๋๋ค! ์์ผ๋ก ์ง์์ ์ธ ์
๋ฐ์ดํธ๋ฅผ ํตํด ๊ด๋ฆฌํ๊ฒ ์ต๋๋ค ๋ง์ด ํ์ฉํด์ฃผ์ธ์ ๐
|
69 |
-
2. Bllossom70B๋ชจ๋ธ, ์ดํํ์ฅ๋ชจ๋ธ, ์๊ฐ-์ธ์ด๋ชจ๋ธ์ ์ถํ ๊ณต๊ฐํ ์์ ์
๋๋ค. (๊ถ๊ธํ์ ๋ถ์ ๊ฐ๋ณ ์ฐ๋ฝ์ฃผ์ธ์, GPU๋ง ์ง์ํด์ฃผ์๋ฉด ๋ฌด๋ฃ๋ก ๋๋ฆฝ๋๋ค!)
|
70 |
-
3. Bllossom์ NAACL2024, LREC-COLING2024 (๊ตฌ๋) ๋ฐํ๋ก ์ฑํ๋์์ต๋๋ค.
|
71 |
-
4. ์ข์ ์ธ์ด๋ชจ๋ธ ๊ณ์ ์
๋ฐ์ดํธ ํ๊ฒ ์ต๋๋ค!! ํ๊ตญ์ด ๊ฐํ๋ฅผ์ํด ๊ณต๋ ์ฐ๊ตฌํ์ค๋ถ ์ธ์ ๋ ํ์ํฉ๋๋ค!!
|
72 |
-
```
|
73 |
-
|
74 |
-
|
75 |
## Example code
|
76 |
|
77 |
### Colab Tutorial
|
@@ -87,7 +86,7 @@ pip install torch transformers==4.40.0 accelerate
|
|
87 |
import transformers
|
88 |
import torch
|
89 |
|
90 |
-
model_id = "MLP-KTLim/
|
91 |
|
92 |
pipeline = transformers.pipeline(
|
93 |
"text-generation",
|
@@ -140,7 +139,7 @@ import os
|
|
140 |
import torch
|
141 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
142 |
|
143 |
-
model_id = 'MLP-KTLim/
|
144 |
|
145 |
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
146 |
model = AutoModelForCausalLM.from_pretrained(
|
|
|
14 |
|
15 |
# Bllossom | [Demo]() | [Homepage](https://www.bllossom.ai/) | [Github](https://github.com/MLP-Lab/Bllossom) | [Colab-tutorial](https://colab.research.google.com/drive/1fBOzUVZ6NRKk_ugeoTbAOokWKqSN47IG?usp=sharing) |
|
16 |
|
17 |
+
|
18 |
+
```bash
|
19 |
+
์ ํฌ ์์ธ๊ณผ๊ธฐ๋ MLP์ฐ๊ตฌ์ค์์ ํ๊ตญ์ด-์์ด ์ด์ค ์ธ์ด๋ชจ๋ธ์ธ Bllossom์ ๊ณต๊ฐํ์ต๋๋ค! ์์ธ๊ณผ๊ธฐ๋ ์ํผ์ปดํจํ
์ผํฐ์ ์ง์์ผ๋ก 100GB๊ฐ๋๋ ํ๊ตญ์ด ์ถ๊ฐํ์ต์ ์งํํ ํ๊ตญ์ด ๊ฐํ ์ด์ค์ธ์ด ๋ชจ๋ธ์
๋๋ค!
|
20 |
+
ํ๊ตญ์ด ์ํ๋ ๋ชจ๋ธ ์ฐพ๊ณ ์์ง ์์ผ์
จ๋์?
|
21 |
+
- ๋ฌด๋ ค 3๋ง๊ฐ๊ฐ ๋๋ ํ๊ตญ์ด ์ดํํ์ฅ
|
22 |
+
- Llama3๋๋น ๋๋ต 25% ๋ ๊ธด ๊ธธ์ด์ ํ๊ตญ์ด Context ์ฒ๋ฆฌ๊ฐ๋ฅ
|
23 |
+
- ํ๊ตญ์ด-์์ด Pararell Corpus๋ฅผ ํ์ฉํ ํ๊ตญ์ด-์์ด ์ง์์ฐ๊ฒฐ (์ฌ์ ํ์ต)
|
24 |
+
- ํ๊ตญ์ด ๋ฌธํ, ์ธ์ด๋ฅผ ๊ณ ๋ คํด ์ธ์ดํ์๊ฐ ์ ์ํ ๋ฐ์ดํฐ๋ฅผ ํ์ฉํ ๋ฏธ์ธ์กฐ์
|
25 |
+
- ๊ฐํํ์ต
|
26 |
+
์ด ๋ชจ๋ ๊ฒ ํ๊บผ๋ฒ์ ์ ์ฉ๋๊ณ ์์
์ ์ด์ฉ์ด ๊ฐ๋ฅํ Bllossom์ ์ด์ฉํด ์ฌ๋ฌ๋ถ ๋ง์ ๋ชจ๋ธ์ ๋ง๋ค์ด๋ณด์ธ์ฅ! ๋ฌด๋ ค Colab ๋ฌด๋ฃ GPU๋ก ํ์ต์ด ๊ฐ๋ฅํฉ๋๋ค.
|
27 |
+
|
28 |
+
1. Bllossom-8B๋ ์์ธ๊ณผ๊ธฐ๋, ํ
๋์ธ, ์ฐ์ธ๋ ์ธ์ด์์ ์ฐ๊ตฌ์ค์ ์ธ์ดํ์์ ํ์
ํด ๋ง๋ ์ค์ฉ์ฃผ์๊ธฐ๋ฐ ์ธ์ด๋ชจ๋ธ์
๋๋ค! ์์ผ๋ก ์ง์์ ์ธ ์
๋ฐ์ดํธ๋ฅผ ํตํด ๊ด๋ฆฌํ๊ฒ ์ต๋๋ค ๋ง์ด ํ์ฉํด์ฃผ์ธ์ ๐
|
29 |
+
2. ์ด ๊ฐ๋ ฅํ Advanced-Bllossom 8B, 70B๋ชจ๋ธ, ์๊ฐ-์ธ์ด๋ชจ๋ธ์ ๋ณด์ ํ๊ณ ์์ต๋๋ค! (๊ถ๊ธํ์ ๋ถ์ ๊ฐ๋ณ ์ฐ๋ฝ์ฃผ์ธ์!!)
|
30 |
+
3. Bllossom์ NAACL2024, LREC-COLING2024 (๊ตฌ๋) ๋ฐํ๋ก ์ฑํ๋์์ต๋๋ค.
|
31 |
+
4. ์ข์ ์ธ์ด๋ชจ๋ธ ๊ณ์ ์
๋ฐ์ดํธ ํ๊ฒ ์ต๋๋ค!! ํ๊ตญ์ด ๊ฐํ๋ฅผ์ํด ๊ณต๋ ์ฐ๊ตฌํ์ค๋ถ(ํนํ๋
ผ๋ฌธ) ์ธ์ ๋ ํ์ํฉ๋๋ค!! ํนํ ์๋์ GPU๋ผ๋ ๋์ฌ ๊ฐ๋ฅํํ์ ์ธ์ ๋ ์ฐ๋ฝ์ฃผ์ธ์! ๋ง๋ค๊ณ ์ถ์๊ฑฐ ๋์๋๋ ค์.
|
32 |
+
```
|
33 |
|
34 |
The Bllossom language model is a Korean-English bilingual language model based on the open-source LLama3. It enhances the connection of knowledge between Korean and English. It has the following features:
|
35 |
|
|
|
64 |
|
65 |
|
66 |
## NEWS
|
67 |
+
* [2024.05.08] Vocab Expansion Model Update
|
68 |
+
* [2024.04.25] We released Bllossom v2.0, based on llama-3
|
69 |
* [2023/12] We released Bllossom-Vision v1.0, based on Bllossom
|
70 |
* [2023/08] We released Bllossom v1.0, based on llama-2.
|
71 |
* [2023/07] We released Bllossom v0.7, based on polyglot-ko.
|
72 |
|
73 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
74 |
## Example code
|
75 |
|
76 |
### Colab Tutorial
|
|
|
86 |
import transformers
|
87 |
import torch
|
88 |
|
89 |
+
model_id = "MLP-KTLim/llama-3-Korean-Bllossom-8B"
|
90 |
|
91 |
pipeline = transformers.pipeline(
|
92 |
"text-generation",
|
|
|
139 |
import torch
|
140 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
141 |
|
142 |
+
model_id = 'MLP-KTLim/llama-3-Korean-Bllossom-8B'
|
143 |
|
144 |
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
145 |
model = AutoModelForCausalLM.from_pretrained(
|