---
datasets:
- infCapital/vnnews_corpus_100K
language:
- vi
---

## Base Model: Llama 2 7B Chat HF
+ Vocabulary extended to 44,800 tokens for better Vietnamese understanding
+ Continual pre-training on more than 2B Vietnamese tokens
+ Training profile: LoRA (rank=32, alpha=128, fp16), 1 epoch, block size = 512. Took about 300 GPU hours on RTX 4090 24GB
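
For a rough sense of scale, the figures in the training profile can be turned into a few derived numbers. This is a minimal sketch, not the training script: it assumes the standard LoRA scaling factor of `alpha / rank`, treats ">2B tokens" as exactly 2 billion (a lower bound), and assumes the 300 GPU hours cover the whole run. The function names are illustrative only.

```python
# Back-of-the-envelope numbers derived from the training profile above.
# Assumptions: ">2B tokens" is taken as exactly 2e9 (a lower bound), and
# the 300 GPU hours are assumed to cover the full continual pre-training run.

TOKENS = 2_000_000_000   # lower bound on Vietnamese pre-training tokens
BLOCK_SIZE = 512         # tokens per training sequence
GPU_HOURS = 300          # total GPU hours reported
LORA_RANK = 32
LORA_ALPHA = 128


def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA scales the low-rank update by alpha / rank."""
    return alpha / rank


def num_blocks(tokens: int, block_size: int) -> int:
    """Number of full fixed-length sequences obtained from a token stream."""
    return tokens // block_size


def tokens_per_gpu_second(tokens: int, gpu_hours: float) -> float:
    """Average throughput per GPU, in tokens per second."""
    return tokens / (gpu_hours * 3600)


if __name__ == "__main__":
    print("LoRA scaling factor:", lora_scaling(LORA_ALPHA, LORA_RANK))
    print("Training blocks (at least):", num_blocks(TOKENS, BLOCK_SIZE))
    print("Tokens per GPU-second (approx.):",
          round(tokens_per_gpu_second(TOKENS, GPU_HOURS)))
```

Under these assumptions, one epoch corresponds to roughly 3.9 million 512-token blocks, and the LoRA update is scaled by a factor of 4.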

## Best used for
+ Further training / fine-tuning for Vietnamese tasks