---
language:
- ko
- en
pipeline_tag: text-generation
inference: false
tags:
- solar
- mistral
- pytorch
- solar-ko
library_name: transformers
license: apache-2.0
---

**Update Log**

- 2024.07.01: Released Solar-Ko-Recovery and uploaded benchmark scores
- 2024.05.16: Preview release of Solar-Ko-Recovery

# **Solar-Ko-Recovery-11B** 🌟❤️‍🩹

Solar-Ko-Recovery-11B aims to recover Solar's Korean capability by rearranging the embeddings and LM head. It features an expanded vocabulary and is trained on a Korean+English corpus for enhanced representation.

## Model Details

**Model Developers:** Junbum Lee (Beomi)

**Variations:** Solar-Ko-Recovery is available in one parameter size: 11B (10.99B🤣).

**Input:** The model accepts only text input.

**Output:** The model produces text output exclusively.

**Model Architecture:**

Solar-Ko-Recovery is an auto-regressive language model that leverages an optimized transformer architecture derived from Llama-2.

| |Training Data|Parameters|Context Length|GQA|Tokens|Learning Rate|
|---|---|---|---|---|---|---|
|Solar-Ko-Recovery|*A curated mix of Korean+English Corpora*|11B(10.99B)|4k|O|>100B*|5e-5|

> NOTE: Training was done in 2 steps:
>
> 1) Only the embedding layer and the LM head layer are trained
> 2) All parameters are trained

**Vocab Expansion**

Vocab expansion was conducted on an edited version of [upstage/solar-1-mini-tokenizer](https://huggingface.co/upstage/solar-1-mini-tokenizer), which is a superset of the Solar tokenizer.

| Model Name | Vocabulary Size | Description |
| --- | --- | --- |
| Original Solar | 32000 | Sentencepiece BPE |
| **solar-1-mini-tokenizer** | 64000 | Sentencepiece BPE. Added Ko/JP vocabs |

**Tokenizing "안녕하세요, 오늘은 날씨가 좋네요."**

- SOLAR-10.7B: 26 tokens
- Solar-Ko-Recovery: 7 tokens

| Model | Tokens |
| --- | --- |
| SOLAR-10.7B | `['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',', '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '날', '<0xEC>', '<0x94>', '<0xA8>', '가', '▁', '좋', '네', '요', '.']` |
| Solar-Ko-Recovery | `['▁안녕하세요', ',', '▁오늘은', '▁날씨가', '▁좋', '네요', '.']` |

**Tokenizing "Meet 10.7B Solar: Elevating Performance with Upstage Depth UP Scaling!"**

- SOLAR-10.7B: 22 tokens
- Solar-Ko-Recovery: 22 tokens

| Model | Tokens |
| --- | --- |
| SOLAR-10.7B | `['▁Meet', '▁', '1', '0', '.', '7', 'B', '▁Solar', ':', '▁E', 'lev', 'ating', '▁Performance', '▁with', '▁Up', 'stage', '▁Dep', 'th', '▁UP', '▁Scal', 'ing', '!']` |
| Solar-Ko-Recovery | `['▁Meet', '▁', '1', '0', '.', '7', 'B', '▁Solar', ':', '▁E', 'lev', 'ating', '▁Performance', '▁with', '▁Up', 'stage', '▁Dep', 'th', '▁UP', '▁Scal', 'ing', '!']` |

# LICENSE

Apache 2.0

# **Model Benchmark**

## LM Eval Harness - Korean

- Used EleutherAI's [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
- 5-shot scores

| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|---|---|---|---:|---|---:|---|---|
|haerae |N/A |none | 5|acc_norm |0.7874|± |0.0118|
| | |none | 5|acc |0.7874|± |0.0118|
| - haerae_general_knowledge | 1|none | 5|acc |0.5000|± |0.0378|
| | |none | 5|acc_norm |0.5000|± |0.0378|
| - haerae_history | 1|none | 5|acc |0.8723|± |0.0244|
| | |none | 5|acc_norm |0.8723|± |0.0244|
| - haerae_loan_word | 1|none | 5|acc |0.8402|± |0.0283|
| | |none | 5|acc_norm |0.8402|± |0.0283|
| - haerae_rare_word | 1|none | 5|acc |0.8346|± |0.0185|
| | |none | 5|acc_norm |0.8346|± |0.0185|
| - haerae_standard_nomenclature | 1|none | 5|acc |0.8301|± |0.0305|
| | |none | 5|acc_norm |0.8301|± |0.0305|
|kmmlu_direct |N/A |none | 5|exact_match|0.4205|± |0.0026|
| - kmmlu_direct_accounting | 2|none | 5|exact_match|0.3700|± |0.0485|
| - kmmlu_direct_agricultural_sciences | 2|none | 5|exact_match|0.3140|± |0.0147|
| - kmmlu_direct_aviation_engineering_and_maintenance | 2|none | 5|exact_match|0.3870|± |0.0154|
| - kmmlu_direct_biology | 2|none | 5|exact_match|0.3510|± |0.0151|
| - kmmlu_direct_chemical_engineering | 2|none | 5|exact_match|0.3910|± |0.0154|
| - kmmlu_direct_chemistry | 2|none | 5|exact_match|0.4000|± |0.0200|
| - kmmlu_direct_civil_engineering | 2|none | 5|exact_match|0.4010|± |0.0155|
| - kmmlu_direct_computer_science | 2|none | 5|exact_match|0.6520|± |0.0151|
| - kmmlu_direct_construction | 2|none | 5|exact_match|0.3080|± |0.0146|
| - kmmlu_direct_criminal_law | 2|none | 5|exact_match|0.3100|± |0.0328|
| - kmmlu_direct_ecology | 2|none | 5|exact_match|0.4660|± |0.0158|
| - kmmlu_direct_economics | 2|none | 5|exact_match|0.5385|± |0.0439|
| - kmmlu_direct_education | 2|none | 5|exact_match|0.6200|± |0.0488|
| - kmmlu_direct_electrical_engineering | 2|none | 5|exact_match|0.3000|± |0.0145|
| - kmmlu_direct_electronics_engineering | 2|none | 5|exact_match|0.4740|± |0.0158|
| - kmmlu_direct_energy_management | 2|none | 5|exact_match|0.3560|± |0.0151|
| - kmmlu_direct_environmental_science | 2|none | 5|exact_match|0.2980|± |0.0145|
| - kmmlu_direct_fashion | 2|none | 5|exact_match|0.4470|± |0.0157|
| - kmmlu_direct_food_processing | 2|none | 5|exact_match|0.3690|± |0.0153|
| - kmmlu_direct_gas_technology_and_engineering | 2|none | 5|exact_match|0.3000|± |0.0145|
| - kmmlu_direct_geomatics | 2|none | 5|exact_match|0.3820|± |0.0154|
| - kmmlu_direct_health | 2|none | 5|exact_match|0.5700|± |0.0498|
| - kmmlu_direct_industrial_engineer | 2|none | 5|exact_match|0.3830|± |0.0154|
| - kmmlu_direct_information_technology | 2|none | 5|exact_match|0.6090|± |0.0154|
| - kmmlu_direct_interior_architecture_and_design | 2|none | 5|exact_match|0.5440|± |0.0158|
| - kmmlu_direct_korean_history | 2|none | 5|exact_match|0.3800|± |0.0488|
| - kmmlu_direct_law | 2|none | 5|exact_match|0.4670|± |0.0158|
| - kmmlu_direct_machine_design_and_manufacturing | 2|none | 5|exact_match|0.3960|± |0.0155|
| - kmmlu_direct_management | 2|none | 5|exact_match|0.5030|± |0.0158|
| - kmmlu_direct_maritime_engineering | 2|none | 5|exact_match|0.4283|± |0.0202|
| - kmmlu_direct_marketing | 2|none | 5|exact_match|0.7460|± |0.0138|
| - kmmlu_direct_materials_engineering | 2|none | 5|exact_match|0.4020|± |0.0155|
| - kmmlu_direct_math | 2|none | 5|exact_match|0.2867|± |0.0262|
| - kmmlu_direct_mechanical_engineering | 2|none | 5|exact_match|0.3490|± |0.0151|
| - kmmlu_direct_nondestructive_testing | 2|none | 5|exact_match|0.3760|± |0.0153|
| - kmmlu_direct_patent | 2|none | 5|exact_match|0.3700|± |0.0485|
| - kmmlu_direct_political_science_and_sociology | 2|none | 5|exact_match|0.5300|± |0.0289|
| - kmmlu_direct_psychology | 2|none | 5|exact_match|0.4470|± |0.0157|
| - kmmlu_direct_public_safety | 2|none | 5|exact_match|0.3520|± |0.0151|
| - kmmlu_direct_railway_and_automotive_engineering | 2|none | 5|exact_match|0.3220|± |0.0148|
| - kmmlu_direct_real_estate | 2|none | 5|exact_match|0.4350|± |0.0351|
| - kmmlu_direct_refrigerating_machinery | 2|none | 5|exact_match|0.3240|± |0.0148|
| - kmmlu_direct_social_welfare | 2|none | 5|exact_match|0.4970|± |0.0158|
| - kmmlu_direct_taxation | 2|none | 5|exact_match|0.3800|± |0.0344|
| - kmmlu_direct_telecommunications_and_wireless_technology| 2|none | 5|exact_match|0.5480|± |0.0157|
|kobest_boolq | 1|none | 5|acc |0.9202|± |0.0072|
| | |none | 5|f1 |0.9202|± |N/A |
|kobest_copa | 1|none | 5|acc |0.8680|± |0.0107|
| | |none | 5|f1 |0.8678|± |N/A |
|kobest_hellaswag | 1|none | 5|acc |0.5560|± |0.0222|
| | |none | 5|f1 |0.5520|± |N/A |
| | |none | 5|acc_norm |0.6540|± |0.0213|
|kobest_sentineg | 1|none | 5|acc |0.9824|± |0.0066|
| | |none | 5|f1 |0.9824|± |N/A |

## Citation

TBD

## Acknowledgements

- Training support was provided by the [TPU Research Cloud](https://sites.research.google/trc/) program.
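
## Checking the token counts

The Korean tokenization comparison in the Vocab Expansion section can be checked by hand. The sketch below is a minimal, standard-library-only illustration, not the tokenizer's actual implementation. It copies the two token lists from the table above, decodes SentencePiece-style byte-fallback pieces such as `<0xEB>` back into UTF-8, and confirms that both lists spell out the same sentence. The names `detokenize`, `solar_tokens`, and `recovery_tokens` are hypothetical and exist only for this example.

```python
import re

# Matches byte-fallback pieces such as "<0xEB>" (one raw UTF-8 byte each).
BYTE_TOKEN = re.compile(r"<0x([0-9A-Fa-f]{2})>")


def detokenize(tokens):
    """Rebuild text from SentencePiece-style pieces (illustrative helper)."""
    buf = bytearray()
    for tok in tokens:
        match = BYTE_TOKEN.fullmatch(tok)
        if match:
            # Byte-fallback piece: append the raw byte it stands for.
            buf.append(int(match.group(1), 16))
        else:
            buf.extend(tok.encode("utf-8"))
    text = buf.decode("utf-8")
    # U+2581 marks a word boundary; drop the leading one added as a prefix.
    return text.replace("\u2581", " ").lstrip(" ")


# Token lists copied from the table in this card.
solar_tokens = ['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',',
                '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은',
                '▁', '날', '<0xEC>', '<0x94>', '<0xA8>', '가',
                '▁', '좋', '네', '요', '.']
recovery_tokens = ['▁안녕하세요', ',', '▁오늘은', '▁날씨가', '▁좋', '네요', '.']

sentence = "안녕하세요, 오늘은 날씨가 좋네요."
same_text = detokenize(solar_tokens) == detokenize(recovery_tokens) == sentence

print(f"same text: {same_text}")
print(f"SOLAR-10.7B: {len(solar_tokens)} tokens")
print(f"Solar-Ko-Recovery: {len(recovery_tokens)} tokens")
print(f"ratio: {len(solar_tokens) / len(recovery_tokens):.2f}x")
```

Three of the Korean syllables in the SOLAR-10.7B list are missing from the original 32000-entry vocabulary, so each one falls back to three byte tokens. That fallback, together with the character-by-character splitting of the remaining syllables, accounts for the gap between 26 and 7 tokens.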