---
library_name: transformers
tags:
- unsloth
- llama3
- indonesia
license: llama3
datasets:
- catinthebag/Tumpeng-1-Indonesian
language:
- id
inference: false
---
**ExLlamaV2** quant (**exl2** / **4.0 bpw**) made with ExLlamaV2 v0.1.3
Other EXL2 quants:
| **Quant (bpw)** | **Model Size** | **lm_head (bits)** |
| ----- | ---------- | ------- |
|**[2.2](https://huggingface.co/Zoyd/afrizalha_Kancil-V1-llama3-fp16-2_2bpw_exl2)** | 3250 MB | 6 |
|**[2.5](https://huggingface.co/Zoyd/afrizalha_Kancil-V1-llama3-fp16-2_5bpw_exl2)** | 3478 MB | 6 |
|**[3.0](https://huggingface.co/Zoyd/afrizalha_Kancil-V1-llama3-fp16-3_0bpw_exl2)** | 3895 MB | 6 |
|**[3.5](https://huggingface.co/Zoyd/afrizalha_Kancil-V1-llama3-fp16-3_5bpw_exl2)** | 4311 MB | 6 |
|**[3.75](https://huggingface.co/Zoyd/afrizalha_Kancil-V1-llama3-fp16-3_75bpw_exl2)** | 4518 MB | 6 |
|**[4.0](https://huggingface.co/Zoyd/afrizalha_Kancil-V1-llama3-fp16-4_0bpw_exl2)** | 4727 MB | 6 |
|**[4.25](https://huggingface.co/Zoyd/afrizalha_Kancil-V1-llama3-fp16-4_25bpw_exl2)** | 4935 MB | 6 |
|**[5.0](https://huggingface.co/Zoyd/afrizalha_Kancil-V1-llama3-fp16-5_0bpw_exl2)** | 5559 MB | 6 |
|**[6.0](https://huggingface.co/Zoyd/afrizalha_Kancil-V1-llama3-fp16-6_0bpw_exl2)** | 6493 MB | 8 |
|**[6.5](https://huggingface.co/Zoyd/afrizalha_Kancil-V1-llama3-fp16-6_5bpw_exl2)** | 6912 MB | 8 |
|**[8.0](https://huggingface.co/Zoyd/afrizalha_Kancil-V1-llama3-fp16-8_0bpw_exl2)** | 8116 MB | 8 |
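These quants are meant to be loaded with an EXL2-capable backend such as ExLlamaV2 itself (or frontends built on it). Below is a minimal sketch using the ExLlamaV2 Python API; the `model_dir` path is a placeholder for wherever you downloaded a quant, and the sampling settings are illustrative:

```
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point the config at a local download of one of the quants above
config = ExLlamaV2Config()
config.model_dir = "/path/to/Kancil-V1-llama3-fp16-4_0bpw_exl2"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # load weights, splitting layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.3  # matches the sampling used in the Transformers demo below

print(generator.generate_simple("<|user|>\nHalo!\n<|assistant|>\n", settings, 200))
```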
# Introducing the Kancil family of open models
Kancil is a fine-tuned version of Llama 3 8B, trained on a synthetic QA dataset generated with Llama 3 70B. Version zero of Kancil was the first generative Indonesian LLM to gain functional instruction-following performance using solely synthetic data.
Go straight to the Colab demo
Beta preview
Selamat datang! (Welcome!)
I am ultra-overjoyed to introduce you to... the Kancil! It's a fine-tuned version of Llama 3 8B trained on Tumpeng, an instruction dataset of 14.8 million words. Both the model and the dataset are openly available on Hugging Face.
The dataset was synthetically generated with Llama 3 70B. A big problem with existing Indonesian instruction datasets is that they are, in reality, not-very-good translations of English datasets. Llama 3 70B, by contrast, can generate fluent Indonesian! (with minor caveats)
This follows previous efforts to collect open, fine-tuned Indonesian models, like Merak and Cendol. However, Kancil leverages solely synthetic data in a very creative way, which makes it a unique contribution!
### Version 1.0
This is the second working prototype, Kancil V1.
✨ Training
- 2.2x dataset word count
- 2x LoRA parameters
- Rank-stabilized LoRA
- 2x fun
✨ New features
- Multi-turn conversation (beta; optimized for curhat, i.e. heart-to-heart personal advice)
- Better text generation (full or outline writing; optimized for essays)
- QA from text (paste a passage into the prompt and ask questions about it)
- Making slogans
This model was fine-tuned with QLoRA using the amazing Unsloth framework! It was built on top of [unsloth/llama-3-8b-bnb-4bit](https://huggingface.co/unsloth/llama-3-8b-bnb-4bit), and the adapter was subsequently merged back into the base model.
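For reference, here is a minimal sketch of what a QLoRA setup with rank-stabilized LoRA looks like in Unsloth. The rank, alpha, sequence length, and target modules below are illustrative guesses, not the actual training configuration:

```
from unsloth import FastLanguageModel

# Load the 4-bit base model that Kancil was built on
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,  # illustrative; the actual training length is not documented
    load_in_4bit=True,
)

# Attach LoRA adapters; use_rslora enables rank-stabilized LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r=16,             # illustrative rank, not the real config
    lora_alpha=16,    # illustrative alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_rslora=True,  # the card mentions rank-stabilized LoRA
)
```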
### Uses
This model is developed for research purposes, aimed at researchers and general AI hobbyists. However, it has one big application: you can have lots of fun with it!
### Out-of-Scope Use
This is a research preview model with minimal safety curation. Do not use this model for commercial or practical applications.
You are also not allowed to use this model without having fun.
### Getting started
As mentioned, this model was trained with Unsloth, so loading it through Unsloth is recommended for a better experience.
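A minimal sketch of loading through Unsloth might look like this; the `max_seq_length` and the 4-bit loading choice are assumptions, not documented settings:

```
from unsloth import FastLanguageModel

# Minimal sketch: load Kancil through Unsloth (settings below are assumptions)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="catinthebag/Kancil-V1-llama3-fp16",
    max_seq_length=2048,  # assumed context length for inference
    load_in_4bit=True,    # optional; reduces VRAM at some quality cost
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path
```

Alternatively, you can load it with plain Transformers: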
```
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Available versions
KancilV1 = "catinthebag/Kancil-V1-llama3-fp16"

# Load the tokenizer and the fp16 model
tokenizer = AutoTokenizer.from_pretrained(KancilV1)
model = AutoModelForCausalLM.from_pretrained(KancilV1, torch_dtype=torch.float16)

# Move the model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```
```
# This model was trained on this specific prompt template. Changing it may degrade performance.
prompt_template = """<|user|>
{prompt}
<|assistant|>
{response}"""

# Start generating!
inputs = tokenizer(
    [
        prompt_template.format(
            # English: "How do I tell my parents that I was rejected by my favorite university?"
            prompt="""Bagaimana cara memberi tahu orang tua kalau saya ditolak universitas favorit saya?""",
            response="",
        )
    ], return_tensors="pt").to(device)

# do_sample=True lets the temperature setting take effect
outputs = model.generate(**inputs, max_new_tokens=600, do_sample=True, temperature=0.3, use_cache=True)
print(tokenizer.batch_decode(outputs)[0].replace('\\n', '\n'))  # restore literal "\n" as real newlines
```
**Note:** There is a known issue with the dataset where newline characters were stored as literal `\n` strings. Very sorry about this! Please keep the `.replace()` call above to restore the newlines.
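For the multi-turn conversation feature (beta), the card does not document the chat format. A plausible guess, extrapolating from the single-turn template above, is to simply repeat the user/assistant blocks; treat this as an unverified sketch:

```
# Hypothetical multi-turn prompt: assumes the single-turn template repeats per turn.
# This format is an extrapolation, not documented by the model card.
chat_template = """<|user|>
{first_prompt}
<|assistant|>
{first_response}
<|user|>
{second_prompt}
<|assistant|>
"""
```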
### Acknowledgments
- **Developed by:** Afrizal Hasbi Azizy
- **License:** Llama 3 Community License Agreement