🦚Merak-7B-v3-Mini-Orca GPTQ🐳
These files are GPTQ model files for Merak-7B-v3-Mini-Orca.
Merak-7B-v3-Mini-Orca is Ichsan2895's Merak-7B-v3, fine-tuned on a Bahasa Indonesia translation of psmathur's orca_mini_v1 dataset.
Prompt format
You can use the Vicuna 1.1 format with Oobabooga's text-generation-webui.
SYSTEM: Anda adalah asisten AI. Anda akan diberi tugas. Anda harus menghasilkan jawaban yang rinci dan panjang.
USER: <prompt> (without the <>)
ASSISTANT:
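For reference, the system message translates to: "You are an AI assistant. You will be given a task. You must produce a detailed and long answer." A minimal sketch of assembling this template in Python (the same construction appears in the full example further below; the user prompt here is only an illustration):

system_message = (
    "Anda adalah asisten AI. Anda akan diberi tugas. "
    "Anda harus menghasilkan jawaban yang rinci dan panjang."
)
prompt = "Buat rencana untuk menghemat listrik di rumah"  # example user prompt
prompt_template = f"SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT: "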
How to easily download and use this model in text-generation-webui.
Please make sure you're using the latest version of text-generation-webui.
It is strongly recommended to use the text-generation-webui one-click installers unless you know how to do a manual install.
- Click the Model tab.
- Under Download custom model or LoRA, enter asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ.
- To download from a specific branch, append : followed by the branch name to the model name.
- Click Download.
- The model will start downloading. Once it's finished it will say "Done".
- In the top left, click the refresh icon next to Model.
- In the Model dropdown, choose the model you just downloaded: Merak-7B-v3-Mini-Orca-Indo-GPTQ.
- In the Model Loader dropdown, choose ExLlamav2_HF as the model loader.
- Click Load.
- Click the Default tab.
- Copy the prompt format mentioned above into the input box.
- Enter a prompt and click Generate! Click Continue to get a longer response.
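If you prefer to fetch the files outside the webui, huggingface_hub's snapshot_download can pull the repository (or a specific branch via revision) into a local folder. A minimal sketch, assuming huggingface_hub is installed:

from huggingface_hub import snapshot_download

# Downloads the whole repo; pass revision="<branch>" to pull a specific branch.
local_dir = snapshot_download(
    repo_id="asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ",
    local_dir="Merak-7B-v3-Mini-Orca-Indo-GPTQ",
)
print(local_dir)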
How to use this GPTQ model from Python code
First make sure you have AutoGPTQ and SentencePiece installed:
GITHUB_ACTIONS=true pip install auto-gptq
pip install sentencepiece
Then try the following example code:
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
model_name_or_path = "asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ"
model_basename = "Merak-7B-v3-Mini-Orca-Indo-GPTQ"
use_triton = False
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,
    device="cuda:0",
    use_triton=use_triton,
    quantize_config=None,
)
prompt = "Buat rencana untuk menghemat listrik di rumah"
system_message = "Anda adalah asisten AI. Anda akan diberi tugas. Anda harus menghasilkan jawaban yang rinci dan panjang.\n"
prompt_template=f'''SYSTEM: {system_message}
USER: {prompt}
ASSISTANT: '''
print("\n\n*** Generate:")
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
# Inference can also be done using transformers' pipeline
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)
print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15,
)
print(pipe(prompt_template)[0]['generated_text'])
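Both calls above echo the prompt in their output. One way to print only the model's reply from the generate() call is to decode just the newly generated tokens; a small sketch:

# Decode only the tokens produced after the prompt (skips the echoed template).
reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(reply)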
Compatibility
The files provided will work with AutoGPTQ (CUDA and Triton modes), GPTQ-for-LLaMa (only CUDA has been tested), and Occ4m's GPTQ-for-LLaMa fork.
ExLlama works with Llama models in 4-bit. Please see the Provided Files table above for per-file compatibility.
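Recent versions of transformers (4.32 and later) can also load GPTQ checkpoints directly through AutoModelForCausalLM when the optimum and auto-gptq packages are installed. A minimal sketch, assuming that setup rather than any repo-specific loader:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Requires transformers >= 4.32 plus optimum, auto-gptq, and accelerate.
model_id = "asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")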
Credits
TheBloke for the README template.