
This repository provides the aksara_v1 model in GGUF format. The model is quantized to 4-bit precision and can run inference on a GPU or on CPU only.

To run the model using Python:

  1. Install llama-cpp-python with CUDA support (a CPU-only variant is noted below):

```bash
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```
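If you don't need GPU acceleration, a plain `pip install llama-cpp-python` builds the CPU-only backend. Note that recent llama-cpp-python releases renamed the CUDA CMake flag from `LLAMA_CUBLAS` to `GGML_CUDA`, so on newer versions the command above becomes `CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python`.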
  2. Download the model (the repository is gated, so an access token is required; an alternative login flow is shown after the snippet):

```python
from huggingface_hub import hf_hub_download

model_name = "cropinailab/aksara_v1_GGUF"
model_file = "aksara_v1.Q4_K_M.gguf"
model_path = hf_hub_download(model_name,
                             filename=model_file,
                             token='<YOUR_HF_TOKEN>',  # access token for the gated repo
                             local_dir='<PATH_TO_SAVE_MODEL>')
```
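As an alternative to passing `token=` on every call, you can authenticate once per environment; subsequent `huggingface_hub` calls then reuse the cached token. A minimal sketch:

```python
from huggingface_hub import login

# Authenticate once; the token is cached locally and reused by
# later huggingface_hub calls such as hf_hub_download.
login(token="<YOUR_HF_TOKEN>")
```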
  3. Run the model:

```python
from llama_cpp import Llama

llm = Llama(
  model_path=model_path,  # path to the GGUF file
  n_ctx=4096,  # max sequence length to use; longer sequence lengths require much more memory
  n_gpu_layers=-1,  # number of layers to offload to GPU: -1 offloads all layers,
                    # set to 0 if no GPU acceleration is available on your system
)

prompt = "What is the recommended NPK dosage for maize varieties?"

# Simple inference example
output = llm(
  f"<|user|>\n{prompt}<|end|>\n<|assistant|>",
  max_tokens=512,  # generate up to 512 tokens
  stop=["<|end|>"],  # stop generating at the end-of-turn token
  echo=True,  # whether to echo the prompt in the output
)
print(output['choices'][0]['text'])
```
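llama-cpp-python also exposes an OpenAI-style chat API which, in recent versions, applies the chat template stored in the GGUF metadata, avoiding hand-written `<|user|>`/`<|assistant|>` tags. A minimal sketch reusing the `llm` instance from above:

```python
# Chat-style inference; create_chat_completion formats the messages
# with the model's chat template before generation.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])
```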

To use the model in a more detailed pipeline, refer to the following notebook.

Model details: 7.24B params, llama architecture, 4-bit GGUF quantization.