---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- granite-3.0
- llama-cpp
- gguf-my-repo
base_model: ibm-granite/granite-3.0-8b-instruct
---

# eformat/granite-3.0-8b-instruct-Q4_K_M-GGUF

Not all tools (vllm, llama.cpp) support the new model config params yet (as of 25/10/2024):

```json
# config.json
"model_type": "granite",
"architectures": [
  "GraniteForCausalLM"
]
```

This GGUF conversion was done using the old values:

```json
# config.json
"model_type": "llama",
"architectures": [
  "LlamaForCausalLM"
]
```

This GGUF loads OK - tested using:

```bash
# llama.cpp
./llama-server --verbose --gpu-layers 99999 --parallel 2 --ctx-size 4096 -m ~/instructlab/models/granite-3.0-8b-instruct-Q4_K_M.gguf
```

```bash
# vllm
vllm serve ~/instructlab/models/granite-3.0-8b-instruct-Q4_K_M.gguf
```
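The config swap described above can be sketched as a small script that patches `config.json` in a local copy of the model before conversion. This is a minimal sketch, not the exact process used for this repo; the `patch_config` helper and the model directory path are assumptions for illustration.

```python
import json
from pathlib import Path


def patch_config(model_dir: str) -> None:
    """Rewrite config.json so that older converters see a plain llama model.

    Hypothetical helper illustrating the swap described above:
    "granite"/"GraniteForCausalLM" -> "llama"/"LlamaForCausalLM".
    """
    cfg_path = Path(model_dir) / "config.json"
    cfg = json.loads(cfg_path.read_text())
    # Replace the new Granite identifiers with the older llama ones
    cfg["model_type"] = "llama"
    cfg["architectures"] = ["LlamaForCausalLM"]
    cfg_path.write_text(json.dumps(cfg, indent=2))
```

After patching, the usual GGUF conversion (e.g. llama.cpp's `convert_hf_to_gguf.py`) should accept the model directory as a llama-architecture model.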