Getting a runtime error when loading with llama-cpp in an HF Space with Nvidia A10G Large
#20 opened by Isaid-Silver
I don't know if I'm doing something wrong, but I'm trying to deploy a Gradio app using a Mixtral-8x7B GGUF and llama.cpp. My Space already has these environment variables set:
CMAKE_ARGS="-DLLAMA_CUBLAS=on"
FORCE_CMAKE="1"
This is my requirements.txt:
--extra-index-url https://download.pytorch.org/whl/cu113
torch
llama-cpp-python
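One thing I'm not sure about is whether pip actually compiles llama-cpp-python with cuBLAS on the Space or just pulls a prebuilt CPU-only wheel, so a quick check like this sketch could confirm it (it assumes the installed llama-cpp-python is recent enough to expose llama_supports_gpu_offload, which older releases may not have):
import llama_cpp

# Confirm which llama-cpp-python build got installed and whether it can offload to the GPU
print(f"llama-cpp-python version: {llama_cpp.__version__}")
try:
    print(f"GPU offload supported: {llama_cpp.llama_supports_gpu_offload()}")
except AttributeError:
    print("llama_supports_gpu_offload not exposed by this version")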
My app.py goes as follows:
import gradio as gr
from llama_cpp import Llama
from huggingface_hub import hf_hub_download
import os
import torch
print(f"Is CUDA available: {torch.cuda.is_available()}")
# True
print(f"CUDA device: {torch.cuda.get_device_name(torch.cuda.current_device())}")
print(f"CMAKE_ARGS={os.environ['CMAKE_ARGS']}")
print(f"FORCE_CMAKE={os.environ['FORCE_CMAKE']}")
print(f'Llama={Llama.__name__}')
os.makedirs('models/', exist_ok=True)
downloaded_model_path = hf_hub_download(repo_id="TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF",
                                        filename="mixtral-8x7b-instruct-v0.1.Q2_K.gguf", local_dir='models/')
print(f'Downloaded path: {downloaded_model_path}')
print('Initializing model...')
llm = Llama(
    model_path=downloaded_model_path,
    n_ctx=2048,
    n_threads=10,
    n_gpu_layers=25,
    temp=0.1,
    n_batch=512,
    n_predict=-1,
    n_keep=0
)
print('Model loaded.')
def mix_query(query, history):
    output = llm(
        f"[INST] {query} [/INST]",
        max_tokens=512,
        stop=["</s>"],
        echo=False
    )
    print(output['choices'][0]['text'])
    return output['choices'][0]['text']
demo = gr.ChatInterface(fn=mix_query,
                        examples=["Explain the Fermi paradox"], title="TARS",
                        theme="soft")
demo.launch()
As you can see, I added a lot of prints to check where the execution fails, and it fails during the initialization of llm = Llama(...). However, when I run this on my local machine it executes flawlessly. The issue is that I get no logs when it fails; the Space just shows a runtime error.
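One thing I could try to get more information is isolating the model load into a minimal script with llama.cpp's own logging left on, so its load messages show up in the Space logs. A rough sketch (verbose is the Llama constructor's logging switch and defaults to True, but I set it explicitly here):
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF",
    filename="mixtral-8x7b-instruct-v0.1.Q2_K.gguf",
    local_dir="models/",
)

# Bare model load with llama.cpp logging enabled, nothing else
llm = Llama(
    model_path=model_path,
    n_ctx=2048,
    n_gpu_layers=25,
    verbose=True,
)
print("Model loaded.")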
Has anyone run into something like this?
Isaid-Silver changed discussion status to closed
Isaid-Silver changed discussion status to open