Does it work with Open Interpreter locally, and how much RAM is required?
#24 opened by aiworld44
I tried running interpreter --local --model mistralai/Mistral-7B-Instruct-v0.1, but it didn't work.
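For a rough answer to the RAM question (back-of-the-envelope, not from official docs): Mistral-7B has about 7.2B parameters, so the fp16 weights alone take roughly 14-15 GB; budget around 16 GB of free RAM or VRAM for an unquantized run. With 4-bit quantization that drops to roughly 4-6 GB. Here is a minimal sketch of 4-bit loading with the transformers/bitsandbytes quantization API (this path needs a CUDA GPU; for pure CPU, a llama.cpp/GGUF build is the usual route instead):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization shrinks the ~14 GB fp16 weights to roughly 4 GB.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place layers on GPU/CPU
)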
Here is my code. You need to save the model locally in a subfolder (./Mistral/, relative to your .py file).
It does work for one to three queries before it breaks down (see the sketch after the code below for one likely fix). Since there is essentially no documentation on how to wire the inference workflow into a local pipeline, this is the best I've got. If people are interested in reverse engineering it, shoot me a message. As it stands, this looks like an ad front to promote paid services; let's change that.
import gradio as gr
from transformers import pipeline, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./Mistral/")
pipe = pipeline("text-generation", model="./Mistral/", max_new_tokens=512)

chat_history_tokens = []

def generate(message, history):
    global chat_history_tokens

    # gr.ChatInterface passes the new user message as a string;
    # fall back to the last entry if a message list arrives instead.
    new_message = message[-1]["content"] if isinstance(message, list) else message

    # Tokenize the new message; only the last message is kept as context.
    new_message_tokens = tokenizer.encode(new_message, add_special_tokens=False)
    chat_history_tokens = new_message_tokens

    # Decode the tokens back to a string to use as the prompt.
    prompt = tokenizer.decode(chat_history_tokens)

    try:
        print("Debug: Sending this prompt to the model:", prompt)
        outputs = pipe(prompt, pad_token_id=tokenizer.eos_token_id)
        print("Debug: Model's raw output:", outputs)

        # Strip the echoed prompt and common answer prefixes from the output.
        generated_text = outputs[0]["generated_text"].replace(prompt, "").strip()
        generated_text = generated_text.replace("Answer:", "").replace("A:", "").strip()
        print("Debug: Generated text after cleanup:", generated_text)

        # Append the model's reply tokens to the history.
        bot_reply_tokens = tokenizer.encode(generated_text, add_special_tokens=False)
        chat_history_tokens.extend(bot_reply_tokens)
    except Exception as e:
        print("Debug: Caught an exception:", str(e))
        return str(e)

    return generated_text

iface = gr.ChatInterface(fn=generate)
iface.launch()
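On the "works for one to three queries, then breaks down" point: Mistral-7B-Instruct is trained on the [INST] ... [/INST] chat format, and feeding it raw unformatted text while re-appending reply tokens tends to degenerate after a few turns. Here is a minimal sketch of a history-aware variant, assuming the tokenizer's built-in chat template (present for this model in recent transformers releases) and reusing the pipe and tokenizer defined above; the 6000-token threshold is an arbitrary safety margin, not a tuned value:

history_messages = []  # running list of {"role": ..., "content": ...} dicts

def generate_with_history(message, history):
    global history_messages
    history_messages.append({"role": "user", "content": message})

    # Render the whole conversation in Mistral's [INST] chat format.
    prompt = tokenizer.apply_chat_template(
        history_messages, tokenize=False, add_generation_prompt=True
    )

    # Drop the oldest user/assistant pair once the prompt nears the
    # 8192-token context window (6000 is an illustrative margin).
    while len(tokenizer.encode(prompt)) > 6000 and len(history_messages) > 2:
        del history_messages[:2]
        prompt = tokenizer.apply_chat_template(
            history_messages, tokenize=False, add_generation_prompt=True
        )

    # return_full_text=False makes the pipeline return only the new tokens.
    outputs = pipe(prompt, return_full_text=False, pad_token_id=tokenizer.eos_token_id)
    reply = outputs[0]["generated_text"].strip()

    history_messages.append({"role": "assistant", "content": reply})
    return reply

Swapping fn=generate for fn=generate_with_history in the ChatInterface call above should keep multi-turn context without the prompt growing unbounded.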