Python bindings not working

#5
by supercharge19

I have never been able to get a decent response out of any library other than kobold or llama.cpp (not llama-cpp-python), and since I work with Python a lot I tried ctransformers as well, which was the worst of the lot (in my experience). However, I am finding it very difficult to communicate with the model through kobold or the llama.cpp server when it is used as a drop-in replacement for OpenAI's API.

Here is what I tried:

1> Run the server: ./server -m tinyllama.gguf
2> (in a different terminal/tab) Run the OpenAI-style wrapper: python api_like_OAI.py  # must have Flask installed
(or just run kobold, and you'll have an endpoint)
3> Use the following code:

from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Point the client at the local server. The API key is a dummy value but must
# be non-empty; max_tokens belongs on the LLM itself, not on invoke().
llm = OpenAI(
    openai_api_base="http://10.192.4.242:8081/v1",
    openai_api_key="something",
    max_tokens=10,
)

question = "How many planets are there in our solar system?"
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

# LLMChain.invoke expects a dict keyed by the prompt's input variables.
response = llm_chain.invoke({"question": question})
print(response)
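
To rule LangChain out, the same endpoint can also be exercised directly with the official openai client. An untested sketch (the model name here is an arbitrary placeholder, since I believe the local wrapper ignores it, and the key only needs to be non-empty):

from openai import OpenAI

# Same endpoint as above; the key is a dummy but must be non-empty.
client = OpenAI(base_url="http://10.192.4.242:8081/v1", api_key="something")

# "tinyllama" is a placeholder model name; I assume the local
# OAI-compatible server ignores it (not verified against api_like_OAI.py).
resp = client.completions.create(
    model="tinyllama",
    prompt="Question: How many planets are there in our solar system?\nAnswer: Let's think step by step.",
    max_tokens=10,
)
print(resp.choices[0].text)

If this raw call returns sensible text, the problem is in the chain setup rather than the server.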

If, however, I use any model with llama-cpp-python, then I get very weird output, which I tried with different models (all quantized) and different prompts. Nothing worked :'(
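
In case the weird output comes from the prompt format rather than the bindings themselves, here is an untested sketch of calling llama-cpp-python directly with an explicit chat template. chat_format="zephyr" matches the TinyLlama-Chat checkpoints (an assumption; a base, non-chat GGUF will ramble regardless of template), and the model path is whatever your local file is:

from llama_cpp import Llama

# chat_format="zephyr" assumes a TinyLlama *chat* checkpoint; base models
# will produce incoherent output no matter how they are prompted.
llm = Llama(model_path="tinyllama.gguf", n_ctx=2048, chat_format="zephyr")

resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How many planets are there in our solar system?"},
    ],
    max_tokens=64,
)
print(resp["choices"][0]["message"]["content"])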
