Issue with API endpoint: "/chat"
@ysharma
The following code returns only the first token of the response:
from gradio_client import Client

client = Client("https://ysharma-explore-llamav2-with-tgi.hf.space/")
result = client.predict(
    "Capital of India",  # str in 'Message' Textbox component
    api_name="/chat"
)
print(result)
The printed result is just ' The'.
Could you please point out whether I am missing something basic?
Is it intended to work like this, or should the return value be a generator object?
I think I know why. The Gradio app streams the message token by token, and the API returns the first thing it generates. Knowing this doesn't solve the problem by itself, though. Perhaps there is a way to receive more than one response from a single request? I also tried the /chat_1 endpoint (I assumed that was the "Batch" tab), and it doesn't work properly either.
Please, if you manage to make it work somehow, share it here.
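One thing that may be worth trying: gradio_client also has client.submit(), which returns a Job object instead of a single value. For generator (streaming) endpoints the Job can be iterated to receive each partial output as it arrives, and job.outputs() lists everything received so far. Below is a minimal, untested-against-this-Space sketch. The helper names final_streamed_output and ask are made up for illustration, and it assumes each streamed value is the cumulative text so far, so the last one is the full answer.

```python
def final_streamed_output(job):
    """Consume an iterable of partial outputs and return the last one.

    Assumes each partial output is the text generated so far, so the
    last item is the complete response. Returns None if nothing arrives.
    """
    last = None
    for partial in job:
        last = partial
    return last


def ask(message, space_url="https://ysharma-explore-llamav2-with-tgi.hf.space/"):
    """Hypothetical helper: send one message to /chat and wait for the full reply."""
    # Imported here so the helper above can be used without gradio_client installed.
    from gradio_client import Client

    client = Client(space_url)
    # submit() returns a Job right away instead of blocking for one value.
    job = client.submit(message, api_name="/chat")
    # Iterating the Job yields each streamed output as it is produced.
    return final_streamed_output(job)
```

Calling print(ask("Capital of India")) should then print the whole answer instead of only ' The', provided the endpoint streams the way assumed above. If the partial outputs turn out to be individual tokens instead of cumulative text, they would need to be joined rather than taking the last one.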