Inference on HF Endpoints API?
#4, opened by kastan
I have this model running on the Endpoints API, but I can't get it to accept BOTH text and image inputs simultaneously.
What is the required schema?
I also asked here: https://github.com/huggingface/api-inference-community/issues/336
I got close, but it seems it only accepts a single string as input because it's part of the "Text-Generation" family of models.
import base64
import json
import os

import requests

img_url = "https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG"
API_URL = "https://api-inference.huggingface.co/models/HuggingFaceM4/idefics-9b-instruct"
# Assumes the token is exported as the HF_TOKEN environment variable.
HF_TOKEN = os.environ["HF_TOKEN"]
headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}

def query(image_url):
    # Download and base64-encode the image (only used by the commented-out attempts below).
    response = requests.get(image_url)
    image_bytes = response.content
    encoded_image = base64.b64encode(image_bytes).decode("utf-8")
    data = {
        "inputs": image_url,
        # Other keys I tried:
        # "prompt": "What's in this image?",
        # "prompt": encoded_image,
        # "image": encoded_image,
        # "inputs": "What's in this image?",
    }
    json_data = json.dumps(data)
    print("my request", json_data)
    response = requests.post(API_URL, headers=headers, data=json_data)
    print("Response content:", response.content)
    return json.loads(response.content.decode("utf-8"))

print(query(img_url))
## The results seem nearly there!
## [{'generated_text': 'https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPGScooby-Doo, Where Are You!'}]
Hi!
For TGI, if you look into the IDEFICS Playground code for example, you'll see this piece of code:
# is_image, is_url and gradio_link are helpers defined elsewhere (not shown here).
def prompt_list_to_tgi_input(prompt_list: List[str]) -> str:
    """
    TGI expects a string that contains both text and images in the image markdown format (i.e. the `![]()` ).
    The images links are parsed on TGI side
    """
    result_string_input = ""
    for elem in prompt_list:
        if is_image(elem):
            if is_url(elem):
                result_string_input += f"![]({elem})"
            else:
                result_string_input += f"![]({gradio_link(img_path=elem)})"
        else:
            result_string_input += elem
    return result_string_input
As the docstring says, TGI expects a single string with images embedded in markdown format, so if you pass a list of interleaved images and text to this function, the resulting string should work as the "inputs" value.
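Applied to the request from the original question, this could look roughly like the sketch below. It is not the Playground's code: looks_like_image_url, build_tgi_input and build_payload are hypothetical helpers written for this example, the image check is a simple file-extension heuristic, and the max_new_tokens value is an arbitrary choice. Only the payload is built here; sending it works as in the original snippet.

```python
import json
from typing import List

# Hypothetical heuristic: treat http(s) links with these extensions as images.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def looks_like_image_url(item: str) -> bool:
    """Return True if the item is an http(s) URL ending in an image extension."""
    lowered = item.lower()
    return lowered.startswith(("http://", "https://")) and lowered.endswith(IMAGE_EXTENSIONS)


def build_tgi_input(parts: List[str]) -> str:
    """Join interleaved text and image URLs into one string, wrapping images as ![](url)."""
    return "".join(f"![]({p})" if looks_like_image_url(p) else p for p in parts)


def build_payload(parts: List[str], max_new_tokens: int = 50) -> str:
    """Serialize the request body; max_new_tokens=50 is an arbitrary example value."""
    body = {
        "inputs": build_tgi_input(parts),
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return json.dumps(body)


img_url = "https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG"
payload = build_payload([img_url, "What's in this image?"])
print(payload)
```

The printed string can then be sent with requests.post(API_URL, headers=headers, data=payload), reusing API_URL and headers from the question.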