Problem with the 'google/gemma-2-2b-it' API for Chat Completion

#40

by adelamare-blockchain - opened Sep 18

Sep 18

Hi !

I'm stuck on a big problem: the google/gemma-2-2b-it API does not seem to work for "Chat Completion". The official Hugging Face documentation for Chat Completion gives this command: curl 'https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions' \ -H "Authorization: Bearer hf_***" \ -H 'Content-Type: application/json' \ -d '{ "model": "google/gemma-2-2b-it", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 500, "stream": false }'
The address 'https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions' returns:
```
// 20240918223200
// https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions

{
"error": "Model google/gemma-2-2b-it/v1/chat/completions does not exist"
}
```
Which API endpoint should I use to call google/gemma-2-2b-it Chat Completion properly, please?

Thx !

GopiUppari

Google org Sep 23

Hi @adelamare-blockchain ,

I was able to reproduce the issue. To resolve it, please use the following API endpoint: https://api-inference.huggingface.co/models/google/gemma-2-2b-it and refer to the corrected code below:

Thank you.
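
For readers following along, here is a minimal sketch of a request to the endpoint named above. It is not the code from the reply. It assumes the usual Inference API text-generation body, `{"inputs": ..., "parameters": {...}}`, and the helper names `build_request` and `generate` are made up for illustration.

```python
import json
import urllib.request

# Endpoint quoted in the reply above (no /v1/chat/completions suffix).
API_URL = "https://api-inference.huggingface.co/models/google/gemma-2-2b-it"


def build_request(prompt, token, max_new_tokens=500):
    """Build (but do not send) a POST request for the text-generation endpoint.

    Assumption: the endpoint accepts {"inputs": str, "parameters": {...}}.
    """
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(prompt, token):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(prompt, token)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `generate("What is the capital of France?", "hf_***")` with a real token in place of the placeholder would send the request. Note that this body carries a single prompt string rather than a `messages` list.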

adelamare-blockchain

Sep 23

Thx @GopiUppari for your answer.

Yes, it works for me this way, but this solution amounts to a plain 'text-to-text' API call.
Unfortunately, it is not a 'Chat completion', that is, a conversation with google/gemma-2-2b-it.

It doesn't accept the "messages" field of a request body such as: { "model": "google/gemma-2-2b-it", "messages": [ { "role": "user", "content": "What is the best approach for integrating AI and blockchain technologies in a decentralized application?" } ], "max_tokens": 500, "temperature": 0.7, "top_p": 0.95, "repetition_penalty": 1.15, "stream": false }
The 'Chat completion' documentation says that curl 'https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions' \ -H "Authorization: Bearer hf_***" \ -H 'Content-Type: application/json' \ -d '{ "model": "google/gemma-2-2b-it", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 500, "stream": false }' should work, but it doesn't, because the endpoint https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions doesn't exist.

How can I use the conversational API of google/gemma-2-2b-it, please?

Thx !

GopiUppari

Google org Sep 24

Hi @adelamare-blockchain ,

In the documentation, passing the chat messages to the tokenizer.apply_chat_template function returns a formatted string (<class 'str'>) that the model can interpret. You can send this same formatted string as the input in the curl command so that the model understands the conversation correctly.

Thank you.
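
To illustrate the suggestion above, here is a rough sketch that builds such a formatted string by hand and wraps it in a request body. The turn markers (`<start_of_turn>`, `<end_of_turn>`, with the assistant role named `model`) reflect my understanding of Gemma's chat template and should be treated as an assumption. The authoritative string comes from `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`. The helper names and the `include_bos` switch are hypothetical.

```python
def format_gemma_prompt(messages, include_bos=False):
    """Approximate Gemma's chat template for a list of role/content dicts.

    Assumptions: turns look like "<start_of_turn>ROLE\\nTEXT<end_of_turn>\\n",
    "assistant" is rendered as "model", and system messages are not supported.
    include_bos is off by default because the server-side tokenizer may add
    the <bos> token itself.
    """
    parts = ["<bos>"] if include_bos else []
    for message in messages:
        role = message["role"]
        if role == "system":
            raise ValueError("system role is not supported in this sketch")
        if role == "assistant":
            role = "model"
        text = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    # Open a model turn so that the model generates the next reply.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


def build_payload(messages, max_new_tokens=500):
    """Wrap the formatted conversation in a text-generation request body."""
    return {
        "inputs": format_gemma_prompt(messages),
        "parameters": {"max_new_tokens": max_new_tokens},
    }
```

The dictionary returned by `build_payload` can be serialized to JSON and posted to https://api-inference.huggingface.co/models/google/gemma-2-2b-it. For a multi-turn conversation, append each model reply to `messages` as an `assistant` turn before formatting again.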

adelamare-blockchain

Sep 25

Thx for your answer @GopiUppari !
