RonanMcGovern committed
Commit 055776a
1 Parent(s): cf55c8a

add runpod inference notes
README.md CHANGED
The dataset used for training this model can be found at Trelis Function Calling...

!!! Make sure to check the prompt format below and adjust inference accordingly !!!

### Quick Start in Google Colab
Try out this notebook: [fLlama_Inference notebook](https://colab.research.google.com/drive/1Ow5cQ0JNv-vXsT-apCceH6Na3b4L7JyW?usp=sharing)

### Text Generation Inference
You can use this model with [text-generation-inference](https://github.com/huggingface/text-generation-inference) and [chat-ui](https://github.com/huggingface/chat-ui)

Here is the [GitHub repo for setup](https://github.com/TrelisResearch/tgi-chat-ui-function-calling)

And here is a video showing it working with llama-2-7b-chat-hf-function-calling...

Note that you'll still need to code the server-side handling of making the function calls (which obviously depends on what functions you want to use).

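As a minimal sketch of what that server-side handling could look like (assuming, hypothetically, that the model emits its function call as a JSON object with "name" and "arguments" keys; check the prompt format below for the actual output structure), you could dispatch calls from a registry of Python functions:

```python
import json

# Hypothetical registry mapping function names to implementations.
FUNCTIONS = {
    "get_current_weather": lambda location: f"Sunny in {location}",
}

def handle_model_output(text: str) -> str:
    """Parse the model's output as a function call and dispatch it.

    Assumes output like {"name": "get_current_weather",
    "arguments": {"location": "London"}}; adjust the parsing to
    this model's actual prompt format.
    """
    try:
        call = json.loads(text)
        func = FUNCTIONS[call["name"]]
        return str(func(**call.get("arguments", {})))
    except (json.JSONDecodeError, KeyError, TypeError):
        # Not a recognised function call: treat it as a plain text reply.
        return text

print(handle_model_output(
    '{"name": "get_current_weather", "arguments": {"location": "London"}}'
))
```
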
#### Runpod Quickstart
For a quickstart with Runpod, you can use [this template](https://runpod.io/gsc?template=edxvuji38p&ref=jmfkcdio).

Once up and running, you can make queries to:
```
https://{YOUR_POD_ID}-8080.proxy.runpod.net
```
Then, you can make queries to the API as follows:
```
curl https://{YOUR_POD_ID}-8080.proxy.runpod.net/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
Or use /generate_stream for streaming. You can also make requests from Python scripts, as in the example below. More info is available in the text-generation-inference [GitHub repo](https://github.com/huggingface/text-generation-inference/).
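For instance, a minimal Python version of the curl request above, using the requests library (YOUR_POD_ID is your pod's ID, as before):

```python
import requests

# Same TGI endpoint as the curl example above.
pod_id = "YOUR_POD_ID"  # replace with your Runpod pod ID
url = f"https://{pod_id}-8080.proxy.runpod.net/generate"

payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 20},
}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()

# text-generation-inference returns JSON with a "generated_text" field.
print(response.json()["generated_text"])
```
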
### Run on your laptop
To run on your laptop, see this [video and Jupyter notebook](https://youtu.be/nDJMHFsBU7M).

After running the llama.cpp server, you can call the server with this command, with thanks to @jdo300:

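As a sketch of such a call in Python (assuming llama.cpp's example server is listening on its default http://localhost:8080 and exposes its /completion endpoint; the prompt here is a placeholder, so use the prompt format above):

```python
import requests

# llama.cpp's example server exposes a /completion endpoint (port 8080 by default).
url = "http://localhost:8080/completion"

payload = {
    "prompt": "What is Deep Learning?",  # placeholder; use this model's prompt format
    "n_predict": 128,  # maximum number of tokens to generate
}

response = requests.post(url, json=payload, timeout=120)
response.raise_for_status()

# The server returns JSON with the generated text in the "content" field.
print(response.json()["content"])
```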