RonanMcGovern committed
Commit 055776a
1 Parent(s): cf55c8a

add runpod inference notes
README.md CHANGED
The dataset used for training this model can be found at Trelis Function Calling...

!!! Make sure to check the prompt format below and adjust inference accordingly !!!

### Quick Start in Google Colab
Try out this notebook: [fLlama_Inference notebook](https://colab.research.google.com/drive/1Ow5cQ0JNv-vXsT-apCceH6Na3b4L7JyW?usp=sharing)

### Text Generation Inference
You can use this model with [text-generation-inference](https://github.com/huggingface/text-generation-inference) and [chat-ui](https://github.com/huggingface/chat-ui)

Here is the [GitHub repo for setup](https://github.com/TrelisResearch/tgi-chat-ui-function-calling)

And here is a video showing it working with llama-2-7b-chat-hf-function-calling...

Note that you'll still need to code the server-side handling of making the function calls (which obviously depends on what functions you want to use).

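As a minimal sketch of what that server-side handling could look like (assuming, hypothetically, that the model emits its function call as a JSON object with "name" and "arguments" keys; check the prompt format below for the actual output structure), you could dispatch calls from a registry of Python functions:

```python
import json

# Hypothetical registry mapping function names to implementations.
FUNCTIONS = {
    "get_current_weather": lambda location: f"Sunny in {location}",
}

def handle_model_output(text: str) -> str:
    """Parse the model's output as a function call and dispatch it.

    Assumes output like {"name": "get_current_weather",
    "arguments": {"location": "London"}}; adjust the parsing to
    this model's actual prompt format.
    """
    try:
        call = json.loads(text)
        func = FUNCTIONS[call["name"]]
        return str(func(**call.get("arguments", {})))
    except (json.JSONDecodeError, KeyError, TypeError):
        # Not a recognised function call: treat it as a plain text reply.
        return text

print(handle_model_output(
    '{"name": "get_current_weather", "arguments": {"location": "London"}}'
))
```
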
#### Runpod Quickstart
For a quickstart with Runpod, you can use [this template](https://runpod.io/gsc?template=edxvuji38p&ref=jmfkcdio).

Once up and running, you can make queries to:
```
https://{YOUR_POD_ID}-8080.proxy.runpod.net
```
Then, you can make queries to the API as follows:
```
curl https://{YOUR_POD_ID}-8080.proxy.runpod.net/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
Or use /generate_stream for streaming. You can also make requests from Python scripts, as in the example below. More info is available in the text-generation-inference [GitHub repo](https://github.com/huggingface/text-generation-inference/).
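For instance, a minimal Python version of the curl request above, using the requests library (YOUR_POD_ID is your pod's ID, as before):

```python
import requests

# Same TGI endpoint as the curl example above.
pod_id = "YOUR_POD_ID"  # replace with your Runpod pod ID
url = f"https://{pod_id}-8080.proxy.runpod.net/generate"

payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 20},
}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()

# text-generation-inference returns JSON with a "generated_text" field.
print(response.json()["generated_text"])
```
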
### Run on your laptop
To run on your laptop, see this [video and Jupyter notebook](https://youtu.be/nDJMHFsBU7M).

After running the llama.cpp server, you can call the server with this command, with thanks to @jdo300:

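As a sketch of such a call in Python (assuming llama.cpp's example server is listening on its default http://localhost:8080 and exposes its /completion endpoint; the prompt here is a placeholder, so use the prompt format above):

```python
import requests

# llama.cpp's example server exposes a /completion endpoint (port 8080 by default).
url = "http://localhost:8080/completion"

payload = {
    "prompt": "What is Deep Learning?",  # placeholder; use this model's prompt format
    "n_predict": 128,  # maximum number of tokens to generate
}

response = requests.post(url, json=payload, timeout=120)
response.raise_for_status()

# The server returns JSON with the generated text in the "content" field.
print(response.json()["content"])
```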