RonanMcGovern committed
Commit 055776a
1 Parent(s): cf55c8a

add runpod inference notes

Files changed (1): README.md (+19 -3)
README.md CHANGED
@@ -73,10 +73,10 @@ The dataset used for training this model can be found at [Trelis Function Callin

!!! Make sure to check the prompt format below and adjust inference accordingly !!!

- **Quick Start in Google Colab**
+ ### Quick Start in Google Colab
Try out this notebook: [fLlama_Inference notebook](https://colab.research.google.com/drive/1Ow5cQ0JNv-vXsT-apCceH6Na3b4L7JyW?usp=sharing)

- **Commercial Applications**
+ ### Text Generation Inference
You can use this model with [text-generation-inference](https://github.com/huggingface/text-generation-inference) and [chat-ui](https://github.com/huggingface/chat-ui)

Here is the [github for setup](https://github.com/TrelisResearch/tgi-chat-ui-function-calling)
@@ -85,7 +85,23 @@ And here is a video showing it working with [llama-2-7b-chat-hf-function-calling

Note that you'll still need to code the server-side handling of making the function calls (which obviously depends on what functions you want to use).

- **Run on your laptop**
+ #### Runpod Quickstart
+ For a quickstart with Runpod, you can use the template [here](https://runpod.io/gsc?template=edxvuji38p&ref=jmfkcdio)
+
+ Once the pod is up and running, the API is served at:
+ ```
+ https://{YOUR_POD_ID}-8080.proxy.runpod.net
+ ```
+ You can then query it as follows:
+ ```
+ curl https://{YOUR_POD_ID}-8080.proxy.runpod.net/generate \
+     -X POST \
+     -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+     -H 'Content-Type: application/json'
+ ```
+ Or use /generate_stream for streaming. You can also make these requests from a Python script, as in the sketch below. More info is in the text-generation-inference [github repo](https://github.com/huggingface/text-generation-inference/)
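
A minimal Python sketch of the same request, assuming the `requests` library is installed and a pod is serving at the URL above ({YOUR_POD_ID} is a placeholder for your own pod ID):

```python
import requests

# Endpoint from the Runpod quickstart above; {YOUR_POD_ID} is a placeholder
API_URL = "https://{YOUR_POD_ID}-8080.proxy.runpod.net/generate"

payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 20},
}

# POST the prompt to text-generation-inference's /generate endpoint
response = requests.post(API_URL, json=payload, timeout=60)
response.raise_for_status()

# TGI replies with JSON; the completion is under "generated_text"
print(response.json()["generated_text"])
```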
103
+
104
+ ### Run on your laptop
Run on your laptop [video and Jupyter notebook](https://youtu.be/nDJMHFsBU7M)

After running the llama.cpp server, you can call the server with this command, with thanks to @jdo300: