Commit 8015b41 (parent 0be2bcf) by danielhanchen: Update README.md
Files changed: README.md (+326 -148)
---
base_model: meta-llama/Meta-Llama-3.1-8B
language:
- en
library_name: transformers
license: cc-by-nc-4.0
tags:
- cohere
- unsloth
- transformers
---

# Finetune Llama 3.1, Gemma 2, and Mistral 2-5x faster with 70% less memory via Unsloth!

We have a free Google Colab Tesla T4 notebook for Llama 3.1 (8B) here: https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/Discord%20button.png" width="200"/>](https://discord.gg/unsloth)
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

## ✨ Finetune for Free

All notebooks are **beginner friendly**! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF or vLLM, or uploaded to Hugging Face.

| Unsloth supports | Free Notebooks | Performance | Memory use |
|------------------|----------------|-------------|------------|
| **Llama-3.1 8b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing) | 2.4x faster | 58% less |
| **Phi-3.5 (mini)** | [▶️ Start on Colab](https://colab.research.google.com/drive/1lN6hPQveB_mHSnTOYifygFcrO8C1bxq4?usp=sharing) | 2x faster | 50% less |
| **Gemma-2 9b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing) | 2.4x faster | 58% less |
| **Mistral 7b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing) | 2.2x faster | 62% less |
| **TinyLlama** | [▶️ Start on Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing) | 3.9x faster | 74% less |
| **DPO - Zephyr** | [▶️ Start on Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) | 1.9x faster | 19% less |

- This [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing) is useful for ShareGPT ChatML / Vicuna templates.
- This [text completion notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr.
- \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.

# Model Card for C4AI Command R 08-2024

## Model Summary
<!-- Provide a quick summary of what the model is/does. -->
C4AI Command R 08-2024 is a research release of a highly performant 35 billion parameter generative model. Command R 08-2024 is a large language model with open weights, optimized for a variety of use cases including reasoning, summarization, and question answering. It is capable of multilingual generation, having been trained on 23 languages and evaluated in 10, and offers highly performant RAG capabilities.

Developed by: Cohere and [Cohere For AI](https://cohere.for.ai)

- Point of Contact: Cohere For AI: [cohere.for.ai](https://cohere.for.ai/)
- License: [CC-BY-NC](https://cohere.com/c4ai-cc-by-nc-license), which also requires adhering to [C4AI's Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy)
- Model: c4ai-command-r-08-2024
- Model Size: 35 billion parameters
- Context length: 128K

**Try C4AI Command R**

If you want to try Command R before downloading the weights, the model is hosted in a Hugging Face Space [here](https://huggingface.co/spaces/CohereForAI/c4ai-command?model=command-r-08-2024).

**Usage**

Please use `transformers` version 4.39.1 or higher.
```python
# pip install 'transformers>=4.39.1'
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r-08-2024"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format message with the command-r-08-2024 chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
## <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```
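
The comment above shows the string the chat template produces. As a rough illustration of that structure only (a hand-written sketch, not the tokenizer's actual Jinja template), the wrapping of turns in special tokens looks like this:

```python
# Illustrative sketch of the chat-template structure shown above; the real
# template lives in the tokenizer config, so treat this as an approximation.
def render_chat(messages):
    role_tokens = {"user": "<|USER_TOKEN|>", "chatbot": "<|CHATBOT_TOKEN|>"}
    prompt = "<BOS_TOKEN>"
    for message in messages:
        prompt += ("<|START_OF_TURN_TOKEN|>" + role_tokens[message["role"]]
                   + message["content"] + "<|END_OF_TURN_TOKEN|>")
    # add_generation_prompt=True appends an open chatbot turn:
    prompt += "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
    return prompt

print(render_chat([{"role": "user", "content": "Hello, how are you?"}]))
```

In practice, always use `tokenizer.apply_chat_template` rather than building this string by hand.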

## Model Details

**Input**: Models input text only.

**Output**: Models generate text only.

**Model Architecture**: This is an auto-regressive language model that uses an optimized transformer architecture. After pretraining, this model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. We use grouped-query attention (GQA) to improve inference speed.

**Languages covered**: The model has been trained on 23 languages (English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, Simplified Chinese, Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian) and evaluated on 10 languages (English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, Simplified Chinese).

**Context length**: Command R 08-2024 supports a context length of 128K.

### Grounded Generation and RAG Capabilities:

Command R 08-2024 has been specifically trained with grounded generation capabilities. This means that it can generate responses based on a list of supplied document snippets, and it will include grounding spans (citations) in its response indicating the source of the information. This can be used to enable behaviors such as grounded summarization and the final step of Retrieval-Augmented Generation (RAG). This behavior has been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template will reduce performance, which is why we recommend using the prompt template described below.

Command R 08-2024's grounded generation behavior takes a conversation as input (with an optional user-supplied system preamble indicating task, context, and desired output style), along with a list of retrieved document snippets. The document snippets should be chunks rather than long documents, typically around 100-400 words per chunk. Document snippets consist of key-value pairs. The keys should be short descriptive strings; the values can be text or semi-structured.
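
Since snippets should be roughly 100-400 words, a simple word-count chunker is often enough for preprocessing. A minimal sketch (the helper name and chunk size below are our own choices, not part of the model's API):

```python
# Split a long document into snippets of at most `max_words` words,
# matching the recommended 100-400 word chunk size (illustrative helper).
def chunk_document(text, max_words=400):
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

# Each snippet becomes a key-value document for grounded generation:
documents = [{"title": f"chunk {i}", "text": snippet}
             for i, snippet in enumerate(chunk_document("penguin " * 1000))]
print(len(documents))  # 1000 words -> 3 chunks of 400, 400, and 200 words
```

Splitting on sentence or section boundaries instead of raw word counts usually yields more coherent snippets, at the cost of a slightly more involved chunker.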

By default, Command R 08-2024 will generate grounded responses by first predicting which documents are relevant, then predicting which ones it will cite, and then generating an answer. Finally, it inserts grounding spans into the answer. See below for an example. This is referred to as `accurate` grounded generation.

The model is trained with a number of other answering modes, which can be selected by prompt changes. A `fast` citation mode is supported in the tokenizer, which directly generates an answer with grounding spans in it, without first writing the answer out in full. This sacrifices some grounding accuracy in favor of generating fewer tokens.

Comprehensive documentation for working with Command R 08-2024's grounded generation prompt template can be found [here](https://docs.cohere.com/docs/prompting-command-r#augmented-generation-prompt-template-rag-and-summarization), [here](https://docs.cohere.com/docs/prompting-command-r#augmented-generation-rag-with-command-rr) and [here](https://docs.cohere.com/docs/prompting-command-r#augmented-generation-summarization-with-command-rr).

You can render the Grounded Generation prompt template by using the function `apply_grounded_generation_template()`. The code snippet below shows a minimal working example of how to render this prompt.

<details>
<summary><b>Usage: Rendering Grounded Generation prompts [CLICK TO EXPAND]</b></summary>

````python
from transformers import AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-08-2024"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]
# define documents to ground on:
documents = [
    {"title": "Tall penguins", "text": "Emperor penguins are the tallest growing up to 122 cm in height."},
    {"title": "Penguin habitats", "text": "Emperor penguins only live in Antarctica."}
]

# render the grounded generation prompt as a string:
grounded_generation_prompt = tokenizer.apply_grounded_generation_template(
    conversation,
    documents=documents,
    citation_mode="accurate",  # or "fast"
    tokenize=False,
    add_generation_prompt=True,
)
print(grounded_generation_prompt)
````
</details>

<details>
<summary><b>Example Rendered Grounded Generation Prompt [CLICK TO EXPAND]</b></summary>

````
<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># Safety Preamble
The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral.

# System Preamble
## Basic Rules
You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions.

# User Preamble
## Task and Context
You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.

## Style Guide
Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Whats the biggest penguin in the world?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|><results>
Document: 0
title: Tall penguins
text: Emperor penguins are the tallest growing up to 122 cm in height.

Document: 1
title: Penguin habitats
text: Emperor penguins only live in Antarctica.
</results><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>Carefully perform the following instructions, in order, starting each with a new line.
Firstly, Decide which of the retrieved documents are relevant to the user's last input by writing 'Relevant Documents:' followed by comma-separated list of document numbers. If none are relevant, you should instead write 'None'.
Secondly, Decide which of the retrieved documents contain facts that should be cited in a good answer to the user's last input by writing 'Cited Documents:' followed a comma-separated list of document numbers. If you dont want to cite any of them, you should instead write 'None'.
Thirdly, Write 'Answer:' followed by a response to the user's last input in high quality natural english. Use the retrieved documents to help you. Do not insert any citations or grounding markup.
Finally, Write 'Grounded answer:' followed by a response to the user's last input in high quality natural english. Use the symbols <co: doc> and </co: doc> to indicate when a fact comes from a document in the search result, e.g <co: 0>my fact</co: 0> for a fact from document 0.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
````

</details>

<details>
<summary><b>Example Rendered Grounded Generation Completion [CLICK TO EXPAND]</b></summary>

````
Relevant Documents: 0,1
Cited Documents: 0,1
Answer: The Emperor Penguin is the tallest or biggest penguin in the world. It is a bird that lives only in Antarctica and grows to a height of around 122 centimetres.
Grounded answer: The <co: 0>Emperor Penguin</co: 0> is the <co: 0>tallest</co: 0> or biggest penguin in the world. It is a bird that <co: 1>lives only in Antarctica</co: 1> and <co: 0>grows to a height of around 122 centimetres.</co: 0>
````
</details>
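
The grounding spans in a completion like the one above are straightforward to post-process. A minimal sketch of a citation extractor (our own helper for illustration, not part of the released tooling):

```python
import re

# Extract (document_id, fact) pairs from <co: N>...</co: N> grounding spans.
def extract_citations(grounded_answer):
    return [(int(doc_id), fact)
            for doc_id, fact in re.findall(r"<co: (\d+)>(.*?)</co: \1>", grounded_answer)]

answer = ("The <co: 0>Emperor Penguin</co: 0> is the <co: 0>tallest</co: 0> penguin. "
          "It <co: 1>lives only in Antarctica</co: 1>.")
print(extract_citations(answer))
# → [(0, 'Emperor Penguin'), (0, 'tallest'), (1, 'lives only in Antarctica')]
```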

### Single-Step Tool Use Capabilities ("Function Calling"):
Single-step tool use (or "Function Calling") allows Command R 08-2024 to interact with external tools like APIs, databases, or search engines. Single-step tool use consists of two model inferences:
- Tool Selection: The model decides which tools to call and with what parameters. It's then up to the developer to execute these tool calls and obtain tool results.
- Response Generation: The model generates the final response given the tool results.

You can learn more about single-step tool use in our [documentation](https://docs.cohere.com/docs/tool-use).

Command R 08-2024 has been specifically trained with single-step tool use (or "Function Calling") capabilities. These have been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template will likely reduce performance, which is why we recommend using the prompt template described below.

Command R 08-2024's single-step tool use functionality takes a conversation as input (with an optional user-supplied system preamble), along with a list of available tools. The model will then generate a JSON-formatted list of actions to execute on a subset of those tools. Command R 08-2024 may use one of its supplied tools more than once.

The model has been trained to recognise a special `directly_answer` tool, which it uses to indicate that it doesn't want to use any of its other tools. The ability to abstain from calling a specific tool can be useful in a range of situations, such as greeting a user or asking clarifying questions. We recommend including the `directly_answer` tool, but it can be removed or renamed if required.

Comprehensive documentation for working with Command R 08-2024's single-step tool use prompt template can be found [here](https://docs.cohere.com/docs/prompting-command-r#single-step-tool-use-with-command-rr-function-calling) and [here](https://docs.cohere.com/docs/prompting-command-r#single-step-tool-use-with-command-rr-function-calling-1).

You can render the single-step tool use prompt template by using the function `apply_tool_use_template()`. The code snippet below shows a minimal working example of how to render this prompt.

Command R 08-2024 also supports Hugging Face's [tool use API](https://huggingface.co/docs/transformers/main/en/chat_templating#advanced-tool-use--function-calling) to render the same prompt.

<details>
<summary><b>Usage: Rendering Single-Step Tool Use Prompts [CLICK TO EXPAND]</b></summary>

```python
from transformers import AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-08-2024"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]
# Define tools available for the model to use:
tools = [
    {
        "name": "internet_search",
        "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
        "parameter_definitions": {
            "query": {
                "description": "Query to search the internet with",
                "type": "str",
                "required": True
            }
        }
    },
    {
        "name": "directly_answer",
        "description": "Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history",
        "parameter_definitions": {}
    }
]

# render the tool use prompt as a string:
tool_use_prompt = tokenizer.apply_tool_use_template(
    conversation,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
print(tool_use_prompt)
```

</details>

<details>
<summary><b>Usage: Rendering prompts with the Single-Step Tool Use API [CLICK TO EXPAND]</b></summary>

```python
from transformers import AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-08-2024"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]

# Define tools available for the model to use
# Type hints and docstrings from Python functions are automatically extracted
def internet_search(query: str):
    """
    Returns a list of relevant document snippets for a textual query retrieved from the internet

    Args:
        query: Query to search the internet with
    """
    pass

def directly_answer():
    """
    Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history
    """
    pass

tools = [internet_search, directly_answer]

# render the tool use prompt as a string:
tool_use_prompt = tokenizer.apply_chat_template(
    conversation,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
print(tool_use_prompt)
```

</details>

<details>
<summary><b>Example Rendered Single-Step Tool Use Prompt [CLICK TO EXPAND]</b></summary>

````
<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># Safety Preamble
The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral.

# System Preamble
## Basic Rules
You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions.

# User Preamble
## Task and Context
You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.

## Style Guide
Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.

## Available Tools
Here is a list of tools that you have available to you:

```python
def internet_search(query: str) -> List[Dict]:
    """Returns a list of relevant document snippets for a textual query retrieved from the internet

    Args:
        query (str): Query to search the internet with
    """
    pass
```

```python
def directly_answer() -> List[Dict]:
    """Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history
    """
    pass
```<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Whats the biggest penguin in the world?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>Write 'Action:' followed by a json-formatted list of actions that you want to perform in order to produce a good response to the user's last input. You can use any of the supplied tools any number of times, but you should aim to execute the minimum number of necessary actions for the input. You should use the `directly-answer` tool if calling the other tools is unnecessary. The list of actions you want to call should be formatted as a list of json objects, for example:
```json
[
    {
        "tool_name": title of the tool in the specification,
        "parameters": a dict of parameters to input into the tool as they are defined in the specs, or {} if it takes no parameters
    }
]```<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
````

</details>

<details>
<summary><b>Example Rendered Single-Step Tool Use Completion [CLICK TO EXPAND]</b></summary>

````
Action: ```json
[
    {
        "tool_name": "internet_search",
        "parameters": {
            "query": "biggest penguin in the world"
        }
    }
]
```
````
</details>
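
On the developer side, an `Action:` completion like the one above has to be parsed into tool calls before execution. A minimal sketch of such a parser (our own helper for illustration; the real Cohere tooling may differ):

```python
import json
import re

# Parse the "Action:" block of a single-step tool use completion into
# a list of {"tool_name": ..., "parameters": ...} dicts.
def parse_actions(completion):
    match = re.search(r"Action: ```json\s*(.*?)\s*```", completion, re.DOTALL)
    return json.loads(match.group(1)) if match else []

completion = """Action: ```json
[
    {
        "tool_name": "internet_search",
        "parameters": {"query": "biggest penguin in the world"}
    }
]
```"""
print(parse_actions(completion))
```

Each parsed action can then be dispatched to the corresponding tool, and the tool results fed back to the model for response generation.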

### Multi-Step Tool Use Capabilities ("Agents"):
Multi-step tool use is suited for building agents that can plan and execute a sequence of actions using multiple tools. Unlike single-step tool use, the model can perform several inference cycles, iterating through Action → Observation → Reflection until it decides on a final response. For more details, refer to our [documentation on multi-step tool use](https://docs.cohere.com/docs/multi-step-tool-use).

Command R 08-2024 has been specifically trained with multi-step tool use (or "Agents") capabilities. These have been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template will likely reduce performance, which is why we recommend using the prompt template described in the linked documentation.

The prompt template is not yet available in Hugging Face. However, comprehensive documentation for working with Command R 08-2024's multi-step tool use prompt template can be found [here](https://docs.cohere.com/docs/prompting-command-r#multi-step-tool-use-with-command-rr-agents) and [here](https://docs.cohere.com/docs/prompting-command-r#multihop-tool-use-with-command-rr-agents).
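
The multi-step loop above can be sketched in a few lines of control flow. Everything here (the planner, the tool stub, and the step cap) is a toy stand-in for model inference and real tool execution, not Cohere's implementation:

```python
# Minimal sketch of the multi-step Action -> Observation -> Reflection loop.
def run_agent(plan_next_action, execute_tool, max_steps=5):
    observations = []
    for _ in range(max_steps):
        action = plan_next_action(observations)    # model decides the next action
        if action["tool_name"] == "directly_answer":
            return action["parameters"]["answer"]  # final response, loop ends
        observations.append(execute_tool(action))  # developer executes the tool
    return None

def toy_planner(observations):
    # First inference: search; second inference: answer from the observation.
    if not observations:
        return {"tool_name": "internet_search",
                "parameters": {"query": "biggest penguin in the world"}}
    return {"tool_name": "directly_answer",
            "parameters": {"answer": observations[0]}}

print(run_agent(toy_planner, lambda action: "Emperor penguins are the tallest."))
```

In a real agent, `plan_next_action` would be a model inference over the rendered prompt, and `execute_tool` would call the actual API, database, or search engine.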

### Code Capabilities:
Command R 08-2024 has been optimized to interact with your code by requesting code snippets, code explanations, or code rewrites. It might not perform well out-of-the-box for pure code completion. For better performance, we also recommend using a low temperature (or even greedy decoding) for code-generation-related instructions.

### Model Card Contact
For errors or additional questions about details in this model card, contact [[email protected]](mailto:[email protected]).

### Terms of Use:
We hope that the release of this model will make community-based research efforts more accessible by releasing the weights of a highly performant 35 billion parameter model to researchers all over the world. This model is governed by a [CC-BY-NC](https://cohere.com/c4ai-cc-by-nc-license) License with an acceptable use addendum, and also requires adhering to [C4AI's Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).

### Try Chat:
You can try Command R 08-2024 chat in the playground [here](https://dashboard.cohere.com/playground/chat).