readme + demo
Files changed:
- README.md +153 -3
- VisionAtomicFlow.py +116 -2
- VisionAtomicFlow.yaml +11 -7
- __init__.py +1 -1
- demo.yaml +20 -0
- run.py +91 -0
README.md
CHANGED
@@ -1,3 +1,153 @@
# Table of Contents

* [VisionAtomicFlow](#VisionAtomicFlow)
  * [VisionAtomicFlow](#VisionAtomicFlow.VisionAtomicFlow)
    * [get\_image](#VisionAtomicFlow.VisionAtomicFlow.get_image)
    * [get\_video](#VisionAtomicFlow.VisionAtomicFlow.get_video)
    * [get\_user\_message](#VisionAtomicFlow.VisionAtomicFlow.get_user_message)

<a id="VisionAtomicFlow"></a>

# VisionAtomicFlow

<a id="VisionAtomicFlow.VisionAtomicFlow"></a>

## VisionAtomicFlow Objects

```python
class VisionAtomicFlow(ChatAtomicFlow)
```

This class implements the atomic flow for the VisionFlowModule. It is a flow that, given a textual input and a set of images and/or videos, generates a textual output.
It uses the litellm library as a backend. See https://docs.litellm.ai/docs/providers for supported models and APIs.

*Configuration Parameters*:

- `name` (str): The name of the flow. Default: "VisionAtomicFlow"
- `description` (str): A description of the flow, used to generate its help message.
  Default: "A flow that, given a textual input, and a set of images and/or videos, generates a textual output."
- `enable_cache` (bool): If True, the flow will use the cache. Default: True
- `n_api_retries` (int): The number of times to retry the API call in case of failure. Default: 6
- `wait_time_between_api_retries` (int): The time to wait between API retries, in seconds. Default: 20
- `system_name` (str): The name of the system. Default: "system"
- `user_name` (str): The name of the user. Default: "user"
- `assistant_name` (str): The name of the assistant. Default: "assistant"
- `backend` (Dict[str, Any]): The configuration of the backend, which is used to fetch API keys. Default: LiteLLMBackend with the
  default parameters of ChatAtomicFlow (see the flow card of ChatAtomicFlowModule), except for the following parameters,
  whose default values are overwritten:
  - `api_infos` (List[Dict[str, Any]]): The list of API infos. No default value; this parameter is required.
  - `model_name` (Union[Dict[str,str],str]): The name of the model to use.
    When using multiple API providers, model_name can be a dictionary of the form
    {"provider_name": "model_name"}.
    Default: "gpt-4-vision-preview" (the name must match the model's name in litellm, see https://docs.litellm.ai/docs/providers).
  - `n` (int): The number of answers to generate. Default: 1
  - `max_tokens` (int): The maximum number of tokens to generate. Default: 2000
  - `temperature` (float): The temperature to use. Default: 0.3
  - `top_p` (float): An alternative to sampling with temperature: the model only considers the tokens
    comprising the top_p probability mass. Default: 0.2
  - `frequency_penalty` (float): The higher this value, the less likely the model is to repeat itself. Default: 0.0
  - `presence_penalty` (float): The higher this value, the more likely the model is to talk about a new topic. Default: 0.0
- `system_message_prompt_template` (Dict[str,Any]): The template of the system message, used to generate the system message.
  By default it is of type flows.prompt_template.JinjaPrompt.
  None of the parameters of the prompt are defined by default, so they need to be defined if one wants to use the system prompt.
  Default parameters are defined in flows.prompt_template.jinja2_prompts.JinjaPrompt.
- `init_human_message_prompt_template` (Dict[str,Any]): The prompt template of the human/user message used to initialize the conversation
  (first time in). It is used to generate the human message and is passed as the user message to the LLM.
  By default it is of type flows.prompt_template.JinjaPrompt. None of the parameters of the prompt are defined by default, so they need to be defined if one
  wants to use the init_human_message_prompt_template. Default parameters are defined in flows.prompt_template.jinja2_prompts.JinjaPrompt.
- `previous_messages` (Dict[str,Any]): Defines which previous messages to include in the input of the LLM. Note that if `first_k` and `last_k` are both None,
  all the messages of the flow's history are added to the input of the LLM. Default:
  - `first_k` (int): If defined, adds the first_k earliest messages of the flow's chat history to the input of the LLM. Default: None
  - `last_k` (int): If defined, adds the last_k latest messages of the flow's chat history to the input of the LLM. Default: None
- Other parameters are inherited from the default configuration of ChatAtomicFlow (see the flow card of ChatAtomicFlowModule).

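For orientation, the same overrides can be exercised directly from Python. The following is a minimal sketch, assuming `instantiate_from_default_config` accepts keyword overrides for the keys above (the demo.yaml and run.py files below perform the same wiring through Hydra):

```python
import os

from flows.backends.api_info import ApiInfo

# Hypothetical direct instantiation with a custom backend configuration.
overrides = {
    "backend": {
        "api_infos": [ApiInfo(backend_used="openai", api_key=os.getenv("OPENAI_API_KEY"))],
        "model_name": "gpt-4-vision-preview",
        "temperature": 0.3,
        "max_tokens": 2000,
    },
}
flow = VisionAtomicFlow.instantiate_from_default_config(**overrides)
```
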
*Input Interface Initialized (Expected input the first time in flow)*:

- `query` (str): The textual query to run the model on.
- `data` (Dict[str, Any]): The data (images or video) to run the model on. It can contain the following keys:
  - `images` (List[Dict[str, Any]]): A list of images to run the model on. Each image is a dictionary that contains the following keys:
    - `type` (str): The type of the image. It can be "local_path" or "url".
    - `image` (str): The image. If type is "local_path", it is a local path to the image. If type is "url", it is a URL to the image.
  - `video` (Dict[str, Any]): A video to run the model on. It is a dictionary that contains the following keys:
    - `video_path` (str): The path to the video.
    - `resize` (int): The target size to which the frames of the video are resized.
    - `frame_step_size` (int): The step size between the frames of the video (to send to the model).
    - `start_frame` (int): The first frame of the video (to send to the model).
    - `end_frame` (int): The last frame of the video (to send to the model).

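Concretely, a first-call input that satisfies this interface looks like the following (adapted from run.py below):

```python
# A single-sample input: one textual query plus one URL image.
data = {
    "id": 0,
    "query": "What's in this image?",
    "data": {
        "images": [
            {"type": "url",
             "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"},
        ],
        # A video can be passed instead of (or alongside) images:
        # "video": {"video_path": "PATH TO YOUR LOCAL VIDEO", "resize": 768,
        #           "frame_step_size": 30, "start_frame": 0, "end_frame": None},
    },
}
```
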
*Input Interface (Expected input after the first time in flow)*:

- `query` (str): The textual query to run the model on.
- `data` (Dict[str, Any]): The data (images or video) to run the model on. It can contain the following keys:
  - `images` (List[Dict[str, Any]]): A list of images to run the model on. Each image is a dictionary that contains the following keys:
    - `type` (str): The type of the image. It can be "local_path" or "url".
    - `image` (str): The image. If type is "local_path", it is a local path to the image. If type is "url", it is a URL to the image.
  - `video` (Dict[str, Any]): A video to run the model on. It is a dictionary that contains the following keys:
    - `video_path` (str): The path to the video.
    - `resize` (int): The target size to which the frames of the video are resized.
    - `frame_step_size` (int): The step size between the frames of the video (to send to the model).
    - `start_frame` (int): The first frame of the video (to send to the model).
    - `end_frame` (int): The last frame of the video (to send to the model).

*Output Interface*:

- `api_output` (str): The API output of the flow for the given query and data.

<a id="VisionAtomicFlow.VisionAtomicFlow.get_image"></a>
|
97 |
+
|
98 |
+
#### get\_image
|
99 |
+
|
100 |
+
```python
|
101 |
+
@staticmethod
|
102 |
+
def get_image(image)
|
103 |
+
```
|
104 |
+
|
105 |
+
This method returns an image in the appropriate format for API.
|
106 |
+
|
107 |
+
**Arguments**:
|
108 |
+
|
109 |
+
- `image` (`Dict[str, Any]`): The image dictionary.
|
110 |
+
|
111 |
+
**Returns**:
|
112 |
+
|
113 |
+
`Dict[str, Any]`: The image url.
|
114 |
+
|
115 |
+
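Only fragments of the implementation appear in the diff below, so the following is a hedged sketch rather than the verbatim code: it assumes URL images are passed through unchanged while local images are base64-encoded into a data URL, which is what the visible `extension_dict` and `encode_image` fragments suggest.

```python
from flows.utils.general_helpers import encode_image  # same helper imported by VisionAtomicFlow.py

def get_image_sketch(image):
    """Sketch of get_image: build an OpenAI-style image_url entry."""
    if image["type"] == "url":
        url = image["image"]
    else:  # "local_path"
        extension = image["image"].split(".")[-1].lower()
        # Map the file extension to a MIME subtype (e.g. jpg -> jpeg).
        mime = {"jpg": "jpeg", "jpeg": "jpeg", "png": "png"}.get(extension, "jpeg")
        url = f"data:image/{mime};base64,{encode_image(image['image'])}"
    return {"type": "image_url", "image_url": {"url": url}}
```
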
<a id="VisionAtomicFlow.VisionAtomicFlow.get_video"></a>
|
116 |
+
|
117 |
+
#### get\_video
|
118 |
+
|
119 |
+
```python
|
120 |
+
@staticmethod
|
121 |
+
def get_video(video)
|
122 |
+
```
|
123 |
+
|
124 |
+
This method returns the video in the appropriate format for API.
|
125 |
+
|
126 |
+
**Arguments**:
|
127 |
+
|
128 |
+
- `video` (`Dict[str, Any]`): The video dictionary.
|
129 |
+
|
130 |
+
**Returns**:
|
131 |
+
|
132 |
+
`Dict[str, Any]`: The video url.
|
133 |
+
|
134 |
+
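Again, only the parameter-reading fragment is visible in the diff below; this is a plausible sketch of the frame sampling, assuming a standard cv2 read loop and `encode_from_buffer` producing base64-encoded JPEG frames:

```python
import cv2

from flows.utils.general_helpers import encode_from_buffer  # same helper imported by VisionAtomicFlow.py

def get_video_sketch(video):
    """Sketch of get_video: sample, resize, and base64-encode frames."""
    video_path = video["video_path"]
    resize = video.get("resize", 768)
    frame_step_size = video.get("frame_step_size", 10)
    start_frame = video.get("start_frame", 0)
    end_frame = video.get("end_frame", None)

    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while capture.isOpened():
        success, frame = capture.read()
        if not success or (end_frame is not None and index > end_frame):
            break
        if index >= start_frame and (index - start_frame) % frame_step_size == 0:
            height, width = frame.shape[:2]
            scale = resize / max(height, width)  # assumed: longest side scaled to `resize`
            frame = cv2.resize(frame, (int(width * scale), int(height * scale)))
            _, buffer = cv2.imencode(".jpg", frame)
            frames.append(encode_from_buffer(buffer))
        index += 1
    capture.release()
    return frames
```
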
<a id="VisionAtomicFlow.VisionAtomicFlow.get_user_message"></a>
|
135 |
+
|
136 |
+
#### get\_user\_message
|
137 |
+
|
138 |
+
```python
|
139 |
+
@staticmethod
|
140 |
+
def get_user_message(prompt_template, input_data: Dict[str, Any])
|
141 |
+
```
|
142 |
+
|
143 |
+
This method constructs the user message to be passed to the API.
|
144 |
+
|
145 |
+
**Arguments**:
|
146 |
+
|
147 |
+
- `prompt_template` (`PromptTemplate`): The prompt template to use.
|
148 |
+
- `input_data` (`Dict[str, Any]`): The input data.
|
149 |
+
|
150 |
+
**Returns**:
|
151 |
+
|
152 |
+
`Dict[str, Any]`: The constructed user message (images , videos and text).
|
153 |
+
|
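Putting the pieces together, the fragments visible in the diff below (`content = VisionAtomicFlow._get_message(...)`, `media_data = input_data["data"]`, `if "video" in media_data:`) suggest the following shape for the assembled message; treat it as a sketch, not the verbatim implementation:

```python
def get_user_message_sketch(prompt_template, input_data):
    """Sketch of get_user_message: text first, then video frames and/or images."""
    content = VisionAtomicFlow._get_message(prompt_template=prompt_template, input_data=input_data)
    media_data = input_data["data"]
    if "video" in media_data:
        content.extend(
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{frame}"}}
            for frame in VisionAtomicFlow.get_video(media_data["video"])
        )
    if "images" in media_data:
        content.extend(VisionAtomicFlow.get_image(image) for image in media_data["images"])
    return content
```
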
VisionAtomicFlow.py
CHANGED
@@ -1,14 +1,96 @@
 
 from typing import Dict, Any
-from flow_modules.aiflows.
+from flow_modules.aiflows.ChatFlowModule import ChatAtomicFlow
 from flows.utils.general_helpers import encode_image, encode_from_buffer
 import cv2
 
 
-class VisionAtomicFlow(OpenAIChatAtomicFlow):
+class VisionAtomicFlow(ChatAtomicFlow):
+    """This class implements the atomic flow for the VisionFlowModule. It is a flow that, given a textual input and a set of images and/or videos, generates a textual output.
+    It uses the litellm library as a backend. See https://docs.litellm.ai/docs/providers for supported models and APIs.
+
+    [The remainder of the class docstring is the flow card shown in README.md above (*Configuration Parameters*, *Input Interface Initialized*, *Input Interface*, *Output Interface*); elided here to avoid duplication.]
+    """
 
     @staticmethod
     def get_image(image):
+        """This method returns an image in the appropriate format for the API.
+
+        :param image: The image dictionary.
+        :type image: Dict[str, Any]
+        :return: The image url.
+        :rtype: Dict[str, Any]
+        """
         extension_dict = {
             "jpg": "jpeg",
             "jpeg": "jpeg",
@@ -34,6 +116,13 @@ class VisionAtomicFlow(OpenAIChatAtomicFlow):
 
     @staticmethod
     def get_video(video):
+        """This method returns the video in the appropriate format for the API.
+
+        :param video: The video dictionary.
+        :type video: Dict[str, Any]
+        :return: The video url.
+        :rtype: Dict[str, Any]
+        """
         video_path = video["video_path"]
         resize = video.get("resize", 768)
         frame_step_size = video.get("frame_step_size", 10)
@@ -52,6 +141,15 @@ class VisionAtomicFlow(OpenAIChatAtomicFlow):
 
     @staticmethod
     def get_user_message(prompt_template, input_data: Dict[str, Any]):
+        """This method constructs the user message to be passed to the API.
+
+        :param prompt_template: The prompt template to use.
+        :type prompt_template: PromptTemplate
+        :param input_data: The input data.
+        :type input_data: Dict[str, Any]
+        :return: The constructed user message (images, videos, and text).
+        :rtype: Dict[str, Any]
+        """
         content = VisionAtomicFlow._get_message(prompt_template=prompt_template, input_data=input_data)
         media_data = input_data["data"]
         if "video" in media_data:
@@ -63,6 +161,15 @@ class VisionAtomicFlow(OpenAIChatAtomicFlow):
 
     @staticmethod
     def _get_message(prompt_template, input_data: Dict[str, Any]):
+        """This method constructs the textual message to be passed to the API.
+
+        :param prompt_template: The prompt template to use.
+        :type prompt_template: PromptTemplate
+        :param input_data: The input data.
+        :type input_data: Dict[str, Any]
+        :return: The constructed textual message.
+        :rtype: Dict[str, Any]
+        """
         template_kwargs = {}
         for input_variable in prompt_template.input_variables:
             template_kwargs[input_variable] = input_data[input_variable]
@@ -70,6 +177,13 @@ class VisionAtomicFlow(OpenAIChatAtomicFlow):
         return [{"type": "text", "text": msg_content}]
 
     def _process_input(self, input_data: Dict[str, Any]):
+        """This method processes the input data (prepares the messages to send to the API).
+
+        :param input_data: The input data.
+        :type input_data: Dict[str, Any]
+        :return: The processed input data.
+        :rtype: Dict[str, Any]
+        """
         if self._is_conversation_initialized():
             # Construct the message using the human message prompt template
             user_message_content = self.get_user_message(self.human_message_prompt_template, input_data)

VisionAtomicFlow.yaml
CHANGED
@@ -1,4 +1,6 @@
-
+name: "VisionAtomicFlow"
+description: "A flow that, given a textual input, and a set of images and/or videos, generates a textual output."
+
 enable_cache: True
 
 n_api_retries: 6
@@ -30,20 +32,22 @@ human_message_prompt_template:
   template: "{{query}}"
   input_variables:
     - "query"
+
 input_interface_initialized:
   - "query"
   - "data"
 
-query_message_prompt_template:
-  _target_: flows.prompt_template.JinjaPrompt
-
-
 previous_messages:
   first_k: null # Note that the first message is the system prompt
   last_k: null
 
-
-
+input_interface:
+  - "query"
+  - "data"
+
+input_interface_non_initialized:
+  - "question"
+  - "data"
 
 output_interface:
   - "api_output"

__init__.py
CHANGED
@@ -1,6 +1,6 @@
 # ~~~ Specify the dependencies ~~
 dependencies = [
-    {"url": "aiflows/
+    {"url": "aiflows/ChatFlowModule", "revision": "a749ad10ed39776ba6721c37d0dc22af49ca0f17"}
 ]
 from flows import flow_verse
 flow_verse.sync_dependencies(dependencies)

demo.yaml
ADDED
@@ -0,0 +1,20 @@
flow:
  _target_: aiflows.VisionFlowModule.VisionAtomicFlow.instantiate_from_default_config
  name: "Demo Vision Flow"
  description: "A flow that, given a textual input, and a set of images and/or videos, generates a textual output."
  backend:
    api_infos: ???

  system_message_prompt_template:
    template: |2-
      You are a helpful chatbot that truthfully answers questions.
    input_variables: []
    partial_variables: {}

  init_human_message_prompt_template:
    template: |2-
      {{query}}
    input_variables: ["query"]
    partial_variables: {}

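The two templates above are plain Jinja; a minimal illustration of how the `{{query}}` placeholder is filled, using jinja2 directly rather than the flows JinjaPrompt wrapper:

```python
from jinja2 import Template

# Render the init_human_message_prompt_template from demo.yaml.
init_human_template = Template("{{query}}")
print(init_human_template.render(query="What's in this image?"))
# -> What's in this image?
```
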
run.py
ADDED
@@ -0,0 +1,91 @@
import os

import hydra

from flows.flow_launchers import FlowLauncher
from flows.backends.api_info import ApiInfo
from flows.utils.general_helpers import read_yaml_file

from flows import logging
from flows.flow_cache import CACHING_PARAMETERS, clear_cache

CACHING_PARAMETERS.do_caching = False  # Set to False in order to disable caching
# clear_cache() # Uncomment this line to clear the cache

logging.set_verbosity_debug()  # Comment out this line to disable verbose logs

from flows import flow_verse

dependencies = [
    {"url": "aiflows/VisionFlowModule", "revision": os.getcwd()},
]
flow_verse.sync_dependencies(dependencies)

if __name__ == "__main__":
    # ~~~ Set the API information ~~~
    # OpenAI backend

    api_information = [ApiInfo(backend_used="openai",
                               api_key=os.getenv("OPENAI_API_KEY"))]

    # # Azure backend
    # api_information = ApiInfo(backend_used="azure",
    #                           api_base=os.getenv("AZURE_API_BASE"),
    #                           api_key=os.getenv("AZURE_OPENAI_KEY"),
    #                           api_version=os.getenv("AZURE_API_VERSION"))

    root_dir = "."
    cfg_path = os.path.join(root_dir, "demo.yaml")
    cfg = read_yaml_file(cfg_path)

    cfg["flow"]["backend"]["api_infos"] = api_information

    # ~~~ Instantiate the Flow ~~~
    flow_with_interfaces = {
        "flow": hydra.utils.instantiate(cfg["flow"], _recursive_=False, _convert_="partial"),
        "input_interface": (
            None
            if cfg.get("input_interface", None) is None
            else hydra.utils.instantiate(cfg["input_interface"], _recursive_=False)
        ),
        "output_interface": (
            None
            if cfg.get("output_interface", None) is None
            else hydra.utils.instantiate(cfg["output_interface"], _recursive_=False)
        ),
    }

    url_image = {"type": "url",
                 "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}

    local_image = {"type": "local_path", "image": "PATH TO YOUR LOCAL IMAGE"}

    video = {"video_path": "PATH TO YOUR LOCAL VIDEO", "resize": 768, "frame_step_size": 30, "start_frame": 0, "end_frame": None}

    # ~~~ Get the data ~~~

    ## FOR SINGLE IMAGE
    data = {"id": 0, "query": "What’s in this image?", "data": {"images": [url_image]}}  # This can be a list of samples

    ## FOR MULTIPLE IMAGES
    # data = {"id": 0, "question": "What are in these images? Is there any difference between them?", "data": {"images": [url_image, local_image]}}  # This can be a list of samples

    ## FOR VIDEO
    # data = {"id": 0,
    #         "question": "These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video.",
    #         "data": {"video": video}}  # This can be a list of samples

    # ~~~ Run inference ~~~
    path_to_output_file = None
    # path_to_output_file = "output.jsonl"  # Uncomment this line to save the output to disk

    _, outputs = FlowLauncher.launch(
        flow_with_interfaces=flow_with_interfaces,
        data=data,
        path_to_output_file=path_to_output_file,
    )

    # ~~~ Print the output ~~~
    flow_output_data = outputs[0]
    print(flow_output_data)
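With `OPENAI_API_KEY` exported, running `python run.py` from the module's root directory syncs the dependency on this module, instantiates the flow from demo.yaml, sends the single-image query above, and prints the resulting `api_output`.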