nbaldwin committed on
Commit
ae51174
1 Parent(s): c296fdd

readme + demo

Files changed (6):
  1. README.md +153 -3
  2. VisionAtomicFlow.py +116 -2
  3. VisionAtomicFlow.yaml +11 -7
  4. __init__.py +1 -1
  5. demo.yaml +20 -0
  6. run.py +91 -0
README.md CHANGED
@@ -1,3 +1,153 @@
- ---
- license: mit
- ---
+ # Table of Contents
+ 
+ * [VisionAtomicFlow](#VisionAtomicFlow)
+   * [VisionAtomicFlow](#VisionAtomicFlow.VisionAtomicFlow)
+     * [get\_image](#VisionAtomicFlow.VisionAtomicFlow.get_image)
+     * [get\_video](#VisionAtomicFlow.VisionAtomicFlow.get_video)
+     * [get\_user\_message](#VisionAtomicFlow.VisionAtomicFlow.get_user_message)
+ 
+ <a id="VisionAtomicFlow"></a>
+ 
+ # VisionAtomicFlow
+ 
+ <a id="VisionAtomicFlow.VisionAtomicFlow"></a>
+ 
+ ## VisionAtomicFlow Objects
+ 
+ ```python
+ class VisionAtomicFlow(ChatAtomicFlow)
+ ```
+ 
+ This class implements the atomic flow for the VisionFlowModule. It is a flow that, given a textual input and a set of images and/or videos, generates a textual output.
+ It uses the litellm library as a backend. See https://docs.litellm.ai/docs/providers for supported models and APIs.
+ 
+ *Configuration Parameters*:
+ 
+ - `name` (str): The name of the flow. Default: "VisionAtomicFlow"
+ - `description` (str): A description of the flow, used to generate its help message.
+   Default: "A flow that, given a textual input, and a set of images and/or videos, generates a textual output."
+ - `enable_cache` (bool): If True, the flow will use the cache. Default: True
+ - `n_api_retries` (int): The number of times to retry the API call in case of failure. Default: 6
+ - `wait_time_between_api_retries` (int): The time to wait between API retries, in seconds. Default: 20
+ - `system_name` (str): The name of the system. Default: "system"
+ - `user_name` (str): The name of the user. Default: "user"
+ - `assistant_name` (str): The name of the assistant. Default: "assistant"
+ - `backend` (Dict[str, Any]): The configuration of the backend, which is used to fetch API keys. Default: LiteLLMBackend with the
+   default parameters of ChatAtomicFlow (see the flow card of ChatAtomicFlowModule), except for the following parameters, whose default values are overwritten:
+   - `api_infos` (List[Dict[str, Any]]): The list of API infos. No default value; this parameter is required.
+   - `model_name` (Union[Dict[str,str],str]): The name of the model to use.
+     When using multiple API providers, model_name can be a dictionary of the form
+     {"provider_name": "model_name"}.
+     Default: "gpt-4-vision-preview" (the name must match a litellm model name; see https://docs.litellm.ai/docs/providers).
+   - `n` (int): The number of answers to generate. Default: 1
+   - `max_tokens` (int): The maximum number of tokens to generate. Default: 2000
+   - `temperature` (float): The sampling temperature to use. Default: 0.3
+   - `top_p` (float): An alternative to sampling with temperature; the model considers only the tokens within the top_p probability mass. Default: 0.2
+   - `frequency_penalty` (float): The higher this value, the less likely the model is to repeat itself. Default: 0.0
+   - `presence_penalty` (float): The higher this value, the more likely the model is to talk about new topics. Default: 0.0
+ - `system_message_prompt_template` (Dict[str,Any]): The template used to generate the system message.
+   By default it is of type flows.prompt_template.JinjaPrompt.
+   None of the parameters of the prompt are defined by default, so they need to be defined if one wants to use the system prompt.
+   Default parameters are defined in flows.prompt_template.jinja2_prompts.JinjaPrompt.
+ - `init_human_message_prompt_template` (Dict[str,Any]): The prompt template of the human/user message used to initialize the conversation
+   (the first time in). It is used to generate the human message and is passed as the user message to the LLM.
+   By default it is of type flows.prompt_template.JinjaPrompt. None of the parameters of the prompt are defined by default, so they need to be defined
+   if one wants to use the init_human_message_prompt_template. Default parameters are defined in flows.prompt_template.jinja2_prompts.JinjaPrompt.
+ - `previous_messages` (Dict[str,Any]): Defines which previous messages to include in the input of the LLM. Note that if `first_k` and `last_k` are both None,
+   all the messages of the flow's history are added to the input of the LLM. Default:
+   - `first_k` (int): If defined, adds the first_k earliest messages of the flow's chat history to the input of the LLM. Default: None
+   - `last_k` (int): If defined, adds the last_k latest messages of the flow's chat history to the input of the LLM. Default: None
+ - Other parameters are inherited from the default configuration of ChatAtomicFlow (see the flow card of ChatAtomicFlowModule).
+ 
+ *Input Interface Initialized (expected input the first time in the flow)*:
+ 
+ - `query` (str): The textual query to run the model on.
+ - `data` (Dict[str, Any]): The data (images or video) to run the model on. It can contain the following keys:
+   - `images` (List[Dict[str, Any]]): A list of images to run the model on. Each image is a dictionary with the following keys:
+     - `type` (str): The type of the image. It can be "local_path" or "url".
+     - `image` (str): The image. If type is "local_path", it is a local path to the image; if type is "url", it is a URL to the image.
+   - `video` (Dict[str, Any]): A video to run the model on. It is a dictionary with the following keys:
+     - `video_path` (str): The path to the video.
+     - `resize` (int): The resize to apply to the frames of the video.
+     - `frame_step_size` (int): The step size between the frames of the video (to send to the model).
+     - `start_frame` (int): The first frame of the video (to send to the model).
+     - `end_frame` (int): The last frame of the video (to send to the model).
+ 
+ *Input Interface (expected input after the first time in the flow)*:
+ 
+ - `query` (str): The textual query to run the model on.
+ - `data` (Dict[str, Any]): The data (images or video) to run the model on. It can contain the following keys:
+   - `images` (List[Dict[str, Any]]): A list of images to run the model on. Each image is a dictionary with the following keys:
+     - `type` (str): The type of the image. It can be "local_path" or "url".
+     - `image` (str): The image. If type is "local_path", it is a local path to the image; if type is "url", it is a URL to the image.
+   - `video` (Dict[str, Any]): A video to run the model on. It is a dictionary with the following keys:
+     - `video_path` (str): The path to the video.
+     - `resize` (int): The resize to apply to the frames of the video.
+     - `frame_step_size` (int): The step size between the frames of the video (to send to the model).
+     - `start_frame` (int): The first frame of the video (to send to the model).
+     - `end_frame` (int): The last frame of the video (to send to the model).
+ 
+ *Output Interface*:
+ 
+ - `api_output` (str): The API output of the flow for the given query and data.
+ 
+ <a id="VisionAtomicFlow.VisionAtomicFlow.get_image"></a>
+ 
+ #### get\_image
+ 
+ ```python
+ @staticmethod
+ def get_image(image)
+ ```
+ 
+ This method returns an image in the appropriate format for the API.
+ 
+ **Arguments**:
+ 
+ - `image` (`Dict[str, Any]`): The image dictionary.
+ 
+ **Returns**:
+ 
+ `Dict[str, Any]`: The image URL.
+ 
+ <a id="VisionAtomicFlow.VisionAtomicFlow.get_video"></a>
+ 
+ #### get\_video
+ 
+ ```python
+ @staticmethod
+ def get_video(video)
+ ```
+ 
+ This method returns the video in the appropriate format for the API.
+ 
+ **Arguments**:
+ 
+ - `video` (`Dict[str, Any]`): The video dictionary.
+ 
+ **Returns**:
+ 
+ `Dict[str, Any]`: The video URL.
+ 
+ <a id="VisionAtomicFlow.VisionAtomicFlow.get_user_message"></a>
+ 
+ #### get\_user\_message
+ 
+ ```python
+ @staticmethod
+ def get_user_message(prompt_template, input_data: Dict[str, Any])
+ ```
+ 
+ This method constructs the user message to be passed to the API.
+ 
+ **Arguments**:
+ 
+ - `prompt_template` (`PromptTemplate`): The prompt template to use.
+ - `input_data` (`Dict[str, Any]`): The input data.
+ 
+ **Returns**:
+ 
+ `Dict[str, Any]`: The constructed user message (images, videos, and text).
VisionAtomicFlow.py CHANGED
@@ -1,14 +1,96 @@
  
  from typing import Dict, Any
- from flow_modules.aiflows.OpenAIChatFlowModule import OpenAIChatAtomicFlow
+ from flow_modules.aiflows.ChatFlowModule import ChatAtomicFlow
  from flows.utils.general_helpers import encode_image,encode_from_buffer
  import cv2
  
  
- class VisionAtomicFlow(OpenAIChatAtomicFlow):
- 
+ class VisionAtomicFlow(ChatAtomicFlow):
+     """ This class implements the atomic flow for the VisionFlowModule. It is a flow that, given a textual input and a set of images and/or videos, generates a textual output.
+     It uses the litellm library as a backend. See https://docs.litellm.ai/docs/providers for supported models and APIs.
+ 
+     *Configuration Parameters*:
+ 
+     - `name` (str): The name of the flow. Default: "VisionAtomicFlow"
+     - `description` (str): A description of the flow, used to generate its help message.
+       Default: "A flow that, given a textual input, and a set of images and/or videos, generates a textual output."
+     - `enable_cache` (bool): If True, the flow will use the cache. Default: True
+     - `n_api_retries` (int): The number of times to retry the API call in case of failure. Default: 6
+     - `wait_time_between_api_retries` (int): The time to wait between API retries, in seconds. Default: 20
+     - `system_name` (str): The name of the system. Default: "system"
+     - `user_name` (str): The name of the user. Default: "user"
+     - `assistant_name` (str): The name of the assistant. Default: "assistant"
+     - `backend` (Dict[str, Any]): The configuration of the backend, which is used to fetch API keys. Default: LiteLLMBackend with the
+       default parameters of ChatAtomicFlow (see the flow card of ChatAtomicFlowModule), except for the following parameters, whose default values are overwritten:
+       - `api_infos` (List[Dict[str, Any]]): The list of API infos. No default value; this parameter is required.
+       - `model_name` (Union[Dict[str,str],str]): The name of the model to use.
+         When using multiple API providers, model_name can be a dictionary of the form
+         {"provider_name": "model_name"}.
+         Default: "gpt-4-vision-preview" (the name must match a litellm model name; see https://docs.litellm.ai/docs/providers).
+       - `n` (int): The number of answers to generate. Default: 1
+       - `max_tokens` (int): The maximum number of tokens to generate. Default: 2000
+       - `temperature` (float): The sampling temperature to use. Default: 0.3
+       - `top_p` (float): An alternative to sampling with temperature; the model considers only the tokens within the top_p probability mass. Default: 0.2
+       - `frequency_penalty` (float): The higher this value, the less likely the model is to repeat itself. Default: 0.0
+       - `presence_penalty` (float): The higher this value, the more likely the model is to talk about new topics. Default: 0.0
+     - `system_message_prompt_template` (Dict[str,Any]): The template used to generate the system message.
+       By default it is of type flows.prompt_template.JinjaPrompt.
+       None of the parameters of the prompt are defined by default, so they need to be defined if one wants to use the system prompt.
+       Default parameters are defined in flows.prompt_template.jinja2_prompts.JinjaPrompt.
+     - `init_human_message_prompt_template` (Dict[str,Any]): The prompt template of the human/user message used to initialize the conversation
+       (the first time in). It is used to generate the human message and is passed as the user message to the LLM.
+       By default it is of type flows.prompt_template.JinjaPrompt. None of the parameters of the prompt are defined by default, so they need to be defined
+       if one wants to use the init_human_message_prompt_template. Default parameters are defined in flows.prompt_template.jinja2_prompts.JinjaPrompt.
+     - `previous_messages` (Dict[str,Any]): Defines which previous messages to include in the input of the LLM. Note that if `first_k` and `last_k` are both None,
+       all the messages of the flow's history are added to the input of the LLM. Default:
+       - `first_k` (int): If defined, adds the first_k earliest messages of the flow's chat history to the input of the LLM. Default: None
+       - `last_k` (int): If defined, adds the last_k latest messages of the flow's chat history to the input of the LLM. Default: None
+     - Other parameters are inherited from the default configuration of ChatAtomicFlow (see the flow card of ChatAtomicFlowModule).
+ 
+     *Input Interface Initialized (expected input the first time in the flow)*:
+ 
+     - `query` (str): The textual query to run the model on.
+     - `data` (Dict[str, Any]): The data (images or video) to run the model on. It can contain the following keys:
+       - `images` (List[Dict[str, Any]]): A list of images to run the model on. Each image is a dictionary with the following keys:
+         - `type` (str): The type of the image. It can be "local_path" or "url".
+         - `image` (str): The image. If type is "local_path", it is a local path to the image; if type is "url", it is a URL to the image.
+       - `video` (Dict[str, Any]): A video to run the model on. It is a dictionary with the following keys:
+         - `video_path` (str): The path to the video.
+         - `resize` (int): The resize to apply to the frames of the video.
+         - `frame_step_size` (int): The step size between the frames of the video (to send to the model).
+         - `start_frame` (int): The first frame of the video (to send to the model).
+         - `end_frame` (int): The last frame of the video (to send to the model).
+ 
+     *Input Interface (expected input after the first time in the flow)*:
+ 
+     - `query` (str): The textual query to run the model on.
+     - `data` (Dict[str, Any]): The data (images or video) to run the model on. It can contain the following keys:
+       - `images` (List[Dict[str, Any]]): A list of images to run the model on. Each image is a dictionary with the following keys:
+         - `type` (str): The type of the image. It can be "local_path" or "url".
+         - `image` (str): The image. If type is "local_path", it is a local path to the image; if type is "url", it is a URL to the image.
+       - `video` (Dict[str, Any]): A video to run the model on. It is a dictionary with the following keys:
+         - `video_path` (str): The path to the video.
+         - `resize` (int): The resize to apply to the frames of the video.
+         - `frame_step_size` (int): The step size between the frames of the video (to send to the model).
+         - `start_frame` (int): The first frame of the video (to send to the model).
+         - `end_frame` (int): The last frame of the video (to send to the model).
+ 
+     *Output Interface*:
+ 
+     - `api_output` (str): The API output of the flow for the given query and data.
+ 
+     """
      @staticmethod
      def get_image(image):
+         """ This method returns an image in the appropriate format for the API.
+ 
+         :param image: The image dictionary.
+         :type image: Dict[str, Any]
+         :return: The image URL.
+         :rtype: Dict[str, Any]
+         """
          extension_dict = {
              "jpg": "jpeg",
              "jpeg": "jpeg",
@@ -34,6 +116,13 @@ class VisionAtomicFlow(OpenAIChatAtomicFlow):
  
      @staticmethod
      def get_video(video):
+         """ This method returns the video in the appropriate format for the API.
+ 
+         :param video: The video dictionary.
+         :type video: Dict[str, Any]
+         :return: The video URL.
+         :rtype: Dict[str, Any]
+         """
          video_path = video["video_path"]
          resize = video.get("resize",768)
          frame_step_size = video.get("frame_step_size",10)
@@ -52,6 +141,15 @@ class VisionAtomicFlow(OpenAIChatAtomicFlow):
  
      @staticmethod
      def get_user_message(prompt_template, input_data: Dict[str, Any]):
+         """ This method constructs the user message to be passed to the API.
+ 
+         :param prompt_template: The prompt template to use.
+         :type prompt_template: PromptTemplate
+         :param input_data: The input data.
+         :type input_data: Dict[str, Any]
+         :return: The constructed user message (images, videos, and text).
+         :rtype: Dict[str, Any]
+         """
          content = VisionAtomicFlow._get_message(prompt_template=prompt_template,input_data=input_data)
          media_data = input_data["data"]
          if "video" in media_data:
@@ -63,6 +161,15 @@ class VisionAtomicFlow(OpenAIChatAtomicFlow):
  
      @staticmethod
      def _get_message(prompt_template, input_data: Dict[str, Any]):
+         """ This method constructs the textual message to be passed to the API.
+ 
+         :param prompt_template: The prompt template to use.
+         :type prompt_template: PromptTemplate
+         :param input_data: The input data.
+         :type input_data: Dict[str, Any]
+         :return: The constructed textual message.
+         :rtype: Dict[str, Any]
+         """
          template_kwargs = {}
          for input_variable in prompt_template.input_variables:
              template_kwargs[input_variable] = input_data[input_variable]
@@ -70,6 +177,13 @@ class VisionAtomicFlow(OpenAIChatAtomicFlow):
          return [{"type": "text", "text": msg_content}]
  
      def _process_input(self, input_data: Dict[str, Any]):
+         """ This method processes the input data (prepares the messages to send to the API).
+ 
+         :param input_data: The input data.
+         :type input_data: Dict[str, Any]
+         :return: The processed input data.
+         :rtype: Dict[str, Any]
+         """
          if self._is_conversation_initialized():
              # Construct the message using the human message prompt template
              user_message_content = self.get_user_message(self.human_message_prompt_template, input_data)
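
The diff elides most of `get_video`'s body. As a rough illustration of the technique it relies on, inferred from the visible imports (`cv2`, `encode_from_buffer`) and parameter defaults, the sketch below samples and base64-encodes video frames with OpenCV. The function name and the exact output format are assumptions, not the module's actual code.

```python
import base64
from typing import Any, Dict, List

import cv2  # OpenCV, also imported by VisionAtomicFlow.py


def sample_video_frames(video: Dict[str, Any]) -> List[str]:
    """Hypothetical stand-in for the frame sampling inside get_video.

    Keeps every `frame_step_size`-th frame between `start_frame` and
    `end_frame`, resizes the longer side to `resize` pixels, and returns
    base64-encoded JPEG strings (an assumed wire format).
    """
    cap = cv2.VideoCapture(video["video_path"])
    resize = video.get("resize", 768)        # default seen in the diff
    step = video.get("frame_step_size", 10)  # default seen in the diff
    start = video.get("start_frame", 0)
    end = video.get("end_frame")             # None = read until the end

    frames, index = [], 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok or (end is not None and index > end):
            break
        if index >= start and (index - start) % step == 0:
            h, w = frame.shape[:2]
            scale = resize / max(h, w)
            frame = cv2.resize(frame, (int(w * scale), int(h * scale)))
            ok, buffer = cv2.imencode(".jpg", frame)
            if ok:
                frames.append(base64.b64encode(buffer.tobytes()).decode("utf-8"))
        index += 1
    cap.release()
    return frames
```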
VisionAtomicFlow.yaml CHANGED
@@ -1,4 +1,6 @@
- # This is an abstract flow, therefore some required fields are not defined (and must be defined by the concrete flow)
+ name: "VisionAtomicFlow"
+ description: "A flow that, given a textual input, and a set of images and/or videos, generates a textual output."
+ 
  enable_cache: True
  
  n_api_retries: 6
@@ -30,20 +32,22 @@ human_message_prompt_template:
    template: "{{query}}"
    input_variables:
      - "query"
+ 
  input_interface_initialized:
    - "query"
    - "data"
  
- query_message_prompt_template:
-   _target_: flows.prompt_template.JinjaPrompt
- 
- 
  previous_messages:
    first_k: null # Note that the first message is the system prompt
    last_k: null
  
- demonstrations: null
- demonstrations_response_template: null
+ input_interface:
+   - "query"
+   - "data"
+ 
+ input_interface_non_initialized:
+   - "question"
+   - "data"
  
  output_interface:
    - "api_output"
__init__.py CHANGED
@@ -1,6 +1,6 @@
  # ~~~ Specify the dependencies ~~
  dependencies = [
-     {"url": "aiflows/OpenAIChatFlowModule", "revision": "eeec09b71e967ce426553e2300c5689f6ea6a662"}
+     {"url": "aiflows/ChatFlowModule", "revision": "a749ad10ed39776ba6721c37d0dc22af49ca0f17"}
  ]
  from flows import flow_verse
  flow_verse.sync_dependencies(dependencies)
demo.yaml ADDED
@@ -0,0 +1,20 @@
+ flow:
+   _target_: aiflows.VisionFlowModule.VisionAtomicFlow.instantiate_from_default_config
+   name: "Demo Vision Flow"
+   description: "A flow that, given a textual input, and a set of images and/or videos, generates a textual output."
+   backend:
+     api_infos: ???
+ 
+   system_message_prompt_template:
+     template: |2-
+       You are a helpful chatbot that truthfully answers questions.
+     input_variables: []
+     partial_variables: {}
+ 
+   init_human_message_prompt_template:
+     template: |2-
+       {{query}}
+     input_variables: ["query"]
+     partial_variables: {}
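
In `demo.yaml`, `api_infos: ???` is OmegaConf's marker for a mandatory value: instantiation fails unless the field is filled in. `run.py` below does exactly that at runtime; a minimal sketch of the pattern:

```python
import os

from flows.backends.api_info import ApiInfo
from flows.utils.general_helpers import read_yaml_file

# Load demo.yaml and fill in the mandatory ??? field before instantiating the flow.
cfg = read_yaml_file("demo.yaml")
cfg["flow"]["backend"]["api_infos"] = [
    ApiInfo(backend_used="openai", api_key=os.getenv("OPENAI_API_KEY"))
]
```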
run.py ADDED
@@ -0,0 +1,91 @@
+ import os
+ 
+ import hydra
+ 
+ from flows.flow_launchers import FlowLauncher
+ from flows.backends.api_info import ApiInfo
+ from flows.utils.general_helpers import read_yaml_file
+ 
+ from flows import logging
+ from flows.flow_cache import CACHING_PARAMETERS, clear_cache
+ 
+ CACHING_PARAMETERS.do_caching = False  # Set to True in order to enable caching
+ # clear_cache()  # Uncomment this line to clear the cache
+ 
+ logging.set_verbosity_debug()  # Comment out this line to disable verbose logs
+ 
+ from flows import flow_verse
+ 
+ dependencies = [
+     {"url": "aiflows/VisionFlowModule", "revision": os.getcwd()},
+ ]
+ flow_verse.sync_dependencies(dependencies)
+ 
+ if __name__ == "__main__":
+     # ~~~ Set the API information ~~~
+     # OpenAI backend
+     api_information = [ApiInfo(backend_used="openai",
+                                api_key=os.getenv("OPENAI_API_KEY"))]
+ 
+     # # Azure backend
+     # api_information = [ApiInfo(backend_used="azure",
+     #                            api_base=os.getenv("AZURE_API_BASE"),
+     #                            api_key=os.getenv("AZURE_OPENAI_KEY"),
+     #                            api_version=os.getenv("AZURE_API_VERSION"))]
+ 
+     root_dir = "."
+     cfg_path = os.path.join(root_dir, "demo.yaml")
+     cfg = read_yaml_file(cfg_path)
+ 
+     cfg["flow"]["backend"]["api_infos"] = api_information
+ 
+     # ~~~ Instantiate the Flow ~~~
+     flow_with_interfaces = {
+         "flow": hydra.utils.instantiate(cfg["flow"], _recursive_=False, _convert_="partial"),
+         "input_interface": (
+             None
+             if cfg.get("input_interface", None) is None
+             else hydra.utils.instantiate(cfg["input_interface"], _recursive_=False)
+         ),
+         "output_interface": (
+             None
+             if cfg.get("output_interface", None) is None
+             else hydra.utils.instantiate(cfg["output_interface"], _recursive_=False)
+         ),
+     }
+ 
+     url_image = {"type": "url",
+                  "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}
+ 
+     local_image = {"type": "local_path", "image": "PATH TO YOUR LOCAL IMAGE"}
+ 
+     video = {"video_path": "PATH TO YOUR LOCAL VIDEO", "resize": 768, "frame_step_size": 30, "start_frame": 0, "end_frame": None}
+ 
+     # ~~~ Get the data ~~~
+ 
+     ## FOR SINGLE IMAGE
+     data = {"id": 0, "query": "What’s in this image?", "data": {"images": [url_image]}}  # This can be a list of samples
+ 
+     ## FOR MULTIPLE IMAGES
+     # data = {"id": 0, "question": "What are in these images? Is there any difference between them?", "data": {"images": [url_image, local_image]}}  # This can be a list of samples
+ 
+     ## FOR VIDEO
+     # data = {"id": 0,
+     #         "question": "These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video.",
+     #         "data": {"video": video}}  # This can be a list of samples
+ 
+     # ~~~ Run inference ~~~
+     path_to_output_file = None
+     # path_to_output_file = "output.jsonl"  # Uncomment this line to save the output to disk
+ 
+     _, outputs = FlowLauncher.launch(
+         flow_with_interfaces=flow_with_interfaces,
+         data=data,
+         path_to_output_file=path_to_output_file,
+     )
+ 
+     # ~~~ Print the output ~~~
+     flow_output_data = outputs[0]
+     print(flow_output_data)
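
With `OPENAI_API_KEY` exported (or the Azure variables, if you switch to the commented-out backend), the demo runs as a plain script via `python run.py`: it syncs the flow dependencies, instantiates the flow from `demo.yaml`, and prints the first flow output for the selected sample. Uncomment the alternative `data` blocks to try multiple images or a video instead.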