Multimodal `ToolCallSummaryMessage` & `FunctionExecutionResult` Support #6381

fang-d · 2025-04-24T08:18:08Z

fang-d
Apr 24, 2025

Currently, the return value of a function call appears to support only the `str` type. For VLMs, however, multimodal function calling is important. For example, it would let the model select images from the file system and analyze them on its own. The content of `ToolCallSummaryMessage` and `FunctionExecutionResult` should be `str | autogen_core.Image | list[str | autogen_core.Image]`.
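A minimal sketch of what the proposed content type could look like. `Image` below is a stand-in dataclass, not the library's real image class, and `MultimodalContent` and `normalize_content` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Image:
    """Stand-in for the library's image type (assumption, not the real class)."""

    data: bytes


# Proposed content type: a string, an image, or a mixed list of both.
MultimodalContent = Union[str, Image, List[Union[str, Image]]]


def normalize_content(content: MultimodalContent) -> List[Union[str, Image]]:
    """Return the content as a flat list of parts so callers handle one shape."""
    if isinstance(content, (str, Image)):
        return [content]
    return list(content)
```

Normalizing to a list means downstream code only has to iterate over parts, whichever of the three shapes a tool returned.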

ekzhu · 2025-04-25T22:56:41Z

ekzhu
Apr 25, 2025
Maintainer

You may want to take a look at the new Workbench feature coming in the next release: https://microsoft.github.io/autogen/dev/user-guide/core-user-guide/components/workbench.html. The return value of a tool call in a workbench can be an image.

1 reply

fang-d Apr 27, 2025
Author

Hi @ekzhu, thanks for your reply. This feature is awesome!

However, it seems that the default workbench, `StaticWorkbench`, still assumes that every tool returns a string (or an object that can be converted to a string).

autogen/python/packages/autogen-core/src/autogen_core/tools/_static_workbench.py

Lines 55 to 64 in 63c791d

    
    try:
        result_future = asyncio.ensure_future(tool.run_json(arguments, cancellation_token))
        cancellation_token.link_future(result_future)
        result = await result_future
        is_error = False
    except Exception as e:
        result = str(e)
        is_error = True
    result_str = tool.return_value_as_string(result)
    return ToolResult(name=tool.name, result=[TextResultContent(content=result_str)], is_error=is_error)

In addition, `AssistantAgent` also assumes that all tools return a string, because it flattens the workbench result with `to_text()`:

autogen/python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py

Lines 1300 to 1314 in 63c791d

    
    # Handle normal tool call using workbench.
    result = await workbench.call_tool(
        name=tool_call.name,
        arguments=arguments,
        cancellation_token=cancellation_token,
    )
    return (
        tool_call,
        FunctionExecutionResult(
            content=result.to_text(),
            call_id=tool_call.id,
            is_error=result.is_error,
            name=tool_call.name,
        ),
    )

As a user, I think it would be better if `autogen_core.Image` objects were supported by default in the next version.
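To make the request concrete, here is a rough sketch of a call path that keeps images as typed parts instead of stringifying them. It is not the AutoGen implementation. `Image`, `TextPart`, `ImagePart`, `SketchToolResult`, `to_parts`, and `call_tool_sketch` are all hypothetical stand-ins, and the sketch is synchronous for brevity, whereas the two snippets above are async.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Union


@dataclass(frozen=True)
class Image:
    """Stand-in for the library's image type (assumption)."""

    data: bytes


@dataclass(frozen=True)
class TextPart:
    content: str


@dataclass(frozen=True)
class ImagePart:
    content: Image


@dataclass
class SketchToolResult:
    """Hypothetical result container holding typed parts."""

    name: str
    result: List[Union[TextPart, ImagePart]]
    is_error: bool = False


def to_parts(value: Any) -> List[Union[TextPart, ImagePart]]:
    """Wrap a raw tool return value without stringifying images."""
    items = value if isinstance(value, (list, tuple)) else [value]
    parts: List[Union[TextPart, ImagePart]] = []
    for item in items:
        if isinstance(item, Image):
            parts.append(ImagePart(item))
        else:
            # Anything that is not an image falls back to its string form.
            parts.append(TextPart(str(item)))
    return parts


def call_tool_sketch(name: str, func: Callable[..., Any], *args: Any) -> SketchToolResult:
    """Run a tool and report errors as text, mirroring the try/except above."""
    try:
        return SketchToolResult(name, to_parts(func(*args)))
    except Exception as e:
        return SketchToolResult(name, [TextPart(str(e))], is_error=True)
```

With this shape, an agent could forward the parts to a model client that accepts images, rather than collapsing everything into a single string.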

fang-d · 2025-05-06T04:30:11Z

fang-d
May 6, 2025
Author

It seems that the current OpenAI models do not support function call results that contain images, so I will close this discussion.

0 replies
