[BUG] Regression caused by v1.14.0 change in ActionStep.to_messages method #1224

Open
stephanlensky opened this issue Apr 18, 2025 · 2 comments · May be fixed by #1225
Labels
bug (Something isn't working)

Comments

@stephanlensky
Contributor

Hi! We are encountering a regression after updating to v1.14.0.

Specifically, it's due to this change from #1148, which modified the to_messages method to no longer include the tool call ID in the returned message:

-                            "text": (f"Call id: {self.tool_calls[0].id}\n" if self.tool_calls else "")
-                            + f"Observation:\n{self.observations}",
+                            "text": f"Observation:\n{self.observations}",

It's a little hacky, but we were parsing the "Call id" section for our custom Anthropic model.

Anthropic requires that you pass in the tool_use_id for tool results (see here), and parsing it from the string was the only way we could figure out how to retrieve that ID in the context of a custom model.

Code example

We were doing something like this:

import re
from typing import Any

from smolagents import Model, Tool
from smolagents.models import ChatMessage, MessageRole

class AnthropicModel(Model):
    def __call__(
        self,
        messages: list[dict[str, str]],
        stop_sequences: list[str] | None = None,
        grammar: str | None = None,
        tools_to_call_from: list[Tool] | None = None,
        **kwargs: Any,
    ) -> ChatMessage:
        # ... abbreviated code
        # this is the part where we construct the Anthropic-format messages
        # from the provided messages dict
        for message in messages:
            if message["role"] == MessageRole.TOOL_RESPONSE:
                # parse the call id out of message["content"][0]["text"]
                # by searching for the "Call id: ..." section
                match = re.search(r"Call id: (\S+)", message["content"][0]["text"])
                tool_use_id = match.group(1) if match else None

which no longer works.
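
For reference, the parsed ID was then fed into the Anthropic-format tool result, which requires it. A minimal sketch, assuming the tool_use_id parsed above and the remaining observation text in a hypothetical observation_text variable:

# Sketch only: tool_use_id comes from the "Call id: ..." parsing above;
# observation_text is the rest of the observation message.
anthropic_message = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": tool_use_id,  # Anthropic requires this to match the earlier tool_use id
            "content": observation_text,
        }
    ],
}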

@stephanlensky added the bug label Apr 18, 2025
@stephanlensky linked a pull request Apr 18, 2025 that will close this issue
@albertvillanova
Member

Thanks for reporting the regression and for the detailed report.

In principle, I'm OK with reverting the change to restore the tool_call_id in the short term. For future releases, do you have any suggestions on how we might better expose or preserve the tool call ID in a more robust or consistent way? Would be great to make this behavior more intentional and less prone to accidental breakage.

@stephanlensky
Contributor Author

stephanlensky commented Apr 23, 2025

Hey Albert, I think the main issue is that the smolagents API currently assumes models will want to consume tool calls and responses as raw text. We can see this here and here, where messages are transformed into messages with content type text.

However, for both the OpenAI and Anthropic APIs, you are supposed to return tool calls/responses using a special content type (function_call/function_call_output for OpenAI and tool_use/tool_result for Anthropic) which encodes information about the tool call in structured JSON.
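
For concreteness, here is roughly what the OpenAI Responses shapes look like (field names paraphrased from the provider docs, not an exact schema); the Anthropic tool_result equivalent is the block sketched earlier in this thread:

# Rough shapes only; see the OpenAI Responses API docs for the exact schema.
# The tool call emitted by the model:
openai_function_call = {
    "type": "function_call",
    "call_id": "call_abc123",
    "name": "get_weather",
    "arguments": '{"city": "Boston"}',
}
# The tool result sent back; call_id must match the call above:
openai_function_call_output = {
    "type": "function_call_output",
    "call_id": "call_abc123",
    "output": "Sunny, 22C",
}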

I think this need to transform the raw-text messages returned by smolagents into that structured format is the key thing making the current API difficult to work with.

Ideally, ActionStep.to_messages should serialize tool calls/observations into a structured format where the tool calls, tool results, and associated IDs can be accessed as individual fields in the resulting dictionary. For example, in to_messages you could handle self.tool_calls and self.observations like this:

if self.tool_calls is not None:
    messages.append(
        Message(
            role=MessageRole.TOOL_CALL,
            content=[
                {
                    "type": "tool_use",
                    "id": tool_call.id,
                    "name": tool_call.name,
                    "arguments": make_json_serializable(tool_call.arguments)
                }
                for tool_call in self.tool_calls
            ],
        )
    )

if self.observations is not None:
    messages.append(
        Message(
            role=MessageRole.TOOL_RESPONSE,
            content=[
                # ideally, we should have one content block per tool response to avoid hardcoding self.tool_calls[0].id
                # as-is, this will break if there is more than one tool response in a single message
                # unfortunately, since `self.observations` is just `str` instead of `list[str]` it is probably the best we can do
                {
                    "type": "tool_result",
                    "tool_use_id": self.tool_calls[0].id,
                    "content": self.observations,
                }
            ],
        )
    )

The exact format of these dictionaries is somewhat arbitrary as long as they contain all of the required information.

I used something close to Anthropic's format here, but you could just as easily model them after OpenAI/Hugging Face/LiteLLM/whatever instead. The idea is that consuming models could then adapt this dictionary to the exact format expected by their provider.
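
To sketch the consuming side under this proposal (nothing here exists in smolagents today; the block shape is the one proposed above), a custom Anthropic model could map the fields directly instead of parsing text:

# Sketch only: assumes to_messages emits the structured tool-response blocks proposed above.
def to_anthropic_tool_results(message: dict) -> list[dict]:
    return [
        {
            "type": "tool_result",
            "tool_use_id": block["tool_use_id"],  # read directly, no string parsing
            "content": block["content"],
        }
        for block in message["content"]
    ]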

Unfortunately, this would be a major breaking change, so I'm not sure if it would be feasible for you to implement.

Maybe one alternative would be to pass AgentMemory.steps directly to the model, in addition to the serialized messages? Then the model could use ActionStep directly to construct a message in the desired format. The downside is that it would drastically increase your API surface, though.
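
Purely as a hypothetical sketch (memory_steps is not an existing smolagents parameter), that could look something like this:

# Hypothetical sketch: smolagents does not pass memory steps to models today.
from smolagents.memory import ActionStep

class AnthropicModel(Model):
    def __call__(
        self,
        messages: list[dict[str, str]],
        memory_steps: list[ActionStep] | None = None,  # hypothetical extra argument
        **kwargs: Any,
    ) -> ChatMessage:
        # With ActionStep objects available, tool_call.id is a plain attribute
        # rather than something parsed out of message text.
        ...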
