
Error in Stream in Runner.run_streamed() with LitellmModel(Model) class #601


Closed
sumit-lightringai opened this issue Apr 25, 2025 · 2 comments
Labels
bug Something isn't working needs-more-info Waiting for a reply/more info from the author

Comments

@sumit-lightringai

Please read this first

  • Have you read the docs? Agents SDK docs Yes
  • Have you searched for related issues? Yes, others may have faced similar issues.

Describe the bug

Runner.run_streamed() does not produce a proper event stream when agents use LitellmModel (tested with "openai/gpt-4o"). When running:

async def main():
    result = Runner.run_streamed(triage_agent, message)
    async for event in result.stream_events():
        print(event)

asyncio.run(main())

Only a single AgentUpdatedStreamEvent object is printed, rather than a continuous stream of events.

Reproduction Code

Agent definitions:

history_tutor_agent = Agent(
    name="History Tutor",
    handoff_description="Specialist agent for historical questions",
    instructions="You provide assistance with historical queries. Explain important events and context clearly.",
    model=LitellmModel(model=model, api_key=api_key),
)

math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
    model=LitellmModel(model=model, api_key=api_key),
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    model=LitellmModel(model=model, api_key=api_key),
    handoffs=[history_tutor_agent, math_tutor_agent]
)

Expected Behavior

A continuous stream of events should be produced as the agent processes the request, similar to how it works with default OpenAI models.

Current Behavior

Only a single event is output in the stream:

AgentUpdatedStreamEvent(new_agent=Agent(name='Triage Agent', instructions="You determine which agent to use based on the user's homework question", handoff_description=None, handoffs=[Agent(name='History Tutor', instructions='You provide assistance with historical queries. Explain important events and context clearly.', handoff_description='Specialist agent for historical questions', handoffs=[], model=<agents.extensions.models.litellm_model.LitellmModel object at 0x0000026AB77C67B0>, model_settings=ModelSettings(temperature=None, top_p=None, frequency_penalty=None, presence_penalty=None, tool_choice=None, parallel_tool_calls=None, truncation=None, max_tokens=None, reasoning=None, metadata=None, store=None, include_usage=None, extra_query=None, extra_body=None), tools=[], mcp_servers=[], mcp_config={}, input_guardrails=[], output_guardrails=[], output_type=None, hooks=None, tool_use_behavior='run_llm_again', reset_tool_choice=True), Agent(name='Math Tutor', instructions='You provide help with math problems. 
Explain your reasoning at each step and include examples', handoff_description='Specialist agent for math questions', handoffs=[], model=<agents.extensions.models.litellm_model.LitellmModel object at 0x0000026AB7828A50>, model_settings=ModelSettings(temperature=None, top_p=None, frequency_penalty=None, presence_penalty=None, tool_choice=None, parallel_tool_calls=None, truncation=None, max_tokens=None, reasoning=None, metadata=None, store=None, include_usage=None, extra_query=None, extra_body=None), tools=[], mcp_servers=[], mcp_config={}, input_guardrails=[], output_guardrails=[], output_type=None, hooks=None, tool_use_behavior='run_llm_again', reset_tool_choice=True)], model=<agents.extensions.models.litellm_model.LitellmModel object at 0x0000026AB7828E10>, model_settings=ModelSettings(temperature=None, top_p=None, frequency_penalty=None, presence_penalty=None, tool_choice=None, parallel_tool_calls=None, truncation=None, max_tokens=None, reasoning=None, metadata=None, store=None, include_usage=None, extra_query=None, extra_body=None), tools=[], mcp_servers=[], mcp_config={}, input_guardrails=[], output_guardrails=[], output_type=None, hooks=None, tool_use_behavior='run_llm_again', reset_tool_choice=True), type='agent_updated_stream_event')

Additional Information

  1. Runner.run() (non-streaming version) works correctly with the same LitellmModel agents.
  2. When using default OpenAI models (without specifying model=LitellmModel), streaming works correctly:
history_tutor_agent = Agent(
    name="History Tutor",
    handoff_description="Specialist agent for historical questions",
    instructions="You provide assistance with historical queries. Explain important events and context clearly.",
)

math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[history_tutor_agent, math_tutor_agent]
)

This indicates the issue is specific to the LitellmModel integration in the Agents SDK's streaming functionality.

Environment Information

  • Tested with model: "openai/gpt-4o" via LitellmModel
  • Issue appears to be related to the interaction between LitellmModel and the streaming functionality
@sumit-lightringai sumit-lightringai added the bug Something isn't working label Apr 25, 2025
@DanieleMorotti
Contributor

Hi, I'm not able to reproduce your error; try the following script:

import asyncio

from agents import Agent, Runner
from agents.extensions.models.litellm_model import LitellmModel
from openai.types.responses import ResponseTextDeltaEvent


# Read the API key from a local file (strip a trailing newline, if any)
API_KEY = open("../../OPENAI_API_KEY.txt", "r").read().strip()


model = "openai/gpt-4o"

history_tutor_agent = Agent(
    name="History Tutor",
    handoff_description="Specialist agent for historical questions",
    instructions="You provide assistance with historical queries. Explain important events and context clearly.",
    model=LitellmModel(model=model, api_key=API_KEY),
)

math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
    model=LitellmModel(model=model, api_key=API_KEY),
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    model=LitellmModel(model=model, api_key=API_KEY),
    handoffs=[history_tutor_agent, math_tutor_agent]
)


async def main():
    result = Runner.run_streamed(triage_agent, "I want to solve the following equation: '2x^2 -32 = 16'")
    async for event in result.stream_events():
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)



if __name__ == "__main__":
    asyncio.run(main())

It seems to work properly for me.

@rm-openai rm-openai added the needs-more-info Waiting for a reply/more info from the author label Apr 25, 2025
@sumit-lightringai
Author

Thanks, I ran pip freeze and found I was using openai-agents v0.12. When running with the newer openai-agents v0.13 it works; v0.12 still has that bug 🙂.
