
feat: Add "think" parameter for Ollama #1948


Open
wants to merge 2 commits into main

Conversation

@Ryzhtus (Contributor) commented Jun 14, 2025

Related Issues

Proposed Changes:

Added an additional think parameter in accordance with Ollama’s current interface, and included a test to cover this feature.
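
For reference, the new parameter maps onto the think flag exposed by Ollama's REST API in recent releases (a sketch against the raw API, not this PR's code; payload keys follow Ollama's docs):

import requests

# Minimal call against Ollama's /api/chat endpoint with thinking enabled.
payload = {
    "model": "qwen3:1.7b",  # a thinking-capable model
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "think": True,          # the flag this PR surfaces as a component parameter
    "stream": False,
}
data = requests.post("http://localhost:11434/api/chat", json=payload, timeout=120).json()
print(data["message"]["thinking"])  # intermediate reasoning
print(data["message"]["content"])   # final answer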

How did you test it?

I added unit tests

Checklist

@Ryzhtus Ryzhtus requested a review from a team as a code owner June 14, 2025 16:50
@Ryzhtus Ryzhtus requested review from anakin87 and removed request for a team June 14, 2025 16:50
@github-actions github-actions bot added the integration:ollama and type:documentation labels Jun 14, 2025
@Ryzhtus (Contributor, Author) commented Jun 14, 2025

@anakin87 Hi! What's your opinion on storing the thinking field in the _meta attribute of ChatMessage? In OllamaChatGenerator the response is converted to a ChatMessage object, and its _meta value is formatted to be compatible with the OpenAI API. The problem is that OpenAI's ChatCompletion API doesn't support storing reasoning messages, so there is no appropriate field for it, and adding one breaks compatibility to some extent. Still, having it would probably be useful for users who want to track the reasoning process of their LLMs.
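
For illustration, a minimal sketch of that conversion (the response shape and the "thinking" key follow Ollama's chat API; the helper name is hypothetical):

from typing import Any, Dict

from haystack.dataclasses import ChatMessage

def build_chat_message(ollama_response: Dict[str, Any]) -> ChatMessage:
    # Ollama nests the reply under "message"; when think=True,
    # thinking-capable models also return a "thinking" field there.
    message = ollama_response["message"]
    meta = {
        # keys kept OpenAI-compatible, as the generator does today
        "model": ollama_response.get("model"),
        "finish_reason": ollama_response.get("done_reason"),
        # extra, non-OpenAI key holding the intermediate reasoning
        "thinking": message.get("thinking"),
    }
    return ChatMessage.from_assistant(message["content"], meta=meta)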

@anakin87 (Member) left a comment

Thanks for working on this...

  • I agree with your approach. For the moment, putting the thinking output in ChatMessage._meta.thinking is reasonable.

  • Please rebase your branch, fix conflicts and run tests.

  • I left some other comments.
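
For context, with ChatMessage._meta.thinking a user would read the intermediate output roughly like this (a sketch; the meta key name follows the discussion above):

from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.ollama import OllamaChatGenerator

chat_generator = OllamaChatGenerator(model="qwen3:1.7b", think=True)
result = chat_generator.run([ChatMessage.from_user("Why is the sky blue?")])

reply = result["replies"][0]
print(reply.text)                  # the final answer
print(reply.meta.get("thinking"))  # the intermediate "thinking" output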

@@ -156,6 +161,7 @@ def __init__(
url: str = "http://localhost:11434",
generation_kwargs: Optional[Dict[str, Any]] = None,
timeout: int = 120,
think=False,
anakin87 (Member):

I would put this new parameter at the end, to make this change non-breaking.
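
Concretely, the non-breaking signature would look something like this (a sketch keeping only the parameters visible in the diff above):

def __init__(
    self,
    url: str = "http://localhost:11434",
    generation_kwargs: Optional[Dict[str, Any]] = None,
    timeout: int = 120,
    # ... other existing parameters stay in place ...
    think: bool = False,  # appended last, so existing positional calls keep working
):
    ...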

@@ -172,6 +178,8 @@ def __init__(
[Ollama docs](https://github.com/jmorganca/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values).
:param timeout:
The number of seconds before throwing a timeout error from the Ollama API.
:param think:
Enables the model's "thinking" process.
anakin87 (Member):

I would expand this explanation to something like

Suggested change:
- Enables the model's "thinking" process.
+ If True, the model will "think" before producing a response.
+ Only [thinking models](https://ollama.com/search?c=thinking) support this feature.
+ The intermediate "thinking" output can be found in the `meta` property of the returned `ChatMessage`.

@@ -36,6 +36,7 @@ def __init__(
template: Optional[str] = None,
raw: bool = False,
timeout: int = 120,
think: bool = False,
anakin87 (Member):

We are trying to introduce new features in the Chat Generators only. In the long run, we may deprecate Generators and keep only Chat Generators.

For this reason, I would not introduce support for thinking in Generators.

@@ -508,6 +508,17 @@ def test_run_with_chat_history(self):
city.lower() in response["replies"][-1].text.lower() for city in ["Manchester", "Birmingham", "Glasgow"]
)

@pytest.mark.integration
def test_live_run_with_thinking(self):
chat_generator = OllamaChatGenerator(model="qwen3:1.7b", think=True)
anakin87 (Member):

To use this model in an integration test, you should also change the following line:

LLM_FOR_TESTS: "llama3.2:3b"

However, I would recommend using qwen3:0.6b if possible: based on my experiments, it works quite well with our tests and, being very small, it speeds up download and inference times.
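
Putting both remarks together, the finished test might look roughly like this (a sketch; the "thinking" meta key follows the naming discussed above):

@pytest.mark.integration
def test_live_run_with_thinking(self):
    chat_generator = OllamaChatGenerator(model="qwen3:0.6b", think=True)
    message = ChatMessage.from_user("Briefly, why is the sky blue?")

    response = chat_generator.run([message])

    reply = response["replies"][0]
    assert reply.text  # a non-empty final answer
    assert reply.meta.get("thinking")  # intermediate reasoning exposed in meta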

Successfully merging this pull request may close these issues.

Add Ollama's Thinking capabilities