Releases: deepset-ai/haystack

v2.12.2

14 Apr 13:56

🐛 Bug Fixes

  • Fix ChatMessage.from_dict to handle cases where optional fields like name and meta are missing.
  • Make Document's first-level fields take precedence over meta fields when flattening the dictionary representation.
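The tolerant deserialization can be pictured with a plain-Python sketch (this helper is illustrative, not Haystack's actual implementation): optional keys fall back to defaults instead of raising KeyError.

```python
# Illustrative sketch of tolerant deserialization: optional fields
# ("name", "meta") may be absent from the input dict without raising.
def chat_message_from_dict(data: dict) -> dict:
    return {
        "role": data["role"],                # required
        "content": data.get("content", []),  # defaulted here for the sketch
        "name": data.get("name"),            # optional -> None when missing
        "meta": data.get("meta", {}),        # optional -> empty dict when missing
    }

msg = chat_message_from_dict({"role": "user", "content": [{"text": "hi"}]})
print(msg["name"], msg["meta"])  # None {}
```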

v2.12.1

10 Apr 07:37

🐛 Bug Fixes

  • In Agent, we make sure state_schema is always initialized to contain 'messages'. Previously this happened only at run time, which is why pipeline.connect failed: output types are set at init time. Now the Agent correctly sets everything in state_schema (including messages by default) at init time.
  • In AsyncPipeline, the span tag name is updated from haystack.component.outputs to haystack.component.output. This matches the tag name used in Pipeline and is the tag name expected by our tracers.
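The init-time fix can be sketched in plain Python (AgentSketch is illustrative, not the actual Agent internals): the schema is completed in __init__, so connection types derived from it are already correct before any run.

```python
# Illustrative sketch: complete the schema at init time, not run time,
# so anything inspecting output types before execution already sees "messages".
class AgentSketch:
    def __init__(self, state_schema=None):
        schema = dict(state_schema or {})
        schema.setdefault("messages", {"type": list})  # always present at init
        self.state_schema = schema

agent = AgentSketch(state_schema={"documents": {"type": list}})
assert "messages" in agent.state_schema  # visible before any run() call
```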

v2.12.0

02 Apr 10:30

⭐️ Highlights

Agent Component with State Management

The Agent component enables tool-calling functionality with provider-agnostic chat model support and can be used as a standalone component or within a pipeline.
With SERPERDEV_API_KEY and OPENAI_API_KEY defined, a Web Search Agent is as simple as:

from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.websearch import SerperDevWebSearch
from haystack.dataclasses import ChatMessage
from haystack.tools.component_tool import ComponentTool

web_tool = ComponentTool(
    component=SerperDevWebSearch(),
)

agent = Agent(
    chat_generator=OpenAIChatGenerator(),
    tools=[web_tool],
)

result = agent.run(
    messages=[ChatMessage.from_user("Find information about Haystack by deepset")]
)

The Agent supports streaming responses, customizable exit conditions, and a flexible state management system that enables tools to share and modify data during execution:

agent = Agent(
    chat_generator=OpenAIChatGenerator(),
    tools=[web_tool, weather_tool],
    exit_conditions=["text", "weather_tool"],
    state_schema={...},
    streaming_callback=streaming_callback,
)
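As an illustration of the state management system, a state_schema maps state keys to type declarations that tools can read and write during the agent loop; the keys below are hypothetical, not a required layout:

```python
# Hypothetical state_schema: each entry declares the type of a shared
# state key that tools can read from and write to during execution.
state_schema = {
    "search_results": {"type": list},  # e.g. accumulated by the web tool
    "user_location": {"type": str},    # e.g. consumed by the weather tool
}
```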

SuperComponent for Reusable Pipelines

SuperComponent lets you wrap complex pipelines into reusable components that you can share across your applications. Just initialize a SuperComponent with a pipeline:

from haystack import Pipeline, SuperComponent

with open("pipeline.yaml", "r") as file:
    pipeline = Pipeline.load(file)

super_component = SuperComponent(pipeline)

That's not all! To showcase the benefits, three ready-made SuperComponents are available in haystack-experimental.
For example, there is a MultiFileConverter that wraps a pipeline with converters for CSV, DOCX, HTML, JSON, MD, PPTX, PDF, TXT, and XLSX. After installing the integration dependencies with pip install pypdf markdown-it-py mdit_plain trafilatura python-pptx python-docx jq openpyxl tabulate, you can run it with any of the supported file types as input:

from haystack_experimental.super_components.converters import MultiFileConverter

converter = MultiFileConverter()
converter.run(sources=["test.txt", "test.pdf"], meta={})

Here's an example of creating a custom SuperComponent from any Haystack pipeline:

from haystack import Pipeline, SuperComponent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.dataclasses.chat_message import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.dataclasses import Document

document_store = InMemoryDocumentStore()
documents = [
    Document(content="Paris is the capital of France."),
    Document(content="London is the capital of England."),
]
document_store.write_documents(documents)

prompt_template = [
    ChatMessage.from_user(
    '''
    According to the following documents:
    {% for document in documents %}
    {{document.content}}
    {% endfor %}
    Answer the given question: {{query}}
    Answer:
    '''
    )
]
prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables="*")

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", OpenAIChatGenerator())
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.messages")

# Create a super component with simplified input/output mapping
wrapper = SuperComponent(
    pipeline=pipeline,
    input_mapping={
        "query": ["retriever.query", "prompt_builder.query"],
    },
    output_mapping={"llm.replies": "replies"}
)

# Run the pipeline with simplified interface
result = wrapper.run(query="What is the capital of France?")
print(result)
# {'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>,
#  _content=[TextContent(text='The capital of France is Paris.')],...)

⬆️ Upgrade Notes

  • Updated ChatMessage serialization and deserialization. ChatMessage.to_dict() now returns a dictionary with the keys: role, content, meta, and name. ChatMessage.from_dict() supports this format and maintains compatibility with older formats.

    If your application consumes the result of ChatMessage.to_dict(), update your code to handle the new format. No changes are needed if you're using ChatPromptBuilder in a Pipeline.

  • LLMEvaluator, ContextRelevanceEvaluator, and FaithfulnessEvaluator now internally use a ChatGenerator instance instead of a Generator instance. The public attribute generator has been replaced with _chat_generator.

  • to_pandas, comparative_individual_scores_report, and score_report were removed from EvaluationRunResult. Please use detailed_report, comparative_detailed_report, and aggregated_report instead.
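For the ChatMessage serialization change above, a small normalizer can bridge payloads produced by mixed Haystack versions; this helper and the underscore-prefixed legacy keys are assumptions for illustration, not part of Haystack:

```python
# Hypothetical normalizer: accept either the new key layout
# (role/content/meta/name) or an assumed older underscore-prefixed one.
def normalize_chat_message_dict(data: dict) -> dict:
    def pick(*keys, default=None):
        for key in keys:
            if key in data:
                return data[key]
        return default

    return {
        "role": pick("role", "_role"),
        "content": pick("content", "_content", default=[]),
        "meta": pick("meta", "_meta", default={}),
        "name": pick("name", "_name"),
    }

new_style = {"role": "user", "content": [{"text": "hi"}], "meta": {}, "name": None}
old_style = {"_role": "user", "_content": [{"text": "hi"}]}
assert normalize_chat_message_dict(new_style) == normalize_chat_message_dict(old_style)
```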

🚀 New Features

  • Treat bare types (e.g., List, Dict) as generic types with Any arguments during type compatibility checks.
  • Add compatibility for Callable types.
  • Adds outputs_to_string to Tool and ComponentTool, letting users customize how a Tool's output is converted into a string before it is provided back to the ChatGenerator in a ChatMessage. If outputs_to_string is not provided, ToolInvoker falls back to a default converter that preserves the current behavior.
  • Added a new parameter split_mode to the CSVDocumentSplitter component to control the splitting mode. Set it to row-wise to split the CSV file by rows. The default value is threshold, which preserves the previous behavior.
  • We added a new retriever, AutoMergingRetriever, which together with the HierarchicalDocumentSplitter implements an auto-merging retrieval technique.
  • Add run_async method to HuggingFaceLocalChatGenerator. This method internally uses ThreadPoolExecutor to return coroutines that can be awaited.
  • Introduced asynchronous functionality and HTTP/2 support in the LinkContentFetcher component, improving content fetching in several respects.
  • The DOCXToDocument component can now include extracted hyperlink addresses in the output Documents. It accepts a link_format parameter that can be set to "markdown" or "plain". As before, no hyperlink addresses are extracted by default.
  • Added a new parameter azure_ad_token_provider to all Azure OpenAI components: AzureOpenAIGenerator, AzureOpenAIChatGenerator, AzureOpenAITextEmbedder and AzureOpenAIDocumentEmbedder. This parameter optionally accepts a callable that returns a bearer token, enabling authentication via Azure AD.
    • Introduced the utility function default_azure_token_provider in haystack/utils/azure.py. This function provides a default token provider that is serializable by Haystack. Users can now pass default_azure_token_provider as the azure_ad_token_provider or implement a custom token provider.
  • Users can now work with date and time in the ChatPromptBuilder. Like the PromptBuilder, the ChatPromptBuilder now supports the arrow library for working with datetime.
  • Introduce new State dataclass with a customizable schema for managing Agent state. Enhance error logging of Tool and extend the ToolInvoker component to work with newly introduced State.
  • The RecursiveDocumentSplitter now supports splitting by number of tokens. Setting "split_unit" to "token" will use a hard-coded tiktoken tokenizer (o200k_base) and requires having tiktoken installed.
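For intuition on row-wise splitting (split_mode="row-wise" in CSVDocumentSplitter above), here is a stdlib-only sketch, not the component's actual implementation:

```python
import csv
import io

def split_csv_by_rows(csv_text: str) -> list[str]:
    """Illustrative sketch: emit one CSV snippet per data row, header repeated."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    chunks = []
    for row in data:
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerows([header, row])
        chunks.append(buf.getvalue())
    return chunks

chunks = split_csv_by_rows("name,city\nAda,London\nAlan,Manchester\n")
print(len(chunks))  # 2
```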

⚡️ Enhancement Notes

  • LLMEvaluator, ContextRelevanceEvaluator, and FaithfulnessEvaluator now accept a chat_generator initialization parameter, consisting of a ChatGenerator instance pre-configured to return a JSON object. Previously, these components only supported OpenAI and LLMs with OpenAI-compatible APIs. Regardless of whether the evaluator components are initialized with api, api_key, and api_params or the new chat_generator parameter, the serialization format will now only include chat_generator in preparation for the future removal of api, api_key, and api_params.
  • Improved error handling for component run failures by raising a runtime error that includes the component's name and type.
  • When using Haystack's Agent, the messages are stored and accumulated in ...
Read more

v2.12.0-rc1

01 Apr 16:47
41486ae
Pre-release

v2.11.2

18 Mar 10:47

Release Notes

v2.11.2

Enhancement Notes

  • Refactored the processing of streaming chunks from OpenAI to simplify logic.
  • Added tests to ensure expected behavior when handling streaming chunks when using include_usage=True.

Bug Fixes

  • Fixed issue with MistralChatGenerator not returning a finish_reason when using streaming. Fixed by adjusting how we look for the finish_reason when processing streaming chunks. Now, the last non-None finish_reason is used to handle differences between OpenAI and Mistral.

v2.11.1

13 Mar 07:40

Release Notes

v2.11.1

Bug Fixes

  • Add dataframe to legacy fields for the Document dataclass. This fixes a bug where Document.from_dict() in haystack-ai>=2.11.0 could not properly deserialize a Document dictionary obtained with document.to_dict(flatten=False) in haystack-ai<=2.10.0.

v2.11.1-rc1

12 Mar 14:22
Pre-release

v2.11.0

10 Mar 15:26

⭐️ Highlights

Faster Imports

With lazy importing, importing individual components now requires 50% less CPU time on average. Overall import performance has also significantly improved: for example, import haystack now consumes only 2-5% of the CPU time it previously did.
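Haystack's lazy-import machinery is internal, but the underlying idea can be reproduced with the standard library's importlib.util.LazyLoader, which defers module execution until first attribute access:

```python
import importlib.util
import sys

def lazy_import(name: str):
    """Return a module whose body runs only on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

json = lazy_import("json")         # cheap: nothing executed yet
print(json.dumps({"lazy": True}))  # first access triggers the real import
```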

Extended Async Run Support

As of this release, all chat generators and retrievers in the core package now include a run_async method, enabling asynchronous execution at the component level. When used in an AsyncPipeline, this method runs automatically, providing native async capabilities.
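As a sketch of the contract (EchoComponent is hypothetical, not a Haystack component): run_async returns a coroutine, so several components can be awaited concurrently with plain asyncio.

```python
import asyncio

class EchoComponent:
    """Hypothetical stand-in for a component exposing run_async."""
    async def run_async(self, text: str) -> dict:
        await asyncio.sleep(0)  # stand-in for real async I/O
        return {"replies": [text.upper()]}

async def main() -> list[dict]:
    components = [EchoComponent(), EchoComponent()]
    # Await both components concurrently, as AsyncPipeline can do natively.
    return await asyncio.gather(
        *(c.run_async(t) for c, t in zip(components, ["hi", "ho"]))
    )

results = asyncio.run(main())
print(results)  # [{'replies': ['HI']}, {'replies': ['HO']}]
```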


New MSGToDocument Component

Use MSGToDocument to convert Microsoft Outlook .msg files into Haystack documents. This component extracts the email metadata (such as sender, recipients, CC, BCC, subject) and body content and converts any file attachments into ByteStream objects.

Turn off Validation for Pipeline Connections

Set connection_type_validation to False when initializing a Pipeline to disable type validation for pipeline connections. This lets you connect any sockets and bypass errors you might otherwise get, for example when connecting an Optional[str] output to a str input.

⬆️ Upgrade Notes

  • The ExtractedTableAnswer dataclass and the dataframe field in the Document dataclass, deprecated in Haystack 2.10.0, have now been removed. pandas is no longer a required dependency for Haystack, making the installation lighter. If a component you use requires pandas, an informative error will be raised, prompting you to install it. For details and motivation, see the GitHub discussion #8688.

  • Starting from Haystack 2.11.0, Python 3.8 is no longer supported. Python 3.8 reached its end of life in October 2024.

  • The AzureOCRDocumentConverter no longer produces Document objects with the deprecated dataframe field.

    Am I affected?

    • If your workflow relies on the dataframe field in Document objects generated by AzureOCRDocumentConverter, you are affected.
    • If you saw a DeprecationWarning in Haystack 2.10 when initializing a Document with a dataframe, this change will now remove that field entirely.

    How to handle the change:

    • Instead of storing detected tables as a dataframe, AzureOCRDocumentConverter now represents tables as CSV-formatted text in the content field of the Document.
    • Update your processing logic to handle CSV-formatted tables instead of a dataframe. If needed, you can convert the CSV text back into a dataframe using pandas.read_csv().
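For example, CSV-formatted table text in Document.content can be parsed with the stdlib csv module when pandas is not installed; the table here is made up:

```python
import csv
import io

# Made-up example of a table that AzureOCRDocumentConverter now stores
# as CSV text in Document.content instead of a dataframe.
table_text = "item,qty\napples,3\npears,5\n"

rows = list(csv.reader(io.StringIO(table_text)))
header, data = rows[0], rows[1:]
print(header)  # ['item', 'qty']
print(data)    # [['apples', '3'], ['pears', '5']]
```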

🚀 New Features

  • Add a new MSGToDocument component to convert .msg files into Haystack Document objects.
    • Extracts email metadata (e.g. sender, recipients, CC, BCC, subject) and body content into a Document.
    • Converts attachments into ByteStream objects which can be passed onto a FileTypeRouter + relevant converters.
  • We've introduced a new type_validation parameter to control type compatibility checks in pipeline connections. It can be set to True (default) or False, in which case no type checks are performed and any connection is allowed.
  • Add run_async method to HuggingFaceAPIChatGenerator. This method relies internally on the AsyncInferenceClient from huggingface_hub to generate chat completions and supports the same parameters as the run method. It returns a coroutine that can be awaited.
  • Add run_async method to OpenAIChatGenerator. This method internally uses the async version of the OpenAI client to generate chat completions and supports the same parameters as the run method. It returns a coroutine that can be awaited.
  • The InMemoryDocumentStore and the associated InMemoryBM25Retriever and InMemoryEmbeddingRetriever retrievers now support async mode.
  • Add run_async method to DocumentWriter. This method supports the same parameters as the run method and relies on the DocumentStore to implement write_documents_async. It returns a coroutine that can be awaited.
  • Add run_async method to AzureOpenAIChatGenerator. This method uses AsyncAzureOpenAI to generate chat completions and supports the same parameters as the run method. It returns a coroutine that can be awaited.
  • Sentence Transformers components now support ONNX and OpenVINO backends through the "backend" parameter. Supported backends are torch (default), onnx, and openvino. Refer to the Sentence Transformers documentation for more information.
  • Add run_async method to HuggingFaceLocalChatGenerator. This method internally uses ThreadPoolExecutor to return coroutines that can be awaited.

⚡️ Enhancement Notes

  • Improved AzureDocumentEmbedder to handle embedding generation failures gracefully. Errors are logged, and processing continues with the remaining batches.
  • In the FileTypeRouter add explicit support for classifying .msg files with mimetype "application/vnd.ms-outlook" since the mimetypes module returns None for .msg files by default.
  • Added the store_full_path init variable to XLSXToDocument to allow users to toggle whether to store the full path of the source file in the meta of the Document. This is set to False by default to increase privacy.
  • Increased default timeout for Mermaid server to 30 seconds. Mermaid server is used to draw Pipelines. Exposed the timeout as a parameter for the Pipeline.show and Pipeline.draw methods. This allows users to customize the timeout as needed.
  • Optimize import times through extensive use of lazy imports across packages. Importing one component of a package no longer imports all components of that package. For example, importing OpenAIChatGenerator no longer imports AzureOpenAIChatGenerator.
  • Haystack now officially supports Python 3.13. Some components and integrations may not yet be compatible. Specifically, the NamedEntityExtractor does not work with Python 3.13 when using the spacy backend. Additionally, you may encounter issues installing openai-whisper, which is required by the LocalWhisperTranscriber component, if you use uv or poetry for installation. In this case, we recommend using pip for installation.
  • EvaluationRunResult can now output the results as JSON, as a pandas DataFrame, or to a CSV file.
  • Update ListJoiner so that list_type is optional. By default it uses List, which acts like List[Any].
    • This allows the ListJoiner to combine any incoming lists into a single flattened list.
    • Users can still pass list_type if they would like to have stricter type validation in their pipelines.
  • Added PDFMinerToDocument functionality to detect and report undecoded CID characters in PDF text extraction, helping users identify potential text extraction quality issues when processing PDFs with non-standard fonts.
  • Simplified the serialization code for better readability and maintainability.
    • Updated deserialization to allow users to omit the typing. prefix for standard typing library types (e.g., List[str] instead of typing.List[str]).
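The .msg mapping described above can be reproduced with the stdlib mimetypes module, whose default tables typically have no entry for .msg files:

```python
import mimetypes

# The default mimetypes tables usually return (None, None) for .msg files,
# so an explicit mapping is registered, mirroring what FileTypeRouter does.
mimetypes.add_type("application/vnd.ms-outlook", ".msg")

mime, _encoding = mimetypes.guess_type("invoice.msg")
print(mime)  # application/vnd.ms-outlook
```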

⚠️ Deprecation Notes

  • The use of pandas DataFrames in EvaluationRunResult is now optional, and the methods score_report, to_pandas, and comparative_individual_scores_report are deprecated and will be removed in the next Haystack release.

🐛 Bug Fixes

  • In the ChatMessage.to_openai_dict_format utility method, include the name field in the returned dictionary, if present. Previously, the name field was erroneously skipped.
  • Pipelines with components that return plain pandas dataframes failed. The comparison of socket values now uses 'is not' instead of '!=' to avoid errors with dataframes.
  • Make sure that OpenAIChatGenerator sets additionalProperties: False in the tool schema when tool_strict is set to True.
  • Fix a bug where the output_type of a ConditionalRouter was not being serialized correctly. This would cause the router to work incorrectly after being serialized and deserialized.
  • Fixed accumulation of a tools arguments when streaming with an [OpenAIChatGenerator](https://d...
Read more

v2.11.0-rc3

07 Mar 11:44
Pre-release

v2.11.0-rc2

05 Mar 15:41
Pre-release