
Commit 06345b2

Amnah199 and denisw authored
feat: Allow full metadata field customization (#1893)
* feat(azure-ai-search): Allow full metadata field customization

  So far, the `metadata_fields` init parameter only allowed a few simple value types to be mapped (e.g., no nested metadata) and also hardcoded the fields to be only `filterable` (but not `searchable` or `facetable`, for instance). For full flexibility, allow an Azure AI Search `SearchField` instance to be passed as mapping instead of a Python type.

* PR comments
* feat: Add OpenRouter integration (#1723)
* Add openrouter integration
* Add tests for chat generator and support extra headers
* Add async tests
* Fix config files
* Add example
* Fixes
* Fix read me
* PR comments
* Small fixes
* Updated labeler and README
* Update docstrings
* Add user agent to Azure AI Search (#1743)
* docs: update changelog for integrations/azure_ai_search (#1745)
* Update changelog for integrations/azure_ai_search
* Update CHANGELOG.md
---------
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
* docs: ChatMessage examples (#1752)
* feat: Support Llama API as a Chat Generator (#1742)
* init: llama-api chat generator
* docs: update comments for LlamaChatGenerator
* feat: add keyword only
* fix: replace streaming_callback type
* fix: add Toolset for tools
* fix: rm unused typing
* docs: add meta header
* docs: fix comments to llama api
* docs: add meta header
* docs: add meta header
* fix: rename LlamaChat to MetaLlamaChat
* docs: add meta header
* docs: align doc format
* add workflow for nightly tests
* add meta_llama to labeler
* add new integration to repo readme overview table
* replace .llama.chat. with .meta_llama.chat.
* fmt
* replace llama with meta_llama in pydocs
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* Update changelog for integrations/meta_llama (#1754)
Co-authored-by: julian-risch <4181769+julian-risch@users.noreply.github.com>
* chore(deps): bump fossas/fossa-action from 1.6.0 to 1.7.0 (#1750)
  Bumps [fossas/fossa-action](https://github.com/fossas/fossa-action) from 1.6.0 to 1.7.0.
  - [Release notes](https://github.com/fossas/fossa-action/releases)
  - [Commits](fossas/fossa-action@v1.6.0...v1.7.0)
  ---
  updated-dependencies:
  - dependency-name: fossas/fossa-action
    dependency-version: 1.7.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update how skipping works (#1756)
* test: Ollama - make test_run_with_response_format more robust (#1757)
* feat: adapt `OllamaGenerator` metadata to OpenAI format (#1753)
* feat: adapt Ollama metadata to OpenAI format in `OllamaGenerator`
* Add: `OllamaGenerator` support in Langfuse
* Ran Linters
* Revert "Add: `OllamaGenerator` support in Langfuse"
  This reverts commit 1f399e0.
---------
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Add: `OllamaGenerator` support in Langfuse (#1759)
* Update changelog for integrations/ollama (#1761)
Co-authored-by: sjrl <10526848+sjrl@users.noreply.github.com>
* Update changelog for integrations/langfuse (#1762)
Co-authored-by: sjrl <10526848+sjrl@users.noreply.github.com>
* docs: update changelog for integrations/openrouter (#1763)
* Update changelog for integrations/openrouter
---------
Co-authored-by: Amnah199 <13835656+Amnah199@users.noreply.github.com>
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
* chore: fix README for meta-llama (#1766)
* chore(deps): bump aws-actions/configure-aws-credentials (#1751)
  Bumps [aws-actions/configure-aws-credentials](https://github.com/aws-actions/configure-aws-credentials) from 4.2.0 to 4.2.1.
  - [Release notes](https://github.com/aws-actions/configure-aws-credentials/releases)
  - [Changelog](https://github.com/aws-actions/configure-aws-credentials/blob/main/CHANGELOG.md)
  - [Commits](aws-actions/configure-aws-credentials@f24d719...b475783)
  ---
  updated-dependencies:
  - dependency-name: aws-actions/configure-aws-credentials
    dependency-version: 4.2.1
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* ci: Bedrock - improve workflow; skip tests from CI (#1773)
* feat: OllamaChatGenerator - add Toolset support (#1765)
* Add Toolset support to OllamaChatGenerator
* Lint
* Lambdas are not serializable
* Lint
* Generate tool call id if not available
* Lint
* Revert back to not using ToolCall id
* Lint
* Update changelog for integrations/ollama (#1775)
Co-authored-by: vblagoje <458335+vblagoje@users.noreply.github.com>
* feat: MCPTool and MCPToolset async resource management improvements (#1758)
* Add MCPClientSessionManager to connect/close mcp clients
* Update and refactor mcp tests
* More descriptive connection error raising
* Proper test cleanup
* Testing CI windows
* linting
* Improve connection error raise
* PR feedback
* Proper naming, and more precise cleanup sequence
---------
Co-authored-by: Michele Pangrazzi <xmikex83@gmail.com>
* test: add service_tier to test_convert_anthropic_chunk_to_streaming_chunk (#1778)
* fix: Bring Mistral integration up to date with changes made to OpenAIChatGenerator and OpenAI Embedders (#1774)
* Bringing Mistral up to date
* Fix Mistral Embedders to be deserializable
* Fix lint
* Fix lint
* Bump minimum haystack version
* Update changelog for integrations/mistral (#1781)
Co-authored-by: sjrl <10526848+sjrl@users.noreply.github.com>
* feat: Add `to_dict` to `STACKITDocumentEmbedder` and `STACKITTextEmbedder` and more init parameters from underlying OpenAI classes (#1779)
* Add to_dicts and more tests
* Bump haystack version
* Add changes to chat generator as well
* Update changelog for integrations/stackit (#1782)
Co-authored-by: sjrl <10526848+sjrl@users.noreply.github.com>
* feat: add run_async for CohereChatGenerator (#1689)
* CohereChatGenerator async support
* Tests and linter fixes
* fix
* refinements
* refactor + tests reorganization
* rename test
* remove markers
* reformat
* fix
* minor fixes
* Trigger CI
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
* Update changelog for integrations/cohere (#1784)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* docs: update changelog for integrations/google_ai (#1812)
* Update changelog for integrations/google_ai
* Update CHANGELOG.md
---------
Co-authored-by: wochinge <7667273+wochinge@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* fix: Fix exposing Qdrant api-key in `metadata` field when running `to_dict` (#1813)
* Add to_dict test
* Add more type hints
* More type hints
* Add fix for exposing api key in metadata when running to_dict
* Add unit test
* PR comments
* Update changelog for integrations/qdrant (#1814)
Co-authored-by: sjrl <10526848+sjrl@users.noreply.github.com>
* ci: check lowest direct dependencies (#1788)
* ci: check lowest direct dependencies
* try single quotes
* debug
* debugging
* try chroma
* no bedrock
* retry
* explicit option
* don't run tests
* debug 1
* try output file
* more
* no deepeval
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
* build: add pins for Anthropic (#1811)
* build: add pins for Anthropic
* rm file incorrectly added
* Update changelog for integrations/anthropic (#1815)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* build: add pins for Vertex (#1810)
* Update changelog for integrations/google_vertex (#1816)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* build: add pins for Cohere (#1817)
* Update changelog for integrations/cohere (#1829)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* build: remove pin for Deepeval (#1826)
* Update changelog for integrations/deepeval (#1830)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* feat: Add streamable-http transport MCP support (#1777)
* Add streamable-http transport
* Improve error message for tool invocation
* Add streamable MCPTool example, update examples
* Improve examples
* Add unit tests
* Update integrations/mcp/examples/mcp_client.py
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
* initialize vars outside try block
* Small fix
* Fix linting
---------
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
* Update changelog for integrations/mcp (#1831)
Co-authored-by: vblagoje <458335+vblagoje@users.noreply.github.com>
* build: pinning lower versions of haystack and `aiohttp` for `ElasticSearch` (#1827)
* pinning lower versions
* adding missing comma
* pinning to >=2.4.0
* pinning to >=2.3.0
* pinning aiohttp to >=3.0.0
* pinning aiohttp to >=2.0.0
* pinning aiohttp to >=2.5.0
* pinning aiohttp to >=2.6.0
* pinning aiohttp to >=3.0.0
* pinning aiohttp to >=3.1.0
* pinning aiohttp to >=3.2.0
* pinning aiohttp to >=3.3.0
* pinning aiohttp to >=3.10.0
* pinning aiohttp to >=3.9.0
* pinning aiohttp to >=3.8.0
* reverting back aiohttp to 3.9.0
* Update changelog for integrations/elasticsearch (#1834)
Co-authored-by: davidsbatista <7937824+davidsbatista@users.noreply.github.com>
* build: add Jina pins (#1836)
* Update changelog for integrations/jina (#1838)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* build: add Langfuse pins (#1837)
* Update changelog for integrations/langfuse (#1839)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* build: pin version for `pymongo` and `haystack` in MongoDB integration (#1832)
* pinning to older version of haystack and mongodb
* pinning haystack and pymongo
* wip
* fixing format
* adding missing CI job
* making sure lowest version of pymongo has the async client
* making sure lowest version of pymongo has the async client
* versioning
* haystack 2.9
* haystack 2.10
* haystack 2.11
* Remove failing test. No need to have it here since it's already tested in haystack main. (#1842)
* ci: Missing labels for stackit and anthropic (#1844)
* Missing labels for stackit and anthropic
* PR comments
* build: add pins for MCP (#1845)
* Update changelog for integrations/mongodb_atlas (#1840)
Co-authored-by: davidsbatista <7937824+davidsbatista@users.noreply.github.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
* docs: update changelog for integrations/mcp (#1848)
* Update changelog for integrations/mcp
* Update CHANGELOG.md
---------
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* build: add pins for Pgvector (#1849)
* Update changelog for integrations/pgvector (#1850)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* build: add pins for Optimum (#1847)
* build: add pins for Optimum
* try with python 3.13
* don't call HF on unit tests
* Update changelog for integrations/optimum (#1852)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* build: add pins for Qdrant (#1853)
* build: add pins for Pinecone (#1851)
Co-authored-by: David S. Batista <dsbatista@gmail.com>
* Update changelog for integrations/pinecone (#1855)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* docs: update changelog for integrations/qdrant (#1856)
* Update changelog for integrations/qdrant
* Update CHANGELOG.md
---------
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* chore: review license compliance workflow (#1843)
* chore: review license compliance workflow
* refactor
* deepeval
* build: add pins for Ragas (#1854)
* feat: Add GitHub integration with components, tools, and prompts (#1637)
* add agent_prompts and github_components
* rename to github_haystack
* remove github-haystack
* renamed integration, added components dir
* add tests, pydoc, update pyproject.toml
* add workflow
* fmt
* fmt
* lint
* ruff
* fmt
* lint:all
* replace StrEnum for py 3.9+ compatibility
* move files
* fix tests
* lint
* fix pydoc and extend init files
* Add integration:github to labeler.yml
* unify how we set GITHUB_TOKEN in tests
* fix 3 usage examples. 3 remaining
* remove empty lines from prompts
* GitHub capitalization
* add license header
* all caps for prompts
* add GitHubFileEditorTool
* enforce kwargs instead of positional args
* use _get_request_headers and base_headers consistently
* lint
* rename GitHubRepositoryViewer to GitHubRepoViewer
* lint
* add pipeline serialization test
* extend pipeline to_dict test
* set default branch of repo viewer
* lint
* add four more tools
* lint
* rename prompts
* add tests for four more tools
* rename context prompt
* add outputs_to_state as param to GitHubFileEditorTool
* add outputs_to_state as param to GitHubRepoViewerTool
* set default outputs_to_state for GitHubRepoViewerTool
* extract serialize_handlers to utils; don't use mutable defaults
* replace init_parameters with data for serde in FileEditor, RepoViewer
* add outputs_to_state to GitHubIssueCommenterTool; replace init_parameters with data
* add outputs_to_state to GitHubIssueViewerTool; replace init_parameters with data
* add outputs_to_state to GitHubPRCreatorTool; replace init_parameters with data
* move param docstrings to init methods
* use generate_qualified_class_name instead of hardcoded name
* test with lowest supported version
* don't test http_client_kwargs for compatibility with Haystack 2.12
* build: pinning `unstructured` to lowest working versions (#1841)
* finding lowest working versions
* adding missing CI job
* adding missing limitation
* feat: AnthropicChatGenerator - add Toolset support (#1787)
* AnthropicChatGenerator - add Toolset support
* Use new serialization method for tools
* Update haystack dep to 2.13.1 which includes Toolset
* Small update
* build: add pins for Snowflake + small refactoring (#1860)
* Update changelog for integrations/snowflake (#1862)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* Update changelog for integrations/ragas (#1857)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* Update changelog for integrations/unstructured (#1861)
Co-authored-by: davidsbatista <7937824+davidsbatista@users.noreply.github.com>
* build: add pins for Nvidia (#1846)
* Update changelog for integrations/nvidia (#1863)
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* build: add pins for Google AI (#1828)
* docs: update changelog for integrations/google_ai (#1864)
* Update changelog for integrations/google_ai
* Update CHANGELOG.md
---------
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update changelog for integrations/anthropic (#1865)
Co-authored-by: vblagoje <458335+vblagoje@users.noreply.github.com>
* docs: Update changelog for integrations/github (#1858)
Co-authored-by: julian-risch <4181769+julian-risch@users.noreply.github.com>
* feat: adding a `HybridRetriever` as a `Supercomponent` having `OpenSearch` as the document store (#1701)
* adding tests
* linting and typing
* adding env variable
* env variable
* extending docstring
* removing generation part
* updating tests
* adding a run test with mocked sentence_transformers
* fixing format
* refactor: use `component_to_dict` in OpenSearchHybridRetriever (#1866)
* Update changelog for integrations/opensearch (#1867)
Co-authored-by: davidsbatista <7937824+davidsbatista@users.noreply.github.com>
* oshr-docs (#1868)
* refactor: OpenSearchHybridRetriever use `deserialize_chatgenerator_inplace` (#1870)
* test to use deserialize_chatgenerator_inplace
* removing unused imports
* using deserialize_chatgenerator_inplace
* Update integrations/opensearch/src/haystack_integrations/components/retrievers/opensearch/open_search_hybrid_retriever.py
* Update changelog for integrations/opensearch (#1874)
Co-authored-by: davidsbatista <7937824+davidsbatista@users.noreply.github.com>
* feat: add run_async support for CohereTextEmbedder (#1873)
* feat: add run_async support for CohereTextEmbedder
* fix: review comments
* feat: Add Google GenAI GoogleGenAIChatGenerator (#1875)
* Initial work
* Remove utils
* Add async support
* Async test issue
* Simplify async tests
* Linting
* Improve comment
* Linting
* Improve pyproject.toml
* Add new google genai integration to workflow
* Add labeler
* Add pydoc
* Pin deps
* Pin google-genai dep
* Update integrations/google_genai/src/haystack_integrations/components/generators/google_genai/chat/chat_generator.py
Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
* Update integrations/google_genai/src/haystack_integrations/components/generators/google_genai/chat/chat_generator.py
Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
* PR feedback
* Add system message comment
* Leave only minimal working examples in README
* Update integrations/google_genai/src/haystack_integrations/components/generators/google_genai/chat/chat_generator.py
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* Update integrations/google_genai/src/haystack_integrations/components/generators/google_genai/chat/chat_generator.py
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* Linting
---------
Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* Update changelog for integrations/google_genai (#1886)
Co-authored-by: vblagoje <458335+vblagoje@users.noreply.github.com>
* feat: Use Langfuse local to_openai_dict_format function to serialize messages (#1885)
* Use Langfuse local to_openai_dict_format function to serialize messages
* Linting
* PR feedback
* Add detailed tracing for GoogleGenAIChatGenerator (#1887)
* docs: update changelog for integrations/langfuse (#1888)
* Update changelog for integrations/langfuse
* Update CHANGELOG.md
---------
Co-authored-by: vblagoje <458335+vblagoje@users.noreply.github.com>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* try reenabling pinecone tests (#1871)
* PR comments
* Small updates
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Denis Washington <denis.washington@futurice.com>
Co-authored-by: Denis Washington <denis@denisw.de>
1 parent d1925a7 commit 06345b2
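The headline change (accepting either a full Azure `SearchField` or a plain Python type per metadata key) can be illustrated with a small, self-contained sketch. `FieldStub`, `TYPE_MAPPING`, and `normalize_metadata_fields` below are hypothetical stand-ins for Azure's `SearchField`/`SimpleField` models and the store's internal normalization, written only to mirror the rules the commit message describes; they are not the actual integration code, and the EDM type names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Union


@dataclass
class FieldStub:
    """Hypothetical stand-in for azure.search.documents.indexes.models.SearchField."""
    name: str
    type: str
    searchable: bool = False
    filterable: bool = False


# Assumed Python type -> EDM type mapping, analogous to the integration's type_mapping.
TYPE_MAPPING = {
    str: "Edm.String",
    bool: "Edm.Boolean",
    int: "Edm.Int32",
    float: "Edm.Double",
    datetime: "Edm.DateTimeOffset",
}


def normalize_metadata_fields(
    metadata_fields: Dict[str, Union[FieldStub, type]],
) -> Dict[str, FieldStub]:
    """Accept either a full field object or a plain Python type per metadata key."""
    normalized = {}
    for key, value in metadata_fields.items():
        if isinstance(value, FieldStub):
            # Full customization path: the field is used as-is, but its
            # name must match the metadata key it is registered under.
            if value.name != key:
                raise ValueError(f"Name of field ('{value.name}') must match key ('{key}')")
            normalized[key] = value
        else:
            # Backward-compatible path: a plain type becomes a simple filterable field.
            edm_type = TYPE_MAPPING.get(value)
            if edm_type is None:
                raise ValueError(f"Unsupported field type for key '{key}': {value}")
            normalized[key] = FieldStub(name=key, type=edm_type, filterable=True)
    return normalized


fields = normalize_metadata_fields({
    "Title": FieldStub(name="Title", type="Edm.String", searchable=True, filterable=True),
    "Pages": int,
})
```

With this shape, existing callers that pass `{"Pages": int}` keep working, while callers needing `searchable` or `facetable` fields can pass a fully configured field object instead.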

File tree

7 files changed: +99 −93 lines changed


.github/workflows/azure_ai_search.yml

Lines changed: 2 additions & 2 deletions
@@ -18,8 +18,8 @@ concurrency:
 env:
   PYTHONUNBUFFERED: "1"
   FORCE_COLOR: "1"
-  AZURE_SEARCH_API_KEY: ${{ secrets.AZURE_SEARCH_API_KEY }}
-  AZURE_SEARCH_SERVICE_ENDPOINT: ${{ secrets.AZURE_SEARCH_SERVICE_ENDPOINT }}
+  AZURE_AI_SEARCH_API_KEY: ${{ secrets.AZURE_AI_SEARCH_API_KEY }}
+  AZURE_AI_SEARCH_ENDPOINT: ${{ secrets.AZURE_AI_SEARCH_ENDPOINT }}
 
 defaults:
   run:
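The workflow hunk above renames the CI credentials to `AZURE_AI_SEARCH_API_KEY` and `AZURE_AI_SEARCH_ENDPOINT`. A minimal sketch of how downstream code could read the new names while tolerating environments that still export the legacy ones; the fallback helper is illustrative only and not part of the integration:

```python
import os
from typing import Optional


def read_env_with_fallback(new_name: str, old_name: str) -> Optional[str]:
    """Prefer the renamed variable; fall back to the legacy name if it is still set."""
    return os.environ.get(new_name) or os.environ.get(old_name)


# Demo value only; real runs would get this from CI secrets.
os.environ["AZURE_AI_SEARCH_ENDPOINT"] = "https://example.search.windows.net"
endpoint = read_env_with_fallback("AZURE_AI_SEARCH_ENDPOINT", "AZURE_SEARCH_SERVICE_ENDPOINT")
```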

integrations/azure_ai_search/src/haystack_integrations/document_stores/azure_ai_search/document_store.py

Lines changed: 71 additions & 72 deletions
@@ -4,7 +4,7 @@
 import logging as python_logging
 import os
 from datetime import datetime
-from typing import Any, ClassVar, Dict, List, Optional
+from typing import Any, Dict, List, Optional, Union
 
 from azure.core.credentials import AzureKeyCredential
 from azure.core.exceptions import ClientAuthenticationError, HttpResponseError, ResourceNotFoundError
@@ -85,7 +85,6 @@
 
 
 class AzureAISearchDocumentStore:
-    TYPE_MAP: ClassVar[Dict[str, type]] = {"str": str, "int": int, "float": float, "bool": bool, "datetime": datetime}
 
     def __init__(
         self,
@@ -94,8 +93,8 @@ def __init__(
         azure_endpoint: Secret = Secret.from_env_var("AZURE_AI_SEARCH_ENDPOINT", strict=True),  # noqa: B008
         index_name: str = "default",
         embedding_dimension: int = 768,
-        metadata_fields: Optional[Dict[str, type]] = None,
-        vector_search_configuration: VectorSearch = None,
+        metadata_fields: Optional[Dict[str, Union[SearchField, type]]] = None,
+        vector_search_configuration: Optional[VectorSearch] = None,
         **index_creation_kwargs,
     ):
         """
@@ -106,10 +105,22 @@ def __init__(
         :param api_key: The API key to use for authentication.
         :param index_name: Name of index in Azure AI Search, if it doesn't exist it will be created.
         :param embedding_dimension: Dimension of the embeddings.
-        :param metadata_fields: A dictionary of metadata keys and their types to create
-            additional fields in index schema. As fields in Azure SearchIndex cannot be dynamic,
-            it is necessary to specify the metadata fields in advance.
-            (e.g. metadata_fields = {"author": str, "date": datetime})
+        :param metadata_fields: A dictionary mapping metadata field names to their corresponding field definitions.
+            Each field can be defined either as:
+            - A SearchField object to specify detailed field configuration like type, searchability, and filterability
+            - A Python type (`str`, `bool`, `int`, `float`, or `datetime`) to create a simple filterable field
+
+            These fields are automatically added when creating the search index.
+            Example:
+            metadata_fields={
+                "Title": SearchField(
+                    name="Title",
+                    type="Edm.String",
+                    searchable=True,
+                    filterable=True
+                ),
+                "Pages": int
+            }
         :param vector_search_configuration: Configuration option related to vector search.
             Default configuration uses the HNSW algorithm with cosine similarity to handle vector searches.
 
@@ -139,13 +150,12 @@ def __init__(
         self._index_name = index_name
         self._embedding_dimension = embedding_dimension
         self._dummy_vector = [-10.0] * self._embedding_dimension
-        self._metadata_fields = metadata_fields
+        self._metadata_fields = self._normalize_metadata_index_fields(metadata_fields)
         self._vector_search_configuration = vector_search_configuration or DEFAULT_VECTOR_SEARCH
         self._index_creation_kwargs = index_creation_kwargs
 
     @property
     def client(self) -> SearchClient:
-
         # resolve secrets for authentication
         resolved_endpoint = (
             self._azure_endpoint.resolve_value() if isinstance(self._azure_endpoint, Secret) else self._azure_endpoint
@@ -185,6 +195,45 @@ def client(self) -> SearchClient:
 
         return self._client
 
+    def _normalize_metadata_index_fields(
+        self, metadata_fields: Optional[Dict[str, Union[SearchField, type]]]
+    ) -> Dict[str, SearchField]:
+        """Create a list of index fields for storing metadata values."""
+
+        if not metadata_fields:
+            return {}
+
+        normalized_fields = {}
+
+        for key, value in metadata_fields.items():
+            if isinstance(value, SearchField):
+                if value.name == key:
+                    normalized_fields[key] = value
+                else:
+                    msg = f"Name of SearchField ('{value.name}') must match metadata field name ('{key}')"
+                    raise ValueError(msg)
+            else:
+                if not key[0].isalpha():
+                    msg = (
+                        f"Azure Search index only allows field names starting with letters. "
+                        f"Invalid key: {key} will be dropped."
+                    )
+                    logger.warning(msg)
+                    continue
+
+                field_type = type_mapping.get(value)
+                if not field_type:
+                    error_message = f"Unsupported field type for key '{key}': {value}"
+                    raise ValueError(error_message)
+
+                normalized_fields[key] = SimpleField(
+                    name=key,
+                    type=field_type,
+                    filterable=True,
+                )
+
+        return normalized_fields
+
     def _create_index(self) -> None:
         """
         Internally creates a new search index.
@@ -205,29 +254,18 @@ def _create_index(self) -> None:
         ]
 
         if self._metadata_fields:
-            default_fields.extend(self._create_metadata_index_fields(self._metadata_fields))
+            default_fields.extend(self._metadata_fields.values())
+
         index = SearchIndex(
             name=self._index_name,
             fields=default_fields,
             vector_search=self._vector_search_configuration,
             **self._index_creation_kwargs,
         )
+
         if self._index_client:
             self._index_client.create_index(index)
 
-    @classmethod
-    def _deserialize_metadata_fields(cls, fields: Optional[Dict[str, str]]) -> Optional[Dict[str, type]]:
-        """Convert string representations back to type objects."""
-        if not fields:
-            return None
-        try:
-            # Use the class-level TYPE_MAP for conversion.
-            ans = {key: cls.TYPE_MAP[value] for key, value in fields.items()}
-            return ans
-        except KeyError as e:
-            msg = f"Unsupported type encountered in metadata_fields: {e}"
-            raise ValueError(msg) from e
-
     @staticmethod
     def _serialize_index_creation_kwargs(index_creation_kwargs: Dict[str, Any]) -> Dict[str, Any]:
         """
@@ -265,28 +303,19 @@ def _deserialize_index_creation_kwargs(cls, data: Dict[str, Any]) -> Any:
         return result[key]
 
     def to_dict(self) -> Dict[str, Any]:
-        # This is not the best solution to serialise this class but is the fastest to implement.
-        # Not all kwargs types can be serialised to text so this can fail. We must serialise each
-        # type explicitly to handle this properly.
         """
         Serializes the component to a dictionary.
 
         :returns:
            Dictionary with serialized data.
         """
-
-        if self._metadata_fields:
-            serialized_metadata = {key: value.__name__ for key, value in self._metadata_fields.items()}
-        else:
-            serialized_metadata = None
-
         return default_to_dict(
             self,
             azure_endpoint=self._azure_endpoint.to_dict() if self._azure_endpoint else None,
             api_key=self._api_key.to_dict() if self._api_key else None,
             index_name=self._index_name,
             embedding_dimension=self._embedding_dimension,
-            metadata_fields=serialized_metadata,
+            metadata_fields={key: value.as_dict() for key, value in self._metadata_fields.items()},
             vector_search_configuration=self._vector_search_configuration.as_dict(),
             **self._serialize_index_creation_kwargs(self._index_creation_kwargs),
         )
@@ -303,7 +332,11 @@ def from_dict(cls, data: Dict[str, Any]) -> "AzureAISearchDocumentStore":
             Deserialized component.
         """
         if (fields := data["init_parameters"]["metadata_fields"]) is not None:
-            data["init_parameters"]["metadata_fields"] = cls._deserialize_metadata_fields(fields)
+            data["init_parameters"]["metadata_fields"] = {
+                key: SearchField.from_dict(field) for key, field in fields.items()
+            }
+        else:
+            data["init_parameters"]["metadata_fields"] = {}
 
         for key, _value in AZURE_CLASS_MAPPING.items():
             if key in data["init_parameters"]:
@@ -461,46 +494,12 @@ def _convert_haystack_document_to_azure(self, document: Document) -> Dict[str, Any]:
 
         return index_document
 
-    def _create_metadata_index_fields(self, metadata: Dict[str, Any]) -> List[SimpleField]:
-        """Create a list of index fields for storing metadata values."""
-
-        index_fields = []
-        metadata_field_mapping = self._map_metadata_field_types(metadata)
-
-        for key, field_type in metadata_field_mapping.items():
-            index_fields.append(SimpleField(name=key, type=field_type, filterable=True))
-
-        return index_fields
-
-    def _map_metadata_field_types(self, metadata: Dict[str, type]) -> Dict[str, str]:
-        """Map metadata field types to Azure Search field types."""
-
-        metadata_field_mapping = {}
-
-        for key, value_type in metadata.items():
-
-            if not key[0].isalpha():
-                msg = (
-                    f"Azure Search index only allows field names starting with letters. "
-                    f"Invalid key: {key} will be dropped."
-                )
-                logger.warning(msg)
-                continue
-
-            field_type = type_mapping.get(value_type)
-            if not field_type:
-                error_message = f"Unsupported field type for key '{key}': {value_type}"
-                raise ValueError(error_message)
-            metadata_field_mapping[key] = field_type
-
-        return metadata_field_mapping
-
     def _embedding_retrieval(
         self,
         query_embedding: List[float],
         *,
         top_k: int = 10,
-        filters: Optional[Dict[str, Any]] = None,
+        filters: Optional[str] = None,
         **kwargs,
     ) -> List[Document]:
         """
@@ -534,7 +533,7 @@ def _bm25_retrieval(
         self,
         query: str,
         top_k: int = 10,
-        filters: Optional[Dict[str, Any]] = None,
+        filters: Optional[str] = None,
         **kwargs,
     ) -> List[Document]:
         """
@@ -567,7 +566,7 @@ def _hybrid_retrieval(
         query: str,
         query_embedding: List[float],
         top_k: int = 10,
-        filters: Optional[Dict[str, Any]] = None,
+        filters: Optional[str] = None,
         **kwargs,
    ) -> List[Document]:
         """

integrations/azure_ai_search/tests/conftest.py

Lines changed: 2 additions & 2 deletions
@@ -29,8 +29,8 @@ def document_store(request):
     index_name = f"haystack_test_{uuid.uuid4().hex}"
     metadata_fields = getattr(request, "param", {}).get("metadata_fields", None)
 
-    azure_endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
-    api_key = os.environ["AZURE_SEARCH_API_KEY"]
+    azure_endpoint = os.environ["AZURE_AI_SEARCH_ENDPOINT"]
+    api_key = os.environ["AZURE_AI_SEARCH_API_KEY"]
 
     client = SearchIndexClient(azure_endpoint, AzureKeyCredential(api_key))
     if index_name in client.list_index_names():

integrations/azure_ai_search/tests/test_bm25_retriever.py

Lines changed: 3 additions & 3 deletions
@@ -47,7 +47,7 @@ def test_to_dict():
         "api_key": {"type": "env_var", "env_vars": ["AZURE_AI_SEARCH_API_KEY"], "strict": False},
         "index_name": "default",
         "embedding_dimension": 768,
-        "metadata_fields": None,
+        "metadata_fields": {},
         "vector_search_configuration": {
             "profiles": [
                 {"name": "default-vector-config", "algorithm_configuration_name": "cosine-algorithm-config"}
@@ -149,8 +149,8 @@ def test_run_time_params():
 
 
 @pytest.mark.skipif(
-    not os.environ.get("AZURE_SEARCH_SERVICE_ENDPOINT", None) and not os.environ.get("AZURE_SEARCH_API_KEY", None),
-    reason="Missing AZURE_SEARCH_SERVICE_ENDPOINT or AZURE_SEARCH_API_KEY.",
+    not os.environ.get("AZURE_AI_SEARCH_ENDPOINT", None) and not os.environ.get("AZURE_AI_SEARCH_API_KEY", None),
+    reason="Missing AZURE_AI_SEARCH_ENDPOINT or AZURE_AI_SEARCH_API_KEY.",
 )
 @pytest.mark.integration
 class TestRetriever:
