Commit b1b8fd2

fix error links
Signed-off-by: liyun <leryn.li@zilliz.com>
1 parent fec96e7 commit b1b8fd2

9 files changed, +181 -6 lines changed

assets/inverted.png

Lines changed: 1 addition & 0 deletions

scripts/config.json

Lines changed: 5 additions & 0 deletions
@@ -344,6 +344,11 @@
       "token": "FwO1bZlbBogz0SxsXg5cE4WYnxb",
       "type": "image",
       "alt_text": "gpu-index-performance"
+    },
+    {
+      "token": "G5VxbkoZLowCcLxEtfmcQl2Yn6d",
+      "type": "image",
+      "alt_text": "inverted"
     }
   ]
 }

site/en/about/overview.md

Lines changed: 1 addition & 1 deletion
@@ -79,7 +79,7 @@ Milvus supports various types of search functions to meet the demands of differe
 - [Range Search](single-vector-search.md#Range-search): Finds vectors within a specified radius from your query vector.
 - [Hybrid Search](multi-vector-search.md): Conducts ANN search based on multiple vector fields.
 - [Full Text Search](full-text-search.md): Full text search based on BM25.
-- [Reranking](reranking.md): Adjusts the order of search results based on additional criteria or a secondary algorithm, refining the initial ANN search results.
+- [Reranking](weighted-ranker.md): Adjusts the order of search results based on additional criteria or a secondary algorithm, refining the initial ANN search results.
 - [Fetch](get-and-scalar-query.md#Get-Entities-by-ID): Retrieves data by their primary keys.
 - [Query](get-and-scalar-query.md#Use-Basic-Operators): Retrieves data using specific expressions.
 

site/en/integrations/langchain/milvus_hybrid_search_retriever.md

Lines changed: 1 addition & 1 deletion
@@ -291,7 +291,7 @@ Please keep the order of list of index params consistent with the order of `vect
 </div>
 
 ### Rerank the candidates
-After the first stage of retrieval, we need to rerank the candidates to get a better result. You can choose [WeightedRanker](https://milvus.io/docs/reranking.md#Weighted-Scoring-WeightedRanker) or [RRFRanker](https://milvus.io/docs/reranking.md#Reciprocal-Rank-Fusion-RRFRanker) depending on your requirements. You can refer to the [Reranking](https://milvus.io/docs/reranking.md#Reranking) for more information.
+After the first stage of retrieval, we need to rerank the candidates to get a better result. You can choose [WeightedRanker](https://milvus.io/docs/weighted-ranker.md#Weighted-Scoring-WeightedRanker) or [RRFRanker](https://milvus.io/docs/weighted-ranker.md#Reciprocal-Rank-Fusion-RRFRanker) depending on your requirements. You can refer to the [Reranking](https://milvus.io/docs/weighted-ranker.md#Reranking) for more information.
 
 Here is an example for weighted reranking:
 
site/en/integrations/llamaindex_milvus_full_text_search.md

Lines changed: 1 addition & 1 deletion
@@ -233,7 +233,7 @@ This approach stores documents in a Milvus collection with both vector fields:
 - `embedding`: Dense embeddings generated by OpenAI embedding model for semantic search.
 - `sparse_embedding`: Sparse embeddings computed using BM25BuiltInFunction for full-text search.
 
-In addition, we have applied a reranking strategy using "RRFRanker" with its default parameters. To customize reranker, you are able to configure `hybrid_ranker` and `hybrid_ranker_params` following the [Milvus Reranking Guide](https://milvus.io/docs/reranking.md).
+In addition, we have applied a reranking strategy using "RRFRanker" with its default parameters. To customize reranker, you are able to configure `hybrid_ranker` and `hybrid_ranker_params` following the [Milvus Reranking Guide](https://milvus.io/docs/weighted-ranker.md).
 
 Now, let's test the RAG system with a sample query:
 
site/en/integrations/llamaindex_milvus_hybrid_search.md

Lines changed: 1 addition & 1 deletion
@@ -294,7 +294,7 @@ class ExampleEmbeddingFunction(BaseSparseEmbeddingFunction):
 
 ## Customize hybrid reranker
 
-Milvus supports two types of [reranking strategies](https://milvus.io/docs/reranking.md): Reciprocal Rank Fusion (RRF) and Weighted Scoring. The default ranker in `MilvusVectorStore` hybrid search is RRF with k=60. To customize the hybrid ranker, modify the following parameters:
+Milvus supports two types of [reranking strategies](https://milvus.io/docs/weighted-ranker.md): Reciprocal Rank Fusion (RRF) and Weighted Scoring. The default ranker in `MilvusVectorStore` hybrid search is RRF with k=60. To customize the hybrid ranker, modify the following parameters:
 
 - `hybrid_ranker (str)`: Specifies the type of ranker used in hybrid search queries. Currently only supports ["RRFRanker", "WeightedRanker"]. Defaults to "RRFRanker".
 - `hybrid_ranker_params (dict, optional)`: Configuration parameters for the hybrid ranker. The structure of this dictionary depends on the specific ranker being used:
Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
+---
+id: inverted.md
+title: "INVERTED"
+summary: "The INVERTED index in Milvus is designed to accelerate filter queries on both scalar fields and structured JSON fields. By mapping terms to the documents or records that contain them, inverted indexes greatly improve query performance compared to brute-force searches."
+---
+
+# INVERTED
+
+The `INVERTED` index in Milvus is designed to accelerate filter queries on both scalar fields and structured JSON fields. By mapping terms to the documents or records that contain them, inverted indexes greatly improve query performance compared to brute-force searches.
+
+## Overview
+
+Powered by [Tantivy](https://github.com/quickwit-oss/tantivy), Milvus implements inverted indexing to accelerate filter queries, especially for textual data. Here’s how it works:
+
+1. **Tokenize the Data**: Milvus takes your raw data, in this example two sentences:
+
+    - **"Milvus is a cloud-native vector database."**
+
+    - **"Milvus is very good at performance."**
+
+    and breaks them into unique words (e.g., *Milvus*, *is*, *cloud-native*, *vector*, *database*, *very*, *good*, *at*, *performance*).
+
+2. **Build the Term Dictionary**: These unique words are stored in a sorted list called the **Term Dictionary**. This dictionary lets Milvus quickly check if a word exists and locate its position in the index.
+
+3. **Create the Inverted List**: For each word in the Term Dictionary, Milvus keeps an **Inverted List** showing which documents contain that word. For instance, **"Milvus"** appears in both sentences, so its inverted list points to both document IDs.
+
+![Inverted](../../../../../assets/inverted.png)
+
+Because the dictionary is sorted, term-based filtering can be handled efficiently. Instead of scanning all documents, Milvus just looks up the term in the dictionary and retrieves its inverted list, which significantly speeds up searches and filters on large datasets.
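The three steps in the Overview can be sketched in a few lines of plain Python. This is a toy illustration only, not how Tantivy or Milvus implements the index: the whitespace tokenizer and the helper names `tokenize` and `build_inverted_index` are assumptions made for this example.

```python
from collections import defaultdict

# The two example sentences from the Overview; list position is the document ID.
docs = [
    "Milvus is a cloud-native vector database.",
    "Milvus is very good at performance.",
]


def tokenize(text):
    """Naive tokenizer (assumption): split on whitespace, strip '.' and ',', lowercase."""
    return [w.strip(".,").lower() for w in text.split() if w.strip(".,")]


def build_inverted_index(documents):
    """Map each unique term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sorting the items gives a sorted term dictionary; each value is an inverted list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


index = build_inverted_index(docs)
print(index["milvus"])  # [0, 1]
print(index["vector"])  # [0]
```

A term filter then becomes a dictionary lookup followed by reading one inverted list, rather than a scan over every document.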
+
+## Index a regular scalar field
+
+For scalar fields like **BOOL**, **INT8**, **INT16**, **INT32**, **INT64**, **FLOAT**, **DOUBLE**, **VARCHAR**, and **ARRAY**, creating an inverted index is straightforward. Use the `create_index()` method with the `index_type` parameter set to `"INVERTED"`.
+
+```python
+from pymilvus import MilvusClient
+
+client = MilvusClient(
+    uri="http://localhost:19530",
+)
+
+index_params = client.prepare_index_params() # Prepare an empty IndexParams object, without having to specify any index parameters
+index_params.add_index(
+    field_name="scalar_field_1", # Name of the scalar field to be indexed
+    index_type="INVERTED", # Type of index to be created
+    index_name="inverted_index" # Name of the index to be created
+)
+
+client.create_index(
+    collection_name="my_collection", # Specify the collection name
+    index_params=index_params
+)
+```
+
+## Index a JSON field
+
+Milvus extends its indexing capabilities to JSON fields, allowing you to efficiently filter on nested or structured data stored within a single column. Unlike scalar fields, when indexing a JSON field you must provide two additional parameters:
+
+- `json_path`: Specifies the nested key to index.
+
+- `json_cast_type`: Defines the data type (e.g., `"varchar"`, `"double"`, or `"bool"`) to which the extracted JSON value will be cast.
+
+For example, consider a JSON field named `metadata` with the following structure:
+
+```json
+{
+  "metadata": {
+    "product_info": {
+      "category": "electronics",
+      "brand": "BrandA"
+    },
+    "price": 99.99,
+    "in_stock": true,
+    "tags": ["summer_sale", "clearance"]
+  }
+}
+```
+
+To create inverted indexes on specific JSON paths, you can use the following approach:
+
+```python
+index_params = client.prepare_index_params()
+
+# Example 1: Index the 'category' key inside 'product_info' as a string.
+index_params.add_index(
+    field_name="metadata", # JSON field name
+    index_type="INVERTED", # Specify the inverted index type
+    index_name="json_index_1", # Custom name for this JSON index
+    params={
+        "json_path": "metadata[\"product_info\"][\"category\"]", # Path to the 'category' key
+        "json_cast_type": "varchar" # Cast the value as a string
+    }
+)
+
+# Example 2: Index the 'price' key as a numeric type (double).
+index_params.add_index(
+    field_name="metadata", # JSON field name
+    index_type="INVERTED",
+    index_name="json_index_2", # Custom name for this JSON index
+    params={
+        "json_path": "metadata[\"price\"]", # Path to the 'price' key
+        "json_cast_type": "double" # Cast the value as a double
+    }
+)
+
+```
+
+<table>
+   <tr>
+     <th><p>Parameter</p></th>
+     <th><p>Description</p></th>
+     <th><p>Example Value</p></th>
+   </tr>
+   <tr>
+     <td><p><code>field_name</code></p></td>
+     <td><p>Name of the JSON field in your schema.</p></td>
+     <td><p><code>"metadata"</code></p></td>
+   </tr>
+   <tr>
+     <td><p><code>index_type</code></p></td>
+     <td><p>Index type to create; currently only <code>INVERTED</code> is supported for JSON path indexing.</p></td>
+     <td><p><code>"INVERTED"</code></p></td>
+   </tr>
+   <tr>
+     <td><p><code>index_name</code></p></td>
+     <td><p>(Optional) A custom index name. Specify different names if you create multiple indexes on the same JSON field.</p></td>
+     <td><p><code>"json_index_1"</code></p></td>
+   </tr>
+   <tr>
+     <td><p><code>params.json_path</code></p></td>
+     <td><p>Specifies which JSON path to index. You can target nested keys, array positions, or both (e.g., <code>metadata["product_info"]["category"]</code> or <code>metadata["tags"][0]</code>).
+ If the path is missing or the array element does not exist for a particular row, that row is simply skipped during indexing, and no error is thrown.</p></td>
+     <td><p><code>"metadata[\"product_info\"][\"category\"]"</code></p></td>
+   </tr>
+   <tr>
+     <td><p><code>params.json_cast_type</code></p></td>
+     <td><p>Data type that Milvus will cast the extracted JSON values to when building the index. Valid values:</p>
+     <ul>
+     <li><p><code>"bool"</code> or <code>"BOOL"</code></p></li>
+     <li><p><code>"double"</code> or <code>"DOUBLE"</code></p></li>
+     <li><p><code>"varchar"</code> or <code>"VARCHAR"</code></p>
+     <p><strong>Note</strong>: For integer values, Milvus internally uses double for the index. Large integers above 2^53 lose precision. If the cast fails (due to type mismatch), no error is thrown, and that row’s value is not indexed.</p></li>
+     </ul></td>
+     <td><p><code>"varchar"</code></p></td>
+   </tr>
+</table>
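The skip-on-missing-path and skip-on-type-mismatch behaviour described in the table can be modelled with a short pure-Python sketch. It does not call Milvus. The helpers `extract` and `build_path_index` are hypothetical names, and treating the cast as a strict type check is a simplifying assumption, since the text only states that mismatched values are not indexed.

```python
import re


def extract(row, json_path):
    """Resolve a path such as metadata["product_info"]["category"] or metadata["tags"][0]."""
    root, rest = re.match(r'^(\w+)(.*)$', json_path).groups()
    value = row[root]
    # Each step is either a quoted key (["key"]) or a numeric array position ([0]).
    for key, idx in re.findall(r'\["([^"]+)"\]|\[(\d+)\]', rest):
        value = value[key] if key else value[int(idx)]
    return value


# Python types accepted for each cast type (assumption: a strict type check).
ACCEPTED = {"varchar": str, "double": (int, float), "bool": bool}


def build_path_index(rows, json_path, json_cast_type):
    """Return {value: [row ids]}, silently skipping rows that cannot be indexed."""
    cast = json_cast_type.lower()  # "double" and "DOUBLE" are both accepted
    index = {}
    for row_id, row in enumerate(rows):
        try:
            value = extract(row, json_path)
        except (KeyError, IndexError, TypeError):
            continue  # path or array element missing: row is skipped, no error
        if isinstance(value, bool) and cast != "bool":
            continue  # bool is a subclass of int in Python; keep it out of numeric indexes
        if not isinstance(value, ACCEPTED[cast]):
            continue  # type mismatch: value is not indexed, no error
        if cast == "double":
            value = float(value)  # integers are stored as doubles
        index.setdefault(value, []).append(row_id)
    return index


rows = [
    {"metadata": {"product_info": {"category": "electronics"}, "price": 99.99, "tags": ["summer_sale"]}},
    {"metadata": {"price": "n/a"}},  # no product_info; price is a string
    {"metadata": {"product_info": {"category": "books"}, "price": 12}},
]

by_category = build_path_index(rows, 'metadata["product_info"]["category"]', "varchar")
by_price = build_path_index(rows, 'metadata["price"]', "double")
print(by_category)  # {'electronics': [0], 'books': [2]}
print(by_price)     # {99.99: [0], 12.0: [2]}
```

Row 1 appears in neither index: it has no `product_info` key, and its `price` is a string rather than a number.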
+
+## Considerations on JSON indexing
+
+- **Filtering logic**:
+
+    - If you **create a double-type index** (`json_cast_type="double"`), only numeric-type filter conditions can use the index. If a filter compares a double-indexed path with a non-numeric value, Milvus falls back to brute-force search.
+
+    - If you **create a varchar-type index** (`json_cast_type="varchar"`), only string-type filter conditions can use the index. Otherwise, Milvus falls back to brute-force search.
+
+    - A **bool-type index** behaves like a varchar-type index: only boolean filter conditions can use it.
+
+- **Term expressions**:
+
+    - You can use `json["field"] in [value1, value2, …]`. However, the index works only for scalar values stored under that path. If `json["field"]` is an array, the query falls back to brute-force search (array-type indexing is not yet supported).
+
+- **Numeric precision**:
+
+    - Internally, Milvus indexes all numeric fields as doubles. If a numeric value exceeds $2^{53}$, it loses precision, and queries on those out-of-range values may not match exactly.
+
+- **Data integrity**:
+
+    - Milvus does not parse or transform JSON keys beyond your specified casting. If the source data is inconsistent (for example, some rows store a string for key `"k"` while others store a number), some rows will not be indexed.
+
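The numeric-precision caveat follows from IEEE 754 doubles having a 53-bit significand. The snippet below shows the effect with Python floats, which are doubles. It illustrates the arithmetic only and does not query Milvus; `survives_double_roundtrip` is a hypothetical helper name.

```python
def survives_double_roundtrip(n: int) -> bool:
    """True if integer n is exactly representable as a double."""
    return int(float(n)) == n


limit = 2**53  # 9007199254740992

print(survives_double_roundtrip(limit))      # True
print(survives_double_roundtrip(limit + 1))  # False
print(float(limit + 1) == float(limit))      # True: two distinct integers, one double
```

Integers at or below 2^53 round-trip exactly. Above that, neighbouring integers can collapse onto the same double, which is why an equality filter on such a value may match the wrong rows or none.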

site/en/userGuide/search-query-get/elasticsearch-queries-to-milvus.md

Lines changed: 1 addition & 1 deletion
@@ -475,7 +475,7 @@ In this example, RRF combines results from two retrievers:
 
 Each retriever contributes up to 50 top matches, which are reranked by RRF, and the final top 10 results are returned.
 
-In Milvus, you can achieve a similar hybrid search by combining searches across multiple vector fields, applying a reranking strategy, and retrieving the top-K results from the combined list. Milvus supports both RRF and weighted reranker strategies. For more details, refer to [Reranking](reranking.md).
+In Milvus, you can achieve a similar hybrid search by combining searches across multiple vector fields, applying a reranking strategy, and retrieving the top-K results from the combined list. Milvus supports both RRF and weighted reranker strategies. For more details, refer to [Reranking](weighted-ranker.md).
 
 The following is a non-strict equivalence of the above Elasticsearch example in Milvus.
 
site/en/userGuide/search-query-get/multi-vector-search.md

Lines changed: 1 addition & 1 deletion
@@ -689,7 +689,7 @@ To merge and rerank the two sets of ANN search results, it is necessary to selec
 
 - **RRFRanker (Reciprocal Rank Fusion Ranker)**: This strategy is recommended when there is no specific emphasis. The RRF can effectively balance the importance of each vector field.
 
-For more details about the mechanisms of these two reranking strategies, refer to [Reranking](reranking.md).
+For more details about the mechanisms of these two reranking strategies, refer to [Reranking](weighted-ranker.md).
 
 The following two examples demonstrate how to use the WeightedRanker and RRFRanker reranking strategies:
 