Improve execution of terms queries over wildcard fields #128986

iverase · 2025-06-05T12:57:19Z

Currently, a term query over a wildcard field is executed by creating a BinaryDvConfirmedQuery, which contains an approximation query that might contain up to 10 term queries under a boolean SHOULD plus an automaton. This is a fairly big query by itself. Moreover, a terms query over a wildcard field creates one of those queries per term, so we can easily build a huge query that uses many GiB of heap. This PR implements TermsQuery(Collection<?> values, @Nullable SearchExecutionContext context) to avoid this memory pressure.
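To put those numbers in perspective, here is a back-of-the-envelope count for the old plan. It uses only what the paragraph states (up to 10 term queries in each per-term approximation, one automaton per per-term query) plus the 8192-term size discussed later in review. The class and method names are hypothetical; this is arithmetic, not Elasticsearch code.

```java
/**
 * Hypothetical helper: upper-bound object counts for the old plan, where a terms query
 * over a wildcard field builds one confirmed query per term.
 */
final class OldWildcardTermsPlanCost {
    // "up to 10 term queries" per approximation, as stated in the PR description.
    static final int MAX_APPROXIMATION_CLAUSES = 10;

    /** One BinaryDvConfirmedQuery-style wrapper per term. */
    static long confirmedQueries(int numTerms) {
        return numTerms;
    }

    /** Up to 10 term queries inside each per-term approximation. */
    static long maxTermQueries(int numTerms) {
        return (long) numTerms * MAX_APPROXIMATION_CLAUSES;
    }

    /** Each per-term query also carries its own automaton. */
    static long automata(int numTerms) {
        return numTerms;
    }
}
```

For 8192 terms this gives 8192 confirmed queries, up to 81,920 term queries and 8192 automata held at once. Everything grows linearly with the number of terms, which is the growth the PR sets out to avoid.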

For a small number of terms (< 16), it approximates each term as we do now, but it creates a single BinaryDvConfirmedQuery with a boolean query containing the approximation of each term.
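The plan selection, including the switch for larger term counts described in the next paragraph, can be modelled as a toy in which strings stand in for queries. `describe`, `approx(...)` and `token(...)` are hypothetical placeholders, not Elasticsearch or Lucene APIs; `TermInSet` stands in for Lucene's TermInSetQuery and `Confirmed` for BinaryDvConfirmedQuery. The sketch assumes the large path is also wrapped in one confirmation step, since a single token per term cannot by itself guarantee an exact match.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model of the query *shape* for a terms query over a wildcard field.
 * Strings stand in for queries; none of these names are Elasticsearch or Lucene APIs.
 */
final class WildcardTermsQueryShape {
    // Threshold from the PR description: fewer than 16 terms keeps per-term approximations.
    static final int SMALL_TERMS_THRESHOLD = 16;

    static String describe(List<String> terms) {
        // The exact values the single confirmation step checks against doc values.
        Set<String> confirm = new LinkedHashSet<>(terms);
        List<String> clauses = new ArrayList<>();
        String approximation;
        if (terms.size() < SMALL_TERMS_THRESHOLD) {
            // Small case: per-term approximations become SHOULD clauses of one boolean query.
            for (String term : terms) {
                clauses.add("approx(" + term + ")");
            }
            approximation = "Bool(SHOULD " + String.join(", ", clauses) + ")";
        } else {
            // Large case (assumed to be confirmed the same way): one token per term in one set query.
            for (String term : terms) {
                clauses.add("token(" + term + ")");
            }
            approximation = "TermInSet[" + String.join(", ", clauses) + "]";
        }
        return "Confirmed(" + approximation + ", confirm=" + confirm + ")";
    }
}
```

For example, `describe(List.of("foo", "bar"))` yields `Confirmed(Bool(SHOULD approx(foo), approx(bar)), confirm=[foo, bar])`. Whatever the number of terms, only one confirmed wrapper is built, instead of one per term as in the old plan; the threshold only decides what goes inside it.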

For a larger number of terms, we switch to a TermInSetQuery. We only need one token per term to make sure we match all the required documents, since any document whose value equals a term contains every token of that term. Ideally this token should be the one that makes the term most unique, although that is difficult to compute when generating the query.
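Why one token per term is enough can be checked on a toy inverted index. This sketch assumes plain 3-character n-grams with no normalization and picks the middle n-gram of each term, following the middle-token choice raised in review; the class and method names are hypothetical, and the real WildcardFieldMapper tokenization differs. The union of the chosen tokens' postings is a superset of the matches, and an exact-value check (standing in for the doc-values confirmation) removes the false positives.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model, not the Elasticsearch implementation: one n-gram per term, then exact confirmation. */
final class OneTokenPerTermSketch {
    static final int NGRAM = 3; // assumption: plain trigrams, no normalization

    /** Hypothetical token choice: the middle n-gram of the term, or null if the term is too short. */
    static String middleToken(String term) {
        if (term.length() < NGRAM) {
            return null; // no n-gram to look up, so this term cannot narrow the candidates
        }
        int start = (term.length() - NGRAM) / 2;
        return term.substring(start, start + NGRAM);
    }

    /** Inverted index from n-gram to the ids of documents containing it. */
    static Map<String, Set<Integer>> index(List<String> docs) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            String value = docs.get(doc);
            for (int i = 0; i + NGRAM <= value.length(); i++) {
                postings.computeIfAbsent(value.substring(i, i + NGRAM), k -> new TreeSet<>()).add(doc);
            }
        }
        return postings;
    }

    /** Candidates: union of the postings of one token per term (a superset of the true matches). */
    static Set<Integer> approximate(List<String> docs, Map<String, Set<Integer>> postings, Collection<String> terms) {
        Set<Integer> candidates = new TreeSet<>();
        for (String term : terms) {
            String token = middleToken(term);
            if (token == null) {
                // Too short to approximate: every document stays a candidate.
                for (int doc = 0; doc < docs.size(); doc++) {
                    candidates.add(doc);
                }
                return candidates;
            }
            candidates.addAll(postings.getOrDefault(token, Set.of()));
        }
        return candidates;
    }

    /** Confirmation: keep only candidates whose stored value equals one of the terms exactly. */
    static Set<Integer> search(List<String> docs, Collection<String> terms) {
        Set<String> wanted = new HashSet<>(terms);
        Set<Integer> hits = new TreeSet<>();
        for (int doc : approximate(docs, index(docs), terms)) {
            if (wanted.contains(docs.get(doc))) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

With documents `elasticsearch`, `elastic`, `search`, `lucene` and terms `elastic`, `lucene`, the chosen tokens are `ast` and `uce`. The approximation returns documents 0, 1 and 3 (`ast` also occurs in `elasticsearch`), and confirmation keeps 1 and 3. A rarer token would shrink the candidate set; that is the "most unique" token the description calls difficult to compute at query-build time.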

Performance tests show a big improvement when most of the terms match some documents in the index, but it behaves slightly worse when none of the terms match. All in all, I think it is a good improvement that prevents memory issues on such queries.

fixes #128201

elasticsearchmachine · 2025-06-05T12:57:44Z

Hi @iverase, I've created a changelog YAML for you.

elasticsearchmachine · 2025-06-05T12:57:46Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

mayya-sharipova · 2025-06-05T14:02:07Z

@iverase Before merging this PR, do we want to add a "terms query on a wildcard field" operation to our Rally benchmarks (for example, http_logs already has a wildcard field) to measure the performance improvement?

iverase · 2025-06-05T14:26:19Z

Isn't http_logs using a match_only_text field rather than a wildcard field?

https://github.com/elastic/rally-tracks/blob/master/http_logs/index.json

mayya-sharipova · 2025-06-05T14:45:38Z

@iverase For the runtime schedule, there is an indexed wildcard field. I wonder if we can add a terms query on that.

mayya-sharipova · 2025-06-05T14:48:55Z

...d/src/internalClusterTest/java/org/elasticsearch/xpack/wildcard/search/WildcardSearchIT.java

+        }
+    }
+
+    public void testTermsQueryDuel() {


How long will this test run if the number of terms is 8192? Is the performance acceptable for it to be included as a regular test?

It is very fast; the full suite takes a couple of seconds.
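The body of testTermsQueryDuel is not shown in this thread, so the following is only a sketch of the general duel idea in plain Java, under toy assumptions (3-character n-grams, the middle n-gram as the single token per term). Random data is searched both by brute force and by approximation plus confirmation, and the two result sets must agree. All names, sizes and the seed are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical duel: exact brute force vs. one-token approximation plus confirmation. */
final class TermsQueryDuelSketch {
    static final int NGRAM = 3;

    static Set<Integer> bruteForce(List<String> docs, Set<String> terms) {
        Set<Integer> hits = new TreeSet<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            if (terms.contains(docs.get(doc))) {
                hits.add(doc);
            }
        }
        return hits;
    }

    static Set<Integer> approximateThenConfirm(List<String> docs, Set<String> terms) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            String v = docs.get(doc);
            for (int i = 0; i + NGRAM <= v.length(); i++) {
                postings.computeIfAbsent(v.substring(i, i + NGRAM), k -> new HashSet<>()).add(doc);
            }
        }
        Set<Integer> candidates = new HashSet<>();
        for (String term : terms) {
            if (term.length() < NGRAM) { // no n-gram: cannot prune, consider every doc
                for (int doc = 0; doc < docs.size(); doc++) {
                    candidates.add(doc);
                }
                break;
            }
            int start = (term.length() - NGRAM) / 2; // middle n-gram of the term
            candidates.addAll(postings.getOrDefault(term.substring(start, start + NGRAM), Set.of()));
        }
        Set<Integer> hits = new TreeSet<>();
        for (int doc : candidates) {
            if (terms.contains(docs.get(doc))) { // confirmation on the exact value
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Random value of length 3..8 over a 4-letter alphabet. */
    static String randomValue(Random random) {
        int length = 3 + random.nextInt(6);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(4)));
        }
        return sb.toString();
    }

    /** Returns true if both paths agree on every round; numDocs must be positive. */
    static boolean duel(long seed, int numDocs, int numTerms, int rounds) {
        Random random = new Random(seed);
        for (int round = 0; round < rounds; round++) {
            List<String> docs = new ArrayList<>();
            for (int i = 0; i < numDocs; i++) {
                docs.add(randomValue(random));
            }
            Set<String> terms = new HashSet<>();
            for (int i = 0; i < numTerms; i++) {
                // Mix terms taken from the indexed values with terms that probably match nothing.
                terms.add(random.nextBoolean() ? docs.get(random.nextInt(numDocs)) : randomValue(random));
            }
            if (!bruteForce(docs, terms).equals(approximateThenConfirm(docs, terms))) {
                return false;
            }
        }
        return true;
    }
}
```

A call such as `duel(42L, 2_000, 8_192, 3)` exercises the 8192-term size from the question above and finishes quickly, because each term costs a single postings lookup.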

iverase · 2025-06-05T15:24:46Z

> @iverase For the runtime schedule, there is an indexed wildcard field. I wonder if we can add a terms query on that.

I will need to learn how to run this mode... tracks are becoming so complex.

mayya-sharipova

@iverase Thanks, great optimization.

I am not sure about the approach of taking the middle token for a term, but I trust your intuition on this.

It would be nice to have a dedicated benchmark for terms queries to see whether we got speedups.

iverase · 2025-06-05T16:16:39Z

> It would be nice to have a dedicated benchmark for terms queries to see whether we got speedups.

In this case it is easy to prove we got an improvement: if you run the test I added with the old code, you get an OOM. It would still be good to have benchmarks to catch regressions.

elasticsearchmachine · 2025-06-06T12:19:24Z

💔 Backport failed

Status	Branch	Result
❌	8.19	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 128986

This commit implements TermsQuery(Collection<?> values, @Nullable SearchExecutionContext context) in the WildcardFieldMapper to avoid memory pressure when building the query.

Improve execution of terms queries over wildcard fields

2ec2534

iverase requested a review from mayya-sharipova June 5, 2025 12:57

iverase added the >bug, :Search Relevance/Search, v8.19.0 and v9.1.0 labels Jun 5, 2025

Update docs/changelog/128986.yaml

c7b8d23

elasticsearchmachine added the Team:Search Relevance label Jun 5, 2025

iverase mentioned this pull request Jun 5, 2025

Improve execution of terms queries over wildcard fields #128826

Closed

Merge branch 'main' into wildCardTermsQuery

6c3d3fe

Merge branch 'main' into wildCardTermsQuery

43987b2

mayya-sharipova reviewed Jun 5, 2025

mayya-sharipova approved these changes Jun 5, 2025

iverase added 4 commits June 6, 2025 06:17

Merge branch 'main' into wildCardTermsQuery

b482dbc

Merge branch 'main' into wildCardTermsQuery

74d1a62

Merge branch 'main' into wildCardTermsQuery

c3b4b52

Merge branch 'main' into wildCardTermsQuery

b1cc13b

iverase added the auto-backport label Jun 6, 2025

iverase merged commit ba6987f into elastic:main Jun 6, 2025
18 checks passed

iverase deleted the wildCardTermsQuery branch June 6, 2025 12:18

elasticsearchmachine added the backport pending label Jun 6, 2025

iverase mentioned this pull request Jun 6, 2025

[8.19] Improve execution of terms queries over wildcard fields #129051

Merged

iverase removed the backport pending label Jun 12, 2025
