Skip to content

Simplified RRF Retriever #129659

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Conversation

Mikep86
Copy link
Contributor

@Mikep86 Mikep86 commented Jun 18, 2025

Adds a simplified syntax for the rrf retriever:

GET my-index/_search
{
  "retriever": {
    "rrf": {
      "fields": ["field_1", "field_2"],
      "query": "my awesome query"
    }
  }
}

fields is optional. If it is not provided, we query the fields defined by the index.query.default_field index setting (which is * by default).

This syntax automatically handles querying a mix of lexical fields (i.e. fields that support lexical search via match) and semantic_text fields. The fields are divided into lexical and semantic groups to create a 50/50 weight distribution between the two in the final ranking. This is achieved by creating a retriever tree that looks like:

rrf
   multi_match on lexical fields
   rrf
     match on semantic_text field A
     match on semantic_text field B
     match on semantic_text field C

This is a sibling of the simplified linear retriever, which was added in #129200.

@Mikep86 Mikep86 requested review from pmpailis, jimczi and kderusso June 18, 2025 19:41
@Mikep86 Mikep86 added >enhancement auto-backport Automatically create backport pull requests when merged :SearchOrg/Relevance Label for the Search (solution/org) Relevance team :Search Relevance/Search Catch all for Search Relevance v8.19.0 v9.1.0 labels Jun 18, 2025
@elasticsearchmachine elasticsearchmachine added Team:SearchOrg Meta label for the Search Org (Enterprise Search) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch Team:Search - Relevance The Search organization Search Relevance team labels Jun 18, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Copy link
Collaborator

Hi @Mikep86, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

Comment on lines +223 to +234
// TODO: Refactor duplicate code
// Using the multi-fields query format
var localIndicesMetadata = resolvedIndices.getConcreteLocalIndicesMetadata();
if (localIndicesMetadata.size() > 1) {
throw new IllegalArgumentException(
"[" + NAME + "] cannot specify [" + QUERY_FIELD.getPreferredName() + "] when querying multiple indices"
);
} else if (resolvedIndices.getRemoteClusterIndices().isEmpty() == false) {
throw new IllegalArgumentException(
"[" + NAME + "] cannot specify [" + QUERY_FIELD.getPreferredName() + "] when querying remote indices"
);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kderusso I know you requested that we refactor this common code, can we handle that in a follow up along with refactoring the common test code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, given the timing I'm fine with that happening in a followup

Copy link
Member

@kderusso kderusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I am OK with the cleanup/consolidation being a followup, a few questions on tests plus we should update the docs. Approving to not block.

- match: { error.root_cause.0.reason: "[rrf] does not support per-field weights in [fields]" }

---
"Can query keyword fields":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add explicit field calls for the text and semantic_text fields too?

Comment on lines +223 to +234
// TODO: Refactor duplicate code
// Using the multi-fields query format
var localIndicesMetadata = resolvedIndices.getConcreteLocalIndicesMetadata();
if (localIndicesMetadata.size() > 1) {
throw new IllegalArgumentException(
"[" + NAME + "] cannot specify [" + QUERY_FIELD.getPreferredName() + "] when querying multiple indices"
);
} else if (resolvedIndices.getRemoteClusterIndices().isEmpty() == false) {
throw new IllegalArgumentException(
"[" + NAME + "] cannot specify [" + QUERY_FIELD.getPreferredName() + "] when querying remote indices"
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, given the timing I'm fine with that happening in a followup

+ " }"
+ " }"
+ "}";
String restContent = """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we parse this twice, once with retrievers and once with field/query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is only for XContent parsing purposes, the resulting retriever does not need to pass SearchRequest validation

"foo"
);

// Non-default rank window size and rank constant
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand that this is testing anything with rank window size or rank constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's testing that the rank window size and rank constant are propagated to the rewritten retrievers

Copy link
Contributor

@Samiul-TheSoccerFan Samiul-TheSoccerFan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I liked the tests in 310_rrf_retriever_simplified.yml. It made much easier to understand the required changes.

- match: { hits.hits.0._id: "1" }

---
"Can query date fields":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add a test where the simplified and fully-expanded format yield the same results?

Copy link
Contributor

@pmpailis pmpailis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >enhancement :Search Relevance/Search Catch all for Search Relevance :SearchOrg/Relevance Label for the Search (solution/org) Relevance team Team:Search - Relevance The Search organization Search Relevance team Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch Team:SearchOrg Meta label for the Search Org (Enterprise Search) v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants