-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Simplified RRF Retriever #129659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Simplified RRF Retriever #129659
Conversation
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
Hi @Mikep86, I've created a changelog YAML for you. |
Pinging @elastic/search-eng (Team:SearchOrg) |
Pinging @elastic/search-relevance (Team:Search - Relevance) |
// TODO: Refactor duplicate code | ||
// Using the multi-fields query format | ||
var localIndicesMetadata = resolvedIndices.getConcreteLocalIndicesMetadata(); | ||
if (localIndicesMetadata.size() > 1) { | ||
throw new IllegalArgumentException( | ||
"[" + NAME + "] cannot specify [" + QUERY_FIELD.getPreferredName() + "] when querying multiple indices" | ||
); | ||
} else if (resolvedIndices.getRemoteClusterIndices().isEmpty() == false) { | ||
throw new IllegalArgumentException( | ||
"[" + NAME + "] cannot specify [" + QUERY_FIELD.getPreferredName() + "] when querying remote indices" | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kderusso I know you requested that we refactor this common code, can we handle that in a follow up along with refactoring the common test code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, given the timing I'm fine with that happening in a followup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I am OK with the cleanup/consolidation being a followup, a few questions on tests plus we should update the docs. Approving to not block.
- match: { error.root_cause.0.reason: "[rrf] does not support per-field weights in [fields]" } | ||
|
||
--- | ||
"Can query keyword fields": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add explicit field calls for the text and semantic_text fields too?
// TODO: Refactor duplicate code | ||
// Using the multi-fields query format | ||
var localIndicesMetadata = resolvedIndices.getConcreteLocalIndicesMetadata(); | ||
if (localIndicesMetadata.size() > 1) { | ||
throw new IllegalArgumentException( | ||
"[" + NAME + "] cannot specify [" + QUERY_FIELD.getPreferredName() + "] when querying multiple indices" | ||
); | ||
} else if (resolvedIndices.getRemoteClusterIndices().isEmpty() == false) { | ||
throw new IllegalArgumentException( | ||
"[" + NAME + "] cannot specify [" + QUERY_FIELD.getPreferredName() + "] when querying remote indices" | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, given the timing I'm fine with that happening in a followup
+ " }" | ||
+ " }" | ||
+ "}"; | ||
String restContent = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we parse this twice, once with retrievers and once with field/query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is only for XContent parsing purposes, the resulting retriever does not need to pass SearchRequest
validation
"foo" | ||
); | ||
|
||
// Non-default rank window size and rank constant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand that this is testing anything with rank window size or rank constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's testing that the rank window size and rank constant are propagated to the rewritten retrievers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I liked the tests in 310_rrf_retriever_simplified.yml
. It made much easier to understand the required changes.
...-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/20_linear_retriever_simplified.yml
Show resolved
Hide resolved
x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/rrf/RRFRetrieverBuilder.java
Show resolved
Hide resolved
...rrf/src/yamlRestTest/java/org/elasticsearch/xpack/rank/rrf/RRFRankClientYamlTestSuiteIT.java
Show resolved
Hide resolved
- match: { hits.hits.0._id: "1" } | ||
|
||
--- | ||
"Can query date fields": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add a test where the simplified and fully-expanded format yield the same results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Adds a simplified syntax for the
rrf
retriever:fields
is optional. If it is not provided, we query the fields defined by theindex.query.default_field
index setting (which is*
by default).This syntax automatically handles querying a mix of lexical fields (i.e. fields that support lexical search via
match
) andsemantic_text
fields. The fields are divided into lexical and semantic groups to create a 50/50 weight distribution between the two in the final ranking. This is achieved by creating a retriever tree that looks like:This is a sibling of the simplified
linear
retriever, which was added in #129200.