Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

search: improve with CompositeSuggestQueryParser #151

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ptamarit
Copy link
Member

@ptamarit ptamarit commented Dec 9, 2024

❤️ Thank you for your contribution!

Description

  • Fixes Improve users search results #150 (see details in description)
  • Commits
    1. The first commits makes the test assertions more precise
    2. The second commit uses CompositeSuggestQueryParser to improve search results

Checklist

Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge:

Frontend

Reminder

By using GitHub, you have already agreed to the GitHub’s Terms of Service including that:

  1. You license your contribution under the same terms as the current repository’s license.
  2. You agree that you have the right to license your contribution under the current repository’s license.

@ptamarit ptamarit changed the title 150 user search improve search: improve with CompositeSuggestQueryParser Dec 9, 2024
@ptamarit ptamarit force-pushed the 150-user-search-improve branch from ee9e9e7 to eb1fcfa Compare December 9, 2024 08:22
@@ -81,7 +86,6 @@ class UserSearchOptions(SearchOptions, SearchOptionsMixin):
"name": "profile.full_name",
},
type="most_fields", # https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#multi-match-types
fuzziness="AUTO", # https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#fuzziness
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fuzziness is applied in CompositeSuggestQueryParser.

tree_transformer_cls=SearchFieldTransformer,
fields=["username^2", "email^2", "profile.full_name^3", "profile.affiliations"],
fields=[
"username^2",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
Searching for a username with a dash is still not working well.
For instance, searching from "one-two" seems to search for usernames starting with "one" and starting with "two", and therefore does not find anything.

Copy link
Contributor

@kpsherva kpsherva Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the issue: text field is split by - https://github.com/inveniosoftware/invenio-users-resources/blob/master/invenio_users_resources/records/mappings/os-v2/users/user-v3.0.0.json#L132
if you search by username.keyword, it should work

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks a lot @kpsherva !
2 months later 😅, I confirm that this fixes the issue.

I modified the tests to include a username with a dash, and I added a test showing that usernames with dashes can now be searched properly.

Copy link
Member

@zzacharo zzacharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a long standing PR. Can we maybe have a summary of the improvements or maybe a small demo?

@ptamarit ptamarit force-pushed the 150-user-search-improve branch from eb1fcfa to 87d182e Compare February 21, 2025 14:07
@ptamarit
Copy link
Member Author

This seems like a long standing PR. Can we maybe have a summary of the improvements or maybe a small demo?

@zzacharo Sorry for the long wait. I finally found time to get back to it, and it's now ready for prime time.

What is described in the linked issue is a good summary of the problems:

Searching for a person with a common given name and a common family name includes every user with either one of the names and the ranking does not help.
Searching for users with a dash in their usernames behaves the same way.

Happy to give a demo, but the tests are meant to show what changed in this pull request:

  1. The test modification in the 2nd commit shows that: before, searching for 2 words was including all the users matching at least 1 of the 2 words. Now, searching for 2 words only includes users matching both words. This means that if a user has a common name, adding more words should help restrict the results and find the proper result.
    image
  2. The test modification in the 4th commit shows that: before, searching for pub-res would not return the user with the username pub-res. Now, it works as expected.
    image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve users search results
4 participants