

Error when searching affiliations: too_many_nested_clauses #459

Open
ntarocco opened this issue Feb 3, 2025 · 1 comment · May be fixed by #460 or inveniosoftware/invenio-records-resources#608

Comments

@ntarocco
Contributor

ntarocco commented Feb 3, 2025

TransportError(500, 'search_phase_execution_exception', 'too_many_nested_clauses: Query contains too many nested clauses; maxClauseCount is set to 1024')

Request: '/api/affiliations?size=20&suggest=The+Barcelona+Institute+of+Science+and+Technology'

Search params:

{page: 1, size: 20, sort: 'bestmatch', suggest: 'The Barcelona Institute of Science and Technology'}
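To reproduce the failing request, the `suggest` and `size` parameters can be mapped back onto the request path. This is a minimal sketch using only the standard library. The host `example.org` is a placeholder, and `affiliations_url` is a hypothetical helper. Only `size` and `suggest` are sent, as in the original request.

```python
from urllib.parse import urlencode

# Placeholder host: only the path and query string come from the issue.
BASE_URL = "https://example.org/api/affiliations"


def affiliations_url(suggest: str, size: int = 20) -> str:
    """Build the affiliations suggest URL (hypothetical helper)."""
    # urlencode quotes with quote_plus by default, so spaces become "+".
    return f"{BASE_URL}?{urlencode({'size': size, 'suggest': suggest})}"


if __name__ == "__main__":
    print(affiliations_url("The Barcelona Institute of Science and Technology"))
```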

Partial query:

{"query":{"bool":{"filter":[{"bool":{"must_not":[{"terms":{"tags":["unlisted"]}}]}}],"should":[{"multi_match":{"query":"The Barcelona Institute of Science and Technology","fields":["acronym.keyword","acronym","name","aliases","title.en","id","identifiers.identifier","country","country_name","types"],"operator":"and","type":"cross_fields","boost":3}},{"multi_match":{"query":"The Barcelona Institute of Science and Technology","fields":["acronym.keyword^50","acronym^10","name^10","aliases^5","title.en^2"
{error: {failed_shards: [{"index":"'cds-rdm-prod-affiliations-affiliation-v2.0.0-1728558302'","node":"'hGnt4dq-T66FzzECMrjZNg'","reason":"{'type': 'too_many_nested_clauses', 'reason': 'too_many_nested_clauses: Query contains too many nested clauses; maxClauseCount is set to 1024'}","shard":"0"}], grouped: True, phase: 'query', reason: 'all shards failed', root_cause: [{"reason":"'too_many_nested_clauses: Query contains too many nested clauses; maxClauseCount is set to 1024'","type":"'too_many_nested_clauses'"}], type: 'search_phase_execution_exception'}}

This error was reported as a Sentry exception.

@ntarocco ntarocco converted this from a draft issue Feb 3, 2025
@ntarocco ntarocco moved this from Ready to In progress in Sprint Q1/2025 Feb 5, 2025
@sakshamarora1
Contributor

This is happening because we use a wildcard search on the title subfields, and recent changes have increased the number of clauses even further. cds-rdm currently has 126 languages, so because of the n-gram analyzer, every additional word in the search query adds more clauses.
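The following sketch estimates why the clause count grows so quickly. It is a back-of-the-envelope model and relies on several hypothetical assumptions that are not taken from the cds-rdm configuration:

* An edge n-gram analyzer with `min_gram=2` and `max_gram=20` is applied to each word.
* Each resulting n-gram becomes one term clause per searched title subfield.
* The other fields in the `multi_match` (acronym, name, aliases, and so on) are ignored.

The real analyzer settings, and the way the search engine counts nested clauses, may differ. Treat the numbers as illustrative only.

```python
# Rough estimate of how many leaf clauses a suggest query may expand to.
# HYPOTHETICAL ASSUMPTIONS (not from the actual cds-rdm mappings):
#   - edge n-gram analyzer with min_gram=2, max_gram=20 on every word
#   - one term clause per n-gram per searched title subfield
#   - 126 title subfields (one per language, as stated in the issue)

MAX_CLAUSE_COUNT = 1024  # limit reported in the error message


def edge_ngram_count(word: str, min_gram: int = 2, max_gram: int = 20) -> int:
    """Number of prefixes of length min_gram..max_gram that a word yields."""
    return max(0, min(len(word), max_gram) - min_gram + 1)


def estimate_clauses(query: str, num_fields: int = 126,
                     min_gram: int = 2, max_gram: int = 20) -> int:
    """Sum of n-grams over all words, multiplied by the number of fields."""
    grams = sum(edge_ngram_count(w, min_gram, max_gram) for w in query.split())
    return grams * num_fields


if __name__ == "__main__":
    for q in ("The Barcelona Institute of Science and Technology",
              "The In of Sci and te"):
        n = estimate_clauses(q)
        print(f"{q!r}: ~{n} clauses, over limit: {n > MAX_CLAUSE_COUNT}")
```

Under these assumed parameters, the full affiliation name yields 36 n-grams × 126 fields = 4536 clauses. The short query "The In of Sci and te" yields 9 × 126 = 1134, just above the 1024 limit. This is only a possible lead for the odd behaviour described in this thread, not a confirmed explanation.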

Also, and this is a bit strange, I used trial and error to see which queries work and which don't:

The In of Sci and te

Even this query returns an Internal Server Error, while other queries of similar length and word count work fine. This needs further investigation.

@sakshamarora1 sakshamarora1 moved this from In progress to In review 🔍 in Sprint Q1/2025 Feb 10, 2025
@sakshamarora1 sakshamarora1 removed their assignment Feb 12, 2025
@kpsherva kpsherva moved this from In review 🔍 to In progress in Sprint Q1/2025 Feb 17, 2025
@sakshamarora1 sakshamarora1 removed their assignment Feb 17, 2025
@sakshamarora1 sakshamarora1 moved this from In progress to In review 🔍 in Sprint Q1/2025 Feb 17, 2025
@jrcastro2 jrcastro2 moved this from In review 🔍 to To release 🤖 in Sprint Q1/2025 Feb 17, 2025
Labels
None yet
Projects
Status: To release 🤖
2 participants