MB-60269 - Merge path fixes #202
Merged
TODO: Open a go-faiss PR to clean up MergeFrom() now, or let it remain in case we use it later?
@metonymic-smokey as we discussed, could you add some backing tests, or at the least some results showing how things looked with this change?
@blevesearch/collaborators must take another look.
Thejas-bhat
approved these changes
Jan 5, 2024
abhinavdangeti
approved these changes
Jan 5, 2024
abhinavdangeti
added a commit
to blevesearch/bleve
that referenced
this pull request
Jan 5, 2024
To include: * 8c668c4 Aditi Ahuja | MB-60269 - Merge path fixes (blevesearch/zapx#202)
abhinavdangeti
added a commit
to blevesearch/bleve
that referenced
this pull request
Jan 5, 2024
To include: * 8c668c4 Aditi Ahuja | MB-60269 - Merge path fixes (blevesearch/zapx#202)
moshaad7
pushed a commit
that referenced
this pull request
Sep 12, 2024
* reconstruction for all index types
* correct ordering of vector IDs
* remove unused code
* fixed commentary
This PR addresses two merge path issues:
A. IDs need to be added to a vector index in the same order as the vectors. This maintains the integrity of the mapping between vectors and IDs, which is important for both search and reconstruction.
When reconstructing indexes, the current approach uses the keys of a map to decide the order of the final vector IDs. However, since the Keys() function provides no guarantee about the ordering of the returned list, the vectors and IDs are not guaranteed to be inserted in the same order.
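To illustrate issue A, here is a minimal sketch in plain Go. It does not use the zapx or go-faiss APIs; `vecsByID` and `orderedIDsAndVectors` are hypothetical names, and sorting the keys is only one way to pin down an order (the actual change in this PR may do it differently). The point is that the ID list and the vector buffer are both derived from a single, explicitly ordered slice rather than from two independent passes over a map.

```go
package main

import (
	"fmt"
	"sort"
)

// orderedIDsAndVectors returns the vector IDs in ascending order together
// with a flattened vector buffer laid out in exactly the same order.
// vecsByID is a hypothetical stand-in for the map used during reconstruction.
func orderedIDsAndVectors(vecsByID map[int64][]float32) ([]int64, []float32) {
	ids := make([]int64, 0, len(vecsByID))
	for id := range vecsByID { // Go map iteration order is unspecified
		ids = append(ids, id)
	}
	// Fix one explicit order, then derive both outputs from it.
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })

	flat := make([]float32, 0)
	for _, id := range ids {
		flat = append(flat, vecsByID[id]...)
	}
	return ids, flat
}

func main() {
	vecs := map[int64][]float32{
		30: {3, 3},
		10: {1, 1},
		20: {2, 2},
	}
	ids, flat := orderedIDsAndVectors(vecs)
	fmt.Println(ids, flat)
}
```

With this shape, position i in the ID slice always labels the i-th vector in the buffer, regardless of how the map happens to iterate.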
B. Currently, flat indexes are merged by adding all of the existing flat indexes into an existing index.
This PR uses the reconstruction method for flat indexes too, rebuilding the flat index with updated data and IDs.
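As a rough sketch of what rebuilding with updated data and IDs could look like, the plain-Go example below gathers vectors from several segments into one buffer and one ID slice. Everything here is an assumption made for illustration: `segmentVectors`, `rebuildFlat`, and the `deleted` set are hypothetical, and the real code works through go-faiss rather than raw slices.

```go
package main

import "fmt"

// segmentVectors is a hypothetical view of one segment's flat index:
// ids[i] labels the vector stored at data[i*dims : (i+1)*dims].
type segmentVectors struct {
	ids  []int64
	data []float32
}

// rebuildFlat collects the surviving vectors from every segment into one
// buffer, appending each ID together with its vector so the two stay aligned.
func rebuildFlat(segs []segmentVectors, dims int, deleted map[int64]bool) ([]int64, []float32) {
	var ids []int64
	var data []float32
	for _, seg := range segs {
		for i, id := range seg.ids {
			if deleted[id] {
				continue // assumed: stale vectors are dropped during the merge
			}
			ids = append(ids, id)
			data = append(data, seg.data[i*dims:(i+1)*dims]...)
		}
	}
	return ids, data
}

func main() {
	segs := []segmentVectors{
		{ids: []int64{1, 2}, data: []float32{1, 2, 3, 4}},
		{ids: []int64{3}, data: []float32{5, 6}},
	}
	ids, data := rebuildFlat(segs, 2, map[int64]bool{2: true})
	fmt.Println(ids, data)
}
```

The rebuilt buffer and ID slice would then be handed to a fresh flat index in a single add, instead of merging the old indexes into one another.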
Testing:
Performed local testing with the SIFT Small dataset (10k vectors) with k = 2/3 for the following cases:
This dataset was picked because it has reasonable recall with the current index configuration.