MB-57888: Index Update #2106
base: master
Likith101 commented Nov 26, 2024 (edited)
- Added new APIs for index update
- Added logic to determine whether index mappings can be updated and what specifically changed between the mappings
- Added logic to store and retrieve this information in bolt
- Added checks to prevent deleted data from being referenced in queries until the actual data on the segment is removed during the merge process
65b6392 to 289e64a
- Few changes to the deleted fields logic to accommodate edge cases
- Added deletion logic to vector paths
- Tweaked deletion logic in index path
- Bug fixes
- Name changes
- More test cases
- Test case coverage for the same
- Better loading and storing from bolt
49047cf to b4d5f7e
		}
	}
}
return fieldInfo, nil
@Likith101, let me know if the following understanding is correct.
It looks to me like we iterate over the two type mappings (original and updated) several times: first to validate, then to accumulate every field's information from both mappings into fieldInfo, and finally to remove the entries that are identical.
This approach seems to involve redundant work, imo. Isn't it possible to do all of this in a single iteration, with a custom callback that compares each field mapping's info between the original and updated mappings and populates a fieldInfo map maintained in this function?
Please let me know whether this suggestion is valid, thanks.
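To make the single-iteration idea concrete, here is a rough Go sketch. Everything in it is an assumption for illustration: `FieldDef`, `TypeMapping`, `walkPair`, and `diffFields` are simplified, hypothetical stand-ins and not the types or functions in this PR. It walks the original and updated mappings together and lets a callback decide what to record, so identical fields never enter the result.

```go
package main

import "fmt"

// FieldDef and TypeMapping are hypothetical, simplified stand-ins for the
// real mapping types; they are not the actual API.
type FieldDef struct {
	Type  string
	Index bool
	Store bool
}

type TypeMapping struct {
	Fields   map[string]FieldDef
	Children map[string]*TypeMapping
}

// visitFn is called once per field path found in the original mapping.
// upd is nil when the field no longer exists in the updated mapping.
type visitFn func(path string, orig FieldDef, upd *FieldDef) error

// walkPair traverses both mappings together in one pass. It only visits
// paths that exist in orig; fields added in upd are not visited here.
func walkPair(prefix string, orig, upd *TypeMapping, visit visitFn) error {
	if orig == nil {
		return nil
	}
	for name, of := range orig.Fields {
		var uf *FieldDef
		if upd != nil {
			if f, ok := upd.Fields[name]; ok {
				uf = &f
			}
		}
		if err := visit(prefix+name, of, uf); err != nil {
			return err
		}
	}
	for name, oc := range orig.Children {
		var uc *TypeMapping
		if upd != nil {
			uc = upd.Children[name]
		}
		if err := walkPair(prefix+name+".", oc, uc, visit); err != nil {
			return err
		}
	}
	return nil
}

// diffFields records only the fields that were deleted or changed.
func diffFields(orig, upd *TypeMapping) map[string]string {
	fieldInfo := map[string]string{}
	_ = walkPair("", orig, upd, func(path string, o FieldDef, u *FieldDef) error {
		switch {
		case u == nil:
			fieldInfo[path] = "deleted"
		case *u != o:
			fieldInfo[path] = "changed"
		}
		return nil
	})
	return fieldInfo
}

func main() {
	orig := &TypeMapping{Fields: map[string]FieldDef{
		"title": {Type: "text", Index: true},
		"body":  {Type: "text", Index: true},
	}}
	upd := &TypeMapping{Fields: map[string]FieldDef{
		"title": {Type: "text", Index: true},
	}}
	fmt.Println(diffFields(orig, upd))
}
```

Note that this sketch does not cover validation of fields that exist only in the updated mapping; that would need either a second callback or a separate check.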
The multiple passes over the two mappings were necessary. Each pass eliminates a number of possible edge cases, and handling them all at once at the end would be a huge task.
For example, in the very first iteration we simply check for mappings or fields that are present in the updated mapping but not in the original. This is done so that later functions do not have to worry about the edge case where a mapping in the updated version has no counterpart in the original.
Another case: after adding each field's info, we validate it against all of the existing entries to check for corner cases such as the same path or the same field alias carrying different deleted information. While this seems redundant, doing it all at the end was nearly impossible.
In terms of code complexity, I don't believe we are doing much redundant work overall. We do two passes over each mapping, and then, if there are n fields in the updated mapping, we do n^2 checks with a map along with n full field comparisons.
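A first pass of this kind might look roughly like the Go sketch below. The `TypeMapping` struct and the `findAdditions` helper are hypothetical simplifications, not the code in this PR, and the sketch only reports the extra paths; what the caller does with them is left open.

```go
package main

import (
	"fmt"
	"sort"
)

// TypeMapping is a hypothetical, simplified mapping node (not the real type).
type TypeMapping struct {
	Fields   map[string]string // field name -> field type
	Children map[string]*TypeMapping
}

// findAdditions returns every path present in upd but missing from orig.
// Both orig and upd must be non-nil.
func findAdditions(prefix string, orig, upd *TypeMapping) []string {
	var added []string
	for name := range upd.Fields {
		if _, ok := orig.Fields[name]; !ok {
			added = append(added, prefix+name)
		}
	}
	for name, uc := range upd.Children {
		oc, ok := orig.Children[name]
		if !ok || oc == nil {
			// The whole sub-mapping is new; report it once and do not descend.
			added = append(added, prefix+name)
			continue
		}
		if uc == nil {
			continue
		}
		added = append(added, findAdditions(prefix+name+".", oc, uc)...)
	}
	sort.Strings(added) // deterministic order despite map iteration
	return added
}

func main() {
	orig := &TypeMapping{Fields: map[string]string{"title": "text"}}
	upd := &TypeMapping{
		Fields: map[string]string{"title": "text", "year": "number"},
		Children: map[string]*TypeMapping{
			"author": {Fields: map[string]string{"name": "text"}},
		},
	}
	fmt.Println(findAdditions("", orig, upd))
}
```

Once this pass has run, later passes can assume that every mapping they visit in the updated version has a counterpart in the original.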
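As an illustration of the per-insert validation described above, here is a minimal Go sketch. The `FieldInfo` struct and `addFieldInfo` helper are hypothetical, and the sketch uses a plain slice rather than the map mentioned above, purely to keep it short. Each new entry is compared against every existing one, which is where the quadratic number of checks comes from.

```go
package main

import "fmt"

// FieldInfo is a hypothetical record of what happened to one field.
type FieldInfo struct {
	Path    string // full path of the field in the mapping
	Alias   string // field name/alias; empty if none
	Deleted bool
}

// addFieldInfo appends next to all unless an existing entry with the same
// path or the same alias disagrees on the Deleted flag.
func addFieldInfo(all []FieldInfo, next FieldInfo) ([]FieldInfo, error) {
	for _, prev := range all {
		samePath := prev.Path == next.Path
		sameAlias := prev.Alias != "" && prev.Alias == next.Alias
		if (samePath || sameAlias) && prev.Deleted != next.Deleted {
			return all, fmt.Errorf(
				"conflicting deletion info for %q and %q", prev.Path, next.Path)
		}
	}
	return append(all, next), nil
}

func main() {
	var all []FieldInfo
	var err error

	all, err = addFieldInfo(all, FieldInfo{Path: "a.title", Alias: "title"})
	fmt.Println(len(all), err)

	// Same alias, but this one is marked deleted: rejected as a conflict.
	all, err = addFieldInfo(all, FieldInfo{Path: "b.title", Alias: "title", Deleted: true})
	fmt.Println(len(all), err)
}
```

Checking at insertion time means a conflict is reported as soon as the second of the two clashing fields is seen, rather than being untangled from the full set at the end.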