MB-57888: Index Update #2106

Open
wants to merge 9 commits into
base: master

Likith101
Member

@Likith101 Likith101 commented Nov 26, 2024

  • Added new APIs for index update
  • Added logic to determine whether index mappings can be updated, and what specifically changed between the mappings
  • Added logic to store and retrieve that information within bolt
  • Added checks to prevent deleted data from being referenced within queries until the actual data on the segment is removed during the merge process
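
To illustrate the last point, here is a minimal sketch of the idea, not the code in this PR. The `segment` type, its `deleted` set, and the `docsFor` and `merge` helpers are all hypothetical names made up for this example: lookups on a field marked as deleted return nothing, even though the data stays on the segment until a merge rewrites it.

```go
package main

import "fmt"

// segment is a hypothetical stand-in for an index segment (not bleve's type).
type segment struct {
	postings map[string][]uint64 // field name -> document numbers (simplified)
	deleted  map[string]bool     // fields removed by an index update, data still present
}

// docsFor returns the documents for a field and hides the data of any
// field that has been marked as deleted.
func (s *segment) docsFor(field string) []uint64 {
	if s.deleted[field] {
		return nil
	}
	return s.postings[field]
}

// merge builds a new segment that no longer carries the deleted fields' data.
func merge(s *segment) *segment {
	out := &segment{postings: map[string][]uint64{}, deleted: map[string]bool{}}
	for field, docs := range s.postings {
		if !s.deleted[field] {
			out.postings[field] = docs
		}
	}
	return out
}

func main() {
	s := &segment{
		postings: map[string][]uint64{"title": {1, 2}, "body": {2}},
		deleted:  map[string]bool{"body": true},
	}
	fmt.Println(len(s.docsFor("body")), len(s.docsFor("title"))) // 0 2

	merged := merge(s)
	_, stillThere := merged.postings["body"]
	fmt.Println(stillThere) // false
}
```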

@abhinavdangeti abhinavdangeti added this to the v2.5.0 milestone Jan 6, 2025
@Likith101 Likith101 changed the title MB-57888: WIP: Index Update MB-57888: Index Update Jan 9, 2025
@Likith101 Likith101 force-pushed the IndexUpdate branch 2 times, most recently from 65b6392 to 289e64a January 17, 2025 08:05
 - Few changes to the deleted fields logic to accommodate edge cases
 - Added deletion logic to vector paths
 - Tweaked deletion logic in index path
 - Bug Fixes
 - Name changes
 - More test cases
 - Test case coverage for the same
 - Better loading and storing from bolt
}
}
}
return fieldInfo, nil
Member

@Likith101, let me know if the following understanding is correct.

It looks to me like we iterate over the pair of type mappings several times: first to validate them, then to accumulate every field's information from both mappings into fieldInfo, and finally to remove the entries that are identical.

This approach seems to do redundant work, imo. Isn't it possible to do all of this in a single iteration, with a custom callback that compares each field mapping between the original and updated versions and populates a fieldInfo map maintained in this function?

Please let me know if the suggested thought is valid or not, thanks
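
A rough sketch of the single-pass idea, to make the suggestion concrete. Everything here is an assumption for illustration: `fieldDef`, `fieldChange`, `compareFn`, `walkOnce` and `removalsOnly` are hypothetical names, the mappings are flattened to a path-keyed map instead of the real nested type mappings, and the policy in `removalsOnly` is only an example, not the PR's actual update rules.

```go
package main

import (
	"errors"
	"fmt"
)

// fieldDef is a hypothetical, flattened stand-in for a field mapping.
type fieldDef struct {
	Index, Store bool
}

// fieldChange records what an update removes for one field path.
type fieldChange struct {
	IndexRemoved, StoreRemoved bool
}

// compareFn is the custom callback. orig or upd is nil when the path exists
// on one side only. It returns nil when nothing changed.
type compareFn func(path string, orig, upd *fieldDef) (*fieldChange, error)

// walkOnce visits every path exactly once and fills fieldInfo with only the
// paths that changed, so no later "remove identical entries" step is needed.
func walkOnce(orig, upd map[string]fieldDef, cmp compareFn) (map[string]fieldChange, error) {
	fieldInfo := make(map[string]fieldChange)
	visit := func(path string, o, u *fieldDef) error {
		ch, err := cmp(path, o, u)
		if err != nil {
			return err
		}
		if ch != nil {
			fieldInfo[path] = *ch
		}
		return nil
	}
	for path, o := range orig {
		o := o
		var u *fieldDef
		if v, ok := upd[path]; ok {
			u = &v
		}
		if err := visit(path, &o, u); err != nil {
			return nil, err
		}
	}
	for path, u := range upd {
		if _, ok := orig[path]; ok {
			continue // already handled in the loop above
		}
		u := u
		if err := visit(path, nil, &u); err != nil {
			return nil, err
		}
	}
	return fieldInfo, nil
}

// removalsOnly is an example policy (an assumption): options may only be
// turned off, and fields may not be added.
func removalsOnly(path string, o, u *fieldDef) (*fieldChange, error) {
	if o == nil {
		return nil, errors.New("field " + path + " is not in the original mapping")
	}
	if u == nil {
		return &fieldChange{IndexRemoved: o.Index, StoreRemoved: o.Store}, nil
	}
	if (!o.Index && u.Index) || (!o.Store && u.Store) {
		return nil, errors.New("field " + path + " enables an option that was off")
	}
	if *o == *u {
		return nil, nil // identical, nothing to record
	}
	return &fieldChange{IndexRemoved: o.Index && !u.Index, StoreRemoved: o.Store && !u.Store}, nil
}

func main() {
	orig := map[string]fieldDef{
		"title": {Index: true, Store: true},
		"body":  {Index: true},
	}
	upd := map[string]fieldDef{
		"title": {Index: true},
		"body":  {Index: true},
	}
	info, err := walkOnce(orig, upd, removalsOnly)
	fmt.Println(info, err)
}
```

In this sketch only "title" ends up in fieldInfo, since "body" is identical on both sides and the callback returns nil for it.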

Member Author

The multiple iterations over the two mappings were necessary. Each step eliminates many possible edge cases, and handling them all at once at the end would be a huge task.

For example, in the very first iteration we simply check for mappings or fields that are present in the updated mapping but not in the original. This was done so that later functions do not have to worry about the edge case where a mapping in the updated version has no corresponding one in the original.
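
As a simplified illustration of that first pass (a sketch only; the `mapping` type and the `onlyInUpdated` and `collect` helpers are hypothetical and much smaller than the real mapping structs), the check can be pictured as a recursive walk that reports every path found only on the updated side. What the caller then does with those paths is left open here.

```go
package main

import (
	"fmt"
	"sort"
)

// mapping is a hypothetical, pared-down document mapping: named sub-mappings
// plus the names of the fields defined at this level.
type mapping struct {
	Properties map[string]*mapping
	Fields     []string
}

// onlyInUpdated returns the sorted paths of sub-mappings and fields that
// exist in upd but have no counterpart in orig.
func onlyInUpdated(orig, upd *mapping) []string {
	var out []string
	collect(orig, upd, "", &out)
	sort.Strings(out)
	return out
}

func collect(orig, upd *mapping, prefix string, out *[]string) {
	if upd == nil {
		return
	}
	known := make(map[string]bool)
	if orig != nil {
		for _, f := range orig.Fields {
			known[f] = true
		}
	}
	for _, f := range upd.Fields {
		if !known[f] {
			*out = append(*out, prefix+f)
		}
	}
	for name, sub := range upd.Properties {
		var o *mapping
		if orig != nil {
			o = orig.Properties[name]
		}
		if o == nil {
			// The whole sub-mapping is new, so report it and stop descending.
			*out = append(*out, prefix+name)
			continue
		}
		collect(o, sub, prefix+name+".", out)
	}
}

func main() {
	orig := &mapping{
		Fields:     []string{"name"},
		Properties: map[string]*mapping{"address": {Fields: []string{"city"}}},
	}
	upd := &mapping{
		Fields: []string{"name"},
		Properties: map[string]*mapping{
			"address": {Fields: []string{"city", "zip"}},
			"tags":    {},
		},
	}
	fmt.Println(onlyInUpdated(orig, upd)) // [address.zip tags]
}
```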

Another case: after adding each field's info, we validate it against all of the existing entries to catch corner cases such as the same path or the same field alias carrying different deleted information. While this seems redundant, doing it all at the end was nearly impossible.
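
A minimal sketch of that kind of check-on-insert, with made-up names (`deletedInfo`, `infoSet` and `add` are hypothetical, and the real field info carries more than three flags): each new entry is compared with whatever is already registered under the same path or alias, and a mismatch is reported right away.

```go
package main

import "fmt"

// deletedInfo is a hypothetical record of what an update removes for a field.
type deletedInfo struct {
	Index, Store, DocValues bool
}

// infoSet keeps one deletedInfo per field name (a path or an alias).
type infoSet map[string]deletedInfo

// add stores info under name and fails if name is already registered with
// different deleted information.
func (s infoSet) add(name string, info deletedInfo) error {
	if prev, ok := s[name]; ok && prev != info {
		return fmt.Errorf("field %q: conflicting deleted info %+v vs %+v", name, prev, info)
	}
	s[name] = info
	return nil
}

func main() {
	s := infoSet{}
	fmt.Println(s.add("title", deletedInfo{Store: true}))        // <nil>
	fmt.Println(s.add("title", deletedInfo{Store: true}))        // <nil>
	fmt.Println(s.add("title", deletedInfo{Index: true}) != nil) // true
}
```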

In terms of code complexity, I don't believe we are doing much redundant work overall. We do two passes over each mapping, and then, if there are n fields in the updated mapping, we do n^2 checks with a map along with n proper field comparisons.

3 participants