[8.19] Skip UTF8 to UTF16 conversion during document indexing (#126492) #129023

Merged

Conversation

jordan-powers
Contributor

Backports the following commits to 8.19:

When parsing documents, we receive the document as UTF-8 encoded data, which
we then parse, converting the fields to Java-native UTF-16 encoded Strings.
We then convert these Strings back to UTF-8 for storage in Lucene.

This patch skips the redundant conversion, instead passing Lucene a
direct reference to the received UTF-8 bytes when possible.
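
To illustrate the redundancy described above, here is a minimal, self-contained sketch in plain Java. It is not the code from this PR: `Utf8PassThrough`, `Utf8Slice`, `viaString`, and `direct` are hypothetical names invented for this example, and the slice class only stands in for whatever byte-reference type the indexing path actually hands to Lucene. The field offset is hard-coded here, where a real parser would report it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from the PR.
final class Utf8PassThrough {

    /** A view over a region of an existing UTF-8 buffer. No bytes are copied. */
    static final class Utf8Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        Utf8Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        /** Materializes the viewed region, only needed here for comparison. */
        byte[] copy() {
            return Arrays.copyOfRange(bytes, offset, offset + length);
        }
    }

    /** Old path: decode UTF-8 into a UTF-16 String, then encode back to UTF-8. */
    static byte[] viaString(byte[] doc, int offset, int length) {
        String decoded = new String(doc, offset, length, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    /** New path (sketch): reference the received bytes directly. */
    static Utf8Slice direct(byte[] doc, int offset, int length) {
        return new Utf8Slice(doc, offset, length);
    }

    public static void main(String[] args) {
        // A tiny JSON document whose field value contains non-ASCII characters.
        byte[] doc = "{\"title\":\"na\u00efve caf\u00e9\"}".getBytes(StandardCharsets.UTF_8);
        int offset = 10; // assumed: start of the value, as a parser would report it
        int length = "na\u00efve caf\u00e9".getBytes(StandardCharsets.UTF_8).length;

        byte[] roundTripped = viaString(doc, offset, length);
        Utf8Slice slice = direct(doc, offset, length);

        // Both paths yield the same UTF-8 bytes; only the second avoids the extra work.
        System.out.println(Arrays.equals(roundTripped, slice.copy()));
        System.out.println(slice.bytes == doc);
    }
}
```

The direct path can only be used when the raw bytes already equal the field's value. A JSON string containing escape sequences, for instance, would still need decoding; this is an assumption about what "when possible" covers, not something the description spells out.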
@jordan-powers added the labels :Core/Infra/Core (Core issues without another label), :StorageEngine/Mapping (The storage related side of mappings), >non-issue, auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!), backport, Team:Core/Infra (Meta label for core/infra team), and Team:StorageEngine on Jun 6, 2025
@elasticsearchmachine merged commit cf0b1ef into elastic:8.19 on Jun 6, 2025
15 checks passed
@jordan-powers deleted the backport/8.19/pr-126492 branch on June 6, 2025 04:04
Labels
auto-merge-without-approval, backport, :Core/Infra/Core, >non-issue, :StorageEngine/Mapping, Team:Core/Infra, Team:StorageEngine, v8.19.0
2 participants