Commit 489fcef

Squashed commit of the following:

commit e6b8a64
Author: Lorenzo Dematte <lorenzo.dematte@elastic.co>
Date: Tue Jun 10 12:02:16 2025 +0200

    PR comments

commit ad0902e
Author: Moritz Mack <mmack@apache.org>
Date: Fri Jun 6 13:33:37 2025 +0200

    Update ReproduceInfoPrinter to correctly print a reproduction for lucene / BC upgrade tests.

    Relates to ES-12005

commit 78b4168
Author: Alexander Spies <alexander.spies@elastic.co>
Date: Fri Jun 6 11:37:53 2025 +0200

    ESQL: Throw ISE instead of IAE for illegal block in page (elastic#128960)

    IAE gets reported as a 400 status code, but that's inappropriate when inconsistent pages are always bugs, and should be reported with a 500. Throw ISE instead.

commit 29e68bd
Author: Aurélien FOUCRET <aurelien.foucret@gmail.com>
Date: Fri Jun 6 11:24:23 2025 +0200

    [ES|QL] Fix test releases for telemetry. (elastic#128971)

commit 1a76bc2
Author: Bogdan Pintea <bogdan.pintea@elastic.co>
Date: Fri Jun 6 11:01:49 2025 +0200

    ESQL: Workaround for RLike handling of empty lang pattern (elastic#128895)

    Lucene's `org.apache.lucene.util.automaton.Operations#getSingleton` fails with an Automaton for a `REGEXP_EMPTY` `RegExp`. This adds a workaround for that, to check the type of automaton before calling into that failing method.

    Closes elastic#128813

commit e24fd32
Author: Aurélien FOUCRET <aurelien.foucret@gmail.com>
Date: Fri Jun 6 10:25:28 2025 +0200

    [ES|QL] Enable the completion command as a tech preview feature (elastic#128948)

commit 3f03775
Author: Niels Bauman <33722607+nielsbauman@users.noreply.github.com>
Date: Fri Jun 6 09:00:24 2025 +0200

    Remove non-test usages of `Metadata.Builder#putCustom` (elastic#128801)

    This removes all non-test usages of
    ```
    Metadata.Builder.putCustom(String type, ProjectCustom custom)
    ```
    And replaces it with appropriate calls to the equivalent method on `ProjectMetadata.Builder`.

    In most cases this _does not_ make the code project aware, but does reduce the number of deprecated methods in use.

commit 330d127
Author: Niels Bauman <33722607+nielsbauman@users.noreply.github.com>
Date: Fri Jun 6 08:07:59 2025 +0200

    Make utility methods in `IndexLifecycleTransition` project-aware (elastic#128930)

    Modifies the methods to work with a project scope rather than a cluster scope. This is part of an iterative process to make ILM project-aware.

commit 1b5720d
Author: Aurélien FOUCRET <aurelien.foucret@gmail.com>
Date: Fri Jun 6 07:50:52 2025 +0200

    [ES|QL] Fix test releases for LookupJoinTypesIT. (elastic#128985)

commit 40cf2d3
Author: Tim Vernum <tim@adjective.org>
Date: Fri Jun 6 13:09:31 2025 +1000

    Add "extension" attribute validation to IdP SPs (elastic#128805)

    This extends the change from elastic#128176 to validate the "custom attributes" on a per Service Provider basis.

    Each Service Provider (whether registered or wildcard based) has a field "attributes.extensions" which is a list of attribute names that may be provided by the caller of "/_idp/saml/init".

    Service Providers that have not been configured with extension attributes will reject any custom attributes in SAML init.

    This necessitates a new field in the service provider index (but only if the new `extensions` attribute is set). The template has been updated, but there is no data migration because the `saml-service-provider` index does not exist in any of the environments into which we wish to deploy this change.

commit 496fb2d
Author: Jordan Powers <jordan.powers@elastic.co>
Date: Thu Jun 5 19:50:09 2025 -0700

    Skip UTF8 to UTF16 conversion during document indexing (elastic#126492)

    When parsing documents, we receive the document as UTF-8 encoded data which we then parse and convert the fields to java-native UTF-16 encoded Strings. We then convert these strings back to UTF-8 for storage in lucene.

    This patch skips the redundant conversion, instead passing lucene a direct reference to the received UTF-8 bytes when possible.

commit c34f8b6
Author: Tim Vernum <tim@adjective.org>
Date: Fri Jun 6 12:03:25 2025 +1000

    Improve cache invalidation in IdP SP cache (elastic#128890)

    The Identity Provider's Service Provider cache had two issues:

    1. It checked for identity based on sequence numbers, but didn't include the `seq_no_primary_term` parameter on searches, which means the sequence would always be `-2`
    2. It didn't track whether the index was deleted, which means it could be caching values from an old version of the index

    This commit fixes both of these issues.

    In practice neither issue was a problem because there are no deployments that use index-based service providers, however the 2nd issue did cause some challenges for testing.

commit 923f029
Author: Nhat Nguyen <nhat.nguyen@elastic.co>
Date: Thu Jun 5 18:09:58 2025 -0700

    Fix block loader with missing ignored source (elastic#129006)

    We miss appending null when ignored_source is not available. Our randomized tests already cover this case, but we do not check it when loading fields. I labelled this non-issue for an unreleased bug.

    Closes elastic#128959
    Relates elastic#119546

commit 0f8178a
Author: Bogdan Pintea <bogdan.pintea@elastic.co>
Date: Fri Jun 6 02:02:24 2025 +0200

    ESQL: Forward port 8.19 RegexMatch serialization change version (elastic#128979)

    Fwd port ESQL_REGEX_MATCH_WITH_CASE_INSENSITIVITY_8_19.

    Related: elastic#128919.

commit ee716f1
Author: Simon Chase <simon.chase@elastic.co>
Date: Thu Jun 5 15:20:08 2025 -0700

    transport: edit TransportConnectionListener for close exceptions (elastic#129015)

    The TransportConnectionListener interface has previously included the Transport.Connection being closed and unregistered in its onNodeDisconnected callback. This is not in use, and can be removed as it is also available in the onConnectionClosed callback. It is being replaced with a Nullable exception that caused the close. This is being used in pending work (ES-11448) to differentiate network issues from node restarts.

    Closes ES-12007

commit aceaf23
Merge: f18f4ee 159c57f
Author: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Date: Thu Jun 5 19:27:54 2025 +0000

    Merge patch/serverless-fix into main

commit 159c57f
Author: Rene Groeschke <rene@elastic.co>
Date: Tue Jun 3 11:56:27 2025 +0200

    Prepare serverless patch including elastic#128784 elastic#128740 (elastic#128807)

    * Change default for vector.rescoring.directio to false (elastic#128784)

      On serverless (and potentially elsewhere), direct IO is not available, which can cause BBQ shards to fail to read with org.apache.lucene.CorruptIndexException when this setting is true.

    * Optimize sparse vector stats collection (elastic#128740)

      This change improves the performance of sparse vector statistics gathering by using the document count of terms directly, rather than relying on the field name field to compute stats. By avoiding per-term disk/network reads and instead leveraging statistics already loaded into leaf readers at index opening, we expect to significantly reduce overhead.

      Relates to elastic#128583

    ---------

    Co-authored-by: Dave Pifke <dave.pifke@elastic.co>
    Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
1 parent 860d64f commit 489fcef

109 files changed, +3231 -2303 lines changed

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the "Elastic License
+ * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
+ * Public License v 1"; you may not use this file except in compliance with, at
+ * your election, the "Elastic License 2.0", the "GNU Affero General Public
+ * License v3.0 only", or the "Server Side Public License, v 1".
+ */
+
+package org.elasticsearch.benchmark.xcontent;
+
+import org.elasticsearch.benchmark.index.mapper.MapperServiceFactory;
+import org.elasticsearch.common.UUIDs;
+import org.elasticsearch.common.bytes.BytesReference;
+import org.elasticsearch.common.logging.LogConfigurator;
+import org.elasticsearch.index.mapper.MapperService;
+import org.elasticsearch.index.mapper.SourceToParse;
+import org.elasticsearch.xcontent.XContentBuilder;
+import org.elasticsearch.xcontent.XContentFactory;
+import org.elasticsearch.xcontent.XContentType;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Threads;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import java.io.IOException;
+import java.util.Random;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Benchmark to measure indexing performance of keyword fields. Used to measure performance impact of skipping
+ * UTF-8 to UTF-16 conversion during document parsing.
+ */
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+@State(Scope.Benchmark)
+@Fork(1)
+@Threads(1)
+@Warmup(iterations = 1)
+@Measurement(iterations = 5)
+public class OptimizedTextBenchmark {
+    static {
+        // For Elasticsearch900Lucene101Codec:
+        LogConfigurator.loadLog4jPlugins();
+        LogConfigurator.configureESLogging();
+        LogConfigurator.setNodeName("test");
+    }
+
+    /**
+     * Total number of documents to index.
+     */
+    @Param("1048576")
+    private int nDocs;
+
+    private MapperService mapperService;
+    private SourceToParse[] sources;
+
+    private String randomValue(int length) {
+        final String CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
+        Random random = new Random();
+        StringBuilder builder = new StringBuilder(length);
+        for (int i = 0; i < length; i++) {
+            builder.append(CHARS.charAt(random.nextInt(CHARS.length())));
+        }
+        return builder.toString();
+    }
+
+    @Setup(Level.Trial)
+    public void setup() throws IOException {
+        mapperService = MapperServiceFactory.create("""
+            {
+              "_doc": {
+                "dynamic": false,
+                "properties": {
+                  "field": {
+                    "type": "keyword"
+                  }
+                }
+              }
+            }
+            """);
+
+        sources = new SourceToParse[nDocs];
+        for (int i = 0; i < nDocs; i++) {
+            XContentBuilder b = XContentFactory.jsonBuilder();
+            b.startObject().field("field", randomValue(8)).endObject();
+            sources[i] = new SourceToParse(UUIDs.randomBase64UUID(), BytesReference.bytes(b), XContentType.JSON);
+        }
+    }
+
+    @Benchmark
+    public void indexDocuments(final Blackhole bh) {
+        final var mapper = mapperService.documentMapper();
+        for (int i = 0; i < nDocs; i++) {
+            bh.consume(mapper.parse(sources[i]));
+        }
+    }
+}
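
A minimal sketch, using only JDK classes, of the redundant round trip whose removal this benchmark is meant to measure. `Utf8RoundTripSketch` and its method names are hypothetical and do not appear in the commit. `directSlice` copies bytes only to keep the example self-contained, whereas the commit message describes passing a reference to the received bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; none of these names come from the commit.
final class Utf8RoundTripSketch {

    /** Decodes UTF-8 bytes into a (UTF-16 backed) String and encodes them back to UTF-8. */
    static byte[] roundTrip(byte[] utf8, int offset, int length) {
        String decoded = new String(utf8, offset, length, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    /** Takes the value bytes straight from the source buffer, with no decode/encode step. */
    static byte[] directSlice(byte[] utf8, int offset, int length) {
        return Arrays.copyOfRange(utf8, offset, offset + length);
    }
}
```

For valid UTF-8 input both methods yield the same bytes, which is why the intermediate String can be skipped when only the bytes are needed.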

docs/changelog/128805.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 128805
+summary: Add "extension" attribute validation to IdP SPs
+area: IdentityProvider
+type: enhancement
+issues: []

docs/changelog/128890.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 128890
+summary: Improve cache invalidation in IdP SP cache
+area: IdentityProvider
+type: bug
+issues: []

docs/changelog/128895.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 128895
+summary: Workaround for RLike handling of empty lang pattern
+area: ES|QL
+type: bug
+issues:
+ - 128813

docs/changelog/128948.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 128948
+summary: ES|QL - Add COMPLETION command as a tech preview feature
+area: ES|QL
+type: feature
+issues:
+ - 124405

docs/changelog/128960.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 128960
+summary: Throw ISE instead of IAE for illegal block in page
+area: ES|QL
+type: bug
+issues: []
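
The commit message for this change notes that an IAE is reported as a 400 while an inconsistent page is always a bug and should surface as a 500. The sketch below illustrates that distinction under stated assumptions: `ExceptionStatusSketch`, `statusFor`, and `requireSamePositionCount` are hypothetical names, and the status mapping is a simplified stand-in rather than Elasticsearch's actual exception handling.

```java
// Hypothetical illustration; the mapping below is a simplification, not Elasticsearch code.
final class ExceptionStatusSketch {

    /** Simplified mapping: caller mistakes become 400, everything else is treated as a server-side bug. */
    static int statusFor(RuntimeException e) {
        return e instanceof IllegalArgumentException ? 400 : 500;
    }

    /** An internal consistency check: a mismatch is a bug, so it throws IllegalStateException. */
    static void requireSamePositionCount(int expected, int actual) {
        if (expected != actual) {
            throw new IllegalStateException("block has " + actual + " positions, expected " + expected);
        }
    }
}
```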
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the "Elastic License
+ * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
+ * Public License v 1"; you may not use this file except in compliance with, at
+ * your election, the "Elastic License 2.0", the "GNU Affero General Public
+ * License v3.0 only", or the "Server Side Public License, v 1".
+ */
+
+package org.elasticsearch.xcontent.provider.json;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonFactoryBuilder;
+import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.core.io.IOContext;
+import com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper;
+import com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer;
+
+import java.io.IOException;
+
+public class ESJsonFactory extends JsonFactory {
+    ESJsonFactory(JsonFactoryBuilder b) {
+        super(b);
+    }
+
+    @Override
+    protected JsonParser _createParser(byte[] data, int offset, int len, IOContext ctxt) throws IOException {
+        if (len > 0
+            && Feature.CHARSET_DETECTION.enabledIn(_factoryFeatures)
+            && Feature.CANONICALIZE_FIELD_NAMES.enabledIn(_factoryFeatures)) {
+            var bootstrap = new ByteSourceJsonBootstrapper(ctxt, data, offset, len);
+            var encoding = bootstrap.detectEncoding();
+            if (encoding == JsonEncoding.UTF8) {
+                boolean invalidBom = false;
+                int ptr = offset;
+                // Skip over the BOM if present
+                if ((data[ptr] & 0xFF) == 0xEF) {
+                    if (len < 3) {
+                        invalidBom = true;
+                    } else if ((data[ptr + 1] & 0xFF) != 0xBB) {
+                        invalidBom = true;
+                    } else if ((data[ptr + 2] & 0xFF) != 0xBF) {
+                        invalidBom = true;
+                    } else {
+                        ptr += 3;
+                    }
+                }
+                if (invalidBom == false) {
+                    ByteQuadsCanonicalizer can = _byteSymbolCanonicalizer.makeChild(_factoryFeatures);
+                    return new ESUTF8StreamJsonParser(
+                        ctxt,
+                        _parserFeatures,
+                        null,
+                        _objectCodec,
+                        can,
+                        data,
+                        ptr,
+                        offset + len,
+                        ptr - offset,
+                        false
+                    );
+                }
+            }
+        }
+        return new ByteSourceJsonBootstrapper(ctxt, data, offset, len).constructParser(
+            _parserFeatures,
+            _objectCodec,
+            _byteSymbolCanonicalizer,
+            _rootCharSymbols,
+            _factoryFeatures
+        );
+    }
+}
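
The BOM handling in `_createParser` above can be read in isolation as the following sketch. `Utf8BomSketch` and `skipBom` are hypothetical names, and returning `-1` stands in for the `invalidBom` flag; in the factory itself an invalid BOM simply falls through to the default `ByteSourceJsonBootstrapper` path.

```java
// Hypothetical illustration of the BOM check; names are not from the commit.
final class Utf8BomSketch {

    /**
     * Returns the index of the first content byte, or -1 when the input starts with 0xEF
     * but does not carry the complete UTF-8 byte-order mark EF BB BF.
     */
    static int skipBom(byte[] data, int offset, int len) {
        if (len <= 0 || (data[offset] & 0xFF) != 0xEF) {
            return offset; // no BOM candidate, content starts immediately
        }
        if (len < 3 || (data[offset + 1] & 0xFF) != 0xBB || (data[offset + 2] & 0xFF) != 0xBF) {
            return -1; // truncated or malformed BOM
        }
        return offset + 3; // skip the three BOM bytes
    }
}
```

The `& 0xFF` mask matters because Java bytes are signed: without it, `(byte) 0xEF` compares as `-17` and never equals `0xEF`.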
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the "Elastic License
+ * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
+ * Public License v 1"; you may not use this file except in compliance with, at
+ * your election, the "Elastic License 2.0", the "GNU Affero General Public
+ * License v3.0 only", or the "Server Side Public License, v 1".
+ */
+
+package org.elasticsearch.xcontent.provider.json;
+
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonFactoryBuilder;
+
+public class ESJsonFactoryBuilder extends JsonFactoryBuilder {
+    @Override
+    public JsonFactory build() {
+        return new ESJsonFactory(this);
+    }
+}
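
The builder above relies on a plain override of `build()`, so code that only knows the base builder type still ends up with the specialised factory. A dependency-free sketch of that pattern follows; every class name in it is a hypothetical stand-in, not a Jackson or Elasticsearch type.

```java
// Hypothetical stand-ins for a base factory/builder pair and a specialised subclass pair.
class BaseParserFactory {
    String createParser(byte[] data) {
        return "generic";
    }
}

class BaseFactoryBuilder {
    BaseParserFactory build() {
        return new BaseParserFactory();
    }
}

class Utf8AwareFactory extends BaseParserFactory {
    @Override
    String createParser(byte[] data) {
        return "utf8-aware";
    }
}

class Utf8AwareFactoryBuilder extends BaseFactoryBuilder {
    @Override
    BaseParserFactory build() {
        return new Utf8AwareFactory(); // swap in the subclass at the single construction point
    }
}

final class FactoryConfigSketch {
    /** Configuration code written against the base builder type only. */
    static BaseParserFactory configure(BaseFactoryBuilder builder) {
        return builder.build();
    }
}
```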
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the "Elastic License
+ * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
+ * Public License v 1"; you may not use this file except in compliance with, at
+ * your election, the "Elastic License 2.0", the "GNU Affero General Public
+ * License v3.0 only", or the "Server Side Public License, v 1".
+ */
+
+package org.elasticsearch.xcontent.provider.json;
+
+import com.fasterxml.jackson.core.JsonToken;
+import com.fasterxml.jackson.core.ObjectCodec;
+import com.fasterxml.jackson.core.SerializableString;
+import com.fasterxml.jackson.core.io.IOContext;
+import com.fasterxml.jackson.core.json.UTF8StreamJsonParser;
+import com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer;
+
+import org.elasticsearch.xcontent.Text;
+import org.elasticsearch.xcontent.XContentString;
+
+import java.io.IOException;
+import java.io.InputStream;
+
+public class ESUTF8StreamJsonParser extends UTF8StreamJsonParser {
+    protected int stringEnd = -1;
+
+    public ESUTF8StreamJsonParser(
+        IOContext ctxt,
+        int features,
+        InputStream in,
+        ObjectCodec codec,
+        ByteQuadsCanonicalizer sym,
+        byte[] inputBuffer,
+        int start,
+        int end,
+        int bytesPreProcessed,
+        boolean bufferRecyclable
+    ) {
+        super(ctxt, features, in, codec, sym, inputBuffer, start, end, bytesPreProcessed, bufferRecyclable);
+    }
+
+    /**
+     * Method that will try to get underlying UTF-8 encoded bytes of the current string token.
+     * This is only a best-effort attempt; if there is some reason the bytes cannot be retrieved, this method will return null.
+     * Currently, this is only implemented for ascii-only strings that do not contain escaped characters.
+     */
+    public Text getValueAsText() throws IOException {
+        if (_currToken == JsonToken.VALUE_STRING && _tokenIncomplete) {
+            if (stringEnd > 0) {
+                final int len = stringEnd - 1 - _inputPtr;
+                // For now, we can use `len` for `stringLength` because we only support ascii-encoded unescaped strings,
+                // which means each character uses exactly 1 byte.
+                return new Text(new XContentString.UTF8Bytes(_inputBuffer, _inputPtr, len), len);
+            }
+            return _finishAndReturnText();
+        }
+        return null;
+    }
+
+    protected Text _finishAndReturnText() throws IOException {
+        int ptr = _inputPtr;
+        if (ptr >= _inputEnd) {
+            _loadMoreGuaranteed();
+            ptr = _inputPtr;
+        }
+
+        int startPtr = ptr;
+        final int[] codes = INPUT_CODES_UTF8;
+        final int max = _inputEnd;
+        final byte[] inputBuffer = _inputBuffer;
+        while (ptr < max) {
+            int c = inputBuffer[ptr] & 0xFF;
+            if (codes[c] != 0) {
+                if (c == INT_QUOTE) {
+                    stringEnd = ptr + 1;
+                    final int len = ptr - startPtr;
+                    // For now, we can use `len` for `stringLength` because we only support ascii-encoded unescaped strings,
+                    // which means each character uses exactly 1 byte.
+                    return new Text(new XContentString.UTF8Bytes(inputBuffer, startPtr, len), len);
+                }
+                return null;
+            }
+            ++ptr;
+        }
+        return null;
+    }
+
+    @Override
+    public JsonToken nextToken() throws IOException {
+        if (_currToken == JsonToken.VALUE_STRING && _tokenIncomplete && stringEnd > 0) {
+            _inputPtr = stringEnd;
+            _tokenIncomplete = false;
+        }
+        stringEnd = -1;
+        return super.nextToken();
+    }
+
+    @Override
+    public boolean nextFieldName(SerializableString str) throws IOException {
+        if (_currToken == JsonToken.VALUE_STRING && _tokenIncomplete && stringEnd > 0) {
+            _inputPtr = stringEnd;
+            _tokenIncomplete = false;
+        }
+        stringEnd = -1;
+        return super.nextFieldName(str);
+    }
+
+    @Override
+    public String nextFieldName() throws IOException {
+        if (_currToken == JsonToken.VALUE_STRING && _tokenIncomplete && stringEnd > 0) {
+            _inputPtr = stringEnd;
+            _tokenIncomplete = false;
+        }
+        stringEnd = -1;
+        return super.nextFieldName();
+    }
+}
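
The scan in `_finishAndReturnText` can be approximated without Jackson as the sketch below. It assumes that the bytes the parser treats as special are the closing quote, the backslash, control characters below 0x20, and any non-ASCII byte; the real code consults Jackson's `INPUT_CODES_UTF8` table instead of these explicit comparisons. `AsciiStringScanSketch` and `scanLength` are hypothetical names, and `-1` plays the role of the `null` return that sends callers back to the regular String path.

```java
// Hypothetical, simplified version of the fast-path scan; not the commit's code.
final class AsciiStringScanSketch {

    /**
     * Scans buf[start, end) for the closing quote of a JSON string whose opening quote
     * has already been consumed. Returns the string's length in bytes, or -1 if the
     * string is escaped, non-ASCII, contains a control character, or is not terminated
     * within the buffer.
     */
    static int scanLength(byte[] buf, int start, int end) {
        for (int ptr = start; ptr < end; ptr++) {
            int c = buf[ptr] & 0xFF;
            if (c == '"') {
                return ptr - start; // ASCII only, so byte length equals character count
            }
            if (c == '\\' || c < 0x20 || c >= 0x80) {
                return -1; // give up and let the caller use the general path
            }
        }
        return -1; // closing quote not in this buffer
    }
}
```

Because the fast path only accepts unescaped ASCII, the bytes between `start` and the closing quote are already the exact UTF-8 encoding of the value and can be referenced in place.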

libs/x-content/impl/src/main/java/org/elasticsearch/xcontent/provider/json/JsonXContentImpl.java

Lines changed: 1 addition & 2 deletions
@@ -11,7 +11,6 @@

 import com.fasterxml.jackson.core.JsonEncoding;
 import com.fasterxml.jackson.core.JsonFactory;
-import com.fasterxml.jackson.core.JsonFactoryBuilder;
 import com.fasterxml.jackson.core.JsonGenerator;
 import com.fasterxml.jackson.core.JsonParser;

@@ -47,7 +46,7 @@ public static final XContent jsonXContent() {
     }

     static {
-        jsonFactory = XContentImplUtils.configure(new JsonFactoryBuilder());
+        jsonFactory = XContentImplUtils.configure(new ESJsonFactoryBuilder());
         jsonFactory.configure(JsonGenerator.Feature.QUOTE_FIELD_NAMES, true);
         jsonFactory.configure(JsonParser.Feature.ALLOW_COMMENTS, true);
         jsonFactory.configure(JsonFactory.Feature.FAIL_ON_SYMBOL_HASH_OVERFLOW, false); // this trips on many mappings now...
