
Commit d7bcda3

[8.19] Add none chunking strategy to disable automatic chunking for inference endpoints (#129324)
* Add `none` chunking strategy to disable automatic chunking for inference endpoints (#129150)

This introduces a `none` chunking strategy that disables automatic chunking when using an inference endpoint. It enables users to provide pre-chunked input directly to a `semantic_text` field without any additional splitting. The chunking strategy can be configured either on the inference endpoint or directly in the `semantic_text` field definition.

**Example:**

```json
PUT test-index
{
    "mappings": {
        "properties": {
            "my_semantic_field": {
                "type": "semantic_text",
                "chunking_settings": {
                    "strategy": "none" <1>
                }
            }
        }
    }
}
```
<1> Disables automatic chunking on `my_semantic_field`.

```json
PUT test-index/_doc/1
{
    "my_semantic_field": ["my first chunk", "my second chunk", ...] <1>
    ...
}
```
<1> Pre-chunked input provided as an array of strings. Each array element represents a single chunk that will be sent directly to the inference service without further processing.

* fix compilation after backport
* another fix
* fix docs
1 parent 233847a commit d7bcda3

File tree

18 files changed: +389 additions, −15 deletions

docs/changelog/129150.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 129150
+summary: Add `none` chunking strategy to disable automatic chunking for inference
+  endpoints
+area: Machine Learning
+type: feature
+issues: []

docs/reference/mapping/types/semantic-text.asciidoc

Lines changed: 50 additions & 4 deletions
@@ -100,18 +100,19 @@ If not specified, the {infer} endpoint defined by `inference_id` will be used at
 (Optional, object) Settings for chunking text into smaller passages.
 If specified, these will override the chunking settings set in the {infer-cap} endpoint associated with `inference_id`.
 If chunking settings are updated, they will not be applied to existing documents until they are reindexed.
+To completely disable chunking, use the `none` chunking strategy.

 .Valid values for `chunking_settings`
 [%collapsible%open]
 ====
 `type`:::
 Indicates the type of chunking strategy to use.
-Valid values are `word` or `sentence`.
+Valid values are `none`, `word` or `sentence`.
 Required.

 `max_chunk_size`:::
-The maximum number of works in a chunk.
-Required.
+The maximum number of words in a chunk.
+Required for `word` and `sentence` strategies.

 `overlap`:::
 The number of overlapping words allowed in chunks.
@@ -123,6 +124,10 @@ The number of overlapping words allowed in chunks.
 Valid values are `0` or `1`.
 Required for `sentence` type chunking settings.

+WARNING: If the input exceeds the maximum token limit of the underlying model, some services (such as OpenAI) may return an
+error. In contrast, the `elastic` and `elasticsearch` services will automatically truncate the input to fit within the
+model's limit.
+
 ====

 [discrete]
@@ -147,7 +152,48 @@ When querying, the individual passages will be automatically searched for each d

 For more details on chunking and how to configure chunking settings, see <<infer-chunking-config, Configuring chunking>> in the Inference API documentation.

-Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about semantic search using `semantic_text` and the `semantic` query.
+You can also pre-chunk the input by sending it to Elasticsearch as an array of strings.
+Example:
+
+[source,console]
+------------------------------------------------------------
+PUT test-index
+{
+    "mappings": {
+        "properties": {
+            "my_semantic_field": {
+                "type": "semantic_text",
+                "chunking_settings": {
+                    "strategy": "none" <1>
+                }
+            }
+        }
+    }
+}
+------------------------------------------------------------
+// TEST[skip:Requires inference endpoint]
+<1> Disable chunking on `my_semantic_field`.
+
+[source,console]
+------------------------------------------------------------
+PUT test-index/_doc/1
+{
+    "my_semantic_field": ["my first chunk", "my second chunk"] <1>
+}
+------------------------------------------------------------
+// TEST[skip:Requires inference endpoint]
+<1> The text is pre-chunked and provided as an array of strings.
+Each element in the array represents a single chunk that will be sent directly to the inference service without further chunking.
+
+**Important considerations**:
+
+* When providing pre-chunked input, ensure that you set the chunking strategy to `none` to avoid additional processing.
+* Each chunk should be sized carefully, staying within the token limit of the inference service and the underlying model.
+* If a chunk exceeds the model's token limit, the behavior depends on the service:
+** Some services (such as OpenAI) will return an error.
+** Others (such as `elastic` and `elasticsearch`) will automatically truncate the input.
+
+Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about semantic search using `semantic_text`.

 [discrete]
 [[semantic-text-highlighting]]
server/src/main/java/org/elasticsearch/TransportVersions.java

Lines changed: 1 addition & 0 deletions
@@ -239,6 +239,7 @@ static TransportVersion def(int id) {
     public static final TransportVersion SEARCH_SOURCE_EXCLUDE_VECTORS_PARAM_8_19 = def(8_841_0_46);
     public static final TransportVersion ML_INFERENCE_MISTRAL_CHAT_COMPLETION_ADDED_8_19 = def(8_841_0_47);
     public static final TransportVersion ML_INFERENCE_ELASTIC_RERANK_ADDED_8_19 = def(8_841_0_48);
+    public static final TransportVersion NONE_CHUNKING_STRATEGY_8_19 = def(8_841_0_49);

     /*
      * STOP! READ THIS FIRST! No, really,

server/src/main/java/org/elasticsearch/inference/ChunkingStrategy.java

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,8 @@

 public enum ChunkingStrategy {
     WORD("word"),
-    SENTENCE("sentence");
+    SENTENCE("sentence"),
+    NONE("none");

     private final String chunkingStrategy;

x-pack/plugin/inference/qa/test-service-plugin/src/main/java/org/elasticsearch/xpack/inference/mock/AbstractTestInferenceService.java

Lines changed: 9 additions & 1 deletion
@@ -25,6 +25,7 @@
 import org.elasticsearch.inference.TaskSettings;
 import org.elasticsearch.inference.TaskType;
 import org.elasticsearch.xcontent.XContentBuilder;
+import org.elasticsearch.xpack.inference.chunking.NoopChunker;
 import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunker;
 import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunkingSettings;

@@ -126,7 +127,14 @@ protected List<ChunkedInput> chunkInputs(ChunkInferenceInput input) {
         }

         List<ChunkedInput> chunkedInputs = new ArrayList<>();
-        if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.WORD) {
+        if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.NONE) {
+            var offsets = NoopChunker.INSTANCE.chunk(input.input(), chunkingSettings);
+            List<ChunkedInput> ret = new ArrayList<>();
+            for (var offset : offsets) {
+                ret.add(new ChunkedInput(inputText.substring(offset.start(), offset.end()), offset.start(), offset.end()));
+            }
+            return ret;
+        } else if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.WORD) {
             WordBoundaryChunker chunker = new WordBoundaryChunker();
             WordBoundaryChunkingSettings wordBoundaryChunkingSettings = (WordBoundaryChunkingSettings) chunkingSettings;
             List<WordBoundaryChunker.ChunkOffset> offsets = chunker.chunk(

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceNamedWriteablesProvider.java

Lines changed: 4 additions & 0 deletions
@@ -26,6 +26,7 @@
 import org.elasticsearch.xpack.core.inference.results.TextEmbeddingByteResults;
 import org.elasticsearch.xpack.core.inference.results.TextEmbeddingFloatResults;
 import org.elasticsearch.xpack.inference.action.task.StreamingTaskManager;
+import org.elasticsearch.xpack.inference.chunking.NoneChunkingSettings;
 import org.elasticsearch.xpack.inference.chunking.SentenceBoundaryChunkingSettings;
 import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunkingSettings;
 import org.elasticsearch.xpack.inference.common.amazon.AwsSecretSettings;

@@ -553,6 +554,9 @@ private static void addInternalNamedWriteables(List<NamedWriteableRegistry.Entry
     }

     private static void addChunkingSettingsNamedWriteables(List<NamedWriteableRegistry.Entry> namedWriteables) {
+        namedWriteables.add(
+            new NamedWriteableRegistry.Entry(ChunkingSettings.class, NoneChunkingSettings.NAME, in -> NoneChunkingSettings.INSTANCE)
+        );
         namedWriteables.add(
             new NamedWriteableRegistry.Entry(ChunkingSettings.class, WordBoundaryChunkingSettings.NAME, WordBoundaryChunkingSettings::new)
         );

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/ChunkerBuilder.java

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ public static Chunker fromChunkingStrategy(ChunkingStrategy chunkingStrategy) {
     }

     return switch (chunkingStrategy) {
+        case NONE -> NoopChunker.INSTANCE;
        case WORD -> new WordBoundaryChunker();
        case SENTENCE -> new SentenceBoundaryChunker();
    };
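
The factory above dispatches an exhaustive switch over the strategy enum. A minimal standalone sketch of that pattern, with illustrative names rather than the real Elasticsearch classes (the toy `WORD` splitter here only stands in for the real `WordBoundaryChunker`, which uses `BreakIterator`):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the strategy -> chunker dispatch; names are illustrative, not the ES API.
class ChunkerFactorySketch {
    enum Strategy { NONE, WORD }

    interface Chunker {
        List<int[]> chunk(String input); // each element is a [start, end) offset pair
    }

    static Chunker fromStrategy(Strategy strategy) {
        return switch (strategy) {
            // NONE -> a no-op chunker: one offset spanning the whole input.
            case NONE -> input -> List.of(new int[] { 0, input.length() });
            // WORD -> a toy one-chunk-per-token splitter (illustrative only).
            case WORD -> ChunkerFactorySketch::splitOnWhitespace;
        };
    }

    // Emits one [start, end) offset per whitespace-separated token.
    static List<int[]> splitOnWhitespace(String input) {
        List<int[]> offsets = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= input.length(); i++) {
            boolean ws = i == input.length() || Character.isWhitespace(input.charAt(i));
            if (!ws && start < 0) start = i;
            if (ws && start >= 0) {
                offsets.add(new int[] { start, i });
                start = -1;
            }
        }
        return offsets;
    }
}
```

An exhaustive switch expression means adding a new enum constant (as this commit does with `NONE`) fails compilation until every such dispatch site handles it, which is how the backport surfaced the other files in this diff.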

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/ChunkingSettingsBuilder.java

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ public static ChunkingSettings fromMap(Map<String, Object> settings, boolean ret
         settings.get(ChunkingSettingsOptions.STRATEGY.toString()).toString()
     );
     return switch (chunkingStrategy) {
+        case NONE -> NoneChunkingSettings.INSTANCE;
         case WORD -> WordBoundaryChunkingSettings.fromMap(new HashMap<>(settings));
         case SENTENCE -> SentenceBoundaryChunkingSettings.fromMap(new HashMap<>(settings));
     };
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/NoneChunkingSettings.java

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+package org.elasticsearch.xpack.inference.chunking;
+
+import org.elasticsearch.TransportVersion;
+import org.elasticsearch.TransportVersions;
+import org.elasticsearch.common.Strings;
+import org.elasticsearch.common.ValidationException;
+import org.elasticsearch.common.io.stream.StreamOutput;
+import org.elasticsearch.inference.ChunkingSettings;
+import org.elasticsearch.inference.ChunkingStrategy;
+import org.elasticsearch.xcontent.XContentBuilder;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+
+public class NoneChunkingSettings implements ChunkingSettings {
+    public static final String NAME = "NoneChunkingSettings";
+    public static NoneChunkingSettings INSTANCE = new NoneChunkingSettings();
+
+    private static final ChunkingStrategy STRATEGY = ChunkingStrategy.NONE;
+    private static final Set<String> VALID_KEYS = Set.of(ChunkingSettingsOptions.STRATEGY.toString());
+
+    private NoneChunkingSettings() {}
+
+    @Override
+    public ChunkingStrategy getChunkingStrategy() {
+        return STRATEGY;
+    }
+
+    @Override
+    public String getWriteableName() {
+        return NAME;
+    }
+
+    @Override
+    public TransportVersion getMinimalSupportedVersion() {
+        return TransportVersions.NONE_CHUNKING_STRATEGY_8_19;
+    }
+
+    @Override
+    public void writeTo(StreamOutput out) throws IOException {}
+
+    @Override
+    public Map<String, Object> asMap() {
+        return Map.of(ChunkingSettingsOptions.STRATEGY.toString(), STRATEGY.toString().toLowerCase(Locale.ROOT));
+    }
+
+    public static NoneChunkingSettings fromMap(Map<String, Object> map) {
+        ValidationException validationException = new ValidationException();
+
+        var invalidSettings = map.keySet().stream().filter(key -> VALID_KEYS.contains(key) == false).toArray();
+        if (invalidSettings.length > 0) {
+            validationException.addValidationError(
+                Strings.format(
+                    "When chunking is disabled (none), settings can not have the following: %s",
+                    Arrays.toString(invalidSettings)
+                )
+            );
+        }
+
+        if (validationException.validationErrors().isEmpty() == false) {
+            throw validationException;
+        }
+
+        return NoneChunkingSettings.INSTANCE;
+    }
+
+    @Override
+    public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
+        builder.startObject();
+        {
+            builder.field(ChunkingSettingsOptions.STRATEGY.toString(), STRATEGY);
+        }
+        builder.endObject();
+        return builder;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+        if (this == o) return true;
+        if (o == null || getClass() != o.getClass()) return false;
+        return true;
+    }
+
+    @Override
+    public int hashCode() {
+        return Objects.hash(getClass());
+    }
+
+    @Override
+    public String toString() {
+        return Strings.toString(this);
+    }
+}
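
The `fromMap` validation above rejects any setting other than `strategy` when chunking is disabled. That rule can be sketched standalone (class and method names here are illustrative; the real code collects errors into an Elasticsearch `ValidationException` rather than `IllegalArgumentException`):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the "none"-strategy settings validation: only "strategy" is allowed.
class NoneSettingsValidationSketch {
    private static final Set<String> VALID_KEYS = Set.of("strategy");

    // Returns the offending keys, sorted for deterministic messages.
    static List<String> invalidKeys(Map<String, Object> settings) {
        return settings.keySet().stream().filter(key -> !VALID_KEYS.contains(key)).sorted().toList();
    }

    static void validate(Map<String, Object> settings) {
        var invalid = invalidKeys(settings);
        if (!invalid.isEmpty()) {
            throw new IllegalArgumentException(
                "When chunking is disabled (none), settings can not have the following: " + invalid
            );
        }
    }
}
```

For example, `{"strategy": "none", "max_chunk_size": 100}` is rejected because `max_chunk_size` only makes sense when a chunker will actually split the input.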
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/NoopChunker.java

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+package org.elasticsearch.xpack.inference.chunking;
+
+import org.elasticsearch.common.Strings;
+import org.elasticsearch.inference.ChunkingSettings;
+import org.elasticsearch.xpack.inference.services.openai.embeddings.OpenAiEmbeddingsModel;
+
+import java.util.List;
+
+/**
+ * A {@link Chunker} implementation that returns the input unchanged (no chunking is performed).
+ *
+ * <p><b>WARNING:</b> If the input exceeds the maximum token limit, some services (such as {@link OpenAiEmbeddingsModel})
+ * may return an error.
+ * </p>
+ */
+public class NoopChunker implements Chunker {
+    public static final NoopChunker INSTANCE = new NoopChunker();
+
+    private NoopChunker() {}
+
+    @Override
+    public List<ChunkOffset> chunk(String input, ChunkingSettings chunkingSettings) {
+        if (chunkingSettings instanceof NoneChunkingSettings) {
+            return List.of(new ChunkOffset(0, input.length()));
+        } else {
+            throw new IllegalArgumentException(
+                Strings.format("NoopChunker can't use ChunkingSettings with strategy [%s]", chunkingSettings.getChunkingStrategy())
+            );
+        }
+    }
+}
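
The core behavior of `NoopChunker` is just that the whole input becomes a single `[0, input.length())` offset, so a pre-chunked array element passes through intact. A self-contained sketch of that invariant, with an illustrative class name in place of the real ES types:

```java
import java.util.List;

// Sketch of no-op chunking: the entire input maps to exactly one chunk offset.
class NoopChunkSketch {
    // [start, end) character offsets into the original input string.
    record ChunkOffset(int start, int end) {}

    static List<ChunkOffset> chunk(String input) {
        return List.of(new ChunkOffset(0, input.length()));
    }
}
```

Note the empty string still yields one (empty) chunk rather than zero chunks; in the real code path, batching decides what to do with empty input lists.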

x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/chunking/ChunkerBuilderTests.java

Lines changed: 8 additions & 1 deletion
@@ -27,6 +27,13 @@ public void testValidChunkingStrategy() {
     }

     private Map<ChunkingStrategy, Class<? extends Chunker>> chunkingStrategyToExpectedChunkerClassMap() {
-        return Map.of(ChunkingStrategy.WORD, WordBoundaryChunker.class, ChunkingStrategy.SENTENCE, SentenceBoundaryChunker.class);
+        return Map.of(
+            ChunkingStrategy.NONE,
+            NoopChunker.class,
+            ChunkingStrategy.WORD,
+            WordBoundaryChunker.class,
+            ChunkingStrategy.SENTENCE,
+            SentenceBoundaryChunker.class
+        );
     }
 }

x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/chunking/ChunkingSettingsTests.java

Lines changed: 6 additions & 3 deletions
@@ -20,6 +20,9 @@ public static ChunkingSettings createRandomChunkingSettings() {
     ChunkingStrategy randomStrategy = randomFrom(ChunkingStrategy.values());

     switch (randomStrategy) {
+        case NONE -> {
+            return NoneChunkingSettings.INSTANCE;
+        }
         case WORD -> {
             var maxChunkSize = randomIntBetween(10, 300);
             return new WordBoundaryChunkingSettings(maxChunkSize, randomIntBetween(1, maxChunkSize / 2));

@@ -37,15 +40,15 @@ public static Map<String, Object> createRandomChunkingSettingsMap() {
     chunkingSettingsMap.put(ChunkingSettingsOptions.STRATEGY.toString(), randomStrategy.toString());

     switch (randomStrategy) {
+        case NONE -> {
+        }
         case WORD -> {
             var maxChunkSize = randomIntBetween(10, 300);
             chunkingSettingsMap.put(ChunkingSettingsOptions.MAX_CHUNK_SIZE.toString(), maxChunkSize);
             chunkingSettingsMap.put(ChunkingSettingsOptions.OVERLAP.toString(), randomIntBetween(1, maxChunkSize / 2));

         }
-        case SENTENCE -> {
-            chunkingSettingsMap.put(ChunkingSettingsOptions.MAX_CHUNK_SIZE.toString(), randomIntBetween(20, 300));
-        }
+        case SENTENCE -> chunkingSettingsMap.put(ChunkingSettingsOptions.MAX_CHUNK_SIZE.toString(), randomIntBetween(20, 300));
         default -> {
         }
     }

x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/chunking/EmbeddingRequestChunkerTests.java

Lines changed: 16 additions & 0 deletions
@@ -46,6 +46,22 @@ public void testEmptyInput_SentenceChunker() {
         assertThat(batches, empty());
     }

+    public void testEmptyInput_NoopChunker() {
+        var batches = new EmbeddingRequestChunker<>(List.of(), 10, NoneChunkingSettings.INSTANCE).batchRequestsWithListeners(
+            testListener()
+        );
+        assertThat(batches, empty());
+    }
+
+    public void testAnyInput_NoopChunker() {
+        var randomInput = randomAlphaOfLengthBetween(100, 1000);
+        var batches = new EmbeddingRequestChunker<>(List.of(new ChunkInferenceInput(randomInput)), 10, NoneChunkingSettings.INSTANCE)
+            .batchRequestsWithListeners(testListener());
+        assertThat(batches, hasSize(1));
+        assertThat(batches.get(0).batch().inputs().get(), hasSize(1));
+        assertThat(batches.get(0).batch().inputs().get().get(0), Matchers.is(randomInput));
+    }
+
     public void testWhitespaceInput_SentenceChunker() {
         var batches = new EmbeddingRequestChunker<>(
             List.of(new ChunkInferenceInput(" ")),
