
Commit d7bcda3

[8.19] Add none chunking strategy to disable automatic chunking for inference endpoints (#129324)
* Add `none` chunking strategy to disable automatic chunking for inference endpoints (#129150)

This introduces a `none` chunking strategy that disables automatic chunking when using an inference endpoint. It enables users to provide pre-chunked input directly to a `semantic_text` field without any additional splitting. The chunking strategy can be configured either on the inference endpoint or directly in the `semantic_text` field definition.

**Example:**

```json
PUT test-index
{
    "mappings": {
        "properties": {
            "my_semantic_field": {
                "type": "semantic_text",
                "chunking_settings": {
                    "strategy": "none" <1>
                }
            }
        }
    }
}
```
<1> Disables automatic chunking on `my_semantic_field`.

```json
PUT test-index/_doc/1
{
    "my_semantic_field": ["my first chunk", "my second chunk", ...] <1>
    ...
}
```
<1> Pre-chunked input provided as an array of strings. Each array element represents a single chunk that will be sent directly to the inference service without further processing.

* fix compilation after backport
* another fix
* fix docs
1 parent 233847a commit d7bcda3

File tree

18 files changed: +389 additions, −15 deletions

docs/changelog/129150.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 129150
+summary: Add `none` chunking strategy to disable automatic chunking for inference
+  endpoints
+area: Machine Learning
+type: feature
+issues: []

docs/reference/mapping/types/semantic-text.asciidoc

Lines changed: 50 additions & 4 deletions
@@ -100,18 +100,19 @@ If not specified, the {infer} endpoint defined by `inference_id` will be used at
 (Optional, object) Settings for chunking text into smaller passages.
 If specified, these will override the chunking settings set in the {infer-cap} endpoint associated with `inference_id`.
 If chunking settings are updated, they will not be applied to existing documents until they are reindexed.
+To completely disable chunking, use the `none` chunking strategy.

 .Valid values for `chunking_settings`
 [%collapsible%open]
 ====
 `type`:::
 Indicates the type of chunking strategy to use.
-Valid values are `word` or `sentence`.
+Valid values are `none`, `word` or `sentence`.
 Required.

 `max_chunk_size`:::
-The maximum number of works in a chunk.
-Required.
+The maximum number of words in a chunk.
+Required for `word` and `sentence` strategies.

 `overlap`:::
 The number of overlapping words allowed in chunks.
@@ -123,6 +124,10 @@ The number of overlapping words allowed in chunks.
 Valid values are `0` or `1`.
 Required for `sentence` type chunking settings.

+WARNING: If the input exceeds the maximum token limit of the underlying model, some services (such as OpenAI) may return an
+error. In contrast, the `elastic` and `elasticsearch` services will automatically truncate the input to fit within the
+model's limit.
+
 ====

 [discrete]
@@ -147,7 +152,48 @@ When querying, the individual passages will be automatically searched for each d

 For more details on chunking and how to configure chunking settings, see <<infer-chunking-config, Configuring chunking>> in the Inference API documentation.

-Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about semantic search using `semantic_text` and the `semantic` query.
+You can also pre-chunk the input by sending it to Elasticsearch as an array of strings.
+Example:
+
+[source,console]
+------------------------------------------------------------
+PUT test-index
+{
+    "mappings": {
+        "properties": {
+            "my_semantic_field": {
+                "type": "semantic_text",
+                "chunking_settings": {
+                    "strategy": "none" <1>
+                }
+            }
+        }
+    }
+}
+------------------------------------------------------------
+// TEST[skip:Requires inference endpoint]
+<1> Disable chunking on `my_semantic_field`.
+
+[source,console]
+------------------------------------------------------------
+PUT test-index/_doc/1
+{
+    "my_semantic_field": ["my first chunk", "my second chunk"] <1>
+}
+------------------------------------------------------------
+// TEST[skip:Requires inference endpoint]
+<1> The text is pre-chunked and provided as an array of strings.
+Each element in the array represents a single chunk that will be sent directly to the inference service without further chunking.
+
+**Important considerations**:
+
+* When providing pre-chunked input, ensure that you set the chunking strategy to `none` to avoid additional processing.
+* Each chunk should be sized carefully, staying within the token limit of the inference service and the underlying model.
+* If a chunk exceeds the model's token limit, the behavior depends on the service:
+** Some services (such as OpenAI) will return an error.
+** Others (such as `elastic` and `elasticsearch`) will automatically truncate the input.
+
+Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about semantic search using `semantic_text`.

 [discrete]
 [[semantic-text-highlighting]]
server/src/main/java/org/elasticsearch/TransportVersions.java

Lines changed: 1 addition & 0 deletions
@@ -239,6 +239,7 @@ static TransportVersion def(int id) {
     public static final TransportVersion SEARCH_SOURCE_EXCLUDE_VECTORS_PARAM_8_19 = def(8_841_0_46);
     public static final TransportVersion ML_INFERENCE_MISTRAL_CHAT_COMPLETION_ADDED_8_19 = def(8_841_0_47);
     public static final TransportVersion ML_INFERENCE_ELASTIC_RERANK_ADDED_8_19 = def(8_841_0_48);
+    public static final TransportVersion NONE_CHUNKING_STRATEGY_8_19 = def(8_841_0_49);

     /*
      * STOP! READ THIS FIRST! No, really,

server/src/main/java/org/elasticsearch/inference/ChunkingStrategy.java

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,8 @@

 public enum ChunkingStrategy {
     WORD("word"),
-    SENTENCE("sentence");
+    SENTENCE("sentence"),
+    NONE("none");

     private final String chunkingStrategy;

x-pack/plugin/inference/qa/test-service-plugin/src/main/java/org/elasticsearch/xpack/inference/mock/AbstractTestInferenceService.java

Lines changed: 9 additions & 1 deletion
@@ -25,6 +25,7 @@
 import org.elasticsearch.inference.TaskSettings;
 import org.elasticsearch.inference.TaskType;
 import org.elasticsearch.xcontent.XContentBuilder;
+import org.elasticsearch.xpack.inference.chunking.NoopChunker;
 import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunker;
 import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunkingSettings;

@@ -126,7 +127,14 @@ protected List<ChunkedInput> chunkInputs(ChunkInferenceInput input) {
         }

         List<ChunkedInput> chunkedInputs = new ArrayList<>();
-        if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.WORD) {
+        if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.NONE) {
+            var offsets = NoopChunker.INSTANCE.chunk(input.input(), chunkingSettings);
+            List<ChunkedInput> ret = new ArrayList<>();
+            for (var offset : offsets) {
+                ret.add(new ChunkedInput(inputText.substring(offset.start(), offset.end()), offset.start(), offset.end()));
+            }
+            return ret;
+        } else if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.WORD) {
             WordBoundaryChunker chunker = new WordBoundaryChunker();
             WordBoundaryChunkingSettings wordBoundaryChunkingSettings = (WordBoundaryChunkingSettings) chunkingSettings;
             List<WordBoundaryChunker.ChunkOffset> offsets = chunker.chunk(

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceNamedWriteablesProvider.java

Lines changed: 4 additions & 0 deletions
@@ -26,6 +26,7 @@
 import org.elasticsearch.xpack.core.inference.results.TextEmbeddingByteResults;
 import org.elasticsearch.xpack.core.inference.results.TextEmbeddingFloatResults;
 import org.elasticsearch.xpack.inference.action.task.StreamingTaskManager;
+import org.elasticsearch.xpack.inference.chunking.NoneChunkingSettings;
 import org.elasticsearch.xpack.inference.chunking.SentenceBoundaryChunkingSettings;
 import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunkingSettings;
 import org.elasticsearch.xpack.inference.common.amazon.AwsSecretSettings;

@@ -553,6 +554,9 @@ private static void addInternalNamedWriteables(List<NamedWriteableRegistry.Entry
     }

     private static void addChunkingSettingsNamedWriteables(List<NamedWriteableRegistry.Entry> namedWriteables) {
+        namedWriteables.add(
+            new NamedWriteableRegistry.Entry(ChunkingSettings.class, NoneChunkingSettings.NAME, in -> NoneChunkingSettings.INSTANCE)
+        );
         namedWriteables.add(
             new NamedWriteableRegistry.Entry(ChunkingSettings.class, WordBoundaryChunkingSettings.NAME, WordBoundaryChunkingSettings::new)
         );

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/ChunkerBuilder.java

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ public static Chunker fromChunkingStrategy(ChunkingStrategy chunkingStrategy) {
     }

     return switch (chunkingStrategy) {
+        case NONE -> NoopChunker.INSTANCE;
        case WORD -> new WordBoundaryChunker();
        case SENTENCE -> new SentenceBoundaryChunker();
    };
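
The factory above dispatches an exhaustive switch over the strategy enum. A minimal standalone sketch of that pattern, with illustrative names rather than the real Elasticsearch classes (the toy `WORD` splitter here only stands in for the real `WordBoundaryChunker`, which uses `BreakIterator`):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the strategy -> chunker dispatch; names are illustrative, not the ES API.
class ChunkerFactorySketch {
    enum Strategy { NONE, WORD }

    interface Chunker {
        List<int[]> chunk(String input); // each element is a [start, end) offset pair
    }

    static Chunker fromStrategy(Strategy strategy) {
        return switch (strategy) {
            // NONE -> a no-op chunker: one offset spanning the whole input.
            case NONE -> input -> List.of(new int[] { 0, input.length() });
            // WORD -> a toy one-chunk-per-token splitter (illustrative only).
            case WORD -> ChunkerFactorySketch::splitOnWhitespace;
        };
    }

    // Emits one [start, end) offset per whitespace-separated token.
    static List<int[]> splitOnWhitespace(String input) {
        List<int[]> offsets = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= input.length(); i++) {
            boolean ws = i == input.length() || Character.isWhitespace(input.charAt(i));
            if (!ws && start < 0) start = i;
            if (ws && start >= 0) {
                offsets.add(new int[] { start, i });
                start = -1;
            }
        }
        return offsets;
    }
}
```

An exhaustive switch expression means adding a new enum constant (as this commit does with `NONE`) fails compilation until every such dispatch site handles it, which is how the backport surfaced the other files in this diff.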

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/ChunkingSettingsBuilder.java

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ public static ChunkingSettings fromMap(Map<String, Object> settings, boolean ret
         settings.get(ChunkingSettingsOptions.STRATEGY.toString()).toString()
     );
     return switch (chunkingStrategy) {
+        case NONE -> NoneChunkingSettings.INSTANCE;
         case WORD -> WordBoundaryChunkingSettings.fromMap(new HashMap<>(settings));
         case SENTENCE -> SentenceBoundaryChunkingSettings.fromMap(new HashMap<>(settings));
     };
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/NoneChunkingSettings.java

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+package org.elasticsearch.xpack.inference.chunking;
+
+import org.elasticsearch.TransportVersion;
+import org.elasticsearch.TransportVersions;
+import org.elasticsearch.common.Strings;
+import org.elasticsearch.common.ValidationException;
+import org.elasticsearch.common.io.stream.StreamOutput;
+import org.elasticsearch.inference.ChunkingSettings;
+import org.elasticsearch.inference.ChunkingStrategy;
+import org.elasticsearch.xcontent.XContentBuilder;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+
+public class NoneChunkingSettings implements ChunkingSettings {
+    public static final String NAME = "NoneChunkingSettings";
+    public static NoneChunkingSettings INSTANCE = new NoneChunkingSettings();
+
+    private static final ChunkingStrategy STRATEGY = ChunkingStrategy.NONE;
+    private static final Set<String> VALID_KEYS = Set.of(ChunkingSettingsOptions.STRATEGY.toString());
+
+    private NoneChunkingSettings() {}
+
+    @Override
+    public ChunkingStrategy getChunkingStrategy() {
+        return STRATEGY;
+    }
+
+    @Override
+    public String getWriteableName() {
+        return NAME;
+    }
+
+    @Override
+    public TransportVersion getMinimalSupportedVersion() {
+        return TransportVersions.NONE_CHUNKING_STRATEGY_8_19;
+    }
+
+    @Override
+    public void writeTo(StreamOutput out) throws IOException {}
+
+    @Override
+    public Map<String, Object> asMap() {
+        return Map.of(ChunkingSettingsOptions.STRATEGY.toString(), STRATEGY.toString().toLowerCase(Locale.ROOT));
+    }
+
+    public static NoneChunkingSettings fromMap(Map<String, Object> map) {
+        ValidationException validationException = new ValidationException();
+
+        var invalidSettings = map.keySet().stream().filter(key -> VALID_KEYS.contains(key) == false).toArray();
+        if (invalidSettings.length > 0) {
+            validationException.addValidationError(
+                Strings.format(
+                    "When chunking is disabled (none), settings can not have the following: %s",
+                    Arrays.toString(invalidSettings)
+                )
+            );
+        }
+
+        if (validationException.validationErrors().isEmpty() == false) {
+            throw validationException;
+        }
+
+        return NoneChunkingSettings.INSTANCE;
+    }
+
+    @Override
+    public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
+        builder.startObject();
+        {
+            builder.field(ChunkingSettingsOptions.STRATEGY.toString(), STRATEGY);
+        }
+        builder.endObject();
+        return builder;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+        if (this == o) return true;
+        if (o == null || getClass() != o.getClass()) return false;
+        return true;
+    }
+
+    @Override
+    public int hashCode() {
+        return Objects.hash(getClass());
+    }
+
+    @Override
+    public String toString() {
+        return Strings.toString(this);
+    }
+}
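
The `fromMap` validation above rejects any setting other than `strategy` when chunking is disabled. That rule can be sketched standalone (class and method names here are illustrative; the real code collects errors into an Elasticsearch `ValidationException` rather than `IllegalArgumentException`):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the "none"-strategy settings validation: only "strategy" is allowed.
class NoneSettingsValidationSketch {
    private static final Set<String> VALID_KEYS = Set.of("strategy");

    // Returns the offending keys, sorted for deterministic messages.
    static List<String> invalidKeys(Map<String, Object> settings) {
        return settings.keySet().stream().filter(key -> !VALID_KEYS.contains(key)).sorted().toList();
    }

    static void validate(Map<String, Object> settings) {
        var invalid = invalidKeys(settings);
        if (!invalid.isEmpty()) {
            throw new IllegalArgumentException(
                "When chunking is disabled (none), settings can not have the following: " + invalid
            );
        }
    }
}
```

For example, `{"strategy": "none", "max_chunk_size": 100}` is rejected because `max_chunk_size` only makes sense when a chunker will actually split the input.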
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/NoopChunker.java

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+package org.elasticsearch.xpack.inference.chunking;
+
+import org.elasticsearch.common.Strings;
+import org.elasticsearch.inference.ChunkingSettings;
+import org.elasticsearch.xpack.inference.services.openai.embeddings.OpenAiEmbeddingsModel;
+
+import java.util.List;
+
+/**
+ * A {@link Chunker} implementation that returns the input unchanged (no chunking is performed).
+ *
+ * <p><b>WARNING:</b> If the input exceeds the maximum token limit, some services (such as {@link OpenAiEmbeddingsModel})
+ * may return an error.
+ * </p>
+ */
+public class NoopChunker implements Chunker {
+    public static final NoopChunker INSTANCE = new NoopChunker();
+
+    private NoopChunker() {}
+
+    @Override
+    public List<ChunkOffset> chunk(String input, ChunkingSettings chunkingSettings) {
+        if (chunkingSettings instanceof NoneChunkingSettings) {
+            return List.of(new ChunkOffset(0, input.length()));
+        } else {
+            throw new IllegalArgumentException(
+                Strings.format("NoopChunker can't use ChunkingSettings with strategy [%s]", chunkingSettings.getChunkingStrategy())
+            );
+        }
+    }
+}
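
The core behavior of `NoopChunker` is just that the whole input becomes a single `[0, input.length())` offset, so a pre-chunked array element passes through intact. A self-contained sketch of that invariant, with an illustrative class name in place of the real ES types:

```java
import java.util.List;

// Sketch of no-op chunking: the entire input maps to exactly one chunk offset.
class NoopChunkSketch {
    // [start, end) character offsets into the original input string.
    record ChunkOffset(int start, int end) {}

    static List<ChunkOffset> chunk(String input) {
        return List.of(new ChunkOffset(0, input.length()));
    }
}
```

Note the empty string still yields one (empty) chunk rather than zero chunks; in the real code path, batching decides what to do with empty input lists.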

x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/chunking/ChunkerBuilderTests.java

Lines changed: 8 additions & 1 deletion
@@ -27,6 +27,13 @@ public void testValidChunkingStrategy() {
     }

     private Map<ChunkingStrategy, Class<? extends Chunker>> chunkingStrategyToExpectedChunkerClassMap() {
-        return Map.of(ChunkingStrategy.WORD, WordBoundaryChunker.class, ChunkingStrategy.SENTENCE, SentenceBoundaryChunker.class);
+        return Map.of(
+            ChunkingStrategy.NONE,
+            NoopChunker.class,
+            ChunkingStrategy.WORD,
+            WordBoundaryChunker.class,
+            ChunkingStrategy.SENTENCE,
+            SentenceBoundaryChunker.class
+        );
     }
 }

x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/chunking/ChunkingSettingsTests.java

Lines changed: 6 additions & 3 deletions
@@ -20,6 +20,9 @@ public static ChunkingSettings createRandomChunkingSettings() {
     ChunkingStrategy randomStrategy = randomFrom(ChunkingStrategy.values());

     switch (randomStrategy) {
+        case NONE -> {
+            return NoneChunkingSettings.INSTANCE;
+        }
         case WORD -> {
             var maxChunkSize = randomIntBetween(10, 300);
             return new WordBoundaryChunkingSettings(maxChunkSize, randomIntBetween(1, maxChunkSize / 2));

@@ -37,15 +40,15 @@ public static Map<String, Object> createRandomChunkingSettingsMap() {
     chunkingSettingsMap.put(ChunkingSettingsOptions.STRATEGY.toString(), randomStrategy.toString());

     switch (randomStrategy) {
+        case NONE -> {
+        }
         case WORD -> {
             var maxChunkSize = randomIntBetween(10, 300);
             chunkingSettingsMap.put(ChunkingSettingsOptions.MAX_CHUNK_SIZE.toString(), maxChunkSize);
             chunkingSettingsMap.put(ChunkingSettingsOptions.OVERLAP.toString(), randomIntBetween(1, maxChunkSize / 2));

         }
-        case SENTENCE -> {
-            chunkingSettingsMap.put(ChunkingSettingsOptions.MAX_CHUNK_SIZE.toString(), randomIntBetween(20, 300));
-        }
+        case SENTENCE -> chunkingSettingsMap.put(ChunkingSettingsOptions.MAX_CHUNK_SIZE.toString(), randomIntBetween(20, 300));
         default -> {
         }
     }

x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/chunking/EmbeddingRequestChunkerTests.java

Lines changed: 16 additions & 0 deletions
@@ -46,6 +46,22 @@ public void testEmptyInput_SentenceChunker() {
         assertThat(batches, empty());
     }

+    public void testEmptyInput_NoopChunker() {
+        var batches = new EmbeddingRequestChunker<>(List.of(), 10, NoneChunkingSettings.INSTANCE).batchRequestsWithListeners(
+            testListener()
+        );
+        assertThat(batches, empty());
+    }
+
+    public void testAnyInput_NoopChunker() {
+        var randomInput = randomAlphaOfLengthBetween(100, 1000);
+        var batches = new EmbeddingRequestChunker<>(List.of(new ChunkInferenceInput(randomInput)), 10, NoneChunkingSettings.INSTANCE)
+            .batchRequestsWithListeners(testListener());
+        assertThat(batches, hasSize(1));
+        assertThat(batches.get(0).batch().inputs().get(), hasSize(1));
+        assertThat(batches.get(0).batch().inputs().get().get(0), Matchers.is(randomInput));
+    }
+
     public void testWhitespaceInput_SentenceChunker() {
         var batches = new EmbeddingRequestChunker<>(
             List.of(new ChunkInferenceInput(" ")),
