From 5d6fc6613c78e28bec8262e05953f1b784ef1876 Mon Sep 17 00:00:00 2001
From: Nik Everett
Date: Wed, 11 Jun 2025 14:41:06 -0400
Subject: [PATCH 1/5] Explain `ignore_above` better

This concept is complicated.

Closes #128991
---
 .../elasticsearch/mapping-reference/keyword.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/keyword.md b/docs/reference/elasticsearch/mapping-reference/keyword.md
index 22c5ff9c1044c..d18cb91ddb53a 100644
--- a/docs/reference/elasticsearch/mapping-reference/keyword.md
+++ b/docs/reference/elasticsearch/mapping-reference/keyword.md
@@ -70,7 +70,15 @@ The following parameters are accepted by `keyword` fields:
 : Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations.
 
 [`ignore_above`](/reference/elasticsearch/mapping-reference/ignore-above.md)
-: Do not index any string longer than this value. Defaults to `2147483647` in standard indices so that all values would be accepted, and `8191` in logsdb indices to protect against Lucene's term byte-length limit of `32766`. Please however note that default dynamic mapping rules create a sub `keyword` field that overrides this default by setting `ignore_above: 256`.
+: Do not index any string with more characters than this value. This is important because `keyword`
+  fields will reject documents with `keyword` fields that encode to utf-8 longer than `32766` bytes.
+  If you need to never reject documents, this should have some value `<=8191`. All documents with
+  more characters will just skip building the index for this field.
+  The defaults are complicated. It's `2147483647` (effectively unbounded) in standard indices and
+  `8191` in logsdb indices. So, if unspecified, standard indices *can* reject documents. And logsdb indices
+  will index the document, but skip this field.
+  The [dynamic mapping](docs-content://manage-data/data-store/mapping/dynamic-mapping.md) for string fields
+  defaults to a `text` field with a sub-`keyword` field with an `ignore_above` of `256`.
 
 [`index`](/reference/elasticsearch/mapping-reference/mapping-index.md)
 : Should the field be quickly searchable? Accepts `true` (default) and `false`. `keyword` fields that only have [`doc_values`](/reference/elasticsearch/mapping-reference/doc-values.md) enabled can still be queried, albeit slower.

From c1c509d7f531cb21fe9cae7d9e69277b6b72b0e5 Mon Sep 17 00:00:00 2001
From: Nik Everett
Date: Thu, 12 Jun 2025 08:28:59 -0400
Subject: [PATCH 2/5] Format

---
 docs/reference/elasticsearch/mapping-reference/keyword.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/reference/elasticsearch/mapping-reference/keyword.md b/docs/reference/elasticsearch/mapping-reference/keyword.md
index d18cb91ddb53a..56bfc0c5e4ac3 100644
--- a/docs/reference/elasticsearch/mapping-reference/keyword.md
+++ b/docs/reference/elasticsearch/mapping-reference/keyword.md
@@ -72,11 +72,14 @@ The following parameters are accepted by `keyword` fields:
 [`ignore_above`](/reference/elasticsearch/mapping-reference/ignore-above.md)
 : Do not index any string with more characters than this value. This is important because `keyword`
   fields will reject documents with `keyword` fields that encode to utf-8 longer than `32766` bytes.
+
   If you need to never reject documents, this should have some value `<=8191`. All documents with
   more characters will just skip building the index for this field.
+
   The defaults are complicated. It's `2147483647` (effectively unbounded) in standard indices and
   `8191` in logsdb indices. So, if unspecified, standard indices *can* reject documents. And logsdb indices
   will index the document, but skip this field.
+
   The [dynamic mapping](docs-content://manage-data/data-store/mapping/dynamic-mapping.md) for string fields
   defaults to a `text` field with a sub-`keyword` field with an `ignore_above` of `256`.
 

From 6f9f2bb53894445d77f05f6db5695d78e6aa3ba5 Mon Sep 17 00:00:00 2001
From: Nik Everett
Date: Thu, 12 Jun 2025 09:18:56 -0400
Subject: [PATCH 3/5] Explain more

---
 docs/reference/elasticsearch/mapping-reference/keyword.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/keyword.md b/docs/reference/elasticsearch/mapping-reference/keyword.md
index 56bfc0c5e4ac3..43bec32d6c806 100644
--- a/docs/reference/elasticsearch/mapping-reference/keyword.md
+++ b/docs/reference/elasticsearch/mapping-reference/keyword.md
@@ -81,7 +81,8 @@ The following parameters are accepted by `keyword` fields:
   will index the document, but skip this field.
 
   The [dynamic mapping](docs-content://manage-data/data-store/mapping/dynamic-mapping.md) for string fields
-  defaults to a `text` field with a sub-`keyword` field with an `ignore_above` of `256`.
+  defaults to a `text` field with a sub-`keyword` field with an `ignore_above` of `256`. This indexes
+  all values for full text search, and indexes short values get indexed for exact matching and aggregation.
 
 [`index`](/reference/elasticsearch/mapping-reference/mapping-index.md)
 : Should the field be quickly searchable? Accepts `true` (default) and `false`. `keyword` fields that only have [`doc_values`](/reference/elasticsearch/mapping-reference/doc-values.md) enabled can still be queried, albeit slower.

From c3c720697312cda34ea7f77b18708d362f15d966 Mon Sep 17 00:00:00 2001
From: Nik Everett
Date: Thu, 12 Jun 2025 13:38:48 -0400
Subject: [PATCH 4/5] Apply suggestions from code review

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
---
 docs/reference/elasticsearch/mapping-reference/keyword.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/keyword.md b/docs/reference/elasticsearch/mapping-reference/keyword.md
index 43bec32d6c806..9aa5c2aa47dcb 100644
--- a/docs/reference/elasticsearch/mapping-reference/keyword.md
+++ b/docs/reference/elasticsearch/mapping-reference/keyword.md
@@ -70,11 +70,10 @@ The following parameters are accepted by `keyword` fields:
 : Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations.
 
 [`ignore_above`](/reference/elasticsearch/mapping-reference/ignore-above.md)
-: Do not index any string with more characters than this value. This is important because `keyword`
-  fields will reject documents with `keyword` fields that encode to utf-8 longer than `32766` bytes.
+: Do not index any field containing a string with more characters than this value. This is important because {{es}}
+will reject entire documents if they contain keyword fields that exceed `32766` bytes when UTF-8 encoded.
 
-  If you need to never reject documents, this should have some value `<=8191`. All documents with
-  more characters will just skip building the index for this field.
+  To avoid any risk of document rejection, set this value to `8191` or less. Fields with strings exceeding this length will be excluded from indexing.
 
   The defaults are complicated. It's `2147483647` (effectively unbounded) in standard indices and
   `8191` in logsdb indices. So, if unspecified, standard indices *can* reject documents.
And logsdb indices

From eaefb8de2084ed9ada2c4a6f7c48eba82333d474 Mon Sep 17 00:00:00 2001
From: Nik Everett
Date: Thu, 12 Jun 2025 13:51:53 -0400
Subject: [PATCH 5/5] More

---
 .../mapping-reference/keyword.md | 27 ++++++++++---------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/keyword.md b/docs/reference/elasticsearch/mapping-reference/keyword.md
index 9aa5c2aa47dcb..268327fd90d16 100644
--- a/docs/reference/elasticsearch/mapping-reference/keyword.md
+++ b/docs/reference/elasticsearch/mapping-reference/keyword.md
@@ -70,18 +70,21 @@ The following parameters are accepted by `keyword` fields:
 : Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations.
 
 [`ignore_above`](/reference/elasticsearch/mapping-reference/ignore-above.md)
-: Do not index any field containing a string with more characters than this value. This is important because {{es}}
-will reject entire documents if they contain keyword fields that exceed `32766` bytes when UTF-8 encoded.
-
-  To avoid any risk of document rejection, set this value to `8191` or less. Fields with strings exceeding this length will be excluded from indexing.
-
-  The defaults are complicated. It's `2147483647` (effectively unbounded) in standard indices and
-  `8191` in logsdb indices. So, if unspecified, standard indices *can* reject documents. And logsdb indices
-  will index the document, but skip this field.
-
-  The [dynamic mapping](docs-content://manage-data/data-store/mapping/dynamic-mapping.md) for string fields
-  defaults to a `text` field with a sub-`keyword` field with an `ignore_above` of `256`. This indexes
-  all values for full text search, and indexes short values get indexed for exact matching and aggregation.
+: Do not index any field containing a string with more characters than this value.
This is important because {{es}}
+  will reject entire documents if they contain keyword fields that exceed `32766` bytes when UTF-8 encoded.
+
+  To avoid any risk of document rejection, set this value to `8191` or less. Fields with strings exceeding this
+  length will be excluded from indexing.
+
+  The defaults are complicated:
+  * Standard indices: `2147483647` (effectively unbounded). Documents containing `keyword` fields longer than `32766`
+    bytes will be rejected.
+  * `logsdb` indices: `8191`. `keyword` fields longer than `8191` characters won't be indexed, but the documents are
+    accepted and the unindexed values are available from `_source`.
+  * The [dynamic mapping](docs-content://manage-data/data-store/mapping/dynamic-mapping.md) for string fields
+    defaults to a `text` field with a [sub](/reference/elasticsearch/mapping-reference/multi-fields.md)-`keyword`
+    field with an `ignore_above` of `256`. String fields longer than 256 characters are available for full text
+    search but won't have a value in their `.keyword` sub-field, so they can't be matched exactly through `_search`.
 
 [`index`](/reference/elasticsearch/mapping-reference/mapping-index.md)
 : Should the field be quickly searchable? Accepts `true` (default) and `false`. `keyword` fields that only have [`doc_values`](/reference/elasticsearch/mapping-reference/doc-values.md) enabled can still be queried, albeit slower.
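
The `8191` figure in the final hunk can be checked with a little arithmetic. The Python sketch below is not part of the patch and only illustrates the reasoning. It assumes `ignore_above` counts Unicode code points and that each one takes at most 4 bytes in UTF-8. How {{es}} counts characters internally is not stated in the patch, and the function and constant names are hypothetical.

```python
# Byte limit for a single indexed term, as quoted in the patch text.
LUCENE_MAX_TERM_BYTES = 32766
# UTF-8 encodes one Unicode code point in at most 4 bytes.
MAX_UTF8_BYTES_PER_CHAR = 4


def worst_case_utf8_bytes(char_count: int) -> int:
    """Upper bound on the UTF-8 size of a string with `char_count` code points."""
    return char_count * MAX_UTF8_BYTES_PER_CHAR


def can_exceed_term_limit(ignore_above: int) -> bool:
    """True if a string that passes `ignore_above` could still exceed the byte limit."""
    return worst_case_utf8_bytes(ignore_above) > LUCENE_MAX_TERM_BYTES


# Largest character limit whose worst case still fits within the byte limit.
safe_limit = LUCENE_MAX_TERM_BYTES // MAX_UTF8_BYTES_PER_CHAR

print(safe_limit)                         # 8191
print(can_exceed_term_limit(8191))        # False: the logsdb default is always safe
print(can_exceed_term_limit(2147483647))  # True: the standard default is not
# One character past the safe limit, using a 4-byte code point (U+1F600):
print(len(("\U0001F600" * 8192).encode("utf-8")))  # 32768, which is over 32766
```

Under these assumptions, any limit of `8191` or less can never yield a value over `32766` bytes, which matches the advice in the patch. The standard default of `2147483647` offers no such guarantee.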