
Explain ignore_above better #129284

Open: wants to merge 5 commits into base: main

@nik9000 (Member) commented Jun 11, 2025

This concept is complicated.

Closes #128991

@nik9000 nik9000 requested a review from limotova June 11, 2025 18:42
@nik9000 nik9000 added >docs General docs changes :Search Foundations/Mapping Index mappings, including merging and defining field types v9.1.0 labels Jun 11, 2025
@elasticsearchmachine elasticsearchmachine added Team:Docs Meta label for docs team Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch labels Jun 11, 2025
@elasticsearchmachine (Collaborator): Pinging @elastic/es-docs (Team:Docs)

@elasticsearchmachine (Collaborator): Pinging @elastic/es-search-foundations (Team:Search Foundations)

If you need to never reject documents, this should have some value `<=8191`. All documents with
more characters will just skip building the index for this field.

The defaults are complicated. It's `2147483647` (effectively unbounded) in standard indices and

@leemthompo (Contributor) commented Jun 12, 2025:

Consider using bullets for defaults/dynamic mapping info for readability

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

@bmorelli25 (Member) left a comment:
This looked like fun so I couldn't resist. Sorry if I'm wrong! Also, hi Nik 👋.

Comment on lines +73 to +77
: Do not index any field containing a string with more characters than this value. This is important because {{es}}
will reject entire documents if they contain keyword fields that exceed `32766` bytes when UTF-8 encoded.

To avoid any risk of document rejection, set this value to `8191` or less. Fields with strings exceeding this
length will be excluded from indexing.
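The `8191` figure follows from the byte limit: a single `keyword` term may be at most `32766` bytes, and UTF-8 never uses more than 4 bytes per character, so `32766 // 4 = 8191` characters always fit. Here is a minimal Python sketch of that arithmetic. It assumes characters are counted the way Python's `len()` counts them (code points). Elasticsearch may count differently for characters outside the Basic Multilingual Plane, but no character needs more than 4 UTF-8 bytes, so the bound holds either way.

```python
# Lucene refuses to index a single term larger than this many bytes.
MAX_TERM_BYTES = 32766

# UTF-8 encodes any Unicode code point in at most 4 bytes.
MAX_UTF8_BYTES_PER_CHAR = 4

# Largest character count that can never exceed the byte limit.
SAFE_IGNORE_ABOVE = MAX_TERM_BYTES // MAX_UTF8_BYTES_PER_CHAR


def utf8_len(value: str) -> int:
    """Size of the value in bytes once UTF-8 encoded."""
    return len(value.encode("utf-8"))


# Worst case: every character is a 4-byte code point (an emoji here).
worst_at_limit = "\U0001F600" * SAFE_IGNORE_ABOVE
worst_over_limit = "\U0001F600" * (SAFE_IGNORE_ABOVE + 1)

print(SAFE_IGNORE_ABOVE)           # 8191
print(utf8_len(worst_at_limit))    # 32764, fits under 32766
print(utf8_len(worst_over_limit))  # 32768, too large for one term
```

Plain ASCII text takes 1 byte per character, so in practice a value can be far longer than 8191 characters before it reaches `32766` bytes. `8191` is the worst-case bound that is safe for any text.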

Review comment (Member):
Does this work on text fields? Or only keyword fields?

Also further down you say:

> `logsdb` indices: `8191`. `keyword` fields longer than `8191` characters won't be indexed, but the documents are
> accepted and the unindexed values are available from `_source`.

Does the previous statement only apply to logsdb indices? Or to standard indices as well? If both, that feels important.

What about this:

Skip indexing of a `keyword` value whose length in characters exceeds `ignore_above`. The value is still kept in `_source`, but the field won't be searchable or aggregatable.

If you do not set `ignore_above` in a standard index, {{es}} will reject entire documents if they contain one or more `keyword` values exceeding a UTF-8-encoded size of `32766` bytes.

To avoid any risk of document rejection, set this value to `8191` or less.
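Setting the limit explicitly is a small mapping change. Here is a minimal Python sketch that builds a create-index request body with `ignore_above: 8191`, the value suggested above. It assumes a hypothetical `keyword` field named `message` and a hypothetical index `my-index`. The body would be sent with `PUT /my-index`.

```python
import json

# Hypothetical field name; substitute the keyword field you want to protect.
FIELD = "message"

# Create-index body that stops indexing keyword values longer than
# 8191 characters, so no value can exceed Lucene's 32766-byte term limit.
create_index_body = {
    "mappings": {
        "properties": {
            FIELD: {
                "type": "keyword",
                "ignore_above": 8191,
            }
        }
    }
}

# Serialize for use as the request body of PUT /my-index (hypothetical index name).
print(json.dumps(create_index_body, indent=2))
```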

Comment on lines +79 to +83
The defaults are complicated:
* Standard indices: `2147483647` (effectively unbounded). Documents containing `keyword` fields longer than `32766`
bytes will be rejected.
* `logsdb` indices: `8191`. `keyword` fields longer than `8191` characters won't be indexed, but the documents are
accepted and the unindexed values are available from `_source`.
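The difference between the two defaults comes down to which check a long value hits first. Below is a simplified Python model of that decision, not Elasticsearch's actual code. It assumes `ignore_above` is compared against the character count first, and that any value passing that check must still fit in a `32766`-byte Lucene term or the whole document is rejected. The names `Outcome`, `DEFAULT_IGNORE_ABOVE`, and `keyword_outcome` are hypothetical.

```python
from enum import Enum

# Lucene's maximum size for a single indexed term, in bytes.
MAX_TERM_BYTES = 32766


class Outcome(Enum):
    INDEXED = "indexed"
    SKIPPED = "not indexed, still kept in _source"
    REJECTED = "whole document rejected"


# Default ignore_above per index type, as listed in the defaults above.
DEFAULT_IGNORE_ABOVE = {
    "standard": 2147483647,  # effectively unbounded
    "logsdb": 8191,
}


def keyword_outcome(value: str, ignore_above: int) -> Outcome:
    """Simplified model of what happens to one keyword value."""
    # Values longer than ignore_above are skipped, never rejected.
    if len(value) > ignore_above:
        return Outcome.SKIPPED
    # Anything that gets past ignore_above must fit in one Lucene term.
    if len(value.encode("utf-8")) > MAX_TERM_BYTES:
        return Outcome.REJECTED
    return Outcome.INDEXED


long_value = "x" * 40000  # 40000 ASCII characters = 40000 bytes
print(keyword_outcome(long_value, DEFAULT_IGNORE_ABOVE["standard"]))  # Outcome.REJECTED
print(keyword_outcome(long_value, DEFAULT_IGNORE_ABOVE["logsdb"]))    # Outcome.SKIPPED
print(keyword_outcome("short", DEFAULT_IGNORE_ABOVE["logsdb"]))       # Outcome.INDEXED
```

The same 40000-character value is rejected in a standard index but merely left unindexed in a `logsdb` index, because only the `logsdb` default catches it before the byte limit does.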

@bmorelli25 (Member) commented Jun 13, 2025:

What about a table for this information?

The defaults are complicated:

| Index type | Default | Effect |
|---|---|---|
| Standard indices | `2147483647` (effectively unbounded) | Documents will be rejected if any `keyword` value exceeds `32766` bytes. |
| `logsdb` indices | `8191` | Documents are never rejected. Keyword values exceeding this limit are still kept in `_source`, but won't be searchable or aggregatable. |

Reply (Member):

Ahh, I see this is already in a definition list, so maybe a table won't work. But if you like my wording, you can update accordingly.

Reply (Contributor):

Me like that wording :)

Comment on lines +84 to +87
* The [dynamic mapping](docs-content://manage-data/data-store/mapping/dynamic-mapping.md) for string fields
defaults to a `text` field with a [sub](/reference/elasticsearch/mapping-reference/multi-fields.md)-`keyword`
field with an `ignore_above` of `256`. String fields longer than 256 characters are available for full text
search but won't have a value in their `.keyword` sub-field, so they can't be used for exact matching over `_search`.

Review comment (Member):
This part I struggle to understand. But it feels separate from the defaults above? Maybe this can be in a new paragraph. I think you're saying that...

When ES finds a new string field without an explicit mapping, it automatically:

  1. Maps the field to a `text` field so the entire value is searchable with full-text search.
  2. Adds a `keyword` sub-field with `ignore_above` set to `256` characters. This means that values of up to 256 characters are available for exact matching over `_search`. Values longer than that are still searchable via the `text` field, but are not indexed as keywords.
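The dynamic mapping described above can be written out as a mapping fragment. Here is a minimal Python sketch that models it. It assumes a hypothetical field named `title` and measures length with Python's `len()`. `indexed_fields` is a hypothetical helper that lists which fields receive an indexed value.

```python
# Shape of the mapping dynamic mapping creates for a new string field.
DYNAMIC_STRING_MAPPING = {
    "type": "text",
    "fields": {
        "keyword": {"type": "keyword", "ignore_above": 256},
    },
}


def indexed_fields(field: str, value: str) -> list[str]:
    """Which of the dynamically mapped fields get an indexed value (simplified)."""
    ignore_above = DYNAMIC_STRING_MAPPING["fields"]["keyword"]["ignore_above"]
    # The text field has no ignore_above, so it is indexed whatever the length.
    fields = [field]
    # The .keyword sub-field only gets values of up to ignore_above characters.
    if len(value) <= ignore_above:
        fields.append(f"{field}.keyword")
    return fields


print(indexed_fields("title", "x" * 256))  # ['title', 'title.keyword']
print(indexed_fields("title", "x" * 257))  # ['title']
```

A 257-character value can still be found with a full-text query on `title`, but an exact `term` match or aggregation on `title.keyword` will not see it.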

Successfully merging this pull request may close these issues.

Clarify docs around keyword ignore_above setting.
4 participants