diff --git a/docs/reference/query-dsl/query_filter_context.asciidoc b/docs/reference/query-dsl/query_filter_context.asciidoc index 78e1549644aa6..1fd75ef0e841d 100644 --- a/docs/reference/query-dsl/query_filter_context.asciidoc +++ b/docs/reference/query-dsl/query_filter_context.asciidoc @@ -29,26 +29,45 @@ parameter, such as the `query` parameter in the [discrete] [[filter-context]] === Filter context -In a filter context, a query clause answers the question ``__Does this -document match this query clause?__'' The answer is a simple Yes or No -- no -scores are calculated. Filter context is mostly used for filtering structured -data, e.g. -* __Does this +timestamp+ fall into the range 2015 to 2016?__ -* __Is the +status+ field set to ++"published"++__? +A filter answers the binary question “Does this document match this query clause?”. The answer is simply "yes" or "no". +Filtering has several benefits: -Frequently used filters will be cached automatically by Elasticsearch, to -speed up performance. +. *Simple binary logic*: In a filter context, a query clause determines document matches based on a yes/no criterion, without score calculation. +. *Performance*: Because they don't compute relevance scores, filters execute faster than queries. +. *Caching*: {es} automatically caches frequently used filters, speeding up subsequent search performance. +. *Resource efficiency*: Filters consume less CPU resources compared to full-text queries. +. *Query combination*: Filters can be combined with scored queries to refine result sets efficiently. -Filter context is in effect whenever a query clause is passed to a `filter` -parameter, such as the `filter` or `must_not` parameters in the -<> query, the `filter` parameter in the -<> query, or the -<> aggregation. +Filters are particularly effective for querying structured data and implementing "must have" criteria in complex searches. + +Structured data refers to information that is highly organized and formatted in a predefined manner. In the context of Elasticsearch, this typically includes: + +* Numeric fields (integers, floating-point numbers) +* Dates and timestamps +* Boolean values +* Keyword fields (exact match strings) +* Geo-points and geo-shapes + +Unlike full-text fields, structured data has a consistent, predictable format, making it ideal for precise filtering operations. + +Common filter applications include: + +* Date range checks: for example is the `timestamp` field between 2015 and 2016 +* Specific field value checks: for example is the `status` field equal to "published" or is the `author` field equal to "John Doe" + +Filter context applies when a query clause is passed to a `filter` parameter, such as: + +* `filter` or `must_not` parameters in <> queries +* `filter` parameter in <> queries +* <> aggregations + +Filters optimize query performance and efficiency, especially for structured data queries and when combined with full-text searches. [discrete] [[query-filter-context-ex]] === Example of query and filter contexts + Below is an example of query clauses being used in query and filter context in the `search` API. This query will match documents where all of the following conditions are met: @@ -93,4 +112,4 @@ significand's precision will be converted to floats with loss of precision. TIP: Use query clauses in query context for conditions which should affect the score of matching documents (i.e. how well does the document match), and use -all other query clauses in filter context. \ No newline at end of file +all other query clauses in filter context. diff --git a/docs/reference/quickstart/full-text-filtering-tutorial.asciidoc b/docs/reference/quickstart/full-text-filtering-tutorial.asciidoc new file mode 100644 index 0000000000000..46cadc19f2547 --- /dev/null +++ b/docs/reference/quickstart/full-text-filtering-tutorial.asciidoc @@ -0,0 +1,626 @@ +[[full-text-filter-tutorial]] +== Basic full-text search and filtering in {es} +++++ +Basics: Full-text search and filtering +++++ + +This is a hands-on introduction to the basics of full-text search with {es}, also known as _lexical search_, using the <> and <>. +You'll also learn how to filter data, to narrow down search results based on exact criteria. + +In this scenario, we're implementing a search function for a cooking blog. +The blog contains recipes with various attributes including textual content, categorical data, and numerical ratings. + +The goal is to create search queries that enable users to: + +* Find recipes based on ingredients they want to use or avoid +* Discover dishes suitable for their dietary needs +* Find highly-rated recipes in specific categories +* Find recent recipes from their favorite authors + +To achieve these goals we'll use different Elasticsearch queries to perform full-text search, apply filters, and combine multiple search criteria. + +[discrete] +[[full-text-filter-tutorial-create-index]] +=== Step 1: Create an index + +Create the `cooking_blog` index to get started: + +[source,console] +---- +PUT /cooking_blog +---- +// TESTSETUP + +Now define the mappings for the index: + +[source,console] +---- +PUT /cooking_blog/_mapping +{ + "properties": { + "title": { + "type": "text", + "analyzer": "standard", <1> + "fields": { <2> + "keyword": { + "type": "keyword", + "ignore_above": 256 <3> + } + } + }, + "description": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword" + } + } + }, + "author": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword" + } + } + }, + "date": { + "type": "date", + "format": "yyyy-MM-dd" + }, + "category": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword" + } + } + }, + "tags": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword" + } + } + }, + "rating": { + "type": "float" + } + } +} +---- +// TEST +<1> The `standard` analyzer is used by default for `text` fields if an `analyzer` isn't specified. It's included here for demonstration purposes. +<2> <> are used here to index `text` fields as both `text` and `keyword` <>. This enables both full-text search and exact matching/filtering on the same field. +Note that if you used <>, these multi-fields would be created automatically. +<3> The <> prevents indexing values longer than 256 characters in the `keyword` field. Again this is the default value, but it's included here for for demonstration purposes. +It helps to save disk space and avoid potential issues with Lucene's term byte-length limit. + +[TIP] +==== +Full-text search is powered by <>. +Text analysis normalizes and standardizes text data so it can be efficiently stored in an inverted index and searched in near real-time. +Analysis happens at both <>. +This tutorial won't cover analysis in detail, but it's important to understand how text is processed to create effective search queries. +==== + +[discrete] +[[full-text-filter-tutorial-index-data]] +=== Step 2: Add sample blog posts to your index + +Now you'll need to index some example blog posts using the <>. +Note that `text` fields are analyzed and multi-fields are generated at index time. + +[source,console] +---- +POST /cooking_blog/_bulk?refresh=wait_for +{"index":{"_id":"1"}} +{"title":"Perfect Pancakes: A Fluffy Breakfast Delight","description":"Learn the secrets to making the fluffiest pancakes, so amazing you won't believe your tastebuds. This recipe uses buttermilk and a special folding technique to create light, airy pancakes that are perfect for lazy Sunday mornings.","author":"Maria Rodriguez","date":"2023-05-01","category":"Breakfast","tags":["pancakes","breakfast","easy recipes"],"rating":4.8} +{"index":{"_id":"2"}} +{"title":"Spicy Thai Green Curry: A Vegetarian Adventure","description":"Dive into the flavors of Thailand with this vibrant green curry. Packed with vegetables and aromatic herbs, this dish is both healthy and satisfying. Don't worry about the heat - you can easily adjust the spice level to your liking.","author":"Liam Chen","date":"2023-05-05","category":"Main Course","tags":["thai","vegetarian","curry","spicy"],"rating":4.6} +{"index":{"_id":"3"}} +{"title":"Classic Beef Stroganoff: A Creamy Comfort Food","description":"Indulge in this rich and creamy beef stroganoff. Tender strips of beef in a savory mushroom sauce, served over a bed of egg noodles. It's the ultimate comfort food for chilly evenings.","author":"Emma Watson","date":"2023-05-10","category":"Main Course","tags":["beef","pasta","comfort food"],"rating":4.7} +{"index":{"_id":"4"}} +{"title":"Vegan Chocolate Avocado Mousse","description":"Discover the magic of avocado in this rich, vegan chocolate mousse. Creamy, indulgent, and secretly healthy, it's the perfect guilt-free dessert for chocolate lovers.","author":"Alex Green","date":"2023-05-15","category":"Dessert","tags":["vegan","chocolate","avocado","healthy dessert"],"rating":4.5} +{"index":{"_id":"5"}} +{"title":"Crispy Oven-Fried Chicken","description":"Get that perfect crunch without the deep fryer! This oven-fried chicken recipe delivers crispy, juicy results every time. A healthier take on the classic comfort food.","author":"Maria Rodriguez","date":"2023-05-20","category":"Main Course","tags":["chicken","oven-fried","healthy"],"rating":4.9} +---- +// TEST[continued] + +[discrete] +[[full-text-filter-tutorial-match-query]] +=== Step 3: Perform basic full-text searches + +Full-text search involves executing text-based queries across one or more document fields. +These queries calculate a relevance score for each matching document, based on how closely the document's content aligns with the search terms. +{es} offers various query types, each with its own method for matching text and <>. + +[discrete] +==== `match` query + +The <> query is the standard query for full-text, or "lexical", search. +The query text will be analyzed according to the analyzer configuration specified on each field (or at query time). + +First, search the `description` field for "fluffy pancakes": + +[source,console] +---- +GET /cooking_blog/_search +{ + "query": { + "match": { + "description": { + "query": "fluffy pancakes" <1> + } + } + } +} +---- +// TEST[continued] +<1> By default, the `match` query uses `OR` logic between the resulting tokens. This means it will match documents that contain either "fluffy" or "pancakes", or both, in the description field. + +At search time, {es} defaults to the analyzer defined in the field mapping. In this example, we're using the `standard` analyzer. Using a different analyzer at search time is an <>. + +.Example response +[%collapsible] +============== +[source,console-result] +---- +{ + "took": 0, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { <1> + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1.8378843, <2> + "hits": [ + { + "_index": "cooking_blog", + "_id": "1", + "_score": 1.8378843, <3> + "_source": { + "title": "Perfect Pancakes: A Fluffy Breakfast Delight", <4> + "description": "Learn the secrets to making the fluffiest pancakes, so amazing you won't believe your tastebuds. This recipe uses buttermilk and a special folding technique to create light, airy pancakes that are perfect for lazy Sunday mornings.", <5> + "author": "Maria Rodriguez", + "date": "2023-05-01", + "category": "Breakfast", + "tags": [ + "pancakes", + "breakfast", + "easy recipes" + ], + "rating": 4.8 + } + } + ] + } +} +---- +// TESTRESPONSE[s/"took": 0/"took": "$body.took"/] +// TESTRESPONSE[s/"total": 1/"total": $body._shards.total/] +// TESTRESPONSE[s/"successful": 1/"successful": $body._shards.successful/] +// TESTRESPONSE[s/"value": 1/"value": $body.hits.total.value/] +// TESTRESPONSE[s/"max_score": 1.8378843/"max_score": $body.hits.max_score/] +// TESTRESPONSE[s/"_score": 1.8378843/"_score": $body.hits.hits.0._score/] +<1> The `hits` object contains the total number of matching documents and their relation to the total. Refer to <> for more details about the `hits` object. +<2> `max_score` is the highest relevance score among all matching documents. In this example, we only have one matching document. +<3> `_score` is the relevance score for a specific document, indicating how well it matches the query. Higher scores indicate better matches. In this example the `max_score` is the same as the `_score`, as there is only one matching document. +<4> The title contains both "Fluffy" and "Pancakes", matching our search terms exactly. +<5> The description includes "fluffiest" and "pancakes", further contributing to the document's relevance due to the analysis process. +============== + +[discrete] +==== Require all terms in a match query + +Specify the `and` operator to require both terms in the `description` field. +This stricter search returns _zero hits_ on our sample data, as no document contains both "fluffy" and "pancakes" in the description. + +[source,console] +---- +GET /cooking_blog/_search +{ + "query": { + "match": { + "description": { + "query": "fluffy pancakes", + "operator": "and" + } + } + } +} +---- +// TEST[continued] + +.Example response +[%collapsible] +============== +[source,console-result] +---- +{ + "took": 0, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 0, + "relation": "eq" + }, + "max_score": null, + "hits": [] + } +} +---- +// TESTRESPONSE[s/"took": 0/"took": "$body.took"/] +============== + +[discrete] +==== Specify a minimum number of terms to match + +Use the <> parameter to specify the minimum number of terms a document should have to be included in the search results. + +Search the title field to match at least 2 of the 3 terms: "fluffy", "pancakes", or "breakfast". +This is useful for improving relevance while allowing some flexibility. + +[source,console] +---- +GET /cooking_blog/_search +{ + "query": { + "match": { + "title": { + "query": "fluffy pancakes breakfast", + "minimum_should_match": 2 + } + } + } +} +---- +// TEST[continued] + +[discrete] +[[full-text-filter-tutorial-multi-match]] +=== Step 4: Search across multiple fields at once + +When users enter a search query, they often don't know (or care) whether their search terms appear in a specific field. +A <> query allows searching across multiple fields simultaneously. + +Let's start with a basic `multi_match` query: + +[source,console] +---- +GET /cooking_blog/_search +{ + "query": { + "multi_match": { + "query": "vegetarian curry", + "fields": ["title", "description", "tags"] + } + } +} +---- +// TEST[continued] + +This query searches for "vegetarian curry" across the title, description, and tags fields. Each field is treated with equal importance. + +However, in many cases, matches in certain fields (like the title) might be more relevant than others. We can adjust the importance of each field using field boosting: + +[source,console] +---- +GET /cooking_blog/_search +{ + "query": { + "multi_match": { + "query": "vegetarian curry", + "fields": ["title^3", "description^2", "tags"] <1> + } + } +} +---- +// TEST[continued] +<1> The `^` syntax applies a boost to specific fields: ++ +* `title^3`: The title field is 3 times more important than an unboosted field +* `description^2`: The description is 2 times more important +* `tags`: No boost applied (equivalent to `^1`) ++ +These boosts help tune relevance, prioritizing matches in the title over the description, and matches in the description over tags. + +Learn more about fields and per-field boosting in the <> reference. + +.Example response +[%collapsible] +============== +[source,console-result] +---- +{ + "took": 0, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 7.546015, + "hits": [ + { + "_index": "cooking_blog", + "_id": "2", + "_score": 7.546015, + "_source": { + "title": "Spicy Thai Green Curry: A Vegetarian Adventure", <1> + "description": "Dive into the flavors of Thailand with this vibrant green curry. Packed with vegetables and aromatic herbs, this dish is both healthy and satisfying. Don't worry about the heat - you can easily adjust the spice level to your liking.", <2> + "author": "Liam Chen", + "date": "2023-05-05", + "category": "Main Course", + "tags": [ + "thai", + "vegetarian", + "curry", + "spicy" + ], <3> + "rating": 4.6 + } + } + ] + } +} +---- +// TESTRESPONSE[s/"took": 0/"took": "$body.took"/] +// TESTRESPONSE[s/"_score": 7.546015/"_score": $body.hits.hits.0._score/] +// TESTRESPONSE[s/"max_score": 7.546015/"max_score": $body.hits.max_score/] +<1> The title contains "Vegetarian" and "Curry", which matches our search terms. The title field has the highest boost (^3), contributing significantly to this document's relevance score. +<2> The description contains "curry" and related terms like "vegetables", further increasing the document's relevance. +<3> The tags include both "vegetarian" and "curry", providing an exact match for our search terms, albeit with no boost. + +This result demonstrates how the `multi_match` query with field boosts helps users find relevant recipes across multiple fields. +Even though the exact phrase "vegetarian curry" doesn't appear in any single field, the combination of matches across fields produces a highly relevant result. +============== + +[TIP] +==== +The `multi_match` query is often recommended over a single `match` query for most text search use cases, as it provides more flexibility and better matches user expectations. +==== + +[discrete] +[[full-text-filter-tutorial-filtering]] +=== Step 5: Filter and find exact matches + +<> allows you to narrow down your search results based on exact criteria. +Unlike full-text searches, filters are binary (yes/no) and do not affect the relevance score. +Filters execute faster than queries because excluded results don't need to be scored. + +This <> query will return only blog posts in the "Breakfast" category. + +[source,console] +---- +GET /cooking_blog/_search +{ + "query": { + "bool": { + "filter": [ + { "term": { "category.keyword": "Breakfast" } } <1> + ] + } + } +} +---- +// TEST[continued] +<1> Note the use of `category.keyword` here. This refers to the <> multi-field of the `category` field, ensuring an exact, case-sensitive match. + +[TIP] +==== +The `.keyword` suffix accesses the unanalyzed version of a field, enabling exact, case-sensitive matching. This works in two scenarios: + +1. *When using dynamic mapping for text fields*. Elasticsearch automatically creates a `.keyword` sub-field. +2. *When text fields are explicitly mapped with a `.keyword` sub-field*. For example, we explicitly mapped the `category` field in <> of this tutorial. +==== + +[discrete] +[[full-text-filter-tutorial-range-query]] +==== Search for posts within a date range + +Often users want to find content published within a specific time frame. +A <> query finds documents that fall within numeric or date ranges. + +[source,console] +---- +GET /cooking_blog/_search +{ + "query": { + "range": { + "date": { + "gte": "2023-05-01", <1> + "lte": "2023-05-31" <2> + } + } + } +} +---- +// TEST[continued] +<1> Greater than or equal to May 1, 2023. +<2> Less than or equal to May 31, 2023. + +[discrete] +[[full-text-filter-tutorial-term-query]] +==== Find exact matches + +Sometimes users want to search for exact terms to eliminate ambiguity in their search results. +A <> query searches for an exact term in a field without analyzing it. +Exact, case-sensitive matches on specific terms are often referred to as "keyword" searches. + +Here you'll search for the author "Maria Rodriguez" in the `author.keyword` field. + +[source,console] +---- +GET /cooking_blog/_search +{ + "query": { + "term": { + "author.keyword": "Maria Rodriguez" <1> + } + } +} +---- +// TEST[continued] +<1> The `term` query has zero flexibility. For example, here the queries `maria` or `maria rodriguez` would have zero hits, due to case sensitivity. + +[TIP] +==== +Avoid using the `term` query for <> because they are transformed by the analysis process. +==== + +[discrete] +[[full-text-filter-tutorial-complex-bool]] +=== Step 6: Combine multiple search criteria + +A <> query allows you to combine multiple query clauses to create sophisticated searches. +In this tutorial scenario it's useful for when users have complex requirements for finding recipes. + +Let's create a query that addresses the following user needs: + +* Must be a vegetarian main course +* Should contain "curry" or "spicy" in the title or description +* Must not be a dessert +* Must have a rating of at least 4.5 +* Should prefer recipes published in the last month + +[source,console] +---- +GET /cooking_blog/_search +{ + "query": { + "bool": { + "must": [ + { + "term": { + "category.keyword": "Main Course" + } + }, + { + "term": { + "tags": "vegetarian" + } + }, + { + "range": { + "rating": { + "gte": 4.5 + } + } + } + ], + "should": [ + { + "multi_match": { + "query": "curry spicy", + "fields": ["title^2", "description"] + } + }, + { + "range": { + "date": { + "gte": "now-1M/d" + } + } + } + ], + "must_not": [ <1> + { + "term": { + "category.keyword": "Dessert" + } + } + ] + } + } +} +---- +// TEST[continued] +<1> The `must_not` clause excludes documents that match the specified criteria. This is a powerful tool for filtering out unwanted results. + +.Example response +[%collapsible] +============== +[source,console-result] +---- +{ + "took": 1, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 7.9835095, + "hits": [ + { + "_index": "cooking_blog", + "_id": "2", + "_score": 7.9835095, + "_source": { + "title": "Spicy Thai Green Curry: A Vegetarian Adventure", <1> + "description": "Dive into the flavors of Thailand with this vibrant green curry. Packed with vegetables and aromatic herbs, this dish is both healthy and satisfying. Don't worry about the heat - you can easily adjust the spice level to your liking.", <2> + "author": "Liam Chen", + "date": "2023-05-05", + "category": "Main Course", <3> + "tags": [ <4> + "thai", + "vegetarian", <5> + "curry", + "spicy" + ], + "rating": 4.6 <6> + } + } + ] + } +} +---- +// TESTRESPONSE[s/"took": 1/"took": "$body.took"/] +<1> The title contains "Spicy" and "Curry", matching our should condition. With the default <> behavior, this field contributes most to the relevance score. +<2> While the description also contains matching terms, only the best matching field's score is used by default. +<3> The recipe was published within the last month, satisfying our recency preference. +<4> The "Main Course" category matches our `must` condition. +<5> The "vegetarian" tag satisfies another `must` condition, while "curry" and "spicy" tags align with our `should` preferences. +<6> The rating of 4.6 meets our minimum rating requirement of 4.5. +============== + +[discrete] +[[full-text-filter-tutorial-learn-more]] +=== Learn more + +This tutorial introduced the basics of full-text search and filtering in {es}. +Building a real-world search experience requires understanding many more advanced concepts and techniques. +Here are some resources once you're ready to dive deeper: + +* <>: Understand all your options for searching and analyzing data in {es}. +* <>: Understand how text is processed for full-text search. +* <>: Learn about more advanced search techniques using the `_search` API, including semantic search. + + diff --git a/docs/reference/quickstart/index.asciidoc b/docs/reference/quickstart/index.asciidoc index 2d9114882254f..ed4c128392994 100644 --- a/docs/reference/quickstart/index.asciidoc +++ b/docs/reference/quickstart/index.asciidoc @@ -15,7 +15,8 @@ Get started <> , or see our <>. Learn about indices, documents, and mappings, and perform a basic search. +* <>. Learn about indices, documents, and mappings, and perform a basic search using the Query DSL. +* <>. Learn about different options for querying data, including full-text search and filtering, using the Query DSL. [discrete] [[quickstart-python-links]] @@ -27,3 +28,4 @@ If you're interested in using {es} with Python, check out Elastic Search Labs: * https://www.elastic.co/search-labs/tutorials/search-tutorial/welcome[Tutorial]: This walks you through building a complete search solution with {es} from the ground up using Flask. include::getting-started.asciidoc[] +include::full-text-filtering-tutorial.asciidoc[]