Skip to content

ESQL: Add documents_found and values_loaded #125631

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Mar 25, 2025

This adds documents_found and values_loaded to the to the ESQL response:

{
  "took" : 194,
  "is_partial" : false,
  "documents_found" : 100000,
  "values_loaded" : 200000,
  "columns" : [
    { "name" : "a", "type" : "long" },
    { "name" : "b", "type" : "long" }
  ],
  "values" : [[10, 1]]
}

These are cheap enough to collect that we can do it for every query and
return it with every response. It's small, but it still gives you a
reasonable sense of how much work Elasticsearch had to go through to
perform the query.

I've also added these two fields to the driver profile and task status:

    "drivers" : [
      {
        "description" : "data",
        "cluster_name" : "runTask",
        "node_name" : "runTask-0",
        "start_millis" : 1742923173077,
        "stop_millis" : 1742923173087,
        "took_nanos" : 9557014,
        "cpu_nanos" : 9091340,
        "documents_found" : 5,   <---- THESE
        "values_loaded" : 15,    <---- THESE
        "iterations" : 6,
...

These are at a high level and should be easy to reason about. We'd like to
extract this into a "show me how difficult this running query is" API one
day. But today, just plumbing it into the debugging output is good.

Any Operator can claim to "find documents" or "load values" by overriding
a method on its Operator.Status implementation:

/**
 * The number of documents found by this operator. Most operators
 * don't find documents and will return {@code 0} here.
 */
default long documentsFound() {
    return 0;
}

/**
 * The number of values loaded by this operator. Most operators
 * don't load values and will return {@code 0} here.
 */
default long valuesLoaded() {
    return 0;
}

In this PR all of the LuceneOperators declare that each position they
emit is a "document found" and the ValuesSourceValuesSourceReaderOperator
says each value it makes is a "value loaded". That's pretty pretty much
true. The LuceneCountOperator and LuceneMinMaxOperator sort of pretend
that the count/min/max that they emit is a "document" - but that's good
enough to give you a sense of what's going on. It's like document.

nik9000 added 4 commits March 25, 2025 13:19
Adds the `found_documents` and `values_loaded` fields to the tasks and
profile for ESQL:
```
    "drivers" : [
      {
        "description" : "data",
        "cluster_name" : "runTask",
        "node_name" : "runTask-0",
        "start_millis" : 1742923173077,
        "stop_millis" : 1742923173087,
        "took_nanos" : 9557014,
        "cpu_nanos" : 9091340,
        "documents_found" : 5,   <---- THESE
        "values_loaded" : 15,    <---- THESE
        "iterations" : 6,
...
```
```
{
  "took" : 194,
  "is_partial" : false,
  "documents_found" : 100000,
  "values_loaded" : 200000,
  "columns" : [
    { "name" : "a", "type" : "long" },
    { "name" : "b", "type" : "long" }
  ],
  "values" : [[10, 1]]
}
```
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 25, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

@nik9000
Copy link
Member Author

nik9000 commented Mar 25, 2025

I've stuck the ES|QL-ui label on this to ping the folks working on the editor. I think this is somewhat useful information to hand folks but I have no idea how you'd do it. It's certainly not a big shouty thing. It's more the style of the line number read out in your text editor.

@nik9000 nik9000 requested a review from giladgal March 25, 2025 23:07
@nik9000
Copy link
Member Author

nik9000 commented Mar 25, 2025

I believe I can get a bytes_loaded without a huge amount of trouble, but I'd like to wait for a follow up because it's kind of tricky. The easiest thing I can get is the size of the outgoing blocks from the column reader. That's absolutely not the number of bytes we read from disk because we convert from doc values into our block format. And if we're loading from _source we have to load the whole thing and decompress it and stuff. But it's a useful figure.

@nik9000
Copy link
Member Author

nik9000 commented Mar 26, 2025

Sorry for the ping @giladgal. Wrong Gal.

@nik9000
Copy link
Member Author

nik9000 commented Mar 26, 2025

I'll need to update the docs in this PR!

@nik9000
Copy link
Member Author

nik9000 commented Mar 26, 2025

@GalLalouche, this doesn't expose the documents_found and values_loaded running sums in the node level tasks - but it does make the sum. I think we can shift it in a follow up change.

Copy link
Contributor

@GalLalouche GalLalouche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with a few small nitpicks and suggestions.

}

Status(StreamInput in) throws IOException {
super(in);
readersBuilt = in.readOrderedMap(StreamInput::readString, StreamInput::readVInt);
if (in.getTransportVersion().onOrAfter(ESQL_VALUES_LOADED)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using ?: here (I Know you don't like it, but it seems so obvious here :) )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better here, yeah. Sometimes in the deserialization the ternary makes it unreadable. But here it's good.

* The number of documents found by this driver.
*/
public long documentsFound() {
long documentsFound = completedOperators.stream().mapToLong(OperatorStatus::documentsFound).sum();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider just inlining this. The streams can also be combined:

return Stream.concat(completedOperators.stream(), activeOperators.stream()).mapToLong(OperatorStatus::documentsFound).sum()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that's better. I had thought about it and didn't like how it looked in my mind but when you type it it looks fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive by comment - how often will this method be called? @idegtiarenko found several instances of streams showing up in the profiler vs regular for/int.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every minute or so.

@@ -83,7 +83,11 @@ public int getPositionCount() {

@Override
public int getTotalValueCount() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be changed to long, or are we not worried about overflows here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overflow would be a bug in planning. That'd be us using billions of values in a single block....

/**
* Information returned when one of more {@link Driver}s is completed.
* @param documentsFound The number of documents found by all lucene queries performed by these drivers.
* @param valuesLoaded The number of values loaded from lucene for all drivers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you go into more details as to how valuesLoaded differs from documentsFound? Maybe it's obvious and I just don't have the context, but it's unclear to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valuesLoaded are the field values - so the value for @timestamp or foo or whatever. And it's the number of them. So [1, 2] is two values and null is zero values.

* Are the {@code documents_found} and {@code values_loaded} fields available
* in the response and profile?
*/
DOCUMENTS_FOUND,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be named something more inclusive to values_loaded?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be. I can make the transport version the same and have it mention both.

default long valuesLoaded() {
return 0;
}
}
Copy link
Contributor

@idegtiarenko idegtiarenko Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this change, but I think it is worth revisiting operator status.
Currently this is interface with implementation per each operator. It is used to expose some diagnostics attributes outside.

I believe we should replace it with a simple map (or maybe a wrapper on top of the map).
Common schema could be achieved with some sort of status builder (something like status().withPagesProcessed(..).withRowReceived(..).withRowsEmitted(..).build())
This way we would not need to have a custom implementation per operator, could have common equals/hashCode and toString logic. With map structure we will not need a new transport version when adding/renaming/removing infos.

Copy link
Contributor

@idegtiarenko idegtiarenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall 👍 from me.
Left suggestion for status implementation that we should discuss separately.

@nik9000
Copy link
Member Author

nik9000 commented Apr 15, 2025

The 9.0 bwc tests failed because of the 9.0 release. main is getting update to know about the version number.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement ES|QL-ui Impacts ES|QL UI Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants