
feat(eap-api): support conditional aggregations in SELECT #6870

Merged
merged 15 commits into from
Feb 18, 2025

Conversation

@xurui-c (Member) commented Feb 10, 2025

https://github.com/getsentry/eap-planning/issues/166

We now support conditional aggregations for EndpointTraceItemTable. Time series support comes next.
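To illustrate the semantics being added: a conditional aggregation aggregates only the rows that satisfy a condition, analogous to ClickHouse's `-If` combinators (e.g. `avgIf(value, cond)`) that snuba queries compile down to. A minimal, hypothetical sketch of that behavior (not snuba code):

```python
def avg_if(rows, value, cond):
    """Average `value(row)` over only the rows where `cond(row)` holds,
    mirroring ClickHouse's avgIf(value, cond) combinator."""
    matched = [value(r) for r in rows if cond(r)]
    return sum(matched) / len(matched) if matched else None

rows = [
    {"category": "db", "duration": 10.0},
    {"category": "http", "duration": 30.0},
    {"category": "db", "duration": 20.0},
]
# avg(duration) restricted to category == "db" -> (10.0 + 20.0) / 2
result = avg_if(rows, lambda r: r["duration"], lambda r: r["category"] == "db")
```

When no filter is supplied, the condition defaults to true and the result matches a plain aggregation over all rows.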

@xurui-c xurui-c force-pushed the rachel/aggregateIf branch 2 times, most recently from 3224936 to 4217164 Compare February 10, 2025 22:19
codecov bot commented Feb 10, 2025

❌ 1 test failed:

Tests completed: 2882 | Failed: 1 | Passed: 2881 | Skipped: 11
View the top 1 failed test(s) by shortest run time
tests.web.rpc.v1.visitors.test_sparse_aggregate_attribute_transformer::test_basic
Stack Traces | 0.002s run time
Traceback (most recent call last):
  File ".../v1/visitors/test_sparse_aggregate_attribute_transformer.py", line 79, in test_basic
    assert transformed.filter == TraceItemFilter(
AssertionError: assert exists_filter {\n  key {\n    type: TYPE_STRING\n    name: "sentry.category"\n  }\n}\n == and_filter {\n  filters {\n    exists_filter {\n      key {\n        type: TYPE_STRING\n        name: "sentry.category"\n      }\n    }\n  }\n  filters {\n    or_filter {\n      filters {\n        exists_filter {\n          key {\n            type: TYPE_DOUBLE\n            name: "my.float.field"\n          }\n        }\n      }\n      filters {\n        exists_filter {\n          key {\n            type: TYPE_DOUBLE\n            name: "my.float.field"\n          }\n        }\n      }\n    }\n  }\n}\n
 +  where exists_filter {\n  key {\n    type: TYPE_STRING\n    name: "sentry.category"\n  }\n}\n = meta {\n  organization_id: 1\n  cogs_category: "something"\n  referrer: "something"\n  project_ids: 1\n  project_ids: 2\n  project_ids: 3\n  start_timestamp {\n    seconds: 1739530800\n  }\n  end_timestamp {\n    seconds: 1739530800\n  }\n  trace_item_type: TRACE_ITEM_TYPE_SPAN\n}\ncolumns {\n  key {\n    type: TYPE_STRING\n    name: "location"\n  }\n}\ncolumns {\n  aggregation {\n    aggregate: FUNCTION_MAX\n    key {\n      type: TYPE_DOUBLE\n      name: "my.float.field"\n    }\n    label: "max(my.float.field)"\n    extrapolation_mode: EXTRAPOLATION_MODE_NONE\n  }\n}\ncolumns {\n  aggregation {\n    aggregate: FUNCTION_AVG\n    key {\n      type: TYPE_DOUBLE\n      name: "my.float.field"\n    }\n    label: "avg(my.float.field)"\n    extrapolation_mode: EXTRAPOLATION_MODE_NONE\n  }\n}\nfilter {\n  exists_filter {\n    key {\n      type: TYPE_STRING\n      name: "sentry.category"\n    }\n  }\n}\norder_by {\n  column {\n    key {\n      type: TYPE_STRING\n      name: "location"\n    }\n  }\n}\ngroup_by {\n  type: TYPE_STRING\n  name: "location"\n}\nlimit: 5\n.filter
 +  and   and_filter {\n  filters {\n    exists_filter {\n      key {\n        type: TYPE_STRING\n        name: "sentry.category"\n      }\n    }\n  }\n  filters {\n    or_filter {\n      filters {\n        exists_filter {\n          key {\n            type: TYPE_DOUBLE\n            name: "my.float.field"\n          }\n        }\n      }\n      filters {\n        exists_filter {\n          key {\n            type: TYPE_DOUBLE\n            name: "my.float.field"\n          }\n        }\n      }\n    }\n  }\n}\n = TraceItemFilter(and_filter=filters {\n  exists_filter {\n    key {\n      type: TYPE_STRING\n      name: "sentry.category"\n    }\n  }\n}\nfilters {\n  or_filter {\n    filters {\n      exists_filter {\n        key {\n          type: TYPE_DOUBLE\n          name: "my.float.field"\n        }\n      }\n    }\n    filters {\n      exists_filter {\n        key {\n          type: TYPE_DOUBLE\n          name: "my.float.field"\n        }\n      }\n    }\n  }\n}\n)
 +    where filters {\n  exists_filter {\n    key {\n      type: TYPE_STRING\n      name: "sentry.category"\n    }\n  }\n}\nfilters {\n  or_filter {\n    filters {\n      exists_filter {\n        key {\n          type: TYPE_DOUBLE\n          name: "my.float.field"\n        }\n      }\n    }\n    filters {\n      exists_filter {\n        key {\n          type: TYPE_DOUBLE\n          name: "my.float.field"\n        }\n      }\n    }\n  }\n}\n = AndFilter(filters=[exists_filter {\n  key {\n    type: TYPE_STRING\n    name: "sentry.category"\n  }\n}\n, or_filter {\n  filters {\n    exists_filter {\n      key {\n        type: TYPE_DOUBLE\n        name: "my.float.field"\n      }\n    }\n  }\n  filters {\n    exists_filter {\n      key {\n        type: TYPE_DOUBLE\n        name: "my.float.field"\n      }\n    }\n  }\n}\n])


@xurui-c xurui-c marked this pull request as ready for review February 11, 2025 21:43
@xurui-c xurui-c requested review from a team as code owners February 11, 2025 21:43
@davidtsuk (Contributor) left a comment

I think you also need to update the confidence interval calculations to include the condition. Also, could you please add a test for extrapolated conditional aggregates?

snuba/web/db_query.py Outdated Show resolved Hide resolved
snuba/web/rpc/v1/endpoint_trace_item_table.py Outdated Show resolved Hide resolved
@kylemumma (Member) left a comment

I think you should clean up those print statements lying around, but other than that LGTM, assuming the tests pass. Also, thanks for including a PR description.

@volokluev (Member) left a comment

From the changes that I see in sentry-protos, you have recreated the aggregation struct with the condition field that you need (we spoke about this):

https://github.com/getsentry/sentry-protos/pull/109/files#diff-4b3d0e1ccced0b672ff3b41746de66f106ab7239b75c2bcc10048ec4c0017a9bR9-R13

But in this code, you are creating branches for both cases, which makes the code more fragmented.

What you should do instead is convert the old (soon-to-be-deprecated) object into the new AttributeConditionalAggregation object at the very beginning, and then do all of your operations from there.
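The up-front conversion the reviewer describes can be sketched as follows. The dataclasses here are simplified, hypothetical stand-ins for the sentry-protos messages, which carry many more fields; the point is only the shape of the transformation: copy every field across and leave the filter unset so downstream code treats it as an always-true condition.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the sentry-protos message types under discussion.
@dataclass
class AttributeAggregation:
    aggregate: str
    key: str
    label: str

@dataclass
class AttributeConditionalAggregation:
    aggregate: str
    key: str
    label: str
    # Left as None (unset) so the default condition is "always true".
    filter: Optional[object] = None

def to_conditional(agg: AttributeAggregation) -> AttributeConditionalAggregation:
    """Convert the soon-to-be-deprecated aggregation into its conditional
    counterpart, deliberately leaving `filter` unset."""
    return AttributeConditionalAggregation(
        aggregate=agg.aggregate, key=agg.key, label=agg.label
    )
```

After this conversion runs once at the start of request handling, the rest of the pipeline only ever sees AttributeConditionalAggregation and the per-case branching disappears.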

@xurui-c xurui-c force-pushed the rachel/aggregateIf branch 2 times, most recently from d9ff324 to 4c5dcc2 Compare February 12, 2025 22:23
@xurui-c (Member, Author) commented Feb 12, 2025

There's still some fragmented code in aggregation.py because it's still used by the TimeSeries endpoint. I'll get rid of it in a subsequent PR for time series.

@xurui-c xurui-c force-pushed the rachel/aggregateIf branch 2 times, most recently from 68376ab to cea3456 Compare February 13, 2025 17:48
Rachel Chen added 2 commits February 13, 2025 13:15
@xurui-c xurui-c force-pushed the rachel/aggregateIf branch 2 times, most recently from c8d8a92 to dca9e59 Compare February 14, 2025 01:56
Rachel Chen and others added 2 commits February 13, 2025 18:02
),
alias=alias,
)


-def _get_count_column_alias(aggregation: AttributeAggregation) -> str:
+def _get_count_column_alias(
+    aggregation: AttributeAggregation | AttributeConditionalAggregation,
A Member left a comment

if you've transformed everything up front, when would you still have an AttributeAggregation in the arguments here?

@xurui-c (Member, Author) replied Feb 14, 2025

The time series endpoint calls get_count_column, which calls _get_count_column_alias. We haven't applied the transformation in the time series endpoint yet (that will come in a later PR).

@@ -80,6 +87,48 @@ def _transform_request(request: TraceItemTableRequest) -> TraceItemTableRequest:
return SparseAggregateAttributeTransformer(request).transform()


def convert_to_conditional_aggregation(in_msg: TraceItemTableRequest) -> None:
A Member left a comment

How different would this function be for the TimeSeries endpoint?

A Member left a comment

  1. Leave a comment explaining why this is being done; it's not immediately clear unless you are the user.
  2. Test this function independently; it has clearly defined inputs and outputs.
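An independent unit test along the lines the reviewer suggests might look like this. The message classes and the converter below are simplified, hypothetical stand-ins for the real protobuf types and the snuba helper; the real test would construct a TraceItemTableRequest and assert on its columns after conversion.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Aggregation:
    label: str

@dataclass
class Column:
    aggregation: Optional[Aggregation] = None
    conditional_aggregation: Optional[Aggregation] = None

def convert_to_conditional_aggregation(columns: List[Column]) -> None:
    # Rewrite each plain aggregation as a conditional one with no filter,
    # which downstream code treats as an always-true condition.
    for col in columns:
        if col.aggregation is not None:
            col.conditional_aggregation = col.aggregation
            col.aggregation = None

def test_convert_to_conditional_aggregation() -> None:
    cols = [Column(aggregation=Aggregation("sum(duration)")), Column()]
    convert_to_conditional_aggregation(cols)
    # The plain aggregation is cleared and carried over unchanged.
    assert cols[0].aggregation is None
    assert cols[0].conditional_aggregation == Aggregation("sum(duration)")
    # Columns with no aggregation are left untouched.
    assert cols[1].conditional_aggregation is None
```

Because the function has clearly defined inputs and outputs, such a test needs no database or endpoint fixtures.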

@volokluev (Member) left a comment

Overall looks good! My main concern is just testing that request processor you wrote independently

@@ -2595,12 +2785,73 @@ def test_nonexistent_attribute(setup_teardown: Any) -> None:
]


def _build_sum_attribute_aggregation_column_with_name(name: str) -> Column:
A Member left a comment

By which I mean this should be in its own file.

@xurui-c (Member, Author) replied

test_conditional_aggregation.py

This function adds the equivalent conditional aggregation for every aggregation in each Column or AggregationFilter. We don't add the filter field so outside code logic will set the default condition to true. The purpose of this function is to "transform" every AttributeAggregation to AttributeConditionalAggregation in order to avoid code fragmentation
"""

def _add_conditional_aggregation(
A Member left a comment

See if you can replace it in all cases so you don't have the split behavior between Column and AggregationComparisonFilter.

@xurui-c (Member, Author) replied

Replace is done with input.ClearField("aggregation"), but I still need the separate isinstance(input, Column) and isinstance(input, AggregationFilter) checks because of mypy. Also, I feel like it makes _convert more readable(?)

@@ -80,6 +87,52 @@ def _transform_request(request: TraceItemTableRequest) -> TraceItemTableRequest:
return SparseAggregateAttributeTransformer(request).transform()


def convert_to_conditional_aggregation(in_msg: TraceItemTableRequest) -> None:
"""
This function adds the equivalent conditional aggregation for every aggregation in each Column or AggregationFilter. We don't add the filter field so outside code logic will set the default condition to true. The purpose of this function is to "transform" every AttributeAggregation to AttributeConditionalAggregation in order to avoid code fragmentation
A Member left a comment

Explain the historical reasons here; that's the important part. Think of yourself as someone new to this code.

@xurui-c xurui-c requested a review from volokluev February 18, 2025 19:55
@volokluev (Member) left a comment

much better

@xurui-c xurui-c enabled auto-merge (squash) February 18, 2025 23:16
@xurui-c xurui-c merged commit 7b60369 into master Feb 18, 2025
32 checks passed
@xurui-c xurui-c deleted the rachel/aggregateIf branch February 18, 2025 23:43