Skip to content

ref(span-buffer/flusher): More metrics and more lenient backpressure. #92195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 23, 2025

Conversation

untitaker
Copy link
Member

@untitaker untitaker commented May 23, 2025

VIEPF-30

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label May 23, 2025
@untitaker untitaker marked this pull request as ready for review May 23, 2025 10:59
@untitaker untitaker requested review from a team as code owners May 23, 2025 10:59
Copy link

codecov bot commented May 23, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 6 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/sentry/spans/consumers/process/flusher.py 73.91% 6 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #92195   +/-   ##
=======================================
  Coverage   87.93%   87.93%           
=======================================
  Files       10174    10174           
  Lines      583376   583385    +9     
  Branches    22596    22596           
=======================================
+ Hits       512975   512989   +14     
+ Misses      69949    69944    -5     
  Partials      452      452           

@@ -118,41 +119,50 @@ def produce(payload: KafkaPayload) -> None:
producer_futures.append(producer.produce(topic, payload))

while not stopped.value:
now = int(time.time()) + current_drift.value
flushed_segments = buffer.flush_segments(max_segments=max_flush_segments, now=now)
with metrics.timer("spans.buffer.flusher.loop_body"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This measurement includes the time.sleep below which seems unfortunate, since we're just idling - it doesn't tell us how fast the loop can actually process data. Otherwise, all the relevant measurements are already covered by flusher.produce below. I'd opt to keep only the produce timings below, but not the entire loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

continue

spans = [span.payload for span in flushed_segment.spans]
with metrics.timer("spans.buffer.flusher.produce"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the call to produce fully async below, or can it block when the buffer is full?
Either way, a significant portion of this is the dumps - Can we split this timing into serializing and producing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's mostly about the dumps, yeah. spawning the futures and everything that is necessary. open to suggestions to rename the metric.

wait_produce is then joining on those futures

metrics.timing("spans.buffer.segment_size_bytes", len(kafka_payload.value))
produce(kafka_payload)
kafka_payload = KafkaPayload(
None, rapidjson.dumps({"spans": spans}).encode("utf8"), []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're at it - most other code uses orjson. Can you double-check and update this, if it's valid?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@untitaker untitaker merged commit ad8c5f5 into master May 23, 2025
61 checks passed
@untitaker untitaker deleted the ref/flusher-backpressure-and-metrics branch May 23, 2025 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants