ref(span-buffer/flusher): More metrics and more lenient backpressure. #92195

untitaker · 2025-05-23T10:56:31Z

VIEPF-30

codecov · 2025-05-23T11:26:09Z

Codecov Report

Attention: Patch coverage is 80.00000% with 6 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/sentry/spans/consumers/process/flusher.py	73.91%	6 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #92195   +/-   ##
=======================================
  Coverage   87.93%   87.93%           
=======================================
  Files       10174    10174           
  Lines      583376   583385    +9     
  Branches    22596    22596           
=======================================
+ Hits       512975   512989   +14     
+ Misses      69949    69944    -5     
  Partials      452      452

jan-auer · 2025-05-23T11:43:19Z

src/sentry/spans/consumers/process/flusher.py

@@ -118,41 +119,50 @@ def produce(payload: KafkaPayload) -> None:
                    producer_futures.append(producer.produce(topic, payload))

            while not stopped.value:
-                now = int(time.time()) + current_drift.value
-                flushed_segments = buffer.flush_segments(max_segments=max_flush_segments, now=now)
+                with metrics.timer("spans.buffer.flusher.loop_body"):


This measurement includes the time.sleep below, which seems unfortunate: we're just idling there, so it doesn't tell us how fast the loop can actually process data. All the other relevant measurements are already covered by flusher.produce below. I'd opt to keep only the produce timings, not the entire loop.
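To illustrate the concern, here is a minimal sketch of timing only the working part of a loop while leaving the idle sleep outside the timed block. The `timer` helper, the `recorded` list and `run_loop` are hypothetical stand-ins written for this example; they are not the `metrics.timer` API or the flusher code from this PR.

```python
import time
from contextlib import contextmanager

# Hypothetical stand-in for a metrics backend: collects (name, seconds) pairs.
recorded: list[tuple[str, float]] = []


@contextmanager
def timer(name: str):
    """Record the wall-clock duration of the wrapped block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        recorded.append((name, time.perf_counter() - start))


def run_loop(iterations: int, idle_seconds: float) -> None:
    for _ in range(iterations):
        with timer("work"):
            # Placeholder for the real work (flushing and producing).
            sum(range(1000))
        # The sleep sits outside the timed block, so idling is not measured.
        time.sleep(idle_seconds)


run_loop(iterations=3, idle_seconds=0.05)
```

With this layout, each recorded duration reflects only the work, so the metric stays meaningful even when the loop spends most of its time idle.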

jan-auer · 2025-05-23T11:44:21Z

src/sentry/spans/consumers/process/flusher.py

                        continue

-                    spans = [span.payload for span in flushed_segment.spans]
+                    with metrics.timer("spans.buffer.flusher.produce"):


Isn't the call to produce fully async below, or can it block when the buffer is full?
Either way, a significant portion of this is the dumps. Can we split this timing into serializing and producing?

It's mostly about the dumps, yeah, plus spawning the futures and everything else that is necessary. Open to suggestions for renaming the metric.

wait_produce then joins on those futures.
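A rough sketch of the split being discussed: serialization, spawning the futures, and joining on them are each timed under their own name. Everything here is an assumption made for illustration: a `ThreadPoolExecutor` stands in for the Kafka producer, `fake_send` and `timed` are hypothetical helpers, and the standard-library `json` module replaces the serializer used in the PR.

```python
import json
import time
from concurrent.futures import Future, ThreadPoolExecutor, wait

# Accumulated seconds per metric name (stand-in for a metrics backend).
timings: dict[str, float] = {}


def timed(name, fn, *args):
    """Call fn(*args) and add its duration to timings[name]."""
    start = time.perf_counter()
    result = fn(*args)
    timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)
    return result


def serialize(spans: list) -> bytes:
    return json.dumps({"spans": spans}).encode("utf8")


def fake_send(payload: bytes) -> int:
    # Hypothetical producer call: pretend to send and report the size.
    return len(payload)


segments = [[{"span_id": "a"}, {"span_id": "b"}], [{"span_id": "c"}]]

with ThreadPoolExecutor(max_workers=2) as pool:
    futures: list[Future] = []
    for spans in segments:
        payload = timed("serialize", serialize, spans)
        # Submitting only spawns the future; it does not wait for delivery.
        futures.append(timed("produce", pool.submit, fake_send, payload))
    # Joining on the futures is measured separately.
    timed("wait_produce", wait, futures)

sizes = [future.result() for future in futures]
```

Keeping the three timings apart makes it possible to tell whether time goes into encoding, into enqueueing, or into waiting for the producer.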

jan-auer · 2025-05-23T11:44:34Z

src/sentry/spans/consumers/process/flusher.py

-                    metrics.timing("spans.buffer.segment_size_bytes", len(kafka_payload.value))
-                    produce(kafka_payload)
+                            kafka_payload = KafkaPayload(
+                                None, rapidjson.dumps({"spans": spans}).encode("utf8"), []


While we're at it - most other code uses orjson. Can you double-check and update this, if it's valid?
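For reference, a small sketch of what the swap could look like. `orjson.dumps` returns `bytes` rather than `str`, so the `.encode("utf8")` step would no longer be needed. The `dumps_bytes` helper and the fallback to the standard-library `json` module are assumptions added so the snippet runs even where orjson is not installed; they are not part of the PR.

```python
import json

try:
    import orjson

    def dumps_bytes(obj) -> bytes:
        # orjson.dumps already returns UTF-8 encoded bytes.
        return orjson.dumps(obj)

except ImportError:

    def dumps_bytes(obj) -> bytes:
        # Fallback for this sketch only: compact JSON, encoded by hand.
        return json.dumps(obj, separators=(",", ":")).encode("utf8")


spans = [{"span_id": "a1"}, {"span_id": "b2"}]
value = dumps_bytes({"spans": spans})
```

The resulting `value` could then be passed as the payload value directly, assuming the consumer of the topic only requires valid JSON.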

ref(span-buffer/flusher): More metrics and more lenient backpressure.

9644300

github-actions bot added the Scope: Backend label May 23, 2025

vercel bot deployed to Preview May 23, 2025 10:57 View deployment

untitaker marked this pull request as ready for review May 23, 2025 10:59

untitaker requested review from a team as code owners May 23, 2025 10:59

jan-auer reviewed May 23, 2025

address review feedback

cb34e6e

vercel bot deployed to Preview May 23, 2025 11:55 View deployment

untitaker requested a review from jan-auer May 23, 2025 12:14

Merge branch 'master' into ref/flusher-backpressure-and-metrics

e210427

vercel bot deployed to Preview May 23, 2025 12:57 View deployment

jan-auer approved these changes May 23, 2025

untitaker merged commit ad8c5f5 into master May 23, 2025
61 checks passed

untitaker deleted the ref/flusher-backpressure-and-metrics branch May 23, 2025 13:50
