-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
feat(profiles): rollout data compression #92133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,32 +18,39 @@ def process_message(message: Message[KafkaPayload]) -> None: | |
sampled = is_sampled(message.payload.headers) | ||
|
||
if sampled or options.get("profiling.profile_metrics.unsampled_profiles.enabled"): | ||
b64encoded = b64encode(message.payload.value).decode("utf-8") | ||
process_profile_task.delay(payload=b64encoded, sampled=sampled, compressed_profile=False) | ||
b64encoded_uncompressed = b64encode(message.payload.value).decode("utf-8") | ||
|
||
if random.random() < options.get("taskworker.try_compress.profile_metrics"): | ||
if random.random() < options.get("taskworker.try_compress.profile_metrics.rollout"): | ||
import time | ||
import zlib | ||
|
||
metrics.distribution("profiling.profile_metrics.uncompressed_bytes", len(b64encoded)) | ||
metrics.distribution( | ||
"profiling.profile_metrics.uncompressed_bytes", len(b64encoded_uncompressed) | ||
) | ||
|
||
start_time = time.perf_counter() | ||
b64encoded_compressed = b64encode( | ||
zlib.compress( | ||
message.payload.value, | ||
Comment on lines
+32
to
+34
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sorry for the drive-by review after the fact: Is there a specific reason why you went with zlib over zstd, which is an overall better compression algorithm which should be preferred? It is just really weird that this end up being base64 inflates the payload size by 33% by definition, and its a bit wasteful to do it twice even. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
zlib was in stdlib, and I didn't know we already had the necessary dependencies for zstandard.
The current implementation doesn't have a double base64 encode. We do base64 twice so that we can measure the effects of compression, but the task payload should only be base64 encoded once after compression.
Yes, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. thanks for clarifying. I got confused by the double-base64, re-reading this again, its clear there is no double-encoding going on :-D |
||
level=options.get("taskworker.try_compress.profile_metrics.level"), | ||
) | ||
) | ||
metrics.distribution( | ||
"profiling.profile_metrics.compressed_bytes", | ||
len( | ||
b64encode( | ||
zlib.compress( | ||
message.payload.value, | ||
level=options.get("taskworker.try_compress.profile_metrics.level"), | ||
) | ||
) | ||
), | ||
len(b64encoded_compressed), | ||
) | ||
end_time = time.perf_counter() | ||
metrics.distribution( | ||
"profiling.profile_metrics.compression_time", | ||
end_time - start_time, | ||
) | ||
process_profile_task.delay( | ||
payload=b64encoded_compressed, sampled=sampled, compressed_profile=True | ||
) | ||
else: | ||
process_profile_task.delay( | ||
payload=b64encoded_uncompressed, sampled=sampled, compressed_profile=False | ||
) | ||
|
||
|
||
class ProcessProfileStrategyFactory(ProcessingStrategyFactory[KafkaPayload]): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to change this, but in the future you can use
sentry.options.rollout.in_random_rollout()
for these kinds of checks.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's good to know, thank you!