Description
Expected Behavior
An application should be able to send significantly more than 10MB/s to SQS per vCPU. Ideally, an application should be able to send well over 100MB/s per vCPU.
Current Behavior
Sending about 130MB/s of messages to SQS consumes about 15 vCPUs, which is under 9MB/s per vCPU. Much of that CPU time is spent in the GC: we measure over 5GB of allocations for each 100MB of messages sent, roughly a 50x allocation overhead.
The cost is proportional to the total (aggregate) payload size, not to the number of messages. That is, if we send much less data spread over many more messages, the CPU load is significantly lower.
Possible Solution
Simplify and streamline the send path of the .NET SQS client so that it is optimised for throughput. In particular, minimise the allocations made per byte of payload, which would reduce GC pressure.
Steps to Reproduce (for bugs)
Create a simple application that uses the SQS client to concurrently send a large number of large messages, so that over 100MB/s of messages are being sent. On .NET Core 3.1 on Linux this uses about 15 vCPUs. Performance is even worse on Windows.
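A minimal sketch of such an application is below. It is not the exact code we ran. The class name, payload size, concurrency and message counts are illustrative assumptions, and it expects the URL of an existing queue as the first argument, with credentials and region resolved by the SDK defaults. It reports the three figures discussed above: throughput, average vCPUs used, and bytes allocated per payload byte.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Amazon.SQS;
using Amazon.SQS.Model;

public static class SqsThroughputRepro
{
    // Illustrative values, not taken from the report; tune until throughput exceeds 100MB/s.
    private const int PayloadBytes = 200 * 1024;  // stays under the 256KB SQS message limit
    private const int Concurrency = 64;
    private const int MessagesPerWorker = 100;

    // Pure helper: turns raw counters into the three figures quoted in this issue.
    public static (double MbPerSec, double AvgVcpus, double AllocRatio) Summarize(
        long payloadBytes, long allocatedBytes, TimeSpan cpu, TimeSpan wall)
    {
        double mbPerSec = payloadBytes / 1e6 / wall.TotalSeconds;
        double avgVcpus = cpu.TotalSeconds / wall.TotalSeconds;
        double allocRatio = (double)allocatedBytes / payloadBytes;
        return (mbPerSec, avgVcpus, allocRatio);
    }

    public static async Task Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: SqsThroughputRepro <queue-url>");
            return;
        }

        string queueUrl = args[0];
        // ASCII body, so one char is one byte on the wire; allocated before measuring.
        string body = new string('x', PayloadBytes);

        using var client = new AmazonSQSClient(); // default credentials and region

        var process = Process.GetCurrentProcess();
        long allocatedBefore = GC.GetTotalAllocatedBytes(true);
        TimeSpan cpuBefore = process.TotalProcessorTime;
        var wall = Stopwatch.StartNew();

        // Each worker sends its messages sequentially; workers run concurrently.
        var workers = Enumerable.Range(0, Concurrency).Select(async _ =>
        {
            for (int i = 0; i < MessagesPerWorker; i++)
            {
                await client.SendMessageAsync(new SendMessageRequest
                {
                    QueueUrl = queueUrl,
                    MessageBody = body
                });
            }
        });
        await Task.WhenAll(workers);

        wall.Stop();
        process.Refresh(); // make sure the CPU time read below is current
        TimeSpan cpu = process.TotalProcessorTime - cpuBefore;
        long allocated = GC.GetTotalAllocatedBytes(true) - allocatedBefore;
        long sent = (long)PayloadBytes * Concurrency * MessagesPerWorker;

        var (mbPerSec, avgVcpus, allocRatio) = Summarize(sent, allocated, cpu, wall.Elapsed);
        Console.WriteLine($"throughput: {mbPerSec:F1} MB/s");
        Console.WriteLine($"average vCPUs used: {avgVcpus:F1}");
        Console.WriteLine($"allocated bytes per payload byte: {allocRatio:F1}");
    }
}
```

Plugging in the numbers from this report (100MB sent, 5GB allocated, 15 CPU-seconds per wall-clock second), `Summarize` gives 100 MB/s, 15 vCPUs and an allocation ratio of 50. Shrinking `PayloadBytes` while raising `MessagesPerWorker` is one way to check that the cost follows aggregate payload size and not message count.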
Context
Our application produces a massive amount of data that needs to be sent through SQS. At roughly 15 vCPUs per 100MB/s of throughput, a large share of our compute costs comes from the .NET SQS client.
Your Environment
- AWSSDK.SQS version used: 3.3.102.104
- Operating System and version: Ubuntu 18.04
- Visual Studio version: Visual Studio 2019 (16.5)
- Targeted .NET platform: .NET Core 3.1
.NET Core Info
- .NET Core version used for development: .NET Core 3.1
- .NET Core version installed in the environment where application runs: .NET Core 3.1