V0.8.1 unhandled S3 SlowDown error #5448

Closed
@tuziben

Description

Describe the bug
version: V0.8.1

According to the AWS documentation, an application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned Amazon S3 prefix:

https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html

Error log from the Quickwit (qw) indexer:

storage error(kind=Service, source=service error: unhandled error: unhandled error: Error { code: "SlowDown", message: "Please reduce your request rate.", s3_extended_request_id: "*******+QcTMf+==", aws_request_id: "***********" } 
(ServiceError(ServiceError { source: Unhandled(Unhandled { source: ErrorMetadata { code: Some("SlowDown"), message: Some("Please reduce your request rate."), extras: Some({"s3_extended_request_id": "*********************+QcTMf+==", "aws_request_id": "**************"}) }, meta: ErrorMetadata { code: Some("SlowDown"), message: Some("Please reduce your request rate."), extras: Some({"s3_extended_request_id": "**************+QcTMf+**************==", "aws_request_id": "**************"}) } }), 
raw: Response { inner: Response { status: 503, version: HTTP/1.1, headers: {"x-amz-request-id": "**************", "x-amz-id-2": "**************+QcTMf+**************==", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Tue, 24 Sep 2024 15:12:39 GMT", "server": "AmazonS3", "connection": "close"}, body: SdkBody { inner: Once(Some(b"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>SlowDown</Code><Message>Please reduce your request rate.</Message><RequestId>**************</RequestId><HostId>**************+QcTMf+**************==</HostId></Error>")), retryable: true } }, 
properties: SharedPropertyBag(Mutex { data: PropertyBag { contents: ["aws_types::SigningService", "alloc::vec::Vec<http::version::Version>", "aws_smithy_http::operation::Metadata", "aws_smithy_http::connection::CaptureSmithyConnection", "aws_credential_types::credentials_impl::Credentials", "aws_http::user_agent::AwsUserAgent", "aws_sig_auth::signer::OperationSigningConfig", "aws_types::region::Region", "aws_smithy_types::endpoint::Endpoint", "aws_sig_auth::middleware::Signature", "aws_credential_types::cache::SharedCredentialsCache", "aws_sdk_s3::endpoint::Params", "aws_types::region::SigningRegion"] }, poisoned: false, .. }) } })))

After this error occurred, the Quickwit cluster became very unstable: Kafka consumption kept rebalancing continuously, and merge operations could no longer be performed.

How can this be fixed? Following AWS's recommendation, the S3 prefixes would need to be subdivided to raise the achievable request rate.
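A minimal sketch of what subdividing prefixes could look like, in Rust. It assumes object keys can be chosen freely, which may not match how Quickwit actually lays out split files; `sharded_key`, `min_put_rate`, and the `indexes/my-index` path are hypothetical names for illustration, not Quickwit APIs. Each object name is hashed (FNV-1a, picked because its output is stable across runs, unlike std's `DefaultHasher`) into one of N sub-prefixes. Assuming S3 treats each sub-prefix as its own partition, the per-prefix figures quoted above would then allow at least N × 3,500 write requests per second in aggregate.

```rust
/// FNV-1a 64-bit hash: stable across runs and Rust versions.
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in data {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical helper: places `object_name` under one of `num_shards`
/// sub-prefixes of `base`, e.g. "indexes/my-index/0a/split-a.split".
fn sharded_key(base: &str, object_name: &str, num_shards: u64) -> String {
    assert!(num_shards > 0, "num_shards must be positive");
    let shard = fnv1a_64(object_name.as_bytes()) % num_shards;
    format!("{}/{:02x}/{}", base.trim_end_matches('/'), shard, object_name)
}

/// Lower bound on aggregate PUT/COPY/POST/DELETE throughput, using the
/// 3,500 requests/s per-prefix figure from the AWS documentation.
fn min_put_rate(num_prefixes: u64) -> u64 {
    num_prefixes * 3_500
}

fn main() {
    // Spread three example objects across 16 sub-prefixes.
    for name in ["split-a.split", "split-b.split", "split-c.split"] {
        println!("{}", sharded_key("indexes/my-index", name, 16));
    }
    // Aggregate lower bound for 16 prefixes.
    println!("{}", min_put_rate(16));
}
```

Running `main` prints three keys of the form `indexes/my-index/<two hex digits>/split-x.split`, followed by `56000`. Hashing the object name, rather than assigning shards round-robin, keeps the mapping deterministic, so a reader can recompute an object's key without any extra lookup state.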

Labels: bug (Something isn't working)