
Commit f67def9

chore: docs about max_request_size (#280)
Related: infinyon/fluvio#4195
1 parent 2309be5 commit f67def9


2 files changed

+85
-7
lines changed


docs/fluvio/apis/overview.mdx

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -61,13 +61,14 @@ cluster.
6161
Once you've got a connection handler, you will want to create a producer for a
6262
given topic.
6363

64-
The producer could be created with the following configurations: `batch_size`, `compression`, `linger` and `partitioner`.
64+
The producer could be created with the following configurations: `max_request_size`, `batch_size`, `compression`, `linger` and `partitioner`.
6565

6666
These configurations control the behavior of the producer in the following way:
6767

68-
* `batch_size`: Maximum amount of bytes accumulated by the records before sending the batch. Defaults to 16384 bytes.
68+
* `max_request_size`: Maximum number of bytes that the producer can send in a single request. If the record is larger than the max request size, the producer drops the record and returns an error. Defaults to 1048576 bytes.
69+
* `batch_size`: Maximum number of bytes accumulated by the records before sending the batch. If a record is larger than the batch size, the producer sends that record in a single new batch. Defaults to 16384 bytes.
6970
* `compression`: Compression algorithm used by the producer to compress each batch before sending to the SPU. Supported compression algorithms are `none`, `gzip`, `snappy` and `lz4`.
70-
* `linger`: Time to wait before sending messages to the server. Defaults to 100 ms.
71+
* `linger`: The maximum time to wait to accumulate records before sending the batch. Defaults to 100 ms.
7172
* `partitioner`: custom class/struct that assigns the partition to each record that needs to be sent. Defaults to Siphash Round Robin partitioner.
7273

7374
### Sending

docs/fluvio/concepts/batching.mdx

Lines changed: 81 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -6,10 +6,87 @@ title: "Batching"
66
Fluvio producers try to send records in batches to reduce the number of messages sent and improve throughput. Each producer has some configurations that can be set to improve performance for a specific use case. For instance, they can be used to reduce disk usage, reduce latency, or improve throughput.
77
As of today, batching behavior in Fluvio Producers can be modified with the following configurations:
88

9-
- `batch_size`: Indicates the maximum amount of bytes that can be accumulated in a batch.
10-
- `linger`: Time to wait before sending messages to the server. Defaults to 100 ms.
9+
- `max_request_size`: Indicates the maximum number of bytes that the producer can send in a single request. If the record is larger than the max request size, the producer will fail to send the record. Only the uncompressed size of the record is considered. Defaults to 1048576 bytes.
10+
- `batch_size`: Indicates the maximum number of bytes that can be accumulated in a batch. If the record is larger than the batch size, the producer will send the record in a single new batch. Only the uncompressed size of the record is considered. Defaults to 16384 bytes.
1111
- `compression`: Compression algorithm used by the producer to compress each batch before sending it to the SPU. Supported compression algorithms are none, gzip, snappy and lz4.
12+
- `linger`: Time to wait before sending batches to the server that have not reached maximum batch size. Defaults to 100 ms.
1213

13-
In general, each one of these configurations has a benefit and a potential drawback. For instance, with the compression algorithm, it is a trade-off between disk usage in the server and CPU usage in the producer and the consumer for compression and decompression. Typically, the compression ratio is improved when the payload is large, therefore a larger `batch_size` could be used to improve the compression ratio. A `linger` equals `0` means that each record is sent as soon as possible. A `linger` time larger than zero introduces latency but improves throughput.
1414

15-
The ideal parameters for the `batch_size`, `linger` and `compression` depend on your application needs.
15+
# Trade-offs and Considerations
16+
17+
Every configuration presents a mix of advantages and disadvantages:
18+
19+
- `max_request_size`: A larger value allows the producer to send larger records, which can improve throughput, but any record that exceeds the limit is dropped with an error.
20+
- `batch_size`: A larger value can reduce the number of requests sent to the server, but can increase latency.
21+
- `compression`: Helps decrease storage size and improve networking throughput but will increase CPU usage and add latency.
22+
- `linger`: A value of 0 sends records immediately, which minimizes latency but reduces throughput. Higher values introduce delay but improve throughput and network utilization.
23+
24+
The ideal parameters for the `max_request_size`, `batch_size`, `linger` and `compression` depend on your application needs.
25+
26+
# Example Scenarios
27+
28+
Create a topic and generate a large data file:
29+
30+
```bash
31+
fluvio topic create example-topic
32+
printf 'This is a sample line. ' | awk -v b=500000 '{while(length($0) < b) $0 = $0 $0}1' | cut -c1-500000 > large-data-file.txt
33+
```
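Before producing, it can help to confirm how large the generated file actually is, since the scenarios that follow rely on it exceeding the limits used. The following is a minimal sketch, assuming a POSIX shell with `awk`, `cut`, and `wc` available, and assuming that `--raw` sends the whole file as a single record. The variable name `file_bytes` is illustrative only.

```bash
# Regenerate the sample file with the same pipeline used in this section.
printf 'This is a sample line. ' | awk -v b=500000 '{while(length($0) < b) $0 = $0 $0}1' | cut -c1-500000 > large-data-file.txt

# Count the bytes in the file (tr strips padding that some wc versions add).
file_bytes=$(wc -c < large-data-file.txt | tr -d ' ')
echo "large-data-file.txt is ${file_bytes} bytes"
```

The reported size should be well above both 16384 and 16536 bytes, the limits used in the commands in this section.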
34+
35+
### Max Request Size
36+
37+
`max_request_size` defines the maximum size of a message that can be sent by the producer. If a message exceeds this size, Fluvio will throw an error.
38+
39+
```bash
40+
fluvio produce example-topic --max-request-size 16384 --file large-data-file.txt --raw
41+
```
42+
43+
The following error is displayed:
44+
45+
```bash
46+
Error: Record dropped: record size (xyz bytes), exceeded maximum request size (16384 bytes)
47+
```
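To avoid the error, a script could compare the file size with the limit before calling `fluvio produce`. The following is a rough sketch, not part of the Fluvio CLI: `fits_max_request_size` is a hypothetical helper, and it assumes that the record size Fluvio checks is close to the file size in bytes when the file is sent as a single record. The size Fluvio actually computes may include some per-record overhead.

```bash
# Hypothetical helper (not a fluvio command): succeeds when a file, sent as a
# single record, is no larger than the given request size limit in bytes.
fits_max_request_size() {
  file=$1
  max_bytes=$2
  size=$(wc -c < "$file" | tr -d ' ')
  [ "$size" -le "$max_bytes" ]
}

# Build a small demo file so the example is self-contained.
printf 'abc' > small-record.txt

if fits_max_request_size small-record.txt 16384; then
  echo "small-record.txt fits within 16384 bytes"
else
  echo "small-record.txt would be dropped"
fi
```

The same check applied to `large-data-file.txt` with a limit of 16384 would fail, which matches the error shown above.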
48+
49+
### Batch Size
50+
51+
`batch_size` defines the cumulative size of all records sent in the same batch. If a single record exceeds this size, Fluvio sends it in a new batch of its own, and `batch_size` does not apply as a limit to that batch.
52+
53+
```bash
54+
fluvio produce example-topic --batch-size 16536 --file large-data-file.txt --raw
55+
```
56+
57+
In this example, the record is larger than the batch size, so it is sent in a new batch of its own. Hence, there is no error.
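The rules stated for `max_request_size` and `batch_size` can be summarized as a small decision function. This is only a model of the behavior described in this document, not Fluvio code: `classify_record` and its labels are hypothetical, and the `next-batch` case (a record that does not fit in the space left in the current batch) is an assumption based on the definition of `batch_size`.

```bash
# Hypothetical model of the documented rules; it does not call Fluvio.
# Arguments: record size, bytes already in the current batch,
#            batch_size, max_request_size (all in bytes).
classify_record() {
  record=$1; in_batch=$2; batch_size=$3; max_request=$4
  if [ "$record" -gt "$max_request" ]; then
    echo "dropped"          # larger than max_request_size: error
  elif [ "$record" -gt "$batch_size" ]; then
    echo "own-batch"        # larger than batch_size: single new batch
  elif [ $((in_batch + record)) -gt "$batch_size" ]; then
    echo "next-batch"       # assumed: current batch has no room left
  else
    echo "current-batch"    # fits in the batch being accumulated
  fi
}

classify_record 500001 0 16536 1048576    # prints "own-batch"
classify_record 3 100 16384 1048576       # prints "current-batch"
classify_record 2000000 0 16384 1048576   # prints "dropped"
```

The first call mirrors the command above: a record of roughly 500 KB with a batch size of 16536 and the default `max_request_size` of 1048576 is accepted and travels in its own batch.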
58+
59+
### Compression
60+
61+
Size limits are checked before compression is applied. Use the uncompressed size of your records when choosing limits to ensure your records are processed.
62+
63+
`batch_size` and `max_request_size` will only use the uncompressed message size.
64+
65+
```bash
66+
fluvio produce example-topic --batch-size 16536 --compression gzip --file large-data-file.txt --raw
67+
fluvio produce example-topic --max-request-size 16384 --compression gzip --file large-data-file.txt --raw
68+
```
69+
70+
Only the second command will display an error because the uncompressed message exceeds the max request size.
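The gap between compressed and uncompressed size can be large for repetitive data, which is why the uncompressed size is the one to check. The sketch below uses the standalone `gzip` tool (assumed to be installed) purely for illustration. The exact byte counts produced by Fluvio's own gzip compression may differ.

```bash
# Regenerate the sample file so the example is self-contained.
printf 'This is a sample line. ' | awk -v b=500000 '{while(length($0) < b) $0 = $0 $0}1' | cut -c1-500000 > large-data-file.txt

# Compare the uncompressed size with the size after gzip compression.
raw_bytes=$(wc -c < large-data-file.txt | tr -d ' ')
gzip_bytes=$(gzip -c large-data-file.txt | wc -c | tr -d ' ')
echo "uncompressed: ${raw_bytes} bytes, gzip: ${gzip_bytes} bytes"
```

Even though the compressed payload is far smaller than 16384 bytes, the record is still rejected by `--max-request-size 16384` because the uncompressed size is what counts.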
71+
72+
73+
### Linger
74+
75+
`linger` defines the time that the producer will wait before sending a batch of records.
76+
77+
As linger is only relevant when the records are smaller than the batch size, in the following example, the records are sent without delay:
78+
79+
```bash
80+
fluvio produce example-topic --linger 10sec --file large-data-file.txt --raw
81+
```
82+
83+
In the following example, we are using small records and linger waits for the time-based trigger to produce:
84+
85+
```bash
86+
fluvio produce example-topic --linger 10sec
87+
> abc
88+
> abc
89+
> abc
90+
```
91+
92+
As all the records are small and the batch is not full, the producer will wait for the linger time to send the batch.
