These classes will typically be helpful in batch or queue consumers, not as much in request/response services.

# Example Usage Scenarios

- AWS Lambda function for Kafka or Kinesis stream processing
- Each invocation has a batch of, say, 100 records
- Each record points to a file on S3 that needs to be parsed, processed, then written back to S3
- The average file size is 50 MB, so these will take a few seconds to fetch and write
- The Lambda function has 1,769 MB of RAM, which gives it 1 vCPU and lets the JS thread run at roughly 100% of the speed of 1 core
- If the code never waits to fetch or write files, the Lambda function can use 100% of the paid-for CPU time
- If there is no back pressure, reading 100 * 50 MB files would fill up the default Lambda temp disk space of 512 MB, or would consume 5 GB of memory, either of which would cause the Lambda function to fail
- We can use `IterableMapper` to fetch up to 5 of the next files while the current file is being processed, then pause until the current file is processed and the next file is consumed (see the sketch after this list)
- We can also use `IterableQueueMapper.enqueue()` to write the files back to S3 without waiting for them to finish, unless more than, say, 3-4 files are already uploading, at which point we pause until one upload completes before queuing another file
- In the rare case that 3-4 files are uploading but not yet finished, we block on `.enqueue()` and consume very little CPU while waiting for at least 1 upload to finish
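
A rough sketch of that handler is shown below. It uses `IterableMapper` for the prefetching and `IterableQueueMapperSimple` (the simpler of the two write-queue classes, shown in detail later) for the background uploads; the record shape, the S3 helper functions, and the tuning values are hypothetical placeholders, not part of the library.

```typescript
import { IterableMapper, IterableQueueMapperSimple } from '@shutterstock/p-map-iterable';

// Hypothetical record shape and S3 helpers, for illustration only
type StreamRecord = { s3Key: string };
declare function fetchFileFromS3(record: StreamRecord): Promise<Buffer>;
declare function parseAndProcess(file: Buffer): Buffer;
declare function writeFileToS3(output: Buffer): Promise<void>;

export async function handler(records: StreamRecord[]): Promise<void> {
  // Prefetch up to 5 of the next files while the current file is being processed
  const prefetcher = new IterableMapper(records, (record) => fetchFileFromS3(record), {
    concurrency: 1,
    maxUnread: 5,
  });
  // Upload finished files in the background, a few at a time
  const uploader = new IterableQueueMapperSimple((output: Buffer) => writeFileToS3(output), {
    concurrency: 3,
  });

  for await (const file of prefetcher) {
    const output = parseAndProcess(file); // CPU-bound work overlaps with fetches and uploads
    await uploader.enqueue(output);       // usually returns immediately; waits if uploads back up
  }

  // Wait for the remaining uploads, then surface any failures
  await uploader.onIdle();
  if (uploader.errors.length > 0) {
    throw new Error(`${uploader.errors.length} uploads failed`);
  }
}
```

With `maxUnread: 5` on the prefetcher and a small upload concurrency, only a handful of files are held in memory at any time, which avoids the 5 GB memory / 512 MB temp-disk blow-up described above.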

More generally, consider a typical processing loop without `IterableMapper`:

```typescript
const source = new SomeSource();
const sourceIds = [1, 2,... 1000];
const sink = new SomeSink();
for (const sourceId of sourceIds) {
  const item = await source.read(sourceId);  // takes 300 ms of I/O wait, no CPU
  const outputItem = doSomeOperation(item);  // takes 20 ms of CPU
  await sink.write(outputItem);              // takes 500 ms of I/O wait, no CPU
}
```

Each iteration takes 820 ms in total, but most of that is spent waiting on I/O. We could prefetch the next read (300 ms) while processing (20 ms) and writing (500 ms) the current item, without changing the order of reads or writes.

Using `IterableMapper` as a prefetcher:

```typescript
import { IterableMapper } from '@shutterstock/p-map-iterable';

const source = new SomeSource();
const sourceIds = [1, 2,... 1000];
// Pre-reads up to 8 items serially and releases in sequential order
const sourcePrefetcher = new IterableMapper(sourceIds, (sourceId) => source.read(sourceId), {
  concurrency: 1,
  maxUnread: 8,
});
const sink = new SomeSink();
for await (const item of sourcePrefetcher) {
  const outputItem = doSomeOperation(item);  // takes 20 ms of CPU
  await sink.write(outputItem);              // takes 500 ms of I/O wait, no CPU
}
```

This reduces each iteration to 520 ms (20 ms of CPU plus 500 ms of write wait): the next item's 300 ms read now overlaps with processing and writing the current item.

For maximum throughput, also make the writes concurrent with `IterableQueueMapper` (to iterate the mapped results yourself, with backpressure when there are too many unread items) or `IterableQueueMapperSimple` (to skip custom iteration and backpressure and simply check for errors at the end):

```typescript
import { IterableMapper, IterableQueueMapperSimple } from '@shutterstock/p-map-iterable';

const source = new SomeSource();
const sourceIds = [1, 2,... 1000];
// Pre-reads up to 8 items serially and releases in sequential order
const sourcePrefetcher = new IterableMapper(sourceIds, (sourceId) => source.read(sourceId), {
  concurrency: 1,
  maxUnread: 8,
});
const sink = new SomeSink();
// Writes items in the background (the concurrency value here is illustrative)
const flusher = new IterableQueueMapperSimple((outputItem) => sink.write(outputItem), {
  concurrency: 10,
});
for await (const item of sourcePrefetcher) {
  const outputItem = doSomeOperation(item);  // takes 20 ms of CPU
  await flusher.enqueue(outputItem);         // usually takes no time
}
// Wait for all writes to complete
await flusher.onIdle();
// Check for errors
if (flusher.errors.length > 0) {
  // ...
}
```

This reduces each iteration to about 20 ms by overlapping both the reads and the writes with the CPU-bound processing step. In this contrived (but common) example that is a 41x improvement in throughput (820 ms / 20 ms), removing roughly 97.5% of the time to process each item and fully utilizing the CPU time available in the JS event loop.
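
The snippets above treat `SomeSource`, `SomeSink`, and `doSomeOperation` as given. To run them end to end, a minimal set of stand-ins with simulated delays matching the timings above might look like this (illustrative only, not part of the library):

```typescript
// Stand-in source/sink/operation with simulated costs (300 ms read, 20 ms CPU, 500 ms write)
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

class SomeSource {
  async read(sourceId: number): Promise<string> {
    await delay(300); // simulate 300 ms of I/O wait, no CPU
    return `item-${sourceId}`;
  }
}

class SomeSink {
  async write(_outputItem: string): Promise<void> {
    await delay(500); // simulate 500 ms of I/O wait, no CPU
  }
}

function doSomeOperation(item: string): string {
  const end = Date.now() + 20; // simulate ~20 ms of CPU-bound work
  while (Date.now() < end) {
    // busy-wait to hold the event loop, like real CPU work would
  }
  return item.toUpperCase();
}
```

Wrapping each loop with `console.time()` / `console.timeEnd()` will then show the per-item difference between the three variants.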