Commit 68e89f4

Sdf beta10 changes (#403)
Also added FAQ

Co-authored-by: Luis Moreno <morenol@users.noreply.github.com>
1 parent 601b40a commit 68e89f4

File tree

17 files changed: +91 -51 lines


docs/fluvio/installation/docker.mdx

+1 -1

@@ -118,7 +118,7 @@ With the profile set, you are now able to perform Fluvio Client operations
 like listing topics:
 
 ```bash
-fluvio topics list
+fluvio topic list
 ```
 
 ## Teardown

sdf/SDF_VERSION

+1 -1

@@ -1 +1 @@
-sdf-beta9
+sdf-beta10

sdf/_embeds/install-sdf.bash

+1 -1

@@ -1 +1 @@
-fvm install sdf-beta9
+fvm install sdf-beta10

sdf/concepts/window-processing.mdx

+27 -27

@@ -68,29 +68,8 @@ window:
 
 Now, let's explore these components in more detail.
 
-### `assign-timestamp` Operator
-The first crucial step in event-time window processing is to determine the event time of each incoming record. This is handled by the assign-timestamp operator. The event timestamp is vital because it dictates which window(s) a particular record belongs to.
-
-You define a custom function (e.g., in Rust) that takes as input a type (that should match with previous configurations of the service) and its system event time (the time it was received by SDF) and returns the logical event timestamp (as an i64 Unix timestamp in milliseconds) to be used for windowing.
-
-#### Example:
-
-```YAML
-# Inside the 'window' block
-assign-timestamp:
-  run: |
-    fn assign_timestamp_fn(value: InputMessage, _event_time: i64) -> Result<i64> {
-      // Logic to extract or assign a timestamp from the 'value'
-      // For example, if 'value' has a 'timestamp' field:
-      Ok(value.timestamp)
-    }
-```
-
-In this example, InputMessage is a defined type that includes a timestamp field. The function extracts this field to be used as the event time. If the actual event time from the source system is not present in the message, you might use _event_time (SDF's ingestion time) or implement logic to derive it.
-
 ### Tumbling Window
-
-Tumbling windows are defined by a single parameter: `duration`. This specifies the fixed length of each window. Windows are contiguous and non-overlapping.
+Tumbling windows are defined by a single parameter: duration. This specifies the fixed length of each window. Windows are contiguous and non-overlapping.
 
 #### Example:
 ```YAML

@@ -120,14 +99,34 @@ Sliding windows are defined by two parameters:
 
 Here, window.sliding.duration: 60s and window.sliding.slide: 10s mean that every 10 seconds, a new window is initiated, covering the last 60 seconds of data.
 
+### `assign-timestamp` Operator
+The first crucial step in event-time window processing is to determine the event time of each incoming record. This is handled by the assign-timestamp operator. The event timestamp is vital because it dictates which window(s) a particular record belongs to.
+
+You must define a custom function (e.g., in Rust) that takes as input a type (that should match with previous configurations of the service) and its system event time (the time it was received by SDF) and returns the logical event timestamp (as an i64 Unix timestamp in milliseconds) to be used for windowing.
+
+#### Example:
+
+```YAML
+# Inside the 'window' block
+assign-timestamp:
+  run: |
+    fn assign_timestamp_fn(value: InputMessage, _event_time: i64) -> Result<i64> {
+      // Logic to extract or assign a timestamp from the 'value'
+      // For example, if 'value' has a 'timestamp' field:
+      Ok(value.timestamp)
+    }
+```
+
+In this example, InputMessage is a defined type that includes a timestamp field. The function extracts this field to be used as the event time. If the actual event time from the source system is not present in the message, you might use _event_time (SDF's ingestion time) or implement logic to derive it.
+
 ### Watermark
 
 Watermarks are a fundamental concept in stream processing, representing the point in event time up to which SDF assumes all data has been received for a given partition. They are crucial for triggering window computations (i.e., deciding when a window is "closed") and handling late-arriving data. SDF advances watermarks based on the timestamps assigned by assign-timestamp.
 
 The watermark configuration is nested within the window block.
 
 - `grace-period`: The grace-period (also known as allowed lateness) defines an additional amount of time after a window's end time (as determined by the watermark) during which late-arriving records are still accepted and processed for that window. Records whose timestamps fall within the window's original range but arrive after the watermark has passed the window's end (but before the grace period expires) can still be included.
-- `idleness`: The idleness configuration helps in scenarios where there are not new events for an extended period of time. If the service is idle for the specified duration (i.e., no new messages arrive), SDF can advance the watermark and therefore closing a window, preventing the overall system watermark from being stalled.
+- `idleness`: The idleness configuration helps in scenarios where there are not new events for an extended period of time. If the service is idle for the specified duration (i.e., no new messages arrive), SDF can advance the watermark and therefore closing a window, preventing the overall system watermark from being stalled. It's generally recommended that the idleness timeout is larger than the duration of the window to avoid premature window closure.
 
 #### Example:
 

@@ -137,24 +136,25 @@ The watermark configuration is nested within the window block.
   tumbling:
     duration: 10s
   watermark:
-    idleness: 12s # Consider a partition idle after 10s of no data
+    idleness: 12s # Consider a partition idle after 12s of no data
     grace-period: 3s # Allow data to be 3s late for this window
 ```
 
 In this setup, for a 10-second tumbling window (e.g., for event times 00:00:00 to 00:00:10), even if the watermark has advanced past 00:00:10, records with timestamps within this [0, 10s) interval can still be included if they arrive before the watermark passes 00:00:13 (window_end + grace_period).
 Also, If SDF hasn't received any data for 12 seconds, then the current watermark is advanced and that triggers closing of current window and creation of a new one.
 
-### Core Window Operations
+### Inner Window operators
 Within the window block, several operators define how data is processed once its timestamp is assigned and it's notionally part of one or more window instances:
 
 #### transforms (within window context):
-As shown in the overview, transforms can be defined as a direct child of the window block. These transformations (e.g., map, flat-map, filter) are applied to each record after its timestamp has been assigned by assign-timestamp but before it is passed to the partition operator. This allows for data shaping, enrichment, or filtering specific to the windowing logic before key-based operations commence.
+As shown in the overview, transforms can be defined as a direct child of the window block. These transformations (e.g., map, flat-map, filter, filter-map) are applied to each record after its timestamp has been assigned by assign-timestamp but before it is passed to the partition operator. This allows for data shaping, enrichment, or filtering specific to the windowing logic before key-based operations commence.
 #### partition
 Windowed operations are typically stateful and performed on a per-key basis. The partition operator defines how incoming records (which have already been timestamped and possibly transformed) are grouped by a key, and how state associated with that key is updated for each record.
 - assign-key: This sub-operator contains a function that takes a record and returns a key. All records sharing the same key will be processed together within their respective window instances, allowing for stateful computations per key.
 - transforms: Group of operations to run before the update-state step; can be any of the SDF transforms operators (filter, map, filter-map, flat-map)
 - update-state: This sub-operator contains a function that is executed for each record within its assigned partition (i.e., for its key). It typically interacts with a keyed state (defined in the states section of the service) to accumulate information or perform calculations.
-#### flush Operator
+
+### flush Operator
 The flush operator defines the action to be taken when a window instance closes for a particular key (triggered by the watermark advancing past the window's end plus any grace period). This is typically where final aggregations are computed from the accumulated state, and the results for that window instance are emitted. The flush function has access to the state that was built up by update-state operations for the specific key and window that is closing.
 
 ##### Example:
sdf/faq.mdx

+41

@@ -0,0 +1,41 @@
+---
+title: FAQ
+description: Frequently Asked Questions
+sidebar_position: 90
+---
+
+### What is Stateful Dataflows (SDF)?
+
+Stateful Dataflows (SDF) is an intuitive platform designed to help developers build, troubleshoot, and run full-featured event-driven pipelines, also known as dataflows.
+
+### How does SDF relate to Fluvio?
+
+SDF leverages Fluvio topics as the mechanism for receiving input and publishing the output of its data processing pipelines.
+
+### Is prior experience with Fluvio required to use SDF effectively?
+
+No, while SDF utilizes Fluvio for message transport, direct interaction with Fluvio by SDF developers is minimal. You don't need in-depth Fluvio expertise to begin using SDF.
+
+### What programming languages are supported for writing custom logic in SDF?
+
+Currently, our tooling exclusively supports Rust. We may explore the implementation of support for other languages in the future.
+
+### How does SDF ensure low latency of dataflows?
+
+SDF achieves low latency by leveraging the performance characteristics of Rust and WebAssembly (WASM), the compilation target for custom logic within the dataflows.
+
+### Can I perform HTTP(S) requests from operators?
+
+Yes, operators can perform HTTP(S) requests by utilizing the `sdf-http-guest` crate available at https://github.com/infinyon/sdf-http-guest.
+
+### How does SDF handle data serialization and deserialization?
+
+SDF automatically generates serialization and deserialization logic based on the topic configuration defined for the dataflows. Each topic must specify the associated data type and whether the format is JSON or raw (which is only permitted for native data types).
+
+### What are SDF packages?
+
+An SDF package is a modular unit used to define reusable data types and/or functions that can then be incorporated into dataflows. Utilizing packages is the recommended and most effective way to add custom logic to your dataflows.
+
+### Can I change a namespace in a package after I have implemented the code?
+
+Yes, you can change the namespace of a package after implementation. However, this is a potentially breaking change. If the namespace is modified, any code that imports types or functions from this package will likely need to be updated to reflect the new namespace in their import statements.

sdf/index.mdx

-1

@@ -35,7 +35,6 @@ Stateful Dataflows are an extension of [Fluvio], leveraging the Fluvio infrastru
 Fluvio [connectors] serve as the interface to external ingress and egress services. Fluvio publishes a growing library of connectors, such as HTTP, Kafka, NATs, SQL, S3, etc. Connectors are easy to build, test, deploy, and share with the community.
 
 
-
 ## Next Step
 
 In the [Let's Get Started] section, we'll walk through the steps to get started with Stateful Dataflows.

sdf/whatsnew.mdx

+10 -10

@@ -7,7 +7,7 @@ sidebar_position: 20
 import CodeBlock from '@theme/CodeBlock';
 import InstallFvm from '!!raw-loader!./_embeds/install-sdf.bash';
 
-# Beta 9
+# Beta 10
 
 ## Upgrading
 

@@ -24,19 +24,19 @@ $ sdf worker create <host-worker-name>
 
 For upgrading Infinyon Cloud (remote) workers, please contact [InfinyOn support](#infinyon-support).
 
-### CLI changes
 
-- SDF interactive shell no longer exits on ctrl-c
-- Introduced the `--clear-all` and `--clear-state` options to the dataflow restart command, allowing users to clear the dataflow state before restarting.
+### Improvements
 
-### New features
+- Improve error output for some of the yaml parsing errors.
+- Do not require `type-name` in list type
+- Avoid generating invalid WIT names
+- Use dev runtime by default on `sdf test`
+- Add validation to check if pkg namespace/name conflicts with df
+- Improve State recovery on window services on restart
 
-- Added capability to schedule events using CRON format.
-- Added grace period configuration to window watermark, in order to support allowing out of order events.
+### Changes
 
-### Logging and Debugging Enhancements
-
-- Enhanced the sdf log -v command to include record offset and partition information.
+- Make types respect casing defined on yaml for deserialization and serialization when using `apiVersion: 0.6.0`
 
 ## InfinyOn Support
 

versioned_docs/version-0.11.11/fluvio/installation/docker.mdx

+1 -1

@@ -118,7 +118,7 @@ With the profile set, you are now able to perform Fluvio Client operations
 like listing topics:
 
 ```bash
-fluvio topics list
+fluvio topic list
 ```
 
 ## Teardown

versioned_docs/version-0.11.12/fluvio/installation/docker.mdx

+1 -1

@@ -118,7 +118,7 @@ With the profile set, you are now able to perform Fluvio Client operations
 like listing topics:
 
 ```bash
-fluvio topics list
+fluvio topic list
 ```
 
 ## Teardown

versioned_docs/version-0.12.0/fluvio/installation/docker.mdx

+1 -1

@@ -118,7 +118,7 @@ With the profile set, you are now able to perform Fluvio Client operations
 like listing topics:
 
 ```bash
-fluvio topics list
+fluvio topic list
 ```
 
 ## Teardown

versioned_docs/version-0.13.0/fluvio/installation/docker.mdx

+1 -1

@@ -118,7 +118,7 @@ With the profile set, you are now able to perform Fluvio Client operations
 like listing topics:
 
 ```bash
-fluvio topics list
+fluvio topic list
 ```
 
 ## Teardown

versioned_docs/version-0.14.0/fluvio/installation/docker.mdx

+1 -1

@@ -118,7 +118,7 @@ With the profile set, you are now able to perform Fluvio Client operations
 like listing topics:
 
 ```bash
-fluvio topics list
+fluvio topic list
 ```
 
 ## Teardown

versioned_docs/version-0.14.1/fluvio/installation/docker.mdx

+1 -1

@@ -118,7 +118,7 @@ With the profile set, you are now able to perform Fluvio Client operations
 like listing topics:
 
 ```bash
-fluvio topics list
+fluvio topic list
 ```
 
 ## Teardown

versioned_docs/version-0.15.0/fluvio/installation/docker.mdx

+1 -1

@@ -118,7 +118,7 @@ With the profile set, you are now able to perform Fluvio Client operations
 like listing topics:
 
 ```bash
-fluvio topics list
+fluvio topic list
 ```
 
 ## Teardown

versioned_docs/version-0.15.1/fluvio/installation/docker.mdx

+1 -1

@@ -118,7 +118,7 @@ With the profile set, you are now able to perform Fluvio Client operations
 like listing topics:
 
 ```bash
-fluvio topics list
+fluvio topic list
 ```
 
 ## Teardown

versioned_docs/version-0.16.1/fluvio/installation/docker.mdx

+1 -1

@@ -118,7 +118,7 @@ With the profile set, you are now able to perform Fluvio Client operations
 like listing topics:
 
 ```bash
-fluvio topics list
+fluvio topic list
 ```
 
 ## Teardown

versioned_docs/version-0.17.0/fluvio/installation/docker.mdx

+1 -1

@@ -118,7 +118,7 @@ With the profile set, you are now able to perform Fluvio Client operations
 like listing topics:
 
 ```bash
-fluvio topics list
+fluvio topic list
 ```
 
 ## Teardown
