Commit ecef3e8

chore: start beta5 updates (#312)
1 parent da37cd2 commit ecef3e8

7 files changed

+95
-39
lines changed

sdf/SDF_VERSION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
sdf-beta4
1+
sdf-beta5

sdf/_embeds/install-sdf.bash

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
fvm install sdf-beta4
1+
fvm install sdf-beta5

sdf/cli/deploy.mdx

Lines changed: 12 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -56,17 +56,17 @@ SDF - Stateful Dataflow
5656
Usage: <COMMAND>
5757

5858
Commands:
59-
show Show or List states. Use `show state --help` for more info
60-
select
61-
delete
62-
restart
63-
stop
59+
show Show or List states or dataflows Use `show --help` for more info
60+
select Select dataflow in context
61+
delete Delete a dataflow
62+
restart Restart a dataflow
63+
stop Stop a dataflow
64+
sql Start sql mode
6465
exit Stop interactive session
6566
help Print this message or the help of the given subcommand(s)
6667

6768
```
6869
69-
7070
#### `show state`
7171
7272
Show states or show state for given namespace and key.
@@ -88,9 +88,13 @@ Options:
8888
Where:
8989
* `--key` and `--filter` refines the result.
9090
91+
#### SQL mode
92+
93+
Use SQL mode in the CLI to run SQL queries on SDF states. For more details, see [sql mode for sdf run].
94+
9195
### Managing dataflow in interactive shell
9296
9397
Please see the [deployment] section for more details.
9498
95-
96-
[deployment]: /sdf/deployment
99+
[deployment]: /sdf/deployment
100+
[sql mode for sdf run]: /sdf/cli/run.mdx#sql-mode

sdf/cli/run.mdx

Lines changed: 58 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -35,10 +35,8 @@ Options:
3535
when set, it will skip running the service
3636
--build-profile <BUILD_PROFILE>
3737
[default: release]
38-
--dev
39-
set runtime to use dev mode [env: DEV=]
4038
--prod
41-
set runtime to use production mode [env: PROD=]
39+
set runtime to use production mode this will disable dev configurations [env: PROD=]
4240
--force-update
4341
Force update
4442
```
@@ -49,7 +47,6 @@ Where:
4947
* `--env` sets environment variables to be passed to operators
5048
* `--skip-running` - compiles components and exits without running the dataflow
5149
* `--build-profile` - sets the build profile
52-
* `--dev` - sets runtime to apply dev specific parameters
5350
* `--prod` - sets runtime to apply prod specific parameters
5451
* `--force-update` - forces the update of the project dependencies
5552

@@ -68,10 +65,11 @@ Usage: <COMMAND>
6865

6966
Commands:
7067
show Show or List states. Use `show state --help` for more info
68+
sql Start sql mode
7169
exit Stop interactive session
70+
help Print this message or the help of the given subcommand(s)
7271
```
7372

74-
7573
#### `show state`
7674

7775
Show states or show state for given namespace and key.
@@ -95,7 +93,6 @@ Where:
9593

9694
#### Examples
9795

98-
9996
##### Run command
10097

10198
Navigate to the directory with `dataflow.yaml` file, and run the command:
@@ -128,3 +125,58 @@ Show the detailed information:
128125
Key Window succeeded failed
129126
stats * 2 0
130127
```
128+
129+
#### SQL mode
130+
131+
Use SQL mode in the CLI to run SQL queries on SDF states. For a given dataflow, the SQL context contains all the dataframe states, that is, the states with an `arrow-row` value.
132+
133+
For states scoped to a window, queries run against the last flushed state. For states that are not window-aware, queries run against the global state.
134+
135+
To enter SQL mode, type `sql` in the SDF interactive shell. In SQL mode, you can run any SQL command supported by the Polars engine.
136+
137+
#### Examples:
138+
139+
##### Run command
140+
141+
Navigate to the directory with `dataflow.yaml` file, and run the command:
142+
143+
```bash
144+
$ sdf run
145+
```
146+
147+
##### Enter the SQL mode
148+
149+
Using the sql command:
150+
151+
```bash
152+
>> sql
153+
SDF SQL version sdf-beta5
154+
Type .help for help.
155+
```
156+
157+
#### Show tables in context
158+
```bash
159+
sql>> show tables
160+
shape: (1, 1)
161+
┌────────────────┐
162+
│ name │
163+
│ --- │
164+
│ str │
165+
╞════════════════╡
166+
│ count_per_word │
167+
└────────────────┘
168+
```
169+
170+
#### Perform a query
171+
172+
```bash
173+
sql>> select * from count_per_word;
174+
shape: (1, 2)
175+
┌──────┬─────────────┐
176+
│ _key ┆ occurrences │
177+
│ --- ┆ --- │
178+
│ str ┆ u32 │
179+
╞══════╪═════════════╡
180+
│ abc  ┆ 10          │
181+
└──────┴─────────────┘
182+
```

sdf/concepts/dataflow-yaml.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -263,7 +263,7 @@ To develop package from start:
263263

264264
* Create a local package
265265
* Add `dev` section to the `dataflow.yaml` file to locate the local package.
266-
* Run the dataflow with the `--dev` flag to load the local package instead of downloading them from the Hub.
266+
* Run the dataflow without the `--prod` flag to load the local package instead of downloading it from the Hub.
267267
* Repeat the process until the package is ready for publishing.
268268
* Then publish the package to the Hub.
269269

sdf/concepts/state-dataframe.mdx

Lines changed: 10 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@ Then this will be mapped to arrow dataframe as follows:
3535
| banana | 2 |
3636
| grape | 1 |
3737

38+
## Updating a Dataframe state
3839

3940
To update the state, you can use the `update-state` operator as below:
4041

@@ -55,15 +56,16 @@ This API is invoked by the `update-state` operator, which only returns the value
5556

5657
In the example, `count_per_word` represents a row value of the dataframe. If the operator sees `apple`, it will be the first row in the dataframe above.
5758

58-
However, aggregate operators like `flush` can access the entire state and perform aggregation across all partitions. In this case, the `count_per_word` state function returns the entire DataFrame, not just individual rows. You can then perform DataFrame operations using the SQL API. The snippet below shows how to use SQL to get the 3 most frequent words.
59+
## SQL function
60+
61+
Aggregate operators like `flush`, or external services that reference a state, can perform SQL queries on the aggregated data of all partitions of a state. To support this, a `sql` function is added to the context. The `sql` function runs the SQL statement passed as a parameter on the aggregated view of the states, not on their individual rows. The snippet below shows how to use SQL to get the 3 most frequent words.
5962

6063
```yaml
6164
flush:
6265
run: |
6366
fn aggregate_wordcount() -> Result<TopWords> {
64-
let word_counts = count_per_word();
6567
66-
let top3 = word_counts.sql("select * from count_per_word order by count desc limit 3")?;
68+
let top3 = sql("select * from count_per_word order by count desc limit 3")?;
6769
let rows = top3.rows()?;
6870
6971
let mut top_words = vec![];
@@ -81,15 +83,18 @@ flush:
8183
}
8284
```
8385

86+
The output of the `sql` function also implements the following methods, which are described below: `sql`, `rows`, `col`, `key`, `next`.
87+
8488
## SQL API
8589

8690
For any state that is a dataframe, you can use the SQL API to perform dataframe operations. SDF uses Polars SQL to perform them.
8791
The result of a SQL operation is always a dataframe, so you can chain multiple SQL operations to get the desired result.
8892

89-
The SQL is executed in the context of the dataframe. And name of the dataframe is state as illustrated below:
93+
The SQL is executed in the context of all the available dataframes, so you can perform JOINs or other complex SQL operations across them. Each dataframe is represented as a table, and each table name is the state name with hyphens (`-`) replaced by underscores (`_`), as illustrated below.
94+
9095

9196
```rust
92-
let top3 = word_counts.sql("select * from count_per_word order by count desc limit 3")?;
97+
let top3 = sql("select * from count_per_word order by count desc limit 3")?;
9398
```
9499

95100
## Row API

sdf/whatsnew.mdx

Lines changed: 12 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -15,12 +15,6 @@ To upgrade CLI to the beta4, run the following command:
1515

1616
<CodeBlock language="bash">{InstallFvm}</CodeBlock>
1717

18-
Make sure that [`wasm32-wasip2`](https://doc.rust-lang.org/rustc/platform-support/wasm32-wasip2.html#wasm32-wasip2) target is installed. Typically, can be installed via:
19-
20-
```bash
21-
$ rustup target add wasm32-wasip2
22-
```
23-
2418
To upgrade host workers, shutdown and restart the worker:
2519

2620
```bash
@@ -30,26 +24,27 @@ $ sdf worker create <host-worker-name>
3024

3125
For upgrading cloud workers, please contact [InfinyOn support](#infinyon-support).
3226

33-
### Compatibility and Breaking changes
27+
## Featured change
3428

29+
- Added [sql mode] to the interactive shell. With this change, users can run SQL queries (including JOINs) on the states of a dataflow. The states that support queries are the [dataframe states]. In particular, when the state has a window context, the queries run against the last flushed state.
3530

3631
### CLI changes
37-
- `sdf setup` now checks that Fluvio is running and that we can connect to it.
3832

39-
### Changes
40-
- renamed [configuration used to connect to remote clusters] from `profile` to `remote_cluster_profile`.
41-
- updated to use `wasm32-wasip2` target for building wasm modules.
33+
- `sdf run` no longer accepts `--dev`. Development mode is now the default for `sdf run`. To run in non-development mode, use `--prod`.
34+
35+
### Improvements
4236

43-
## Improvements
44-
- Support definition of [nested types].
37+
- Added the capability to run complex queries, such as joins, on states in operator context through the [sql function].
4538
- Performance improvements.
39+
- Improved error messages when nested type definitions are wrong.
4640

47-
## Bug Fixes
48-
- When using windows, events with an older timestamp are skipped now.
41+
### Changes
42+
- Replaced dashes in table names. Previously, when a state name contained dashes, the state name had to be escaped with quotes in SQL context. Starting with sdf-beta5, use `_` instead of `-` in the table name to avoid the escaping.
4943

5044
## InfinyOn Support
5145

5246
For any questions or issues, please contact InfinyOn support at support@infinyon.com or https://discordapp.com/invite/bBG2dTz
5347

54-
[configuration used to connect to remote clusters]: concepts/dataflow-yaml.mdx#topics
55-
[nested types]: concepts/types.mdx#nested-types
48+
[sql mode]: cli/run.mdx#sql-mode
49+
[sql function]: concepts/state-dataframe.mdx#sql-function
50+
[dataframe states]: concepts/state-dataframe.mdx
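
The table naming change described in this commit (a state name with hyphens becomes a table name with underscores) can be sketched as a small helper. This is only an illustration of the documented rule, not SDF's own code: the function name `table_name_for_state` and the state name `count-per-word` are hypothetical and do not appear in the SDF API.

```rust
/// Hypothetical helper (not part of SDF): derives the SQL table name
/// from a state name by replacing every hyphen with an underscore,
/// following the rule described in the changelog above.
fn table_name_for_state(state_name: &str) -> String {
    state_name.replace('-', "_")
}

fn main() {
    // A state declared as `count-per-word` (assumed name) would be
    // queried as the table `count_per_word`.
    let table = table_name_for_state("count-per-word");
    println!("select * from {} limit 3", table);
}
```

Names that contain no hyphens are returned unchanged, so queries written against states such as `count_per_word` keep the same table name.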
