[FSTORE-1147] Online feature store notification system #343
Merged
Commits (11):
4773f30 quick draft (bubriks)
a7fd729 mini fix (bubriks)
c08a7be hsfs update notification topic name (bubriks)
c6116be more draft changes (bubriks)
00164e2 improvements (bubriks)
4ecae21 fix file name (bubriks)
7530561 small improvements (bubriks)
10c13f0 small updates (bubriks)
8f61337 fix feedback (bubriks)
e31023f Merge remote-tracking branch 'upstream/main' into FSTORE-1147 (SirOibaf)
e023886 Fix title (SirOibaf)
Binary file added
BIN
+43.6 KB
docs/assets/images/guides/feature_group/create_online_enabled_feature_group.png
Binary file added
BIN
+35.3 KB
docs/assets/images/guides/feature_group/edit_online_enabled_feature_group.png
@@ -0,0 +1,100 @@
---
description: Documentation on Change Data Capture for feature groups in Hopsworks.
---

# Change Data Capture for feature groups

## Introduction

Changes to online-enabled feature groups can be captured by listening to events on specified topics.
This lets users make predictions proactively, as soon as a feature is updated.

In this guide you will learn how to enable Change Data Capture (CDC) for online feature groups within Hopsworks, with examples using the HSFS APIs as well as the user interface.

## Prerequisites

Before you begin this guide we suggest you read the [Feature Group](../../../concepts/fs/feature_group/fg_overview.md) concept page to understand what a feature group is and how it fits in the ML pipeline.
Then [create a Kafka topic](../../projects/kafka/create_topic.md); this topic will be used to store Change Data Capture events.

## Using HSFS APIs

### Create a Feature Group with Change Data Capture

To enable Change Data Capture for an online-enabled feature group using the HSFS APIs, [create a feature group](./create.md) and set its `notification_topic_name` property to the name of the previously created topic.

=== "Python"

    ```python
    fg = fs.create_feature_group(
        name="feature_group_name",
        version=feature_group_version,
        primary_key=feature_group_primary_keys,
        online_enabled=True,
        notification_topic_name="notification_topic_name")
    ```

### Update Feature Group Change Data Capture topic

The notification topic name can be changed after the feature group has been created.
Setting `notification_topic_name` to `None` or to an empty string disables notifications.
With the default configuration, it can take up to 30 minutes for these changes to take effect, since the onlinefs service internally caches feature groups.

=== "Python"

    ```python
    fg.update_notification_topic_name(
        notification_topic_name="new_notification_topic_name")
    ```

## Using UI

### Create a Feature Group with Change Data Capture

When creating the feature group, enable online feature serving.
Once it is enabled you will be able to set the `CDC topic name` property.

<p align="center">
  <figure>
    <img src="../../../../assets/images/guides/feature_group/create_online_enabled_feature_group.png" alt="Create online enabled feature group">
  </figure>
</p>

### Update Feature Group with Change Data Capture topic

The notification topic name can be changed after creation by editing the feature group.
Setting the `CDC topic name` value to empty disables notifications.
With the default configuration, it can take up to 30 minutes for these changes to take effect, since the onlinefs service internally caches feature groups.

<p align="center">
  <figure>
    <img src="../../../../assets/images/guides/feature_group/edit_online_enabled_feature_group.png" alt="Edit online enabled feature group">
  </figure>
</p>

## Example of Change Data Capture event

Once properly set up, the online feature store service produces an event to the provided topic whenever data ingestion completes for a record.

Here is an example output:

```
{
  "projectId":119, # project used for data ingestion
  "featureStoreId":67, # feature store where changes took place
  "featureGroupId":14, # feature group changed
  "entry":{ # values of the affected feature group entry
    "id":"15",
    "text":"test"
  },
  "featureViews":[ # list of feature views affected
    {
      "id":9, # id of the feature view
      "name":"test", # name of the feature view
      "version":1, # version of the feature view
      "featurestoreId":67 # feature store where feature view resides
    }
  ]
}
```

The list of `featureViews` in the event can be up to 10 minutes out of date, due to internal logging in the onlinefs service.

Review comment on the `"featureGroupId"` line: Noticed that we don't provide the name/version for the FG. Should we make an update to do it like we do for the FV?
Reply: Yes, we should add it.
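As an illustration of how a consumer might handle the example event above, here is a minimal sketch that parses one message payload into typed Python objects. The class and function names (`CdcEvent`, `FeatureViewRef`, `parse_cdc_event`) are hypothetical and not part of HSFS; the field names are taken from the sample event only, so treat the schema as an assumption that may change (for instance if feature group name/version are added).

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FeatureViewRef:
    """One affected feature view (hypothetical helper type)."""
    id: int
    name: str
    version: int
    feature_store_id: int


@dataclass(frozen=True)
class CdcEvent:
    """One CDC event, mirroring the keys of the sample payload."""
    project_id: int
    feature_store_id: int
    feature_group_id: int
    entry: dict[str, Any]
    feature_views: list[FeatureViewRef]


def parse_cdc_event(raw: bytes | str) -> CdcEvent:
    """Parse a raw JSON message value into a CdcEvent."""
    payload = json.loads(raw)
    views = [
        FeatureViewRef(
            id=fv["id"],
            name=fv["name"],
            version=fv["version"],
            # The sample event spells this key "featurestoreId" (lowercase "s").
            feature_store_id=fv["featurestoreId"],
        )
        # Assumption: a missing list is treated as "no feature views affected".
        for fv in payload.get("featureViews", [])
    ]
    return CdcEvent(
        project_id=payload["projectId"],
        feature_store_id=payload["featureStoreId"],
        feature_group_id=payload["featureGroupId"],
        entry=payload["entry"],
        feature_views=views,
    )


# The sample event from the guide, without the inline "#" annotations,
# which are not valid JSON.
sample = (
    b'{"projectId":119,"featureStoreId":67,"featureGroupId":14,'
    b'"entry":{"id":"15","text":"test"},'
    b'"featureViews":[{"id":9,"name":"test","version":1,"featurestoreId":67}]}'
)
event = parse_cdc_event(sample)
```

In practice the raw bytes would come from a Kafka consumer subscribed to the notification topic; that part is left out here because it depends on the client library and cluster configuration in use.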
Review thread:

Comment: In onlinefs it is set like this: `.expireAfterAccess(30, TimeUnit.MINUTES)`, so the user would not be able to change it (same for all cached entries). Should we enable the user to change this value?

Comment: Also, it might be better to change them from `expireAfterAccess` (this includes the last access, so if the user constantly writes it might never expire) to `expireAfterWrite`.

Reply: Yeah, it should be a configurable parameter.
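To make the concern in this thread concrete, here is a small Python sketch of the two expiry policies. It is a toy model of the general expire-after-access versus expire-after-write semantics, not the onlinefs implementation; `ExpiringCache`, `FakeClock`, and `survives_polling` are hypothetical names, and the 20-minute polling interval is an arbitrary assumption.

```python
import time


class ExpiringCache:
    """Toy cache with a time-to-live measured from last write or last access."""

    def __init__(self, ttl_seconds, expire_after="write", clock=time.monotonic):
        if expire_after not in ("write", "access"):
            raise ValueError("expire_after must be 'write' or 'access'")
        self._ttl = ttl_seconds
        self._refresh_on_read = expire_after == "access"
        self._clock = clock
        self._entries = {}  # key -> (value, timestamp of last write/access)

    def put(self, key, value):
        # A write resets the timer under both policies.
        self._entries[key] = (value, self._clock())

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        value, stamp = item
        now = self._clock()
        if now - stamp >= self._ttl:
            del self._entries[key]  # expired: caller must reload fresh data
            return None
        if self._refresh_on_read:
            # Under expire-after-access, every read also resets the timer.
            self._entries[key] = (value, now)
        return value


class FakeClock:
    """Manually advanced clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def survives_polling(expire_after):
    """Return True if an entry is still cached after 2 hours of reads every 20 minutes."""
    clock = FakeClock()
    cache = ExpiringCache(ttl_seconds=30 * 60, expire_after=expire_after, clock=clock)
    cache.put("feature_group_14", "old_topic_name")
    for _ in range(6):
        clock.advance(20 * 60)
        if cache.get("feature_group_14") is None:
            return False
    return True


stale_with_access_policy = survives_polling("access")
stale_with_write_policy = survives_polling("write")
```

With a 30-minute lifetime and a lookup every 20 minutes, the access-based policy keeps returning the old entry for the whole two hours, while the write-based policy drops it at the 40-minute lookup. That is the behavior the comment describes: a feature group that is used continuously may never pick up a changed topic name under expire-after-access.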