[FSTORE-1147] Online feature store notification system #343

Merged
merged 11 commits into from
Feb 16, 2024
Changes from 9 commits
100 changes: 100 additions & 0 deletions docs/user_guides/fs/feature_group/notification.md
@@ -0,0 +1,100 @@
---
description: Documentation on Change Data Capture for feature groups in Hopsworks.
---

# Change Data Capture for a Feature Group

## Introduction

Changes to online-enabled feature groups can be captured by listening to events on a specified Kafka topic.
This lets downstream applications react, for example by making new predictions, as soon as feature values are updated.

In this guide you will learn how to enable Change Data Capture (CDC) for online feature groups within Hopsworks, with examples using both the HSFS APIs and the user interface.

## Prerequisites

Before you begin this guide, we suggest you read the [Feature Group](../../../concepts/fs/feature_group/fg_overview.md) concept page to understand what a feature group is and how it fits in the ML pipeline.
Then [create a Kafka topic](../../projects/kafka/create_topic.md); this topic will be used to store Change Data Capture events.

## Using HSFS APIs

### Create a Feature Group with Change Data Capture

To enable Change Data Capture for an online-enabled feature group using the HSFS APIs, [create a feature group](./create.md) and set the `notification_topic_name` property to the name of the previously created topic.

=== "Python"

    ```python
    fg = fs.create_feature_group(
        name="feature_group_name",
        version=feature_group_version,
        primary_key=feature_group_primary_keys,
        online_enabled=True,
        notification_topic_name="notification_topic_name")
    ```

### Update Feature Group Change Data Capture topic

The notification topic name can be changed after the feature group has been created.
Setting `notification_topic_name` to `None` or an empty string disables notifications.
With the default configuration, it can take up to 30 minutes for these changes to take effect, since the onlinefs service caches feature groups internally.

Contributor Author: In onlinefs the cache is set like this: `.expireAfterAccess(30, TimeUnit.MINUTES)`, so the user would not be able to change it (same for all cached entries). Should we let the user change this value?

Contributor Author: It might also be better to change the caches from `expireAfterAccess` to `expireAfterWrite`. `expireAfterAccess` counts the last access, so if the user writes constantly the entry might never expire.

Contributor: Yes, it should be a configurable parameter.


=== "Python"

    ```python
    fg.update_notification_topic_name(
        notification_topic_name="new_notification_topic_name")
    ```
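
The delay described above depends on how cached feature group entries expire. The following is a minimal sketch of the two expiry policies discussed in review, written as a toy Python cache with a manually supplied clock. It is not the onlinefs implementation: `TtlCache`, `survives_steady_traffic`, and the key `fg_14` are hypothetical names used only for illustration.

```python
class TtlCache:
    """Toy TTL cache with a caller-supplied clock (illustration only)."""

    def __init__(self, ttl_seconds, refresh_on_access):
        self.ttl = ttl_seconds
        # True mimics an expire-after-access policy,
        # False mimics an expire-after-write policy.
        self.refresh_on_access = refresh_on_access
        self._entries = {}  # key -> (value, time the expiry timer last started)

    def put(self, key, value, now):
        self._entries[key] = (value, now)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stamp = entry
        if now - stamp >= self.ttl:
            del self._entries[key]  # expired: the caller has to reload the entry
            return None
        if self.refresh_on_access:
            self._entries[key] = (value, now)  # every read restarts the timer
        return value


def survives_steady_traffic(refresh_on_access, ttl=30 * 60, step=10 * 60, reads=12):
    """Read one entry every `step` seconds and report whether it is still cached."""
    cache = TtlCache(ttl, refresh_on_access)
    cache.put("fg_14", "old_topic", now=0)
    value = None
    for i in range(1, reads + 1):
        value = cache.get("fg_14", now=i * step)
    return value is not None


print("expire after access keeps the stale entry:", survives_steady_traffic(True))
print("expire after write keeps the stale entry:", survives_steady_traffic(False))
```

Under steady traffic, the access-based policy never drops the entry, so a changed topic name would not be picked up. The write-based policy drops the entry once the 30 minutes have elapsed, regardless of how often it is read.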

## Using UI

### Create a Feature Group with Change Data Capture

During the creation of the feature group, enable online feature serving.
Once it is enabled, you will be able to set the `CDC topic name` property.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/feature_group/create_online_enabled_feature_group.png" alt="Create online enabled feature group">
</figure>
</p>

### Update Feature Group with Change Data Capture topic

The notification topic name can be changed after creation by editing the feature group.
Clearing the `CDC topic name` value disables notifications.
With the default configuration, it can take up to 30 minutes for these changes to take effect, since the onlinefs service caches feature groups internally.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/feature_group/edit_online_enabled_feature_group.png" alt="Edit online enabled feature group">
</figure>
</p>

## Example of Change Data Capture event

Once properly set up, the online feature store service produces events to the provided topic when data ingestion has completed for records.

Here is an example output:

```
{
    "projectId":119, # project used for data ingestion
    "featureStoreId":67, # feature store where changes took place
    "featureGroupId":14, # feature group changed
    "entry":{ # values of the affected feature group entry
        "id":"15",
        "text":"test"
    },
    "featureViews":[ # list of feature views affected
        {
            "id":9, # id of the feature view
            "name":"test", # name of the feature view
            "version":1, # version of the feature view
            "featurestoreId":67 # feature store where feature view resides
        }
    ]
}
```

Contributor Author (@bubriks, Feb 14, 2024, on the `featureGroupId` line): I noticed that we don't provide the name/version for the FG. Should we make an update to do it like we do for the FV?

Contributor: Yes, we should add it.

The list of `featureViews` in the event can be up to 10 minutes out of date, due to internal logging in the onlinefs service.
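
As a rough sketch of how a consumer might handle such an event, the snippet below decodes one message value into a small Python object. It assumes the message value is plain UTF-8 JSON with the fields shown in the example, without the `#` annotations, which are explanatory only. `CdcEvent` and `parse_cdc_event` are hypothetical helper names, not part of HSFS, and the Kafka client that delivers the message is left out.

```python
import json
from dataclasses import dataclass


@dataclass
class CdcEvent:
    project_id: int
    feature_store_id: int
    feature_group_id: int
    entry: dict               # values of the affected feature group entry
    feature_view_names: list  # "<name>_v<version>" for each affected feature view


def parse_cdc_event(raw):
    """Decode one message value (bytes or str) into a CdcEvent."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    payload = json.loads(raw)
    # Tolerate a missing or null featureViews list (an assumption, not documented).
    views = payload.get("featureViews") or []
    return CdcEvent(
        project_id=payload["projectId"],
        feature_store_id=payload["featureStoreId"],
        feature_group_id=payload["featureGroupId"],
        entry=payload.get("entry", {}),
        feature_view_names=[f'{v["name"]}_v{v["version"]}' for v in views],
    )


# The example event from above, as it might arrive as a raw message value.
sample = (
    b'{"projectId":119,"featureStoreId":67,"featureGroupId":14,'
    b'"entry":{"id":"15","text":"test"},'
    b'"featureViews":[{"id":9,"name":"test","version":1,"featurestoreId":67}]}'
)
event = parse_cdc_event(sample)
print(event.feature_group_id, event.entry, event.feature_view_names)
```

In practice `parse_cdc_event` would be called on each message value read from the notification topic, and the application could then decide, based on `feature_group_id` or `feature_view_names`, whether to refresh a prediction.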
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -73,6 +73,7 @@ nav:
- Create: user_guides/fs/feature_group/create.md
- Create External: user_guides/fs/feature_group/create_external.md
- Deprecating: user_guides/fs/feature_group/deprecation.md
- Notification: user_guides/fs/feature_group/notification.md
- Data Types and Schema management: user_guides/fs/feature_group/data_types.md
- Statistics: user_guides/fs/feature_group/statistics.md
- Data Validation: user_guides/fs/feature_group/data_validation.md