| 1 | +--- |
| 2 | +description: Documentation on Change Data Capture for feature groups in Hopsworks. |
| 3 | +--- |
| 4 | + |
| 5 | +# Change Data Capture for feature groups |
| 6 | + |
| 7 | +## Introduction |
| 8 | + |
| 9 | +Changes to online-enabled feature groups can be captured by listening to events on specified topics. |
| 10 | +This optimizes the user experience by allowing users to proactively make predictions as soon as there is an update on the features. |
| 11 | + |
| 12 | +In this guide you will learn how to enable Change Data Capture (CDC) for online feature groups within Hopsworks, showing examples in HSFS APIs as well as the user interface. |
| 13 | + |
| 14 | +## Prerequisites |
| 15 | + |
| 16 | +Before you begin this guide we suggest you read the [Feature Group](../../../concepts/fs/feature_group/fg_overview.md) concept page to understand what a feature group is and how it fits in the ML pipeline. |
| 17 | +Then [create a Kafka topic](../../projects/kafka/create_topic.md); this topic will store the Change Data Capture events.
| 18 | + |
| 19 | +## Using HSFS APIs |
| 20 | + |
| 21 | +### Create a Feature Group with Change Data Capture |
| 22 | + |
| 23 | +To enable Change Data Capture for an online-enabled feature group using the HSFS APIs, [create a feature group](./create.md) and set the `notification_topic_name` property to the name of the previously created topic.
| 24 | + |
| 25 | +=== "Python" |
| 26 | + |
| 27 | +    ```python
| 28 | +    fg = fs.create_feature_group(
| 29 | +        name="feature_group_name",
| 30 | +        version=feature_group_version,
| 31 | +        primary_key=feature_group_primary_keys,
| 32 | +        online_enabled=True,
| 33 | +        notification_topic_name="notification_topic_name")
| 34 | +    ```
| 35 | + |
| 36 | +### Update Feature Group Change Data Capture topic |
| 37 | + |
| 38 | +The notification topic name can be changed after the creation of the feature group. |
| 39 | +Setting `notification_topic_name` to `None` or an empty string disables notifications.
| 40 | +With the default configuration, these changes can take up to 30 minutes to take effect, because the onlinefs service caches feature groups internally.
| 41 | + |
| 42 | +=== "Python" |
| 43 | + |
| 44 | +    ```python
| 45 | +    fg.update_notification_topic_name(
| 46 | +        notification_topic_name="new_notification_topic_name")
| 47 | +    ```
| 48 | + |
| 49 | +## Using UI |
| 50 | + |
| 51 | +### Create a Feature Group with Change Data Capture |
| 52 | + |
| 53 | +When creating the feature group, enable online feature serving.
| 54 | +Once it is enabled, you can set the `CDC topic name` property.
| 55 | + |
| 56 | +<p align="center"> |
| 57 | + <figure> |
| 58 | + <img src="../../../../assets/images/guides/feature_group/create_online_enabled_feature_group.png" alt="Create online enabled feature group"> |
| 59 | + </figure> |
| 60 | +</p> |
| 61 | + |
| 62 | +### Update Feature Group with Change Data Capture topic
| 63 | + |
| 64 | +The notification topic name can be changed after creation by editing the feature group. |
| 65 | +Leaving the `CDC topic name` value empty disables notifications.
| 66 | +With the default configuration, these changes can take up to 30 minutes to take effect, because the onlinefs service caches feature groups internally.
| 67 | + |
| 68 | +<p align="center"> |
| 69 | + <figure> |
| 70 | + <img src="../../../../assets/images/guides/feature_group/edit_online_enabled_feature_group.png" alt="Edit online enabled feature group"> |
| 71 | + </figure> |
| 72 | +</p> |
| 73 | + |
| 74 | +## Example of Change Data Capture event |
| 75 | + |
| 76 | +Once Change Data Capture is set up, the online feature store service produces events to the configured topic when data ingestion completes for records.
| 77 | + |
| 78 | +Here is an example output: |
| 79 | + |
| 80 | +``` |
| 81 | +{ |
| 82 | + "projectId":119, # project used for data ingestion |
| 83 | + "featureStoreId":67, # feature store where changes took place |
| 84 | + "featureGroupId":14, # feature group changed |
| 85 | + "entry":{ # values of the affected feature group entry |
| 86 | + "id":"15", |
| 87 | + "text":"test" |
| 88 | + }, |
| 89 | + "featureViews":[ # list of feature views affected |
| 90 | + { |
| 91 | + "id":9, # id of the feature view |
| 92 | + "name":"test", # name of the feature view |
| 93 | + "version":1, # version of the feature view |
| 94 | + "featurestoreId":67 # feature store where feature view resides |
| 95 | + } |
| 96 | + ] |
| 97 | +} |
| 98 | +``` |
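
The event above can be handled with a few lines of standard-library Python. The following is a minimal sketch, not part of the HSFS API: it assumes each message value arrives as UTF-8 encoded JSON without the `#` annotations shown above (those are explanatory only), and `summarize_cdc_event` is a hypothetical helper name. Wiring it to a Kafka consumer is left out.

```python
import json


def summarize_cdc_event(raw_value: bytes) -> dict:
    """Pick out the fields a downstream consumer is likely to need.

    Hypothetical helper: assumes the message value is plain JSON with the
    field names shown in the example event.
    """
    event = json.loads(raw_value)
    return {
        "feature_group_id": event["featureGroupId"],
        # Values of the affected feature group entry.
        "entry": event["entry"],
        # (name, version) pairs of the feature views listed in the event.
        "feature_views": [
            (fv["name"], fv["version"]) for fv in event.get("featureViews", [])
        ],
    }


# Sample payload mirroring the example event, with the annotations removed.
sample = (
    b'{"projectId":119,"featureStoreId":67,"featureGroupId":14,'
    b'"entry":{"id":"15","text":"test"},'
    b'"featureViews":[{"id":9,"name":"test","version":1,"featurestoreId":67}]}'
)
summary = summarize_cdc_event(sample)
print(summary)
```

A consumer could use the `entry` values to look up the updated feature vector and the `feature_views` pairs to decide which models need a fresh prediction.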
| 99 | + |
| 100 | +The list of `featureViews` in the event can be up to 10 minutes out of date, due to internal logging in the onlinefs service.