Commit 791e27e

[FSTORE-1672] Allow multiple on-demand features to be returned from an on-demand transformation function and allow passing of local variables to a transformation function (logicalclocks#439)
1 parent 5a8891c commit 791e27e

6 files changed (+74, -9 lines changed)

docs/user_guides/fs/feature_group/on_demand_transformations.md

+6-6
@@ -5,17 +5,13 @@
55
## On Demand Transformation Function Creation
66

77

8-
An on-demand transformation function may be created by associating a [transformation function](../transformation_functions.md) with a feature group. Each on-demand transformation function generates a single on-demand feature, which, by default, is assigned the same name as the associated transformation function. For instance, in the example below, the on-demand transformation function `transaction_age` produces an on-demand feature named transaction_age. Alternatively, the name of the resulting on-demand feature can be explicitly defined using the [`alias`](../transformation_functions.md#specifying-output-features–names-for-transformation-functions) function.
9-
10-
It is important to note that only one-to-one or many-to-one transformation functions are compatible with the creation of on-demand transformation functions.
8+
An on-demand transformation function may be created by associating a [transformation function](../transformation_functions.md) with a feature group. Each on-demand transformation function can generate one or more on-demand features. If the function returns a single feature, that feature is automatically assigned the same name as the transformation function. If it returns multiple features, they are named by default using the format `functionName_outputColumnNumber`. For instance, in the example below, the on-demand transformation function `transaction_age` produces an on-demand feature named `transaction_age`, and the on-demand transformation function `stripped_strings` produces the on-demand features named `stripped_strings_0` and `stripped_strings_1`. Alternatively, the names of the resulting on-demand features can be explicitly defined using the [`alias`](../transformation_functions.md#specifying-output-features–names-for-transformation-functions) function.
119

1210
!!! warning "On-demand transformation"
1311
All on-demand transformation functions attached to a feature group must have unique names and, in contrast to model-dependent transformations, they do not have access to training dataset statistics.
1412

1513
Each on-demand transformation function can map specific features to its arguments by explicitly providing their names as arguments to the transformation function. If no feature names are provided, the transformation function will default to using features that match the name of the transformation function's argument.
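
The name-matching rule described above can be illustrated with a minimal plain-Python sketch. This is not the Hopsworks implementation: `resolve_feature_names` is a hypothetical helper, and `event_time` and `now` are hypothetical feature names used only for illustration.

```python
import inspect


def resolve_feature_names(func, explicit_names=None):
    """Hypothetical helper: map each function argument to a feature name."""
    arg_names = list(inspect.signature(func).parameters)
    if explicit_names:
        # Explicit names are matched to arguments by position.
        if len(explicit_names) != len(arg_names):
            raise ValueError("One feature name is required per argument.")
        return dict(zip(arg_names, explicit_names))
    # Default: each argument reads the feature that shares its name.
    return {name: name for name in arg_names}


def transaction_age(transaction_date, current_date):
    return current_date - transaction_date


default_mapping = resolve_feature_names(transaction_age)
explicit_mapping = resolve_feature_names(transaction_age, ["event_time", "now"])
print(default_mapping)
print(explicit_mapping)
```

With no names supplied, each argument maps to the feature of the same name. With names supplied, they are assigned to the arguments in order.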
1614

17-
18-
1915
=== "Python"
2016
!!! example "Creating on-demand transformation functions."
2117
```python
@@ -24,14 +20,18 @@ Each on-demand transformation function can map specific features to its argument
2420
def transaction_age(transaction_date, current_date):
2521
    return (current_date - transaction_date).dt.days
2622

23+
@hopsworks.udf(return_type=[str, str], drop=["current_date"])
24+
def stripped_strings(country, city):
25+
    return country.strip(), city.strip()
26+
2727
# Attach transformation function to feature group to create on-demand transformation function.
2828
fg = feature_store.create_feature_group(name="fg_transactions",
2929
version=1,
3030
description="Transaction Features",
3131
online_enabled=True,
3232
primary_key=['id'],
3333
event_time='event_time',
34-
transformation_functions=[transaction_age]
34+
transformation_functions=[transaction_age, stripped_strings]
3535
)
3636
```
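
The default naming convention described in this file (`functionName_outputColumnNumber`) can be sketched in plain Python. `default_output_names` is a hypothetical helper written for illustration. It only mirrors the rule as stated in the text and is not taken from the Hopsworks code base.

```python
def default_output_names(function_name, n_outputs):
    """Hypothetical helper mirroring the documented default naming rule."""
    if n_outputs == 1:
        # A single output reuses the transformation function's name.
        return [function_name]
    # Multiple outputs are suffixed with their zero-based column number.
    return [f"{function_name}_{i}" for i in range(n_outputs)]


single = default_output_names("transaction_age", 1)
multiple = default_output_names("stripped_strings", 2)
print(single)
print(multiple)
```

The result matches the names given in the text: `transaction_age` for the single-output function, and `stripped_strings_0` and `stripped_strings_1` for the two-output function.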
3737

docs/user_guides/fs/feature_view/batch-data.md

+14-1
@@ -53,4 +53,17 @@ If you have specified transformation functions when creating a feature view, you
5353
feature_view.init_batch_scoring(training_dataset_version=1)
5454
```
5555

56-
It is important to note that in addition to the filters defined in feature view, [extra filters](./training-data.md#Extra-filters) will be applied if they are defined in the given training dataset version.
56+
It is important to note that in addition to the filters defined in feature view, [extra filters](./training-data.md#Extra-filters) will be applied if they are defined in the given training dataset version.
57+
58+
59+
## Passing Context Variables to Transformation Functions
60+
After [defining a transformation function using a context variable](../transformation_functions.md#passing-context-variables-to-transformation-function), you can pass the necessary context variables through the `transformation_context` parameter when fetching batch data.
61+
62+
63+
=== "Python"
64+
!!! example "Passing context variables while fetching batch data."
65+
```python
66+
# Passing context variables when fetching batch data.
67+
batch_data = feature_view.get_batch_data(transformation_context={"context_parameter":10})
68+
69+
```

docs/user_guides/fs/feature_view/feature-vectors.md

+13
@@ -191,6 +191,19 @@ You can also use the parameter to provide values for all the features which are
191191
)
192192
```
193193

194+
## Passing Context Variables to Transformation Functions
195+
After [defining a transformation function using a context variable](../transformation_functions.md#passing-context-variables-to-transformation-function), you can pass the required context variables using the `transformation_context` parameter when fetching the feature vectors.
196+
197+
=== "Python"
198+
!!! example "Passing context variables while fetching feature vectors."
199+
```python
200+
# Passing context variables when fetching feature vectors.
201+
feature_vectors = feature_view.get_feature_vectors(
202+
entry=[{"pk1": 1}],
203+
transformation_context={"context_parameter":10}
204+
)
205+
```
206+
194207
## Choose the right Client
195208

196209
The Online Store can be accessed via the **Python** or **Java** client allowing you to use your language of choice to connect to the Online Store. Additionally, the Python client provides two different implementations to fetch data: **SQL** or **REST**. The SQL client is the default implementation. It requires a direct SQL connection to your RonDB cluster and uses python asyncio to offer high performance even when your Feature View rows involve querying multiple different tables. The REST client is an alternative implementation connecting to [RonDB Feature Vector Server](./feature-server.md). Perfect if you want to avoid exposing ports of your database cluster directly to clients. This implementation is available as of Hopsworks 3.7.

docs/user_guides/fs/feature_view/model-dependent-transformations.md

+2-2
@@ -93,14 +93,14 @@ To attach built-in transformation functions from the `hopsworks` module they can
9393

9494
!!! example "Creating model-dependent transformation using built-in transformation functions imported from hopsworks"
9595
```python
96-
from hopsworks.builtin_transformations import min_max_scaler, label_encoder, robust_scaler, standard_scaler
96+
from hopsworks.hsfs.builtin_transformations import min_max_scaler, label_encoder, robust_scaler, standard_scaler
9797
9898
feature_view = fs.create_feature_view(
9999
name='transactions_view',
100100
query=query,
101101
labels=["fraud_label"],
102102
transformation_functions = [
103-
label_encoder("category": ),
103+
label_encoder("category"),
104104
robust_scaler("amount"),
105105
min_max_scaler("loc_delta"),
106106
standard_scaler("age_at_transaction")

docs/user_guides/fs/feature_view/training-data.md

+24
@@ -94,6 +94,30 @@ X_train, X_test, y_train, y_test = feature_view.get_train_test_split(training_da
9494
X_train, X_val, X_test, y_train, y_val, y_test = feature_view.get_train_validation_test_split(training_dataset_version=1)
9595
```
9696

97+
## Passing Context Variables to Transformation Functions
98+
Once you have [defined a transformation function using a context variable](../transformation_functions.md#passing-context-variables-to-transformation-function), you can pass the required context variables using the `transformation_context` parameter when generating IN-MEMORY training data or materializing a training dataset.
99+
100+
!!! note
101+
Passing context variables for materializing a training dataset is only supported in the PySpark Kernel.
102+
103+
104+
=== "Python"
105+
!!! example "Passing context variables while creating training data."
106+
```python
107+
# Passing context variable to IN-MEMORY Training Dataset.
108+
X_train, X_test, y_train, y_test = feature_view.get_train_test_split(training_dataset_version=1,
109+
primary_key=True,
110+
event_time=True,
111+
transformation_context={"context_parameter":10})
112+
113+
# Passing context variable to Materialized Training Dataset.
114+
version, job = feature_view.create_train_test_split(test_size=0.2,
115+
primary_key=True,
116+
event_time=True,
117+
transformation_context={"context_parameter":10})
118+
119+
```
120+
97121
## Read training data with primary key(s) and event time
98122
For certain use cases, e.g. time series models, the input data needs to be sorted according to the primary key(s) and event time combination.
99123
Primary key(s) and event time are not usually included in the feature view query as they are not features used for training.

docs/user_guides/fs/transformation_functions.md

+15
@@ -228,6 +228,21 @@ The `TransformationStatistics` instance contains separate objects with the sam
228228
return argument + argument2 + argument3 + statistics.argument1.mean + statistics.argument2.mean + statistics.argument3.mean
229229
```
230230

231+
### Passing context variables to transformation function
232+
233+
The `context` keyword argument can be defined in a transformation function to access shared context variables. These variables contain common data used across transformation functions. When a transformation function declares the `context` argument, you can pass the necessary data as a dictionary into that argument during [training dataset creation](feature_view/training-data.md#passing-context-variables-to-transformation-functions), [feature vector retrieval](feature_view/feature-vectors.md#passing-context-variables-to-transformation-functions), or [batch data retrieval](feature_view/batch-data.md#passing-context-variables-to-transformation-functions).
234+
235+
236+
=== "Python"
237+
!!! example "Creation of a transformation function in Hopsworks that accepts context variables"
238+
```python
239+
from hopsworks import udf
240+
241+
@udf(int)
242+
def add_features(argument1, context):
243+
    return argument1 + context["value_to_add"]
244+
```
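
To make the data flow concrete, here is a minimal plain-Python sketch of one context dictionary shared by two functions. It omits the `@udf` decorator and calls the functions directly, so it only illustrates the idea and not how Hopsworks invokes transformation functions. `scale_feature` and the `scale` key are hypothetical additions for this sketch.

```python
def add_features(argument1, context):
    # Reads a value from the shared context dictionary.
    return argument1 + context["value_to_add"]


def scale_feature(argument1, context):
    # Hypothetical second function reading a different key from the same context.
    return argument1 * context["scale"]


# One dictionary carries the variables for every function that declares `context`.
transformation_context = {"value_to_add": 10, "scale": 2}

values = [1, 2, 3]
added = [add_features(v, transformation_context) for v in values]
scaled = [scale_feature(v, transformation_context) for v in values]
print(added)
print(scaled)
```

Keeping all shared values in a single dictionary means each function can pick out the keys it needs without changing its feature arguments.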
245+
231246

232247
## Saving to the Feature Store
233248
