Commit f7555cd

documentation of Model Dependent Transformation Functions
1 parent 2689d3f commit f7555cd

3 files changed

+168
-42
lines changed

Diff for: docs/user_guides/fs/feature_view/index.md

+1-1
@@ -9,6 +9,6 @@ This section serves to provide guides and examples for the common usage of abstr
99
- [Feature Server](feature-server.md)
1010
- [Query](query.md)
1111
- [Helper columns](helper-columns.md)
12-
- [Transformation Functions](transformation-function.md)
12+
- [Model-Dependent Transformation Functions](transformation-function.md)
1313
- [Spines](spine-query.md)
1414
- [Feature Monitoring](feature_monitoring.md)
+166-40
@@ -1,75 +1,151 @@
1-
# Transformation Functions
1+
# Model-Dependent Transformation Functions
22

3-
HSFS provides functionality to attach transformation functions to [feature views](./overview.md).
3+
Hopsworks provides functionality to attach transformation functions to [feature views](./overview.md).
44

5-
User defined, custom transformation functions need to be registered in the feature store to make them accessible for feature view creation. To register them in the feature store, they either have to be part of the library [installed](../../../user_guides/projects/python/python_install.md) in Hopsworks or attached when starting a [Jupyter notebook](../../../user_guides/projects/jupyter/python_notebook.md) or [Hopsworks job](../../../user_guides/projects/jobs/spark_job.md).
5+
These transformation functions are primarily [model-dependent transformations](https://www.hopsworks.ai/dictionary/model-dependent-transformations). Model-dependent transformations generate feature data tailored to a specific model, often requiring the computation of training dataset statistics. Hopsworks enables you to define custom model-dependent transformation functions that can take multiple features and their associated statistics as input and produce multiple transformed features as output. Hopsworks also automatically executes the defined transformation function as a [`@pandas_udf`](https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.functions.pandas_udf.html) in a PySpark application and as Pandas functions in Python clients.
6+
7+
Custom transformation functions created in Hopsworks can be directly attached to feature views or stored in the feature store for later retrieval and attachment. These custom functions can be part of a library [installed](../../../user_guides/projects/python/python_install.md) in Hopsworks or added when starting a [Jupyter notebook](../../../user_guides/projects/jupyter/python_notebook.md) or [Hopsworks job](../../../user_guides/projects/jobs/spark_job.md).
8+
9+
Hopsworks also includes built-in transformation functions such as `min_max_scaler`, `standard_scaler`, `robust_scaler`, `label_encoder`, and `one_hot_encoder` that can be easily imported and used.
610

711
!!! warning "Pyspark decorators"
812

9-
Don't decorate transformation functions with Pyspark `@udf` or `@pandas_udf`, and also make sure not to use any Pyspark dependencies. That is because, the transformation functions may be executed by Python clients. HSFS will decorate transformation function for you only if it is used inside Pyspark application.
13+
Don't decorate transformation functions with PySpark `@udf` or `@pandas_udf`, and don't use any PySpark dependencies, because the transformation functions may be executed by Python clients. Hopsworks automatically runs transformations as Pandas UDFs only when they are used inside a PySpark application.
14+
15+
!!! warning "Java/Scala support"
16+
17+
Creating and attaching transformation functions to feature views is not supported in the HSFS Java or Scala clients. If a feature view with transformation functions was created using the Python client, you cannot get training data or feature vectors from the HSFS Java or Scala clients.
18+
1019

20+
## Creation of Custom Transformation Functions
1121

12-
## Creation
13-
Hopsworks ships built-in transformation functions such as `min_max_scaler`, `standard_scaler`, `robust_scaler` and `label_encoder`.
22+
User-defined, custom transformation functions can be created in Hopsworks using the `@udf` decorator. These functions should be designed as Pandas functions, meaning they must take input features as a [Pandas Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html) and return either a Pandas Series or a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).
1423

15-
You can also create new functions. Let's assume that you have already installed Python library [transformation_fn_template](https://github.com/logicalclocks/transformation_fn_template) containing the transformation function `plus_one`.
24+
The `@udf` decorator in Hopsworks creates a metadata class called `HopsworksUdf`. This class manages the necessary operations to supply feature statistics to custom transformation functions and execute them as `@pandas_udf` in PySpark applications or as pure Pandas functions in Python clients. The decorator requires the `return_type` of the transformation function, which indicates the type of features returned. This can be a single Python type if the transformation function returns a single transformed feature as a Pandas Series, or a list of Python types if it returns multiple transformed features as a Pandas DataFrame. The supported types include `str`, `int`, `float`, `bool`, `datetime.datetime`, `datetime.date`, and `datetime.time`.
25+
26+
Hopsworks supports four types of transformation functions:
27+
28+
1. One to One: Transforms one feature into one transformed feature.
29+
2. One to Many: Transforms one feature into multiple transformed features.
30+
3. Many to One: Transforms multiple features into one transformed feature.
31+
4. Many to Many: Transforms multiple features into multiple transformed features.
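The four shapes listed above can be illustrated with plain Pandas functions. This is a minimal sketch rather than Hopsworks code: the `@udf` decorator is deliberately left out so that it runs without a Hopsworks installation, and all function and column names are made up for illustration.

```python
import pandas as pd

# Plain Pandas functions (no Hopsworks decorator); names are illustrative only.

def one_to_one(feature: pd.Series) -> pd.Series:
    # One input feature, one output feature.
    return feature + 1

def many_to_one(feature1: pd.Series, feature2: pd.Series) -> pd.Series:
    # Several input features, one output feature.
    return feature1 + feature2

def one_to_many(feature: pd.Series) -> pd.DataFrame:
    # One input feature, several output features (one per DataFrame column).
    return pd.DataFrame({"plus_one": feature + 1, "plus_two": feature + 2})

def many_to_many(feature1: pd.Series, feature2: pd.Series) -> pd.DataFrame:
    # Several input features, several output features.
    return pd.DataFrame({"f1_plus_one": feature1 + 1, "f2_plus_one": feature2 + 1})

a = pd.Series([1, 2, 3])
b = pd.Series([10, 20, 30])

single = many_to_one(a, b)    # a Series
multiple = one_to_many(a)     # a DataFrame with two columns
```

Functions that return a Series correspond to a single return type, and functions that return a DataFrame correspond to a list of return types with one entry per column.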
32+
33+
To create a One to One transformation function, pass a single Python type as the return type to the Hopsworks `@udf` decorator; the function should take one argument as input and return a Pandas Series.
1634

1735
=== "Python"
1836

19-
!!! example "Register transformation function `plus_one` in the Hopsworks feature store."
37+
!!! example "Creation of a Custom One to One Transformation Function in Hopsworks."
2038
```python
21-
from custom_functions import transformations
22-
plus_one_meta = fs.create_transformation_function(
23-
transformation_function=transformations.plus_one,
24-
output_type=int,
25-
version=1)
26-
plus_one_meta.save()
39+
from hopsworks import udf
40+
41+
@udf(int)
42+
def add_one(feature):
43+
    return feature + 1
2744
```
2845

29-
## Retrieval
30-
To retrieve all transformation functions from the feature store, use `get_transformation_functions` which will return the list of available `TransformationFunction` objects. A specific transformation function can be retrieved with the `get_transformation_function` method where you can provide its name and version of the transformation function. If only the function name is provided then it will default to version 1.
46+
Creating a Many to One transformation function is similar to creating a One to One transformation function; the only difference is that the function accepts multiple features as input.
3147

3248
=== "Python"
49+
!!! example "Creation of a Many to One Custom Transformation Function in Hopsworks."
50+
```python
51+
from hopsworks import udf
3352

34-
!!! example "Retrieving transformation functions from the feature store"
53+
@udf(int)
54+
def add_features(feature1, feature2, feature3):
55+
    return feature1 + feature2 + feature3
56+
```
57+
58+
To create a One to Many transformation function, pass a list of Python types as the return type to the Hopsworks `@udf` decorator; the function should take one argument as input and return multiple features as a Pandas DataFrame. The return types provided to the decorator must match the types of each column in the returned Pandas DataFrame.
59+
60+
=== "Python"
61+
!!! example "Creation of a One to Many Custom Transformation Function in Hopsworks."
3562
```python
36-
# get all transformation functions
37-
fs.get_transformation_functions()
63+
from hopsworks import udf
64+
import pandas as pd
3865

39-
# get transformation function by name. This will default to version 1
40-
plus_one_fn = fs.get_transformation_function(name="plus_one")
66+
@udf([int, int])
67+
def add_one_and_two(feature1):
68+
    return pd.DataFrame({"add_one": feature1 + 1, "add_two": feature1 + 2})
69+
```
4170

42-
# get built-in transformation function min max scaler
43-
min_max_scaler_fn = fs.get_transformation_function(name="min_max_scaler")
71+
Creating a Many to Many transformation function is similar to creating a One to Many transformation function; the only difference is that the function accepts multiple features as input.
4472

45-
# get transformation function by name and version.
46-
plus_one_fn = fs.get_transformation_function(name="plus_one", version=2)
73+
=== "Python"
74+
!!! example "Creation of a Many to Many Custom Transformation Function in Hopsworks."
75+
```python
76+
from hopsworks import udf
77+
import pandas as pd
78+
79+
@udf([int, int, int])
80+
def add_one_multiple(feature1, feature2, feature3):
81+
    return pd.DataFrame({"add_one_feature1": feature1 + 1, "add_one_feature2": feature2 + 1, "add_one_feature3": feature3 + 1})
82+
```
83+
To access statistics pertaining to an argument provided as input to the transformation function, it is necessary to define a keyword argument named `statistics` in the transformation function. This statistics argument should be provided with an instance of class `TransformationStatistics` as default value. The `TransformationStatistics` instance must be initialized with the names of the arguments for which statistical information is required.
84+
85+
The `TransformationStatistics` instance contains separate objects with the same name as the arguments used to initialize it. These objects encapsulate statistics related to the feature as instances of the `FeatureTransformationStatistics` class. Upon instantiation, instances of `FeatureTransformationStatistics` are initialized with `None` values. These placeholders are subsequently populated with the required statistics when the training dataset is created.
86+
87+
=== "Python"
88+
!!! example "Creation of a Custom Transformation Function in Hopsworks that accesses Feature Statistics"
89+
```python
90+
from hopsworks import udf
91+
from hsfs.transformation_statistics import TransformationStatistics
92+
93+
stats = TransformationStatistics("feature1", "feature2", "feature3")
94+
95+
@udf(int)
96+
def add_features(feature1, feature2, feature3, statistics=stats):
97+
    return feature1 + feature2 + feature3 + statistics.feature1.mean + statistics.feature2.mean + statistics.feature3.mean
4798
```
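The placeholder behaviour described above, where statistics start as `None` and are filled in once training data is available, can be mimicked in plain Python. This is a sketch under stated assumptions: `FeatureStats`, `StatsHolder`, and `populate` are hypothetical stand-ins and not the `hsfs` classes, and the statistics are computed with the standard library instead of by Hopsworks.

```python
import statistics
from dataclasses import dataclass
from typing import Optional

@dataclass
class FeatureStats:
    """Hypothetical stand-in for per-feature statistics; fields start as None."""
    mean: Optional[float] = None
    stddev: Optional[float] = None

class StatsHolder:
    """Hypothetical stand-in: one FeatureStats attribute per requested feature."""

    def __init__(self, *feature_names: str) -> None:
        for name in feature_names:
            setattr(self, name, FeatureStats())

    def populate(self, training_data: dict) -> None:
        # Fill the placeholders from (toy) training data.
        for name, values in training_data.items():
            if hasattr(self, name):
                feature_stats = getattr(self, name)
                feature_stats.mean = statistics.mean(values)
                feature_stats.stddev = statistics.pstdev(values)

stats = StatsHolder("feature1", "feature2")
before = stats.feature1.mean  # still None: nothing has been computed yet

stats.populate({"feature1": [1.0, 2.0, 3.0], "feature2": [10.0, 20.0, 30.0]})

def center(values, feature_stats):
    # A transformation that depends on a training-data statistic.
    return [v - feature_stats.mean for v in values]

centered = center([1.0, 2.0, 3.0], stats.feature1)
```

The point of the pattern is that the transformation can be defined before any statistics exist and only needs them when it is actually applied.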
4899

100+
The output column generated by the transformation function follows a naming convention structured as `functionName_features_outputColumnNumber`. For instance, for the function named `add_one_multiple`, the output columns would be labeled as `add_one_multiple_feature1-feature2-feature3_0`, `add_one_multiple_feature1-feature2-feature3_1`, and `add_one_multiple_feature1-feature2-feature3_2`.
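The naming convention can be reproduced with a small helper. This is only an illustration of the pattern stated above; `output_column_names` is a hypothetical function and not part of the Hopsworks API, and it covers only the multiple-output case shown in the example.

```python
def output_column_names(function_name, feature_names, n_outputs):
    # functionName_features_outputColumnNumber, with features joined by "-".
    prefix = f"{function_name}_{'-'.join(feature_names)}"
    return [f"{prefix}_{i}" for i in range(n_outputs)]

names = output_column_names(
    "add_one_multiple", ["feature1", "feature2", "feature3"], 3
)
```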
101+
49102
## Apply transformation functions to features
50103

51-
You can define in the feature view transformation functions as dict, where key is feature name and value is online transformation function name. Then the transformation functions are applied when you [read training data](./training-data.md#read-training-data), [read batch data](./batch-data.md#creation-with-transformation), or [get feature vectors](./feature-vectors.md#retrieval-with-transformation).
104+
Transformation functions can be attached to a feature view as a list. Each transformation function can specify which features are to be used by explicitly providing their names as arguments. If no feature names are provided explicitly, the transformation function defaults to using the features in the feature view whose names match the names of the transformation function's arguments. The transformation functions are then applied when you [read training data](./training-data.md#read-training-data), [read batch data](./batch-data.md#creation-with-transformation), or [get feature vectors](./feature-vectors.md#retrieval-with-transformation). By default, all features provided as input to a transformation function are dropped when training data, batch data, or feature vectors are created.
52105

53106
=== "Python"
54107

55108
!!! example "Attaching transformation functions to the feature view"
56109
```python
57-
plus_one_fn = fs.get_transformation_function(name="plus_one", version=1)
58110
feature_view = fs.create_feature_view(
59111
name='transactions_view',
60112
query=query,
61113
labels=["fraud_label"],
62-
transformation_functions={
63-
"amount_spent": plus_one_fn
64-
}
114+
transformation_functions=[
115+
add_one,
116+
add_features,
117+
add_one_and_two,
118+
add_one_multiple
119+
]
120+
)
121+
```
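The default behaviour described before the example above (matching features to argument names, then dropping the input features) can be sketched with plain Pandas. This is an illustration under assumptions, not how Hopsworks implements it: `apply_transformation` is a hypothetical helper, and the output column name used here is a placeholder that does not follow the Hopsworks naming convention.

```python
import inspect
import pandas as pd

def apply_transformation(df, fn, feature_names=None):
    # Hypothetical helper. Default: use the columns whose names match
    # the function's argument names.
    if feature_names is None:
        feature_names = list(inspect.signature(fn).parameters)
    result = fn(*(df[name] for name in feature_names))
    if isinstance(result, pd.Series):
        # Placeholder column name, chosen only for this sketch.
        result = result.to_frame(name=fn.__name__)
    # Input features are dropped, mirroring the documented default.
    return pd.concat([df.drop(columns=feature_names), result], axis=1)

def add_features(feature1, feature2):
    return feature1 + feature2

df = pd.DataFrame({"feature1": [1, 2], "feature2": [10, 20], "other": [0, 0]})

# Implicit: arguments feature1 and feature2 are matched by name.
implicit = apply_transformation(df, add_features)

# Explicit: the caller chooses which columns feed the function.
explicit = apply_transformation(df, add_features, ["other", "feature2"])
```

In the implicit call the columns `feature1` and `feature2` are consumed and only `other` survives next to the new column; in the explicit call `feature1` is left untouched.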
122+
123+
To explicitly choose the features passed to a transformation function, provide the feature names as arguments to the transformation function.
124+
125+
126+
=== "Python"
127+
128+
!!! example "Attaching transformation functions to the feature view by explicitly specifying features to be passed to transformation function"
129+
```python
130+
feature_view = fs.create_feature_view(
131+
name='transactions_view',
132+
query=query,
133+
labels=["fraud_label"],
134+
transformation_functions=[
135+
add_one("feature_1"),
136+
add_one("feature_2"),
137+
add_features("feature_1", "feature_2", "feature_3"),
138+
add_one_and_two("feature_4"),
139+
add_one_multiple("feature_5", "feature_6", "feature_7")
140+
]
65141
)
66142
```
67143

68-
Built-in transformation functions are attached in the same way. The only difference is that it will compute the necessary statistics for the specific function in the background. For example min and max values for `min_max_scaler`; mean and standard deviation for `standard_scaler` etc.
144+
Built-in transformation functions are attached in the same way. The only difference is that they can either be retrieved from Hopsworks or imported from the `hsfs` module.
69145

70146
=== "Python"
71147

72-
!!! example "Attaching built-in transformation functions to the feature view"
148+
!!! example "Attaching built-in transformation functions to the feature view by retrieving from Hopsworks"
73149
```python
74150
min_max_scaler = fs.get_transformation_function(name="min_max_scaler")
75151
standard_scaler = fs.get_transformation_function(name="standard_scaler")
@@ -80,15 +156,65 @@ Built-in transformation functions are attached in the same way. The only differe
80156
name='transactions_view',
81157
query=query,
82158
labels=["fraud_label"],
83-
transformation_functions = {
84-
"category": label_encoder,
85-
"amount": robust_scaler,
86-
"loc_delta": min_max_scaler,
87-
"age_at_transaction": standard_scaler
88-
}
159+
transformation_functions = [
160+
label_encoder("category"),
161+
robust_scaler("amount"),
162+
min_max_scaler("loc_delta"),
163+
standard_scaler("age_at_transaction")
164+
]
89165
)
90166
```
91167

92-
!!! warning "Java/Scala support"
168+
To attach built-in transformation functions from the `hsfs` module, import them directly from `hsfs.builtin_transformations`.
169+
170+
=== "Python"
171+
172+
!!! example "Attaching built-in transformation functions to the feature view by importing from hsfs"
173+
```python
174+
from hsfs.builtin_transformations import min_max_scaler, label_encoder, robust_scaler, standard_scaler
175+
176+
feature_view = fs.create_feature_view(
177+
name='transactions_view',
178+
query=query,
179+
labels=["fraud_label"],
180+
transformation_functions = [
181+
label_encoder("category"),
182+
robust_scaler("amount"),
183+
min_max_scaler("loc_delta"),
184+
standard_scaler("age_at_transaction")
185+
]
186+
)
187+
```
188+
189+
## Saving Transformation Functions to Feature Store
190+
To save a transformation function to the feature store, use `create_transformation_function`, which creates a `TransformationFunction` object. The `TransformationFunction` object can then be saved by calling its `save` method.
191+
192+
=== "Python"
193+
194+
!!! example "Register transformation function `add_one` in the Hopsworks feature store."
195+
```python
196+
plus_one_meta = fs.create_transformation_function(
197+
transformation_function=add_one,
198+
version=1)
199+
plus_one_meta.save()
200+
```
201+
202+
## Retrieval from Feature Store
203+
To retrieve all transformation functions from the feature store, use `get_transformation_functions`, which returns the list of available `TransformationFunction` objects. A specific transformation function can be retrieved with the `get_transformation_function` method by providing its name and version. If only the function name is provided, the version defaults to 1.
204+
205+
=== "Python"
206+
207+
!!! example "Retrieving transformation functions from the feature store"
208+
```python
209+
# get all transformation functions
210+
fs.get_transformation_functions()
211+
212+
# get transformation function by name. This will default to version 1
213+
plus_one_fn = fs.get_transformation_function(name="plus_one")
93214

94-
Creating and attaching Transformation functions to feature views are not supported for HSFS Java or Scala client. If feature view with transformation function was created using python client, you cannot get training data or get feature vectors from HSFS Java or Scala client.
215+
# get built-in transformation function min max scaler
216+
min_max_scaler_fn = fs.get_transformation_function(name="min_max_scaler")
217+
218+
# get transformation function by name and version.
219+
plus_one_fn = fs.get_transformation_function(name="plus_one", version=2)
220+
```

Diff for: mkdocs.yml

+1-1
@@ -94,7 +94,7 @@ nav:
9494
- Feature server: user_guides/fs/feature_view/feature-server.md
9595
- Query: user_guides/fs/feature_view/query.md
9696
- Helper Columns: user_guides/fs/feature_view/helper-columns.md
97-
- Transformation Functions: user_guides/fs/feature_view/transformation-function.md
97+
- Model-Dependent Transformation Functions: user_guides/fs/feature_view/transformation-function.md
9898
- Spines: user_guides/fs/feature_view/spine-query.md
9999
- Feature Monitoring:
100100
- Getting started: user_guides/fs/feature_view/feature_monitoring.md
