Hopsworks provides functionality to attach transformation functions to [feature views](./overview.md).
These transformation functions are primarily [model-dependent transformations](https://www.hopsworks.ai/dictionary/model-dependent-transformations). Model-dependent transformations generate feature data tailored to a specific model, often requiring the computation of training dataset statistics. Hopsworks enables you to define custom model-dependent transformation functions that can take multiple features and their associated statistics as input and produce multiple transformed features as output. Hopsworks automatically executes the defined transformation function as a [`@pandas_udf`](https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.functions.pandas_udf.html) in a PySpark application and as a Pandas function in Python clients.
Custom transformation functions created in Hopsworks can be directly attached to feature views or stored in the feature store for later retrieval and attachment. These custom functions can be part of a library [installed](../../../user_guides/projects/python/python_install.md) in Hopsworks or added when starting a [Jupyter notebook](../../../user_guides/projects/jupyter/python_notebook.md) or [Hopsworks job](../../../user_guides/projects/jobs/spark_job.md).
Hopsworks also includes built-in transformation functions such as `min_max_scaler`, `standard_scaler`, `robust_scaler`, `label_encoder`, and `one_hot_encoder` that can be easily imported and used.
!!! warning "Pyspark decorators"
    Don't decorate transformation functions with the PySpark `@udf` or `@pandas_udf` decorators, and make sure not to use any PySpark dependencies, because the transformation functions may be executed by Python clients. Hopsworks automatically runs transformation functions as pandas UDFs only when they are used inside a PySpark application.
!!! warning "Java/Scala support"
    Creating and attaching transformation functions to feature views is not supported in the HSFS Java or Scala clients. If a feature view with transformation functions was created using the Python client, you cannot get training data or feature vectors from the Java or Scala clients.
## Creation of Custom Transformation Functions
User-defined, custom transformation functions can be created in Hopsworks using the `@udf` decorator. These functions should be designed as Pandas functions, meaning they must take input features as a [Pandas Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html) and return either a Pandas Series or a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).
The `@udf` decorator in Hopsworks creates a metadata class called `HopsworksUdf`. This class manages the necessary operations to supply feature statistics to custom transformation functions and execute them as `@pandas_udf` in PySpark applications or as pure Pandas functions in Python clients. The decorator requires the `return_type` of the transformation function, which indicates the type of features returned. This can be a single Python type if the transformation function returns a single transformed feature as a Pandas Series, or a list of Python types if it returns multiple transformed features as a Pandas DataFrame. The supported types include `str`, `int`, `float`, `bool`, `datetime.datetime`, `datetime.date`, and `datetime.time`.
Hopsworks supports four types of transformation functions:

1. One to One: Transforms one feature into one transformed feature.
2. One to Many: Transforms one feature into multiple transformed features.
3. Many to One: Transforms multiple features into one transformed feature.
4. Many to Many: Transforms multiple features into multiple transformed features.
To create a One to One transformation function, the Hopsworks `@udf` decorator must be provided with the return type as a single Python type, and the transformation function should take one argument as input and return a Pandas Series.
=== "Python"
!!! example "Creation of a Custom One to One Transformation Function in Hopsworks."
Creation of a Many to One transformation function is similar to that of a One to One transformation function, the only difference being that the transformation function accepts multiple features as input.
=== "Python"
!!! example "Creation of a Many to One Custom Transformation Function in Hopsworks."
    ```python
    from hopsworks import udf

    @udf(int)
    def add_features(feature1, feature2, feature3):
        return feature1 + feature2 + feature3
    ```
To create a One to Many transformation function, the Hopsworks `@udf` decorator must be provided with the return type as a list of Python types, and the transformation function should take one argument as input and return multiple features as a Pandas DataFrame. The return types provided to the decorator must match the types of each column in the returned Pandas DataFrame.
=== "Python"
!!! example "Creation of a One to Many Custom Transformation Function in Hopsworks."
    ```python
    from hopsworks import udf
    import pandas as pd

    # Illustrative body: returns two transformed features as a Pandas DataFrame,
    # so the return type is a list of two Python types.
    @udf([int, int])
    def add_one_and_two(feature):
        return pd.DataFrame({"add_one": feature + 1, "add_two": feature + 2})
    ```
Creation of a Many to Many transformation function is similar to that of a One to Many transformation function, the only difference being that the transformation function accepts multiple features as input.
To access the statistics pertaining to an argument provided as input to the transformation function, it is necessary to define a keyword argument named `statistics` in the transformation function. This `statistics` argument should be given an instance of the class `TransformationStatistics` as its default value. The `TransformationStatistics` instance must be initialized with the names of the arguments for which statistical information is required.
84
+
85
+
The `TransformationStatistics` instance contains separate objects with the same name as the arguments used to initialize it. These objects encapsulate statistics related to the feature as instances of the `FeatureTransformationStatistics` class. Upon instantiation, instances of `FeatureTransformationStatistics` are initialized with `None` values. These placeholders are subsequently populated with the required statistics when the training dataset is created.
=== "Python"
!!! example "Creation of a Custom Transformation Function in Hopsworks that accesses Feature Statistics"
    ```python
    from hopsworks import udf
    from hsfs.transformation_statistics import TransformationStatistics
    import pandas as pd

    # Initialize the statistics instance with the names of the arguments
    # for which statistical information is required.
    stats = TransformationStatistics("feature1", "feature2", "feature3")

    # Illustrative body: the statistics for each argument are accessible
    # inside the function, e.g. statistics.feature1.mean.
    @udf([int, int, int])
    def add_one_multiple(feature1, feature2, feature3, statistics=stats):
        return pd.DataFrame({
            "feature1": feature1 + 1,
            "feature2": feature2 + 1,
            "feature3": feature3 + 1,
        })
    ```
The output column generated by the transformation function follows a naming convention structured as `functionName_features_outputColumnNumber`. For instance, for the function named `add_one_multiple`, the output columns would be labeled as `add_one_multiple_feature1-feature2-feature3_0`, `add_one_multiple_feature1-feature2-feature3_1`, and `add_one_multiple_feature1-feature2-feature3_2`.
## Apply transformation functions to features
Transformation functions can be attached to a feature view as a list. Each transformation function can specify which features are to be used by explicitly providing their names as arguments. If no feature names are provided explicitly, the transformation function will default to using the features from the feature view that match the names of the transformation function's arguments. The transformation functions are then applied when you [read training data](./training-data.md#read-training-data), [read batch data](./batch-data.md#creation-with-transformation), or [get feature vectors](./feature-vectors.md#retrieval-with-transformation). By default, all features provided as input to a transformation function are dropped when training data, batch data, or feature vectors are created.
=== "Python"
!!! example "Attaching transformation functions to the feature view"
Built-in transformation functions are attached in the same way. The only difference is that they can either be retrieved from Hopsworks or imported from the `hsfs` module.
=== "Python"
!!! example "Attaching built-in transformation functions to the feature view by retrieving from Hopsworks"
    ```python
    # Retrieve the built-in transformation functions from the feature store
    # (illustrative retrieval; see "Retrieving transformation functions" below).
    label_encoder = fs.get_transformation_function(name="label_encoder")
    robust_scaler = fs.get_transformation_function(name="robust_scaler")
    min_max_scaler = fs.get_transformation_function(name="min_max_scaler")
    standard_scaler = fs.get_transformation_function(name="standard_scaler")

    feature_view = fs.create_feature_view(
        name='transactions_view',
        query=query,
        labels=["fraud_label"],
        transformation_functions = [
            label_encoder("category"),
            robust_scaler("amount"),
            min_max_scaler("loc_delta"),
            standard_scaler("age_at_transaction")
        ]
    )
    ```
To attach built-in transformation functions from the `hsfs` module, they can be directly imported from `hsfs.builtin_transformations`.
=== "Python"
!!! example "Attaching built-in transformation functions to the feature view by importing from hsfs"
    ```python
    from hsfs.builtin_transformations import min_max_scaler, label_encoder, robust_scaler, standard_scaler

    feature_view = fs.create_feature_view(
        name='transactions_view',
        query=query,
        labels=["fraud_label"],
        transformation_functions = [
            label_encoder("category"),
            robust_scaler("amount"),
            min_max_scaler("loc_delta"),
            standard_scaler("age_at_transaction")
        ]
    )
    ```
## Saving Transformation Functions to Feature Store
To save a transformation function to the feature store, use `create_transformation_function`, which creates a `TransformationFunction` object. The `TransformationFunction` object can then be saved by calling its `save` method.
=== "Python"
!!! example "Register transformation function `add_one` in the Hopsworks feature store."
To retrieve all transformation functions from the feature store, use `get_transformation_functions`, which returns a list of the available `TransformationFunction` objects. A specific transformation function can be retrieved with the `get_transformation_function` method by providing the name and version of the transformation function. If only the function name is provided, it defaults to version 1.
=== "Python"
!!! example "Retrieving transformation functions from the feature store"
    ```python
    # get all transformation functions
    fs.get_transformation_functions()

    # get transformation function by name. This will default to version 1
    add_one_fn = fs.get_transformation_function(name="add_one")

    # get transformation function by name and version
    add_one_fn = fs.get_transformation_function(name="add_one", version=1)

    # get built-in transformation function min max scaler
    min_max_scaler = fs.get_transformation_function(name="min_max_scaler")
    ```