Commit 9b1eab4

committed
updating based on review comments
1 parent e472fe6 commit 9b1eab4

1 file changed

+8
-8
lines changed

docs/user_guides/fs/feature_view/transformation-function.md

+8-8
Original file line number | Diff line number | Diff line change
@@ -19,7 +19,7 @@ Hopsworks also includes built-in transformation functions such as `min_max_scale
1919

2020
## Creation of Custom Transformation Functions
2121

22-
User-defined, custom transformation functions can be created in Hopsworks using the `@udf` decorator. These functions should be designed as Pandas functions, meaning they must take input features as a [Pandas Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html) and return either a Pandas Series or a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).
22+
User-defined, custom transformation functions can be created in Hopsworks using the [`@udf`](http://docs.hopsworks.ai/hopsworks-api/latest/generated/api/udf/) decorator. These functions should be designed as Pandas functions, meaning they must take input features as a [Pandas Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html) and return either a Pandas Series or a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).
2323

2424
The `@udf` decorator in Hopsworks creates a metadata class called `HopsworksUdf`. This class manages the necessary operations to supply feature statistics to custom transformation functions and execute them as `@pandas_udf` in PySpark applications or as pure Pandas functions in Python clients. The decorator requires the `return_type` of the transformation function, which indicates the type of features returned. This can be a single Python type if the transformation function returns a single transformed feature as a Pandas Series, or a list of Python types if it returns multiple transformed features as a Pandas DataFrame. The supported types include `str`, `int`, `float`, `bool`, `datetime.datetime`, `datetime.date`, and `datetime.time`.
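The `return_type` rule described above (a single Python type, or a list of types, drawn from the supported set) can be sketched in plain Python. This is only an illustration: `normalize_return_type` and `SUPPORTED_RETURN_TYPES` are hypothetical names, not part of the Hopsworks API, and the real decorator may validate its input differently.

```python
import datetime

# Types the guide lists as supported for `return_type`.
SUPPORTED_RETURN_TYPES = (
    str, int, float, bool,
    datetime.datetime, datetime.date, datetime.time,
)


def normalize_return_type(return_type):
    """Hypothetical helper (not a Hopsworks function).

    Accepts a single type or a list of types and returns a list,
    rejecting anything outside the supported set.
    """
    if isinstance(return_type, (list, tuple)):
        types = list(return_type)
    else:
        types = [return_type]
    if not types:
        raise ValueError("return_type must name at least one type")
    for t in types:
        if t not in SUPPORTED_RETURN_TYPES:
            raise TypeError(f"Unsupported return type: {t!r}")
    return types


single = normalize_return_type(int)            # one transformed feature (Series)
multiple = normalize_return_type([int, float])  # several features (DataFrame)
```

A single type corresponds to a function returning one Pandas Series; a list corresponds to a function returning a Pandas DataFrame with one column per listed type.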
2525

@@ -82,26 +82,26 @@ Creation of a Many to Many transformation function is similar to that of One to
8282
```
8383
To access statistics pertaining to an argument provided as input to the transformation function, it is necessary to define a keyword argument named `statistics` in the transformation function. This statistics argument should be provided with an instance of class `TransformationStatistics` as default value. The `TransformationStatistics` instance must be initialized with the names of the arguments for which statistical information is required.
8484

85-
The `TransformationStatistics` instance contains separate objects with the same name as the arguments used to initialize it. These objects encapsulate statistics related to the feature as instances of the `FeatureTransformationStatistics` class. Upon instantiation, instances of `FeatureTransformationStatistics` are initialized with `None` values. These placeholders are subsequently populated with the required statistics when the training dataset is created.
85+
The `TransformationStatistics` instance contains separate objects with the same name as the arguments used to initialize it. These objects encapsulate statistics related to the argument as instances of the `FeatureTransformationStatistics` class. Upon instantiation, instances of `FeatureTransformationStatistics` are initialized with `None` values. These placeholders are subsequently populated with the required statistics when the training dataset is created.
8686

8787
=== "Python"
8888
!!! example "Creation of a Custom Transformation Function in Hopsworks that accesses Feature Statistics"
8989
```python
9090
from hopsworks import udf
9191
from hsfs.transformation_statistics import TransformationStatistics
9292

93-
stats = TransformationStatistics("feature1", "feature2", "feature3")
93+
stats = TransformationStatistics("argument1", "argument2", "argument3")
9494

9595
@udf(int)
96-
def add_features(feature1, feature2, feature3, statistics=stats):
97-
    return feature1 + feature2 + feature3 + statistics.feature1.mean + statistics.feature2.mean + statistics.feature3.mean
96+
def add_features(argument1, argument2, argument3, statistics=stats):
97+
    return argument1 + argument2 + argument3 + statistics.argument1.mean + statistics.argument2.mean + statistics.argument3.mean
9898
```
9999

100-
The output column generated by the transformation function follows a naming convention structured as `functionName_features_outputColumnNumber`. For instance, for the function named `add_one_multiple`, the output columns would be labeled as `add_one_multiple_feature1-feature2-feature3_0`, `add_one_multiple_feature1-feature2-feature3_1`, and `add_one_multiple_feature1-feature2-feature3_2`.
100+
The output column generated by the transformation function follows a naming convention structured as `functionName_features_outputColumnNumber`. For instance, for the function named `add_one_multiple`, the output columns would be labeled as `add_one_multiple_feature1_feature2_feature3_0`, `add_one_multiple_feature1_feature2_feature3_1`, and `add_one_multiple_feature1_feature2_feature3_2`.
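As a rough illustration of the `functionName_features_outputColumnNumber` convention, the names in the example above can be reproduced with a few lines of Python. `output_column_names` is a hypothetical helper written for this sketch, not a Hopsworks function, and it assumes the feature names are joined with underscores as in the updated example.

```python
def output_column_names(function_name, feature_names, n_outputs):
    """Hypothetical helper: build functionName_features_outputColumnNumber names."""
    # Assumption: input feature names are joined with underscores.
    features = "_".join(feature_names)
    return [f"{function_name}_{features}_{i}" for i in range(n_outputs)]


names = output_column_names(
    "add_one_multiple", ["feature1", "feature2", "feature3"], 3
)
print(names)
```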
101101

102102
## Apply transformation functions to features
103103

104-
Transformation functions can be attached to a feature view as a list. Each transformation function can specify which features are to be used by explicitly providing their names as arguments. If no feature names are provided explicitly, the transformation function will default to using the features from the feature view whose names match the transformation function's arguments. The transformation functions are then applied when you [read training data](./training-data.md#read-training-data), [read batch data](./batch-data.md#creation-with-transformation), or [get feature vectors](./feature-vectors.md#retrieval-with-transformation). By default, all features provided as input to a transformation function are dropped when training data, batch data, or feature vectors are created.
104+
Transformation functions can be attached to a feature view as a list. Each transformation function can specify which features are to be used by explicitly providing their names as arguments. If no feature names are provided explicitly, the transformation function will default to using the features from the feature view whose names match the transformation function's arguments. The transformation functions are then applied when you [read training data](./training-data.md#read-training-data), [read batch data](./batch-data.md#creation-with-transformation), or [get feature vectors](./feature-vectors.md#retrieval-with-transformation). The generated data includes both transformed and untransformed features in a DataFrame. The transformed features are organized by their output column names and are positioned after the untransformed features. By default, all features provided as input to a transformation function are dropped when training data, batch data, or feature vectors are created.
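The selection and layout rules in this paragraph can be modelled with a small standalone sketch. Everything here is an assumption made for illustration: `resolve_input_features`, `output_layout`, and the output name `add_one_out` are hypothetical and do not reflect how Hopsworks implements these steps internally.

```python
import inspect


def resolve_input_features(func, available_features, explicit=None):
    """Hypothetical helper: choose the features a transformation function reads."""
    if explicit:
        return list(explicit)
    # Fall back to the function's own argument names, skipping `statistics`.
    arg_names = [n for n in inspect.signature(func).parameters if n != "statistics"]
    missing = [n for n in arg_names if n not in available_features]
    if missing:
        raise ValueError(f"No feature named {missing} in the feature view")
    return arg_names


def output_layout(available_features, input_features, output_columns, drop_inputs=True):
    """Hypothetical helper: untransformed columns first, transformed columns last."""
    kept = [f for f in available_features if not (drop_inputs and f in input_features)]
    return kept + list(output_columns)


def add_one(amount):
    return amount + 1


features = ["amount", "age", "category"]
inputs = resolve_input_features(add_one, features)         # matched by argument name
layout = output_layout(features, inputs, ["add_one_out"])  # input `amount` is dropped
```

Passing `explicit=["age"]` would instead bind the function to `age`, mirroring the case where feature names are provided explicitly.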
105105

106106
=== "Python"
107107

@@ -157,7 +157,7 @@ Built-in transformation functions are attached in the same way. The only differe
157157
query=query,
158158
labels=["fraud_label"],
159159
transformation_functions = [
160-
label_encoder("category": ),
160+
label_encoder("category"),
161161
robust_scaler("amount"),
162162
min_max_scaler("loc_delta"),
163163
standard_scaler("age_at_transaction")
