diff --git a/docs/user_guides/fs/feature_group/on_demand_transformations.md b/docs/user_guides/fs/feature_group/on_demand_transformations.md index 269bbf38..81efb413 100644 --- a/docs/user_guides/fs/feature_group/on_demand_transformations.md +++ b/docs/user_guides/fs/feature_group/on_demand_transformations.md @@ -5,17 +5,13 @@ ## On Demand Transformation Function Creation -An on-demand transformation function may be created by associating a [transformation function](../transformation_functions.md) with a feature group. Each on-demand transformation function generates a single on-demand feature, which, by default, is assigned the same name as the associated transformation function. For instance, in the example below, the on-demand transformation function `transaction_age` produces an on-demand feature named transaction_age. Alternatively, the name of the resulting on-demand feature can be explicitly defined using the [`alias`](../transformation_functions.md#specifying-output-features–names-for-transformation-functions) function. - -It is important to note that only one-to-one or many-to-one transformation functions are compatible with the creation of on-demand transformation functions. +An on-demand transformation function may be created by associating a [transformation function](../transformation_functions.md) with a feature group. Each on-demand transformation function can generate one or multiple on-demand features. If the on-demand transformation function returns a single feature, it is automatically assigned the same name as the transformation function. However, if it returns multiple features, they are by default named using the format `functionName_outputColumnNumber`. For instance, in the example below, the on-demand transformation function `transaction_age` produces an on-demand feature named `transaction_age`, and the on-demand transformation function `stripped_strings` produces the on-demand features named `stripped_strings_0` and `stripped_strings_1`. 
Alternatively, the name of the resulting on-demand feature can be explicitly defined using the [`alias`](../transformation_functions.md#specifying-output-features-names-for-transformation-functions) function. !!! warning "On-demand transformation" All on-demand transformation functions attached to a feature group must have unique names and, in contrast to model-dependent transformations, they do not have access to training dataset statistics. Each on-demand transformation function can map specific features to its arguments by explicitly providing their names as arguments to the transformation function. If no feature names are provided, the transformation function will default to using features that match the name of the transformation function's argument. - - === "Python" !!! example "Creating on-demand transformation functions." ```python @@ -24,6 +20,10 @@ Each on-demand transformation function can map specific features to its argument def transaction_age(transaction_date, current_date): return (current_date - transaction_date).dt.days + @hopsworks.udf(return_type=[str, str]) + def stripped_strings(country, city): + return country.str.strip(), city.str.strip() + # Attach transformation function to feature group to create on-demand transformation function. 
fg = feature_store.create_feature_group(name="fg_transactions", version=1, @@ -31,7 +31,7 @@ Each on-demand transformation function can map specific features to its argument online_enabled=True, primary_key=['id'], event_time='event_time', - transformation_functions=[transaction_age] + transformation_functions=[transaction_age, stripped_strings] ) ``` diff --git a/docs/user_guides/fs/feature_view/batch-data.md b/docs/user_guides/fs/feature_view/batch-data.md index 307a66ef..5f1fd9a7 100644 --- a/docs/user_guides/fs/feature_view/batch-data.md +++ b/docs/user_guides/fs/feature_view/batch-data.md @@ -53,4 +53,17 @@ If you have specified transformation functions when creating a feature view, you feature_view.init_batch_scoring(training_dataset_version=1) ``` -It is important to note that in addition to the filters defined in feature view, [extra filters](./training-data.md#Extra-filters) will be applied if they are defined in the given training dataset version. \ No newline at end of file +It is important to note that in addition to the filters defined in feature view, [extra filters](./training-data.md#Extra-filters) will be applied if they are defined in the given training dataset version. + + +## Passing Context Variables to Transformation Functions +After [defining a transformation function using a context variable](../transformation_functions.md#passing-context-variables-to-transformation-function), you can pass the necessary context variables through the `transformation_context` parameter when fetching batch data. + + +=== "Python" + !!! example "Passing context variables while fetching batch data." + ```python + # Passing context variables when fetching batch data. 
+ batch_data = feature_view.get_batch_data(transformation_context={"context_parameter":10}) + + ``` \ No newline at end of file diff --git a/docs/user_guides/fs/feature_view/feature-vectors.md b/docs/user_guides/fs/feature_view/feature-vectors.md index f7f96679..ed7f46b1 100644 --- a/docs/user_guides/fs/feature_view/feature-vectors.md +++ b/docs/user_guides/fs/feature_view/feature-vectors.md @@ -191,6 +191,19 @@ You can also use the parameter to provide values for all the features which are ) ``` +## Passing Context Variables to Transformation Functions +After [defining a transformation function using a context variable](../transformation_functions.md#passing-context-variables-to-transformation-function), you can pass the required context variables using the `transformation_context` parameter when fetching the feature vectors. + +=== "Python" + !!! example "Passing context variables while fetching feature vectors." + ```python + # Passing context variables when fetching feature vectors. + feature_vectors = feature_view.get_feature_vectors( + entry = [{ "pk1": 1 }], + transformation_context={"context_parameter":10} + ) + ``` + ## Choose the right Client The Online Store can be accessed via the **Python** or **Java** client allowing you to use your language of choice to connect to the Online Store. Additionally, the Python client provides two different implementations to fetch data: **SQL** or **REST**. The SQL client is the default implementation. It requires a direct SQL connection to your RonDB cluster and uses python asyncio to offer high performance even when your Feature View rows involve querying multiple different tables. The REST client is an alternative implementation connecting to [RonDB Feature Vector Server](./feature-server.md). Perfect if you want to avoid exposing ports of your database cluster directly to clients. This implementation is available as of Hopsworks 3.7. 
diff --git a/docs/user_guides/fs/feature_view/model-dependent-transformations.md b/docs/user_guides/fs/feature_view/model-dependent-transformations.md index 1a81533c..314bd14e 100644 --- a/docs/user_guides/fs/feature_view/model-dependent-transformations.md +++ b/docs/user_guides/fs/feature_view/model-dependent-transformations.md @@ -93,14 +93,14 @@ To attach built-in transformation functions from the `hopsworks` module they can !!! example "Creating model-dependent transformation using built-in transformation functions imported from hopsworks" ```python - from hopsworks.builtin_transformations import min_max_scaler, label_encoder, robust_scaler, standard_scaler + from hopsworks.hsfs.builtin_transformations import min_max_scaler, label_encoder, robust_scaler, standard_scaler feature_view = fs.create_feature_view( name='transactions_view', query=query, labels=["fraud_label"], transformation_functions = [ - label_encoder("category": ), + label_encoder("category"), robust_scaler("amount"), min_max_scaler("loc_delta"), standard_scaler("age_at_transaction") diff --git a/docs/user_guides/fs/feature_view/training-data.md b/docs/user_guides/fs/feature_view/training-data.md index acb648eb..e5692cd0 100644 --- a/docs/user_guides/fs/feature_view/training-data.md +++ b/docs/user_guides/fs/feature_view/training-data.md @@ -94,6 +94,30 @@ X_train, X_test, y_train, y_test = feature_view.get_train_test_split(training_da X_train, X_val, X_test, y_train, y_val, y_test = feature_view.get_train_validation_test_split(training_dataset_version=1) ``` +## Passing Context Variables to Transformation Functions +Once you have [defined a transformation function using a context variable](../transformation_functions.md#passing-context-variables-to-transformation-function), you can pass the required context variables using the `transformation_context` parameter when generating IN-MEMORY training data or materializing a training dataset. + +!!! 
note + Passing context variables for materializing a training dataset is only supported in the PySpark Kernel. + + +=== "Python" + !!! example "Passing context variables while creating training data." + ```python + # Passing context variable to IN-MEMORY Training Dataset. + X_train, X_test, y_train, y_test = feature_view.get_train_test_split(training_dataset_version=1, + primary_key=True, + event_time=True, + transformation_context={"context_parameter":10}) + + # Passing context variable to Materialized Training Dataset. + version, job = feature_view.create_train_test_split(test_size=0.2, + transformation_context={"context_parameter":10}) + + ``` + ## Read training data with primary key(s) and event time For certain use cases, e.g. time series models, the input data needs to be sorted according to the primary key(s) and event time combination. Primary key(s) and event time are not usually included in the feature view query as they are not features used for training.
By including the context argument, you can pass the necessary data as a dictionary into the `context` argument of the transformation function during [training dataset creation](feature_view/training-data.md#passing-context-variables-to-transformation-functions), [feature vector retrieval](feature_view/feature-vectors.md#passing-context-variables-to-transformation-functions), or [batch data retrieval](feature_view/batch-data.md#passing-context-variables-to-transformation-functions). + + +=== "Python" + !!! example "Creation of a transformation function in Hopsworks that accepts context variables" + ```python + from hopsworks import udf + + @udf(int) + def add_features(argument1, context): + return argument1 + context["value_to_add"] + ``` + ## Saving to the Feature Store
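To make the context-variable pattern added in this patch concrete, here is a minimal plain-Python sketch. It involves no Hopsworks connection: `apply_transformation` is a hypothetical stand-in for how the framework forwards the `transformation_context` dictionary into the UDF's `context` argument, and the UDF body mirrors the `add_features` example above.

```python
# Minimal sketch of the context-variable pattern (no Hopsworks cluster needed).
# `apply_transformation` is a hypothetical stand-in for the framework call
# (e.g. get_batch_data(transformation_context=...)) that forwards the dict
# into the UDF's `context` keyword argument.

def add_features(argument1, context):
    # The context dict carries shared values supplied at call time.
    return argument1 + context["value_to_add"]

def apply_transformation(udf, rows, transformation_context):
    # The framework passes the same context dict to every invocation.
    return [udf(row, context=transformation_context) for row in rows]

result = apply_transformation(add_features, [1, 2, 3], {"value_to_add": 10})
# result == [11, 12, 13]
```

The real UDF receives feature values (pandas Series in batch execution), but the flow of the `transformation_context` dictionary is the same.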