| 1 | +# 4.0 Migration Guide |
| 2 | + |
| 3 | +## Breaking Changes |
| 4 | + |
| 5 | +With the release of Hopsworks 4.0, a number of necessary breaking |
| 6 | +changes have been put in place to improve the overall experience of |
| 7 | +using the Hopsworks platform. These breaking changes can be categorized |
| 8 | +in the following areas: |
| 9 | + |
| 10 | +- Python API |
| 11 | + |
| 12 | +- Multi-Environment Docker Images |
| 13 | + |
| 14 | +- On-Demand Transformation Functions |
| 15 | + |
| 16 | +### Python API |
| 17 | + |
| 18 | +A number of significant changes have been made to the Python API in |
| 19 | +Hopsworks 4.0. Previously, in Hopsworks 3.X, three Python libraries |
| 20 | +(“hopsworks”, “hsfs” & “hsml”) were used to develop feature, training |
| 21 | +& inference pipelines. With the 4.0 release, there is now a single |
| 22 | +“hopsworks” Python library that should be used instead. For |
| 23 | +backwards compatibility, it is still possible to import both |
| 24 | +the “hsfs” & “hsml” libraries, but these are now effectively aliases |
| 25 | +to the “hopsworks” Python library and their use going forward should |
| 26 | +be considered deprecated. |
| 27 | + |
| 28 | +Another significant change in the Hopsworks Python API is the use of |
| 29 | +optional extras, which allow a developer to install exactly what is |
| 30 | +needed for their work. The main ones are great-expectations and |
| 31 | +polars. It is arguable whether this is a breaking change, but it is |
| 32 | +important to note: a pipeline that relies on one of these packages |
| 33 | +without installing the matching extra may encounter a problem when |
| 34 | +executed with Hopsworks 4.0. |
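
One way to catch a missing extra early is to check, before a pipeline starts, that the optional modules it needs can be found. The sketch below uses only the Python standard library. The helper name `missing_optional_modules` and the module-to-extra mapping are illustrative assumptions, not part of the Hopsworks API.

```python
import importlib.util

# Hypothetical mapping from importable module name to the extra that provides it.
# The extra names follow the ones mentioned above; verify them against your install.
OPTIONAL_MODULES = {
    "great_expectations": "great-expectations",
    "polars": "polars",
}


def missing_optional_modules(modules=OPTIONAL_MODULES):
    """Return the names of modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_optional_modules():
        extra = OPTIONAL_MODULES[name]
        print(f"{name} is not installed; the '{extra}' extra may be needed")
```

If a module is reported missing, installing the corresponding extra (for example `pip install "hopsworks[polars]"`, assuming that is the extra name in your version) should resolve it.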
| 35 | + |
| 36 | +Finally, there are a number of relatively small breaking changes and |
| 37 | +deprecated methods to improve the developer experience. These include: |
| 38 | + |
| 39 | +- `connection.init()` is now considered deprecated. |
| 40 | + |
| 41 | +- When loading `arrow_flight_client`, an `OptionalDependencyNotFoundError` can now be thrown, providing more detailed information on the error than the `ModuleNotFoundError` thrown in 3.X. |
| 42 | + |
| 43 | +- `DatasetApi`'s `zip` and `unzip` now return `False` when a timeout is exceeded, instead of throwing an `Exception` as they did previously. |
| 44 | + |
| 45 | + |
| 46 | +### Multi-Environment Docker Images |
| 47 | + |
| 48 | +As part of the Hopsworks 4.0 release, an engineering team using |
| 49 | +Hopsworks can now customize the Docker images that they use for their |
| 50 | +feature, training and inference (FTI) pipelines. Adding this |
| 51 | +flexibility made a set of breaking changes necessary. Instead of one |
| 52 | +common Docker image for all FTI pipelines, the 4.0 release provides a |
| 53 | +number of specific Docker images, so that an engineering team using |
| 54 | +Hopsworks can install exactly what they need to get their feature, |
| 55 | +training and inference pipelines up and running. This breaking change |
| 56 | +requires existing customers running Hopsworks 3.X to test their |
| 57 | +existing pipelines using Hopsworks 4.0 before upgrading their |
| 58 | +production environments. |
| 59 | + |
| 60 | + |
| 61 | +### On-Demand Transformation Functions |
| 62 | + |
| 63 | +A number of changes have been made to transformation functions in the |
| 64 | +last releases of Hopsworks. With 4.0, On-Demand Transformation Functions |
| 65 | +are now better supported, which has resulted in some breaking changes. |
| 66 | +The following shows how transformation functions were used in previous |
| 67 | +versions of Hopsworks and how transformation functions are used |
| 68 | +in the 4.0 release. |
| 69 | + |
| 70 | + |
| 71 | +=== "Pre-4.0" |
| 72 | +    ```python |
| 73 | +    ################################################# |
| 74 | +    # Creating transformation function Hopsworks 3.8# |
| 75 | +    ################################################# |
| 76 | + |
| 77 | +    # Define custom transformation function |
| 78 | +    def add_one(feature): |
| 79 | +        return feature + 1 |
| 80 | + |
| 81 | +    # Create transformation function (`fs` is the feature store handle) |
| 82 | +    add_one = fs.create_transformation_function(add_one, |
| 83 | +        output_type=int, |
| 84 | +        version=1, |
| 85 | +    ) |
| 86 | + |
| 87 | +    # Save transformation function |
| 88 | +    add_one.save() |
| 89 | + |
| 90 | +    # Retrieve the saved transformation function for use in the feature view |
| 91 | +    add_one = fs.get_transformation_function( |
| 92 | +        name="add_one", |
| 93 | +        version=1, |
| 94 | +    ) |
| 95 | + |
| 96 | +    # Create feature view |
| 97 | +    feature_view = fs.get_or_create_feature_view( |
| 98 | +        name='serving_fv', |
| 99 | +        version=1, |
| 100 | +        query=selected_features, |
| 101 | +        # Apply your custom transformation functions to the feature `feature_1` |
| 102 | +        transformation_functions={ |
| 103 | +            "feature_1": add_one, |
| 104 | +        }, |
| 105 | +        labels=['target'], |
| 106 | +    ) |
| 107 | +    ``` |
| 108 | + |
| 109 | +=== "4.0" |
| 110 | +    ```python |
| 111 | +    ################################################# |
| 112 | +    # Creating transformation function Hopsworks 4.0# |
| 113 | +    ################################################# |
| 114 | + |
| 115 | +    # Define custom transformation function (assumes `import hopsworks`) |
| 116 | +    @hopsworks.udf(int) |
| 117 | +    def add_one(feature): |
| 118 | +        return feature + 1 |
| 119 | + |
| 120 | +    # Create feature view |
| 121 | +    feature_view = fs.get_or_create_feature_view( |
| 122 | +        name='serving_fv', |
| 123 | +        version=1, |
| 124 | +        query=selected_features, |
| 125 | +        # Apply the custom transformation function defined above to the feature `feature_1` |
| 126 | +        transformation_functions=[ |
| 127 | +            add_one("feature_1"), |
| 128 | +        ], |
| 129 | +        labels=['target'], |
| 130 | +    ) |
| 131 | +    ``` |
| 132 | + |
| 133 | +Note that the number of lines of code required has been significantly |
| 134 | +reduced by using the “@hopsworks.udf” Python decorator. |