docs/user_guides/fs/compute_engines.md

Hopsworks Feature Store APIs are built around dataframes, which means feature data is inserted into the Feature Store from a Dataframe and, likewise, data read from the Feature Store is returned as a Dataframe.

As such, Hopsworks supports five computational engines:

1. [Apache Spark](https://spark.apache.org): Spark Dataframes and Spark Structured Streaming Dataframes are supported, both from Python environments (PySpark) and from Scala environments.
2. [Python](https://www.python.org/): For pure Python environments without dependencies on Spark, Hopsworks supports [Pandas Dataframes](https://pandas.pydata.org/) and [Polars Dataframes](https://pola.rs/).
3. [Apache Flink](https://flink.apache.org): Flink Data Streams are currently supported as an experimental feature from Java/Scala environments.
4. [Apache Beam](https://beam.apache.org/) *experimental*: Beam Data Streams are currently supported as an experimental feature from Java/Scala environments.
5. [Java](https://www.java.com): For pure Java environments without dependencies on Spark, Hopsworks supports writing using a List of POJO objects.

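To illustrate the dataframe-centric workflow, below is a minimal Python sketch of writing a Pandas Dataframe to a feature group and reading it back. The feature group name, column names, and primary key are illustrative assumptions, not part of this guide.

```python
import pandas as pd

import hopsworks

# Log in to Hopsworks and get a handle to the project's feature store.
project = hopsworks.login()
fs = project.get_feature_store()

# Example feature data as a Pandas Dataframe (names are illustrative).
df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "total_spend": [120.5, 33.0, 78.9],
})

# Create the feature group on first use, or fetch it if it already exists.
fg = fs.get_or_create_feature_group(
    name="customer_spend",
    version=1,
    primary_key=["customer_id"],
)

# Writing takes a dataframe ...
fg.insert(df)

# ... and reading returns a dataframe again.
features_df = fg.read()
```
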
Hopsworks supports running [compute on the platform itself](../../concepts/dev/inside.md) in the form of [Jobs](../projects/jobs/pyspark_job.md) or in [Jupyter Notebooks](../projects/jupyter/python_notebook.md).
Alternatively, you can also connect to Hopsworks using Python or Spark from [external environments](../../concepts/dev/outside.md), given that there is network connectivity.

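As a minimal sketch of connecting from an external Python environment, the snippet below logs in with an API key; the host, project name, and API key value are placeholders for your own deployment.

```python
import hopsworks

# Connect to a Hopsworks cluster from an external Python environment.
# Host, project, and API key below are placeholders, not real values.
project = hopsworks.login(
    host="my-cluster.hopsworks.ai",
    project="my_project",
    api_key_value="MY_API_KEY",
)
fs = project.get_feature_store()
```
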
## Functionality Support
Hopsworks aims to provide functional parity between the computational engines; however, certain Hopsworks functionalities are exclusive to particular engines.

| Functionality | Method | Spark | Python | Flink | Beam | Java | Comment |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Feature Group Creation from dataframes | [`FeatureGroup.create_feature_group()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#create_feature_group) | :white_check_mark: | :white_check_mark: | - | - | - | Flink/Beam/Java currently do not support registering feature group metadata. The feature group therefore needs to be pre-registered before you can write real-time features computed by Flink/Beam. |
| Training Dataset Creation from dataframes | [`TrainingDataset.save()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/training_dataset_api/#save) | :white_check_mark: | - | - | - | - | Functionality was deprecated in version 3.0. |
| Data validation using Great Expectations for streaming dataframes | [`FeatureGroup.validate()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#validate) <br/> [`FeatureGroup.insert_stream()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#insert_stream) | - | - | - | - | - | `insert_stream` does not perform any data validation, even when an expectation suite is attached. |
| Stream ingestion | [`FeatureGroup.insert_stream()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#insert_stream) | :white_check_mark: | - | :white_check_mark: | :white_check_mark: | :white_check_mark: | Python/Pandas/Polars currently has no notion of streaming. |
| Reading from Streaming Storage Connectors | [`KafkaConnector.read_stream()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/storage_connector_api/#read_stream) | :white_check_mark: | - | - | - | - | Python/Pandas/Polars currently has no notion of streaming. For Flink/Beam/Java, only write operations are supported. |
| Reading training data from external storage other than S3 | [`FeatureView.get_training_data()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_training_data) | :white_check_mark: | - | - | - | - | Training data written to external storage using a Storage Connector other than S3 cannot currently be read using HSFS APIs; instead, you will have to use the storage's native client. |
| Reading External Feature Groups into Dataframe | [`ExternalFeatureGroup.read()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/external_feature_group_api/#read) | :white_check_mark: | - | - | - | - | Reading an External Feature Group directly into a Pandas/Polars Dataframe is not supported; however, you can use the [Query API](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/query_api/) to create Feature Views/Training Data containing External Feature Groups. |
| Read Queries containing External Feature Groups into Dataframe | [`Query.read()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/query_api/#read) | :white_check_mark: | - | - | - | - | Reading a Query containing an External Feature Group directly into a Pandas/Polars Dataframe is not supported; however, you can use the Query to create Feature Views/Training Data and write the data to a Storage Connector, from where you can read the data into a Pandas/Polars Dataframe. |

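For example, since an External Feature Group cannot be read directly into a Pandas/Polars Dataframe, one pattern consistent with the table above is to select its features through the Query API and materialize them via a Feature View. The feature group and feature view names below are hypothetical, and the snippet is a sketch rather than a definitive recipe.

```python
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()

# Hypothetical external feature group backed by an external data source.
ext_fg = fs.get_external_feature_group("customer_profiles", version=1)

# Build a query over its features instead of calling ext_fg.read() directly.
query = ext_fg.select_all()

# Materialize the query through a feature view and read training data from it.
fv = fs.get_or_create_feature_view(
    name="customer_profiles_view",
    version=1,
    query=query,
)
features_df, _ = fv.training_data()
```
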
## Python
## Java
It is also possible to interact with the Hopsworks feature store using pure Java environments, without dependencies on Spark, Flink or Beam.

For more details head over to the [Getting Started Guide](https://github.com/logicalclocks/hopsworks-tutorials/tree/master/java).