Commit 6f859e6

Fix feature-store-api and machine-learning-api links (#419)
1 parent 7ce6da4

24 files changed: +183 −225 lines

docs/js/inject-api-links.js (+6 −11)
@@ -1,15 +1,12 @@
 window.addEventListener("DOMContentLoaded", function () {
 var windowPathNameSplits = window.location.pathname.split("/");
-var majorVersionRegex = new RegExp("(\\d+[.]\\d+)")
-var latestRegex = new RegExp("latest")
-if (majorVersionRegex.test(windowPathNameSplits[1])) { // On landing page docs.hopsworks.api/3.0 - URL contains major version
+var majorVersionRegex = new RegExp("(\\d+[.]\\d+)");
+var latestRegex = new RegExp("latest");
+if (majorVersionRegex.test(windowPathNameSplits[1])) { // On landing page docs.hopsworks.api/4.0 - URL contains major version
 // Version API dropdown
 document.getElementById("hopsworks_api_link").href = "https://docs.hopsworks.ai/hopsworks-api/" + windowPathNameSplits[1] + "/generated/api/login/";
-document.getElementById("hsfs_api_link").href = "https://docs.hopsworks.ai/feature-store-api/" + windowPathNameSplits[1] + "/generated/api/connection_api/";
-document.getElementById("hsfs_javadoc_link").href = "https://docs.hopsworks.ai/feature-store-api/" + windowPathNameSplits[1] + "/javadoc";
-document.getElementById("hsml_api_link").href = "https://docs.hopsworks.ai/machine-learning-api/" + windowPathNameSplits[1] + "/generated/connection_api/";
-} else { // on docs.hopsworks.api/feature-store-api/3.0 / docs.hopsworks.api/hopsworks-api/3.0 / docs.hopsworks.api/machine-learning-api/3.0
-
+document.getElementById("hsfs_javadoc_link").href = "https://docs.hopsworks.ai/hopsworks-api/" + windowPathNameSplits[1] + "/javadoc";
+} else { // on / docs.hopsworks.api/hopsworks-api/4.0
 if (latestRegex.test(windowPathNameSplits[2]) || latestRegex.test(windowPathNameSplits[1])) {
 var majorVersion = "latest";
 } else {
@@ -26,8 +23,6 @@ window.addEventListener("DOMContentLoaded", function () {
 document.getElementsByClassName("md-tabs__link")[6].href = "https://docs.hopsworks.ai/" + majorVersion + "/admin/";
 // Version API dropdown
 document.getElementById("hopsworks_api_link").href = "https://docs.hopsworks.ai/hopsworks-api/" + majorVersion + "/generated/api/login/";
-document.getElementById("hsfs_api_link").href = "https://docs.hopsworks.ai/feature-store-api/" + majorVersion + "/generated/api/connection_api/";
-document.getElementById("hsfs_javadoc_link").href = "https://docs.hopsworks.ai/feature-store-api/" + majorVersion + "/javadoc";
-document.getElementById("hsml_api_link").href = "https://docs.hopsworks.ai/machine-learning-api/" + majorVersion + "/generated/connection_api/";
+document.getElementById("hsfs_javadoc_link").href = "https://docs.hopsworks.ai/hopsworks-api/" + majorVersion + "/javadoc";
 }
 });

docs/reference_guides/index.md (−37)
This file was deleted.

docs/setup_installation/common/arrow_flight_duckdb.md (+15 −15)
@@ -3,16 +3,16 @@ By default, Hopsworks uses big data technologies (Spark or Hive) to create train
 This is great for large datasets, but for small or moderately sized datasets (think of the size of data that would fit in a Pandas
 DataFrame in your local Python environment), the overhead of starting a Spark or Hive job and doing distributed data processing can be significant.
 
-ArrowFlight Server with DuckDB significantly reduces the time that Python clients need to read feature groups
+ArrowFlight Server with DuckDB significantly reduces the time that Python clients need to read feature groups
 and batch inference data from the Feature Store, as well as creating moderately-sized in-memory training datasets.
 
 When the service is enabled, clients will automatically use it for the following operations:
 
-- [reading Feature Groups](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#read)
-- [reading Queries](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/query_api/#read)
-- [reading Training Datasets](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_training_data)
-- [creating In-Memory Training Datasets](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#training_data)
-- [reading Batch Inference Data](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_batch_data)
+- [reading Feature Groups](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#read)
+- [reading Queries](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/query_api/#read)
+- [reading Training Datasets](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_training_data)
+- [creating In-Memory Training Datasets](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#training_data)
+- [reading Batch Inference Data](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_batch_data)
 
 For larger datasets, clients can still make use of the Spark/Hive backend by explicitly setting
 `read_options={"use_hive": True}`.
@@ -21,9 +21,9 @@ For larger datasets, clients can still make use of the Spark/Hive backend by exp
 
 !!! note
 Supported only on AWS at the moment.
-
+
 The ArrowFlight Server is co-located with RonDB in the Hopsworks cluster.
-If the ArrowFlight Server is activated, RonDB and ArrowFlight Server can each use up to 50%
+If the ArrowFlight Server is activated, RonDB and ArrowFlight Server can each use up to 50%
 of the available resources on the node, so they can co-exist without impacting each other.
 Just like RonDB, the ArrowFlight Server can be replicated across multiple nodes to serve more clients at lower latency.
 To guarantee high performance, each individual ArrowFlight Server instance processes client requests sequentially.
@@ -42,12 +42,12 @@ To deploy ArrowFlight Server on a cluster:
 2. Select an instance type with at least 16GB of memory and 4 cores. (*)
 3. Tick the checkbox `Enable ArrowFlight Server`.
 
-(*) The service should have at least the 2x the amount of memory available that a typical Python client would have.
-Because RonDB and ArrowFlight Server share the same node we recommend selecting an instance type with at least 4x the
-client memory. For example, if the service serves Python clients with typically 4GB of memory,
-an instance with at least 16GB of memory should be selected.
-An instance with 16GB of memory will be able to read feature groups and training datasets of up to 10-100M rows,
-depending on the number of columns and size of the features (~2GB in parquet). The same instance will be able to create
-point-in-time correct training datasets with 1-10M rows, also depending on the number and the size of the features.
+(*) The service should have at least the 2x the amount of memory available that a typical Python client would have.
+Because RonDB and ArrowFlight Server share the same node we recommend selecting an instance type with at least 4x the
+client memory. For example, if the service serves Python clients with typically 4GB of memory,
+an instance with at least 16GB of memory should be selected.
+An instance with 16GB of memory will be able to read feature groups and training datasets of up to 10-100M rows,
+depending on the number of columns and size of the features (~2GB in parquet). The same instance will be able to create
+point-in-time correct training datasets with 1-10M rows, also depending on the number and the size of the features.
 Larger instances are able to handle larger datasets. The numbers scale roughly linearly with the instance size.
 
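The sizing guidance and read-path fallback above can be sketched in Python. This is a minimal, hypothetical sketch: the helper names `build_read_options` and `recommended_instance_memory_gb` are illustrative and not part of the Hopsworks API; only the `read_options={"use_hive": True}` dictionary comes from the page itself.

```python
# Sketch only: helper names are illustrative, not from the Hopsworks API.

def build_read_options(dataset_is_large: bool) -> dict:
    # For larger datasets the page suggests bypassing ArrowFlight/DuckDB
    # and explicitly using the Spark/Hive backend.
    return {"use_hive": True} if dataset_is_large else {}

def recommended_instance_memory_gb(client_memory_gb: float) -> float:
    # Sizing rule from the page: RonDB and ArrowFlight Server share the
    # node, so pick an instance with at least 4x the typical client memory.
    return 4 * client_memory_gb

# Against a live cluster, usage would look roughly like:
#   import hopsworks
#   project = hopsworks.login()
#   fs = project.get_feature_store()
#   fg = fs.get_feature_group("transactions", version=1)
#   df = fg.read(read_options=build_read_options(dataset_is_large=True))
```

For a typical 4GB Python client, the 4x rule reproduces the 16GB instance recommendation in the text.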
docs/user_guides/fs/compute_engines.md (+10 −10)
@@ -20,15 +20,15 @@ Hopsworks is aiming to provide functional parity between the computational engin
 
 | Functionality | Method | Spark | Python | Flink | Beam | Comment |
 | ----------------------------------------------------------------- | ------ | ----- | ------ | ------ | ------ | ------- |
-| Feature Group Creation from dataframes | [`FeatureGroup.create_feature_group()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#create_feature_group) | :white_check_mark: | :white_check_mark: | - | - | Currently Flink/Beam doesn't support registering feature group metadata. Thus it needs to be pre-registered before you can write real time features computed by Flink/Beam.|
-| Training Dataset Creation from dataframes | [`TrainingDataset.save()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/training_dataset_api/#save) | :white_check_mark: | - | - | - | Functionality was deprecated in version 3.0 |
-| Data validation using Great Expectations for streaming dataframes | [`FeatureGroup.validate()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#validate) [`FeatureGroup.insert_stream()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#insert_stream) | - | - | - | - | `insert_stream` does not perform any data validation even when a expectation suite is attached. |
-| Stream ingestion | [`FeatureGroup.insert_stream()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#insert_stream) | :white_check_mark: | - | :white_check_mark: | :white_check_mark: | Python/Pandas/Polars has currently no notion of streaming. |
-| Stream ingestion | [`FeatureGroup.insert_stream()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#insert_stream) | :white_check_mark: | - | :white_check_mark: | :white_check_mark: | Python/Pandas/Polars has currently no notion of streaming. |
-| Reading from Streaming Storage Connectors | [`KafkaConnector.read_stream()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/storage_connector_api/#read_stream) | :white_check_mark: | - | - | - | Python/Pandas/Polars has currently no notion of streaming. For Flink/Beam only write operations are supported |
-| Reading training data from external storage other than S3 | [`FeatureView.get_training_data()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_training_data) | :white_check_mark: | - | - | - | Reading training data that was written to external storage using a Storage Connector other than S3 can currently not be read using HSFS APIs, instead you will have to use the storage's native client. |
-| Reading External Feature Groups into Dataframe | [`ExternalFeatureGroup.read()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/external_feature_group_api/#read) | :white_check_mark: | - | - | - | Reading an External Feature Group directly into a Pandas/Polars Dataframe is not supported, however, you can use the [Query API](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/query_api/) to create Feature Views/Training Data containing External Feature Groups. |
-| Read Queries containing External Feature Groups into Dataframe | [`Query.read()`](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/query_api/#read) | :white_check_mark: | - | - | - | Reading a Query containing an External Feature Group directly into a Pandas/Polars Dataframe is not supported, however, you can use the Query to create Feature Views/Training Data and write the data to a Storage Connector, from where you can read up the data into a Pandas/Polars Dataframe. |
+| Feature Group Creation from dataframes | [`FeatureGroup.create_feature_group()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#create_feature_group) | :white_check_mark: | :white_check_mark: | - | - | Currently Flink/Beam doesn't support registering feature group metadata. Thus it needs to be pre-registered before you can write real time features computed by Flink/Beam.|
+| Training Dataset Creation from dataframes | [`TrainingDataset.save()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/training_dataset_api/#save) | :white_check_mark: | - | - | - | Functionality was deprecated in version 3.0 |
+| Data validation using Great Expectations for streaming dataframes | [`FeatureGroup.validate()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#validate) [`FeatureGroup.insert_stream()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#insert_stream) | - | - | - | - | `insert_stream` does not perform any data validation even when a expectation suite is attached. |
+| Stream ingestion | [`FeatureGroup.insert_stream()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#insert_stream) | :white_check_mark: | - | :white_check_mark: | :white_check_mark: | Python/Pandas/Polars has currently no notion of streaming. |
+| Stream ingestion | [`FeatureGroup.insert_stream()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#insert_stream) | :white_check_mark: | - | :white_check_mark: | :white_check_mark: | Python/Pandas/Polars has currently no notion of streaming. |
+| Reading from Streaming Storage Connectors | [`KafkaConnector.read_stream()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/storage_connector_api/#read_stream) | :white_check_mark: | - | - | - | Python/Pandas/Polars has currently no notion of streaming. For Flink/Beam only write operations are supported |
+| Reading training data from external storage other than S3 | [`FeatureView.get_training_data()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_training_data) | :white_check_mark: | - | - | - | Reading training data that was written to external storage using a Storage Connector other than S3 can currently not be read using HSFS APIs, instead you will have to use the storage's native client. |
+| Reading External Feature Groups into Dataframe | [`ExternalFeatureGroup.read()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/external_feature_group_api/#read) | :white_check_mark: | - | - | - | Reading an External Feature Group directly into a Pandas/Polars Dataframe is not supported, however, you can use the [Query API](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/query_api/) to create Feature Views/Training Data containing External Feature Groups. |
+| Read Queries containing External Feature Groups into Dataframe | [`Query.read()`](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/query_api/#read) | :white_check_mark: | - | - | - | Reading a Query containing an External Feature Group directly into a Pandas/Polars Dataframe is not supported, however, you can use the Query to create Feature Views/Training Data and write the data to a Storage Connector, from where you can read up the data into a Pandas/Polars Dataframe. |
 
 ## Python
 
@@ -64,7 +64,7 @@ Connecting to the Feature Store from an external Flink cluster, such as GCP Data
 
 ### Inside Hopsworks
 
-Beam is only supported as an external client.
+Beam is only supported as an external client.
 
 ### Outside Hopsworks
 
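The engine-parity table in this file states that stream ingestion via `FeatureGroup.insert_stream()` is available from Spark, Flink, and Beam, while Python clients have no notion of streaming. A toy dispatch sketch under that reading (the helper `pick_insert_method` is hypothetical, not part of any Hopsworks library; `FeatureGroup.insert()` as the Python batch path is an assumption, not stated in this hunk):

```python
# Sketch of the engine-support table above; names here are illustrative.
SUPPORTED_STREAM_ENGINES = {"spark", "flink", "beam"}

def pick_insert_method(engine: str) -> str:
    """Return which write API the parity table allows for an engine."""
    engine = engine.lower()
    if engine in SUPPORTED_STREAM_ENGINES:
        return "FeatureGroup.insert_stream()"
    if engine == "python":
        # Python/Pandas/Polars currently has no notion of streaming,
        # so only batch writes apply (assumed to be insert()).
        return "FeatureGroup.insert()"
    raise ValueError(f"unsupported engine: {engine}")
```

Per the table's comment column, `insert_stream` also skips Great Expectations validation even when an expectation suite is attached.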