Commit 6cf8170

[HWORKS-888] Improve documentation for gRPC support

1 parent edb6afd commit 6cf8170
8 files changed: +139 −19 lines changed
docs/concepts/mlops/kserve.md (+28 −17)
@@ -1,19 +1,30 @@
-KServe is an open-source framework for model serving on Kubernetes.
-In Hopsworks, you can easily deploy models from the model registry in KServe or in Docker containers (for Hopsworks Community). You can deploy models in either programs, using the HSML library, or in the UI. A KServe model deployment can include the following components:
-
-- **Transformer**
-- A pre-processing and post-processing component that can transform model inputs before predictions are made
-- **Predictor**
-- A predictor is a ML model in a Python object that takes a feature vector as input and returns a prediction as output
-- **Inference Logger**
-- Hopsworks logs inputs and outputs of transformers and predictors to a Kafka topic that is part of the same project as the model
-- **Inference Batcher**
-- Inference requests can be batched in different ways to adjust the trade-off between throughput and latencies of the model predictions
-- **Versioned Deployments**
-- Model deployments are versioned, enabling A/B testing and more.
-- **Istio Model Endpoint**
-- You can publish model via a REST Endpoint using Istio and access it over HTTP using a Hopsworks API key (with serving scope). Secure and authorized access is guaranteed by Hopsworks.
-
-Models deployed on KServe in Hopsworks can be easily integrated with the Hopsworks feature store using a Transformer Python script, that builds the predictor's input feature vector using the application input and pre-computed features from the feature store.
+In Hopsworks, you can easily deploy models from the model registry in KServe or in Docker containers (for Hopsworks Community). KServe is an open-source framework for model serving on Kubernetes. You can deploy models either in programs, using the HSML library, or in the UI. A KServe model deployment can include the following components:
+
+**`Transformer`**
+
+: A ^^pre-processing^^ and ^^post-processing^^ component that can transform model inputs before predictions are made, and predictions before they are delivered back to the client.
+
+
+**`Predictor`**
+
+: A predictor is an ML model in a Python object that takes a feature vector as input and returns a prediction as output.
+
+**`Inference Logger`**
+
+: Hopsworks logs inputs and outputs of transformers and predictors to a ^^Kafka topic^^ that is part of the same project as the model.
+
+**`Inference Batcher`**
+
+: Inference requests can be batched in different ways to adjust the trade-off between throughput and latency of the model predictions.
+
+**`Istio Model Endpoint`**
+
+: You can publish a model via a ^^REST or gRPC Endpoint^^ using Istio and access it using a Hopsworks API key (with serving scope). Secure and authorized access is guaranteed by Hopsworks.
+
+
+Models deployed on KServe in Hopsworks can be easily integrated with the Hopsworks Feature Store using a Transformer Python script that builds the predictor's input feature vector from the application input and pre-computed features from the Feature Store.
 
 <img src="../../../assets/images/concepts/mlops/kserve.svg">
+
+!!! info "Model Serving Guide"
+    More information can be found in the [Model Serving guide](../../../user_guides/mlops/serving/index.md).
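To make the Transformer component above concrete, here is a minimal sketch of such a transformer script. It assumes the `Transformer` class convention with `preprocess`/`postprocess` hooks and a hypothetical feature view named "my_feature_view"; the exact hsfs calls shown are illustrative, not taken from this commit.

```python
# Minimal transformer script sketch (names and calls are illustrative).
import hsfs


class Transformer:
    def __init__(self):
        # Connect to the feature store of the model's project.
        conn = hsfs.connection()
        fs = conn.get_feature_store()
        self.fv = fs.get_feature_view(name="my_feature_view", version=1)
        self.fv.init_serving(1)

    def preprocess(self, inputs):
        # Build the predictor's input feature vector from the application
        # input (here, a primary-key value) and pre-computed features.
        entry = {"id": inputs["instances"][0]}
        feature_vector = self.fv.get_feature_vector(entry)
        return {"instances": [feature_vector]}

    def postprocess(self, outputs):
        # Optionally transform predictions before returning them to the client.
        return outputs
```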

docs/index.md (+1 −1)
@@ -155,7 +155,7 @@ pointer-events: initial;
 <div class="rec_frame_main">
 <div class="text_and_icon">
 <div class="svg_icon w-embed"><img alt="svgImg" src="data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHg9IjBweCIgeT0iMHB4Igp3aWR0aD0iNjQiIGhlaWdodD0iNjQiCnZpZXdCb3g9IjAgMCA1MTIgNTEyIgpzdHlsZT0iIGZpbGw6I3VuZGVmaW5lZDsiPjxwYXRoIGQ9Ik01MyA0MTUuNjQ1YTMwLjAzNCAzMC4wMzQgMCAwMDMwIDMwSDQyOWEzMC4wMzQgMzAuMDM0IDAgMDAzMC0zMFYxNjcuNTEzSDUzek0zMDIuNjY3IDI2My41MzdhMTAgMTAgMCAxMTE0LjAwOC0xNC4yNzZsNDQuMTM3IDQzLjMxYTEwIDEwIDAgMDEwIDE0LjI3NmwtNDQuMTM3IDQzLjMwOWExMCAxMCAwIDExLTE0LjAwOC0xNC4yNzZsMzYuODYyLTM2LjE3MXptLTgwLjkgMTA5LjVsMzguMjU3LTE1MC41YTEwIDEwIDAgMTExOS4zODMgNC45MjdsLTM4LjI1NyAxNTAuNWExMCAxMCAwIDAxLTE5LjM4My00LjkyN3ptLTc3LjA3NC04MC40NjVsNDQuMTM3LTQzLjMxYTEwIDEwIDAgMTExNC4wMDggMTQuMjc2bC0zNi44NjIgMzYuMTcyIDM2Ljg2MiAzNi4xNzFhMTAgMTAgMCAwMS0xNC4wMDggMTQuMjc2bC00NC4xMzctNDMuMzA5YTEwIDEwIDAgMDEwLTE0LjI3NnpNNDI5IDY2LjM1NUg4M2EzMC4wMzQgMzAuMDM0IDAgMDAtMzAgMzB2NTEuMTU3SDQ1OVY5Ni4zNTVBMzAuMDM0IDMwLjAzNCAwIDAwNDI5IDY2LjM1NXoiPjwvcGF0aD48L3N2Zz4="></div>
-<div class="name_item_02"><a href="./concepts/projects/search/">Search</a>, <a href="./concepts/fs/feature_group/versioning/">Versioning</a>, <a href="./concepts/fs/feature_group/fg_statistics/">Statistics</a></div>
+<div class="name_item_02"><a href="./concepts/projects/search/">Search</a>, <a href="./concepts/fs/feature_group/versioning/">Versioning</a>, <a href="./concepts/fs/feature_group/fg_statistics/">Statistics</a>, <a href="./concepts/fs/feature_group/feature_monitoring/">Monitoring</a></div>
 </div>
 </div>
 <div class="rec_frame_main">
docs/user_guides/mlops/serving/api-protocol.md (new file, +96 lines)
# How to Select the API Protocol for a Deployment

## Introduction

Hopsworks supports both REST and gRPC as API protocols for sending inference requests to model deployments. While the REST API protocol is supported in all types of model deployments, gRPC support is only available for models served with [KServe](predictor.md#serving-tool).

!!! warning
    At the moment, the gRPC API protocol is only supported for **Python model deployments** (e.g., scikit-learn, xgboost, ...). Support for TensorFlow model deployments is coming soon.
## GUI

### Step 1: Create new deployment

If you have at least one model already trained and saved in the Model Registry, navigate to the deployments page by clicking on the `Deployments` tab in the navigation menu on the left.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/mlops/serving/deployments_tab_sidebar.png" alt="Deployments navigation tab">
<figcaption>Deployments navigation tab</figcaption>
</figure>
</p>

Once on the deployments page, click on `New deployment` if there are no existing deployments, or on `Create new deployment` at the top-right corner, to open the deployment creation form.

### Step 2: Go to advanced options

A simplified creation form will appear, including the most common deployment fields among all the available configuration options. The API protocol is part of the advanced options of a deployment. To open the advanced creation form, click on `Advanced options`.

<p align="center">
<figure>
<img style="max-width: 85%; margin: 0 auto" src="../../../../assets/images/guides/mlops/serving/deployment_simple_form_adv_options.png" alt="Advanced options">
<figcaption>Advanced options. Go to the advanced deployment creation form</figcaption>
</figure>
</p>

### Step 3: Select the API protocol

Enabling gRPC as the API protocol for a model deployment requires KServe as the serving platform for the deployment. Make sure that KServe is enabled by activating the corresponding checkbox.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/mlops/serving/deployment_adv_form_kserve.png" alt="KServe enabled in advanced deployment form">
<figcaption>Enable KServe in the advanced deployment form</figcaption>
</figure>
</p>

Then, you can select the API protocol to be enabled in your model deployment.

!!! note "Only one API protocol can be enabled at a time"
    Currently, KServe model deployments are limited to one API protocol at a time. Therefore, only one of the REST and gRPC API protocols can be enabled on a given model deployment.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/mlops/serving/deployment_grpc_select.png" alt="Select gRPC API protocol">
<figcaption>Select gRPC API protocol</figcaption>
</figure>
</p>

Once you are done with the changes, click on `Create new deployment` at the bottom of the page to create the deployment for your model.
## Code

### Step 1: Connect to Hopsworks

```python
import hopsworks

project = hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# get Hopsworks Model Serving handle
ms = project.get_model_serving()
```
### Step 2: Create a deployment with a specific API protocol

```python
my_model = mr.get_model("my_model", version=1)

my_predictor = ms.create_predictor(my_model,
                                   api_protocol="GRPC"  # defaults to "REST"
                                   )
my_predictor.deploy()

# or

my_deployment = ms.create_deployment(my_predictor)
my_deployment.save()
```
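Once the deployment is created, inference requests can be sent through the same deployment object regardless of the protocol selected. A minimal usage sketch, assuming the HSML `Deployment.start` and `Deployment.predict` methods and a hypothetical input payload:

```python
# Start the deployment created above and send a test inference request.
# The feature values below are hypothetical; with api_protocol="GRPC",
# HSML handles the underlying gRPC channel for you.
my_deployment.start()

predictions = my_deployment.predict(inputs=[[1.0, 2.0, 3.0]])
print(predictions)
```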
### API Reference

[API Protocol](https://docs.hopsworks.ai/machine-learning-api/{{{ hopsworks_version }}}/generated/api/api-protocol/)

docs/user_guides/mlops/serving/deployment.md (+5)
@@ -72,6 +72,7 @@ You will be redirected to a full-page deployment creation form where you can see
 3. [Inference logger](#inference-logger)
 4. [Inference batcher](#inference-batcher)
 5. [Resources](#resources)
+6. [API protocol](#api-protocol)
 
 Once you are done with the changes, click on `Create new deployment` at the bottom of the page to create the deployment for your model.
 

@@ -192,6 +193,10 @@ Inference batcher are deployment component that apply batching to the incoming i
 
 Resources include the number of replicas for the deployment as well as the resources (i.e., memory, CPU, GPU) to be allocated per replica. To learn about the different combinations available, see the [Resources Guide](resources.md).
 
+## API protocol
+
+Hopsworks supports both REST and gRPC as API protocols for sending inference requests to model deployments. To learn about REST and gRPC API protocols for model deployments, see the [API Protocol Guide](api-protocol.md).
+
 ## Conclusion
 
 In this guide you learned how to create a deployment.

docs/user_guides/mlops/serving/index.md (+1 −1)
@@ -2,7 +2,7 @@
 
 ## Deployment
 
-Assuming you have already created a model in the Model Registry, a deployment can now be created to prepare a model artifact for this model and make it accessible for running predictions behind a REST endpoint. Follow the [Deployment Creation Guide](deployment.md) to create a Deployment for your model.
+Assuming you have already created a model in the Model Registry, a deployment can now be created to prepare a model artifact for this model and make it accessible for running predictions behind a REST or gRPC endpoint. Follow the [Deployment Creation Guide](deployment.md) to create a Deployment for your model.
 
 ### Predictor
 
docs/user_guides/mlops/serving/predictor.md (+6)
@@ -17,6 +17,7 @@ Predictors are the main component of deployments. They are responsible for runni
 5. [Inference Logger](#inference-logger)
 6. [Inference Batcher](#inference-batcher)
 7. [Resources](#resources)
+8. [API protocol](#api-protocol)
 
 ## GUI
 
@@ -87,6 +88,7 @@ Additionally, you can adjust the default values of the rest of components:
 2. [Inference logger](#inference-logger)
 3. [Inference batcher](#inference-batcher)
 4. [Resources](#resources)
+5. [API protocol](#api-protocol)
 
 Once you are done with the changes, click on `Create new deployment` at the bottom of the page to create the deployment for your model.
 
@@ -246,6 +248,10 @@ Inference batcher are deployment component that apply batching to the incoming i
 
 Resources include the number of replicas for the deployment as well as the resources (i.e., memory, CPU, GPU) to be allocated per replica. To learn about the different combinations available, see the [Resources Guide](resources.md).
 
+## API protocol
+
+Hopsworks supports both REST and gRPC as API protocols for sending inference requests to model deployments. To learn about REST and gRPC API protocols for model deployments, see the [API Protocol Guide](api-protocol.md).
+
 ## Conclusion
 
 In this guide you learned how to configure a predictor.

mkdocs.yml (+2)
@@ -192,6 +192,7 @@ nav:
 - Resource Allocation: user_guides/mlops/serving/resources.md
 - Inference Logger: user_guides/mlops/serving/inference-logger.md
 - Inference Batcher: user_guides/mlops/serving/inference-batcher.md
+- API Protocol: user_guides/mlops/serving/api-protocol.md
 - Troubleshooting: user_guides/mlops/serving/troubleshooting.md
 - Vector Database: user_guides/mlops/vector_database/index.md
 - Migration:
@@ -356,6 +357,7 @@ markdown_extensions:
 - pymdownx.critic
 - attr_list
 - md_in_html
+- def_list
 - toc:
     permalink: "#"
 - pymdownx.tasklist: