Commit ac58776

committed
[HWORKS-1885] Improve vLLM-related docs
1 parent 7ba4754 commit ac58776

6 files changed

+186
-55
lines changed

Original file line numberDiff line numberDiff line change
@@ -0,0 +1,75 @@
1+
# How To Export a Torch Model
2+
3+
## Introduction
4+
5+
In this guide, you will learn how to export a Torch model and register it in the Model Registry.
6+
7+
8+
## Code
9+
10+
### Step 1: Connect to Hopsworks
11+
12+
=== "Python"
13+
```python
14+
import hopsworks
15+
16+
project = hopsworks.login()
17+
18+
# get Hopsworks Model Registry handle
19+
mr = project.get_model_registry()
20+
```
21+
22+
### Step 2: Train
23+
24+
Define your Torch model and run the training loop.
25+
26+
=== "Python"
27+
```python
28+
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define the model architecture
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        ...

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        ...
        return x

# Instantiate the model
net = Net()

# Run the training loop
for epoch in range(n):  # n: number of training epochs
    ...
46+
```
47+
48+
### Step 3: Export to local path
49+
50+
Export the Torch model to a directory on the local filesystem.
51+
52+
=== "Python"
53+
```python
54+
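import os

model_dir = "./model"
os.makedirs(model_dir, exist_ok=True)

# torch.save() writes a single file; "model.pt" is an example file name
torch.save(net.state_dict(), os.path.join(model_dir, "model.pt"))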
57+
```
58+
59+
### Step 4: Register model in registry
60+
61+
Use the `ModelRegistry.torch.create_model(..)` function to register a model as a Torch model. Define a name and, optionally, attach metrics for your model. Then invoke the `save()` function, passing the path to the local directory where the model was exported.
62+
63+
=== "Python"
64+
```python
65+
# Model evaluation metrics
66+
metrics = {'accuracy': 0.92}
67+
68+
tch_model = mr.torch.create_model("tch_model", metrics=metrics)
69+
70+
tch_model.save(model_dir)
71+
```
72+
73+
## Going Further
74+
75+
You can attach an [Input Example](../input_example.md) and a [Model Schema](../model_schema.md) to your model to document the shape and type of the data the model was trained on.

docs/user_guides/mlops/registry/index.md

+3-1
@@ -11,11 +11,13 @@ Follow these framework-specific guides to export a Model to the Model Registry.
1111

1212
* [TensorFlow](frameworks/tf.md)
1313

14+
* [Torch](frameworks/tch.md)
15+
1416
* [Scikit-learn](frameworks/skl.md)
1517

1618
* [LLM](frameworks/llm.md)
1719

18-
* [Other frameworks](frameworks/python.md)
20+
* [Other Python frameworks](frameworks/python.md)
1921

2022

2123
## Model Schema

docs/user_guides/mlops/serving/deployment.md

+14-9
@@ -5,7 +5,7 @@
55
In this guide, you will learn how to create a new deployment for a trained model.
66

77
!!! warning
8-
This guide assumes that a model has already been trained and saved into the Model Registry. To learn how to create a model in the Model Registry, see [Model Registry Guide](../registry/frameworks/tf.md)
8+
This guide assumes that a model has already been trained and saved into the Model Registry. To learn how to create a model in the Model Registry, see [Model Registry Guide](../registry/index.md#exporting-a-model).
99

1010
Deployments are used to unify the different components involved in making one or more trained models online and accessible to compute predictions on demand. For each deployment, there are four concepts to consider:
1111

@@ -41,8 +41,8 @@ After selecting the model, the rest of fields are filled automatically. We pick
4141
!!! notice "Deployment name validation rules"
4242
A valid deployment name can only contain characters a-z, A-Z and 0-9.
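The naming rule above can be checked before creating a deployment. The following is a minimal sketch: `is_valid_deployment_name` is a hypothetical helper, not part of the Hopsworks API, and it only encodes the character rule stated above (any additional limits, such as a maximum length, are not covered).

```python
import re

# One or more ASCII letters or digits, nothing else (per the rule above)
_DEPLOYMENT_NAME = re.compile(r"[A-Za-z0-9]+")


def is_valid_deployment_name(name: str) -> bool:
    """Return True if `name` only contains characters a-z, A-Z and 0-9."""
    return _DEPLOYMENT_NAME.fullmatch(name) is not None


print(is_valid_deployment_name("fraudDetector2"))  # True
print(is_valid_deployment_name("fraud-detector"))  # False
```

`fullmatch` is used instead of `match` so that a valid prefix followed by an invalid character is rejected.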
4343

44-
!!! info "Predictor script for Python models and LLMs"
45-
For Python models and LLMs, you must select a custom [predictor script](#predictor) that loads and runs the trained model by clicking on `From project` or `Upload new file`, to choose an existing script in the project file system or upload a new script, respectively.
44+
!!! info "Predictor script for Python models"
45+
For Python models, you must select a custom [predictor script](#predictor) that loads and runs the trained model by clicking on `From project` or `Upload new file`, to choose an existing script in the project file system or upload a new script, respectively.
4646

4747
If you prefer, change the name of the deployment, model version or [artifact version](#model-artifact). Then, click on `Create new deployment` to create the deployment for your model.
4848

@@ -76,10 +76,10 @@ You will be redirected to a full-page deployment creation form where you can see
7676
!!! info "Deployment advanced options"
7777
1. [Predictor](#predictor)
7878
2. [Transformer](#transformer)
79-
3. [Inference logger](#inference-logger)
80-
4. [Inference batcher](#inference-batcher)
81-
5. [Resources](#resources)
82-
6. [API protocol](#api-protocol)
79+
3. [Inference logger](predictor.md#inference-logger)
80+
4. [Inference batcher](predictor.md#inference-batcher)
81+
5. [Resources](predictor.md#resources)
82+
6. [API protocol](predictor.md#api-protocol)
8383

8484
Once you are done with the changes, click on `Create new deployment` at the bottom of the page to create the deployment for your model.
8585

@@ -174,7 +174,12 @@ Inside a model deployment, the local path to the model files is stored in the `M
174174

175175
## Artifact Files
176176

177-
Artifact files are files involved in the correct startup and running of the model deployment. The most important files are the **predictor** and **transformer scripts**. The former is used to load and run the model for making predictions. The latter is typically used to transform model inputs at inference time.
177+
Artifact files are files involved in the correct startup and running of the model deployment. The most important files are the **predictor** and **transformer scripts**. The former is used to load and run the model for making predictions. The latter is typically used to apply transformations on the model inputs at inference time before making predictions. Predictor and transformer scripts run on separate components and, therefore, scale independently of each other.
178+
179+
!!! tip
180+
Whenever you provide a predictor script, you can include the transformations of model inputs in the same script as long as they don't need to be scaled independently from the model inference process.
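As an illustration of the tip above, the sketch below applies an input transformation inside the predictor script itself. It follows the `Predictor` template (`__init__` and `predict`), but the model and the transformation are stand-ins chosen only to keep the example runnable: `self.weights` and `_transform` are hypothetical, and a real script would load the model from `os.environ["MODEL_FILES_PATH"]`.

```python
class Predictor:

    def __init__(self):
        # Stand-in for a real model loaded from os.environ["MODEL_FILES_PATH"]:
        # a hypothetical linear scorer with fixed weights
        self.weights = [0.5, -0.25]

    def _transform(self, instance):
        # Hypothetical input transformation: scale raw features by 1/100
        return [value / 100.0 for value in instance]

    def predict(self, inputs):
        """Transform each instance, then score it with the stand-in model."""
        transformed = [self._transform(instance) for instance in inputs]
        return [
            sum(w * x for w, x in zip(self.weights, instance))
            for instance in transformed
        ]


predictor = Predictor()
print(predictor.predict([[100, 0], [0, 100]]))  # [0.5, -0.25]
```

Because the transformation runs in the same component as the model, both scale together. If the transformation becomes the bottleneck, moving it to a transformer script lets it scale on its own.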
181+
182+
Artifact files can also contain a **server configuration file**, which helps decouple the configuration used within the model deployment from the model server and from the implementation of the predictor and transformer scripts. Inside a model deployment, the local path to the configuration file is stored in the `CONFIG_FILE_PATH` environment variable (see [environment variables](../serving/predictor.md#environment-variables)).
178183

179184
Every model deployment runs a specific version of the artifact files, commonly referred to as artifact version. ==One or more model deployments can use the same artifact version== (i.e., same predictor and transformer scripts). Artifact versions are unique for the same model version.
180185

@@ -189,7 +194,7 @@ Inside a model deployment, the local path to the artifact files is stored in the
189194
All files under `/Models` are managed by Hopsworks. Changes to artifact files cannot be reverted and can have an impact on existing model deployments.
190195

191196
!!! tip "Additional files"
192-
Currently, the artifact files only include predictor and transformer scripts. Support for additional files (e.g., configuration files or other resources) is coming soon.
197+
Currently, the artifact files can only include predictor and transformer scripts, and a configuration file. Support for additional files (e.g., other resources) is coming soon.
193198

194199
## Predictor
195200

docs/user_guides/mlops/serving/predictor.md

+93-45
@@ -13,12 +13,13 @@ Predictors are the main component of deployments. They are responsible for runni
1313
1. [Model server](#model-server)
1414
2. [Serving tool](#serving-tool)
1515
3. [User-provided script](#user-provided-script)
16-
4. [Python environments](#python-environments)
17-
5. [Transformer](#transformer)
18-
6. [Inference Logger](#inference-logger)
19-
7. [Inference Batcher](#inference-batcher)
20-
8. [Resources](#resources)
21-
9. [API protocol](#api-protocol)
16+
4. [Server configuration file](#server-configuration-file)
17+
5. [Python environments](#python-environments)
18+
6. [Transformer](#transformer)
19+
7. [Inference Logger](#inference-logger)
20+
8. [Inference Batcher](#inference-batcher)
21+
9. [Resources](#resources)
22+
10. [API protocol](#api-protocol)
2223

2324
## GUI
2425

@@ -85,7 +86,22 @@ To create your own it is recommended to [clone](../../projects/python/python_env
8586
</figure>
8687
</p>
8788

88-
### Step 5 (Optional): Enable KServe
89+
90+
### Step 5 (Optional): Select a configuration file
91+
92+
!!! note
93+
Only available for LLM deployments.
94+
95+
You can select a configuration file to be added to the [artifact files](deployment.md#artifact-files). If a predictor script is provided, this configuration file will be available inside the model deployment at the local path stored in the `CONFIG_FILE_PATH` environment variable. If a predictor script is **not** provided, this configuration file will be directly passed to the vLLM server. You can find all configuration parameters supported by the vLLM server in the [vLLM documentation](https://docs.vllm.ai/en/v0.6.4/serving/openai_compatible_server.html).
96+
97+
<p align="center">
98+
<figure>
99+
<img style="max-width: 78%; margin: 0 auto" src="../../../../assets/images/guides/mlops/serving/deployment_simple_form_vllm_conf_file.png" alt="Server configuration file in the simplified deployment form">
100+
<figcaption>Select a configuration file in the simplified deployment form</figcaption>
101+
</figure>
102+
</p>
103+
104+
### Step 6 (Optional): Enable KServe
89105

90106
Other configuration, such as the serving tool, is part of the advanced options of a deployment. To navigate to the advanced creation form, click on `Advanced options`.
91107

@@ -105,7 +121,7 @@ Here, you change the [serving tool](#serving-tool) for your deployment by enabli
105121
</figure>
106122
</p>
107123

108-
### Step 6 (Optional): Other advanced options
124+
### Step 7 (Optional): Other advanced options
109125

110126
Additionally, you can adjust the default values of the remaining components:
111127

@@ -143,50 +159,71 @@ Once you are done with the changes, click on `Create new deployment` at the bott
143159

144160
def __init__(self):
145161
""" Initialization code goes here"""
146-
pass
162+
# Model files can be found at os.environ["MODEL_FILES_PATH"]
163+
# self.model = ... # load your model
147164

148165
def predict(self, inputs):
149166
""" Serve predictions using the trained model"""
150-
pass
167+
# Use the model to make predictions
168+
# return self.model.predict(inputs)
151169
```
152-
=== "Generate (vLLM deployments only)"
170+
=== "Predictor (vLLM deployments only)"
153171
``` python
154-
from typing import Iterable, AsyncIterator, Union
155-
156-
from vllm import LLM
157-
172+
import os
173+
from vllm import __version__, AsyncEngineArgs, AsyncLLMEngine
174+
from typing import Iterable, AsyncIterator, Union, Optional
158175
from kserve.protocol.rest.openai import (
159176
CompletionRequest,
160177
ChatPrompt,
161178
ChatCompletionRequestMessage,
162179
)
163180
from kserve.protocol.rest.openai.types import Completion
181+
from kserve.protocol.rest.openai.types.openapi import ChatCompletionTool
182+
164183

165184
class Predictor():
166185

167186
def __init__(self):
168187
""" Initialization code goes here"""
169-
# initialize vLLM backend
170-
self.llm = LLM(os.environ["MODEL_FILES_PATH])
171-
172-
# initialize tokenizer if needed
173-
# self.tokenizer = ...
174-
175-
def apply_chat_template(
176-
self,
177-
messages: Iterable[ChatCompletionRequestMessage,],
178-
) -> ChatPrompt:
179-
pass
180-
181-
async def create_completion(
182-
self, request: CompletionRequest
183-
) -> Union[Completion, AsyncIterator[Completion]]:
184-
"""Generate responses using the LLM"""
185-
186-
# Completion: used for returning a single answer (batch)
187-
# AsyncIterator[Completion]: used for returning a stream of answers
188-
189-
pass
188+
189+
# (optional) if any, access the configuration file via os.environ["CONFIG_FILE_PATH"]
190+
config = ...
191+
192+
print("Starting vLLM backend...")
193+
engine_args = AsyncEngineArgs(
194+
model=os.environ["MODEL_FILES_PATH"],
195+
**config
196+
)
197+
198+
# The vLLM engine handle must be stored in the "self.vllm_engine" attribute
199+
self.vllm_engine = AsyncLLMEngine.from_engine_args(engine_args)
200+
201+
#
202+
# NOTE: Default implementations of the apply_chat_template and create_completion methods are already provided.
203+
# If needed, you can override these methods as shown below
204+
#
205+
206+
#def apply_chat_template(
207+
# self,
208+
# messages: Iterable[ChatCompletionRequestMessage],
209+
# chat_template: Optional[str] = None,
210+
# tools: Optional[list[ChatCompletionTool]] = None,
211+
#) -> ChatPrompt:
212+
# """Converts a prompt or list of messages into a single templated prompt string"""
213+
214+
# prompt = ... # apply chat template on the message to build the prompt
215+
# return ChatPrompt(prompt=prompt)
216+
217+
#async def create_completion(
218+
# self, request: CompletionRequest
219+
#) -> Union[Completion, AsyncIterator[Completion]]:
220+
# """Generate responses using the vLLM engine"""
221+
#
222+
# generators = self.vllm_engine.generate(...)
223+
#
224+
# # Completion: used for returning a single answer (batch)
225+
# # AsyncIterator[Completion]: used for returning a stream of answers
226+
# return ...
190227
```
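The `config = ...` placeholder in the vLLM predictor template can be filled by reading the file referenced by `CONFIG_FILE_PATH`. The sketch below is one possible way to do it, under two assumptions: the configuration file is JSON (any format is allowed when a predictor script is provided), and its keys match the keyword arguments accepted by `AsyncEngineArgs`. The helper name `load_engine_config` is hypothetical.

```python
import json
import os


def load_engine_config() -> dict:
    """Read optional engine settings from the file at CONFIG_FILE_PATH.

    Returns an empty dict when no configuration file is available, so the
    result can always be unpacked with ** into the engine arguments.
    """
    path = os.environ.get("CONFIG_FILE_PATH")
    if not path or not os.path.isfile(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


print(load_engine_config())
```

Inside `__init__`, this would be used as `config = load_engine_config()` before building the engine arguments. Returning an empty dict keeps the predictor working for deployments without a configuration file.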
191228

192229
!!! info "Jupyter magic"
@@ -242,7 +279,7 @@ Hopsworks Model Serving supports deploying models with a Flask server for python
242279
| Flask | ✅ | python-based (scikit-learn, xgboost, pytorch...) |
243280
| TensorFlow Serving | ✅ | keras, tensorflow |
244281
| TorchServe | ❌ | pytorch |
245-
| vLLM | ✅ | vLLM-supported models (see [list](https://docs.vllm.ai/en/latest/models/supported_models.html)) |
282+
| vLLM | ✅ | vLLM-supported models (see [list](https://docs.vllm.ai/en/v0.6.4/models/supported_models.html)) |
246283

247284
## Serving tool
248285

@@ -279,7 +316,17 @@ The predictor script needs to implement a given template depending on the model
279316
| | TensorFlow Serving | ❌ |
280317
| KServe | Fast API | ✅ (only required for artifacts with multiple models) |
281318
| | TensorFlow Serving | ❌ |
282-
| | vLLM | ✅ (required) |
319+
| | vLLM | ✅ (optional) |
320+
321+
### Server configuration file
322+
323+
Depending on the model server, a **server configuration file** can be selected to decouple the configuration used within the model deployment from the model server and from the implementation of the predictor and transformer scripts. In other words, by modifying the configuration file of an existing model deployment, you can adjust its settings without making changes to the predictor or transformer scripts. Inside a model deployment, the local path to the configuration file is stored in the `CONFIG_FILE_PATH` environment variable (see [environment variables](#environment-variables)).
324+
325+
!!! warning "Configuration file format"
326+
The configuration file can be of any format, except in vLLM deployments **without a predictor script**, for which a YAML file is ==required==.
327+
328+
!!! note "Passing arguments to vLLM via configuration file"
329+
For vLLM deployments **without a predictor script**, the server configuration file is ==required== and it is used to configure the vLLM server. For example, you can use this configuration file to specify the chat template or LoRA modules to be loaded by the vLLM server. See all available parameters in the [official documentation](https://docs.vllm.ai/en/v0.6.4/serving/openai_compatible_server.html#command-line-arguments-for-the-server).
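As a rough illustration of such a YAML file, the sketch below renders a few settings as `key: value` lines. It assumes that keys follow the long form of the vLLM server command-line arguments without the leading dashes, as described in the vLLM documentation linked above. The values, the template path and the `to_yaml` helper are hypothetical, and the helper only handles flat settings with numbers, booleans and simple strings.

```python
# Hypothetical settings; each key is the long form of a vLLM server
# command-line argument without the leading dashes
settings = {
    "max-model-len": 4096,
    "enable-lora": True,
    "chat-template": "./chat_template.jinja",
}


def to_yaml(flat_settings: dict) -> str:
    """Render a flat mapping of scalar values as YAML `key: value` lines."""
    lines = []
    for key, value in flat_settings.items():
        if isinstance(value, bool):
            # YAML booleans are lowercase
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


print(to_yaml(settings))
```

For nested values or strings that need quoting, a YAML library is a safer choice than hand-rendering.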
283330

284331
### Environment variables
285332

@@ -291,6 +338,7 @@ A number of different environment variables is available in the predictor to eas
291338
| ------------------- | -------------------------------------------------------------------- |
292339
| MODEL_FILES_PATH | Local path to the model files |
293340
| ARTIFACT_FILES_PATH | Local path to the artifact files |
341+
| CONFIG_FILE_PATH | Local path to the configuration file |
294342
| DEPLOYMENT_NAME | Name of the current deployment |
295343
| MODEL_NAME | Name of the model being served by the current deployment |
296344
| MODEL_VERSION | Version of the model being served by the current deployment |
@@ -302,13 +350,13 @@ Depending on the model server and serving tool used in the model deployment, you
302350

303351
??? info "Show supported Python environments"
304352

305-
| Serving tool | Model server | Editable | Predictor | Transformer |
306-
| ------------ | ------------------ | -------- | ----------------------------------- | ------------------------------ |
307-
| Kubernetes | Flask server | ❌ | `pandas-inference-pipeline` only | ❌ |
308-
| | TensorFlow Serving | ❌ | (official) tensorflow serving image | ❌ |
309-
| KServe | Fast API | ✅ | any `inference-pipeline` image | any `inference-pipeline` image |
310-
| | TensorFlow Serving | ✅ | (official) tensorflow serving image | any `inference-pipeline` image |
311-
| | vLLM | ✅ | `vllm-inference-pipeline` only | any `inference-pipeline` image |
353+
| Serving tool | Model server | Editable | Predictor | Transformer |
354+
| ------------ | ------------------ | -------- | ------------------------------------------ | ------------------------------ |
355+
| Kubernetes | Flask server | ❌ | `pandas-inference-pipeline` only | ❌ |
356+
| | TensorFlow Serving | ❌ | (official) tensorflow serving image | ❌ |
357+
| KServe | Fast API | ✅ | any `inference-pipeline` image | any `inference-pipeline` image |
358+
| | TensorFlow Serving | ✅ | (official) tensorflow serving image | any `inference-pipeline` image |
359+
| | vLLM | ✅ | `vllm-inference-pipeline` or `vllm-openai` | any `inference-pipeline` image |
312360

313361
!!! note
314362
The selected Python environment is used for both predictor and transformer. Support for selecting a different Python environment for the predictor and transformer is coming soon.

mkdocs.yml

+1
@@ -182,6 +182,7 @@ nav:
182182
- user_guides/mlops/registry/index.md
183183
- Frameworks:
184184
- TensorFlow: user_guides/mlops/registry/frameworks/tf.md
185+
- Torch: user_guides/mlops/registry/frameworks/tch.md
185186
- Scikit-learn: user_guides/mlops/registry/frameworks/skl.md
186187
- LLM: user_guides/mlops/registry/frameworks/llm.md
187188
- Python: user_guides/mlops/registry/frameworks/python.md
