Deploy model with custom container for real-time inference

The normal workflow to deploy your model is:

  1. Register the model to the AML workspace.
  2. Prepare an entry script.
  3. Specify an AML inference curated environment as the base Docker image.
  4. Deploy the model to the compute.

The normal workflow requires you to register the model and prepare an entry script in the cloud. If that is a concern, we also support a BYOC (bring your own container) mode, in which you build the model and entry script into a custom Docker image so that neither the model nor the entry script is saved in the cloud. You then use AML CLI v2 to deploy the model for real-time inference.

Prepare the docker image

Create a Dockerfile as follows and build your own image.

# Specify a base image from an AML inference curated environment
FROM <AML INFERENCE CURATED BASE IMAGE>

USER root
RUN mkdir -p $HOME/.cache
RUN if dpkg --compare-versions `conda --version | grep -oE '[^ ]+$'` lt 4.4.11; then conda install conda==4.4.11; fi
# copy the conda dependencies file to the container target path
COPY <conda_dependencies.yml FILE PATH> azureml-environment-setup/conda_dependencies.yml

# copy the entry script file to the container target path. If you don't need the entry script built into the docker image, comment out the next line.
COPY <ENTRY SCRIPT FILE PATH> /var/azureml-app/script/

# copy the model folder/file to the container target path.  If you don't need the model built into the docker image, comment out the next line.
COPY <MODEL FOLDER OR FILE PATH> /var/azureml-app/azureml-models/<MODEL FOLDER/FILE>

RUN ldconfig /usr/local/cuda/lib64/stubs && \
    conda env create -p /azureml-envs/azureml -f azureml-environment-setup/conda_dependencies.yml && \
    rm -rf "$HOME/.cache/pip" && \
    conda clean -aqy && \
    CONDA_ROOT_DIR=$(conda info --root) && \
    rm -rf "$CONDA_ROOT_DIR/pkgs" && \
    find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && \
    ldconfig
ENV PATH /azureml-envs/azureml/bin:$PATH
ENV LD_LIBRARY_PATH /azureml-envs/azureml/lib:$LD_LIBRARY_PATH
CMD ["runsvdir","/var/runit"]

Prepare entry script and conda dependency yaml

Sample script:

import json
import numpy as np
import os
import pickle
import joblib

def init():
    global model
    # MODEL_FILE_PATH is an environment variable specified in online-deployment yaml.
    # load the model from the docker container
    model = joblib.load(os.getenv('MODEL_FILE_PATH'))

def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    y_hat = model.predict(data)
    # you can return any data type as long as it is JSON-serializable
    return y_hat.tolist()
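
The sample `run` function assumes the request body is a JSON object with a `data` key holding a list of rows. The following is a minimal sketch, assuming you would rather return a readable error for a malformed request than let an exception propagate. The helper name `parse_request` and the error messages are illustrative assumptions and are not part of the AML inference server contract.

```python
import json


def parse_request(raw_data):
    """Return (rows, error) for a request body shaped like {"data": [[...], ...]}.

    Hypothetical helper: exactly one of the two return values is None.
    """
    try:
        payload = json.loads(raw_data)
    except (TypeError, ValueError) as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        return None, f"request body is not valid JSON: {exc}"
    if not isinstance(payload, dict) or "data" not in payload:
        return None, "request body must be a JSON object with a 'data' key"
    rows = payload["data"]
    if not isinstance(rows, list) or not all(isinstance(r, list) for r in rows):
        return None, "'data' must be a list of rows, each row a list of features"
    return rows, None
```

Inside `run`, you could call `rows, error = parse_request(raw_data)`, return `{"error": error}` when `error` is not `None`, and otherwise pass `np.array(rows)` to `model.predict`.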

Sample conda dependency YAML. Add the dependencies needed to run the entry script here.

channels:
- anaconda
- conda-forge
dependencies:
- python=3.6.2
- pip:
  - azureml-defaults~=
  - scikit-learn==0.22.1
name: azureml

Authenticate to private container registry

The username and password of the container registry can't be passed from the cloud. If you push the Docker image to your own private container registry, follow the Kubernetes instructions for using a private registry to provide the credentials, so that your cluster can pull the Docker image.
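
The Kubernetes instructions for private registries store the credentials in a secret of type `kubernetes.io/dockerconfigjson`. As a rough sketch of what that secret contains, the Python below builds the JSON document from a registry host, username, and password. The function name and the placeholder values are assumptions for illustration. Creating the secret and making it available to the pods should still follow the Kubernetes instructions.

```python
import base64
import json


def build_dockerconfigjson(registry, username, password):
    """Build the JSON document stored in a kubernetes.io/dockerconfigjson secret."""
    # The "auth" field is base64("username:password").
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    config = {
        "auths": {
            registry: {"username": username, "password": password, "auth": token}
        }
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_dockerconfigjson("myregistry.example.com", "my-user", "my-password"))
```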

Create the online-endpoint

Create the online endpoint YAML file:

name: <endpoint name>
compute: azureml:<compute target>
auth_mode: key

Create the endpoint by running:

az ml online-endpoint create -f endpoint.yml --sub ${subscription} -g ${resource_group} -w ${workspace}

Create the deployment

Create a deployment YAML file. Update the image to point to your own Docker image, specify the following environment variables so that the custom container is used, and keep the SAME inference_config as in the example below.

  • AML_APP_ROOT: the entry script folder in the Docker container.
  • AZUREML_ENTRY_SCRIPT: the entry script in the Docker container.
  • MODEL_FILE_PATH: the model path in the Docker container.

name: <deployment name>
type: kubernetes
environment_variables:
  AML_APP_ROOT: /var/azureml-app/script
  AZUREML_ENTRY_SCRIPT: <entry script file>
  MODEL_FILE_PATH: /var/azureml-app/azureml-models/<model file/folder>
environment:
  name: <custom environment name>
  version: 1
  image: <docker image>
  #Please keep the SAME inference_config as below.
  inference_config:
    scoring_route:
      port: 5001
      path: /score
    liveness_route:
      port: 5001
      path: /
    readiness_route:
      port: 5001
      path: /
request_settings:
  request_timeout_ms: 1000
  max_concurrent_requests_per_instance: 1
  max_queue_wait_ms: 1000
resources:
  requests:
    cpu: "0.1"
    memory: "0.1Gi"
  limits:
    cpu: "0.2"
    memory: "200Mi"
liveness_probe:
  initial_delay: 5
  period: 5
  timeout: 10
  success_threshold: 1
  failure_threshold: 1
readiness_probe:
  initial_delay: 5
  period: 5
  timeout: 10
  success_threshold: 1
  failure_threshold: 1
instance_count: 1
scale_settings:
  type: default
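
Because the model and entry script live inside the image, a wrong path in AML_APP_ROOT, AZUREML_ENTRY_SCRIPT, or MODEL_FILE_PATH only shows up when the container starts. The sketch below is one way to check the three variables from inside the container before deploying. The function name is hypothetical, and it assumes AZUREML_ENTRY_SCRIPT is resolved relative to AML_APP_ROOT, which you should confirm for your base image.

```python
import os


def check_container_layout(env=None):
    """Return a list of problems found in the custom-container settings."""
    env = os.environ if env is None else env
    problems = []
    app_root = env.get("AML_APP_ROOT")
    entry_script = env.get("AZUREML_ENTRY_SCRIPT")
    model_path = env.get("MODEL_FILE_PATH")

    if not app_root or not os.path.isdir(app_root):
        problems.append(f"AML_APP_ROOT is not a directory: {app_root!r}")
    if not entry_script:
        problems.append("AZUREML_ENTRY_SCRIPT is not set")
    # Assumption: the entry script path is relative to AML_APP_ROOT.
    elif app_root and not os.path.isfile(os.path.join(app_root, entry_script)):
        problems.append(f"entry script not found under AML_APP_ROOT: {entry_script!r}")
    if not model_path or not os.path.exists(model_path):
        problems.append(f"MODEL_FILE_PATH does not exist: {model_path!r}")
    return problems


if __name__ == "__main__":
    # Prints one line per problem; prints nothing when the layout looks consistent.
    for problem in check_container_layout():
        print(problem)
```

You could run this with the same environment variables set, for example through `docker run -e ...` against the image you built, and fix any reported path before creating the deployment.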

Create the online deployment

Create the online deployment and route all traffic to it:

az ml online-deployment create -f deployment.yaml -n blue --sub ${subscription} -g ${resource_group} -w ${workspace} --all-traffic

Invoke the endpoint to test it:

az ml online-endpoint invoke -r ${request_json} -n ${endpoint_name} --sub ${subscription} -g ${resource_group} -w ${workspace}
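
The request file passed with `-r` has to match what the entry script parses. For the sample script, that is a JSON object with a `data` key. The sketch below writes such a file and also builds the equivalent HTTP request for an endpoint created with `auth_mode: key`. The function names, the scoring URI, and the feature values are assumptions for illustration; take the real URI and key from your own endpoint.

```python
import json
import urllib.request


def write_request_file(rows, path):
    """Write {"data": rows} to path, the shape the sample run() function parses."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"data": rows}, f)


def build_scoring_request(scoring_uri, key, rows):
    """Build (but do not send) a POST request for an endpoint that uses key auth."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Key-based auth sends the endpoint key as a bearer token.
            "Authorization": f"Bearer {key}",
        },
        method="POST",
    )
```

For example, `write_request_file([[1.0, 2.0, 3.0, 4.0]], "request.json")` produces a file you can pass to `az ml online-endpoint invoke -r request.json`. Passing the object returned by `build_scoring_request` to `urllib.request.urlopen` would send the same payload directly to the scoring URI.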