An ML workflow runs many steps in sequence, and some steps involve conditional logic, like deploying a new model only when it is more accurate than the currently deployed model. This is a pipeline. Pipelines are essential for turning ML processes into MLOps, which goes the next mile with automation, monitoring, and governance of the workflow.
There are frameworks for specifying these steps like Kubeflow Pipelines (KFP) and TensorFlow Extended (TFX). Vertex AI Pipelines is a managed service that can execute both of these.
- Kubeflow began as a simplified way to run TensorFlow Extended jobs on Kubernetes.
TL;DR
This is a series of notebook based workflows that teach all the ways to use pipelines within Vertex AI. The suggested order, with a description and reason for each, is:
Link To Section | Notebook Workflow | Description |
---|---|---|
Link To Section | Vertex AI Pipelines - Start Here | What are pipelines? Start here to go from code to pipeline and see it in action. |
Link To Section | Vertex AI Pipelines - Introduction | Introduction to pipelines with the console and Vertex AI SDK |
Link To Section | Vertex AI Pipelines - Components | An introduction to all the ways to create pipeline components from your code |
Link To Section | Vertex AI Pipelines - IO | An overview of all the types of inputs and outputs for pipeline components |
Link To Section | Vertex AI Pipelines - Control | An overview of controlling the flow of execution for pipelines |
Link To Section | Vertex AI Pipelines - Secret Manager | How to pass sensitive information to pipelines and components |
Link To Section | Vertex AI Pipelines - GCS Read and Write | How to read/write to GCS from components, including container components. |
Link To Section | Vertex AI Pipelines - Scheduling | How to schedule pipeline execution |
Link To Section | Vertex AI Pipelines - Notifications | How to send email notifications of pipeline status. |
Link To Section | Vertex AI Pipelines - Management | Managing, Reusing, and Storing pipelines and components |
Link To Section | Vertex AI Pipelines - Testing | Strategies for testing components and pipelines locally and remotely to aid development. |
Link To Section | Vertex AI Pipelines - Managing Pipeline Jobs | Manage runs of pipelines in an environment: list, check status, filtered list, cancel and delete jobs. |
To discover these notebooks as part of an introduction to MLOps, read on below!
What are pipelines?
- They help you automate, manage, and scale your ML workflows
- They offer reproducibility, collaboration, and efficiency
Before getting into the details let's go from code to pipeline and see this in action!
Notebook Workflow: In this quick start, we'll take a simple code example and run it both in a notebook and as a pipeline on Vertex AI Pipelines. This will likely spark many questions, and that's great! The rest of this series will dive deeper into each aspect of pipelines, providing comprehensive answers by example.
Pipelines are constructed by the following flow (sketched in code below):
- Creating components from code
- Constructing pipelines where steps, or tasks, are made from components
- Running pipelines on Vertex AI Pipelines
- Reviewing pipeline runs and task results
- Reviewing task execution: each task runs as a Vertex AI Training custom job
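Below is a minimal sketch of that flow using the KFP and Vertex AI SDKs; the component logic, pipeline name, and the `PROJECT_ID`, `REGION`, and `BUCKET` values are illustrative placeholders, not taken from the notebooks above.

```python
from kfp import dsl, compiler
from google.cloud import aiplatform

@dsl.component(base_image='python:3.10')
def add(a: float, b: float) -> float:
    # a component: one step of the workflow packaged to run in a container
    return a + b

@dsl.pipeline(name='quickstart-pipeline')
def pipeline(a: float = 1.0, b: float = 2.0):
    # tasks: steps of the pipeline made from components
    first = add(a=a, b=b)
    second = add(a=first.output, b=b)

# compile the pipeline to a YAML specification
compiler.Compiler().compile(pipeline, 'pipeline.yaml')

# run the compiled pipeline on Vertex AI Pipelines
aiplatform.init(project='PROJECT_ID', location='REGION', staging_bucket='gs://BUCKET')
job = aiplatform.PipelineJob(display_name='quickstart', template_path='pipeline.yaml')
job.run()  # each task executes as a Vertex AI Training custom job
```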
An overview:
Notebook Workflow: Get a quick start with pipelines by reviewing this workflow for an example using both the Vertex AI Console and SDK.
The steps of the workflow, each an ML task, are run with components. Getting logic and code into components can consist of using pre-built components or constructing custom components (the KFP custom options are sketched after this list):
- KFP
- Pre-Built:
- Custom:
- Lightweight Python Components - create a component from a Python function
- Containerized Python Components - for complex dependencies
- Container Component - a component from a container
- TFX
- Pre-Built:
- Custom:
- Python function-based components - create a component from a Python function
- Container-based components - a component from a container
- Fully custom components - reuse and extend standard components.
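As a sketch of the KFP custom options, the following shows a Lightweight Python Component and a Container Component side by side; the function bodies, packages, and images are only examples.

```python
from kfp import dsl

# Lightweight Python Component: built from a Python function;
# listed packages are installed when the task runs
@dsl.component(base_image='python:3.10', packages_to_install=['pandas'])
def summarize(rows: int) -> int:
    import pandas as pd
    df = pd.DataFrame({'x': range(rows)})
    return int(df['x'].sum())

# Container Component: wraps any container image, command, and arguments
@dsl.container_component
def say_hello(name: str):
    return dsl.ContainerSpec(image='alpine', command=['echo'], args=[name])
```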
Notebook Workflow: For an overview of components from custom to pre-built, check out this notebook:
Compute Resources For Components:
Running pipelines on Vertex AI Pipelines runs each component as a Vertex AI Training `CustomJob`. This defaults to a VM based on `e2-standard-4` (4 vCPUs, 16 GB memory). This can be modified at the task level of a pipeline to choose different computing resources, including GPUs. For example:
```python
@kfp.dsl.pipeline()
def pipeline():
    task = component().set_cpu_limit(C).set_memory_limit(M).add_node_selector_constraint(A).set_accelerator_limit(G)
```
Where the inputs define the machine configuration for the step:
- C = a string representing the number of CPUs (up to 96).
- M = a string representing the memory limit: an integer followed by K, M, or G (up to 624GB).
- A = a string representing the desired GPU or TPU type.
- G = an integer representing the number of accelerators of type A desired.
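Here is the same pattern with example values filled in; the placeholder component, machine shape, and accelerator type are assumptions for illustration only:

```python
from kfp import dsl

@dsl.component(base_image='python:3.10')
def trainer():
    pass  # placeholder training step

@dsl.pipeline(name='resource-example')
def pipeline():
    task = (
        trainer()
        .set_cpu_limit('8')                                # C: CPUs as a string
        .set_memory_limit('32G')                           # M: integer followed by K, M, or G
        .add_node_selector_constraint('NVIDIA_TESLA_T4')   # A: accelerator type
        .set_accelerator_limit(1)                          # G: number of accelerators
    )
```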
Getting information into code and results out is the IO part of components. These inputs and outputs are particularly important in MLOps as they are the artifacts that define an ML system: datasets, models, metrics, and more. Pipeline tools like TFX and KFP go a step further and automatically track the inputs and outputs and even provide lineage information for them. Component inputs and outputs can take two forms: parameters and artifacts.
Parameters are Python objects like `str`, `int`, `float`, `bool`, `list`, and `dict` that are defined as inputs to pipelines and components. Components can also return parameters for input into subsequent components. Parameters are excellent for changing the behavior of a pipeline/component through inputs rather than rewriting code.
Artifacts are multi-parameter objects that represent machine learning artifacts and have defined schemas and are stored as metadata with lineage. The artifact schemas follow the ML Metadata (MLMD) client library. This helps with understanding and analyzing a pipeline.
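As a sketch of the two forms together, the hypothetical component below takes a parameter and a `Dataset` artifact as inputs and produces a `Metrics` artifact and a boolean parameter as outputs; the column names are assumptions.

```python
from kfp import dsl
from kfp.dsl import Dataset, Metrics, Input, Output

@dsl.component(base_image='python:3.10', packages_to_install=['pandas'])
def evaluate(
    threshold: float,          # parameter input
    data: Input[Dataset],      # artifact input, tracked with lineage
    metrics: Output[Metrics],  # artifact output, tracked with lineage
) -> bool:                     # parameter output for downstream conditions
    import pandas as pd
    df = pd.read_csv(data.path)  # artifacts expose a local file path
    accuracy = float((df['label'] == df['prediction']).mean())
    metrics.log_metric('accuracy', accuracy)
    return accuracy >= threshold
```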
Notebook Workflow: See all the types of parameters and artifacts in action with the following notebook based workflow:
Secure Parameters: Passing credentials for an API or service can expose them. If these credentials are hardcoded, they can be discovered from the source code and are harder to update. A great solution is using Secret Manager to host credentials and then passing the name of the credential as a parameter. The only modification needed in a component is using a Python client to retrieve the credential at run time.
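A minimal sketch of that pattern, assuming a secret already exists in Secret Manager and the pipeline's service account can access it; the secret resource name is a placeholder:

```python
from kfp import dsl

@dsl.component(
    base_image='python:3.10',
    packages_to_install=['google-cloud-secret-manager'],
)
def use_api(secret_name: str) -> str:
    # secret_name like: projects/PROJECT_NUMBER/secrets/SECRET_ID/versions/latest
    from google.cloud import secretmanager
    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(name=secret_name)
    api_key = response.payload.data.decode('utf-8')
    # ... use api_key with the external service; never log or return the value itself
    return 'credential retrieved'
```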
Notebook Workflow: Check out how easy Secret Manager is to implement with the following notebook based example workflow:
GCS Read/Write: Methods for reading and writing data in GCS within a component. Components run as Vertex AI Training jobs, which include GCS as a FUSE mount. That means components can utilize GCS at the `/gcs` mount during runs. This includes container components, and the notebook workflow below even shows how to pass code directly to a container for execution.
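A small sketch of reading and writing through the `/gcs` mount from inside a component; the bucket name and file path are placeholders:

```python
from kfp import dsl

@dsl.component(base_image='python:3.10')
def write_then_read(bucket: str, text: str) -> str:
    import os
    # the bucket gs://BUCKET appears at /gcs/BUCKET inside the running task
    out_path = f'/gcs/{bucket}/reports/report.txt'
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(out_path, 'w') as f:
        f.write(text)
    with open(out_path) as f:
        return f.read()
```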
Notebook Workflow: See GCS reads and writes from components in action with the following notebook based workflow:
As the tasks of an ML pipeline run, they form a graph: the outputs of upstream components become the inputs of downstream components. Both TFX and KFP automatically use these connections to create a DAG of execution. When logic needs to be specified in the pipeline's flow of execution, control structures are necessary.
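For example, a conditional deployment can be expressed with KFP control structures; the components below are placeholders and the threshold is arbitrary:

```python
from kfp import dsl

@dsl.component(base_image='python:3.10')
def evaluate() -> float:
    return 0.92  # placeholder metric

@dsl.component(base_image='python:3.10')
def deploy():
    print('deploying model')

@dsl.pipeline(name='conditional-deploy')
def pipeline(threshold: float = 0.9):
    metric = evaluate()
    # dsl.If in recent KFP releases; dsl.Condition in earlier ones
    with dsl.If(metric.output >= threshold, name='accurate-enough'):
        deploy()
```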
Notebook Workflow: The following notebook shows many examples of implementing controls in KFP while running on Vertex AI Pipelines:
Pipelines can be run on a schedule directly in Vertex AI without the need to set up a separate scheduler and trigger (like Pub/Sub).
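A sketch of scheduling a compiled pipeline with the Vertex AI SDK; the cron expression, run limits, and project values are placeholders:

```python
from google.cloud import aiplatform

aiplatform.init(project='PROJECT_ID', location='REGION', staging_bucket='gs://BUCKET')

job = aiplatform.PipelineJob(
    display_name='scheduled-pipeline',
    template_path='pipeline.yaml',
)

schedule = job.create_schedule(
    display_name='weekly-run',
    cron='0 9 * * 1',           # 9:00 every Monday
    max_run_count=10,           # stop after this many iterations
    max_concurrent_run_count=1,
)
```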
Notebook Workflow: Here is an example of a pipeline run followed by a schedule that repeats the pipeline at a specified interval until the maximum number of iterations set on the schedule is reached:
This can have many helpful applications, including:
- Running batch predictions, evaluations, and monitoring each day or week
- Retraining a model, running evaluations, comparing the new model to the currently deployed model, and then conditionally updating the deployed model
- Checking for new training records and commencing retraining if conditions are met - like records that increase a class by 10%, at least 1000 new records, ...
As the number of pipelines grows and scheduling and triggering are implemented, it becomes necessary to know which pipelines need to be reviewed. Getting notifications about the completion of pipelines is a good first step. Then, being able to limit notifications to failures, or particular failures, becomes important.
Notebook Workflow: This notebook workflow covers pre-built components for email notification and building a custom notification system for sending emails (or performing other tasks) conditional on the pipeline's status.
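For the pre-built path, here is a sketch using the notification email component inside an exit handler so it runs regardless of how the pipeline finishes; the training step and recipient address are placeholders:

```python
from kfp import dsl
from google_cloud_pipeline_components.v1.vertex_notification_email import VertexNotificationEmailOp

@dsl.component(base_image='python:3.10')
def train():
    print('training...')  # placeholder work

@dsl.pipeline(name='notify-on-completion')
def pipeline():
    notify = VertexNotificationEmailOp(recipients=['you@example.com'])
    with dsl.ExitHandler(notify):
        train()
```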
As seen above, pipelines are made up of steps which are executions of components. These components are made up of code, container, and instructions (inputs and outputs).
Components:
For each type of component, `kfp` compiles the component into YAML as part of the pipeline. You can also directly compile individual components. This makes the YAML for a component a source that can be managed, and using it in additional pipelines is made possible with `kfp.components.load_component_from_*()`, which has versions for files, URLs, and text (strings).
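A sketch of compiling a single component to YAML and loading it back for reuse; the file name and component are arbitrary examples:

```python
from kfp import compiler, components, dsl

@dsl.component(base_image='python:3.10')
def add(a: float, b: float) -> float:
    return a + b

# compile just this component to a YAML specification
compiler.Compiler().compile(add, 'add_component.yaml')

# later, or in another pipeline project, load it back for reuse
add_reused = components.load_component_from_file('add_component.yaml')
# also available: load_component_from_url(...), load_component_from_text(...)
```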
Pipelines:
Pipelines are compiled into YAML files that include component specifications. Managing these pipeline files as artifacts is made easy with the combination of the following (sketched in code after the list):
- The Kubeflow Pipelines SDK and the included `kfp.registry.RegistryClient`
- Google Cloud Artifact Registry with a native format for Kubeflow pipeline templates
- Integration with Vertex AI for creating, uploading and using pipeline templates
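A sketch of those pieces together, assuming a Kubeflow-format Artifact Registry repository already exists; the host URL, repository name, and tags are placeholders:

```python
from kfp.registry import RegistryClient
from google.cloud import aiplatform

client = RegistryClient(host='https://REGION-kfp.pkg.dev/PROJECT_ID/REPO_NAME')

# upload the compiled pipeline as a versioned, tagged template
template_name, version_name = client.upload_pipeline(
    file_name='pipeline.yaml',
    tags=['latest', 'v1'],
)

# run directly from the registry template without a local download
job = aiplatform.PipelineJob(
    display_name='from-registry',
    template_path=f'https://REGION-kfp.pkg.dev/PROJECT_ID/REPO_NAME/{template_name}/latest',
)
job.run()
```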
Notebook Workflow: Work directly with these concepts in the following notebook based workflow:
When creating pipeline components and pipelines, the process of testing can be aided by local testing and several strategies for remote (on Vertex AI Pipelines) testing. This section covers local and remote strategies to aid the development process.
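One local strategy is the KFP local runner (available in recent KFP releases), which executes a component immediately in a subprocess so its logic can be checked before any remote run; the component here is a trivial example:

```python
from kfp import dsl, local

# initialize local execution; SubprocessRunner runs components in the current environment
local.init(runner=local.SubprocessRunner())

@dsl.component(base_image='python:3.10')
def add(a: float, b: float) -> float:
    return a + b

task = add(a=1.0, b=2.0)   # executes immediately and locally
assert task.output == 3.0
```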
Notebook Workflow: Work directly with these concepts in the following notebook based workflow:
Vertex AI Pipeline Jobs are runs of a pipeline. These can be run directly by a user, started by API, or scheduled. Within Vertex AI, a project can have many jobs running at any time and a history of all past jobs. This workflow shows how to review and manage the jobs in an environment using the Python SDK. For custom metrics in Cloud Logging, check out this helpful page.
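A sketch of listing and managing jobs with the Vertex AI SDK; the filter string and project values are illustrative:

```python
from google.cloud import aiplatform

aiplatform.init(project='PROJECT_ID', location='REGION')

# list jobs, optionally filtered and ordered (filter fields such as display_name and state)
jobs = aiplatform.PipelineJob.list(
    filter='display_name="my-pipeline"',
    order_by='create_time desc',
)

for job in jobs:
    print(job.display_name, job.state, job.create_time)

# cancel a running job or delete a finished one
# jobs[0].cancel()
# jobs[0].delete()
```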
Notebook Workflow: Work directly with these concepts in the following notebook based workflow:
A series of notebook based workflows that show how to put all the concepts from the material above into common workflows:
- Vertex AI Pipelines - Pattern - Modular and Reusable
- Example 1: Store a pipeline in artifact registry and directly run it on Vertex AI Pipelines without a local download.
- Example 2: Store and retrieve components for reusability: as files (at a URL, in a file directory, or as a text string) and as artifacts in Artifact Registry
- Example 3: Store pipelines in artifact registry and retrieve (download, and import) to use as components in new pipelines
- Run R on Vertex AI Pipelines
- Use a prebuilt container to easily run an R script with inputs for the required libraries and command line arguments