Orchestration With Pipelines

You are here: vertex-ai-mlops/MLOps/Pipelines/readme.md

An ML workflow executes many steps in sequence, and some steps involve conditional logic, like deploying a new model only when it is more accurate than the currently deployed model. This is a pipeline. Pipelines are essential for turning ML processes into MLOps, which goes the next mile with automation, monitoring, and governance of the workflow.

There are frameworks for specifying these steps like Kubeflow Pipelines (KFP) and TensorFlow Extended (TFX). Vertex AI Pipelines is a managed service that can execute both of these.

  • Kubeflow's history: it began as a simplified way to run TensorFlow Extended jobs on Kubernetes.

TL;DR

This is a series of notebook-based workflows that teach all the ways to use pipelines within Vertex AI. The suggested order, with a description of each, is:

| Link To Section | Notebook Workflow | Description |
|---|---|---|
| Link To Section | Vertex AI Pipelines - Start Here | What are pipelines? Start here to go from code to pipeline and see it in action. |
| Link To Section | Vertex AI Pipelines - Introduction | Introduction to pipelines with the console and Vertex AI SDK |
| Link To Section | Vertex AI Pipelines - Components | An introduction to all the ways to create pipeline components from your code |
| Link To Section | Vertex AI Pipelines - IO | An overview of all the types of inputs and outputs for pipeline components |
| Link To Section | Vertex AI Pipelines - Control | An overview of controlling the flow of execution for pipelines |
| Link To Section | Vertex AI Pipelines - Secret Manager | How to pass sensitive information to pipelines and components |
| Link To Section | Vertex AI Pipelines - GCS Read and Write | How to read/write to GCS from components, including container components |
| Link To Section | Vertex AI Pipelines - Scheduling | How to schedule pipeline execution |
| Link To Section | Vertex AI Pipelines - Notifications | How to send email notifications of pipeline status |
| Link To Section | Vertex AI Pipelines - Management | Managing, reusing, and storing pipelines and components |
| Link To Section | Vertex AI Pipelines - Testing | Strategies for testing components and pipelines locally and remotely to aid development |
| Link To Section | Vertex AI Pipelines - Managing Pipeline Jobs | Manage runs of pipelines in an environment: list, check status, filter, cancel, and delete jobs |

To discover these notebooks as part of an introduction to MLOps, read on below!


Table of Contents


Start Here

What are pipelines?

  • They help you automate, manage, and scale your ML workflows
  • They offer reproducibility, collaboration, and efficiency

Before getting into the details let's go from code to pipeline and see this in action!

Notebook Workflow:

In this quick start, we'll take a simple code example and run it both in a notebook and as a pipeline on Vertex AI Pipelines. This will likely spark many questions, and that's great! The rest of this series will dive deeper into each aspect of pipelines, providing comprehensive answers by example.

  • Vertex AI Pipelines - Start Here
    • Code, Python, pulling data and training a model
    • Same code running in a pipeline on Vertex AI Pipelines
    • Same code modified to be for MLOps on Vertex AI Pipelines
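
For orientation, here is a minimal sketch of the code-to-pipeline idea (not the notebook's exact code; the project, region, and bucket values are placeholders):

```python
from kfp import compiler, dsl
from google.cloud import aiplatform

# a lightweight component: the function body becomes the task's code
@dsl.component(base_image='python:3.10')
def train(learning_rate: float) -> float:
    # stand-in for real training code: return a fake metric
    return 1.0 - learning_rate

# a pipeline that runs the component as a single task
@dsl.pipeline(name='quickstart-pipeline')
def pipeline(learning_rate: float = 0.01):
    train(learning_rate=learning_rate)

# compile the pipeline to a local YAML definition
compiler.Compiler().compile(pipeline, 'pipeline.yaml')

# submit the compiled pipeline to Vertex AI Pipelines
aiplatform.init(project='your-project', location='us-central1',
                staging_bucket='gs://your-bucket')
job = aiplatform.PipelineJob(
    display_name='quickstart-pipeline',
    template_path='pipeline.yaml',
    parameter_values={'learning_rate': 0.01},
)
job.run()  # or job.submit() to return without waiting
```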

Introduction

Pipelines are constructed and run as follows:

  1. Create components from code
  2. Construct pipelines, where steps, or tasks, are made from components
  3. Run pipelines on Vertex AI Pipelines
  4. Review pipeline runs and task results
  5. Review task execution: each task runs as a Vertex AI Training CustomJob

An overview:

Notebook Workflow:

Get a quick start with pipelines by reviewing this workflow for an example using both the Vertex AI Console and SDK.

  • Vertex AI Pipelines - Introduction
    • Build a simple pipeline with IO parameters and artifacts as well as conditional execution
    • Review all parts (runs, tasks, parameters, artifacts, metadata) with the Vertex AI Console
    • Retrieve all parts (runs, tasks, parameters, artifacts, metadata) with the Vertex AI SDK
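
As a hedged sketch of the SDK retrieval side (the display name and filter values are placeholders):

```python
from google.cloud import aiplatform

aiplatform.init(project='your-project', location='us-central1')

# list recent runs of a pipeline by display name, newest first
runs = aiplatform.PipelineJob.list(
    filter='display_name="quickstart-pipeline"',
    order_by='create_time desc',
)
run = runs[0]
print(run.state)                  # overall run state
for task in run.task_details:     # per-task state and IO details
    print(task.task_name, task.state)
```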

Components

The steps of the workflow, each an ML task, are run with components. Getting logic and code into components can consist of using pre-built components or constructing custom components:

Notebook Workflow:

For an overview of components from custom to pre-built, check out this notebook:

  • Vertex AI Pipelines - Components
    • Pre-Built Components: Easy access to many GCP services
    • Lightweight Python Components: Build a component from a Python function
    • Containerized Python Components: Build an entire Python environment as a component
    • Container Components: Any container as a component
    • Importer Components: Quickly import artifacts
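
A brief sketch of two of these styles side by side (the components and images are illustrative):

```python
from kfp import dsl

# Lightweight Python component: the function body becomes the component,
# with declared packages installed at runtime
@dsl.component(base_image='python:3.10', packages_to_install=['pandas'])
def profile_data(rows: int) -> int:
    import pandas as pd
    df = pd.DataFrame({'x': range(rows)})
    return len(df)

# Container component: any container image and command can be a component
@dsl.container_component
def say_hello(name: str):
    return dsl.ContainerSpec(
        image='alpine',
        command=['echo'],
        args=[name],
    )
```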

Compute Resources For Components:

Running pipelines on Vertex AI Pipelines runs each component as a Vertex AI Training CustomJob. This defaults to a VM based on e2-standard-4 (4 vCPUs, 16 GB memory). This can be modified at the task level of a pipeline to choose different computing resources, including GPUs. For example:

```python
@kfp.dsl.pipeline()
def pipeline():
    task = (
        component()
        .set_cpu_limit(C)
        .set_memory_limit(M)
        .add_node_selector_constraint(A)
        .set_accelerator_limit(G)
    )
```

Where the inputs define the machine configuration for the step:

  • C = a string representing the number of CPUs (up to 96).
  • M = a string representing the memory limit: an integer followed by K, M, or G (up to 624GB).
  • A = a string representing the desired GPU or TPU type.
  • G = an integer representing the number of accelerators of type A desired.
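
A concrete version of the pattern above with illustrative values (8 vCPUs, 32G memory, one NVIDIA T4):

```python
from kfp import dsl

@dsl.component(base_image='python:3.10')
def trainer():
    print('training with the requested resources')

@dsl.pipeline()
def pipeline():
    task = (
        trainer()
        .set_cpu_limit('8')
        .set_memory_limit('32G')
        .add_node_selector_constraint('NVIDIA_TESLA_T4')
        .set_accelerator_limit(1)
    )
```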

Component IO

Getting information into code and results out is the IO part of components. These inputs and outputs are particularly important in MLOps as they are the artifacts that define an ML system: datasets, models, metrics, and more. Pipeline tools like TFX and KFP go a step further and automatically track the inputs and outputs and even provide lineage information for them. Component inputs and outputs can take two forms: parameters and artifacts.

Parameters are Python objects (str, int, float, bool, list, dict) that are defined as inputs to pipelines and components. Components can also return parameters for input into subsequent components. Parameters are excellent for changing the behavior of a pipeline/component through inputs rather than rewriting code.

Artifacts are multi-parameter objects that represent machine learning artifacts and have defined schemas and are stored as metadata with lineage. The artifact schemas follow the ML Metadata (MLMD) client library. This helps with understanding and analyzing a pipeline.

Notebook Workflow:

See all the types of parameters and artifacts in action with the following notebook-based workflow:

  • Vertex AI Pipelines - IO
    • parameters: input, multi-input, output, multi-output
    • artifacts: input, output, Vertex AI ML Metadata Lineage
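
A short sketch of both IO forms in one hypothetical component: a parameter input plus typed artifact inputs/outputs with attached metadata:

```python
from kfp import dsl
from kfp.dsl import Input, Output, Dataset, Model, Metrics

@dsl.component(base_image='python:3.10')
def train(
    data: Input[Dataset],      # artifact input: read from data.path
    epochs: int,               # parameter input
    model: Output[Model],      # artifact output: write to model.path
    metrics: Output[Metrics],  # artifact output for logged metrics
):
    with open(data.path) as f:
        rows = f.readlines()
    # ... training would happen here ...
    with open(model.path, 'w') as f:
        f.write('placeholder-model')
    model.metadata['epochs'] = epochs          # lineage-tracked metadata
    metrics.log_metric('training_rows', len(rows))
```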

Secure Parameters: Passing credentials for an API or service can expose them. If these credentials are hardcoded then they can be discovered from the source code and are harder to update. A great solution is using Secret Manager to host credentials and then pass the name of the credential as a parameter. The only modification needed to a component is to use a Python client to retrieve the credentials at run time.

Notebook Workflow:

Check out how easy Secret Manager is to implement with the following notebook-based example workflow:

  • Vertex AI Pipelines - Secret Manager
    • Setup Secret Manager and use the console and Python Client to store secrets
    • Retrieve secrets using the Python Client
    • Example pipeline that retrieves credentials from Secret Manager
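
A hedged sketch of the pattern: the secret's resource name is passed as a plain parameter, and the value is only fetched inside the component at run time (the secret name below is a placeholder):

```python
from kfp import dsl

@dsl.component(
    base_image='python:3.10',
    packages_to_install=['google-cloud-secret-manager'],
)
def use_api(secret_name: str):
    from google.cloud import secretmanager
    client = secretmanager.SecretManagerServiceClient()
    # secret_name like: projects/PROJECT/secrets/NAME/versions/latest
    response = client.access_secret_version(name=secret_name)
    api_key = response.payload.data.decode('utf-8')
    # use api_key with the external service; never log or return it
```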

GCS Read/Write: Methods for reading and writing data in GCS within a component. Components run as Vertex AI Training jobs, which include GCS as a FUSE mount. That means components can utilize GCS at the /gcs mount point during runs. This includes container components, and the notebook workflow below even shows how to pass code directly to a container for execution.

Notebook Workflow:

Use the /gcs mount point to easily read and write data to GCS without needing any library imports or setup. The pipeline runs as a service account on a network, and any bucket that can be reached with this setup is automatically available.
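
A small sketch of the /gcs mount from inside a component (the bucket name is a placeholder, and the pipeline's service account must have access to it):

```python
from kfp import dsl

@dsl.component(base_image='python:3.10')
def write_then_read(bucket: str, blob: str) -> str:
    path = f'/gcs/{bucket}/{blob}'   # maps to gs://bucket/blob
    with open(path, 'w') as f:
        f.write('hello from a pipeline task')
    with open(path) as f:
        return f.read()
```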


Control Flow For Pipelines

As the tasks of an ML pipeline run, they form a graph: the outputs of upstream components become the inputs of downstream components. Both TFX and KFP automatically use these connections to create a DAG of execution. When logic needs to be specified in the pipeline's flow of execution, control structures are necessary.

Notebook Workflow:

The following notebook shows many examples of implementing controls in KFP while running on Vertex AI Pipelines:

  • Vertex AI Pipelines - Control
    • Ordering: DAG and Explicit ordering
    • Conditional Execution: if, elif (else if), and else
      • Collecting: Conditional results
    • Looping: And Parallelism
      • Collecting: Looped Results
    • Exit Handling: with and without task failures
    • Error Handling: continue execution even after task failures
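
An illustrative sketch of a few of these controls together (the components are trivial stand-ins):

```python
from kfp import dsl

@dsl.component(base_image='python:3.10')
def evaluate() -> float:
    return 0.9

@dsl.component(base_image='python:3.10')
def deploy():
    print('deploying')

@dsl.component(base_image='python:3.10')
def process(value: str):
    print(value)

@dsl.component(base_image='python:3.10')
def notify():
    print('pipeline finished')

@dsl.pipeline()
def pipeline():
    cleanup = notify()                      # exit task runs even on failure
    with dsl.ExitHandler(exit_task=cleanup):
        metrics = evaluate()
        with dsl.If(metrics.output > 0.8):  # conditional execution
            deploy()
        # fan out over a list with bounded parallelism
        with dsl.ParallelFor(items=['a', 'b', 'c'], parallelism=3) as item:
            process(value=item)
```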

Scheduling Pipelines

Pipelines can be run on a schedule directly in Vertex AI without the need to set up a scheduler and trigger (like Pub/Sub).

Notebook Workflow:

Here is an example of a pipeline run, followed by a schedule that repeats the pipeline at a specified interval up to the maximum number of iterations set on the schedule:

This can have many helpful applications, including:

  • Running batch predictions, evaluations, and monitoring each day or week
  • Retraining a model, running evaluations, comparing the new model to the currently deployed model, and then conditionally updating the deployed model
  • Checking for new training records and commencing retraining if conditions are met - like records that increase a class by 10%, at least 1000 new records, ...
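
A hedged sketch of creating a schedule with the Vertex AI SDK (names, cron string, and bucket are placeholders):

```python
from google.cloud import aiplatform

aiplatform.init(project='your-project', location='us-central1',
                staging_bucket='gs://your-bucket')

job = aiplatform.PipelineJob(
    display_name='weekly-retrain',
    template_path='pipeline.yaml',
    parameter_values={'learning_rate': 0.01},
)

# run every Monday at 08:00, one run at a time, at most 10 runs total
schedule = job.create_schedule(
    display_name='weekly-retrain-schedule',
    cron='0 8 * * 1',
    max_concurrent_run_count=1,
    max_run_count=10,
)
```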

Notifications From Pipelines

As the number of pipelines grows and scheduling and triggering are implemented, it becomes necessary to know which pipelines need to be reviewed. Getting notifications about the completion of pipelines is a good first step. Then, being able to restrict notifications to only be sent on failure, or on particular failures, becomes important.

Notebook Workflow:

This notebook workflow covers the pre-built component for email notification as well as building a custom notification system that sends emails (or performs other tasks) conditional on the pipeline's status.

  • Vertex AI Pipelines - Notifications
    • Pre-Built Component to send emails on pipeline completion
    • Overview of retrieving a pipeline run's final status information
    • Building a custom component to send emails conditional on the pipeline's final status
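
A sketch of the pre-built email component inside an exit handler, so the email is sent whether the pipeline succeeds or fails (the recipient address and the training component are placeholders):

```python
from kfp import dsl
from google_cloud_pipeline_components.v1.vertex_notification_email import (
    VertexNotificationEmailOp,
)

@dsl.component(base_image='python:3.10')
def train():
    print('training')

@dsl.pipeline(name='notify-pipeline')
def pipeline():
    notify = VertexNotificationEmailOp(recipients=['you@example.com'])
    with dsl.ExitHandler(exit_task=notify):
        train()
```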

Managing Pipelines: Storing And Reusing Pipelines & Components

As seen above, pipelines are made up of steps, which are executions of components. These components are made up of code, a container, and instructions (inputs and outputs).

Components:

For each type of component, kfp compiles the component into YAML as part of the pipeline. You can also directly compile individual components. This makes the YAML for a component a source that can be managed. Reusing it in additional pipelines is made possible with kfp.components.load_component_from_*(), which has versions for files, URLs, and text (strings).
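
A short sketch of saving a single component as YAML and loading it back (the file name and component are illustrative):

```python
from kfp import compiler, components, dsl

@dsl.component(base_image='python:3.10')
def add(a: float, b: float) -> float:
    return a + b

# compile just this component to a shareable YAML file
compiler.Compiler().compile(add, 'add_component.yaml')

# later, or in another project, load it for reuse in a new pipeline
add_reused = components.load_component_from_file('add_component.yaml')
# also available: load_component_from_url(...), load_component_from_text(...)
```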

Pipelines:

Pipelines are compiled into YAML files that include component specifications. Managing these pipeline files as artifacts is made easy with the combination of the KFP SDK and Artifact Registry, which can serve as a Kubeflow Pipelines repository.
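
A hedged sketch of storing a compiled pipeline in an Artifact Registry Kubeflow Pipelines repository (host, repository, and tag values are placeholders):

```python
from kfp.registry import RegistryClient

client = RegistryClient(
    host='https://us-central1-kfp.pkg.dev/your-project/your-repo'
)

# upload the compiled pipeline YAML with version tags
template_name, version = client.upload_pipeline(
    file_name='pipeline.yaml',
    tags=['v1', 'latest'],
)

# the stored template can then be referenced by its registry URL as the
# template_path of an aiplatform.PipelineJob (see the patterns section below)
```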

Notebook Workflow:

Work directly with these concepts in the following notebook-based workflow:


Testing Components And Pipelines: Strategies for Local and Remote Development

When creating pipeline components and pipelines, the process of testing can be aided by local testing and several strategies for remote (on Vertex AI Pipelines) testing. This section covers local and remote strategies to aid the development process.
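
A brief sketch of one local strategy: KFP's local runner (kfp>=2.5) executes a component in a subprocess before any remote submission, which makes quick assertions possible:

```python
from kfp import dsl, local

local.init(runner=local.SubprocessRunner())

@dsl.component(base_image='python:3.10')
def add(a: float, b: float) -> float:
    return a + b

# executes locally and exposes outputs for testing
task = add(a=1.0, b=2.0)
assert task.output == 3.0
```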

Notebook Workflow:

Work directly with these concepts in the following notebook-based workflow:


Managing Pipeline Jobs

Vertex AI Pipeline Jobs are runs of a pipeline. These can be run directly by a user, started by API, or scheduled. Within Vertex AI, a project can have many jobs running at any time and a history of all past jobs. This workflow shows how to review and manage the jobs in an environment using the Python SDK. For custom metrics in Cloud Logging, check out this helpful page.
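
A hedged sketch of managing jobs with the Python SDK (project, location, and the filter expression are placeholders):

```python
from google.cloud import aiplatform

aiplatform.init(project='your-project', location='us-central1')

# list all pipeline jobs, newest first
jobs = aiplatform.PipelineJob.list(order_by='create_time desc')

# filtered list: only currently running jobs
running = aiplatform.PipelineJob.list(filter='state="PIPELINE_STATE_RUNNING"')

for job in running:
    print(job.display_name, job.state)
    # job.cancel()   # stop a run that is no longer needed
    # job.delete()   # remove the job record entirely
```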

Notebook Workflow:

Work directly with these concepts in the following notebook-based workflow:


Pipeline Patterns - Putting Concepts Together Into Common Workflows

A series of notebook-based workflows that show how to put all the concepts from the material above together into common workflows:

  • Vertex AI Pipelines - Pattern - Modular and Reusable
    • Example 1: Store a pipeline in Artifact Registry and directly run it on Vertex AI Pipelines without a local download.
    • Example 2: Store and retrieve components for reusability: as files (at a URL, file directory, or text string) and as artifacts in Artifact Registry
    • Example 3: Store pipelines in Artifact Registry and retrieve (download and import) them to use as components in new pipelines
  • Run R on Vertex AI Pipelines
    • Use a prebuilt container to easily run an R script with inputs for the required libraries and command line arguments

Putting It All Together