Overview

The Kubeflow Pipelines Operator provides a declarative API for managing and running ML pipelines with Resource Definitions on multiple providers. A provider is a runtime environment for managing and executing ML pipelines and related resources.

Compatibility

The operator currently supports:

  • TFX Pipelines with Python 3.7 and 3.9 (pipelines created using the KFP DSL are not supported yet)
  • KFP standalone (a full KFP installation is not supported yet) and Vertex AI

TFX Pipelines and Components

Unlike imperative Kubeflow Pipelines deployments, the operator takes care of providing all environment-specific configuration and setup for the pipelines. Pipeline creators therefore don’t have to provide DAG runners, metadata configs, serving directories, etc. Furthermore, the Pusher component is not required in the pipeline code: because it is highly environment-specific, the operator can extend the pipeline with it.

To run a pipeline with the operator, the pipeline function only needs to return the list of TFX components. Everything else is done by the operator. See the penguin pipeline for an example.

Lifecycle phases and Parameter types

TFX Pipelines go through certain lifecycle phases that are unique to this technology. It is helpful to understand where these differ and where they are executed.

Development: Creating the components definition as code.

Compilation: Applying compile-time parameters and defining the execution runtime (aka DAG runner) for the pipeline to be compiled into a deployable artifact.

Deployment: Creating a pipeline representation in the target environment.

Running: Instantiating the pipeline, applying runtime parameters and running all pipeline steps involved to completion.

Note: Local runners usually skip compilation and deployment and run the pipeline straight away.

TFX allows the parameterization of Pipelines in most lifecycle stages:

| Parameter type | Description | Example |
|---|---|---|
| Named constants | Code constants | ANN layer size |
| Compile-time parameter | Parameters that are unlikely to change between pipeline runs, supplied as environment variables to the pipeline function | BigQuery dataset |
| Runtime parameter | Parameters exposed as TFX RuntimeParameter, which can be overridden at runtime and allow simplified experimentation without having to recompile the pipeline | Number of training runs |

The pipeline operator supports the application of compile-time and runtime parameters through its custom resources. We strongly encourage the usage of both of these parameter types to speed up development and experimentation lifecycles. Note that runtime parameters can be initialised to default values from both constants and compile-time parameters.
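The layering of the three parameter types can be sketched in plain Python. This is a dependency-free illustration, not the operator's or TFX's implementation: `RuntimeParam` is a hypothetical stand-in for TFX's RuntimeParameter, and the environment variable names `BQ_DATASET` and `DEFAULT_TRAINING_RUNS` are assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Optional

# Named constant: changing it means changing the pipeline source code.
HIDDEN_LAYER_UNITS = 8


@dataclass(frozen=True)
class RuntimeParam:
    """Hypothetical stand-in for a TFX RuntimeParameter: a name plus a default."""
    name: str
    default: Any


def compile_params(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Resolve compile-time parameters from environment variables."""
    env = os.environ if env is None else env
    # The default of the runtime parameter is itself a compile-time value.
    default_runs = int(env.get("DEFAULT_TRAINING_RUNS", "10"))
    return {
        "hidden_layer_units": HIDDEN_LAYER_UNITS,
        "bq_dataset": env.get("BQ_DATASET", "penguins"),
        "training_runs": RuntimeParam("training_runs", default_runs),
    }


def resolve_for_run(compiled: Mapping[str, Any],
                    overrides: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    """Apply per-run overrides; only runtime parameters can be overridden."""
    overrides = overrides or {}
    resolved = {}
    for key, value in compiled.items():
        if isinstance(value, RuntimeParam):
            resolved[key] = overrides.get(value.name, value.default)
        else:
            resolved[key] = value
    return resolved


compiled = compile_params({"BQ_DATASET": "prod_penguins"})
default_run = resolve_for_run(compiled)
experiment_run = resolve_for_run(compiled, {"training_runs": 50})
```

Here `default_run` uses 10 training runs and `experiment_run` uses 50, while both share the same compiled dataset name. Changing the dataset would require a recompilation; changing the number of training runs would not.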

Eventing Support

The Kubeflow Pipelines Operator can optionally be installed with Argo Events event sources, which let users react to events.

Currently, we support the following eventsources:

Architecture Overview

To do.