Configuration

The Kubeflow Pipelines operator can be configured with the following parameters:

| Parameter name | Description | Example |
| --- | --- | --- |
| `defaultExperiment` | Default Experiment name to be used for creating pipeline runs | `Default` |
| `defaultProvider` | Default provider name to be used (see Using Multiple Providers) | `vertex-ai-europe` |
| `multiversion` | If enabled, previous versions of the CRDs are supported; otherwise only the latest | `true` |
| `workflowNamespace` | Namespace where operator Argo workflows should be running; defaults to the operator's namespace | `kfp-operator-workflows` |
| `runCompletionTTL` | Duration string for how long to keep one-off runs after completion; a zero-length or negative duration will result in runs being deleted immediately after completion; defaults to empty (never delete runs) | `10m` |

An example can be found here.
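For illustration, the parameters above could be combined as follows. Only the field names and example values come from the table; the surrounding resource structure (`apiVersion`, `kind`, `metadata`) is a placeholder and depends on your installation:

```yaml
# Sketch only -- apiVersion/kind/metadata are placeholders; the spec
# fields are the parameters from the table above.
apiVersion: config.kubeflow.org/v1 # placeholder
kind: KfpControllerConfig          # placeholder
metadata:
  name: kfp-operator-config
spec:
  defaultExperiment: Default
  defaultProvider: vertex-ai-europe
  multiversion: true
  workflowNamespace: kfp-operator-workflows
  runCompletionTTL: 10m
```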

Provider Configurations

Provider configurations are specific to the provider implementation. The operator supports the following providers out of the box.

Common

| Parameter name | Description | Example |
| --- | --- | --- |
| `image`* | Container image of the provider | `kfp-operator-kfp-provider:0.0.2` |
| `executionMode`* | KFP compiler execution mode | `v1` (currently KFP) or `v2` (Vertex AI) |
| `serviceAccount`* | Service Account name to be used for all provider-specific operations (see the respective provider) | `kfp-operator-vertex-ai` |
| `defaultBeamArgs` | Default Beam arguments to which the pipeline-defined ones will be added | `- name: project`<br>`  value: my-gcp-project` |
| `pipelineRootStorage` | The storage location used by TFX (`pipeline-root`) to store pipeline artifacts and outputs; this should be a top-level directory and not specific to a single pipeline | `gcs://kubeflow-pipelines-bucket` |

\* field automatically populated by Helm based on provider type
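Since this page later refers to `provider.configuration.pipelineBucket`, the common parameters plausibly nest under a `provider.configuration` block. A minimal sketch under that assumption, using the table's example values (the `provider` wrapper itself is an assumption):

```yaml
provider:
  configuration:
    image: kfp-operator-kfp-provider:0.0.2  # populated by Helm based on provider type
    executionMode: v1                       # populated by Helm based on provider type
    serviceAccount: kfp-operator-vertex-ai  # populated by Helm based on provider type
    defaultBeamArgs:
      - name: project
        value: my-gcp-project
    pipelineRootStorage: gcs://kubeflow-pipelines-bucket
```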

Kubeflow Pipelines

| Parameter name | Description | Example |
| --- | --- | --- |
| `kfpNamespace` | The KFP namespace | `kubeflow` |
| `restKfpApiUrl` | The KFP REST URL available to the operator | `http://ml-pipeline.kubeflow:8888` |
| `grpcKfpApiAddress` | The KFP gRPC address for the eventsource server | `ml-pipeline.kubeflow-pipelines:8887` |
| `grpcMetadataStoreAddress` | The MLMD gRPC address for the eventsource server | `metadata-grpc-service.kubeflow-pipelines:8080` |

KFP must be installed in standalone mode. The example values above are KFP's default endpoints.
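Under the same assumed `provider.configuration` layout, a Kubeflow Pipelines provider configuration might look like this (values are the table's examples):

```yaml
provider:
  configuration:
    kfpNamespace: kubeflow
    restKfpApiUrl: "http://ml-pipeline.kubeflow:8888"
    grpcKfpApiAddress: ml-pipeline.kubeflow-pipelines:8887
    grpcMetadataStoreAddress: metadata-grpc-service.kubeflow-pipelines:8080
```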

Vertex AI Pipelines

Vertex AI Provider

| Parameter name | Description | Example |
| --- | --- | --- |
| `pipelineBucket` | GCS bucket where the compiled pipeline is stored | `kfp-operator-pipelines` |
| `vaiProject` | Vertex AI GCP project name | `kfp-operator-vertex-ai` |
| `vaiLocation` | Vertex AI GCP project location | `europe-west2` |
| `vaiJobServiceAccount` | Vertex AI GCP service account to run pipeline jobs | `kfp-operator-vai@kfp-operator-vertex-ai.iam.gserviceaccount.com` |
| `eventsourcePipelineEventsSubscription` | Subscription for the eventsource to use, which subscribes to the Vertex AI pipeline events log sink topic (see below) | `kfp-operator-vai-run-events-eventsource` |
| `maxConcurrentRunCount` | Maximum number of runs that can be started concurrently per schedule; defaults to 10 | `3` |
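Again assuming the `provider.configuration` layout, the Vertex AI parameters from the table could be combined as follows (values are the table's examples):

```yaml
provider:
  configuration:
    pipelineBucket: kfp-operator-pipelines
    vaiProject: kfp-operator-vertex-ai
    vaiLocation: europe-west2
    vaiJobServiceAccount: kfp-operator-vai@kfp-operator-vertex-ai.iam.gserviceaccount.com
    eventsourcePipelineEventsSubscription: kfp-operator-vai-run-events-eventsource
    maxConcurrentRunCount: 3
```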

GCP Project Setup

The following GCP APIs need to be enabled in the configured `vaiProject` (see the sketch after this list for one declarative way to do so):

  • Vertex AI
  • Pub/Sub
  • Cloud Storage
  • Cloud Scheduler
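You can enable these APIs with whatever GCP tooling you already use. As one option, if you manage GCP resources from Kubernetes with Config Connector, enabling a single API can be declared as below; the project ID is the table's example value, and the resource is repeated per API:

```yaml
# Enables the Vertex AI API in the configured vaiProject via Config Connector.
# Repeat with pubsub.googleapis.com, storage.googleapis.com and
# cloudscheduler.googleapis.com.
apiVersion: serviceusage.cnrm.cloud.google.com/v1beta1
kind: Service
metadata:
  name: aiplatform.googleapis.com
  annotations:
    cnrm.cloud.google.com/project-id: kfp-operator-vertex-ai
```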

A Vertex AI log sink needs to be created that:

  • captures pipeline state changes using the filter

    ```
    jsonPayload.state="PIPELINE_STATE_SUCCEEDED" OR "PIPELINE_STATE_FAILED" OR "PIPELINE_STATE_CANCELLED"
    ```

  • writes state changes to Pub/Sub, onto a Pipeline Events topic (see below for the required subscription; a declarative sketch follows this list)
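Continuing the Config Connector option, such a log sink could be declared as follows; the sink and topic names are hypothetical, while the filter and project are taken from this page:

```yaml
# Hypothetical log sink routing Vertex AI pipeline state changes to Pub/Sub.
apiVersion: logging.cnrm.cloud.google.com/v1beta1
kind: LoggingLogSink
metadata:
  name: kfp-operator-vai-pipeline-events   # hypothetical name
spec:
  projectRef:
    external: kfp-operator-vertex-ai
  destination:
    pubSubTopicRef:
      name: kfp-operator-vai-pipeline-events   # hypothetical Pipeline Events topic
  filter: >-
    jsonPayload.state="PIPELINE_STATE_SUCCEEDED" OR
    "PIPELINE_STATE_FAILED" OR "PIPELINE_STATE_CANCELLED"
```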

Pub/Sub topics and subscriptions need to be created for:

  • Pipeline Events
    • Subscription: `eventsourcePipelineEventsSubscription`

It is important to configure the retry policy of the `eventsourcePipelineEventsSubscription` subscription according to your needs. This determines how frequently the eventsource server retries queries against the Vertex AI API in case of errors. We suggest an exponential backoff with both min and max backoff set to at least 10 seconds, resulting in a fixed 10-second wait between polls.
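With Config Connector, the topic and its subscription, including the suggested retry policy, could be declared as follows. The topic name is hypothetical; the subscription name is the table's example for `eventsourcePipelineEventsSubscription`:

```yaml
apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubTopic
metadata:
  name: kfp-operator-vai-pipeline-events   # hypothetical Pipeline Events topic
---
apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubSubscription
metadata:
  name: kfp-operator-vai-run-events-eventsource   # eventsourcePipelineEventsSubscription
spec:
  topicRef:
    name: kfp-operator-vai-pipeline-events
  retryPolicy:
    # Fixed 10s wait between retries, as suggested above.
    minimumBackoff: 10s
    maximumBackoff: 10s
```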

The GCS pipeline storage bucket (`provider.configuration.pipelineBucket`) needs to be created.

The configured `serviceAccount` needs to have workload identity enabled (see the sketch at the end of this section) and be granted the following permissions:

  • `storage.objects.create` on the configured `pipelineBucket`
  • `storage.objects.get` on the configured `pipelineBucket`
  • `storage.objects.delete` on the configured `pipelineBucket`
  • `projects.subscriptions.pull` from the configured `eventsourcePipelineEventsSubscription` subscription*
  • `aiplatform.pipelineJobs.create`
  • `aiplatform.pipelineJobs.get`*
  • `aiplatform.schedules.get`
  • `aiplatform.schedules.create`
  • `aiplatform.schedules.delete`
  • `aiplatform.schedules.update`
  • `iam.serviceAccounts.actAs` on the configured `vaiJobServiceAccount` (the Vertex AI Job Runner)

\* only needed if the operator is installed with eventing support
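On GKE, workload identity is typically enabled by annotating the provider's Kubernetes Service Account with the GCP service account it should impersonate. A minimal sketch, where the GCP service account email and the namespace are hypothetical and the Service Account name is the table's example for `serviceAccount`:

```yaml
# Kubernetes Service Account used by the provider, bound to a GCP service
# account via the standard GKE Workload Identity annotation.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kfp-operator-vertex-ai
  namespace: kfp-operator   # hypothetical namespace
  annotations:
    iam.gke.io/gcp-service-account: kfp-operator-vertex-ai@kfp-operator-vertex-ai.iam.gserviceaccount.com
```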