Configuration

The Kubeflow Pipelines operator can be configured with the following parameters:

| Parameter name | Description | Example |
|---|---|---|
| defaultExperiment | Default experiment name to be used for creating pipeline runs | Default |
| defaultProvider | Default provider name to be used (see Using Multiple Providers). Note: this is deprecated as of v1alpha6 and will be removed with the release of v1alpha7 | vertex-ai-europe |
| multiversion | If enabled, previous versions of the CRDs are supported; otherwise only the latest version is supported | true |
| workflowNamespace | Namespace where operator Argo workflows should be running - defaults to the operator’s namespace | kfp-operator-workflows |
| runCompletionTTL | Duration string for how long to keep one-off runs after completion - a zero-length or negative duration will result in runs being deleted immediately after completion; defaults to empty (never delete runs) | 10m |
| runCompletionFeed | Configuration of the service for the run completion feed back to the KFP Operator | See Run Completion Feed Configuration |

An example can be found here.
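
As a rough illustration, the scalar parameters above might be set as follows. This is a minimal sketch that assumes the parameters are plain top-level keys of the operator configuration; where the fragment sits (for example inside a Helm values file) depends on how the operator is installed and is not shown here. The values are the examples from the parameter list.

```yaml
# Sketch only: assumes these are top-level keys of the operator configuration.
defaultExperiment: Default
# Deprecated as of v1alpha6; shown only for completeness.
defaultProvider: vertex-ai-europe
multiversion: true
workflowNamespace: kfp-operator-workflows
# Keep one-off runs for ten minutes after completion.
# Leaving this unset means runs are never deleted.
runCompletionTTL: 10m
```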

Run Completion Feed Configuration

| Parameter name | Description | Example |
|---|---|---|
| runCompletionFeed.port | The port that the feed endpoint will listen on | 8082 |
| runCompletionFeed.endpoints | Array of run completion event handler endpoints that should be called per feed message | [{host: run-completion-event-handler, path: /, port: 12000}] |
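
Written out in block form, the two feed parameters nest under a single runCompletionFeed key. The sketch below uses only the example values listed above; adding further entries to endpoints would, per the description, cause each of them to be called for every feed message.

```yaml
runCompletionFeed:
  # Port the feed endpoint listens on.
  port: 8082
  # Handlers called for each feed message.
  endpoints:
    - host: run-completion-event-handler
      path: /
      port: 12000
```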

Provider Configurations

Provider configurations are specific to the implementation and are applied via a Provider custom resource.

Kubeflow Pipelines

KFP must be installed in standalone mode. Its configuration can be controlled using the KFP-specific parameters within a Provider resource.

Vertex AI Pipelines

VAI configuration can be controlled using the VAI-specific parameters within a Provider resource (see Vertex AI Provider).

GCP Project Setup

The following GCP APIs need to be enabled in the configured vaiProject:

  • Vertex AI
  • Pub/Sub
  • Cloud Storage
  • Cloud Scheduler

A Vertex AI log sink needs to be created that:

  • captures pipeline state changes using the filter
       jsonPayload.state="PIPELINE_STATE_SUCCEEDED" OR "PIPELINE_STATE_FAILED" OR "PIPELINE_STATE_CANCELLED"
  • writes state changes to a Pipeline Events topic in Pub/Sub (see below for required subscription)
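
To make the two requirements concrete, here is a hedged sketch of such a sink, written as YAML using the field names of the Cloud Logging LogSink resource (name, destination, filter). The sink name, project ID and topic name are hypothetical placeholders, not values prescribed by the operator; the filter is the one quoted above.

```yaml
# Sketch of a log sink definition. All names below are placeholders.
name: kfp-operator-pipeline-events
# Pub/Sub destinations take the form pubsub.googleapis.com/projects/<project>/topics/<topic>.
destination: pubsub.googleapis.com/projects/my-project/topics/pipeline-events
# Filter copied from the requirement above.
filter: >-
  jsonPayload.state="PIPELINE_STATE_SUCCEEDED" OR "PIPELINE_STATE_FAILED" OR "PIPELINE_STATE_CANCELLED"
```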

Pub/Sub topics and subscriptions need to be created for:

  • Pipeline Events
    • Subscription: eventsourcePipelineEventsSubscription

It is important to configure the retry policy for the eventsourcePipelineEventsSubscription subscription according to your needs. The policy determines how often the eventsource server retries querying the Vertex AI API when errors occur. We suggest an exponential backoff with the minimum and maximum backoff each set to at least 10 seconds; setting both to 10 seconds results in a fixed 10-second wait between polls.
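
The suggested retry policy could look roughly like the following, expressed as YAML with the field names of the Pub/Sub Subscription resource (retryPolicy.minimumBackoff and retryPolicy.maximumBackoff). The project, topic and subscription names are hypothetical placeholders.

```yaml
# Sketch of a subscription definition. All names below are placeholders.
name: projects/my-project/subscriptions/pipeline-events-subscription
topic: projects/my-project/topics/pipeline-events
retryPolicy:
  # Equal minimum and maximum backoff gives a fixed wait between redeliveries.
  minimumBackoff: 10s
  maximumBackoff: 10s
```

Raising maximumBackoff above minimumBackoff lets the delay grow between attempts instead of staying fixed.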

The GCS pipeline storage bucket configured as provider.configuration.pipelineBucket needs to be created.

The configured serviceAccount needs to have workload identity enabled and be granted the following permissions:

  • storage.objects.create on the configured pipelineBucket
  • storage.objects.get on the configured pipelineBucket
  • storage.objects.delete on the configured pipelineBucket
  • projects.subscriptions.pull from the configured eventsourcePipelineEventsSubscription* subscription
  • aiplatform.pipelineJobs.create
  • aiplatform.pipelineJobs.get*
  • aiplatform.schedules.get
  • aiplatform.schedules.create
  • aiplatform.schedules.delete
  • aiplatform.schedules.update
  • iam.serviceAccounts.actAs on the configured vaiJobServiceAccount (the Vertex AI Job Runner)

* fields only needed if the operator is installed with eventing support
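
One possible way to bundle these permissions is a custom IAM role. The sketch below is a role definition in the YAML shape accepted by `gcloud iam roles create --file`; the title and description are hypothetical. It makes two assumptions: the Pub/Sub pull call is represented by the IAM permission pubsub.subscriptions.consume, and a single project-level role is acceptable. The list above scopes several permissions to specific resources (the bucket, the subscription, the job service account), so binding the role at project level would grant more than strictly required.

```yaml
# Sketch of a custom role definition. Title and description are placeholders.
title: KFP Operator Vertex AI Provider
description: Permissions used by the operator service account (illustrative).
stage: GA
includedPermissions:
  - storage.objects.create
  - storage.objects.get
  - storage.objects.delete
  # Assumed IAM equivalent of projects.subscriptions.pull; eventing support only.
  - pubsub.subscriptions.consume
  - aiplatform.pipelineJobs.create
  # Eventing support only.
  - aiplatform.pipelineJobs.get
  - aiplatform.schedules.get
  - aiplatform.schedules.create
  - aiplatform.schedules.delete
  - aiplatform.schedules.update
  - iam.serviceAccounts.actAs
```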