Installation
We recommend installing with Helm, as it allows a declarative approach to managing Kubernetes resources.
This guide assumes you are familiar with Helm.
Prerequisites
- Argo 3.1.6-3.3 installed cluster-wide or into the namespace where the operator’s workflows run (see configuration).
- Argo-Events 1.7.4+ installed cluster-wide (see configuration).
KFP-Operator
To get a working installation, you will need to install both the KFP-Operator and at least one provider (see below).
Build and Install
Create a basic values.yaml file with the following content:
fullnameOverride: kfp-operator
manager:
  argo:
    serviceAccount:
      name: pipeline-runner
  configuration:
    defaultExperiment: Default
Install the latest version of the operator:
helm install kfp-operator oci://ghcr.io/kfp-operator/kfp-operator -f values.yaml
You will need to configure the service accounts and roles required by your chosen provider; see Role-based access control (RBAC) for providers for reference.
Configuration Values
Valid configuration options for overriding the default values.yaml are:
Parameter name | Description |
---|---|
containerRegistry | Container Registry base path for all container images |
namespace.create | Create the namespace for the operator |
namespace.name | Operator namespace name |
manager.argo.containerDefaults | Container Spec defaults to be used for Argo workflow pods created by the operator |
manager.argo.metadata | Container Metadata defaults to be used for Argo workflow pods created by the operator |
manager.argo.ttlStrategy | TTL Strategy used for all Argo Workflows |
manager.argo.stepTimeoutSeconds.compile | Timeout in seconds for compiler steps - defaults to 1800 (30m) |
manager.argo.stepTimeoutSeconds.default | Default timeout in seconds for workflow steps - defaults to 300 (5m) |
manager.argo.serviceAccount.name | The k8s service account used to run Argo workflows |
manager.argo.serviceAccount.create | Create the Argo Workflows service account (or assume it has been created externally) |
manager.argo.serviceAccount.metadata | Optional Argo Workflows service account default metadata |
manager.metadata | Object Metadata for the manager’s pods |
manager.rbac.create | Create roles and rolebindings for the operator |
manager.serviceAccount.name | Manager service account’s name |
manager.serviceAccount.create | Create the manager’s service account or expect it to be created externally |
manager.replicas | Number of replicas for the manager deployment |
manager.resources | Manager resources as per k8s documentation |
manager.configuration | Manager configuration as defined in Configuration (note that you can omit compilerImage and kfpSdkImage when specifying containerRegistry as default values will be applied) |
manager.monitoring.create | Create the manager’s monitoring resources |
manager.monitoring.rbacSecured | Enable additional RBAC-based security |
manager.monitoring.serviceMonitor.create | Create a ServiceMonitor for the Prometheus Operator |
manager.monitoring.serviceMonitor.endpointConfiguration | Additional configuration to be used in the service monitor endpoint (path, port and scheme are provided) |
manager.multiversion.enabled | Enable the multiversion API. Should be enabled in production to allow version migration; disable for a simplified installation |
manager.webhookCertificates.provider | K8s conversion webhook TLS certificate provider - choose cert-manager for Helm to deploy certificates if cert-manager is available or custom otherwise (see below) |
manager.webhookCertificates.secretName | Name of a K8s secret deployed into the operator namespace to secure the webhook endpoint with, required if the custom provider is chosen |
manager.webhookCertificates.caBundle | CA bundle of the certificate authority that has signed the webhook’s certificate, required if the custom provider is chosen |
manager.runcompletionWebhook.endpoints | Array of upstream endpoints to be called when a run completion event is passed |
logging.verbosity | Logging verbosity for all components - see the logging documentation for valid values |
statusFeedback.enabled | Whether run completion eventing and status update feedback loop should be installed - defaults to false |
Examples for these values can be found in the test configuration.
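As a sketch, a values.yaml for a production-style installation might combine several of the options above: images from a private registry, the multiversion API with a custom-provisioned conversion webhook certificate, Prometheus monitoring and status feedback. Every key comes from the table above, but the registry path, secret name and CA bundle are placeholders, and the compile timeout is an arbitrary illustration rather than a recommendation.

```yaml
fullnameOverride: kfp-operator
containerRegistry: registry.example.com/kfp-operator   # placeholder registry base path
manager:
  argo:
    serviceAccount:
      name: pipeline-runner
      create: false                 # assume the account is created externally
    stepTimeoutSeconds:
      compile: 3600                 # illustrative: 60m instead of the 1800s default
  configuration:
    defaultExperiment: Default      # compilerImage/kfpSdkImage omitted, containerRegistry is set
  multiversion:
    enabled: true
  webhookCertificates:
    provider: custom
    secretName: kfp-operator-webhook-tls   # hypothetical secret in the operator namespace
    caBundle: "<CA bundle>"                # placeholder for the signing CA's bundle
  monitoring:
    create: true
    serviceMonitor:
      create: true                  # requires the Prometheus Operator
statusFeedback:
  enabled: true
```

If cert-manager is available in the cluster, setting `webhookCertificates.provider` to `cert-manager` lets Helm deploy the certificates instead, and `secretName` and `caBundle` are no longer required.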
Providers
Supported providers are:
- Kubeflow Pipelines
- Vertex AI
Install one or more by following these instructions. Please refer to the respective configuration section before proceeding.
Build and Install
Create a basic kfp.yaml values file with the following content:
provider:
  name: kfp-provider
  type: kfp
  executionMode: v1
  serviceAccount:
    name: kfp-operator-kfp
    create: false
  configuration:
    kfpNamespace: kubeflow
    restKfpApiUrl: http://ml-pipeline.kubeflow:8888
    grpcMetadataStoreAddress: metadata-grpc-service.kubeflow:8080
    grpcKfpApiAddress: ml-pipeline.kubeflow:8887
    defaultBeamArgs:
      - name: project
        value: ${DATAFLOW_PROJECT}
    pipelineRootStorage: ${PIPELINE_STORAGE}
Install the latest version of the provider:
helm install kfp-provider oci://ghcr.io/kfp-operator/provider -f kfp.yaml
Configuration
The provider block contains the provider configuration used to create the relevant Provider resources.
Parameter name | Description |
---|---|
name | Name given to this provider |
type | Provider type (kfp or vai) |
serviceAccount.name | Name of the service account to run provider-specific operations |
serviceAccount.create | Create the service account (or assume it has been created externally) |
serviceAccount.metadata | Optional service account default metadata |
configuration | See Provider Configuration for all available providers and their respective configuration options |
Example:
provider:
  name: kfp-provider
  type: kfp
  executionMode: v1
  serviceAccount:
    name: kfp-operator-kfp
    create: false
  ...
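The example above assumes the service account already exists (`create: false`). If you would rather have the chart create it, a variant might look like the sketch below. The shape of `serviceAccount.metadata` (labels and annotations) is an assumption based on standard Kubernetes object metadata, and the label and annotation values are hypothetical; the GKE Workload Identity annotation only matters on GKE.

```yaml
provider:
  name: kfp-provider
  type: kfp
  executionMode: v1
  serviceAccount:
    name: kfp-operator-kfp
    create: true                # let the chart create the service account
    metadata:                   # assumption: standard ObjectMeta fields are accepted
      labels:
        team: ml-platform       # hypothetical label
      annotations:
        # hypothetical; only meaningful with GKE Workload Identity
        iam.gke.io/gcp-service-account: kfp-operator@my-project.iam.gserviceaccount.com
  configuration:
    kfpNamespace: kubeflow      # remaining options as in kfp.yaml
```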
Role-based access control (RBAC) for providers
When using a provider, you need to create the ServiceAccount, RoleBinding and ClusterRoleBinding resources that it requires.
For the Event Source Servers and the controller to read Providers, their service accounts must have read permissions on Provider resources, for example:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kfp-operator-kfp-providers-viewer-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kfp-operator-providers-viewer-role
subjects:
  - kind: ServiceAccount
    name: kfp-operator-kfp # Used by Event Source Server
    namespace: kfp-operator-system
  - kind: ServiceAccount
    name: kfp-operator-controller-manager # Used by KFP Controller
    namespace: kfp-operator-system
An example configuration for Providers is also provided below for reference:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kfp-operator-kfp-service-account
  namespace: kfp-namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kfp-operator-kfp-runconfiguration-viewer-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kfp-operator-runconfiguration-viewer-role
subjects:
  - kind: ServiceAccount
    name: kfp-operator-kfp-service-account
    namespace: kfp-namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kfp-operator-kfp-run-viewer-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kfp-operator-run-viewer-role
subjects:
  - kind: ServiceAccount
    name: kfp-operator-kfp-service-account
    namespace: kfp-namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kfp-operator-provider-workflow-executor
  namespace: kfp-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kfp-operator-workflow-executor
subjects:
  - kind: ServiceAccount
    name: kfp-operator-kfp-service-account
    namespace: kfp-namespace
Kubeflow completion eventing required RBAC
If using the KubeFlowProvider, you will also need a ClusterRole that grants the eventing system permission to interact with Argo workflows, bound to the provider's service account:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kfp-operator-kfp-eventsource-server-role
rules:
  - apiGroups:
      - argoproj.io
    resources:
      - workflows
    verbs:
      - get
      - list
      - patch
      - update
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kfp-operator-kfp-eventsource-server-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kfp-operator-kfp-eventsource-server-role
subjects:
  - kind: ServiceAccount
    name: kfp-operator-kfp-service-account
    namespace: kfp-operator-namespace