Helm Installation Guide
Complete guide to install the KFP Operator using Helm with production-ready configuration
Install and configure the Kubeflow Pipelines Operator in your Kubernetes cluster.
The installation includes:
Ensure your cluster meets these requirements:
Version: 3.1.6 - 3.4.x Purpose: Workflow execution engine for pipeline orchestration
# Install Argo Workflows (cluster-wide)
kubectl create namespace argo
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.4.4/install.yaml
# Verify installation
kubectl get pods -n argo
# Should show argo-server and workflow-controller pods running
Production Configuration:
# argo-workflows-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: workflow-controller-configmap
namespace: argo
data:
config: |
# Increase parallelism for production
parallelism: 100
# Set resource limits
resourceRateLimit:
limit: 10
burst: 1
# Configure artifact repository
artifactRepository:
s3:
bucket: my-argo-artifacts
endpoint: s3.amazonaws.com
region: us-west-2
Version: 1.7.4 or later
Purpose: Event-driven pipeline automation
# Install Argo Events (cluster-wide)
kubectl create namespace argo-events
kubectl apply -f https://raw.githubusercontent.com/argoproj/argo-events/stable/manifests/install.yaml
# Verify installation
kubectl get pods -n argo-events
# Should show eventbus-controller and eventsource-controller pods running
Purpose: Automatic TLS certificate management for webhooks
# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
# Verify installation
kubectl get pods -n cert-manager
Purpose: Metrics collection and monitoring
# Install Prometheus Operator
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace
# Add the KFP Operator Helm repository
helm repo add kfp-operator https://sky-uk.github.io/kfp-operator/
helm repo update
# Verify the repository
helm search repo kfp-operator
Create a values.yaml file for your environment:
# values.yaml - Production configuration
namespace:
create: true
name: kfp-operator-system
manager:
# High availability setup
replicas: 2
# Resource allocation
resources:
requests:
cpu: 200m
memory: 256Mi
limits:
cpu: 1000m
memory: 1Gi
# Argo Workflows configuration
argo:
serviceAccount:
create: true
name: kfp-operator-argo
stepTimeoutSeconds:
default: 1800 # 30 minutes
compile: 3600 # 1 hour
ttlStrategy:
secondsAfterCompletion: 3600 # Clean up after 1 hour
containerDefaults:
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
# Security configuration
rbac:
create: true
# Monitoring configuration
monitoring:
create: true
serviceMonitor:
create: true # For Prometheus Operator
# Webhook configuration (production)
multiversion:
enabled: true
webhookCertificates:
provider: cert-manager # Use cert-manager for TLS
# Enable event-driven workflows
statusFeedback:
enabled: true
# Logging configuration
logging:
verbosity: 1 # 0=error, 1=info, 2=debug
# Container registry configuration
containerRegistry: "your-registry.com/kfp-operator"
# Install with custom configuration
helm install kfp-operator kfp-operator/kfp-operator \
--namespace kfp-operator-system \
--create-namespace \
--values values.yaml
# Verify installation
helm list -n kfp-operator-system
# Install directly from OCI registry
helm install kfp-operator \
oci://ghcr.io/sky-uk/kfp-operator/helm/kfp-operator \
--namespace kfp-operator-system \
--create-namespace \
--values values.yaml
# Clone repository and install from source
git clone https://github.com/sky-uk/kfp-operator.git
cd kfp-operator
# Install from local chart
helm install kfp-operator ./helm/kfp-operator \
--namespace kfp-operator-system \
--create-namespace \
--values values.yaml
# Check operator pods
kubectl get pods -n kfp-operator-system
# Expected output:
# NAME READY STATUS RESTARTS AGE
# kfp-operator-controller-manager-xxx-xxx 2/2 Running 0 2m
# Check Custom Resource Definitions
kubectl get crd | grep pipelines.kubeflow.org
# Expected output:
# pipelines.pipelines.kubeflow.org
# providers.pipelines.kubeflow.org
# runconfigurations.pipelines.kubeflow.org
# runs.pipelines.kubeflow.org
# Check operator logs for any errors
kubectl logs -n kfp-operator-system deployment/kfp-operator-controller-manager
# Should show successful startup messages
# Create a test provider (adjust for your environment)
cat <<EOF | kubectl apply -f -
apiVersion: pipelines.kubeflow.org/v1alpha5
kind: Provider
metadata:
name: test-provider
namespace: default
spec:
type: kfp
kfp:
restKfpApiUrl: "http://ml-pipeline.kubeflow.svc.cluster.local:8888"
EOF
# Check provider status
kubectl get providers
# Should show the provider with Ready status
| Parameter | Description | Default | Production Recommendation |
|---|---|---|---|
manager.replicas | Number of controller replicas | 1 | 2+ for HA |
manager.resources | Resource requests/limits | Basic | Set based on cluster size |
logging.verbosity | Log level (0-2) | 1 | 1 for production |
statusFeedback.enabled | Enable event system | false | true for automation |
For detailed configuration options, see:
The operator requires specific RBAC permissions. The Helm chart creates these automatically, but you can customize them:
# Custom RBAC configuration
manager:
rbac:
create: true
# Additional cluster roles for multi-tenancy
additionalClusterRoles:
- name: kfp-operator-tenant-viewer
rules:
- apiGroups: ["pipelines.kubeflow.org"]
resources: ["pipelines", "runs"]
verbs: ["get", "list", "watch"]
# Service account configuration
manager:
serviceAccount:
create: true
name: kfp-operator-controller-manager
annotations:
# For cloud provider integration
iam.gke.io/gcp-service-account: kfp-operator@project.iam.gserviceaccount.com
If you have Prometheus Operator installed:
# Enable ServiceMonitor for Prometheus
manager:
monitoring:
create: true
serviceMonitor:
create: true
interval: 30s
scrapeTimeout: 10s
Import the KFP Operator Grafana dashboard:
# Download dashboard JSON
curl -o kfp-operator-dashboard.json \
https://raw.githubusercontent.com/sky-uk/kfp-operator/master/monitoring/grafana-dashboard.json
# Import into Grafana UI or via ConfigMap
# Check CRD status
kubectl get crd | grep pipelines.kubeflow.org
# If missing, manually install
kubectl apply -f https://raw.githubusercontent.com/sky-uk/kfp-operator/master/config/crd/bases/
# Check webhook configuration
kubectl get validatingwebhookconfiguration | grep kfp-operator
# Check certificate status (if using cert-manager)
kubectl get certificates -n kfp-operator-system
# Check service account permissions
kubectl auth can-i create pipelines --as=system:serviceaccount:kfp-operator-system:kfp-operator-controller-manager
# Review RBAC configuration
kubectl get clusterroles | grep kfp-operator
After successful installation:
Next: Provider Configuration.
Complete guide to install the KFP Operator using Helm with production-ready configuration