Available on: Enterprise Edition >= 0.18.0

Run tasks as Kubernetes pods.

How to use the Kubernetes task runner

The Kubernetes task runner executes tasks in a specified Kubernetes cluster. It is particularly useful when you need to declare resource requests and limits for a task.

Here is an example of a workflow with a task running Shell commands in a Kubernetes pod:

yaml
id: kubernetes_task_runner
namespace: company.team

description: |
  To get the kubeconfig file, run: `kubectl config view --minify --flatten`.
  Then, copy the values to the configuration below.
  Here is how Kubernetes task runner properties (on the left) map to the kubeconfig file's properties (on the right):
  - clientKeyData: client-key-data
  - clientCertData: client-certificate-data
  - caCertData: certificate-authority-data
  - masterUrl: server e.g. https://docker-for-desktop:6443
  - username: user e.g. docker-desktop

inputs:
  - id: file
    type: FILE

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    inputFiles:
      data.txt: "{{ inputs.file }}"
    outputFiles:
      - "*.txt"
    containerImage: centos
    taskRunner:
      type: io.kestra.plugin.ee.kubernetes.runner.Kubernetes
      config:
        clientKeyData: client-key-data
        clientCertData: client-certificate-data
        caCertData: certificate-authority-data
        masterUrl: server e.g. https://docker-for-desktop:6443
        username: user e.g. docker-desktop
    commands:
      - echo "Hello from a Kubernetes task runner!"
      - cp data.txt out.txt
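
As described in the flow above, you copy the credential values out of your kubeconfig. If you prefer to extract individual fields rather than reading the whole file, `kubectl` supports JSONPath output. This is a minimal sketch assuming a flattened kubeconfig with a single cluster and user entry; adjust the indexes if yours contains more:

bash
# Print the full, flattened kubeconfig for the current context
kubectl config view --minify --flatten

# Or extract individual fields (paths assume one cluster/user entry)
kubectl config view --minify --flatten -o jsonpath='{.clusters[0].cluster.certificate-authority-data}'  # caCertData
kubectl config view --minify --flatten -o jsonpath='{.clusters[0].cluster.server}'                      # masterUrl
kubectl config view --minify --flatten -o jsonpath='{.users[0].user.client-certificate-data}'           # clientCertData
kubectl config view --minify --flatten -o jsonpath='{.users[0].user.client-key-data}'                   # clientKeyData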

File handling

If your script task has inputFiles or namespaceFiles configured, an init container will be added to upload files into the main container.

Similarly, if your script task has outputFiles configured, a sidecar container will be added to download files from the main container.

All containers will use an in-memory emptyDir volume for file exchange.
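
To make this concrete, here is a simplified sketch of the pod shape the runner produces for a task with inputFiles and outputFiles. Kestra generates and manages the real spec; the names, images, and mount path below are placeholders:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: kestra-task-pod             # placeholder; Kestra generates its own names
spec:
  initContainers:
    - name: init-files              # uploads inputFiles/namespaceFiles into the shared volume
      image: busybox                # placeholder
      volumeMounts:
        - name: working-dir
          mountPath: /kestra/working-dir
  containers:
    - name: main                    # runs the task's commands
      image: centos                 # the task's containerImage
      volumeMounts:
        - name: working-dir
          mountPath: /kestra/working-dir
    - name: files-sidecar           # downloads outputFiles once the task finishes
      image: busybox                # placeholder
      volumeMounts:
        - name: working-dir
          mountPath: /kestra/working-dir
  volumes:
    - name: working-dir
      emptyDir:
        medium: Memory              # in-memory volume shared by all containers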

Failure scenarios

If a task is resubmitted (e.g. due to a retry or a Worker crash), the new Worker will reattach to the already running (or an already finished) pod instead of starting a new one.
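
For example, with a task-level retry policy like the sketch below (standard Kestra retry configuration; cluster credentials omitted for brevity), a retried attempt reattaches to the pod created by the first attempt:

yaml
tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: centos
    taskRunner:
      type: io.kestra.plugin.ee.kubernetes.runner.Kubernetes
      # config omitted - see the examples above
    retry:
      type: constant    # retry at a fixed interval
      interval: PT30S
      maxAttempt: 3
    commands:
      - echo "A retried attempt reattaches to the existing pod"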

Specifying resource requests for Python scripts

Some Python scripts may require more resources than others. You can specify the resources the script needs in the resources property of the task runner, as shown below; a sketch that also sets limits follows the example.

yaml
id: kubernetes_resources
namespace: company.team

tasks:
  - id: python_script
    type: io.kestra.plugin.scripts.python.Script
    containerImage: ghcr.io/kestra-io/pydata:latest
    taskRunner:
      type: io.kestra.plugin.ee.kubernetes.runner.Kubernetes
      namespace: default
      pullPolicy: Always
      config:
        username: docker-desktop
        masterUrl: https://docker-for-desktop:6443
        caCertData: xxx
        clientCertData: xxx
        clientKeyData: xxx
      resources:
        request: # The resources the container is guaranteed to get
          cpu: "500m" # Request 1/2 of a CPU (500 milliCPU)
          memory: "128Mi" # Request 128 MB of memory
    outputFiles:
      - "*.json"
    script: |
      import platform
      import socket
      import sys
      import json
      from kestra import Kestra

      print("Hello from a Kubernetes runner!")

      host = platform.node()
      py_version = platform.python_version()
      platform_info = platform.platform()  # renamed to avoid shadowing the platform module
      os_arch = f"{sys.platform}/{platform.machine()}"


      def print_environment_info():
          print(f"Host's network name: {host}")
          print(f"Python version: {py_version}")
          print(f"Platform info: {platform}")
          print(f"OS/Arch: {os_arch}")

          env_info = {
              "host": host,
              "platform": platform,
              "os_arch": os_arch,
              "python_version": py_version,
          }
          Kestra.outputs(env_info)

          filename = "environment_info.json"
          with open(filename, "w") as json_file:
              json.dump(env_info, json_file, indent=4)


      if __name__ == '__main__':
          print_environment_info()
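
Alongside request, you can also cap a task's usage. Here is a minimal sketch, assuming the resources property accepts a limit block that mirrors Kubernetes resource limits:

yaml
taskRunner:
  type: io.kestra.plugin.ee.kubernetes.runner.Kubernetes
  resources:
    request:           # resources the container is guaranteed to get
      cpu: "500m"
      memory: "128Mi"
    limit:             # assumed upper bound the container must not exceed
      cpu: "1000m"
      memory: "256Mi"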

Using plugin defaults to avoid repetition

You can use pluginDefaults to avoid repeating the same configuration across multiple tasks. For example, you can apply the Kubernetes task runner with pullPolicy set to Always to every task from the io.kestra.plugin.scripts.python plugin group:

yaml
id: k8s_taskrunner
namespace: company.team

tasks:
  - id: parallel
    type: io.kestra.plugin.core.flow.Parallel
    tasks:
      - id: run_command
        type: io.kestra.plugin.scripts.python.Commands
        containerImage: ghcr.io/kestra-io/kestrapy:latest
        commands:
          - pip show kestra

      - id: run_python
        type: io.kestra.plugin.scripts.python.Script
        containerImage: ghcr.io/kestra-io/pydata:latest
        script: |
          import socket

          hostname = socket.gethostname()
          ip_address = socket.gethostbyname(hostname)
          print("Hello from AWS EKS and Kestra!")
          print(f"Host IP Address: {ip_address}")

pluginDefaults:
  - type: io.kestra.plugin.scripts.python
    forced: true
    values:
      taskRunner:
        type: io.kestra.plugin.ee.kubernetes.runner.Kubernetes
        namespace: default
        pullPolicy: Always
        config:
          username: docker-desktop
          masterUrl: https://docker-for-desktop:6443
          caCertData: |-
            placeholder
          clientCertData: |-
            placeholder
          clientKeyData: |-
            placeholder

Guides

Below are guides to help you set up the Kubernetes task runner on different platforms.

Google Kubernetes Engine (GKE)

Here's an example flow using the Kubernetes task runner with GKE. To authenticate, you need to use OAuth with a service account.

yaml
id: gke_task_runner
namespace: company.team

tasks:
  - id: auth
    type: io.kestra.plugin.gcp.auth.OauthAccessToken
    projectId: "projectid" # update
    serviceAccount: "{{ secret('GOOGLE_SA') }}" # update

  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: centos
    taskRunner:
      type: io.kestra.plugin.ee.kubernetes.runner.Kubernetes
      config:
        caCertData: "{{ secret('certificate-authority-data') }}" # update
        masterUrl: https://cluster-external-endpoint # update
        username: gke_projectid_region_clustername # update
        oauthToken: "{{ outputs.auth.accessToken['tokenValue'] }}"
    commands:
      - echo "Hello from a Kubernetes task runner!"

To authenticate with Google Cloud, you'll need to create a Service Account and add its JSON key to Kestra. Read more about how to do that here. For GKE, make sure the Kubernetes Engine default node service account role is assigned to your Service Account.
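
As a sketch, granting that role with gcloud could look like the following; the service account email is a placeholder and the role ID is an assumption to verify against your project:

bash
# Placeholder values: projectid and my-sa; the role ID
# roles/container.defaultNodeServiceAccount is assumed
gcloud projects add-iam-policy-binding projectid \
  --member="serviceAccount:my-sa@projectid.iam.gserviceaccount.com" \
  --role="roles/container.defaultNodeServiceAccount"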

You'll need to use gcloud to fetch the cluster credentials, such as username, masterUrl, and caCertData. You can do this by running the following command:

bash
gcloud container clusters get-credentials clustername --region myregion --project projectid

You'll need to update the following arguments above with your own values:

  • clustername is the name of your cluster
  • myregion is the region your cluster is in, e.g. europe-west2
  • projectid is the ID of your Google Cloud project

Once you've run this command, you can view your kubeconfig with `kubectl config view --minify --flatten` and copy the values for caCertData, masterUrl, and username.

Amazon Elastic Kubernetes Service (EKS)

Here's an example flow using the Kubernetes task runner with AWS EKS. To authenticate, you need an OAuth token; one way to obtain it with the AWS CLI is sketched after this example.

yaml
id: eks_task_runner
namespace: company.team

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: centos
    taskRunner:
      type: io.kestra.plugin.ee.kubernetes.runner.Kubernetes
      config:
        caCertData: "{{ secret('certificate-authority-data') }}"
        masterUrl: https://xxx.xxx.region.eks.amazonaws.com
        username: arn:aws:eks:region:xxx:cluster/cluster_name
        oauthToken: xxx
    commands:
      - echo "Hello from a Kubernetes task runner!"
