Retries
Retries handle transient failures in your workflows.
They are defined at the task level and can be configured to retry a task a certain number of times, or with a certain delay between each retry.
What are retries
Kestra provides task retry functionality, which lets you add retry behavior for any failed task run based on the configuration in the flow description.
A retry on a task run will create a new task run attempt.
Example
The following example defines a retry for the retry_sample task with a maximum of 5 attempts, 15 minutes apart:
- id: retry_sample
  type: io.kestra.plugin.core.log.Log
  message: my output for task {{task.id}}
  timeout: PT10M
  retry:
    type: constant
    maxAttempt: 5
    interval: PT15M
In the next example, the task is retried up to four times at 0.25-second (250 ms) intervals, and retrying stops once the maximum duration of 1 minute has elapsed.
Because of the shell command logic, the task succeeds on the 5th attempt; the number of attempts is tracked with {{ taskrun.attemptsCount }}.
id: retry
namespace: company.team
description: This flow will be retried 4 times and will succeed on the 5th attempt
tasks:
  - id: failed
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    commands:
      - 'if [ "{{taskrun.attemptsCount}}" -eq 4 ]; then exit 0; else exit 1; fi'
    retry:
      type: constant
      interval: PT0.25S
      maxAttempt: 5
      maxDuration: PT1M
      warningOnRetry: true
errors:
  - id: never-happen
    type: io.kestra.plugin.core.debug.Return
    format: Never happened {{task.id}}
Retry options for all retry types
name | type | description |
---|---|---|
type | string | Retry behavior to apply. Can be one of constant, exponential, or random. |
maxAttempt | integer | Maximum number of attempts before the system stops retrying. |
maxDuration | Duration | Maximum total duration during which the task is retried. Once it has passed, the task is no longer processed. |
warningOnRetry | Boolean | Flags the execution as a warning if any retry was done on this task. Default: false. |
Duration
Some of the options above take a duration. Durations are expressed in the ISO 8601 duration format. Here are some examples:
name | description |
---|---|
PT0.250S | 250 milliseconds delay |
PT2S | 2 seconds delay |
PT1M | 1 minute delay |
PT3.5H | 3 and a half hours delay |
Retry types
constant
This establishes a constant retry interval: if the interval property is set to 10 minutes, the task is retried every 10 minutes.
name | type | description |
---|---|---|
interval | Duration | Duration between each retry. |
exponential
This establishes retry behavior that waits longer between each successive retry, e.g., 1s, 5s, 15s, ...
name | type | description |
---|---|---|
interval | Duration | Duration between each retry. |
maxInterval | Duration | Maximum duration between each retry. |
delayFactor | Double | Multiplier applied to the interval between retries. Default is 2. For example, with an interval of 30s and a delay factor of 2, retries will happen at 30s, 1m30s, 3m30s, ... |
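
As a rough illustration, an exponential retry on a task could be declared as in the sketch below. The task id, the message, and the specific durations are hypothetical; only the property names come from the tables above. If the waits follow the delayFactor example, they would grow as 30s, 1m, 2m, and so on, presumably never exceeding maxInterval.

```yaml
# Sketch only: task id, message, and durations are illustrative assumptions.
- id: exponential_retry_sketch
  type: io.kestra.plugin.core.log.Log
  message: "attempt {{ taskrun.attemptsCount }} for task {{ task.id }}"
  retry:
    type: exponential
    interval: PT30S      # starting wait between retries
    maxInterval: PT10M   # upper bound on the wait between retries
    delayFactor: 2       # multiplier applied to the wait after each retry
    maxAttempt: 5
    maxDuration: PT30M   # stop retrying once this much time has passed
```

A Log task is used only to keep the sketch short; in practice the retry block would sit on a task that can actually fail, such as the shell command task shown earlier.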
random
This establishes retries with a random delay between minimum and maximum limits.
name | type | description |
---|---|---|
minInterval | Duration | Minimum duration between each retry. |
maxInterval | Duration | Maximum duration between each retry. |
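
A random retry can be sketched in the same way. The task id, message, and chosen bounds below are hypothetical; the property names are those listed in the table above.

```yaml
# Sketch only: task id, message, and bounds are illustrative assumptions.
- id: random_retry_sketch
  type: io.kestra.plugin.core.log.Log
  message: "attempt {{ taskrun.attemptsCount }} for task {{ task.id }}"
  retry:
    type: random
    minInterval: PT5S    # shortest wait between retries
    maxInterval: PT30S   # longest wait between retries
    maxAttempt: 4
```

With these values, each wait between retries would fall somewhere between 5 and 30 seconds.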
Configuring retries globally
You can also configure retries globally for all tasks by using the plugins configuration:
kestra:
  plugins:
    configurations:
      - type: io.kestra
        values:
          retry:
            type: constant
            maxAttempt: 3
            interval: PT30S
This configuration will apply a constant retry with a maximum of 3 attempts every 30 seconds to all tasks across all flows.
Flow-level retries
You can also set a flow-level retry policy to restart the execution if any task fails. The retry behavior is customizable. You can choose to:
- Create a new execution: CREATE_NEW_EXECUTION
- Retry the failed task only: RETRY_FAILED_TASK
Flow-level retries are particularly useful when you want to retry the entire flow if any task fails. This way, you don't need to configure retries for each task individually.
Here's an example of how you can set a flow-level retry policy:
id: flow_level_retry
namespace: company.team
retry:
  maxAttempt: 3
  behavior: CREATE_NEW_EXECUTION # RETRY_FAILED_TASK
  type: constant
  interval: PT1S
tasks:
  - id: fail_1
    type: io.kestra.plugin.core.execution.Fail
    allowFailure: true
  - id: fail_2
    type: io.kestra.plugin.core.execution.Fail
    allowFailure: false
The behavior property can be set to CREATE_NEW_EXECUTION or RETRY_FAILED_TASK. The attempt of the execution is incremented only with the CREATE_NEW_EXECUTION behavior. Otherwise, only the failed task run is restarted, which increments the attempt of the task run rather than that of the execution.
Apart from the behavior property, the retry policy is identical to the one you already know from task retries.
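
To illustrate the other behavior, here is a minimal sketch of a flow-level policy that uses RETRY_FAILED_TASK. The flow id, task id, and durations are hypothetical. Using the exponential type at the flow level is an assumption based on the statement that the policy otherwise matches task retries.

```yaml
# Sketch only: ids and durations are illustrative assumptions.
id: flow_level_retry_failed_task
namespace: company.team
retry:
  behavior: RETRY_FAILED_TASK   # restart only the failed task run
  type: exponential             # assumed to work as it does for task retries
  interval: PT5S
  maxInterval: PT1M
  delayFactor: 2
  maxAttempt: 4
tasks:
  - id: always_fails
    type: io.kestra.plugin.core.execution.Fail
```

With this behavior, the attempt counter of the failed task run would increase while the execution stays the same.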
Retry vs. Restart vs. Replay
Automatic vs. manual
Retries ensure that failed task runs are automatically rerun within the same Execution. Apart from retries, defined within your flow code, you can also manually rerun the flow from the Flow Execution page in the UI using the Restart or Replay buttons.
Restart vs. Replay
While Restart will rerun failed tasks within the current Execution (i.e., without creating a new execution), a Replay would result in a completely new run with a different Execution ID than the initial run.
When you replay an Execution, a new execution gets created for the same flow. However, you can still track which Execution triggered this new run thanks to the Original Execution field.
Replay can be executed from any task, even if that task executed successfully. But note that when you trigger a replay from a specific failed task, it will still result in a new Execution running all tasks downstream of your chosen start task.
When you want to rerun only failed tasks, use Restart.
Summary: Retries vs. Restart vs. Replay
The table below summarizes the differences between a retry, restart and replay.
Concept | Flow or task level | Automatic or manual | Does it create a new execution? |
---|---|---|---|
Retry | Task level | Automatic | No, it only reruns a given task within the same Execution. Each retry results in a new attempt number, allowing you to see how many times a given task run was retried. |
Restart | Flow level | Manual | No, it only reruns all failed tasks within the same Execution. It's meant to handle unanticipated, transient failures. The UI shows a new attempt number for all task runs that were restarted. |
Replay | Either flow or task level | Manual | Yes. You can pick an arbitrary step from which a new execution should be triggered. If you select a task in the middle that needs outputs from a previous task, its output is retrieved from cache. |