Backfill
Backfills are replays of missed schedule intervals between a defined start and end date.
Let's take the following flow as an example:
id: scheduled_flow
namespace: company.team
tasks:
- id: external_system_export
type: io.kestra.plugin.scripts.shell.Commands
taskRunner:
type: io.kestra.plugin.core.runner.Process
commands:
- echo "processing data for {{ execution.startDate ?? trigger.date }}"
- sleep $((RANDOM % 5 + 1))
triggers:
- id: schedule
type: io.kestra.plugin.core.trigger.Schedule
cron: "*/30 * * * *"
This flow will run every 30 minutes. However, imagine that your source system had an outage for 5 hours. The flow will miss 10 executions. To replay these missed executions, you can use the backfill feature.
All missed schedules are automatically recovered by default.
You can use Backfill if it's configured differently, e.g. to not recover missed schedules or only the most recent. Read more in the dedicated documentation.
To backfill the missed executions, go to the Triggers
tab on the Flow's detail page and click on the Backfill executions
button.
You can then select the start and end date for the backfill. Additionally, you can set custom labels for the backfill executions to help you identify them in the future.
You can pause and resume the backfill process at any time, and by clicking on the Details
button, you can see more details about that backfill process:
Trigger Backfill via an API call
Using cURL
You can invoke the backfill exections using the cURL call as follows:
curl -XPUT http://localhost:8080/api/v1/triggers -H 'Content-Type: application/json' -d '{
"backfill": {
"start": "2024-03-28T06:30:00.000Z",
"end": null,
"inputs": null,
"labels": [
{
"key": "reason",
"value": "outage"
}
]
},
"flowId": "scheduled_flow",
"namespace": "dev",
"triggerId": "schedule"
}'
In the backfill
attribute, you need to provide the start time for the backfill. The end time can be optinally provided. You can provide inputs to the flow, if any. You can attach labels to the backfill executions by providing key-value pairs in the labels
section. Other attributes to this PUT call are flowId, namespace and triggerId corresponding to the flow that is to backfilled.
Using Python requests
You can invoke the backfill exections using the Python requests as follows:
import requests
import json
url = 'http://localhost:8080/api/v1/triggers'
headers = {
'Content-Type': 'application/json'
}
data = {
"backfill": {
"start": "2024-03-28T06:30:00.000Z",
"end": None,
"inputs": None,
"labels": [
{
"key": "reason",
"value": "outage"
}
]
},
"flowId": "scheduled_flow",
"namespace": "dev",
"triggerId": "schedule"
}
response = requests.put(url, headers=headers, data=json.dumps(data))
print(response.status_code)
print(response.text)
With this code, you will be invoking the backfill for scheduled_flow
flow under company.team
namespace based on schedule
trigger ID within the flow. The number of backfills that will be executed will depend on the schedule present in the schedule
trigger, and the start
and end
times mentioned in the backfill. When the end
time is null, as in this case, the end
time would be considered as the present time.
Was this page helpful?