Working Directory
This task allows you to run multiple tasks sequentially in the same working directory. It is useful when you want to share files from Namespace Files or from a Git repository across multiple tasks.
When to use the WorkingDirectory task
By default, all Kestra tasks are stateless. If one task generates files, those files won't be available in downstream tasks unless they are persisted in internal storage. When a task completes, its temporary directory is purged. This behavior is generally useful: it keeps your environment clean and dependency-free, and it avoids the privacy and security issues that can arise when data generated by one task is exposed to other processes.
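The default behavior described above can be sketched as a flow in which the first task explicitly persists a file and the second task pulls it back in. This is a minimal sketch, not a definitive implementation: the flow id, namespace, task ids, and file name are hypothetical, and it assumes a recent Kestra release with the shell script plugin and the Process task runner available.

```yaml
id: stateless_example          # hypothetical flow id
namespace: company.team        # hypothetical namespace

tasks:
  - id: generate
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.core.runner.Process   # assumption: run on the worker host, no container
    outputFiles:
      - data.csv               # persist this file to internal storage before the directory is purged
    commands:
      - echo "id,value" > data.csv

  - id: consume
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    inputFiles:
      # fetch the persisted file from internal storage into this task's own directory
      data.csv: "{{ outputs.generate.outputFiles['data.csv'] }}"
    commands:
      - cat data.csv
```

Without the `outputFiles` and `inputFiles` declarations, the second task would start in a fresh directory and would not find `data.csv`.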
Despite the benefits of stateless execution, statefulness is desirable in certain scenarios. Imagine that you want to execute several Python scripts, each of which generates some output data, and another script that combines that data as part of an ETL/ML process. Executing those related tasks in the same working directory and sharing state between them is helpful for the following reasons:
- You can attach namespace files to the WorkingDirectory task and use them in all downstream tasks. This allows you to work the same way you would work on your local machine, where you can import modules from the same directory.
- Within a WorkingDirectory, you can clone your entire GitHub branch with multiple modules and configuration files needed to run several scripts and reuse them across multiple downstream tasks.
- You can execute multiple scripts sequentially on the same worker or in the same container, minimizing latency.
- Output artifacts of each task (such as CSV, JSON or Parquet files you generate in your script) are directly available to other tasks without having to persist them within the internal storage. This is because all child tasks of the WorkingDirectory task share the same file system.
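The shared file system can be illustrated with two child tasks, where the second reads a file written by the first without any internal storage step. This is a sketch under stated assumptions: the flow id, namespace, task ids, and file name are hypothetical, and the `io.kestra.plugin.core.flow.WorkingDirectory` type name applies to recent Kestra releases (older releases may use a different type name).

```yaml
id: working_directory_example  # hypothetical flow id
namespace: company.team        # hypothetical namespace

tasks:
  - id: shared_dir
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: write_file
        type: io.kestra.plugin.scripts.shell.Commands
        taskRunner:
          type: io.kestra.plugin.core.runner.Process   # assumption: run on the worker host
        commands:
          # write into the directory shared by all child tasks
          - 'echo "id,value" > {{ workingDir }}/data.csv'

      - id: read_file
        type: io.kestra.plugin.scripts.shell.Commands
        taskRunner:
          type: io.kestra.plugin.core.runner.Process
        commands:
          # the file is still present because the directory is not purged between child tasks
          - 'cat {{ workingDir }}/data.csv'
```

Both child tasks run sequentially in the same directory, so no `outputFiles` or `inputFiles` declaration is needed to hand the file over.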
The WorkingDirectory task allows you to:
- Share files from Namespace Files or from a Git repository across multiple tasks
- Run multiple tasks sequentially in the same working directory
- Share data across multiple tasks without having to persist it in internal storage
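As an illustration of the first point, Namespace Files can be attached once at the WorkingDirectory level and then used by every child task. This is a hedged sketch: it assumes a recent Kestra release, the Python script plugin, and two hypothetical namespace files, `extract.py` and `combine.py`, already uploaded to the flow's namespace.

```yaml
id: namespace_files_example    # hypothetical flow id
namespace: company.team        # hypothetical namespace

tasks:
  - id: shared_dir
    type: io.kestra.plugin.core.flow.WorkingDirectory
    namespaceFiles:
      enabled: true            # load the namespace's files into the shared working directory
    tasks:
      - id: extract
        type: io.kestra.plugin.scripts.python.Commands
        commands:
          - python extract.py  # hypothetical namespace file

      - id: combine
        type: io.kestra.plugin.scripts.python.Commands
        commands:
          - python combine.py  # hypothetical namespace file; can read what extract.py wrote
```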