Schedule a Python data ingestion job to extract data from an API and load it into DuckDB using dlt (data load tool) from dltHub
About this blueprint
Tags: Ingest, Data, Schedule
This flow loads data from the Chess.com API into a DuckDB destination. A cron trigger schedules the flow to run daily at 9 AM; the trigger ships disabled (`disabled: true`), so enable it to activate the schedule.
```yaml
id: dlt
namespace: company.team

tasks:
  - id: chess_api_to_duckdb
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: python:slim
    beforeCommands:
      - pip install dlt[duckdb]
    warningOnStdErr: false
    script: |
      import dlt
      import requests

      # Configure a dlt pipeline that writes to a local DuckDB database
      pipeline = dlt.pipeline(
          pipeline_name='chess_pipeline',
          destination='duckdb',
          dataset_name='player_data'
      )

      # Fetch player profiles from the Chess.com public API
      data = []
      for player in ['magnuscarlsen', 'rpragchess']:
          response = requests.get(f'https://api.chess.com/pub/player/{player}')
          response.raise_for_status()
          data.append(response.json())

      # Extract, normalize, and load the data
      pipeline.run(data, table_name='player')

triggers:
  - id: daily
    type: io.kestra.plugin.core.trigger.Schedule
    disabled: true
    cron: "0 9 * * *"
```
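To verify the ingested data, you can query the DuckDB database that dlt produced. The sketch below is a minimal example and not part of the blueprint: it assumes dlt's default convention of writing a file named `<pipeline_name>.duckdb` (here `chess_pipeline.duckdb`) and that this file is reachable from where you run the script; the schema name comes from `dataset_name` and the table name from the `table_name` argument passed to `pipeline.run()`.

```python
import duckdb

# Path is an assumption: dlt's duckdb destination defaults to
# '<pipeline_name>.duckdb' in the working directory of the run.
con = duckdb.connect("chess_pipeline.duckdb", read_only=True)

# dataset_name ('player_data') maps to a schema, table_name ('player') to a table.
con.sql("SELECT * FROM player_data.player LIMIT 5").show()

# List everything dlt created in the dataset, including its internal
# bookkeeping tables (e.g. _dlt_loads).
con.sql(
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = 'player_data'"
).show()

con.close()
```

Keep in mind that the Script task runs inside a Docker container, so the DuckDB file lives on the container's ephemeral filesystem; for anything beyond a quick check, point dlt at a persistent destination or export the file from the task.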