All pipelines can be run from the command line using the run_pipeline.py script:

    python run_pipeline.py <pipeline_name> [options]

Replace <pipeline_name> with one of era5, seas5, imerg or floodscan.
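The following is a minimal sketch of how the pipeline_name argument could be restricted to these four values using Python's standard argparse module. It is not the actual code of run_pipeline.py. The build_parser function is hypothetical, and any remaining options are passed through with parse_known_args so that a pipeline-specific parser could handle them.

```python
import argparse

# Pipeline names accepted on the command line, as listed above.
PIPELINES = ("era5", "seas5", "imerg", "floodscan")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical top-level parser for `run_pipeline.py <pipeline_name> [options]`."""
    parser = argparse.ArgumentParser(prog="run_pipeline.py")
    # argparse rejects any value outside `choices` with a usage error (exit status 2).
    parser.add_argument("pipeline_name", choices=PIPELINES)
    return parser


# Options the top-level parser does not recognise are returned unparsed, so a
# pipeline-specific parser could handle them afterwards.
args, remaining = build_parser().parse_known_args(["era5", "--mode", "local"])
print(args.pipeline_name, remaining)
```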
These options are available for all pipelines:

  --mode {local,dev,prod}: Specify the mode to run the pipeline in (default: local)
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}: Set the logging level (default: INFO)
  --use-cache: Use cached raw data if available
  --backfill: Check for missing dates and backfill them if necessary (only for dates from 2024 onwards)
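The --backfill check can be thought of as a gap search over a date range. The sketch below assumes daily granularity and a hypothetical set of dates already present in storage. The real pipelines may track availability differently, and a pipeline with monthly data would step by month instead of by day. As in the option description, only dates from 2024 onwards are considered.

```python
from __future__ import annotations

from datetime import date, timedelta

# Per the --backfill description, only dates from 2024 onwards are checked.
BACKFILL_START = date(2024, 1, 1)


def find_missing_dates(available: set[date], end: date, start: date = BACKFILL_START) -> list[date]:
    """Return each day between start and end (inclusive) that is not in `available`.

    Hypothetical helper: `available` stands for the dates already present in storage.
    The start is clamped so that nothing before BACKFILL_START is ever returned.
    """
    current = max(start, BACKFILL_START)
    missing = []
    while current <= end:
        if current not in available:
            missing.append(current)
        current += timedelta(days=1)
    return missing


stored = {date(2024, 1, 1), date(2024, 1, 3)}
print(find_missing_dates(stored, end=date(2024, 1, 4)))
```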
ERA5-specific options:

  --start-year YEAR: Start year for data processing (minimum: 1981)
  --end-year YEAR: End year for data processing (maximum: 2024)
  --update: Get data from last month, if available
SEAS5-specific options:

  --start-year YEAR: Start year for data processing (minimum: 1981)
  --end-year YEAR: End year for data processing (maximum: 2024)
  --update: Get data from the current month, if available
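ERA5 and SEAS5 share the same year limits but differ in which month --update fetches. A minimal sketch of both rules follows, assuming that the start year must not be later than the end year and that --update targets a single (year, month) pair. The names validate_year_range and update_target_month are hypothetical, not functions from the codebase.

```python
from __future__ import annotations

from datetime import date

# Year limits stated for --start-year and --end-year.
MIN_YEAR, MAX_YEAR = 1981, 2024


def validate_year_range(start_year: int, end_year: int) -> range:
    """Return the inclusive range of years to process, or raise ValueError.

    Assumption: start_year must not be later than end_year.
    """
    if not MIN_YEAR <= start_year <= end_year <= MAX_YEAR:
        raise ValueError(
            f"expected {MIN_YEAR} <= start <= end <= {MAX_YEAR}, got {start_year}-{end_year}"
        )
    return range(start_year, end_year + 1)


def update_target_month(pipeline: str, today: date) -> tuple[int, int]:
    """Return the (year, month) that --update would request for the given pipeline."""
    if pipeline == "seas5":
        # SEAS5: data from the current month.
        return today.year, today.month
    if pipeline == "era5":
        # ERA5: data from the previous month, wrapping January back to December.
        if today.month == 1:
            return today.year - 1, 12
        return today.year, today.month - 1
    raise ValueError(f"no --update rule sketched for {pipeline!r}")
```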
IMERG-specific options:

  --start-date DATE, -s DATE: Start date to retrieve and process archival IMERG data (format: YYYY-MM-DD, default: yesterday)
  --end-date DATE, -e DATE: End date to retrieve and process archival IMERG data (format: YYYY-MM-DD, default: today)
  --run {early,late}, -r {early,late}: Specify 'early' for the early run or 'late' for the late run (default: late)
  --version {6,7}, -v {6,7}: IMERG version to use (7 refers to version 07B; default: 7)
  --create-auth-files, -caf: Create authorization files for accessing IMERG datasets
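Each IMERG option pairs a long flag with a short alias. The sketch below shows one way such a parser could be declared with argparse. The function build_imerg_parser and its today parameter (added so that the "yesterday" and "today" defaults can be tested) are hypothetical. date.fromisoformat is used here as one way to parse YYYY-MM-DD dates; on Python 3.11+ it also accepts other ISO 8601 forms.

```python
from __future__ import annotations

import argparse
from datetime import date, timedelta


def build_imerg_parser(today: date | None = None) -> argparse.ArgumentParser:
    """Hypothetical parser for the IMERG-specific options listed above."""
    today = today or date.today()
    parser = argparse.ArgumentParser(prog="run_pipeline.py imerg")
    # date.fromisoformat accepts YYYY-MM-DD; a malformed date becomes a usage error.
    parser.add_argument("--start-date", "-s", type=date.fromisoformat,
                        default=today - timedelta(days=1))
    parser.add_argument("--end-date", "-e", type=date.fromisoformat, default=today)
    parser.add_argument("--run", "-r", choices=["early", "late"], default="late")
    # type=int is applied first, then the converted value is checked against choices.
    parser.add_argument("--version", "-v", type=int, choices=[6, 7], default=7)
    parser.add_argument("--create-auth-files", "-caf", action="store_true")
    return parser


args = build_imerg_parser().parse_args(["-s", "2024-01-01", "-e", "2024-01-31", "-r", "early"])
print(args.start_date, args.end_date, args.run, args.version)
```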
FloodScan-specific options:

  --start-date DATE, -s DATE: Start date to retrieve and process FloodScan data (format: YYYY-MM-DD, default: yesterday)
  --end-date DATE, -e DATE: End date to retrieve and process FloodScan data (format: YYYY-MM-DD, default: yesterday)
  --version {5}, -v {5}: FloodScan version to use (currently only version 5 is supported)
  --update: Get data from yesterday, if available
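Because both FloodScan date options default to yesterday, running the pipeline without them processes a single day. The sketch below expands the two dates into the list of days to process under that reading. floodscan_dates is a hypothetical helper, and rejecting a start date that falls after the end date is an assumption rather than documented behavior.

```python
from __future__ import annotations

from datetime import date, timedelta


def floodscan_dates(start: date | None = None, end: date | None = None,
                    today: date | None = None) -> list[date]:
    """Return every day from start to end inclusive; both default to yesterday."""
    yesterday = (today or date.today()) - timedelta(days=1)
    start = start or yesterday
    end = end or yesterday
    if start > end:
        raise ValueError(f"start date {start} is after end date {end}")
    return [start + timedelta(days=offset) for offset in range((end - start).days + 1)]


print(floodscan_dates(date(2024, 2, 27), date(2024, 3, 1)))
```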
Examples:

- Run the ERA5 pipeline in local mode for the years 2020-2022:

    python run_pipeline.py era5 --mode local --start-year 2020 --end-year 2022

- Run the SEAS5 pipeline in dev mode with cached data for the years 2020-2022:

    python run_pipeline.py seas5 --mode dev --start-year 2020 --end-year 2022 --use-cache

- Update ERA5 data in production mode and backfill any missing dates:

    python run_pipeline.py era5 --mode prod --update --backfill

- Run the IMERG pipeline to get yesterday's data and save it in production storage:

    python run_pipeline.py imerg --mode prod

- Run the FloodScan pipeline to get yesterday's data and save it in production storage:

    python run_pipeline.py floodscan --mode prod --update
Note: Ensure you have set up the necessary environment variables and dependencies before running the pipelines.