
Updated floodscan backfilling #67

Merged: 10 commits merged into main on Jan 23, 2025
Conversation

@hannahker (Collaborator) commented Jan 23, 2025

This PR replaces #66 with a simpler approach to backfilling any missing dates. We assume that all missing dates can be backfilled from the latest 90-days file (which is currently true for prod).

This PR primarily changes the query_api function to handle the date input argument more fully, by passing it through to the modified get_geotiff_from_90_days_file() function. The backfilling reruns query_api() for each missing date, which redownloads and unzips the latest 90-days file every time. Note that this will be highly inefficient if we have many missing dates; it seems unlikely that this will ever be the case, but it is something to be aware of.
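For illustration, here is a minimal sketch of the backfilling loop described above. The pipeline object, get_existing_dates(), and the exact query_api(date=...) signature are assumptions made for the sketch, not the module's actual interface.

import datetime

def backfill_missing_dates(pipeline, days_back: int = 90):
    """Re-run query_api() for each date missing from storage.

    Each call re-downloads and unzips the latest 90-days file, so this
    is only acceptable while the number of missing dates stays small.
    """
    today = datetime.date.today()
    expected = {
        today - datetime.timedelta(days=i) for i in range(1, days_back + 1)
    }
    existing = set(pipeline.get_existing_dates())  # dates already processed
    missing = sorted(expected - existing)

    for date in missing:
        # query_api() passes `date` through to get_geotiff_from_90_days_file(),
        # which pulls that day's raster out of the freshly downloaded archive.
        pipeline.query_api(date=date)

    return missing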

@hannahker marked this pull request as ready for review January 23, 2025 16:17
@isatotun (Collaborator) left a comment


Hannah and I went over the changes here a couple of times, and I feel the backfill logic works properly; it has been tested both locally and on Databricks. If any odd issues arise in the future, I would guess they would come from running the historical updates over an extensive amount of data. Since we are not planning to do that, and are mostly just keeping up with the daily updates and backfilling as necessary, this PR should be good to go.
Thanks Hannah for the refactoring done here and for all the effort put into understanding this pipeline!


shutil.rmtree(sfed_dir)
shutil.rmtree(mfed_dir)
for file in os.listdir(self.local_raw_dir):

I remember adding this cleanup because I was maxing out on space in Databricks when running for all the dates available from 1998. I don't think we need to worry right now about re-running for all the dates again, so any changes here should be safe and are just an optimisation.

@hannahker merged commit a52685a into main Jan 23, 2025
1 check passed