PRISM ADRIO and Devlogs #159

Merged
merged 8 commits into from
Sep 23, 2024

Conversation

meaghan66
Contributor

Adds the PRISM ADRIO, which lets a user retrieve climate attributes at daily granularity, sampled at the centroid of each given geoid. Two devlogs are included: a demo showing how to use the PRISM ADRIO template, and a record of the experiments with downloading and retrieving values from the PRISM raster data files.

Contributor

@JavadocMD JavadocMD left a comment

See preliminary comments; but I'd also like to get this branch rebased on main.

Contributor

@JavadocMD JavadocMD left a comment

Minor change noted inline; but also I discovered a significant issue with the way this is handling zip files. I'll add another comment...

@JavadocMD
Contributor

Okay, zip files! In running the examples this time I noticed a bunch of files and folders being created in my epymorph project folder (screenshot):
image

The algorithm you've written goes like this:

  1. download the zip files from PRISM
  2. unzip the files, with a target path based in _PRISM_CACHE_PATH, and generate a new path which points to where the bil file is in the unzipped folder
  3. use rasterio to open each bil file per day and sample points out of it

The problem starts at step 2, because _PRISM_CACHE_PATH is actually relative ("adrio/prism"): it's intended to be used only with functions from the epymorph.cache module. (I admit this is misleading.) So this actually unzips into "<CURRENT_WORKING_DIRECTORY>/adrio/prism" and clutters up the project folder with needless files.
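
To illustrate why a relative cache path lands in the project folder, here is a minimal sketch using only pathlib. The path value mirrors the one quoted above; the variable names are illustrative and are not epymorph's.

```python
from pathlib import Path

# Same relative value quoted above; the variable name is illustrative only.
relative_cache_path = Path("adrio/prism")

# A relative path has no anchor of its own...
is_absolute = relative_cache_path.is_absolute()

# ...so file operations on it resolve against the current working directory.
resolved = relative_cache_path.absolute()

print(is_absolute)
print(resolved == Path.cwd() / relative_cache_path)
```

Anything written under such a path therefore depends on where the process was started, which is why the files showed up in the project folder.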

So we could fix the pathing, but I actually have a better solution for you -- we can read the zip files directly without unzipping them to disk! This is a much more efficient use of disk space, and unzipping in memory shouldn't add too much overhead.
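
The general idea can be sketched with the standard library alone, independent of rasterio: a zip archive held in a BytesIO can be opened and one member read without extracting anything to disk. The helper name and the tiny stand-in archive below are hypothetical.

```python
import zipfile
from io import BytesIO


def read_member_in_memory(zip_bytes: BytesIO, member_name: str) -> bytes:
    """Return the raw bytes of one archive member without extracting to disk."""
    with zipfile.ZipFile(zip_bytes) as archive:
        with archive.open(member_name) as member:
            return member.read()


# Build a small archive in memory to stand in for a downloaded zip file.
buffer = BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("example.bil", b"\x00\x01\x02\x03")
    archive.writestr("example.hdr", b"NROWS 1\nNCOLS 4\n")
buffer.seek(0)

payload = read_member_in_memory(buffer, "example.bil")
print(len(payload))
```

This only shows the in-memory access pattern; interpreting the bytes as a raster still needs a reader that understands the format.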

Granted, rasterio is very poorly documented for this use-case and it took me a lot of trial and error, but it can be done. Here's a working prototype for PRISM (using paths from my file system of course):

from io import BytesIO
from pathlib import Path

import numpy as np
import rasterio.io as rio
from numpy.typing import NDArray

from epymorph.data_type import CentroidDType

# https://rasterio.readthedocs.io/en/stable/topics/memory-files.html
# https://github.com/rasterio/rasterio/issues/977


# Read from a BytesIO of a bil.zip,
# grab the named .bil file in memory,
# and sample the dataset at each point in coords.
def bil_nye_the_raster_guy(
    bil_data: BytesIO,
    bil_file_name: str,
    coords: NDArray[CentroidDType],
) -> NDArray[np.float64]:
    with rio.ZipMemoryFile(bil_data) as zip_contents:
        with zip_contents.open(bil_file_name) as dataset:
            values = [round(x[0], 3) for x in dataset.sample(coords)]
            return np.array(values, dtype=np.float64)


# This of course is determined by the variable and day we're loading...
file_stem = "PRISM_tmean_stable_4kmD2_20210430_bil"
bil_file_name = f"{file_stem}.bil"


# This bit is just a stand-in for what `load_or_fetch_url()` returns --
# a BytesIO that was either loaded from cache or just downloaded from network.
file_path = Path(f"/home/tcoles/.cache/epymorph/adrio/prism/{file_stem}.zip")
with file_path.open("rb") as file:
    bil_data = BytesIO(file.read())

# coords = ...
# you can actually leave this in its raw structured numpy array form
# the access pattern for a CentroidDType array is similar-enough to
# the access pattern for a list[tuple[float, float]] that it "just works", or seems to

data = bil_nye_the_raster_guy(bil_data, bil_file_name, coords)
print(data)

If you use coords from a CountyScope.in_states_by_code(["AZ"]) and us_tiger.InternalPoint(), this prints:

[11.6079998  13.17399979 11.27700043 18.60499954 17.25099945 14.92000008
 22.78499985 24.16300011 18.6420002  12.70300007 17.59600067 21.4829998
 14.83100033 14.5880003  24.72200012]
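
The remark in the code comments about leaving coords as a structured array can be checked in isolation. This is a minimal sketch, assuming a dtype with two float fields; the dtype defined here is a hypothetical stand-in, not epymorph's actual CentroidDType, and the coordinates are made up.

```python
import numpy as np

# Hypothetical stand-in for CentroidDType: two named float fields.
centroid_dtype = np.dtype([("longitude", np.float64), ("latitude", np.float64)])

coords = np.array([(-112.07, 33.45), (-110.97, 32.22)], dtype=centroid_dtype)

# Iterating the array yields records, and each record can be
# converted to or unpacked like a 2-tuple.
as_tuples = [tuple(record) for record in coords]
first_x, first_y = coords[0]

print(as_tuples)
```

Any consumer that only iterates the sequence and unpacks each item as a pair should accept either form, which is presumably why it "just works".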

@JavadocMD
Contributor

JavadocMD commented Sep 13, 2024

Oh and is there a reason we are rounding values to three decimal places?
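
Related to this question, the values printed above (for example 11.6079998) do not look rounded to three places. One possible explanation, assuming the raster band holds 32-bit floats (an assumption, not confirmed in this thread), is that widening a float32 to float64 exposes the float32 representation error. A small sketch of that effect:

```python
import numpy as np

# Assumption: the raster stores 32-bit floats. 11.608 is not exactly
# representable in float32, so widening to float64 exposes the error.
sample = np.float32(11.608)
widened = np.float64(sample)

# Rounding after widening yields the float64 nearest to 11.608.
rounded_after = round(float(widened), 3)

print(widened)
print(rounded_after)
```

If that assumption holds, rounding would only have a visible effect when applied after the conversion to float64.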

Contributor

@JavadocMD JavadocMD left a comment

Getting there! See inline.

@JavadocMD
Contributor

Much better -- the memory usage stays nice and flat while fetching a lot of data.
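
Flat memory usage is what one would expect when each day's file is processed and released before the next one is loaded. A minimal sketch of that pattern with a generator follows; the function names and the fake loader are hypothetical and are not taken from this PR.

```python
from collections.abc import Callable, Iterator
from datetime import date, timedelta
from io import BytesIO


def daily_files(
    start: date,
    days: int,
    load: Callable[[date], BytesIO],
) -> Iterator[tuple[date, BytesIO]]:
    """Yield one day's file at a time so earlier buffers can be released."""
    for offset in range(days):
        day = start + timedelta(days=offset)
        yield day, load(day)


def fake_load(day: date) -> BytesIO:
    # Hypothetical stand-in for a cache-or-download helper.
    return BytesIO(day.isoformat().encode())


# Only the current day's buffer is referenced while it is being processed.
sizes = [len(buf.getvalue()) for _, buf in daily_files(date(2021, 4, 29), 3, fake_load)]
print(sizes)
```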

image

@JavadocMD
Contributor

Good to squash merge.

@meaghan66 meaghan66 merged commit c0a0a2b into main Sep 23, 2024
1 check passed
@meaghan66 meaghan66 deleted the PRISM branch September 23, 2024 17:57
@JavadocMD JavadocMD linked an issue Sep 23, 2024 that may be closed by this pull request
ewilli1 pushed a commit that referenced this pull request Nov 15, 2024
* Branch fixes and dependency fix.

* Handling date errors and messaging.

* Zip file corrections.

* Block indent fix.

* Generator refactor.

* Fix small issue with dates and file names.

* Validating scope and erroring excluded locations.

* Slight bug fix and update demo values
Development

Successfully merging this pull request may close these issues.

PRISM ADRIO (climate data)
2 participants