More sophisticated bounds handling for temporal averaging #735
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##              main     #735     +/-   ##
==========================================
- Coverage   100.00%   96.72%   -3.28%
==========================================
  Files           15       15
  Lines         1621     1681     +60
==========================================
+ Hits          1621     1626      +5
- Misses           0       55     +55
A very small start on this issue, but worth going over carefully because if we get the workflow right, we can use this PR (and modifications) as a template for more temporal operations. This mainly handles group averaging operations (we'd need to think more about how to apply this to climatologies, for example).
@@ -2025,6 +2025,210 @@ def _calculate_departures(
        return ds_departs

    def compute_monthly_average(self, data_var):
This function wraps several steps/functions (defined below) to compute monthly averages (e.g., from hourly/daily/pentad data to monthly means):
- ensure_bounds_order: ensures that the dataset bounds are ordered [earlier time, later time] (since the PR logic depends on this)
- generate_monthly_bounds: creates monthly bounds
- get_temporal_weights: computes weights for averaging the source dataset into targeted time periods
- _experimental_averager: uses the temporal weights to average data into targeted time periods
I think we could generalize this function (kind of like .temporal.group_average()) by having it call different functions to generate target bounds (e.g., generate_daily_bounds, generate_seasonal_bounds, generate_yearly_bounds). The other steps would work as-is.
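One way to picture this generalization is a small dispatch table that maps a frequency string to a bounds generator. The sketch below is only an illustration using the standard library: the `freq` keys, the `generate_target_bounds` helper, and the standalone `(start, end)` signatures are assumptions, not the PR's API (the PR's methods take only `self`). A seasonal generator is left out.

```python
from datetime import datetime, timedelta


def generate_daily_bounds(start, end):
    """Hypothetical helper: bounds of each calendar day overlapping [start, end)."""
    day = datetime(start.year, start.month, start.day)
    bounds = []
    while day < end:
        bounds.append((day, day + timedelta(days=1)))
        day += timedelta(days=1)
    return bounds


def generate_yearly_bounds(start, end):
    """Hypothetical helper: bounds of each calendar year overlapping [start, end)."""
    year = start.year
    bounds = []
    while datetime(year, 1, 1) < end:
        bounds.append((datetime(year, 1, 1), datetime(year + 1, 1, 1)))
        year += 1
    return bounds


# Assumed mapping from a frequency keyword to its bounds generator.
BOUNDS_GENERATORS = {
    "day": generate_daily_bounds,
    "year": generate_yearly_bounds,
}


def generate_target_bounds(freq, start, end):
    """Look up the generator for `freq` and build the target bounds."""
    try:
        generator = BOUNDS_GENERATORS[freq]
    except KeyError:
        raise ValueError(f"unsupported frequency: {freq!r}") from None
    return generator(start, end)


yearly = generate_target_bounds("year", datetime(2019, 12, 28), datetime(2020, 1, 3))
daily = generate_target_bounds("day", datetime(2019, 12, 28), datetime(2020, 1, 3))
```

With this layout, the weighting and averaging steps would only ever see a list of target bounds, so adding a new frequency means adding one generator and one dictionary entry.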
        return ds.temporal._experimental_averager(data_var, weights, target_bnds)

    def _experimental_averager(self, data_var, weights, target_bnds):
This is intended to be a generic averager that averages a data variable into targeted time periods (using the supplied weights).
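As a rough illustration of what such a generic averager does, the sketch below computes one weighted mean per target period from plain Python lists. The function name `weighted_period_means` and the list-of-lists weight layout (one row per target period, one column per source time step) are assumptions for the example, not the PR's implementation, which operates on dataset objects.

```python
def weighted_period_means(values, weights):
    """Weighted mean of `values` for each target period.

    values:  one number per source time step
    weights: one row per target period; each row holds one weight per source step
    """
    means = []
    for period_weights in weights:
        total = sum(period_weights)
        if total == 0:
            # No source data overlaps this period, so the mean is undefined.
            means.append(float("nan"))
        else:
            weighted_sum = sum(w * v for w, v in zip(period_weights, values))
            means.append(weighted_sum / total)
    return means


values = [1.0, 3.0, 5.0]
weights = [
    [1, 1, 0],  # first period: steps 0 and 1 contribute equally
    [0, 2, 2],  # second period: steps 1 and 2 contribute equally
]
means = weighted_period_means(values, weights)  # [2.0, 4.0]
```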
        return dsmean

    def get_temporal_weights(self, target_bnds):
This function essentially computes the intersection between the dataset's own time bounds and the targeted time bounds (i.e., the averaging periods). Each time step is assigned a weight proportional to the duration for which it falls within a given averaging period.
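The intersection idea can be shown with a minimal standard-library sketch. The helper `overlap_duration` is hypothetical (it is not a function in the PR); it assumes each pair of bounds is ordered as (earlier, later) and returns how long a source time step overlaps a target period.

```python
from datetime import datetime, timedelta


def overlap_duration(src_bounds, target_bounds):
    """Duration for which a source time step lies inside a target period."""
    start = max(src_bounds[0], target_bounds[0])
    end = min(src_bounds[1], target_bounds[1])
    # A negative difference means the intervals do not overlap at all.
    return max(end - start, timedelta(0))


# A time step that straddles the December/January boundary.
src = (datetime(2019, 12, 28), datetime(2020, 1, 3))
december = (datetime(2019, 12, 1), datetime(2020, 1, 1))
january = (datetime(2020, 1, 1), datetime(2020, 2, 1))

dec_weight = overlap_duration(src, december)  # 4 days
jan_weight = overlap_duration(src, january)   # 2 days
```

The six-day time step is split between the two months, so it contributes 4 days of weight to the December mean and 2 days to the January mean.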
Note that this PR is ~10x slower than existing functionality. The slowdown is almost entirely in this function. If we could speed this step up, that would be great (but we likely can tolerate this slowdown, since the approach in this PR should be more robust/accurate).
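One possible direction for a speed-up, offered only as an untested idea since the PR's implementation is not shown here, is to compute all source/target overlaps at once with NumPy broadcasting instead of looping over periods. The sketch assumes the bounds have already been converted to plain numbers (e.g., days since a reference date); `overlap_weights` is a hypothetical name.

```python
import numpy as np


def overlap_weights(src_bnds, target_bnds):
    """Overlap durations between every source step and every target period.

    src_bnds:    array-like of shape (n_src, 2), numeric [lower, upper] bounds
    target_bnds: array-like of shape (n_target, 2), numeric [lower, upper] bounds
    Returns an array of shape (n_src, n_target).
    """
    src = np.asarray(src_bnds, dtype=float)
    tgt = np.asarray(target_bnds, dtype=float)
    # Broadcast to (n_src, n_target): latest lower bound and earliest upper bound.
    lower = np.maximum(src[:, None, 0], tgt[None, :, 0])
    upper = np.minimum(src[:, None, 1], tgt[None, :, 1])
    # Non-overlapping pairs give a negative difference; clip those to zero.
    return np.clip(upper - lower, 0.0, None)


weights = overlap_weights([[0, 6], [6, 10]], [[0, 4], [4, 10]])
# weights is [[4., 2.], [0., 4.]]
```

The trade-off is memory: the intermediate arrays scale with n_src × n_target, which may matter for long hourly records, so chunking or a sparse approach might be needed in practice.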
        return weights

    def generate_monthly_bounds(self):
Prototype function for generating target bounds (i.e., the bins into which the source data are averaged). We could make other functions for other frequencies (e.g., daily, seasonal, yearly).
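For illustration, a standalone sketch of generating monthly target bounds with the standard library might look like the following. The name `monthly_bounds` and its `(start, end)` arguments are assumptions; the PR's method takes only `self` and also returns a monthly time coordinate, which is not reproduced here.

```python
from datetime import datetime


def monthly_bounds(start, end):
    """Bounds of each calendar month overlapping the interval [start, end)."""
    bounds = []
    year, month = start.year, start.month
    while datetime(year, month, 1) < end:
        # Roll over to January of the next year after December.
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        bounds.append((datetime(year, month, 1), datetime(next_year, next_month, 1)))
        year, month = next_year, next_month
    return bounds


bins = monthly_bounds(datetime(2019, 12, 28), datetime(2020, 1, 3))
# Two bins: December 2019 and January 2020.
```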
        return monthly_time, monthly_bnds

    def ensure_bounds_order(self):
This function just makes sure the bounds are ordered as expected.
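A minimal sketch of that check, assuming the bounds are available as a list of (lower, upper) pairs rather than a dataset variable (`order_bounds` is a hypothetical name):

```python
from datetime import datetime


def order_bounds(bounds):
    """Return the bounds with each pair sorted as (earlier time, later time)."""
    return [(min(a, b), max(a, b)) for a, b in bounds]


mixed = [
    (datetime(2020, 1, 2), datetime(2020, 1, 1)),  # reversed pair
    (datetime(2020, 1, 2), datetime(2020, 1, 3)),  # already ordered
]
ordered = order_bounds(mixed)
```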
Description
This PR adds more sophisticated handling of bounds for temporal operations. In particular, it compares the dataset's time bounds with the targeted period. For example, if you have a time point with arbitrary bounds of ["2019-12-28 00:00", "2020-01-03 00:00"] and want to create an average value for January 2020 (i.e., ["2020-01-01", "2020-02-01"]), this PR will correctly give that time point 2 days of weight in the January average (since only 2 days of its data fall in January).
Checklist
If applicable: