More sophisticated bounds handling for temporal averaging #735

Open
wants to merge 1 commit into
base: main

pochedls
Collaborator

Description

This PR adds more sophisticated handling of time bounds for temporal operations. In particular, it compares the dataset's time bounds with the period that is targeted. For example, if you have a time point with arbitrary bounds of ["2019-12-28 00:00", "2020-01-03 00:00"] and want to create an average value for January 2020 (i.e., ["2020-01-01", "2020-02-01"]), this PR will correctly give that time point 2 days of weight in the January average (since only 2 days of its data fall in January).
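The two-day figure can be checked by intersecting the two intervals directly. This is only a minimal sketch using the standard library, not code from the PR; it assumes January 2020 is bounded by 2020-01-01 and 2020-02-01, and the variable names are made up for illustration.

```python
from datetime import datetime

# Source time step bounds and the target period (January 2020, assumed bounds).
src_start, src_end = datetime(2019, 12, 28), datetime(2020, 1, 3)
jan_start, jan_end = datetime(2020, 1, 1), datetime(2020, 2, 1)

# Overlap runs from the latest start to the earliest end;
# a negative duration means the intervals are disjoint, so floor it at zero.
overlap = min(src_end, jan_end) - max(src_start, jan_start)
overlap_days = max(overlap.total_seconds(), 0.0) / 86400.0
print(overlap_days)  # 2.0
```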

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

github-actions bot added the "type: enhancement" (New enhancement request) label Feb 10, 2025

codecov bot commented Feb 10, 2025

Codecov Report

Attention: Patch coverage is 11.29032% with 55 lines in your changes missing coverage. Please review.

Project coverage is 96.72%. Comparing base (8824b32) to head (2dba031).

Files with missing lines Patch % Lines
xcdat/temporal.py 11.29% 55 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##              main     #735      +/-   ##
===========================================
- Coverage   100.00%   96.72%   -3.28%     
===========================================
  Files           15       15              
  Lines         1621     1681      +60     
===========================================
+ Hits          1621     1626       +5     
- Misses           0       55      +55     


@pochedls pochedls left a comment

This is a very small start on this issue, but it is worth going over carefully: if we get the workflow right, we can use this PR (with modifications) as a template for more temporal operations. It mainly handles group averaging operations (we'd need to think more about how to apply this to climatologies, for example).

@@ -2025,6 +2025,210 @@ def _calculate_departures(
return ds_departs


def compute_monthly_average(self, data_var):

This function wraps several steps/functions (that are defined below) in order to compute monthly averages (e.g., from hourly/daily/pentad data to monthly means):

  • ensure_bounds_order: function ensures that dataset bounds are in order [earlier time, later time] (since PR logic depends on this)
  • generate_monthly_bounds: function creates monthly bounds
  • get_temporal_weights: function computes weights for averaging source dataset into targeted time periods
  • _experimental_averager: function uses temporal weights to average data into targeted time periods

I think we could generalize this function (kind of like .temporal.group_average()) by having it call different functions to generate target bounds (e.g., generate_daily_bounds, generate_seasonal_bounds, generate_yearly_bounds). The other steps would work as-is.
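One way this generalization might look is a small dispatch table that maps a frequency name to a bounds generator. The sketch below is hypothetical: `generate_daily_bounds` and `generate_yearly_bounds` are named in the comment above but do not exist in this PR, and their signatures, the `BOUNDS_GENERATORS` table, and `generate_target_bounds` are assumptions made for illustration (plain `datetime` objects, no xarray or cftime).

```python
from datetime import datetime, timedelta

def generate_daily_bounds(start, end):
    """Hypothetical: daily (lower, upper) bounds covering start..end."""
    day = datetime(start.year, start.month, start.day)
    bounds = []
    while day < end:
        bounds.append((day, day + timedelta(days=1)))
        day += timedelta(days=1)
    return bounds

def generate_yearly_bounds(start, end):
    """Hypothetical: calendar-year (lower, upper) bounds covering start..end."""
    year = start.year
    bounds = []
    while datetime(year, 1, 1) < end:
        bounds.append((datetime(year, 1, 1), datetime(year + 1, 1, 1)))
        year += 1
    return bounds

# Assumed lookup table; a real implementation would add month, season, etc.
BOUNDS_GENERATORS = {
    "day": generate_daily_bounds,
    "year": generate_yearly_bounds,
}

def generate_target_bounds(freq, start, end):
    """Pick the bounds generator for `freq`; the later steps stay unchanged."""
    try:
        generator = BOUNDS_GENERATORS[freq]
    except KeyError:
        raise ValueError(f"Unsupported frequency: {freq!r}") from None
    return generator(start, end)

yearly = generate_target_bounds("year", datetime(2019, 12, 28), datetime(2020, 1, 3))
daily = generate_target_bounds("day", datetime(2019, 12, 28), datetime(2020, 1, 3))
```

With a table like this, only the bounds-generation step depends on the requested frequency; the weighting and averaging steps receive the target bounds and do not need to know how they were built.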

return ds.temporal._experimental_averager(data_var, weights, target_bnds)


def _experimental_averager(self, data_var, weights, target_bnds):

This is intended to be a generic averager that averages a data variable into targeted time periods (using the supplied weights).
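To make the weighted-average step concrete, here is a minimal sketch in plain Python. It is not the PR's `_experimental_averager`; the function name `weighted_period_means` and the `weights[p][t]` layout (period `p`, time step `t`) are assumptions for illustration.

```python
def weighted_period_means(values, weights):
    """Average `values` (one per time step) into periods.

    weights[p][t] is the weight of time step t in period p.
    Periods with zero total weight yield NaN.
    """
    means = []
    for period_weights in weights:
        total = sum(period_weights)
        if total == 0:
            means.append(float("nan"))
        else:
            weighted_sum = sum(w * v for w, v in zip(period_weights, values))
            means.append(weighted_sum / total)
    return means

values = [1.0, 3.0]
weights = [
    [4.0, 0.0],  # period 0 only sees the first time step
    [2.0, 6.0],  # period 1 sees 2 days of step 0 and 6 days of step 1
]
means = weighted_period_means(values, weights)
print(means)  # [1.0, 2.5]
```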

return dsmean


def get_temporal_weights(self, target_bnds):

This function basically gets the intersection between the dataset's own time bounds and the targeted time bounds (i.e., averaging periods). For each time step, it assigns a weight proportional to the duration for which that time step falls within a given averaging period.
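The intersection logic can be sketched as a loop over every (averaging period, time step) pair. This is an illustrative stand-in rather than the PR's implementation: the name `temporal_weights`, the tuple-based bounds, and the choice of days as the weight unit are all assumptions.

```python
from datetime import datetime

def temporal_weights(src_bnds, target_bnds):
    """Return weights[p][t]: days that source step t spends inside target period p."""
    weights = []
    for tgt_start, tgt_end in target_bnds:
        row = []
        for src_start, src_end in src_bnds:
            # Latest start to earliest end; negative means no overlap.
            overlap = min(src_end, tgt_end) - max(src_start, tgt_start)
            row.append(max(overlap.total_seconds(), 0.0) / 86400.0)
        weights.append(row)
    return weights

# Three 6-day time steps straddling the December/January boundary.
src_bnds = [
    (datetime(2019, 12, 22), datetime(2019, 12, 28)),
    (datetime(2019, 12, 28), datetime(2020, 1, 3)),
    (datetime(2020, 1, 3), datetime(2020, 1, 9)),
]
target_bnds = [
    (datetime(2019, 12, 1), datetime(2020, 1, 1)),  # December 2019
    (datetime(2020, 1, 1), datetime(2020, 2, 1)),   # January 2020
]
weights = temporal_weights(src_bnds, target_bnds)
print(weights)  # [[6.0, 4.0, 0.0], [0.0, 2.0, 6.0]]
```

The middle time step is split 4 days to December and 2 days to January, which matches the example in the PR description.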

Note that this PR is ~10x slower than existing functionality. The slowdown is almost entirely in this function. If we could speed this step up, that would be great (but we likely can tolerate this slowdown, since the approach in this PR should be more robust/accurate).
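One possible direction for the speedup, untested against this PR, is to compute all overlaps at once with NumPy broadcasting instead of looping. The sketch below assumes the bounds can be represented as `datetime64` arrays (datasets using cftime objects for non-standard calendars would need a different approach), and the function name `overlap_weights` is hypothetical.

```python
import numpy as np

def overlap_weights(src_bnds, tgt_bnds):
    """Overlap in days between every target period and every source step.

    src_bnds: (n_src, 2) and tgt_bnds: (n_tgt, 2), convertible to datetime64.
    Returns a float array of shape (n_tgt, n_src).
    """
    src = np.asarray(src_bnds, dtype="datetime64[s]")
    tgt = np.asarray(tgt_bnds, dtype="datetime64[s]")
    # Broadcast to (n_tgt, n_src): latest start and earliest end of each pair.
    start = np.maximum(tgt[:, None, 0], src[None, :, 0])
    end = np.minimum(tgt[:, None, 1], src[None, :, 1])
    seconds = (end - start) / np.timedelta64(1, "s")
    # Negative durations mean the intervals do not overlap.
    return np.clip(seconds, 0.0, None) / 86400.0

src_bnds = [["2019-12-28", "2020-01-03"]]
tgt_bnds = [["2019-12-01", "2020-01-01"], ["2020-01-01", "2020-02-01"]]
w = overlap_weights(src_bnds, tgt_bnds)
print(w.tolist())  # [[4.0], [2.0]]
```

This builds a dense (n_tgt, n_src) array, so memory grows with the product of the two lengths; for long, high-frequency records a sparse or sorted-search approach might be needed instead.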

return weights


def generate_monthly_bounds(self):

Prototype function for generating target bounds (i.e., the bins you want to average your source data into). We could make other functions for other frequencies (e.g., daily, seasonal, yearly).
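As a rough illustration of what generating monthly target bounds involves, here is a standard-library sketch. It is not the PR's `generate_monthly_bounds`: the name `monthly_bounds`, its `start`/`end` arguments, and the use of the bounds midpoint as the time coordinate are assumptions.

```python
from datetime import datetime

def monthly_bounds(start, end):
    """Return (monthly_time, monthly_bnds) for each calendar month touching start..end."""
    year, month = start.year, start.month
    times, bnds = [], []
    while datetime(year, month, 1) < end:
        lower = datetime(year, month, 1)
        # Step to the first day of the next month, rolling over the year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        upper = datetime(year, month, 1)
        times.append(lower + (upper - lower) / 2)  # assumed: midpoint as time value
        bnds.append((lower, upper))
    return times, bnds

monthly_time, monthly_bnds = monthly_bounds(datetime(2019, 12, 28), datetime(2020, 1, 3))
```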

return monthly_time, monthly_bnds


def ensure_bounds_order(self):

This function just makes sure the bounds are ordered as expected.
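The ordering check amounts to sorting each bounds pair. A minimal sketch, assuming bounds are held as pairs of comparable time values (the name `order_bounds` is hypothetical and this is not the PR's `ensure_bounds_order`):

```python
from datetime import datetime

def order_bounds(bounds):
    """Return bounds with each pair arranged as (earlier time, later time)."""
    return [(min(a, b), max(a, b)) for a, b in bounds]

bounds = [
    (datetime(2020, 1, 3), datetime(2019, 12, 28)),  # reversed pair
    (datetime(2020, 1, 3), datetime(2020, 1, 9)),    # already ordered
]
ordered = order_bounds(bounds)
```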

Labels
type: enhancement New enhancement request
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

[Feature]: More Sophisticated Bounds Handling in Temporal Averaging Operations
1 participant