
Group Backend Keyword Arguments #10422


Draft: wants to merge 4 commits into main


kmuehlbauer (Contributor)

This is a first attempt and a basis for discussion.

This PR does the following:

  1. Split the open_dataset kwargs into four groups.
     Here I followed @shoyer's suggestion in #4490 (Group together decoding options into a single argument) to use dataclasses:
     • coder_opts: options for the CF coders (e.g. mask_and_scale, decode_times)
     • open_opts: options for the backend file opener (e.g. driver, clobber, diskless, format)
     • backend_opts: options for xarray itself (e.g. chunk, cache, inline_array)
     • store_opts: options for the backend store (e.g. group, lock, autoclose)
  2. Define these classes in BackendEntrypoint and override them in the subclasses.
     For now this is done only for the netcdf4/h5netcdf backends.
  3. Implement the logic in open_dataset.
  4. Implement the logic in to_netcdf.
  5. For backwards compatibility, reinitialize the above options from the given kwargs as needed.
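Items 1, 2 and 5 can be illustrated with plain dataclasses. The snippet below is a minimal sketch under stated assumptions: `CoderOptions`, `StoreOptions` and `split_legacy_kwargs` are hypothetical names (not the classes in this PR), and their fields are a small illustrative subset. It shows how flat legacy kwargs could be routed into the option groups by matching dataclass field names, which is one way the backwards-compatibility step might work.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


# Hypothetical option groups; names and fields are illustrative only.
@dataclass(frozen=True)
class CoderOptions:
    mask_and_scale: bool = True
    decode_times: bool = True


@dataclass(frozen=True)
class StoreOptions:
    group: str | None = None
    autoclose: bool = False


def split_legacy_kwargs(
    kwargs: dict[str, Any], *groups: type
) -> tuple[list[Any], dict[str, Any]]:
    """Route flat legacy kwargs into option dataclasses by field name.

    Returns one instance per group (in the given order) plus any kwargs
    that no group claimed. A kwarg is claimed by the first matching group.
    """
    remaining = dict(kwargs)
    instances = []
    for cls in groups:
        names = {f.name for f in fields(cls)}
        # Pop matching keys so later groups cannot claim them again.
        picked = {k: remaining.pop(k) for k in list(remaining) if k in names}
        instances.append(cls(**picked))
    return instances, remaining


(coder_opts, store_opts), leftover = split_legacy_kwargs(
    {"decode_times": False, "group": "test", "foo": 1}, CoderOptions, StoreOptions
)
```

Because the first matching group wins, overlapping field names across groups would need an explicit precedence rule; the leftover kwargs could then be reported or forwarded as they are today.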

Example usage:

```python
import xarray as xr

# simple call, using the backend's default options
ds = xr.open_dataset("test.nc", engine="netcdf4")

# define once, use many times; these classes should be imported from the backend
open_opts = NetCDF4OpenOptions(auto_complex=True)
coder_opts = NetCDF4CoderOptions(decode_times=False, mask_and_scale=False)
backend_opts = XarrayBackendOptions(chunk={"time": 10})
store_opts = NetCDF4StoreOptions(group="test")

# engine could also be the BackendEntrypoint
ds = xr.open_dataset(
    "test.nc",
    engine="netcdf4",
    open_opts=open_opts,
    coder_opts=coder_opts,
    backend_opts=backend_opts,
    store_opts=store_opts,
)
```
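"Define once, use many" works best if the option objects are immutable, so one instance can be shared safely across calls. A minimal sketch, assuming the option classes are frozen dataclasses (the `CoderOptions` class here is a hypothetical stand-in, not the PR's `NetCDF4CoderOptions`): `dataclasses.replace` derives a variant without touching the shared instance.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


# Hypothetical stand-in for a backend's coder options class.
@dataclass(frozen=True)
class CoderOptions:
    mask_and_scale: bool = True
    decode_times: bool = True


base = CoderOptions(decode_times=False, mask_and_scale=False)

# Derive a variant for one call; `base` itself is left unchanged.
variant = replace(base, mask_and_scale=True)

try:
    base.decode_times = True  # frozen dataclasses refuse attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
```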

CONS:

  • Most users might not need these added options at all and could fall back to the current behaviour
  • Users might complain about the additional complexity for setting up the dataclasses
  • tbc.

PROS:

  • strict separation of kwargs/options
  • easy forwarding
  • per backend kwargs/options
  • easy addition of new kwargs/options
  • tbc.
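"Strict separation" and "easy forwarding" can both be illustrated with the standard library. A minimal sketch, assuming a hypothetical `OpenOptions` dataclass whose fields mirror two real `netCDF4.Dataset` keywords (`diskless`, `format`), and a hypothetical `fake_opener` standing in for the real file opener: a misspelled option fails when the options object is built, and the validated fields are forwarded with `dataclasses.asdict`.

```python
from dataclasses import asdict, dataclass
from typing import Any


# Hypothetical options class; fields mirror two netCDF4.Dataset keywords.
@dataclass(frozen=True)
class OpenOptions:
    diskless: bool = False
    format: str = "NETCDF4"


def fake_opener(path: str, **kwargs: Any) -> dict[str, Any]:
    # Stand-in for a real opener such as netCDF4.Dataset; just echoes its inputs.
    return {"path": path, **kwargs}


opts = OpenOptions(diskless=True)
# Forward every field as a keyword argument to the opener.
handle = fake_opener("test.nc", **asdict(opts))

try:
    OpenOptions(diskles=True)  # misspelled field name
    rejected = False
except TypeError:
    rejected = True  # unknown keywords are refused at construction time
```

With flat `**kwargs`, the same typo would only surface (if at all) deep inside the backend, which is the unconstrained forwarding this PR aims to avoid.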

What this PR still needs to do:

  • implement everything above for the other built-in backends (zarr, scipy, pydap, etc.)

I have follow-up ideas:

  • implement save_dataset in BackendEntrypoint to write to the engine's native format, as to_netcdf does for scipy/netcdf4/h5netcdf and to_zarr does for zarr. With that, writing could go through a unified API, something like:

    ```python
    ds = xr.open_dataset("test.nc", engine="netcdf4")
    # Dataset API
    ds.save_dataset("test.zarr", engine="zarr")
    ds.save_dataset("test2.nc", engine="netcdf4")
    # general API
    xr.save_dataset(ds, "test2.nc", engine="netcdf4")
    ds.save_dataset("test.grib", engine="grib")  # hypothetical engine
    ds.save_dataset("test.hdf5", engine="hdf5")  # hypothetical engine
    ```
  • further disentangle the current built-in backends from xarray so that they could be their own module
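A unified save_dataset could dispatch on the engine name, much as open_dataset resolves an engine to a backend. The sketch below is hypothetical throughout: `WRITERS`, `register_writer` and `save_dataset` are illustrative names, and the writers return strings instead of writing files so the dispatch logic can run on its own. In a real implementation each BackendEntrypoint would supply its own save method (for example wrapping `to_netcdf` or `to_zarr`).

```python
from typing import Any, Callable

# Hypothetical writer signature: (dataset, path) -> description of what was written.
Writer = Callable[[Any, str], str]

# Hypothetical registry mapping engine names to writer callables.
WRITERS: dict[str, Writer] = {}


def register_writer(engine: str) -> Callable[[Writer], Writer]:
    """Decorator that registers a writer under the given engine name."""
    def decorator(func: Writer) -> Writer:
        WRITERS[engine] = func
        return func
    return decorator


@register_writer("netcdf4")
def _write_netcdf4(ds: Any, path: str) -> str:
    # A real writer might call ds.to_netcdf(path, engine="netcdf4").
    return f"netcdf4 -> {path}"


@register_writer("zarr")
def _write_zarr(ds: Any, path: str) -> str:
    # A real writer might call ds.to_zarr(path).
    return f"zarr -> {path}"


def save_dataset(ds: Any, path: str, engine: str) -> str:
    """Dispatch to the writer registered for ``engine`` (hypothetical API)."""
    try:
        writer = WRITERS[engine]
    except KeyError:
        raise ValueError(f"no writer registered for engine {engine!r}") from None
    return writer(ds, path)
```

A registry keeps the core free of backend-specific branches, which would also fit the goal of moving the built-in backends into their own modules.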

I'm sure I have not taken into account all the possible pitfalls/problems which might arise here. I'd appreciate any comments and suggestions.


Successfully merging this pull request may close these issues.

Unconstrained forwarding of backend keyword arguments