Group decoding options into single argument #10429

Open: wants to merge 13 commits into base: main.

ci/requirements/all-but-dask.yml: 2 changes (1 addition, 1 deletion)
@@ -4,7 +4,7 @@ channels:
   - nodefaults
 dependencies:
   - aiobotocore
-  - array-api-strict
+  - array-api-strict<2.4
   - boto3
   - bottleneck
   - cartopy

ci/requirements/all-but-numba.yml: 2 changes (1 addition, 1 deletion)
@@ -6,7 +6,7 @@ dependencies:
   # Pin a "very new numpy" (updated Sept 24, 2024)
   - numpy>=2.1.1
   - aiobotocore
-  - array-api-strict
+  - array-api-strict<2.4
   - boto3
   - bottleneck
   - cartopy

ci/requirements/environment-3.14.yml: 2 changes (1 addition, 1 deletion)
@@ -4,7 +4,7 @@ channels:
   - nodefaults
 dependencies:
   - aiobotocore
-  - array-api-strict
+  - array-api-strict<2.4
   - boto3
   - bottleneck
   - cartopy

ci/requirements/environment-windows-3.14.yml: 2 changes (1 addition, 1 deletion)
@@ -2,7 +2,7 @@ name: xarray-tests
 channels:
   - conda-forge
 dependencies:
-  - array-api-strict
+  - array-api-strict<2.4
   - boto3
   - bottleneck
   - cartopy

ci/requirements/environment-windows.yml: 2 changes (1 addition, 1 deletion)
@@ -2,7 +2,7 @@ name: xarray-tests
 channels:
   - conda-forge
 dependencies:
-  - array-api-strict
+  - array-api-strict<2.4
   - boto3
   - bottleneck
   - cartopy

ci/requirements/environment.yml: 2 changes (1 addition, 1 deletion)
@@ -4,7 +4,7 @@ channels:
   - nodefaults
 dependencies:
   - aiobotocore
-  - array-api-strict
+  - array-api-strict<2.4
   - boto3
   - bottleneck
   - cartopy

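All six CI environments now cap ``array-api-strict`` below 2.4. The helper below is a rough, hypothetical sketch of what that upper bound accepts. It compares only the leading numeric release components and ignores pre-release and local-version rules. A real check should use a full PEP 440 implementation such as ``packaging.version``. The function names are illustrative and are not part of xarray.

```python
import re


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release components, e.g. "2.3.1" -> (2, 3, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def below_upper_pin(version: str, pin: str = "2.4") -> bool:
    """Return True if `version` satisfies `< pin`, comparing release numbers only."""
    # Tuple comparison: (2, 4, 0) is not < (2, 4), and (2, 10) is greater than (2, 4).
    return release_tuple(version) < release_tuple(pin)


for candidate in ["2.3.1", "2.4", "2.4.0", "2.10"]:
    print(candidate, below_upper_pin(candidate))
```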
doc/api-hidden.rst: 2 changes (2 additions, 0 deletions)
@@ -680,6 +680,8 @@
    backends.BackendArray
    backends.BackendEntrypoint.guess_can_open
    backends.BackendEntrypoint.open_dataset
+   backends.CoderOptions
+   backends.CoderOptions.to_kwargs

    core.indexing.IndexingSupport
    core.indexing.explicit_indexing_adapter

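``backends.CoderOptions`` and ``CoderOptions.to_kwargs`` are the new public names documented here, but the diff does not show their implementation. The following is therefore only a minimal sketch. It assumes ``CoderOptions`` behaves like a dataclass with one field per decoder keyword from the backend guide, each defaulting to ``None``. It also assumes ``to_kwargs`` returns those fields as a plain ``dict``. ``CoderOptionsSketch`` is a hypothetical name, and the field types are simplified.

```python
from dataclasses import asdict, dataclass, fields
from typing import Iterable, Optional, Union


@dataclass(frozen=True)
class CoderOptionsSketch:
    # Hypothetical stand-in for xarray.backends.CoderOptions: one field per
    # decoder keyword listed in the backend guide. None means "not set".
    # Types are simplified for illustration.
    mask_and_scale: Optional[bool] = None
    decode_times: Optional[bool] = None
    decode_timedelta: Optional[bool] = None
    use_cftime: Optional[bool] = None
    concat_characters: Optional[bool] = None
    decode_coords: Union[bool, str, None] = None
    drop_variables: Optional[Iterable[str]] = None

    def to_kwargs(self) -> dict:
        # Assumption: return every field as a keyword argument, so the result
        # can be splatted into a decoder: decode(vars, attrs, **opts.to_kwargs()).
        return asdict(self)


options = CoderOptionsSketch(decode_times=False, drop_variables=["time_bnds"])
print(options.to_kwargs())
print([field.name for field in fields(CoderOptionsSketch)])
```

A frozen dataclass keeps the option bundle immutable once a backend receives it. That is a design choice of this sketch, not something the diff specifies.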
doc/api.rst: 1 change (1 addition, 0 deletions)
@@ -1685,6 +1685,7 @@ Advanced API
    Dataset.set_close
    backends.BackendArray
    backends.BackendEntrypoint
+   backends.CoderOptions
    backends.list_engines
    backends.refresh_engines


doc/internals/how-to-add-new-backend.rst: 46 changes (21 additions, 25 deletions)
@@ -38,21 +38,23 @@ This is what a ``BackendEntrypoint`` subclass should look like:

 .. code-block:: python

-    from xarray.backends import BackendEntrypoint
+    from xarray.backends import BackendEntrypoint, CoderOptions


     class MyBackendEntrypoint(BackendEntrypoint):
+        coder_class = CoderOptions
+
         def open_dataset(
             self,
             filename_or_obj,
             *,
-            drop_variables=None,
+            coder_options=None,
             # other backend specific keyword arguments
             # `chunks` and `cache` DO NOT go here, they are handled by xarray
         ):
-            return my_open_dataset(filename_or_obj, drop_variables=drop_variables)
+            return my_open_dataset(filename_or_obj, coder_options=coder_options)

-        open_dataset_parameters = ["filename_or_obj", "drop_variables"]
+        open_dataset_parameters = ["filename_or_obj", "coder_options"]

         def guess_can_open(self, filename_or_obj):
             try:

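The sketch below is self-contained and shows how the ``coder_class`` attribute and the ``coder_options`` keyword fit together without installing this branch. ``BackendEntrypointStub``, ``CoderOptionsStub`` and ``read_my_format`` are hypothetical stand-ins for ``BackendEntrypoint``, ``CoderOptions`` and ``my_open_dataset``. Falling back to ``self.coder_class()`` when ``coder_options`` is ``None`` is an assumption. The diff does not confirm that behavior.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CoderOptionsStub:
    # Hypothetical stand-in for xarray.backends.CoderOptions (two fields only).
    decode_times: Optional[bool] = None
    drop_variables: Optional[Iterable[str]] = None


class BackendEntrypointStub:
    # Hypothetical stand-in for xarray.backends.BackendEntrypoint.
    coder_class = CoderOptionsStub


def read_my_format(filename_or_obj, coder_options):
    # Hypothetical reader: pretend the file holds three variables and honor
    # drop_variables from the grouped options.
    variables = {"temperature": [1.0, 2.0], "time": [0, 1], "time_bnds": [[0, 1], [1, 2]]}
    dropped = set(coder_options.drop_variables or ())
    return {name: data for name, data in variables.items() if name not in dropped}


class MyBackendEntrypoint(BackendEntrypointStub):
    coder_class = CoderOptionsStub

    open_dataset_parameters = ["filename_or_obj", "coder_options"]

    def open_dataset(self, filename_or_obj, *, coder_options=None):
        # Assumption: fall back to the backend's default options when none are given.
        if coder_options is None:
            coder_options = self.coder_class()
        return read_my_format(filename_or_obj, coder_options=coder_options)


backend = MyBackendEntrypoint()
print(sorted(backend.open_dataset("example.my")))
only_data = CoderOptionsStub(drop_variables=["time_bnds"])
print(sorted(backend.open_dataset("example.my", coder_options=only_data)))
```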
@@ -83,19 +85,15 @@ The following is an example of the high level processing steps:
         self,
         filename_or_obj,
         *,
-        drop_variables=None,
-        decode_times=True,
-        decode_timedelta=True,
-        decode_coords=True,
+        coder_options=None,
         my_backend_option=None,
     ):
         vars, attrs, coords = my_reader(
             filename_or_obj,
-            drop_variables=drop_variables,
             my_backend_option=my_backend_option,
         )
         vars, attrs, coords = my_decode_variables(
-            vars, attrs, decode_times, decode_timedelta, decode_coords
+            vars, attrs, **coder_options.to_kwargs()
         ) # see also conventions.decode_cf_variables

         ds = xr.Dataset(vars, attrs=attrs, coords=coords)

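The ``**coder_options.to_kwargs()`` call above spreads the grouped options back into individual keyword arguments. A decoding helper can therefore keep an explicit signature, as in the hedged sketch below. ``my_decode_variables`` is the guide's own placeholder name. Its body, the toy ``"days since 2000-01-01"`` handling and ``OptionsStub`` are assumptions made for illustration. ``vars_`` is used instead of ``vars`` to avoid shadowing the built-in.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class OptionsStub:
    # Hypothetical stand-in for CoderOptions with only two decoder fields.
    decode_times: Optional[bool] = None
    drop_variables: Optional[Iterable[str]] = None

    def to_kwargs(self):
        return asdict(self)


def my_decode_variables(variables, attrs, *, decode_times=None, drop_variables=None):
    # Toy decoder: None ("not set") is treated like True for decode_times.
    drop = set(drop_variables or ())
    decoded = {}
    for name, (values, var_attrs) in variables.items():
        if name in drop:
            continue
        if decode_times is not False and var_attrs.get("units") == "days since 2000-01-01":
            # Toy CF-style time decoding for one hard-coded reference date.
            values = [datetime(2000, 1, 1) + timedelta(days=v) for v in values]
        decoded[name] = values
    coords = {}  # no coordinate handling in this toy example
    return decoded, attrs, coords


raw_vars = {
    "time": ([0, 1], {"units": "days since 2000-01-01"}),
    "flag": ([1, 0], {}),
}
coder_options = OptionsStub(drop_variables=["flag"])
vars_, attrs, coords = my_decode_variables(
    raw_vars, {"title": "demo"}, **coder_options.to_kwargs()
)
print(vars_["time"])
```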
@@ -110,29 +108,26 @@ method shall be set by using :py:meth:`~xarray.Dataset.set_close`.


 The input of ``open_dataset`` method are one argument
-(``filename_or_obj``) and one keyword argument (``drop_variables``):
+(``filename_or_obj``) and one keyword argument (``coder_options``):

 - ``filename_or_obj``: can be any object but usually it is a string containing a path or an instance of
   :py:class:`pathlib.Path`.
-- ``drop_variables``: can be ``None`` or an iterable containing the variable
-  names to be dropped when reading the data.
+- ``coder_options``: can be ``None`` or an instance of :py:class:`~xarray.backends.CoderOptions`.

-If it makes sense for your backend, your ``open_dataset`` method
-should implement in its interface the following boolean keyword arguments, called
-**decoders**, which default to ``None``:
+If it makes sense for your backend, you can override the following ``CoderOptions`` fields, which default to ``None``:

 - ``mask_and_scale``
 - ``decode_times``
 - ``decode_timedelta``
 - ``use_cftime``
 - ``concat_characters``
 - ``decode_coords``
+- ``drop_variables``

-Note: all the supported decoders shall be declared explicitly
-in backend ``open_dataset`` signature and adding a ``**kwargs`` is not allowed.
+Note: If ``coder_options`` is ``None``, the given keyword arguments are validated against the defaults.

 These keyword arguments are explicitly defined in Xarray
-:py:func:`~xarray.open_dataset` signature. Xarray will pass them to the
+:py:class:`~xarray.backends.CoderOptions` or a subclass. Xarray will pass them to the
 backend only if the User explicitly sets a value different from ``None``.
 For more details on decoders see :ref:`RST decoders`.

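The sketch below illustrates two rules from the text. When ``coder_options`` is ``None``, loose keyword arguments are validated. Only values other than ``None`` are forwarded. ``build_coder_options`` and ``CoderOptionsStub`` are hypothetical, and the diff does not show how xarray performs this validation. Treat the ``TypeError`` and its message as assumptions.

```python
from dataclasses import dataclass, fields
from typing import Iterable, Optional


@dataclass
class CoderOptionsStub:
    # Hypothetical subset of the decoder fields listed in the guide.
    mask_and_scale: Optional[bool] = None
    decode_times: Optional[bool] = None
    drop_variables: Optional[Iterable[str]] = None


def build_coder_options(coder_class=CoderOptionsStub, coder_options=None, **kwargs):
    # Grouped options supplied by the caller are used as-is.
    if coder_options is not None:
        return coder_options
    # Otherwise validate the loose keywords against the fields of coder_class.
    allowed = {field.name for field in fields(coder_class)}
    unknown = set(kwargs) - allowed
    if unknown:
        raise TypeError(f"unsupported decoder keywords: {sorted(unknown)}")
    # Forward only values explicitly set (not None), so a subclass with
    # non-None defaults keeps them.
    explicit = {key: value for key, value in kwargs.items() if value is not None}
    return coder_class(**explicit)


print(build_coder_options(decode_times=False, mask_and_scale=None))
```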
@@ -141,7 +136,6 @@ arguments. All these keyword arguments can be passed to
 :py:func:`~xarray.open_dataset` grouped either via the ``backend_kwargs``
 parameter or explicitly using the syntax ``**kwargs``.

-
 If you don't want to support the lazy loading, then the
 :py:class:`~xarray.Dataset` shall contain values as a :py:class:`numpy.ndarray`
 and your work is almost done.
@@ -260,14 +254,16 @@ time is stored in two attributes dataDate and dataTime as strings. Therefore,
 it is not possible to reuse the Xarray time decoder, and implementing a new
 one is mandatory.

-Decoders can be activated or deactivated using the boolean keywords of
-Xarray :py:meth:`~xarray.open_dataset` signature: ``mask_and_scale``,
+Decoders can be activated or deactivated using the ``coder_options`` keyword argument
+(:py:class:`~xarray.backends.CoderOptions`) or its equivalent keywords in the
+Xarray :py:meth:`~xarray.open_dataset` signature (``mask_and_scale``,
 ``decode_times``, ``decode_timedelta``, ``use_cftime``,
-``concat_characters``, ``decode_coords``.
+``concat_characters``, ``decode_coords``, ``drop_variables``).
 Such keywords are passed to the backend only if the User sets a value
 different from ``None``. Note that the backend does not necessarily have to
-implement all the decoders, but it shall declare in its ``open_dataset``
-interface only the boolean keywords related to the supported decoders.
+implement all the decoders, but it shall declare a ``coder_class`` in its
+``BackendEntrypoint`` interface with only the boolean keywords related to
+the supported decoders.

 .. _RST backend_registration:


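The new wording asks a backend to declare a ``coder_class`` that exposes only the decoders it supports. The sketch below pictures that with hypothetical names (``MyCoderOptions``, ``MyLimitedBackend``). It uses a standalone class rather than a subclass. If ``CoderOptions`` is a dataclass (an assumption; the diff does not show it), a subclass would inherit every decoder field and could not drop any.

```python
from dataclasses import dataclass, fields
from typing import Iterable, Optional


@dataclass
class MyCoderOptions:
    # Hypothetical coder_class for a backend that only supports time decoding
    # and variable dropping; other decoders are simply not declared.
    decode_times: Optional[bool] = None
    drop_variables: Optional[Iterable[str]] = None


class MyLimitedBackend:
    # Hypothetical entrypoint; a real one would subclass BackendEntrypoint.
    coder_class = MyCoderOptions

    def supported_decoders(self):
        return [field.name for field in fields(self.coder_class)]


backend = MyLimitedBackend()
print(backend.supported_decoders())

try:
    # mask_and_scale is not a field, so the generated __init__ raises TypeError.
    backend.coder_class(mask_and_scale=True)
except TypeError as err:
    print("rejected:", err)
```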
doc/user-guide/io.rst: 1 change (1 addition, 0 deletions)
@@ -1127,6 +1127,7 @@ If the file were instead stored remotely (e.g. ``s3://saved_on_disk.h5``) you ca
 that are used to `configure fsspec <https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.implementations.reference.ReferenceFileSystem.__init__>`_:

 .. jupyter-execute::
+    :stderr:

     ds_kerchunked = xr.open_dataset(
         "./combined.json",

xarray/backends/__init__.py: 8 changes (7 additions, 1 deletion)
@@ -4,7 +4,12 @@
 formats. They should not be used directly, but rather through Dataset objects.
 """

-from xarray.backends.common import AbstractDataStore, BackendArray, BackendEntrypoint
+from xarray.backends.common import (
+    AbstractDataStore,
+    BackendArray,
+    BackendEntrypoint,
+    CoderOptions,
+)
 from xarray.backends.file_manager import (
     CachingFileManager,
     DummyFileManager,
@@ -24,6 +29,7 @@
     "BackendArray",
     "BackendEntrypoint",
     "CachingFileManager",
+    "CoderOptions",
     "DummyFileManager",
     "FileManager",
     "H5NetCDFStore",