Add Zarr and Xarray examples to docs (#655)
Showing 4 changed files with 147 additions and 0 deletions.
@@ -8,5 +8,7 @@ maxdepth: 2
---
how-to-run
basic-array-ops
zarr
xarray
pangeo
```
@@ -0,0 +1,76 @@
---
file_format: mystnb
kernelspec:
  name: python3
---
# Xarray

Cubed can work with Xarray datasets via the [`cubed-xarray`](https://github.com/cubed-dev/cubed-xarray) package.

Install by running the following:

```shell
pip install cubed cubed-xarray xarray pooch netCDF4
```

Note that `pooch` and `netCDF4` are needed to access the Xarray tutorial datasets that we use in the example below.

## Open dataset

Start by importing Xarray - note that we don't need to import Cubed or `cubed-xarray`, since they will be picked up automatically.

```{code-cell} ipython3
import xarray as xr
xr.set_options(display_expand_attrs=False, display_expand_data=True);
```

We open an Xarray dataset (in netCDF format) using the usual `open_dataset` function. By specifying `chunks={}` we ensure that the dataset is chunked using the on-disk chunking (here it is the netCDF file chunking). The `chunked_array_type` argument specifies which chunked array type to use - Cubed in this case.

```{code-cell} ipython3
ds = xr.tutorial.open_dataset(
    "air_temperature", chunked_array_type="cubed", chunks={}
)
ds
```

Notice that the `air` data variable is a `cubed.Array`. Since Cubed has a lazy computation model, this array is not loaded from disk until a computation is run.
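
As a rough illustration of what "lazy" means here, the following toy sketch in plain Python defers work until `compute` is called. It is not how Cubed is implemented: `LazyValue` and `load_from_disk` are hypothetical names invented for this example, and the temperatures are made up.

```python
class LazyValue:
    """Toy stand-in for a lazy array: it stores a recipe, not the data."""

    def __init__(self, func):
        self._func = func  # the deferred work
        self.calls = 0     # how many times the work has actually run

    def map(self, f):
        # Build a new lazy value on top of this one; nothing runs yet.
        return LazyValue(lambda: f(self.compute()))

    def compute(self):
        # Only here is the deferred work executed.
        self.calls += 1
        return self._func()


def load_from_disk():
    # Hypothetical loader standing in for a real file read (values in kelvin).
    return [280.0, 285.0, 290.0]


air = LazyValue(load_from_disk)
celsius = air.map(lambda xs: [x - 273.15 for x in xs])
assert air.calls == 0  # building the expression has loaded nothing

result = celsius.compute()  # only now is load_from_disk called
assert air.calls == 1
```

The same idea applies to the dataset above: opening it and building expressions on `ds.air` only describes the work, and the data is read when a computation is triggered.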

## Convert to Zarr

We can use Cubed to convert the dataset to Zarr format by calling `to_zarr` on the dataset:

```{code-cell} ipython3
ds.to_zarr("air_temperature_cubed.zarr", mode="w", consolidated=True);
```

This will run a computation that loads the input data and writes it out to a Zarr store on the local filesystem.

## Compute the mean

We can also use Xarray's API to run computations on the dataset using Cubed. Here we find the mean air temperature over time, for each location:

```{code-cell} ipython3
mean = ds.air.mean("time", skipna=False)
mean
```

To run the computation we need to call `compute`:

```{code-cell} ipython3
mean.compute()
```
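
To make the reduction concrete, here is a minimal NumPy sketch of what averaging over the time axis means. It uses small made-up data rather than the tutorial dataset, assumes a `(time, lat, lon)` layout, and treats `skipna=False` as analogous to a plain mean in which any NaN propagates to the result.

```python
import numpy as np

# Hypothetical data: 3 time steps on a 2 x 2 grid, laid out as (time, lat, lon).
air = np.array([
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 4.0], [5.0, np.nan]],
    [[5.0, 6.0], [7.0, 8.0]],
])

# Reducing over axis 0 (time) leaves one value per (lat, lon) location.
mean_keep_nan = air.mean(axis=0)         # analogous to skipna=False
mean_skip_nan = np.nanmean(air, axis=0)  # analogous to skipna=True

print(mean_keep_nan.shape)  # (2, 2)
```

The location with a missing value comes out as NaN in `mean_keep_nan`, while `mean_skip_nan` averages the two remaining values for that location.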

This is fine for outputs that fit in memory, like the example here, but sometimes we want to write the output of the computation to Zarr, which we do by calling `to_zarr` on the result instead of `compute`:

```{code-cell} ipython3
mean.to_zarr("mean_air_temperature.zarr", mode="w", consolidated=True);
```

We can check that the Zarr file was created by loading it from disk using `xarray.open_dataset`:

```{code-cell} ipython3
xr.open_dataset(
    "mean_air_temperature.zarr", chunked_array_type="cubed", chunks={}
)
```
@@ -0,0 +1,65 @@
---
file_format: mystnb
kernelspec:
  name: python3
---
# Zarr

Cubed was designed to work seamlessly with Zarr data. The examples below demonstrate using {py:func}`cubed.from_zarr`, {py:func}`cubed.to_zarr` and {py:func}`cubed.store` to read and write Zarr data.

## Write to Zarr

We'll start by creating a small chunked array containing random data in Cubed and writing it to Zarr using {py:func}`cubed.to_zarr`. Note that the call to `to_zarr` executes eagerly.

```{code-cell} ipython3
import cubed
import cubed.random
# 2MB chunks
a = cubed.random.random((5000, 5000), chunks=(500, 500))
# write to Zarr
cubed.to_zarr(a, "a.zarr")
```
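
The `# 2MB chunks` comment can be checked with some quick arithmetic. The sketch below uses only the standard library and assumes 8-byte (float64) elements, which is an assumption about the dtype rather than something stated in the example.

```python
import math

shape = (5000, 5000)
chunks = (500, 500)
itemsize = 8  # bytes per element, assuming float64 values

# Size of one chunk, number of chunks, and size of the whole array.
chunk_bytes = math.prod(chunks) * itemsize
num_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
total_bytes = math.prod(shape) * itemsize

print(chunk_bytes)  # 2000000 (2 MB per chunk)
print(num_chunks)   # 100 (a 10 x 10 grid of chunks)
print(total_bytes)  # 200000000 (200 MB in total)
```

Under that assumption, the array is written as 100 chunks of 2 MB each.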

## Read from Zarr

We can check that the Zarr file was created by loading it from disk using {py:func}`cubed.from_zarr`:

```{code-cell} ipython3
cubed.from_zarr("a.zarr")
```

## Multiple arrays

To write multiple arrays in a single computation use {py:func}`cubed.store`:

```{code-cell} ipython3
import cubed
import cubed.random
# 2MB chunks
a = cubed.random.random((5000, 5000), chunks=(500, 500))
b = cubed.random.random((5000, 5000), chunks=(500, 500))
# write to Zarr
arrays = [a, b]
paths = ["a.zarr", "b.zarr"]
cubed.store(arrays, paths)
```

Then to read the Zarr files back, we use {py:func}`cubed.from_zarr` for each array and perform whatever array operations we like on them. Only when we call `to_zarr` is the whole computation executed.

```{code-cell} ipython3
import cubed.array_api as xp
# read from Zarr
a = cubed.from_zarr("a.zarr")
b = cubed.from_zarr("b.zarr")
# perform operation
c = xp.add(a, b)
# write to Zarr
cubed.to_zarr(c, store="c.zarr")
```
@@ -15,6 +15,10 @@ tenacity
toolz
tqdm
zarr
cubed-xarray
xarray
pooch
netCDF4

# docs
sphinx-book-theme