Version 0.1 (#12)
* Fix xdas.filter.

* Typo in docs.

* Do not generate output dir.

* Speedup .to_netcdf for VirtualStack.

* update docs.

* Add DataArray.size and len(DataArray).

* Add dtype checking when combine_by_coords.

* Make Collections picklable.

* Ensure correct fillvalue for virtual datasets.

* Fix virtual writing with dense non-dimensional coordinates.

* replace netcdf4 with h5netcdf.

* Fix to_xarray for scalar coordinate.

* Allow for Ellipsis in DataArray.transpose.

* Add DataArray.T

* One more concat test.

* Fix concatenate for unsorted coords.

* Add randn_wavefronts and rename wavelet_wavefronts.

* Add MLPicker.

* Allow passing chunk_dim to atoms in process.

* Move Sequential

* Make Sequential inherit from Atom.

* Fix nasty bug.

* auto chunk_dim for process.

* rename kwargs --> flags.

* Rename chunk -> chunk_dim

* Put zeros for missing channels.

* normalize inplace.

* WIP.

* Add coords handling.

* refactor MLPicker.

* Some more refactoring.

* Make circular buffers as states.

* Fix state initialization.

* small refactoring.

* Add find_picks.

* format.

* more formatting.

* Fix _find_picks_numeric axis handling.

* Add chunk processing for find_picks_numeric.

* Fix find_picks_numeric for 1d arrays.

* test trigger on several chunks ago.

* Add tests for find_picks.

* Add TODO.

* Implement offset argument to get absolute index location for chunks.

* remove unused imports

* Add trigger on chunks.

* Small refactor of atoms.Partial.

* Atomize find_picks.

* Fix equals for nan.

* Add DataFrameWriter.

* typos

* Fix parse_dates in DataFrameWriter.

* feat: Add find_picks trigger to signal module

* Fix virtual writing of collections.

* Add Atomic declaration of trigger.

* Add some doc and do some refactoring.

* chunk_dim flag must be provided or the state is reset.

* Allow passing atoms as input to atomized functions.

* Add numba to requirements.

* WIP: docs

* Improve getting started.

* Update Partial docstring.

* update atomized docstring.

* Remove damned pylance auto import.

* Linting corrections.

* restore netcdf4 dependency.

* Fix ufunc by providing better broadcasting.

* Use NDArrayOperatorsMixin for better arithmetics support.

* Improve get_discontinuities.

* update span -> delta

* Add copy to datacollection.py

* improve copy for datacollections.

* Fix some bugs.

* Add --force-reinstall for latest.

* CI: initial commit.

* CI: fix python versions.

* Pytest: import mode = importlib.

* rename action: tests.

* Add tests badge.

* add coord.get_availabilities()

* add plot_availability for DataArray.

* Add availability plot for collections.

* Update version.
atrabattoni authored May 21, 2024
1 parent e4dcd07 commit a884007
Showing 34 changed files with 2,344 additions and 607 deletions.
26 changes: 26 additions & 0 deletions .github/workflows/tests.yaml
@@ -0,0 +1,26 @@
name: tests

on: [push]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install '.[tests]'
      - name: Test with pytest
        run: |
          pytest
1 change: 1 addition & 0 deletions README.md
@@ -9,6 +9,7 @@
-----------------

[![Documentation Status](https://readthedocs.org/projects/xdas/badge/?version=latest)](https://xdas.readthedocs.io/en/latest/?badge=latest)
[![Tests Status](https://github.com/xdas-dev/xdas/actions/workflows/tests.yaml/badge.svg)](https://github.com/xdas-dev/xdas/actions/workflows/tests.yaml)
[![PyPI](https://img.shields.io/pypi/v/xdas)](https://pypi.org/project/xdas/)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![DOI](https://zenodo.org/badge/560867006.svg)](https://zenodo.org/badge/latestdoi/560867006)
17 changes: 8 additions & 9 deletions docs/api/atoms.md
@@ -19,7 +19,6 @@ Attributes

```{eval-rst}
.. autosummary::
   :toctree: ../_autosummary

   Atom.state
   Atom.initialized
@@ -30,7 +29,6 @@ Methods

```{eval-rst}
.. autosummary::
   :toctree: ../_autosummary

   Atom.initialize
   Atom.initialize_from_state
@@ -55,11 +53,12 @@
.. autosummary::
   :toctree: ../_autosummary

   signal.ResamplePoly
   signal.IIRFilter
   signal.FIRFilter
   signal.LFilter
   signal.SOSFilter
   signal.DownSample
   signal.UpSample
   DownSample
   FIRFilter
   IIRFilter
   LFilter
   ResamplePoly
   SOSFilter
   Trigger
   UpSample
```
3 changes: 2 additions & 1 deletion docs/api/synthetics.md
@@ -8,5 +8,6 @@
.. autosummary::
   :toctree: ../_autosummary

   generate
   wavelet_wavefronts
   randn_wavefronts
```
15 changes: 3 additions & 12 deletions docs/conf.py
@@ -4,23 +4,14 @@
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys

# -- Project information -----------------------------------------------------

project = "xdas"
copyright = "2024, Alister Trabattoni"
author = "Alister Trabattoni"

# The full version, including alpha/beta/rc tags
release = "0.1rc0"
release = "0.1"


# -- General configuration ---------------------------------------------------
@@ -101,13 +92,13 @@
import numpy as np

import xdas as xd
from xdas.synthetics import generate
from xdas.synthetics import wavelet_wavefronts

dirpath = os.path.join(os.path.split(__file__)[0], "_data")
if not os.path.exists(dirpath):
    os.makedirs(dirpath)

da = generate()
da = wavelet_wavefronts()
chunks = xd.split(da, 3)
da.to_netcdf(os.path.join(dirpath, "sample.h5"))
da.to_netcdf(os.path.join(dirpath, "sample.nc"))
105 changes: 98 additions & 7 deletions docs/getting-started.md
@@ -29,7 +29,7 @@ pip install xdas
````
````{tab-item} Latest
```bash
pip install "git+https://github.com/xdas-dev/xdas.git@dev"
pip install "git+https://github.com/xdas-dev/xdas.git@dev" --force-reinstall
```
````
@@ -68,7 +68,7 @@ Xdas only loads the metadata from each file and returns a {py:class}`~xdas.DataArray`
Note that if you want to create a single data collection object for multiple acquisitions (i.e. different instruments or several acquisitions with different parameters), you can use the [DataCollection](user-guide/data-structure/datacollection) structure.

```{note}
For Febus users, the current implementation is very slow when directly working with native files. This is due to the particular 3D layout of the Febus format, which is for now virtually reshaped in an inefficient way. The current recommended workflow is to first convert each Febus file to the Xdas NetCDF format: `xdas.open_dataarray("path_to_febus_file.h5", engine="febus").to_netcdf("path_to_xdas_file.nc", virtual=False)`. Those converted files can then be linked as described above.
For Febus users, converting native files into the Xdas NetCDF format generally improves I/O performance and reduces the amount of data by a factor of two. This can be done by looping over Febus files and running: `xdas.open_dataarray("path_to_febus_file.h5", engine="febus").to_netcdf("path_to_xdas_file.nc", virtual=False)`. The converted files can then be linked as described above.
```
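
A hedged sketch of that conversion loop (the `febus/` and `converted/` folder names are placeholders, not paths this project prescribes):

```{code-cell}
import glob
import os

import xdas

os.makedirs("converted", exist_ok=True)
for path in sorted(glob.glob("febus/*.h5")):  # placeholder input folder
    converted = xdas.open_dataarray(path, engine="febus")
    name = os.path.basename(path).replace(".h5", ".nc")
    converted.to_netcdf(os.path.join("converted", name), virtual=False)
```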

### Fixing small gaps and overlaps
@@ -141,7 +141,7 @@ da.plot(yincrease=False, vmin=-0.5, vmax=0.5)
```


## Processing
## Signal processing

DataArray can be processed without having to extract the underlying N-dimensional array. Most numpy functions can be applied while preserving metadata. Xdas also wraps a large subset of [numpy](https://numpy.org/) and [scipy](https://scipy.org/) functions by adding coordinate handling. You mainly need to replace `axis` arguments with `dim` ones and to provide dimensions by name rather than by position.
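
For instance, where the underlying scipy routine would take a positional `axis`, the xdas wrapper takes the dimension by name (a minimal illustration reusing the `xs.taper` call that appears in the FK example below):

```{code-cell}
import xdas.signal as xs

# `dim="time"` replaces scipy's `axis` argument; the output keeps the
# time/distance coordinates of the input DataArray.
tapered = xs.taper(da, dim="time")
```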

@@ -174,10 +174,10 @@ Below is an example of spatial and temporal decimation:
```{code-cell}
import xdas.signal as xs
da = xs.decimate(da, 2, ftype="fir", dim="distance", parallel=None) # all cores by default
da = xs.decimate(da, 2, ftype="iir", dim="time", parallel=8) # height cores
decimated = xs.decimate(da, 2, ftype="fir", dim="distance", parallel=None) # all cores by default
decimated = xs.decimate(decimated, 2, ftype="iir", dim="time", parallel=8) # eight cores
da.plot(yincrease=False, vmin=-0.25, vmax=0.25)
decimated.plot(yincrease=False, vmin=-0.25, vmax=0.25)
```

Here is how to compute an FK diagram. Note that the DataArray object can be used to represent any number and kind of dimensions:
@@ -190,7 +190,7 @@ fk = xs.taper(fk, dim="time")
fk = xfft.rfft(fk, dim={"time": "frequency"}) # rename "time" -> "frequency"
fk = xfft.fft(fk, dim={"distance": "wavenumber"}) # rename "distance" -> "wavenumber"
fk = 20 * np.log10(np.abs(fk))
fk.plot(xlim=(-0.004, 0.004), vmin=-40, vmax=20, interpolation="antialiased")
fk.plot(xlim=(-0.004, 0.004), vmin=-30, vmax=30, interpolation="antialiased")
```

### Saving results
Expand All @@ -200,3 +200,94 @@ Processed data can be saved to NetCDF. This time, because the data was changed,
```{code-cell}
fk.to_netcdf("fk.nc")
```
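
Presumably the file can later be reopened like any other xdas NetCDF file (a one-line sketch, assuming the default engine reads files produced by `to_netcdf`):

```{code-cell}
import xdas as xd

reloaded = xd.open_dataarray("fk.nc")  # round-trips the result saved above
```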


## Massive processing using Atoms

The usual [numpy](https://numpy.org/)/[scipy](https://scipy.org/) way of processing data works great when the data of interest fits in memory. To deal with huge datasets, xdas introduces {py:class}`~xdas.atoms.Atom` objects.

An {py:class}`~xdas.atoms.Atom` is a generic processing unit that takes one input and returns one output. Atoms can store state information to ensure continuity across subsequent calls on contiguous chunks.
There are three ways to make atoms with xdas:

- Functions can be *atomized* using the {py:class}`~xdas.atoms.Partial` class; all parameters except the input are fixed.
- The {py:mod}`xdas.atoms` module contains a set of predefined atoms. In particular most stateful atoms are implemented in that module.
- The user can subclass the {py:class}`~xdas.atoms.Atom` class to define their own atoms (see the sketch below).
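
As a sketch of the third option, assuming only that an atom behaves as a callable mapping one data array to one output; the `**flags` passthrough and the bare subclassing shown here are assumptions, not the documented `Atom` API:

```{code-cell}
import numpy as np

from xdas.atoms import Atom


class Square(Atom):
    """A hypothetical stateless atom; np.square is shown elsewhere on this
    page to work directly on data arrays."""

    def __call__(self, da, **flags):
        return np.square(da)
```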

### Transforming a classic workflow into an atomic pipeline

Imagine you tested the following workflow on a small subset of your data:

```{code-cell}
from scipy.signal import iirfilter

b, a = iirfilter(4, 0.1, btype="high")


def process(da):
    da = xs.decimate(da, 2, ftype="fir", dim="distance")  # not impacted by chunking
    da = xs.lfilter(b, a, da, dim="time")  # requires state passing along time
    da = np.square(da)  # already a unary operator
    return da


monolithic = process(da)
```

To convert your workflow into an atomic pipeline you need:

1. to convert each processing step into an atom
2. to bundle all steps into a {py:class}`~xdas.atoms.Sequential` atom.

Converting each processing step into an atom depends on the nature of the step. In particular, it depends on whether the operation is **stateful** (it relies on the history along the chunked dimension) or **stateless** (it can be applied separately on each chunk along the given dimension without any particular consideration). An example of a stateful operation is a recursive filter, which passes its state from t to t+1. Note that the stateful/stateless characteristic depends on the chunking dimension.

- unary operators that are not stateful (that do not rely on the history along the chunked axis) can be used as is.
- functions that are not stateful must be wrapped with the {py:class}`~xdas.atoms.Partial` class.
- functions that **are stateful** must be replaced by an equivalent stateful object.

In practice, the atomized workflow can be implemented as below. The resulting atom is a callable that can be applied to any data array.

```{code-cell}
from xdas.atoms import Sequential, Partial, LFilter

atom = Sequential(
    [
        Partial(xs.decimate, 2, ftype="fir", dim="distance"),  # use Partial when stateless
        LFilter(b, a, dim="time"),  # use equivalent atom object if stateful
        np.square,  # do nothing if unary and stateless
    ]
)
atomic = atom(da)
assert atomic.equals(monolithic)  # works as `process` but can be applied chunk by chunk
```

### Applying an atom chunk by chunk

While atoms can be used as function equivalents to organize pipelines, their major selling point is their ability to enable chunked processing. Although chunk-by-chunk processing can be done manually, xdas provides the {py:mod}`xdas.processing` module to facilitate this operation. The user must define one data loader and one data writer; the {py:func}`~xdas.processing.process` function then runs the computation.

```{code-cell}
:tags: [remove-cell]
!mkdir output
```

In the example below, the data array is loaded in chunks of 100 samples along the `"time"` dimension. Each chunk is processed by the atom defined above and each resulting processed chunk is saved in the `output` folder. Once the computation completes, {py:func}`~xdas.processing.process` returns a unified view of the output chunks.

```{code-cell}
:tags: [remove-output]
from xdas.processing import process, DataArrayLoader, DataArrayWriter
dl = DataArrayLoader(da, chunks={"time": 100})
dw = DataArrayWriter("output")
chunked = process(atom, dl, dw)
assert chunked.equals(monolithic) # again equal but could be applied to much bigger datasets
```

```{code-cell}
:tags: [remove-cell]
!rm -r output
```

This section was a short summary of atoms and chunked processing. To go deeper into atoms, head to the [](user-guide/atoms) section; to further explore chunked processing, head to the [](user-guide/processing) section.
4 changes: 2 additions & 2 deletions docs/user-guide/atoms.md
@@ -47,9 +47,9 @@ The last operation, `IIRFilter`, instantiates a specific class dedicated to chun
Once the processing sequence has been defined, it can operate on data in memory by simply calling the sequence with the data array as the argument:

```{code-cell}
from xdas.synthetics import generate
from xdas.synthetics import wavelet_wavefronts
da = generate()
da = wavelet_wavefronts()
result = sequence(da)
result.plot(yincrease=False)
```
2 changes: 1 addition & 1 deletion docs/user-guide/data-formats.md
@@ -29,7 +29,7 @@ The formats that are currently implemented are: ASN, FEBUS, OPTASENSE and SINTELA
| OPTASENSE | `"optasense"` |
| SINTELA | `"sintela"` |

## Exdending *xdas* with your file format
## Extending *xdas* with your file format

*xdas* insists on its extensibility: the power is in the hands of the users. Extending *xdas* usually consists of writing functions that are only a few lines long. The process consists of dealing with the two main aspects of a {py:class}`xarray.DataArray`: unpacking the data and coordinates objects, optionally processing them, and packing them back into a DataArray object.
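
A hypothetical sketch of such a function: the HDF5 field names (`data`, `t0`, `dt`, `dx`) are placeholders for whatever your format stores, and the final call assumes the xdas `DataArray` constructor accepts a `coords` mapping like its xarray counterpart.

```python
import h5py
import numpy as np

import xdas


def read_myformat(path):
    """Unpack data and coordinates from a custom HDF5 layout (illustrative)."""
    with h5py.File(path, "r") as file:
        data = file["data"][...]  # raw (time, distance) array
        t0 = file.attrs["t0"]     # acquisition start time
        dt = file.attrs["dt"]     # time step
        dx = file.attrs["dx"]     # channel spacing
    nt, nx = data.shape
    coords = {
        "time": t0 + dt * np.arange(nt),
        "distance": dx * np.arange(nx),
    }
    return xdas.DataArray(data, coords=coords)
```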

11 changes: 7 additions & 4 deletions pyproject.toml
@@ -4,18 +4,21 @@ build-backend = "setuptools.build_meta"

[project]
name = "xdas"
version = "0.1rc0"
requires-python = ">= 3.7"
version = "0.1"
requires-python = ">= 3.10"
authors = [
{ name = "Alister Trabattoni", email = "alister.trabattoni@gmail.com" },
]
dependencies = [
"dask",
"h5netcdf",
"h5py",
"netcdf4",
"numba",
"numpy",
"obspy",
"pandas",
"plotly",
"scipy",
"tqdm",
"xarray",
@@ -34,11 +37,11 @@ docs = [
"sphinx-copybutton",
"sphinx",
]
tests = ["pytest", "pytest-cov"]
tests = ["pytest", "pytest-cov", "seisbench", "torch"]

[tool.isort]
profile = "black"

[tool.pytest.ini_options]
addopts = "--doctest-modules"
addopts = ["--doctest-modules", "--import-mode=importlib"]
doctest_optionflags = "NORMALIZE_WHITESPACE"