feat: supporting das data with multiple docs update (#307)
* update

Signed-off-by: Yiyu Ni <niyiyu@uw.edu>

* update io version

Signed-off-by: Yiyu Ni <niyiyu@uw.edu>

* update tests for inc_hours = 0

Signed-off-by: Yiyu Ni <niyiyu@uw.edu>

* update notebook

Signed-off-by: Yiyu Ni <niyiyu@uw.edu>

* update CONTRIBUTING

Signed-off-by: Yiyu Ni <niyiyu@uw.edu>

* update README

Signed-off-by: Yiyu Ni <niyiyu@uw.edu>

---------

Signed-off-by: Yiyu Ni <niyiyu@uw.edu>
niyiyu authored Apr 11, 2024
1 parent 21b174d commit 23e0dbf
Showing 7 changed files with 76 additions and 64 deletions.
31 changes: 12 additions & 19 deletions CONTRIBUTING.md
@@ -2,29 +2,31 @@

There are multiple ways in which you can contribute:

**Report a bug**: To report a suspected bug, please raise an issue on GitHub. Please try to give a short but complete description of the issue. Use ```bug``` as a label on the GitHub issue.
**Report a bug**: To report a suspected bug, please raise an issue with the ```bug``` label on GitHub. Please try to give a short but complete description of the issue.

**Suggest a feature**: To suggest a new feature, please raise an issue on GitHub. Please describe the feature and the intended use case. Use ```enhancement``` as a label on the GitHub issue.
**Suggest a feature**: To suggest a new feature, please raise an issue with the ```enhancement``` label on GitHub. Please describe the feature and the intended use case.

## Developing NoisePy

NoisePy is undergoing major re-development. Part of the core development involves adding data objects and stores, modularizing the package to facilitate community development, and providing alternative workflows for HPC, Cloud, and DAS.

Fork the repository and create your local version, then follow the installation steps:
```bash
conda create -n noisepy python=3.10 pip
conda create -n noisepy -y python=3.10 pip mpi4py
conda activate noisepy
conda install -c conda-forge openmpi
python -m ipykernel install --user --name noisepy
pip install -e ".[dev,aws,mpi]"
```
This installs all of the package's dependencies, including IPython for starting a Jupyter notebook (excluded otherwise to minimize dependencies for command-line deployment).

Install the Jupyter notebook and ipython kernel
```bash
pip install ipykernel notebook
python -m ipykernel install --user --name noisepy
```

Install the `pre-commit` hook:
```sh
```bash
pre-commit install
```

This will run the linting and formatting checks configured in the project before every commit.
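
The checks can also be run on demand; a minimal sketch of standard `pre-commit` usage:
```bash
# Run all configured hooks against the whole repository,
# not just the staged files (useful before opening a PR).
pre-commit run --all-files
```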

## Testing
@@ -47,27 +49,18 @@ is unavailable and the test will fail. This is usually resolved in a few minutes
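
To run a subset of the suite locally, standard `pytest` selection works; a sketch (the `-k` expression is illustrative):
```bash
# Run only the cross-correlation tests; -k filters tests by name.
pytest tests/test_cross_correlation.py -k "correlation"
```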

## Pull Requests

Please follow the [Conventional Commits](https://github.com/uw-ssec/rse-guidelines/blob/main/conventional-commits.md) naming for pull request titles.
Please follow the [Conventional Commits](https://github.com/uw-ssec/rse-guidelines/blob/main/fundamentals/conventional-commits.md) naming for pull request titles.

## Overview

<img src="./docs_old/figures/data_flow.png">
The data processing in NoisePy consists of three steps (a Python sketch follows the list):

1. **(Optional) Step 0 - Download**: The `download()` function or the `noisepy download` CLI command can be
used to download data from an FDSN web service. Alternatively, data from an [S3 bucket](https://s3.console.aws.amazon.com/s3/buckets/scedc-pds) can be copied
locally using the `aws` CLI, or streamed directly from S3.
used to download data from an FDSN web service. Alternatively, data from an [S3 bucket](https://s3.console.aws.amazon.com/s3/buckets/scedc-pds) can be copied locally using the AWS CLI, or streamed directly from S3.
2. **Step 1 - Cross Correlation**: Computes cross correlation for pairs of stations/channels. This can be done with either the `cross_correlate()` function or the `noisepy cross_correlate` CLI command.
3. **Step 2 - Stacking**: This step takes the cross correlation computations across multiple timespans and stacks them for a given station/channel pair. This can be done with either the `stack_cross_correlations()` function or the `noisepy stack` CLI command.
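
A minimal Python sketch of this three-step flow, assuming local ASDF stores (the store usage mirrors this repository's tutorials; the `stack_cross_correlations` argument order is an assumption, and building the `RawDataStore` is omitted):
```python
# Sketch only: raw_store must be a RawDataStore implementation
# (e.g. SCEDCS3DataStore as in the tutorials); its construction is omitted.
from noisepy.seis import cross_correlate, stack_cross_correlations
from noisepy.seis.io.asdfstore import ASDFCCStore, ASDFStackStore
from noisepy.seis.io.datatypes import ConfigParameters

config = ConfigParameters()              # default parameters, customize as needed
cc_store = ASDFCCStore("./cc_data")      # Step 1 output: cross-correlations

cross_correlate(raw_store, config, cc_store)              # Step 1

stack_store = ASDFStackStore("./stack_data")              # Step 2 output
stack_cross_correlations(cc_store, stack_store, config)   # argument order assumed
```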

### Data Formats

NoisePy accesses data through 3 "DataStore" abstract classes: `RawDataStore`, `CrossCorrelationDataStore` and `StackDataStore`. Concrete implementations are provided for ASDF and miniSEED formats (Zarr is in progress). Support for other formats or file organizations can be extended through these classes.
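
For example, swapping storage backends only changes which store is instantiated; a minimal sketch, assuming both constructors accept a local path (both classes are imported in this repository's tutorials):
```python
from noisepy.seis.io.asdfstore import ASDFCCStore
from noisepy.seis.io.numpystore import NumpyCCStore

# The rest of the workflow is unchanged; only the store object differs.
cc_store = ASDFCCStore("./ccf_asdf")      # ASDF backend
# cc_store = NumpyCCStore("./ccf_numpy")  # NumPy-based backend (path argument assumed)
```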

## Using VS Code

The following extensions are recommended:

- [isort](https://marketplace.visualstudio.com/items?itemName=ms-python.isort)
- [black](https://marketplace.visualstudio.com/items?itemName=ms-python.black-formatter)
- [flake8](https://marketplace.visualstudio.com/items?itemName=ms-python.flake8)
40 changes: 13 additions & 27 deletions README.md
@@ -1,10 +1,6 @@
# About NoisePy
NoisePy is a Python package designed for fast and easy computation of ambient noise cross-correlation functions. It provides additional functionality for noise monitoring and surface wave dispersion analysis.

Disclaimer: this code should not be used "as-is" nor run like a black box. The user is expected to change local paths and parameters. Submit an issue to GitHub with information such as the scripts and error messages to debug.

Detailed documentation can be found at https://noisepy.github.io/NoisePy/

[![Documentation Status](https://github.com/noisepy/NoisePy/actions/workflows/notebooks.yml/badge.svg)](https://noisepy.github.io/NoisePy/)
[![Build Status](https://github.com/noisepy/NoisePy/actions/workflows/test.yaml/badge.svg)](https://github.com/noisepy/NoisePy/actions/workflows/test.yaml)
[![Codecov](https://codecov.io/gh/noisepy/NoisePy/branch/main/graph/badge.svg)](https://codecov.io/gh/noisepy/NoisePy)
@@ -18,30 +14,30 @@ NoisePy is going through a major refactoring to make this package easier to deve
# Installation
NoisePy is composed of Python scripts, which allows flexible package installation: installing essentially amounts to building the dependent libraries that the scripts and related functions rely on. We recommend using [conda](https://docs.conda.io/en/latest/) or [pip](https://pypi.org/project/pip/) to install.

### Note the order of the command lines below matters ###
**Note the order of the command lines below matters**

## With Conda and pip:
## With Conda and pip
```bash
conda create -n noisepy python=3.10 pip
conda create -n noisepy -y python=3.10 pip
conda activate noisepy
pip install noisepy-seis
```

## With Conda and pip and MPI support:
## With Conda and pip and MPI support
```bash
conda create -n noisepy python=3.10 pip
conda create -n noisepy -y python=3.10 pip mpi4py
conda activate noisepy
conda install -c conda-forge openmpi
pip install noisepy-seis[mpi]
```

## With virtual environment:
## With virtual environment
```bash
python -m venv noisepy
source noisepy/bin/activate
pip install noisepy-seis
```
## With virtual environment and MPI support:

## With virtual environment and MPI support
An MPI installation is required, e.g. for macOS using [brew](https://brew.sh/):
```bash
brew install open-mpi
@@ -53,22 +49,18 @@ source noisepy/bin/activate
pip install noisepy-seis[mpi]
```


# Functionality
Here is a list of features of the package (an obspy download sketch follows the list):
* download continuous noise data based:
+ on webservices using obspy's core functions of [get_stations](https://docs.obspy.org/packages/autogen/obspy.clients.fdsn.client.Client.get_stations.html) and [get_waveforms](https://docs.obspy.org/packages/autogen/obspy.clients.fdsn.client.Client.get_waveforms.html)
+ on AWS S3 bucket calls, with a test on the SCEDC AWS Open Dataset.
* save seismic data in [ASDF](https://asdf-definition.readthedocs.io/en/latest/) format, which conveniently assembles metadata, waveform and auxiliary data into one single file ([Tutorials](https://github.com/SeismicData/pyasdf/blob/master/doc/tutorial.rst) on reading/writing ASDF files)
* offers scripts to precondition data sets before cross correlations. This involves working with gappy data from various formats (SAC/miniSEED) and storing it locally in ASDF.

* performs fast and easy cross-correlation with functionality to run in parallel through [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface)
* **Applications module**:
+ *Ambient noise monitoring*: measure dv/v using a wide variety of techniques in the time, Fourier, and wavelet domains (Yuan et al., 2021)
+ *Surface wave dispersion*: construct dispersion images using conventional techniques.
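
As an illustration of the obspy calls the downloader builds on (network, station, channel, and time window below are placeholders):
```python
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

client = Client("SCEDC")                  # any FDSN data center works
t0 = UTCDateTime("2022-01-01T00:00:00")

# Station metadata, then one hour of continuous waveforms.
inv = client.get_stations(network="CI", station="LJR", level="channel",
                          starttime=t0, endtime=t0 + 3600)
st = client.get_waveforms("CI", "LJR", "*", "BHZ", t0, t0 + 3600)
print(inv, st)
```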



# Usage

To run the code on a single core, open the terminal and activate the noisepy environment before running the following commands. To run on institutional clusters, see installation notes for individual packages on the cluster's module list.
@@ -81,34 +73,29 @@ docker run -v ~/tmp:/tmp ghcr.io/noisepy/noisepy:latest cross_correlate --path /
```

# Tutorials
A short tutorial on how to use NoisePy-seis can be is available as a [web page](https://noisepy.github.io/NoisePy/noisepy_scedc_tutorial.html) or [Jupyter notebook](https://github.com/noisepy/NoisePy/blob/main/tutorials/noisepy_scedc_tutorial.ipynb) and can be
A short tutorial on how to use NoisePy is available as a [web page](https://noisepy.github.io/NoisePy/noisepy_scedc_tutorial.html) or [Jupyter notebook](https://github.com/noisepy/NoisePy/blob/main/tutorials/noisepy_scedc_tutorial.ipynb) and can be
[run directly in Colab](https://colab.research.google.com/github/noisepy/NoisePy/blob/main/tutorials/noisepy_scedc_tutorial.ipynb).


This tutorial presents one simple example of how NoisePy might work! We strongly encourage you to download the NoisePy package and play it on your own! If you have any comments and/or suggestions during running the codes, please do not hesitate to contact us through email or open an issue in this github page!
This tutorial presents one simple example of how NoisePy might work. We strongly encourage you to download the NoisePy package and play with it on your own! If you have any comments and/or suggestions while running the code, please do not hesitate to contact us by email or open an issue on this GitHub page!

Chengxin Jiang (chengxinjiang@gmail.com)
Marine Denolle (mdenolle@uw.edu).
Marine Denolle (mdenolle@uw.edu)
Yiyu Ni (niyiyu@uw.edu)

## Taxonomy
Taxonomy of the NoisePy variables (a configuration sketch follows the list).

* ``station`` refers to the site that has the seismic instruments that record ground shaking.
* `` channel`` refers to the direction of ground motion investigated for 3 component seismometers. For DAS project, it may refers to the single channel sensors.
* ``channel`` refers to the direction of ground motion investigated for three-component seismometers. For DAS projects, it may refer to the individual channel sensors.
* ``ista`` is the index name for looping over stations

* ``cc_len`` correlation length, basic window length in seconds
* ``step`` is the amount, in seconds, by which the sliding window advances (the portion skipped between consecutive windows)
* ``smooth_N`` number of points for smoothing the time- or frequency-domain discrete arrays.
* ``maxlag`` maximum lag length in seconds saved in files on each side of the correlation (to save on storage)
* ``substack, substack_len`` boolean flag and window length in seconds over which to substack the correlation (to save storage or for monitoring); it has to be a multiple of ``cc_len``.
* ``time_chunk, nchunk`` refers to the time unit that defines a single job. For instance, ``cc_len`` is the correlation length (e.g., 1 hour, 30 min), while the overall duration of the experiment is the total length (1 month, 1 year, ...). The time chunk could be 1 day: the code would loop through each cc_len window in a for loop, but each day will be sent as a thread.
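
A configuration sketch mapping the taxonomy onto the ``ConfigParameters`` object (values are illustrative, not recommended defaults; the ``maxlag`` and ``step`` attribute names are assumed to match the variable names above):
```python
from noisepy.seis.io.datatypes import ConfigParameters

config = ConfigParameters()
config.samp_freq = 20.0       # Hz, target sampling rate
config.cc_len = 1800          # s, basic correlation window length
config.step = 450             # s, sliding-window step
config.maxlag = 200           # s, lag saved on each side (attribute name assumed)
config.substack = True
config.substack_len = 2 * config.cc_len  # must be a multiple of cc_len
config.inc_hours = 24         # time chunk per job; 0 keeps DAS data as one segment
```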


# Acknowledgements

## Contributing

Thanks to our contributors so far!

[![Contributors](https://contrib.rocks/image?repo=noisepy/NoisePy)](https://github.com/noisepy/NoisePy/graphs/contributors)
@@ -127,5 +114,4 @@ Algorithms used:

* (optimal stacking) Yang X, Bryan J, Okubo K, Jiang C, Clements T, Denolle MA. [Optimal stacking of noise cross-correlation functions](https://doi.org/10.1093/gji/ggac410) _Geophysical Journal International_. 2023 Mar;232(3):1600-18. https://doi.org/10.1093/gji/ggac410


This research received software engineering support from the University of Washington’s Scientific Software Engineering Center ([SSEC](https://escience.washington.edu/software-engineering/ssec/)) supported by Schmidt Futures, as part of the Virtual Institute for Scientific Software (VISS). We would like to acknowledge [Carlos Garcia Jurado Suarez](https://github.com/carlosgjs) and [Nicholas Rich](https://github.com/nrich20) for their collaboration and contributions to the software.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -50,7 +50,7 @@ dependencies = [
"PyYAML==6.0",
"pydantic-yaml==1.0",
"psutil>=5.9.5,<6.0.0",
"noisepy-seis-io>=0.1.13",
"noisepy-seis-io>=0.1.14",
"scipy==1.12.0"
]

File renamed without changes.
9 changes: 7 additions & 2 deletions src/noisepy/seis/noise_module.py
@@ -247,13 +247,18 @@ def cut_trace_make_stat(fc_para: ConfigParameters, ch_data: ChannelData):
dataS_t = []
dataS = []

# useful parameters for trace sliding
nseg = int(np.floor((fc_para.inc_hours / 24 * 86400 - fc_para.cc_len) / fc_para.step))
sps = int(ch_data.sampling_rate)
starttime = ch_data.start_timestamp
# copy data into array
data = ch_data.data

if fc_para.inc_hours == 0:
# specifically for DAS data, set the inc_hours to 0
nseg = 1
else:
# useful parameters for trace sliding
nseg = int(np.floor((fc_para.inc_hours / 24 * 86400 - fc_para.cc_len) / fc_para.step))

# if the data is shorter than the time chunk, return zero values
if data.size < sps * fc_para.inc_hours * 3600:
logger.warning(
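
As a worked check of the ``nseg`` formula in this hunk (parameter values are illustrative, not defaults):
```python
import numpy as np

inc_hours, cc_len, step = 24, 1800, 450   # hours, seconds, seconds
nseg = int(np.floor((inc_hours / 24 * 86400 - cc_len) / step))
print(nseg)  # 188 sliding windows per 24-hour chunk
# With inc_hours == 0 (DAS data), the new branch sets nseg = 1,
# treating the whole trace as a single segment.
```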
4 changes: 3 additions & 1 deletion tests/test_cross_correlation.py
@@ -80,11 +80,13 @@ def get_inventory(self, timespan: DateTimeRange, station: Station) -> obspy.Inve
@pytest.mark.parametrize("cc_method", [CCMethod.XCORR, CCMethod.COHERENCY, CCMethod.DECONV])
@pytest.mark.parametrize("substack", [True, False])
@pytest.mark.parametrize("substack_len", [1, 2])
def test_correlation(rm_resp: RmResp, cc_method: CCMethod, substack: bool, substack_len: int):
@pytest.mark.parametrize("inc_hours", [0, 24])
def test_correlation(rm_resp: RmResp, cc_method: CCMethod, substack: bool, substack_len: int, inc_hours: int):
config = ConfigParameters()
config.samp_freq = 1.0
config.rm_resp = rm_resp
config.cc_method = cc_method
config.inc_hours = inc_hours
if substack:
config.substack = substack
config.substack_len = substack_len * config.cc_len
54 changes: 40 additions & 14 deletions tutorials/noisepy_compositestore_tutorial.ipynb
@@ -39,11 +39,6 @@
"__Warning__: NoisePy uses ```obspy``` as a core Python module to manipulate seismic data. If you use Google Colab, restart the runtime now for proper installation of ```obspy``` on Colab."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {
@@ -67,12 +62,14 @@
"%autoreload 2\n",
"from noisepy.seis import cross_correlate, stack_cross_correlations, __version__ # noisepy core functions\n",
"from noisepy.seis.io.asdfstore import ASDFCCStore, ASDFStackStore # Object to store ASDF data within noisepy\n",
"from noisepy.seis.io.numpystore import NumpyCCStore, NumpyStackStore\n",
"from noisepy.seis.io.compositerawstore import CompositeRawStore\n",
"from noisepy.seis.io.s3store import SCEDCS3DataStore, NCEDCS3DataStore\n",
"from noisepy.seis.io.channel_filter_store import channel_filter\n",
"from noisepy.seis.io.datatypes import CCMethod, ConfigParameters, FreqNorm, RmResp, StackMethod, TimeNorm # Main configuration object\n",
"from noisepy.seis.io.datatypes import Channel, CCMethod, ConfigParameters, FreqNorm, RmResp, StackMethod, TimeNorm # Main configuration object\n",
"from noisepy.seis.io.channelcatalog import XMLStationChannelCatalog # Required stationXML handling object\n",
"import os\n",
"import obspy\n",
"import shutil\n",
"from datetime import datetime, timezone\n",
"from datetimerange import DateTimeRange\n",
Expand Down Expand Up @@ -141,7 +138,7 @@
"outputs": [],
"source": [
"# Initialize ambient noise workflow configuration\n",
"config = ConfigParameters() # default config parameters which can be customized\n"
"config = ConfigParameters() # default config parameters which can be customized"
]
},
{
@@ -272,14 +269,17 @@
" storage_options=S3_STORAGE_OPTIONS)\n",
"\n",
"\n",
"scedc_store = SCEDCS3DataStore(SCEDC_DATA, scedc_catalog, channel_filter([\"CI\"], scedc_stations, [\"BHE\", \"BHN\", \"BHZ\"]), \n",
"scedc_store = SCEDCS3DataStore(SCEDC_DATA, scedc_catalog, channel_filter([\"CI\"], scedc_stations, \n",
" [\"BHE\", \"BHN\", \"BHZ\", \"HHE\", \"HHN\", \"HHZ\"]), \n",
" timerange, storage_options=S3_STORAGE_OPTIONS)\n",
"ncedc_store = NCEDCS3DataStore(NCEDC_DATA, ncedc_catalog, channel_filter([\"NC\"], ncedc_stations, [\"HHE\", \"HHN\", \"HHZ\"]), \n",
"\n",
"ncedc_store = NCEDCS3DataStore(NCEDC_DATA, ncedc_catalog, channel_filter([\"NC\"], ncedc_stations, \n",
" [\"BHE\", \"BHN\", \"BHZ\", \"HHE\", \"HHN\", \"HHZ\"]), \n",
" timerange, storage_options=S3_STORAGE_OPTIONS)\n",
"\n",
"raw_store = CompositeRawStore({\"CI\": scedc_store, \n",
" \"NC\": ncedc_store}) # Composite store for reading data from both SCEDC and NCEDC\n",
"cc_store = ASDFCCStore(cc_data_path) # Store for writing CC data"
"cc_store = ASDFCCStore(cc_data_path) # Store for writing CC data"
]
},
{
@@ -323,8 +323,27 @@
},
"source": [
"## Perform the cross correlation\n",
"The data will be pulled from SCEDC, cross correlated, and stored locally if this notebook is ran locally.\n",
"If you are re-calculating, we recommend to clear the old ``cc_store``."
"The data will be pulled from SCEDC & NCEDC, cross correlated, and stored locally if this notebook is ran locally. If you are re-calculating, we recommend to clear the old ``cc_store``."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# we also define a channel pair filter that limits cross-correlation to stations pairs closer than 600 km\n",
"def pair_filter(src: Channel, rec: Channel) -> bool:\n",
" latS = src.station.lat\n",
" lonS = src.station.lon\n",
" latR = rec.station.lat\n",
" lonR = rec.station.lon\n",
" dist, _, _ = obspy.geodetics.base.gps2dist_azimuth(latS, lonS, latR, lonR)\n",
" dist /= 1e3 # to km\n",
" if dist <= 600:\n",
" return True\n",
" else:\n",
" return False"
]
},
{
Expand All @@ -335,7 +354,7 @@
},
"outputs": [],
"source": [
"cross_correlate(raw_store, config, cc_store)"
"cross_correlate(raw_store, config, cc_store, pair_filter=pair_filter)"
]
},
{
@@ -461,6 +480,13 @@
"plot_all_moveout(sta_stacks, 'Allstack_linear', 0.1, 0.2, 'ZZ', 1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
@@ -493,7 +519,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.10.14"
}
},
"nbformat": 4,
