Hydromosaic-db

The hydromosaic database stores metadata about modeled stream flow, stream temperature, and oxygen content. The data itself lives in netCDF files, each of which contains multiple timeseries simulated for different stream outlets. The database lets a data server locate and stream the data available for each stream outlet.

It has no geographical data on the locations or upstream/downstream relationships of stream outlets; geography and mapping are handled by a separate database.

Initializing a new database

Database design is handled with SQLAlchemy and Alembic. To initialize or upgrade a database:

1. Install the ORM with poetry install
2. Edit the sqlalchemy.url line in alembic.ini so that it holds a connection string for the database you wish to update
3. Run poetry run alembic upgrade head to initialize or upgrade the database
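
If you would rather script the alembic.ini edit than do it by hand, the sketch below rewrites the sqlalchemy.url line in place and leaves comments and other settings untouched. The function name set_sqlalchemy_url is hypothetical and not part of this repository. It assumes the key sits on a single uncommented line, and it doubles any percent signs on the assumption that Alembic applies configparser-style % interpolation when reading the file.

```python
from pathlib import Path


def set_sqlalchemy_url(ini_path, url):
    """Rewrite the sqlalchemy.url line of an alembic.ini file in place.

    Returns True if a line was replaced, False if no such line was found.
    Hypothetical helper, not part of hydromosaic-db.
    """
    path = Path(ini_path)
    # Assumption: the ini file is read with %-interpolation, so a literal
    # percent sign (e.g. from a URL-encoded password) must be doubled.
    escaped = url.replace("%", "%%")
    lines = path.read_text().splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Compare only the key, so commented-out examples are left alone.
        key = line.split("=", 1)[0].strip()
        if key == "sqlalchemy.url":
            lines[i] = f"sqlalchemy.url = {escaped}"
            replaced = True
    path.write_text("\n".join(lines) + "\n")
    return replaced
```

After running it, poetry run alembic upgrade head picks up the new connection string as usual.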

Indexing netCDF files into the database

The indexing script is installed by poetry. It accepts a directory and a database connection string as arguments, and attempts to index every netCDF file in the directory into the database. To run the indexing script:

poetry run index_directory -d postgresql://user:password@server:port/hydromosaic /path/to/data/directory/
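
Passwords that contain characters such as @ or / will break a connection string written by hand, because those characters are URL delimiters. The following sketch assembles the string with the credentials percent-encoded. The function name build_connection_string is hypothetical, and the default database name is simply taken from the example command above.

```python
from urllib.parse import quote


def build_connection_string(user, password, host, port, database="hydromosaic"):
    """Assemble a postgresql:// URL, percent-encoding the credentials.

    Hypothetical helper, not part of hydromosaic-db.
    """
    # safe="" also encodes "/" and "@", which would otherwise be read as
    # URL delimiters if they appear in a user name or password.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # The result can be passed to the indexing script's -d argument.
    print(build_connection_string("user", "p@ss/word", "server", 5432))
```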

Indexing pitfalls

The indexing script skips any file already in the database. If you add data to or remove data from a file, its database entry will not be updated unless you first delete that entry from the database.

The indexing script identifies a file by its full path. If you rename or move a file and index it in its new location, its data will be duplicated in the database unless you delete the old file entry first.
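
Both pitfalls follow from matching files on path alone. The sketch below illustrates that behaviour; it is not the script's actual implementation. The function plan_indexing, the .nc extension filter, and the indexed_paths argument (standing in for a database query) are all assumptions made for illustration.

```python
from pathlib import Path


def plan_indexing(directory, indexed_paths):
    """Split the netCDF files in a directory into (new, skipped) lists.

    indexed_paths: full paths already recorded in the database
    (hypothetical input; the real script would query the database).
    """
    known = {str(Path(p).resolve()) for p in indexed_paths}
    new, skipped = [], []
    # Assumption: netCDF files are recognised by a .nc extension.
    for f in sorted(Path(directory).glob("*.nc")):
        full = str(f.resolve())
        # The only question asked is "has this path been seen?", so edited
        # contents go unnoticed and a moved file looks brand new.
        (skipped if full in known else new).append(full)
    return new, skipped
```

A file whose contents changed still lands in skipped, and a renamed file lands in new while its old entry remains in the database.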

Expected file format

Timeseries netCDFs are expected to have the following format:

- time and nbasins dimensions. nbasins should have cardinality equal to the number of sites or outlets in the file.
- a basin_name variable with the single dimension nbasins. It should provide a string code that uniquely identifies each outlet.
- one or more variables with dimensions nbasins and time. These should conform to CF Conventions for variable metadata, including units and long_name.
- global metadata in accordance with the PCIC Metadata Standards. The attributes the database requires identify the model and the emissions scenario:
  - downscaling_GCM_institute_id
  - downscaling_GCM_model_id
  - downscaling_GCM_experiment_id
  - downscaling_GCM_experiment

The downscaling_GCM_ prefix is used for data generated by hydrological models forced by downscaled GCM data. For data with a different history, such as hydrological models forced by gridded observation data, an alternate prefix may be supplied via the -p argument.
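
Before indexing, it can help to check a file against the layout described above. The sketch below works on plain Python containers so that it carries no dependencies. The function check_timeseries_layout and its arguments are hypothetical, and it checks only names and dimensions, not CF compliance or the rest of the PCIC metadata.

```python
REQUIRED_SUFFIXES = ("institute_id", "model_id", "experiment_id", "experiment")


def check_timeseries_layout(dimensions, variables, global_attrs,
                            prefix="downscaling_GCM_"):
    """Return a list of problems; an empty list means the layout looks right.

    dimensions:   iterable of dimension names
    variables:    mapping of variable name -> sequence of its dimension names
    global_attrs: iterable of global attribute names
    prefix:       attribute prefix, matching the indexer's -p argument
    """
    problems = []
    for dim in ("time", "nbasins"):
        if dim not in dimensions:
            problems.append(f"missing dimension: {dim}")
    if tuple(variables.get("basin_name", ())) != ("nbasins",):
        problems.append("basin_name must have the single dimension nbasins")
    # At least one data variable must span both nbasins and time.
    has_data = any(set(dims) == {"nbasins", "time"}
                   for dims in variables.values())
    if not has_data:
        problems.append("no variable with dimensions nbasins and time")
    for suffix in REQUIRED_SUFFIXES:
        if prefix + suffix not in global_attrs:
            problems.append(f"missing global attribute: {prefix + suffix}")
    return problems
```

With the netCDF4 package (an assumption; any netCDF reader would do), the three inputs can be read from an open Dataset ds as ds.dimensions, {name: var.dimensions for name, var in ds.variables.items()}, and ds.ncattrs().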
