Switch to HDF5 based storage of intermediate data types. #34

Open
cerebis opened this issue Dec 15, 2020 · 0 comments
Labels
enhancement New feature or request

cerebis (Owner) commented Dec 15, 2020

Currently, intermediate data is stored by simply compressing pickled Python class instances.

This approach was chosen over other serialisation methods as a quick, good-enough solution. However, as time passes and the codebase evolves, the dependency of existing serialised instances on a particular class version becomes increasingly problematic. It can prevent users from going back to old data and reanalysing it with a newer version of the software, since the old class can no longer be deserialised.

Either we must provide conversions between class versions or, better, avoid the problem entirely.

Therefore, bin3C should switch to a class-agnostic and efficient means of storing intermediate analysis results (contact maps, clusterings). Though we could pickle plain datatypes, an obvious candidate is HDF5, although it would introduce a chunk of dependencies of its own. Another alternative is to adopt an existing Hi-C HDF5 format, so long as it does not itself include external class implementation details or extraneous fields not relevant to metagenomics.
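
As a rough illustration of the "pickle plain datatypes" option, here is a minimal sketch using only the standard library. The `ContactMap` class, its fields, and the `format_version` key are hypothetical stand-ins, not bin3C's actual classes or layout. The point is only that the file contains built-in types, so loading it never requires importing a particular class version. An HDF5 layout would presumably follow the same idea, with the lists stored as datasets.

```python
import gzip
import pickle


class ContactMap:
    """Hypothetical stand-in for an analysis class (not bin3C's real class)."""

    def __init__(self, seq_names, rows, cols, counts):
        self.seq_names = seq_names  # sequence identifiers
        self.rows = rows            # row indices of non-zero contacts
        self.cols = cols            # column indices of non-zero contacts
        self.counts = counts        # contact counts at (row, col)


def to_plain(cm):
    """Reduce the object to built-in types only (dict, list, str, int)."""
    return {
        "format_version": 1,  # assumed field, lets a reader detect layout changes
        "seq_names": [str(n) for n in cm.seq_names],
        "rows": [int(i) for i in cm.rows],
        "cols": [int(j) for j in cm.cols],
        "counts": [int(c) for c in cm.counts],
    }


def save_plain(cm, path):
    """Write the plain representation as a gzip-compressed pickle."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(to_plain(cm), fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_plain(path):
    """Read the plain dict back; no analysis class needs to be importable."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)
```

Because the stored object is a plain dict, a later version of the software can rebuild whatever class it currently uses from these fields, or ignore fields it no longer needs.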

@cerebis cerebis added the enhancement New feature or request label Dec 15, 2020