Description
When working on a reasonably large dataset (a 7 TiB Zarr store), I noticed that diversity calculations, in particular windowed ones, generate a lot of unmanaged memory. The call_genotype array is 280 GiB stored and 7.2 TiB uncompressed, covering roughly 3 billion sites and 1000 samples. At the end of a run there is almost 300 GiB of unmanaged memory, which rules out 256 GB and possibly 512 GB memory nodes (I've been testing with a dask LocalCluster). Maybe this is more an issue with dask, but I thought I'd post it here in case there is something that can be done in the underlying implementation to free up memory.
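For reference, a minimal sketch of the dask-side workaround I've been trying, based on dask's "tackling unmanaged memory" guidance: force a garbage collection and ask glibc to return freed arenas to the OS on each worker. The `compute_windowed_diversity` call is a hypothetical placeholder for the actual sgkit computation, and the cluster sizes are just examples; this shrinks the unmanaged bar in the dashboard for me but obviously doesn't address the root cause.

```python
import ctypes
import gc

from dask.distributed import Client, LocalCluster


def trim_memory() -> int:
    """Ask glibc to return freed heap memory to the OS (Linux/glibc only)."""
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)


if __name__ == "__main__":
    cluster = LocalCluster(n_workers=16, memory_limit="16GiB")
    client = Client(cluster)

    # ... run the windowed diversity computation here, e.g.
    # result = compute_windowed_diversity(ds)  # hypothetical placeholder

    # After (or periodically during) the run, collect and trim on every worker.
    client.run(gc.collect)
    client.run(trim_memory)
```

Setting `MALLOC_TRIM_THRESHOLD_` to a small value in the worker environment before spawning the cluster (as suggested in the dask docs) also seems to help somewhat, but I still end up with hundreds of GiB unmanaged by the end of a run.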