computeMatrix runs substantially slower than expected

Hi there, thank you for your work on deeptools!

I have multiple colleagues who use the `computeMatrix` (reference-point) and `plotHeatmap` tools extensively in their work.
However, I'm noticing that `computeMatrix` takes substantially longer than I would expect given what the program is doing. For a typical bigWig file (~1 GB) and a regions BED file containing ~100,000 RefSeq transcripts, `computeMatrix` takes well over three hours with a `binSize` of 1. Even with the default bin size, it can take nearly an hour. Even single-threaded, I would expect a program to query 150,000 regions from a bigWig in a few minutes, not hours.

As a test, I created a toy script that implements basic `computeMatrix reference-point` functionality. In the benchmark below, it runs roughly 67-fold faster in wall-clock time than the equivalent `computeMatrix` command (about 1.8 s versus 2 min 4 s), while producing matching output. I've tested this on a variety of real and simulated bigWigs. For instance, here's how the runtime of my script compares to `computeMatrix` with a randomly generated bigWig:
```
Testing deeptools computeMatrix reference-point against Will's ~200 line script
+ date
Mon Mar  3 08:05:56 PM EST 2025
+ : 'Testing Will'\''s script'
+ ./compute_matrix_faster.py input/random.bigWig input/random_genes.bed
computing matrix

real	0m1.841s
user	0m2.906s
sys	0m0.138s
+ : 'Testing deeptools computeMatrix'
+ deeptools --version
deeptools 3.5.6
++ basename output/deeptools.mat.gz .mat.gz
+ OUT_MAT_BS1=output/deeptools.bs1.mat.gz
+ computeMatrix reference-point -bs 1 -S input/random.bigWig -o output/deeptools.bs1.mat.gz -R input/random_genes.bed
using local install of computeMatrix

real	2m4.223s
user	2m3.890s
sys	0m0.240s
+ set +x
Confirming that matrices are the same...
SUCCESS! :)
```

The test files and [my script](https://github.com/wsowens/computeMatrixBench/blob/master/compute_matrix_faster.py) are found in this repo: [https://github.com/wsowens/computeMatrixBench](https://github.com/wsowens/computeMatrixBench)

I realize my script doesn't implement all of the functionality of `computeMatrix`, but the discrepancy in performance is surprising to me. Given the popularity of deeptools, it seems worthwhile to try to optimize this particular command further, and I am happy to help in any way I can.

Do you have any idea why `computeMatrix` runs so much more slowly than the script I've shared, and do you have any thoughts on how to improve it?
