bigwigmerge will create regions beyond chrom map #78
Hmm, it's weird. Merging should be done at the value level, so it's weird that the value here looks so cleanly split. I'll try to get to this and take a look, but I have some personal issues going on, so it might be delayed some.
I'm really not sure without digging into it.
@donaldcampbelljr can you share the exact CLI command used to reproduce? I just tested (with v0.5.4 and master) and couldn't reproduce:
@jackh726 Yes! It looks as though I only see the error when I input just one of the bigwig files to the merge (and that bigwig file is the "bad" one)! Command:
I confirmed that inputting both the good and the bad bigwig files together causes the merge to succeed. Inputting only the good bigwig file is also successful. The reason I was testing a single bigwig file for merging is that we currently process in parallel by chromosome and then merge our bigwigs at the end. There could be a case where only one chromosome is processed, and thus, downstream, we would try to merge a single bw file.
Actually, I just realized that this was occurring for multiple chromosomes in the original issue, so my suggested temporary fix would not work.
Ah... so there is for sure a bug in the code (causing data past the end of the chromosome), but the input data is difficult because anything less than …
Essentially, the issue arises because, to merge bigwigs, bigtools reads in the data for each file in 50kb chunks, then goes through and converts this back into the values needed for the output. So, if you have values like [ …
Besides that, the actual issue here is that when two values compare equal (difference less than epsilon), bigtools just adds +1 to the length of the value to be added. "Normally" (but not always!), you won't get a difference of zero at the end of the chromosome, because bigwigs usually don't store zero values and the data off the end of the chrom is zero. Of course, with the current setup of checking diff < epsilon, we'll extend the final value past the end of the chrom either if that happens or if the final value is zero. So, I'm doing three things:
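A minimal sketch of the coalescing step described above, with an end-of-chromosome guard added. It assumes the merged per-base values for one chromosome are already in a single slice (bigtools itself works in 50kb chunks, which this sketch ignores). `Interval` and `coalesce` are hypothetical names, not bigtools' actual API.

```rust
/// Hypothetical bedGraph-style interval [start, end) with a value.
/// Not bigtools' own value type; for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Interval {
    start: u32,
    end: u32,
    value: f32,
}

/// Turn per-base merged values into run-length intervals.
/// Adjacent values whose difference is below `epsilon` are treated as equal
/// and extend the current run by one base. Positions at or beyond
/// `chrom_len` are never visited, so no run can end past the chromosome.
/// Zero-valued runs are dropped, since bigWigs usually don't store zeros.
fn coalesce(values: &[f32], chrom_len: u32, epsilon: f32) -> Vec<Interval> {
    let mut out: Vec<Interval> = Vec::new();
    // Cap the scan at the chromosome length before any run is extended.
    let limit = values.len().min(chrom_len as usize);
    for (pos, &v) in values[..limit].iter().enumerate() {
        let pos = pos as u32;
        match out.last_mut() {
            // "Equal" within epsilon: grow the current run by one base.
            Some(last) if (last.value - v).abs() < epsilon => last.end += 1,
            // Otherwise start a new one-base run at this position.
            _ => out.push(Interval { start: pos, end: pos + 1, value: v }),
        }
    }
    // Drop runs that are (approximately) zero.
    out.retain(|iv| iv.value.abs() >= epsilon);
    out
}
```

Because positions are capped at `chrom_len` before runs are extended, neither a trailing zero run nor an epsilon-equal tail can produce an interval that ends past the chromosome.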
Fixed in 529d9ed. I'll make a release this weekend.
Excellent! Thank you for the detailed explanation of your findings and the quick fix.
Published v0.5.5 which fixes this.
Hey @jackh726,
I've run into an interesting issue while troubleshooting: databio/gtars#74
We are currently processing a BAM file into a bigwig per chromosome and then merging these outputs into a final bigwig.
There are a couple of levers we can pull that affect the regions and values passed to the bigwig creation: one is a smoothing parameter, which changes the length of the regions, and one is a scaling factor, which changes how the intermediate bedGraph values are scaled. I'm finding that, depending on a combination of the two, this can lead to the `MergingValues` returned from `get_merged_values` containing regions that extend beyond the chromosome size, which then causes a failure when writing the final bigWig. I believe that it is happening here:
bigtools/bigtools/src/utils/cli/bigwigmerge.rs, lines 327 to 337 in fee242d
Here is a link to a "good" and "bad" bw:
https://myuva-my.sharepoint.com/:u:/g/personal/zzz3fh_virginia_edu/EW51mMiOj1pFuaVIr2abeq0Bbpjkg8aMaJGrleAzwaM_uA?e=4bq5Rp
For reference, the chromsizes file used for the map:
The good bw has values that end:
and whose last merged values are within range:
The bad bw:
and its merged values are outside of the range:
I'm wondering if this issue is happening within the `get_intervals_move` function? Is it not respecting the `size` parameter based on the `chrom_map`?
I manually tested the bigwigs using the CLI bigwigmerge tool, in addition to what we have implemented in gtars, to ensure I could reproduce this failure outside of our own implementation.
Where we implement this in gtars:
https://github.com/databio/gtars/blob/5ce8434b5c03a74262df1c7ccdf70f335bae7844/gtars/src/uniwig/mod.rs#L896-L944
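As a caller-side stopgap, the merged intervals could be checked against the chrom sizes before anything is written, so a bad region fails with a readable message instead of deep inside the writer. A minimal sketch; `check_within_chrom_map` and the `(chrom, start, end)` tuple layout are hypothetical and not part of bigtools or gtars.

```rust
use std::collections::HashMap;

/// Hypothetical pre-write check: every (chrom, start, end) interval must
/// satisfy start < end <= chromosome size from `chrom_map`.
/// Returns an error describing the first offending interval.
fn check_within_chrom_map(
    intervals: &[(String, u32, u32)],
    chrom_map: &HashMap<String, u32>,
) -> Result<(), String> {
    for (chrom, start, end) in intervals {
        // Unknown chromosomes are also rejected.
        let size = chrom_map
            .get(chrom)
            .ok_or_else(|| format!("chromosome {chrom} not in chrom map"))?;
        // Empty/inverted intervals or ends past the chromosome are errors.
        if start >= end || *end > *size {
            return Err(format!("{chrom}:{start}-{end} exceeds chrom size {size}"));
        }
    }
    Ok(())
}
```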