-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gtars uniwig trying to create coordinates larger than chr #74
Comments
Thank you @ljmills for posting and for sending some example files. I am unable to get the exact same issue as above. I am seeing, what I believe to be, a separate issue in the problem files you sent over. For Emulating gtars command from PEPATAC: I do not get a completion. The program simply hangs during bigwig creation. I'm seeing the intermediate lines being written as such:
The counts are not extending past the chrom size in this particular example (123556469). However, I noticed that the 4th column's numbers are very small and that if I simply change the Again, I'm having difficulties reproducing the exact same error. If the files are hanging during bigwig creation, a temp solution could be to manipulate the If one lowers the scaling factor by a few orders of magnitude, it will allow the bigwig conversion to complete. |
The PEPATAC pipeline is setting the bascale number based on the data so I
don't want to change it manually. You should be able to download the full
bam file and bai using the links below. If you still can't recreate
the error then I will ask the PEPATAC people about the bamscale value.
Also I am not getting a hang on my end most of the chromosomes process fine
only a couple for each sample will fail and they aren't consistent across
the sample. Chr 1 failed in this sample but not all samples.
Thanks,
-Lauren
full bam file:
https://drive.google.com/file/d/1_OiOP3ADLcX7LpTWGI-UJfxUEAEBd7_h/view?usp=sharing
full bai file:
https://drive.google.com/file/d/1BILsI5ltkY1XDGGlpRjGiO4k20PGkHq5/view?usp=sharing
…On Mon, Jan 27, 2025 at 1:12 PM Donald Campbell ***@***.***> wrote:
Thank you @ljmills <https://github.com/ljmills> for posting and for
sending some example files.
I am unable to get the exact same issue as above. I am seeing, what I
believe to be, a separate issue in the problem files you sent over.
For testChr1.bam where chrom.sizes for chr1 are such:
chr1 123556469
Emulating gtars command from PEPATAC:
./target/release/gtars uniwig --file
"/home/drc/Downloads/gtars_74_issue/input/testchr1.bam" -c
/home/drc/Downloads/gtars_74_issue/input/canFam4Y.chrom.sizes -m 25 -s 1 -t
bam -y bw -p 6 -u shift --bamscale 100227120.0 -l
/home/drc/Downloads/gtars_74_issue/output/test
I do not get a completion. *The program simply hangs during bigwig
creation.*
I'm seeing the intermediate lines being written as such:
chr1 0 123554979 0
chr1 123554979 123555028 0.00000000997734
chr1 123555028 123555030 0.00000001995468
chr1 123555030 123555051 0.00000000997734
chr1 123555051 123555079 0.00000001995468
chr1 123555079 123555102 0.00000000997734
chr1 123555102 123555120 0
chr1 123555120 123555122 0.00000000997734
chr1 123555122 123555163 0.00000001995468
chr1 123555163 123555171 0.000000029932018
chr1 123555171 123555173 0.00000001995468
chr1 123555173 123555214 0.00000000997734
chr1 123555214 123555291 0
chr1 123555291 123555342 0.00000000997734
chr1 123555342 123555394 0
chr1 123555394 123555428 0.00000001995468
chr1 123555428 123555432 0.000000029932018
chr1 123555432 123555445 0.00000003990936
chr1 123555445 123555479 0.00000001995468
chr1 123555479 123555483 0.00000000997734
chr1 123555483 123555529 0
chr1 123555529 123555549 0.00000000997734
chr1 123555549 123555550 0.00000001995468
chr1 123555550 123555573 0.000000029932018
chr1 123555573 123555580 0.00000003990936
chr1 123555580 123555600 0.000000029932018
chr1 123555600 123555601 0.00000001995468
chr1 123555601 123555675 0.00000000997734
chr1 123555675 123555700 0
chr1 123555700 123555727 0.00000000997734
chr1 123555727 123555751 0.00000001995468
chr1 123555751 123555778 0.00000000997734
chr1 123555778 123555824 0
chr1 123555824 123555853 0.00000000997734
chr1 123555853 123555875 0.00000001995468
chr1 123555875 123555904 0.00000000997734
chr1 123555904 123555996 0
chr1 123555996 123556047 0.00000000997734
chr1 123556047 123556111 0
chr1 123556111 123556135 0.00000000997734
chr1 123556135 123556162 0.00000001995468
chr1 123556162 123556186 0.00000000997734
chr1 123556186 123556341 0
chr1 123556341 123556392 0.00000000997734
chr1 123556392 123556428 0
chr1 123556428 123556429 0.00000001995468
The counts are not extending past the chrom size in this particular
example (123556469).
However, I noticed that the 4th column's numbers are very small and that
if I simply change the --bamscale number to something smaller, the
process will no longer hang and will complete just fine.
Again, I'm having difficulties reproducing the exact same error.
If the files are hanging during bigwig creation, a temp solution could be
to manipulate the pepatac.py lines (this occurs in a few places! - 4
total) :
https://github.com/databio/pepatac/blob/0d1aad08910aa3e7be1f410071a5c4e595884272/pipelines/pepatac.py#L1551
If one lowers the scaling factor by a few orders of magnitude, it will
allow the bigwig conversion to complete.
—
Reply to this email directly, view it on GitHub
<#74 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF3TUCYOKFTIKJNTYW2XGRD2M2AIJAVCNFSM6AAAAABV6IWVN6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMMJWGY3TMNJSGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Excellent. Thank you for providing these. I was able to reproduce the issue. It is related to the smoothing argument This may be a regression as I thought we had fixed this in the past. I will need to investigate and issue a new gtars release, etc. Some suggested solutions for the near future:
|
Thanks Donald. I think I will edit my pepatac.py for now and I look forward
to the update to gtars/pepatac.
…-L
On Tue, Jan 28, 2025 at 11:39 AM Donald Campbell ***@***.***> wrote:
Excellent. Thank you for providing these.
I was able to reproduce the issue. It is related to the smoothing argument -m
25 in the pipeline. It seems as though anything greater than 6 will cause
upper bound of the smoothing to be beyond the chromosome size (for Chr1 in
the above example).
This may be a regression as I thought we had fixed this in the past. I
will need to investigate and issue a new gtars release, etc.
Some suggested solutions for the near future:
1.
Wait for next gtars release. This will probably be within the next
week or so.
2.
Regarding the PEPATAC pipeline (which I also maintain), I believe the
produced bigwig files are not used downstream, *only* the BED file.
Therefore, a temporary solution would be to change the smoothing factor in
your local PEPATAC for the produced bigwig:
https://github.com/databio/pepatac/blob/0d1aad08910aa3e7be1f410071a5c4e595884272/pipelines/pepatac.py#L1541
3.
You can pull and use the previous version of PEPATAC:
https://github.com/databio/pepatac/releases/tag/v0.11.3 It does not
use gtars uniwig and instead relies on a python implementation (it is
slower).
—
Reply to this email directly, view it on GitHub
<#74 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF3TUC6PMKVCE6ZDSDP2BKD2M66E7AVCNFSM6AAAAABV6IWVN6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMMJZGY3DENBXGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Some troubleshooting notes:I confirmed that the bedGraph lines being reported are not extended beyond the chromosome during uniwig counting, regardless of the smooth size (-m 25 in the below example).
But we still get an error when writing the merged bigwig file. Checking the intermediate bigwig file (before merging), I used bigTools to convert it to a bedGraph and see that the bedGraph also does not contain regions beyond the chromsize:
The final step before it errors is is the merging of any bigwigs: Lines 912 to 944 in 5ce8434
It is not clear why this step would be affected by the smooth_size parameter nor why it could change the merged bigwig in such a way that would cause a region to extend over the chrom_sizes. |
Ok, I believe I've figured out where this is happening. It is during the merging step when all of the individual chromosome bigwigs are merged. Due to a combination of region length and associated values, the merging step can produce a value that extends beyond the chromosome length and then fail. I will submit a bug for discussion in the Bigtools. Some notes for reference:Individually produced bw file and then manually attempt to merge using bigtools bigwigmerge large bamscale, smooth = 25, fails bigwigmerge
small bamscale, smooth = 25, passes bigwigmerge
large bamscale, smooth = 5, passes bigwigmerge
Checking output after merge call ( large bamscale, smooth = 25, fails bigwigmerge
small bamscale, smooth = 25, passes bigwigmerge
So, depending on the region length and the values, the merging step can cause a start and end to extend beyond the chrom sizes in the provided chrom map. Happening here: Line 915 in 5ce8434
|
This should be fixed with the next release of |
Awesome, thanks for the heads up.
…-L
On Mon, Feb 10, 2025 at 2:42 PM Donald Campbell ***@***.***> wrote:
This should be fixed with the next release of bigtools and the subsequent
point release of gtars.
—
Reply to this email directly, view it on GitHub
<#74 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF3TUC6DFXMX2NLVFHVGFVL2PEFMVAVCNFSM6AAAAABV6IWVN6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNBZGE4TAMJRGA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Newest version of Therefore, I have also released a new version of PEPATAC with the newest |
updated gtars and tried to run PEPATAC again and gtars uniwig is now having a new issue when trying to create bed files. You can use the same example files from earlier to test this issue. % gtars --version gtars uniwig -f /home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN09-d1-A_S1/aligned_canFam4Y/UMGC-Mod-084-1s-MN09-d1-A_S1_sort_dedup.bam -c /home/ljmills/shared/bin/pepatac/refgeneFolder/alias/canFam4Y/fasta/default/canFam4Y.chrom.sizes -m 25 -s 1 -t bam -y bed -p 16 -u shift -l /home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN09-d1-A_S1/aligned_canFam4Y_exact/UMGC-Mod-084-1s-MN09-d1-A_S1 --bamscale 944184.96 Processing successful for chr5 There are chromosome specific bed files in the output directory. The last line of the Chr 1 bed file |
Looking just upstream of that error, I see this: for chrom in chromosomes.iter() {
let file_name = format!(
"{}{}_{}.{}",
bwfileheader, chrom.chrom, location, output_type
);
inputs.push(file_name);
} What would cause one of these inputs to not exist, @donaldcampbelljr? Is it possible for a file to not exist? If so, could we just skip it? |
I have this file Regarding the bed file's last row extending beyond the chrom size, I think that is unrelated to the above issue, but we should clip it in the @nleroy917 Good idea. By the time the program gets this far, it has checked that the chromosomes exist in the bam file. However, it could raise an error during writing to the bed file line by line, but we should see this error in the terminal (which I don't in this case). We could skip that file. As long as the program communicates that it was expecting a chromosome file but didn't find one that would be fine. Currently, the messaging is inadequate either way as I'm not sure which chrom failed from the backtrace. :) Also, I would think the consumer handle would produce an empty file either way by this point. |
gtars uniwig -f
/home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y/UMGC-Mod-084-1s-MN24-d1-D_S44_sort_dedup.bam
-c
/home/ljmills/shared/bin/pepatac/refgeneFolder/alias/canFam4Y/fasta/default/canFam4Y.chrom.sizes
-m 25 -s 1 -t bam -y bed -p 16 -u shift -l
/home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y_exact/UMGC-Mod-084-1s-MN24-d1-D_S44
--bamscale 1002271.2`
Results in the same error.
Processing successful for chr9
Processing successful for chr1
thread 'main' panicked at src/uniwig/writing.rs:83:49:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound,
message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a
backtrace
</pre>
and gives me a bed file that extends beyond the chr size
chr1 123556383 123556433 N 0 +
chr1 123556429 123556479 N 0 -
I ran *conda* install rust-gtars to update the version of gtars inside of
my pepatac conda environment. Does it have other dependencies that might
also need updating?
…-L
On Fri, Feb 14, 2025 at 3:17 PM Donald Campbell ***@***.***> wrote:
Reopened #74 <#74>.
—
Reply to this email directly, view it on GitHub
<#74 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF3TUC5IB23XNOK6PPVWRBL2PZMPFAVCNFSM6AAAAABV6IWVN6VHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJWGMYTONJQGQZDOMY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Ok, I was able to reproduce (was still using a smaller subset of chroms). Added a print function immediately before opening each input file for the merge. /home/drc/Downloads/gtars_74_issue/output/testchrY_NC_051844.1_shift.bed Checking the folder, I see a file:
So, it appears to me that it is dropping the Similar issue for the following chromosome: |
The produced bed file is intended to be used downstream for macs3 peak calling. I checked and noticed that the original method ( Example downstream call to macs3: Will this extension past the chromosome length cause issues downstream for macs3 peak calling and subsequent processing? |
Chromosome names with |
@ljmills the newest version of |
I am using gtars uniwig as part of the PEPATAC pipeline line and I am consistently getting this error across all of my ATAC-seq samples. It seems to happen on chromosomes that have alignment near the end of the chromosomes. I have checked the alignments and I don't have reads aligned off the ends of these chromosomes so I think it is an issue with gtars.
Produce signal tracks (01-26 17:22:21) elapsed: 1118.0 TIME
Target to produce:
/home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y_exact/UMGC-Mod-084-1s-MN24-d1-D_S44_exact_shift.bw
,/home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y/UMGC-Mod-084-1s-MN24-d1-D_S44_smooth_shift.bw
,/home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y_exact/UMGC-Mod-084-1s-MN24-d1-D_S44_shift.bed
Command completed. Elapsed time: 0:00:49. Running peak memory: 5.237GB.
PID: 3214339; Command: gtars; Return code: 0; Memory used: 0.137GB
Command completed. Elapsed time: 0:00:40. Running peak memory: 5.237GB.
PID: 3245409; Command: gtars; Return code: 0; Memory used: 0.012GB
The text was updated successfully, but these errors were encountered: