run_segalign crashes on human-chimp (and exits 0!) #57

Open
glennhickey opened this issue Aug 24, 2022 · 2 comments

Comments

@glennhickey
Collaborator

This is with d1a73a0 on an Ubuntu 18.04 p3.16xlarge AWS instance.

It appears to work with d5fd293, so it's definitely a regression related to this June's changes fixing the overflow bugs in the repeat masker.

Just looking at these commits, it seems that the repeat masker changes in d1a73a0 would also need to be applied to run_segalign?
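Since d5fd293 is good and d1a73a0 is bad, `git bisect` could narrow this down to a single commit. Two details matter here: `git bisect run` treats exit code 125 as "skip" and aborts the whole bisect on any code above 127, and a crash like this one (SIGABRT) surfaces as status 134, while run_segalign itself currently exits 0 even when segalign crashes. A minimal sketch of a classifier for such a bisect script, assuming a hypothetical helper named `bisect_status` that looks at the build status, the run status, and whether an output file was produced:

```shell
#!/usr/bin/env bash
# Hypothetical helper for a `git bisect run` script; not part of SegAlign.
# git bisect run reads the script's exit code as:
#   0          -> good commit
#   1..127     -> bad commit (except 125)
#   125        -> cannot test this commit, skip it
#   128..255   -> abort the whole bisect
# A segalign crash via SIGABRT shows up as status 134 (128 + 6), so it has
# to be folded into 1; passed through unchanged it would abort the bisect.

# bisect_status BUILD_STATUS RUN_STATUS OUTPUT_FILE
# Prints the exit code the bisect script should finish with.
bisect_status() {
    local build_status=$1 run_status=$2 output=$3
    if [ "$build_status" -ne 0 ]; then
        echo 125    # commit does not build: skip it rather than blame it
    elif [ "$run_status" -eq 0 ] && [ -s "$output" ]; then
        echo 0      # clean exit and a non-empty output file: good
    else
        echo 1      # non-zero exit, crash, or empty/missing output: bad
    fi
}
```

A bisect script would build the checked-out commit, run `segalign` directly on the test data (not run_segalign, whose exit status can't be trusted yet), and finish with something like `exit "$(bisect_status "$build_status" "$run_status" out.paf)"`, where the build step and `out.paf` are placeholders. It would then be driven with `git bisect start d1a73a0 d5fd293` followed by `git bisect run ./bisect_check.sh` (script name hypothetical).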

It reproduces very quickly:

wget -q http://public.gi.ucsc.edu/~hickey/debug/segalign_debug/hg38_without_alts_preprocessed.fa.pp.gz
wget -q http://public.gi.ucsc.edu/~hickey/debug/segalign_debug/panTro6_preprocessed.fa.pp.gz
gzip -d hg38_without_alts_preprocessed.fa.pp.gz 
gzip -d panTro6_preprocessed.fa.pp.gz

The segalign command

run_segalign panTro6_preprocessed.fa.pp hg38_without_alts_preprocessed.fa.pp --format=paf:minimap2 --step=2 --ambiguous=iupac,100,100 --ydrop=3000 --notransition

The crash

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  trivial_device_copy D->H failed: cudaErrorInvalidValue: invalid argument
/usr/local/bin/run_segalign: line 197: 35103 Aborted                 (core dumped) stdbuf -oL segalign $refPath $queryPath $DATA_FOLDER $optionalArguments

But run_segalign still returns 0!

echo $?
0
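The crash message is libstdc++'s terminate handler: the uncaught `thrust::system::system_error` calls `std::terminate`, which aborts the process, so `segalign` dies from SIGABRT. The log then shows the script carrying on and printing "No alignment generated", so its final exit status is presumably that of some later command. Without having checked run_segalign's source, here is a minimal sketch of one way a bash wrapper could propagate the failure instead; `run_checked` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual run_segalign source.
# run_checked runs a command; if it fails, it reports the failure and exits
# the whole script with the same status instead of carrying on.
run_checked() {
    "$@"
    local status=$?    # $? is expanded before `local` runs, so this is the command's status
    if [ "$status" -ne 0 ]; then
        if [ "$status" -gt 128 ]; then
            # Bash reports death by signal as 128 + signal number
            # (SIGABRT is 6, so an uncaught C++ exception gives 134).
            echo "error: $1 was killed by signal $((status - 128))" >&2
        else
            echo "error: $1 exited with status $status" >&2
        fi
        exit "$status"
    fi
}
```

Line 197 would then become something like `run_checked stdbuf -oL segalign "$refPath" "$queryPath" "$DATA_FOLDER" $optionalArguments`, with `$optionalArguments` left unquoted so it still splits into separate flags. If that command is part of a pipeline, `set -o pipefail` would also be needed, since otherwise a pipeline's status is that of its last command.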

The full log

Converting fasta files to 2bit format

Executing: "segalign /home/ubuntu/work/panTro6_preprocessed.fa.pp /home/ubuntu/work/hg38_without_alts_preprocessed.fa.pp /home/ubuntu/work/output_13442/data_23960/  --format=paf:minimap2 --step=2 --ambiguous=iupac,100,100 --ydrop=3000 --notransition"
Using 64 threads
Using 8 GPU(s)

Reading query file ...


Reading target file ...

Start alignment ...

Sending reference block 0 ...

Sending query block 0 with buffer 0 ...

Sending query block 1 with buffer 1 ...
Query block 0, interval 1/52 (0:10000000) with buffer 0
Query block 0, interval 3/52 (20000000:30000000) with buffer 0
Query block 0, interval 7/52 (60000000:70000000) with buffer 0
Query block 0, interval 10/52 (90000000:100000000) with buffer 0
Query block 0, interval 14/52 (130000000:140000000) with buffer 0
Query block 0, interval 17/52 (160000000:170000000) with buffer 0
Query block 0, interval 22/52 (210000000:220000000) with buffer 0
Query block 0, interval 26/52 (250000000:260000000) with buffer 0
Query block 0, interval 30/52 (290000000:300000000) with buffer 0
Query block 0, interval 35/52 (340000000:350000000) with buffer 0
Query block 0, interval 39/52 (380000000:390000000) with buffer 0
Query block 0, interval 42/52 (410000000:420000000) with buffer 0
Query block 0, interval 46/52 (450000000:460000000) with buffer 0
Query block 0, interval 48/52 (470000000:480000000) with buffer 0
Query block 0, interval 50/52 (490000000:500000000) with buffer 0
Query block 0, interval 2/52 (10000000:20000000) with buffer 0
Query block 0, interval 18/52 (170000000:180000000) with buffer 0
Query block 1, interval 4/60 (30000000:40000000) with buffer 1
Query block 0, interval 19/52 (180000000:190000000) with buffer 0
Query block 1, interval 10/60 (90000000:100000000) with buffer 1
Query block 0, interval 21/52 (200000000:210000000) with buffer 0
Query block 0, interval 23/52 (220000000:230000000) with buffer 0
Query block 0, interval 4/52 (30000000:40000000) with buffer 0
Query block 0, interval 24/52 (230000000:240000000) with buffer 0
Query block 0, interval 25/52 (240000000:250000000) with buffer 0
Query block 1, interval 12/60 (110000000:120000000) with buffer 1
Query block 0, interval 28/52 (270000000:280000000) with buffer 0
Query block 0, interval 8/52 (70000000:80000000) with buffer 0
Query block 0, interval 29/52 (280000000:290000000) with buffer 0
Query block 0, interval 31/52 (300000000:310000000) with buffer 0
Query block 0, interval 32/52 (310000000:320000000) with buffer 0
Query block 0, interval 9/52 (80000000:90000000) with buffer 0
Query block 0, interval 33/52 (320000000:330000000) with buffer 0
Query block 0, interval 34/52 (330000000:340000000) with buffer 0
Query block 0, interval 36/52 (350000000:360000000) with buffer 0
Query block 0, interval 5/52 (40000000:50000000) with buffer 0
Query block 0, interval 37/52 (360000000:370000000) with buffer 0
Query block 0, interval 38/52 (370000000:380000000) with buffer 0
Query block 0, interval 40/52 (390000000:400000000) with buffer 0
Query block 0, interval 11/52 (100000000:110000000) with buffer 0
Query block 0, interval 41/52 (400000000:410000000) with buffer 0
Query block 0, interval 43/52 (420000000:430000000) with buffer 0
Query block 0, interval 12/52 (110000000:120000000) with buffer 0
Query block 0, interval 44/52 (430000000:440000000) with buffer 0
Query block 0, interval 45/52 (440000000:450000000) with buffer 0
Query block 0, interval 13/52 (120000000:130000000) with buffer 0
Query block 0, interval 47/52 (460000000:470000000) with buffer 0
Query block 0, interval 15/52 (140000000:150000000) with buffer 0
Query block 0, interval 49/52 (480000000:490000000) with buffer 0
Query block 0, interval 16/52 (150000000:160000000) with buffer 0
Query block 0, interval 51/52 (500000000:510000000) with buffer 0
Query block 0, interval 52/52 (510000000:510113926) with buffer 0
Query block 1, interval 1/60 (0:10000000) with buffer 1
Query block 1, interval 2/60 (10000000:20000000) with buffer 1
Query block 1, interval 3/60 (20000000:30000000) with buffer 1
Query block 1, interval 5/60 (40000000:50000000) with buffer 1
Query block 1, interval 6/60 (50000000:60000000) with buffer 1
Query block 1, interval 7/60 (60000000:70000000) with buffer 1
Query block 0, interval 6/52 (50000000:60000000) with buffer 0
Query block 1, interval 8/60 (70000000:80000000) with buffer 1
Query block 1, interval 9/60 (80000000:90000000) with buffer 1
Query block 0, interval 20/52 (190000000:200000000) with buffer 0
Query block 1, interval 11/60 (100000000:110000000) with buffer 1
Query block 0, interval 27/52 (260000000:270000000) with buffer 0
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  trivial_device_copy D->H failed: cudaErrorInvalidValue: invalid argument
/usr/local/bin/run_segalign: line 197: 35103 Aborted                 (core dumped) stdbuf -oL segalign $refPath $queryPath $DATA_FOLDER $optionalArguments

real    2m19.050s
user    1m27.737s
sys     1m50.123s

real    2m19.072s
user    2m25.897s
sys     1m53.887s
No alignment generated
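The last line shows the script noticing that nothing was produced, yet still exiting 0 (per the `echo $?` earlier). As a check that does not depend on any exit status, the driver could fail whenever the output is missing or empty. A minimal sketch, assuming the results land as files in a single output directory; both `require_output` and that layout are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the function name and the "one output directory"
# layout are assumptions, not taken from run_segalign.
# require_output returns non-zero (and says so) when the output directory
# contains no non-empty regular file, so an empty run is never reported as success.
require_output() {
    local output_dir=$1
    local found
    # List regular files larger than 0 bytes; a find error (for example a
    # missing directory) simply counts as "nothing found".
    found=$(find "$output_dir" -type f -size +0c 2>/dev/null || true)
    if [ -z "$found" ]; then
        echo "No alignment generated" >&2
        return 1
    fi
    return 0
}
```

It would be called right after the aligner, e.g. `require_output "$OUTPUT_DIR" || exit 1`, where `OUTPUT_DIR` is a placeholder for wherever the script writes its results.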

@richard-burhans

@glennhickey I've recently run into the same problem. The driver scripts aren't catching all errors. I'm currently rewriting them in Python to improve error handling.

@glennhickey
Collaborator Author

Hi @richard-burhans,

Your PR #64 looks very interesting. I have a fork of SegAlign that Cactus uses.

My fork is identical to the master branch of this repo except for commit ComparativeGenomicsToolkit@fe4b16f, which I think I made to resolve this issue.

Anyway, I'm interested in incorporating your changes and any future developments into Cactus, so please consider PRing them to https://github.com/ComparativeGenomicsToolkit/SegAlign

Thanks!
