krakenuniq-download is stochastic #174

Open
xapple opened this issue Jun 5, 2024 · 0 comments
xapple commented Jun 5, 2024

Running the following command from the manual:

```
$ krakenuniq-download -db DBDIR refseq/viral/Any viral-neighbors
```

The results are inconsistent: sometimes the command fails with an error, and sometimes it does not. I had to relaunch the exact same command three times before it completed successfully. I believe the behavior is nondeterministic because of the large number of HTTP connections the script makes. A small fraction of those connections may fail because of proxies or network congestion, and the script does not retry failed requests. This is the error message:

```
(krkn) user@cluster test $ krakenuniq-download --db DBDIR refseq/viral/Any viral-neighbors
Environment contains multiple differing definitions for 'cluster'.
Using value from 'CLUSTER' (xxxx) and ignoring 'cluster' (xxxx) at ~/miniconda3/envs/krkn/lib/perl5/site_perl/LWP/UserAgent.pm line 1134.
Environment contains multiple differing definitions for 'site'.
Using value from 'SITE' (xxxx) and ignoring 'site' (xxxx) at ~/miniconda3/envs/krkn/lib/perl5/site_perl/LWP/UserAgent.pm line 1134.
Downloading assembly summary file for viral genomes, and filtering to assembly level Any.
 Downloading viral genomes:  12254/14992 ... Error fetching https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/856/685/GCF_000856685.1_ViralProj15059/GCF_000856685.1_ViralProj15059_genomic.fna.gz. Is curl installed?
 Downloading viral genomes:  14992/14992 ...   Found 14992 files.
Downloading viral neighbors.
Downloading DBDIR/taxonomy/nucl_gb.accession2taxid.gz [curl -g 'https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz' -o 'DBDIR/taxonomy/nucl_gb.accession2taxid.gz'] ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2301M  100 2301M    0     0  48.5M      0  0:00:47  0:00:47 --:--:-- 49.0M
 done (48s)
DBDIR/taxonomy/nucl_gb.accession2taxid.gz          check [2.25 GB]
 SUCCESS
Sorting maping file (will take some time) [gunzip -c DBDIR/taxonomy/nucl_gb.accession2taxid.gz | cut -f 1,3 > DBDIR/taxonomy/nucl_gb.accession2taxid.sorted.tmp && sort --parallel 5 -T DBDIR/taxonomy DBDIR/taxonomy/nucl_gb.accession2taxid.sorted.tmp > DBDIR/taxonomy/nucl_gb.accession2taxid.sorted && rm DBDIR/taxonomy/nucl_gb.accession2taxid.sorted.tmp] ... done (4m54s)
DBDIR/taxonomy/nucl_gb.accession2taxid.sorted      check [4.81 GB]
Reading names file ...
Downloading DBDIR/taxonomy/taxdump.tar.gz [curl -g 'https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz' -o 'DBDIR/taxonomy/taxdump.tar.gz'] ...
Download taxdump.tar.gz  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 62.2M  100 62.2M    0     0  10.6M      0  0:00:05  0:00:05 --:--:-- 13.4M
 done (6s)
DBDIR/taxonomy/taxdump.tar.gz                      check [62.24 MB]
 SUCCESS
Storing taxonomy timestamp [date > DBDIR/taxonomy/timestamp] ... done (0s)
Extracting nodes file [tar -C DBDIR/taxonomy -zxvf DBDIR/taxonomy/taxdump.tar.gz nodes.dmp > /dev/null] ... done (2s)
DBDIR/taxonomy/nodes.dmp                           check [186.48 MB]
Extracting names file [tar -C DBDIR/taxonomy -zxvf DBDIR/taxonomy/taxdump.tar.gz names.dmp > /dev/null] ... done (3s)
DBDIR/taxonomy/names.dmp                           check [234.57 MB]
DBDIR/library/viral/Neighbors/esearch_res.jsonDownloading 188670 sequences into DBDIR/library/viral/Neighbors.
query_key=1&webenv=MCID_665f1c6a8d232052172de20c
  Downloading sequences 1 to 10000 of 188670 ... done
  Downloading sequences 10001 to 20000 of 188670 ... done
  Downloading sequences 20001 to 30000 of 188670 ...https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nuccore&db=taxonomy&id=AC_000192
Error fetching https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nuccore&db=taxonomy&id=AC_000192. Is curl installed?
(krkn) user@cluster test $
```
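For illustration, here is a minimal sketch of the kind of retry suggested above. It is written as a POSIX shell wrapper around curl, not in the Perl that krakenuniq-download itself uses, and it is not how the script currently works. The helper names `retry` and `fetch_with_retry` and the variables `RETRY_BASE_DELAY` and `FETCH_MAX_ATTEMPTS` are hypothetical. The sketch also reports a missing curl separately from a failed download. curl is evidently installed here, since the same run later invokes it successfully for `nucl_gb.accession2taxid.gz`, so the "Is curl installed?" failures look more like transient network errors.

```shell
#!/bin/sh
# Hypothetical helpers sketching retry-with-backoff for downloads.
# They are NOT part of krakenuniq-download (which is written in Perl).

# retry MAX_ATTEMPTS CMD [ARGS...]
# Runs CMD until it exits with status 0 or MAX_ATTEMPTS is reached.
# The wait doubles after each failure, starting at RETRY_BASE_DELAY
# seconds (default 1). On giving up, returns CMD's last exit status.
retry() {
  local max_attempts="$1" delay="${RETRY_BASE_DELAY:-1}" attempt=1 status
  shift
  while :; do
    "$@" && return 0
    status=$?
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts (exit $status): $*" >&2
      return "$status"
    fi
    echo "attempt $attempt failed (exit $status); retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# fetch_with_retry URL OUTFILE
# Downloads URL to OUTFILE, retrying transient failures.
fetch_with_retry() {
  local url="$1" out="$2"
  # Report a missing curl separately from a failed download.
  if ! command -v curl >/dev/null 2>&1; then
    echo "curl not found in PATH" >&2
    return 127
  fi
  # -f: fail on HTTP errors (4xx/5xx); -sS: no progress bar, but show errors;
  # -L: follow redirects; -o: write the body to OUTFILE.
  retry "${FETCH_MAX_ATTEMPTS:-5}" curl -fsSL -o "$out" "$url"
}
```

For example, `fetch_with_retry 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/856/685/GCF_000856685.1_ViralProj15059/GCF_000856685.1_ViralProj15059_genomic.fna.gz' GCF_000856685.1_genomic.fna.gz` would try the first failed download up to five times, waiting 1, 2, 4 and 8 seconds between attempts. curl also has a built-in `--retry N` option for transient errors, which may be the simpler change wherever the script already shells out to curl.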