parallelization after using splitFasta #5984
Unanswered
sarahfarhat asked this question in Q&A
Hi,
I'm new to Nextflow and I'm trying to create a workflow that runs LTRharvest on a genome.
Since LTRharvest isn't multithreaded, I used splitFasta to split the genome into multiple chunks.
I expected that, when I run the workflow, LTRharvest would be executed in parallel, one instance per chunk, using all available CPU cores. However, what I observe is that only one LTRharvest process runs at a time.
Do you have any idea what I might be missing to make it run in parallel?
Here is the beginning of main.nf:
```
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

include { runLTRharvest } from "${baseDir}/modules/runLTRharvest.nf"
include { concatResLTRharvest } from "${baseDir}/modules/concatResLTRharvest.nf"
include { blastx } from "${baseDir}/modules/blastx.nf"
include { sfAssign } from "${baseDir}/modules/sfAssign.nf"
include { family } from "${baseDir}/modules/family.nf"
include { blastx2 } from "${baseDir}/modules/blastx2.nf"
include { blast2GFF } from "${baseDir}/modules/blast2GFF.nf"
include { gff2fasta } from "${baseDir}/modules/gff2fasta.nf"
include { consensusPerCluster } from "${baseDir}/modules/consensusPerCluster.nf"

/* Default params */
params.chunkSize = 10000
params.nb_cpus = 16

log.info """
    ============
    LTR PIPELINE
    ============
    Genome: ${params.genome}
    species: ${params.species}
    Number of sequences per chunk: ${params.chunkSize}
    LTR database: ${params.sfdb}
    Number of family rounds: ${params.round}
    RTRH database: ${params.rtrhdb}
    CPUs number: ${params.nb_cpus}
    """.stripIndent()

workflow {
    /* LTRharvest will be run on subsets of sequences to make it faster.
       splitFasta takes the number of sequences per file;
       by default it is 10000. Depending on the number of sequences and their size, you may want to change it. */
    Channel
        .fromPath(params.genome)
        .splitFasta(by: params.chunkSize, file: true)
        .set { genome_ch }

process runLTRharvest {
    cpus=19
    input:
    path genome
}
```
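For comparison, here is a minimal sketch of a layout in which each chunk becomes its own task and several tasks can be scheduled at once. It rests on assumptions the post does not confirm: that the pipeline runs on the local executor, which only starts a task when the CPUs it requests are free, and that the 19-CPU request is what limits concurrency. The default `params.genome` value, the output file name, and the `gt suffixerator` / `gt ltrharvest` command lines are illustrative placeholders, not the content of the actual module.

```nextflow
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

params.genome    = 'genome.fasta'   // hypothetical default, for illustration only
params.chunkSize = 10000

process runLTRharvest {
    // LTRharvest is single-threaded, so request one CPU per task.
    // Directive written in the documented form, without '='.
    cpus 1

    input:
    path genome

    output:
    path "${genome}.ltrharvest.out"   // assumed output name

    script:
    // Assumed commands: build the enhanced suffix array, then run LTRharvest on it.
    """
    gt suffixerator -db ${genome} -indexname ${genome} -tis -suf -lcp -des -ssp -sds -dna
    gt ltrharvest -index ${genome} > ${genome}.ltrharvest.out
    """
}

workflow {
    // Each chunk file emitted by splitFasta becomes one runLTRharvest task.
    genome_ch = Channel
        .fromPath(params.genome)
        .splitFasta(by: params.chunkSize, file: true)

    runLTRharvest(genome_ch)
}
```

If the genome contains fewer sequences than `params.chunkSize`, splitFasta yields a single chunk and only one task can run regardless of the CPU settings, so the chunk size is also worth checking.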