parallelization after using splitFasta #5984

sarahfarhat · 2025-04-18T14:08:44Z


Hi,
I'm new to Nextflow and I'm trying to create a workflow that runs LTRharvest on a genome.
Since LTRharvest isn't multithreaded, I used splitFasta to split the genome into multiple chunks.

I expected that running the workflow would execute LTRharvest in parallel, one instance per chunk, using all available CPU cores. However, only one LTRharvest process runs at a time.

Do you have any idea what I might be missing to make it run in parallel?
Here is the beginning of main.nf:
```
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

include { runLTRharvest } from "${baseDir}/modules/runLTRharvest.nf"
include { concatResLTRharvest } from "${baseDir}/modules/concatResLTRharvest.nf"
include { blastx } from "${baseDir}/modules/blastx.nf"
include { sfAssign } from "${baseDir}/modules/sfAssign.nf"
include { family } from "${baseDir}/modules/family.nf"
include { blastx2 } from "${baseDir}/modules/blastx2.nf"
include { blast2GFF } from "${baseDir}/modules/blast2GFF.nf"
include { gff2fasta } from "${baseDir}/modules/gff2fasta.nf"
include { consensusPerCluster } from "${baseDir}/modules/consensusPerCluster.nf"

/* Default params */
params.chunkSize= 10000
params.nb_cpus=16
log.info """
============
LTR PIPELINE
============
Genome: ${params.genome}
species: ${params.species}
Number of sequences per chunck : ${params.chunkSize}
LTR database: ${params.sfdb}
Number of family rounds: ${params.round}
RTRH database: ${params.rtrhdb}
CPUs number: ${params.nb_cpus}
"""
.stripIndent()

workflow {
    /* LTRHarvest will be run on subsets of sequences to make it faster
    splitfasta takes the number of sequences per file
    by default it is 10000, depending on the number of sequences and their size, you may want to change it*/
    Channel
        .fromPath(params.genome)
        .splitFasta(by: params.chunkSize, file:true)
        .set { genome_ch }

    /* LTRHarvest suffixator and run */
    runLTRharvest(genome_ch)
    /* rest of the pipeline*/
```
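
One thing worth checking first, offered as a sketch rather than a confirmed diagnosis: `splitFasta(by: N)` puts N sequences into each chunk, so a genome with fewer than `params.chunkSize` sequences would produce a single chunk and therefore a single `runLTRharvest` task. Counting the items in `genome_ch` shows how many tasks to expect. The message text below is illustrative only.

```groovy
// Diagnostic sketch: place inside the workflow block, after genome_ch is set.
// Prints how many chunk files splitFasta emitted, which is the number of
// runLTRharvest tasks Nextflow has available to schedule.
genome_ch
    .count()
    .view { n -> "Number of FASTA chunks: ${n}" }
```

If this prints 1, lowering `params.chunkSize` (for example with `--chunkSize` on the command line) would be needed before any parallelism can appear.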

And the runLTRharvest process:
```
#!/usr/bin/env nextflow

process runLTRharvest {
    cpus=19
    input:
    path genome

    output:
    path "${genome}.ltr.fa"

    script:
    """
    gt suffixerator -dna -indexname $genome -db $genome -tis -suf -lcp -des -ssp -sds -memlimit 50GB
    gt ltrharvest -index $genome -v -out ${genome}.ltr.fa -minlenltr 100 -maxlenltr 1200
    """

}
```
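
A possible cause, stated as an assumption and not a confirmed diagnosis: the process above requests 19 CPUs per task. With the local executor, Nextflow only starts a task when the CPUs it requests are free, so on a machine with fewer than 38 cores only one such task could run at a time. Since LTRharvest is single-threaded, requesting one CPU per task would let the scheduler run many chunks concurrently. The sketch below shows this as a `nextflow.config` override; the value `1` is an assumption based on that reasoning.

```groovy
// nextflow.config (sketch; the CPU value is an assumption, not from the original post)
process {
    // Apply only to the LTRharvest step
    withName: 'runLTRharvest' {
        // One CPU per task, so the number of concurrent tasks is limited by
        // the number of available cores rather than by a 19-CPU reservation
        cpus = 1
    }
}
```

The same effect could be obtained by setting the `cpus` directive to 1 in the process definition itself; the config form just keeps resource settings out of the module file.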
