-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dsl2: metagenomics uncollapsed paired end #1098
base: dev
Are you sure you want to change the base?
Conversation
Warning Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.1.2. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation. |
|
dsl-2 metagenomics-pairedend
Ready for review: Basic Overview: We can now retain PE information from unmerged input fastq files (when PE samples going into metagenomics will be separately input into metaphlan, kraken2, and krakenuniq. Malt does not accept PE reads so it is not allowed with the param combo. Also fixed a small error in the parsing of metagenomics input data from bamfiltering --> metagnomics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on visual inspection, it looks good to me!
I would remove the one map-channel manipulation in metagenomics (see comment) as it is not necessary anymore
Also I have not tested anything, so I trust your tests for now :P
} | ||
|
||
// Routing for metagenomic screening -> first accounting for paired-end mapping, then merged mapping, then no metagenomics | ||
if ( ( params.run_metagenomics && params.metagenomics_input == 'unmapped' ) && params.preprocessing_skippairmerging ) { | ||
ch_fastq_for_metagenomics = CAT_FASTQ_UNMAPPED.out.reads | ||
ch_fastq_for_metagenomics = CAT_FASTQ_UNMAPPED.out.reads.mix(ch_paired_fastq_for_cat) | ||
} else if ( ( params.run_metagenomics && ( params.metagenomics_input == 'mapped' || params.metagenomics_input == 'all' ) ) && params.preprocessing_skippairmerging ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if input now is 'all', the unmapped reads are not taken into account.
I think that needs one more if-condition, with something like
if metagenomics_input == 'all':
CAT_FASTQ_MAPPED.out.reads.mix(CAT_FASTQ_UNMAPPED.out.reads).mix(ch_paired_fastq_for_cat)
but I might miss something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the unmapped reads are not filtered out? then it makes sense...
|
||
So: needs to be fixed up higher (eg in bamfiltering.nf, likely with a new adjustment to the SAMTOOLS_FASTQ_UNMAPPED, SAMTOOLS_FASTQ_MAPPED, and SAMTOOLS_VIEW_BAM_FILTERING modules ) | ||
|
||
ISSUE FOUND: while the outputting of PE reads is OK in bamfiltering.nf (fastq_mapped & fastq_unmapped) when overlap merging is not done cat_fastq weirdly merges singletons to one PE file and other to the other PE file, so then everything gets fucked up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a hackathon-thing to do
@@ -137,20 +137,32 @@ workflow METAGENOMICS_PROFILING { | |||
} | |||
|
|||
else if ( params.metagenomics_profiling_tool == 'krakenuniq' ) { | |||
// run krakenuniq once for all samples | |||
// run krakenuniq once for all samples, unless non-merged PE vs SE data | |||
|
|||
ch_krakenuniq_input = ch_reads | |||
.map{ meta, file -> | |||
[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the meta already contains the informatino, this whole map can go away. I created this map only to add the 'single_end' back to the meta
@@ -118,10 +118,6 @@ workflow PIPELINE_INITIALISATION { | |||
} | |||
[ meta, r1, r2, bam ] | |||
} | |||
.groupTuple() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to make sure - this was the weird input-validation that did nothing and randomly failed?
PR for metagenomics paired-end merging off, retaining that info for metagenomic tools that have PE modes.
This is a QOL improvement for tools that can use paired end information (kraken2, krakenuniq), and now allows for variable inputs into these tools (by splitting inputs into either PE or SE data and running separate instances).
Main issues resolved:
--preprocessing_skippairmerging
used.PR checklist
scrape_software_versions.py
nf-core lint .
).nextflow run . -profile test,docker
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).