Stats for multiple infection samples and reports workflow #116
Conversation
@TimHHH, I've implemented what we discussed in #106. As of now, the failed samples are published in the same parent folder as the normal workflow output, but within a separate subfolder. Here's an example of the published stats for samples which didn't pass the QC threshold.

Could you please run this branch and see if it works as expected? If so, I'd also like to address #99 within this branch, and I'd like to discuss with you in our next meeting which results need to be published in the consolidated report.
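As a rough illustration of the folder layout idea (a minimal sketch, not the PR's actual code; the process name, channel shape, and the `failed_samples` subfolder name are assumptions):

```nextflow
// Minimal sketch of the idea, not the PR's actual code.
// QC-failed sample stats are published under the same parent outdir as the
// normal results, but inside a hypothetical "failed_samples" subfolder.
process FAILED_SAMPLE_STATS {
    tag "${sampleName}"

    // params.outdir is assumed to be the pipeline's parent results folder
    publishDir "${params.outdir}/failed_samples", mode: 'copy'

    input:
        tuple val(sampleName), path(statsFile)

    output:
        path("${sampleName}.failed.stats.tsv")

    script:
        """
        cp ${statsFile} ${sampleName}.failed.stats.tsv
        """
}
```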
```nextflow
.join( GATK_COLLECT_WGS_METRICS.out )
.join( GATK_FLAG_STAT.out )

//TODO: If these stats are needed, only then implement the following processes to accommodate the missing NTM stats
```
This could be accommodated better if we convert the shell scripts into proper python scripts as documented here #39
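A rough sketch of what that could look like once the shell logic moves into a Python script (not code from this repo): the joined per-sample channels feed a process whose script block simply calls the Python entry point, where missing values such as the NTM stats could be handled. The process name, script name, and argument names are assumptions.

```nextflow
// Sketch only: a stats-collection process that delegates the parsing logic
// to a standalone Python script (per #39) instead of inline shell.
// COLLECT_SAMPLE_STATS and collect_stats.py are hypothetical names.
process COLLECT_SAMPLE_STATS {
    tag "${sampleName}"

    input:
        tuple val(sampleName), path(wgsMetrics), path(flagstat)

    output:
        tuple val(sampleName), path("${sampleName}.stats.tsv")

    script:
        // assumes collect_stats.py lives in the pipeline's bin/ directory
        """
        collect_stats.py \
            --sample ${sampleName} \
            --wgs-metrics ${wgsMetrics} \
            --flagstat ${flagstat} \
            --output ${sampleName}.stats.tsv
        """
}
```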
We had/have some serious cluster issues, but I am trying this again at the moment.
@abhi18av Do you need any other data to check this issue?
I think this much should be enough for me to continue investigating this - thanks Tim!
@TimHHH, I have tried to reproduce the same issues on my SLURM cluster; however, I haven't seen a caching problem.

Hypothesis
In your position I'd take this forward with the sys-admin team responsible for the file system setup and operations. The node where the head job is running is responsible for maintaining the cache database. From the Nextflow side, we can use a lenient caching strategy, i.e. `process.cache = 'lenient'`:
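A minimal config sketch of that setting (the `slurm` profile name is illustrative, not taken from this repo):

```nextflow
// Relax Nextflow's cache keys to cope with shared file systems that report
// inconsistent timestamps: 'lenient' hashes input file path and size only,
// instead of path, size and last-modified time.
profiles {
    slurm {
        process.cache = 'lenient'
    }
}
```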
I've not seen this issue reported from any other community sources; therefore, I consider the NF version being the root cause highly unlikely. For me the caching is working as expected and causes zero reruns.
I'm adding my experimentation notebook logs below to keep a history.

Tower workspace (TORCH/XBS-Nextflow)

Run-1
Therefore, from my side, if the overall results from the new workflow are as expected, we can take this PR forward and merge it.
Discussion from the meeting (13-09-2022)
NOTE: This is somewhat hacky and needs to be done carefully, by saving the results of QUANTTB/COHORT_STATS.
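As a loose sketch of what "saving the results" could mean in Nextflow terms (the process selector and output path below are assumptions, not taken from the repo), the cohort stats process could be made to publish its output so it survives outside the work directory:

```nextflow
// Speculative config sketch: ensure the QuantTB cohort stats are published
// (copied out of work/) so they can be inspected and reused carefully across runs.
process {
    withName: 'QUANTTB_COHORT_STATS' {
        publishDir = [ path: "${params.outdir}/quanttb_cohort_stats", mode: 'copy' ]
    }
}
```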
Run 1: friendly_hopper
Results:

Run 2: tender_hoover
Results:

Run 3: mad_austin
Fresh run exactly as Run 1 to see if the same pattern persists; updated default.config outdir and vcf names.
Results:

Run 4: sharp_poisson
Results:

Run 5: sleepy_stonebraker

Run 6: prickly_archimedes

Run 7: goofy_agnesi

Results run 4/5/6: all three runs behave as expected and send the 8 approved samples to BWA mapping.

Final remarks

@abhi18av @LennertVerboven your opinions / suggestions are welcome.
Hmm, then perhaps to test this we can narrow it down to …
I have done some additional runs on the dataset for the publication and encountered another important observation. This was with the same version/branch. I started off with this run: … And then kept using … All jobs then successfully finished here: … Then for the sake of it I did another bunch of … @abhi18av, as you will see, the number of samples in the …
Hi, I added a python script here and changed the quanttb cohort stats here to run the new python script. The output is two files; the new tsv files have the same layout as the previous quanttb cohort tsv file.
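Purely as an illustration of how the new script could be wired in (not the actual change): the cohort-stats process invokes the Python script and emits the two TSVs. The script name, arguments, and output file names below are hypothetical placeholders.

```nextflow
// Illustrative sketch only: run the new Python script in place of the old
// shell logic and capture two TSV outputs. quanttb_cohort_stats.py and the
// two output names are hypothetical, not the repo's real names.
process QUANTTB_COHORT_STATS {
    input:
        path(quanttbReports)

    output:
        path("cohort_stats.approved.tsv"), emit: approved
        path("cohort_stats.rejected.tsv"), emit: rejected

    script:
        """
        quanttb_cohort_stats.py \
            --reports ${quanttbReports} \
            --approved-out cohort_stats.approved.tsv \
            --rejected-out cohort_stats.rejected.tsv
        """
}
```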
Thanks @LennertVerboven, I have now accommodated the script in the workflow and I think we should give it another go on your cluster 🤞
This PR addresses the stats for multiple infection samples and the reports workflow (REPORTS_WF).