
Commit 81ee089

Merge pull request #7 from MPUSP/dev
feat: major simplification of template and update docs
2 parents 432f7c2 + b5c292f commit 81ee089

21 files changed: +129 additions, -426 deletions

.github/workflows/main.yml

Lines changed: 4 additions & 4 deletions
@@ -29,7 +29,7 @@ jobs:
 steps:
 - uses: actions/checkout@v4
 - name: Lint workflow
-uses: snakemake/snakemake-github-action@v1.25.1
+uses: snakemake/snakemake-github-action@v2
 with:
 directory: .
 snakefile: workflow/Snakefile
@@ -45,14 +45,14 @@ jobs:
 - uses: actions/checkout@v4

 - name: Test workflow
-uses: snakemake/snakemake-github-action@v1.25.1
+uses: snakemake/snakemake-github-action@v2
 with:
 directory: .test
 snakefile: workflow/Snakefile
-args: "--use-conda --show-failed-logs --cores 3 --conda-cleanup-pkgs cache"
+args: "--sdm conda --show-failed-logs --cores 2 --conda-cleanup-pkgs cache"

 - name: Test report
-uses: snakemake/snakemake-github-action@v1.25.1
+uses: snakemake/snakemake-github-action@v2
 with:
 directory: .test
 snakefile: workflow/Snakefile
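The `args` change above replaces the older `--use-conda` flag with the `--sdm` (software deployment method) option used by Snakemake 8. The Python sketch below shows that translation for an argument string. It is not part of this commit. The mapping table and the function name `to_sdm_args` are hypothetical. Mapping `--use-singularity` to `apptainer` is an assumption based on the wording in the catalog file.

```python
# Hypothetical sketch: fold legacy Snakemake deployment flags into the
# `--sdm` form used by the updated workflow. Not part of this commit.

# Assumed mapping from legacy flags to --sdm method names.
LEGACY_TO_SDM = {
    "--use-conda": "conda",
    "--use-singularity": "apptainer",
    "--use-apptainer": "apptainer",
}


def to_sdm_args(args: str) -> str:
    """Return `args` with legacy deployment flags merged into one --sdm option."""
    methods: list[str] = []
    rest: list[str] = []
    for token in args.split():
        method = LEGACY_TO_SDM.get(token)
        if method is None:
            rest.append(token)  # unrelated argument: keep it in order
        elif method not in methods:
            methods.append(method)  # record each method once, in order
    if not methods:
        return " ".join(rest)
    return " ".join(["--sdm", *methods, *rest])


if __name__ == "__main__":
    old = "--use-conda --show-failed-logs --cores 3 --conda-cleanup-pkgs cache"
    print(to_sdm_args(old))
```

The commit also lowers `--cores` from 3 to 2. The sketch deliberately leaves that value untouched.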

.snakemake-workflow-catalog.yml

Lines changed: 7 additions & 7 deletions
@@ -1,11 +1,11 @@
 # configuration of display in snakemake workflow catalog: https://snakemake.github.io/snakemake-workflow-catalog

 usage:
-mandatory-flags: # optional definition of additional flags
-desc: # describe your flags here in a few sentences (they will be inserted below the example commands)
+mandatory-flags:
+desc: # describe your flags here in a few sentences
 flags: # put your flags here
-software-stack-deployment: # definition of software deployment method (at least one of conda, singularity, or singularity+conda)
-conda: true # whether pipeline works with --use-conda
-singularity: false # whether pipeline works with --use-singularity
-singularity+conda: false # whether pipeline works with --use-singularity --use-conda
-report: true # add this to confirm that the workflow allows to use 'snakemake --report report.zip' to generate a report containing all results and explanations
+software-stack-deployment:
+conda: true # whether pipeline works with '--sdm conda'
+apptainer: true # whether pipeline works with '--sdm apptainer/singularity'
+apptainer+conda: true # whether pipeline works with '--sdm conda apptainer/singularity'
+report: true # whether creation of reports using 'snakemake --report report.zip' is supported
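The updated `software-stack-deployment` section declares three deployment modes, and its comments name the matching `--sdm` arguments. The Python sketch below turns a dict that mirrors that section into the corresponding test commands. Reading the YAML file itself is left out. The function `supported_commands` is illustrative, and its defaults (2 cores, `.test` directory) are taken from the README commands in this commit.

```python
# Hypothetical sketch: list the test commands implied by the catalog's
# software-stack-deployment section, mirrored here as a plain dict.

# Values copied by hand from the updated .snakemake-workflow-catalog.yml.
DEPLOYMENT = {"conda": True, "apptainer": True, "apptainer+conda": True}

# --sdm arguments as described in the comments of that file.
SDM_ARGS = {
    "conda": ["conda"],
    "apptainer": ["apptainer"],
    "apptainer+conda": ["conda", "apptainer"],
}


def supported_commands(deployment: dict, cores: int = 2, directory: str = ".test") -> list[str]:
    """Build one snakemake command line per enabled deployment mode."""
    commands = []
    for method, enabled in deployment.items():
        if enabled:
            sdm = " ".join(SDM_ARGS[method])
            commands.append(f"snakemake --cores {cores} --sdm {sdm} --directory {directory}")
    return commands


for cmd in supported_commands(DEPLOYMENT):
    print(cmd)
```

The first and last generated commands match the conda and apptainer invocations shown in the README.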

.template/config/config.yaml.tmpl.tmpl

Lines changed: 0 additions & 1 deletion
This file was deleted.

.template/workflow/Snakefile.tmpl.tmpl

Lines changed: 0 additions & 10 deletions
This file was deleted.

.test/config/config.yml

Lines changed: 2 additions & 20 deletions
@@ -1,27 +1,9 @@
 samplesheet: "config/samples.tsv"

 get_genome:
-database: "ncbi"
-assembly: "GCF_000006785.2"
-fasta: Null
-gff: Null
-gff_source_type:
-[
-"RefSeq": "gene",
-"RefSeq": "pseudogene",
-"RefSeq": "CDS",
-"Protein Homology": "CDS",
-]
+ncbi_ftp: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.fna.gz

 simulate_reads:
 read_length: 100
 read_number: 100000
-random_freq: 0.01
-
-cutadapt:
-threep_adapter: "-a ATCGTAGATCGG"
-fivep_adapter: "-A GATGGCGATAGG"
-default: ["-q 10 ", "-m 25 ", "-M 100", "--overlap=5"]
-
-multiqc:
-config: "config/multiqc_config.yml"
+random_reads: 0.01
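After this change, the test config keeps only `samplesheet`, `get_genome` and `simulate_reads`. A simple way to keep a config in sync with the parameter tables in the READMEs is to compare keys. The Python sketch below does that on plain dicts. The dicts are transcribed by hand from this commit, and the function `check_config` is hypothetical. Loading the YAML files is out of scope.

```python
# Hypothetical sketch: compare config keys with the documented parameters.

# Parameters as listed in the updated README tables (None = plain value).
DOCUMENTED = {
    "samplesheet": None,
    "get_genome": {"ncbi_ftp"},
    "simulate_reads": {"read_length", "read_number", "random_freq"},
}

# Hand transcription of the new .test/config/config.yml.
TEST_CONFIG = {
    "samplesheet": "config/samples.tsv",
    "get_genome": {
        "ncbi_ftp": "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/"
        "GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.fna.gz"
    },
    "simulate_reads": {"read_length": 100, "read_number": 100000, "random_reads": 0.01},
}


def check_config(config: dict, documented: dict) -> list[str]:
    """Return readable mismatches between a config and its documentation."""
    problems = []
    for section, keys in documented.items():
        if section not in config:
            problems.append(f"missing section '{section}'")
            continue
        if keys is None:
            continue  # scalar entry such as the samplesheet path
        present = set(config[section])
        for key in sorted(keys - present):
            problems.append(f"{section}: documented key '{key}' not set")
        for key in sorted(present - keys):
            problems.append(f"{section}: key '{key}' not documented")
    for section in sorted(set(config) - set(documented)):
        problems.append(f"section '{section}' not documented")
    return problems


for problem in check_config(TEST_CONFIG, DOCUMENTED):
    print(problem)
```

Run against the values in this commit, the check reports a mismatch: the config files set `random_reads`, while both parameter tables document `random_freq`.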

.test/config/multiqc_config.yml

Lines changed: 0 additions & 2 deletions
This file was deleted.

README.md

Lines changed: 27 additions & 40 deletions
@@ -1,10 +1,9 @@
 # Snakemake workflow: `<name>`

 [![Snakemake](https://img.shields.io/badge/snakemake-≥8.0.0-brightgreen.svg)](https://snakemake.github.io)
-[![GitHub actions status](https://github.com/MPUSP/snakemake-workflow-template/actions/workflows/main.yml/badge.svg?branch=main)](https://github.com/MPUSP/snakemake-workflow-template/actions/workflows/main.yml)
+[![GitHub actions status](https://github.com/snakemake-workflows/snakemake-workflow-template/actions/workflows/main.yml/badge.svg?branch=main)](https://github.com/snakemake-workflows/snakemake-workflow-template/actions/workflows/main.yml)
 [![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
-[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1D355C.svg?labelColor=000000)](https://sylabs.io/docs/)
-[![workflow catalog](https://img.shields.io/badge/Snakemake%20workflow%20catalog-darkgreen)](https://snakemake.github.io/snakemake-workflow-catalog)
+[![workflow catalog](https://img.shields.io/badge/Snakemake%20workflow%20catalog-darkgreen)](https://snakemake.github.io/snakemake-workflow-catalog/docs/workflows/<owner>/<repo>)

 A Snakemake workflow for `<description>`

@@ -21,7 +20,7 @@ A Snakemake workflow for `<description>`

 ## Usage

-The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=<owner>%2F<repo>).
+The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/docs/workflows/<owner>/<repo>).

 If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this repository or its DOI.

@@ -30,11 +29,10 @@ If you use this workflow in a paper, don't forget to give credits to the authors
 This workflow is a best-practice workflow for `<detailed description>`.
 The workflow is built using [snakemake](https://snakemake.readthedocs.io/en/stable/) and consists of the following steps:

-1. Parse sample sheet containing sample meta data (`python`)
+1. Download genome reference from NCBI
 2. Simulate short read sequencing data on the fly (`dwgsim`)
 3. Check quality of input read data (`FastQC`)
-4. Trim adapters from input data (`cutadapt`)
-5. Collect statistics from tool output (`MultiQC`)
+4. Collect statistics from tool output (`MultiQC`)

 ## Running the workflow

@@ -47,7 +45,6 @@ This template workflow creates artificial sequencing data in `*.fastq.gz` format
 | sample1 | wild_type | 1 | sample1.bwa.read1.fastq.gz | sample1.bwa.read2.fastq.gz |
 | sample2 | wild_type | 2 | sample2.bwa.read1.fastq.gz | sample2.bwa.read2.fastq.gz |

-
 ### Execution

 To run the workflow from command line, change the working directory.
@@ -57,49 +54,39 @@ cd path/to/snakemake-workflow-name
 ```

 Adjust options in the default config file `config/config.yml`.
-Before running the entire workflow, you can perform a dry run using:
+Before running the complete workflow, you can perform a dry run using:

 ```bash
 snakemake --dry-run
 ```

-To run the complete workflow with test files using **conda**, execute the following command. The definition of the number of compute cores is mandatory.
+To run the workflow with test files using **conda**:

 ```bash
-snakemake --cores 3 --sdm conda --directory .test
+snakemake --cores 2 --sdm conda --directory .test
 ```

-To run the workflow with **singularity** / **apptainer**, add a link to a container registry in the `Snakefile`, for example:
+To run the workflow with **apptainer** / **singularity**, add a link to a container registry in the `Snakefile`, for example:
 `container: "oras://ghcr.io/<user>/<repository>:<version>"` for Github's container registry. Run the workflow with:

 ```bash
-snakemake --cores 3 --sdm conda apptainer --directory .test
+snakemake --cores 2 --sdm conda apptainer --directory .test
 ```

 ### Parameters

 This table lists all parameters that can be used to run the workflow.

-| parameter | type | details | default |
-| ------------------ | ---- | --------------------------------------- | --------------------------------------------- |
-| **samplesheet** | | | |
-| path | str | path to samplesheet, mandatory | "config/samples.tsv" |
-| **get_genome** | | | |
-| database | str | one of `manual`, `ncbi` | `ncbi` |
-| assembly | str | RefSeq ID | `GCF_000006785.2` |
-| fasta | str | optional path to fasta file | Null |
-| gff | str | optional path to gff file | Null |
-| gff_source_type | str | list of name/value pairs for GFF source | see config file |
-| **simulate_reads** | | | |
-| read_length | num | length of target reads in bp | 100 |
-| read_number | num | number of total reads to be simulated | 100000 |
-| random_freq | num | frequency of random read sequences | 0.01 |
-| **cutadapt** | | | |
-| threep_adapter | str | sequence of the 3' adapter | `-a ATCGTAGATCGG` |
-| fivep_adapter | str | sequence of the 5' adapter | `-A GATGGCGATAGG` |
-| default | str | additional options passed to `cutadapt` | [`-q 10 `, `-m 25 `, `-M 100`, `--overlap=5`] |
-| **multiqc** | | | |
-| config | str | path to multiQC config | `config/multiqc_config.yml` |
+| parameter | type | details | default |
+| ------------------ | ---- | ------------------------------------- | ------------------------------ |
+| **samplesheet** | | | |
+| path | str | path to samplesheet, mandatory | "config/samples.tsv" |
+| **get_genome** | | | |
+| ncbi_ftp | str | link to a genome on NCBI's FTP server | link to _S. cerevisiae_ genome |
+| **simulate_reads** | | | |
+| read_length | num | length of target reads in bp | 100 |
+| read_number | num | number of total reads to be simulated | 100000 |
+| random_freq | num | frequency of random read sequences | 0.01 |

 ## Authors

@@ -110,13 +97,13 @@ This table lists all parameters that can be used to run the workflow.

 ## References

-> Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. *Sustainable data analysis with Snakemake*. F1000Research, 10:33, 10, 33, **2021**. https://doi.org/10.12688/f1000research.29032.2.
+> Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. _Sustainable data analysis with Snakemake_. F1000Research, 10:33, 10, 33, **2021**. https://doi.org/10.12688/f1000research.29032.2.

 ## TODO

-* Replace `<owner>` and `<repo>` everywhere in the template (also under .github/workflows) with the correct `<repo>` name and owning user or organization.
-* Replace `<name>` with the workflow name (can be the same as `<repo>`).
-* Replace `<description>` with a description of what the workflow does.
-* Update the workflow description, parameters, running options, authors and references in the `README.md`
-* Update the `README.md` badges. Add or remove badges for `conda`/`singularity`/`apptainer` usage depending on the workflow's capability
-* The workflow will occur in the snakemake-workflow-catalog once it has been made public. Then the link under "Usage" will point to the usage instructions if `<owner>` and `<repo>` were correctly set.
+- Replace `<owner>` and `<repo>` everywhere in the template with the correct user name/organization, and the repository name. The workflow will be automaticallky added to the [snakemake workflow catalog](https://snakemake.github.io/snakemake-workflow-catalog/index.html) once it is publicly available on Github.
+- Replace `<name>` with the workflow name (can be the same as `<repo>`).
+- Replace `<description>` with a description of what the workflow does.
+- Update the [workflow overview](#running-the-workflow), and [running instructions](#running-the-workflow) including parameters, deployment, authors and references
+- Update the `README.md` badges. Add or remove badges for `conda`/`singularity`/`apptainer` usage depending on the workflow's [deployment options](#execution)
+- Do not forget to also adjust the configuration-specific `config/README.md` file
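The TODO list asks template users to replace `<owner>`, `<repo>`, `<name>` and `<description>` throughout the repository. A small Python script is one way to do that. The sketch below is not part of the template, and several of its details are hypothetical: the function names, the file types it touches, and the example values. It matches only those four exact placeholders. `<detailed description>` and the container placeholders (`<user>`, `<repository>`, `<version>`) are therefore left for manual editing.

```python
# Hypothetical sketch: fill the template placeholders named in the TODO list.
import re
from pathlib import Path

# Only the four placeholders listed in the TODO section are matched.
PLACEHOLDER = re.compile(r"<(owner|repo|name|description)>")


def fill_placeholders(text: str, values: dict[str, str]) -> str:
    """Replace known placeholders; keep any placeholder that has no value."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)


def fill_tree(root: str, values: dict[str, str]) -> list[Path]:
    """Rewrite matching files under `root` in place and return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Assumed file selection: Markdown and YAML files plus the Snakefile.
        if path.suffix not in {".md", ".yml", ".yaml"} and path.name != "Snakefile":
            continue
        original = path.read_text(encoding="utf-8")
        updated = fill_placeholders(original, values)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed


if __name__ == "__main__":
    line = "https://snakemake.github.io/snakemake-workflow-catalog/docs/workflows/<owner>/<repo>"
    print(fill_placeholders(line, {"owner": "my-lab", "repo": "my-workflow"}))
```

Running `fill_tree(".", {...})` from the repository root would apply the same substitution to every selected file. Review the result with `git diff` before committing.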

config/README.md

Lines changed: 17 additions & 36 deletions
@@ -3,11 +3,10 @@
 This workflow is a best-practice workflow for `<detailed description>`.
 The workflow is built using [snakemake](https://snakemake.readthedocs.io/en/stable/) and consists of the following steps:

-1. Parse sample sheet containing sample meta data (`python`)
+1. Download genome reference from NCBI
 2. Simulate short read sequencing data on the fly (`dwgsim`)
 3. Check quality of input read data (`FastQC`)
-4. Trim adapters from input data (`cutadapt`)
-5. Collect statistics from tool output (`MultiQC`)
+4. Collect statistics from tool output (`MultiQC`)

 ## Running the workflow

@@ -20,7 +19,6 @@ This template workflow creates artificial sequencing data in `*.fastq.gz` format
 | sample1 | wild_type | 1 | sample1.bwa.read1.fastq.gz | sample1.bwa.read2.fastq.gz |
 | sample2 | wild_type | 2 | sample2.bwa.read1.fastq.gz | sample2.bwa.read2.fastq.gz |

-
 ### Execution

 To run the workflow from command line, change the working directory.
@@ -30,53 +28,36 @@ cd path/to/snakemake-workflow-name
 ```

 Adjust options in the default config file `config/config.yml`.
-Before running the entire workflow, you can perform a dry run using:
+Before running the complete workflow, you can perform a dry run using:

 ```bash
 snakemake --dry-run
 ```

-To run the complete workflow with test files using **conda**, execute the following command. The definition of the number of compute cores is mandatory.
+To run the workflow with test files using **conda**:

 ```bash
-snakemake --cores 3 --sdm conda --directory .test
+snakemake --cores 2 --sdm conda --directory .test
 ```

-To run the workflow with **singularity** / **apptainer**, add a link to a container registry in the `Snakefile`, for example:
+To run the workflow with **apptainer** / **singularity**, add a link to a container registry in the `Snakefile`, for example:
 `container: "oras://ghcr.io/<user>/<repository>:<version>"` for Github's container registry. Run the workflow with:

 ```bash
-snakemake --cores 3 --sdm conda apptainer --directory .test
+snakemake --cores 2 --sdm conda apptainer --directory .test
 ```

 ### Parameters

 This table lists all parameters that can be used to run the workflow.

-| parameter | type | details | default |
-| ------------------ | ---- | --------------------------------------- | --------------------------------------------- |
-| **samplesheet** | | | |
-| path | str | path to samplesheet, mandatory | "config/samples.tsv" |
-| **get_genome** | | | |
-| database | str | one of `manual`, `ncbi` | `ncbi` |
-| assembly | str | RefSeq ID | `GCF_000006785.2` |
-| fasta | str | optional path to fasta file | Null |
-| gff | str | optional path to gff file | Null |
-| gff_source_type | str | list of name/value pairs for GFF source | see config file |
-| **simulate_reads** | | | |
-| read_length | num | length of target reads in bp | 100 |
-| read_number | num | number of total reads to be simulated | 100000 |
-| random_freq | num | frequency of random read sequences | 0.01 |
-| **cutadapt** | | | |
-| threep_adapter | str | sequence of the 3' adapter | `-a ATCGTAGATCGG` |
-| fivep_adapter | str | sequence of the 5' adapter | `-A GATGGCGATAGG` |
-| default | str | additional options passed to `cutadapt` | [`-q 10 `, `-m 25 `, `-M 100`, `--overlap=5`] |
-| **multiqc** | | | |
-| config | str | path to multiQC config | `config/multiqc_config.yml` |
-
-## TODO
-
-* Replace `<owner>` and `<repo>` everywhere in the template (also under .github/workflows) with the correct `<repo>` name and owning user or organization.
-* Replace `<name>` with the workflow name (can be the same as `<repo>`).
-* Replace `<description>` with a description of what the workflow does.
-* Update the workflow parameters and running options
+| parameter | type | details | default |
+| ------------------ | ---- | ------------------------------------- | ------------------------------ |
+| **samplesheet** | | | |
+| path | str | path to samplesheet, mandatory | "config/samples.tsv" |
+| **get_genome** | | | |
+| ncbi_ftp | str | link to a genome on NCBI's FTP server | link to _S. cerevisiae_ genome |
+| **simulate_reads** | | | |
+| read_length | num | length of target reads in bp | 100 |
+| read_number | num | number of total reads to be simulated | 100000 |
+| random_freq | num | frequency of random read sequences | 0.01 |

config/config.yml

Lines changed: 2 additions & 20 deletions
@@ -1,27 +1,9 @@
 samplesheet: ".test/config/samples.tsv"

 get_genome:
-database: "ncbi"
-assembly: "GCF_000006785.2"
-fasta: Null
-gff: Null
-gff_source_type:
-[
-"RefSeq": "gene",
-"RefSeq": "pseudogene",
-"RefSeq": "CDS",
-"Protein Homology": "CDS",
-]
+ncbi_ftp: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.fna.gz

 simulate_reads:
 read_length: 100
 read_number: 100000
-random_freq: 0.01
-
-cutadapt:
-threep_adapter: "-a ATCGTAGATCGG"
-fivep_adapter: "-A GATGGCGATAGG"
-default: ["-q 10 ", "-m 25 ", "-M 100", "--overlap=5"]
-
-multiqc:
-config: "config/multiqc_config.yml"
+random_reads: 0.01
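In both config files, the `get_genome` section now holds a single `ncbi_ftp` URL instead of a database/assembly pair. The Snakefile is not shown on this page, so it is not visible how the workflow names the downloaded file. The Python sketch below derives a local file name and the assembly accession from such a URL using only the standard library. Both helper names are hypothetical, and the `.fna.gz` check assumes a genomic FASTA link like the default one.

```python
# Hypothetical sketch: derive names from the `ncbi_ftp` config value.
from pathlib import PurePosixPath
from urllib.parse import urlparse


def genome_filename(ncbi_ftp: str) -> str:
    """Return the last path component of the URL (the gzipped FASTA file)."""
    name = PurePosixPath(urlparse(ncbi_ftp).path).name
    if not name.endswith(".fna.gz"):
        raise ValueError(f"expected a gzipped FASTA link, got {name!r}")
    return name


def assembly_accession(ncbi_ftp: str) -> str:
    """Return the accession prefix, e.g. 'GCF_000146045.2'."""
    prefix, number = genome_filename(ncbi_ftp).split("_")[:2]
    return f"{prefix}_{number}"


if __name__ == "__main__":
    url = (
        "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/"
        "GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.fna.gz"
    )
    print(genome_filename(url))     # GCF_000146045.2_R64_genomic.fna.gz
    print(assembly_accession(url))  # GCF_000146045.2
```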

config/multiqc_config.yml

Lines changed: 0 additions & 2 deletions
This file was deleted.
