
Commit 9f1d5d0

Merge pull request #54 from harvardinformatics/snakemake-plugin

Snakemake plugin

2 parents: ef8074a + 90472de

6 files changed: +77 −5 lines

docs/faq/index.md

Lines changed: 7 additions & 0 deletions
@@ -22,6 +22,7 @@
[Research Computing](https://www.rc.fas.harvard.edu/) manages the Cannon cluster, among other things, and provides advice and support on HPC and related hardware and software questions. The Informatics Group supports specific software and analysis needs, including providing support for core facility software via the Software Operations group, and providing training, consultation, and collaborative project work for bioinformatics needs through the Bioinformatics group.

You can contact Research Computing via their [contact page](https://www.rc.fas.harvard.edu/about/contact/) for any questions related to HPC hardware or software environments. You can contact FAS Informatics for questions related to bioinformatics support via our [contact page](../contact.md).
+
??? question "How can I know about future workshops?"

    ##### How can I know about future workshops?
@@ -76,6 +77,12 @@
    For all questions, you can use the [contact form](../contact.md). For possibly quicker answers, you can try our public Slack channel (FAS Bioinformatics Public). For hands-on help, come to our office hours in Northwest Labs B227 (see [Events](../events.md) for times).

+??? question "How can I run a Snakemake workflow on the Cannon cluster?"
+
+    ##### Snakemake on the Cannon cluster
+
+    We have developed a [Snakemake plugin for the Cannon cluster](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/cannon.html), based on the [generic SLURM plugin](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html). See [the documentation](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/cannon.html) for information on how to install and use it, and feel free to report [issues or questions on the GitHub repo](https://github.com/harvardinformatics/snakemake-executor-plugin-cannon).
+
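As a minimal sketch of that flow (the job count, workflow file, and config file names below are placeholders, not values from the plugin docs):

```bash
# Install the Cannon executor plugin into your Snakemake environment
mamba install bioconda::snakemake-executor-plugin-cannon

# Run a workflow with the Cannon executor; -j caps simultaneously submitted jobs
snakemake -j 10 -e cannon -s my_workflow.smk --configfile my-config.yml --dryrun
```

Dropping `--dryrun` submits the jobs for real.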
## Bauer Core Sequencing
docs/resources/Tutorials/add-outgroup-to-whole-genome-alignment-cactus.md

Lines changed: 14 additions & 1 deletion
@@ -65,6 +65,18 @@ If the help menu displays, you already have Singularity installed. If not, you w
mamba install conda-forge::singularity
```

+!!! tip "Cannon cluster Snakemake plugin"
+
+    If you are on the Harvard Cannon cluster, you can use our Cannon-specific plugin, [snakemake-executor-plugin-cannon](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/cannon.html), instead of the generic snakemake-executor-plugin-slurm. It enables *automatic partition selection* based on the requested resources. Install it in your environment with:
+
+    ```bash
+    mamba install bioconda::snakemake-executor-plugin-cannon
+    ```
+
+    Then, when running the workflow, specify the Cannon executor with `-e cannon` instead of `-e slurm`.
+
+    If you are not on the Harvard Cannon cluster, stick with the generic SLURM plugin. You will just need to specify the partitions for each rule directly in the config file ([see below](#specifying-resources-for-each-rule)).
+
### Downloading the cactus-snakemake pipeline

The [pipeline](https://github.com/harvardinformatics/cactus-snakemake/) is currently available on GitHub. You can install it on the Harvard cluster or any computer that has `git` installed by navigating to the directory in which you want to download it and doing one of the following:
@@ -245,6 +257,7 @@ rule_resources:
* **Allocate the proper partitions based on `use_gpu`.** If you want to use the GPU version of cactus (*i.e.* you have set `use_gpu: True` in the config file), the partition for the rule **blast** must be GPU-enabled. If not, the pipeline will fail to run.
* The `blast: gpus:` option will be ignored if `use_gpu: False` is set.
* **mem is in MB** and **time is in minutes**.
+* **If using the [snakemake-executor-plugin-cannon](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/cannon.html) on the Harvard Cannon cluster, you can leave the `partition:` fields blank and a partition will be selected automatically based on the other requested resources.**

You will have to determine the proper resource usage for your dataset. Generally, the larger the genomes, the more time and memory each job will need, and the more you will benefit from providing more CPUs and GPUs.
@@ -270,7 +283,7 @@ snakemake -j <# of jobs to submit simultaneously> -e slurm -s </path/to/cactus_a
| ------------------------------------------------- | ----------- |
| `snakemake` | The call to the snakemake workflow program to execute the workflow. |
| `-j <# of jobs to submit simultaneously>` | The maximum number of jobs that will be submitted to your SLURM cluster at one time. |
-| `-e slurm` | Specify to use the SLURM executor plugin. See: [Getting started](#getting-started). |
+| `-e slurm` | Specify the SLURM executor plugin; use `-e cannon` instead if using the Cannon-specific plugin. See: [Getting started](#getting-started). |
| `-s </path/to/cactus_add_outgroup.smk>` | The path to the workflow file. |
| `--configfile <path/to/your/snakemake-config.yml>` | The path to your config file. See: [Preparing the Snakemake config file](#preparing-the-snakemake-config-file). |
| `--dryrun` | Do not execute anything, just display what would be done. |
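Putting the table together, a hypothetical dry run on the Cannon cluster might look like this (the job count and file paths are placeholders; use `-e slurm` instead if you are not using the Cannon plugin):

```bash
# Preview the jobs that would be submitted, without running anything
snakemake -j 25 -e cannon -s cactus_add_outgroup.smk --configfile snakemake-config.yml --dryrun
```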

docs/resources/Tutorials/add-to-whole-genome-alignment-cactus.md

Lines changed: 14 additions & 1 deletion
@@ -65,6 +65,18 @@ If the help menu displays, you already have Singularity installed. If not, you w
mamba install conda-forge::singularity
```

+!!! tip "Cannon cluster Snakemake plugin"
+
+    If you are on the Harvard Cannon cluster, you can use our Cannon-specific plugin, [snakemake-executor-plugin-cannon](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/cannon.html), instead of the generic snakemake-executor-plugin-slurm. It enables *automatic partition selection* based on the requested resources. Install it in your environment with:
+
+    ```bash
+    mamba install bioconda::snakemake-executor-plugin-cannon
+    ```
+
+    Then, when running the workflow, specify the Cannon executor with `-e cannon` instead of `-e slurm`.
+
+    If you are not on the Harvard Cannon cluster, stick with the generic SLURM plugin. You will just need to specify the partitions for each rule directly in the config file ([see below](#specifying-resources-for-each-rule)).
+
### Downloading the cactus-snakemake pipeline

The [pipeline](https://github.com/harvardinformatics/cactus-snakemake/) is currently available on GitHub. You can install it on the Harvard cluster or any computer that has `git` installed by navigating to the directory in which you want to download it and doing one of the following:
@@ -257,6 +269,7 @@ rule_resources:
* **Allocate the proper partitions based on `use_gpu`.** If you want to use the GPU version of cactus (*i.e.* you have set `use_gpu: True` in the config file), the partition for the rule **blast** must be GPU-enabled. If not, the pipeline will fail to run.
* The `blast: gpus:` option will be ignored if `use_gpu: False` is set.
* **mem is in MB** and **time is in minutes**.
+* **If using the [snakemake-executor-plugin-cannon](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/cannon.html) on the Harvard Cannon cluster, you can leave the `partition:` fields blank and a partition will be selected automatically based on the other requested resources.**

You will have to determine the proper resource usage for your dataset. Generally, the larger the genomes, the more time and memory each job will need, and the more you will benefit from providing more CPUs and GPUs.
@@ -282,7 +295,7 @@ snakemake -j <# of jobs to submit simultaneously> -e slurm -s </path/to/cactus_u
| ------------------------------------------------- | ----------- |
| `snakemake` | The call to the snakemake workflow program to execute the workflow. |
| `-j <# of jobs to submit simultaneously>` | The maximum number of jobs that will be submitted to your SLURM cluster at one time. |
-| `-e slurm` | Specify to use the SLURM executor plugin. See: [Getting started](#getting-started). |
+| `-e slurm` | Specify the SLURM executor plugin; use `-e cannon` instead if using the Cannon-specific plugin. See: [Getting started](#getting-started). |
| `-s </path/to/cactus_update.smk>` | The path to the workflow file. |
| `--configfile <path/to/your/snakemake-config.yml>` | The path to your config file. See: [Preparing the Snakemake config file](#preparing-the-snakemake-config-file). |
| `--dryrun` | Do not execute anything, just display what would be done. |
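For example, with placeholder values for the job count and config path, the two executor choices described above would look like:

```bash
# On the Harvard Cannon cluster, with the Cannon plugin installed
snakemake -j 20 -e cannon -s cactus_update.smk --configfile snakemake-config.yml --dryrun

# On any other SLURM cluster, with partitions set per rule in the config
snakemake -j 20 -e slurm -s cactus_update.smk --configfile snakemake-config.yml --dryrun
```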

docs/resources/Tutorials/pangenome-cactus-minigraph.md

Lines changed: 14 additions & 1 deletion
@@ -58,6 +58,18 @@ If the help menu displays, you already have Singularity installed. If not, you w
mamba install conda-forge::singularity
```

+!!! tip "Cannon cluster Snakemake plugin"
+
+    If you are on the Harvard Cannon cluster, you can use our Cannon-specific plugin, [snakemake-executor-plugin-cannon](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/cannon.html), instead of the generic snakemake-executor-plugin-slurm. It enables *automatic partition selection* based on the requested resources. Install it in your environment with:
+
+    ```bash
+    mamba install bioconda::snakemake-executor-plugin-cannon
+    ```
+
+    Then, when running the workflow, specify the Cannon executor with `-e cannon` instead of `-e slurm`.
+
+    If you are not on the Harvard Cannon cluster, stick with the generic SLURM plugin. You will just need to specify the partitions for each rule directly in the config file ([see below](#specifying-resources-for-each-rule)).
+
### Downloading the cactus-snakemake pipeline

The [pipeline](https://github.com/harvardinformatics/cactus-snakemake/) is currently available on GitHub. You can install it on the Harvard cluster or any computer that has `git` installed by navigating to the directory in which you want to download it and doing one of the following:
@@ -175,6 +187,7 @@ rule_resources:
* Be sure to use partition names appropriate for your cluster. Several examples in this tutorial have partition names that are specific to the Harvard cluster, so be sure to change them.
* The steps in the cactus-minigraph pipeline are not GPU compatible, so there are no GPU options in this pipeline.
* **mem_mb is in MB** and **time is in minutes**.
+* **If using the [snakemake-executor-plugin-cannon](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/cannon.html) on the Harvard Cannon cluster, you can leave the `partition:` fields blank and a partition will be selected automatically based on the other requested resources.**

You will have to determine the proper resource usage for your dataset. Generally, the larger the genomes, the more time and memory each job will need, and the more you will benefit from providing more CPUs.
@@ -198,7 +211,7 @@ snakemake -j <# of jobs to submit simultaneously> -e slurm -s </path/to/cactus_m
| ------------------------------------------------- | ----------- |
| `snakemake` | The call to the snakemake workflow program to execute the workflow. |
| `-j <# of jobs to submit simultaneously>` | The maximum number of jobs that will be submitted to your SLURM cluster at one time. |
-| `-e slurm` | Specify to use the SLURM executor plugin. See: [Getting started](#getting-started). |
+| `-e slurm` | Specify the SLURM executor plugin; use `-e cannon` instead if using the Cannon-specific plugin. See: [Getting started](#getting-started). |
| `-s </path/to/cactus_minigraph.smk>` | The path to the workflow file. |
| `--configfile <path/to/your/snakemake-config.yml>` | The path to your config file. See: [Preparing the Snakemake config file](#preparing-the-snakemake-config-file). |
| `--dryrun` | Do not execute anything, just display what would be done. |
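For instance (the job count and paths below are placeholders), a dry run of the pangenome pipeline with the Cannon plugin would be:

```bash
# No GPU flags here: the cactus-minigraph steps are not GPU compatible
snakemake -j 15 -e cannon -s cactus_minigraph.smk --configfile snakemake-config.yml --dryrun
```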

docs/resources/Tutorials/replace-genome-whole-genome-alignment-cactus.md

Lines changed: 14 additions & 1 deletion
@@ -65,6 +65,18 @@ If the help menu displays, you already have Singularity installed. If not, you w
mamba install conda-forge::singularity
```

+!!! tip "Cannon cluster Snakemake plugin"
+
+    If you are on the Harvard Cannon cluster, you can use our Cannon-specific plugin, [snakemake-executor-plugin-cannon](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/cannon.html), instead of the generic snakemake-executor-plugin-slurm. It enables *automatic partition selection* based on the requested resources. Install it in your environment with:
+
+    ```bash
+    mamba install bioconda::snakemake-executor-plugin-cannon
+    ```
+
+    Then, when running the workflow, specify the Cannon executor with `-e cannon` instead of `-e slurm`.
+
+    If you are not on the Harvard Cannon cluster, stick with the generic SLURM plugin. You will just need to specify the partitions for each rule directly in the config file ([see below](#specifying-resources-for-each-rule)).
+
### Downloading the cactus-snakemake pipeline

The [pipeline](https://github.com/harvardinformatics/cactus-snakemake/) is currently available on GitHub. You can install it on the Harvard cluster or any computer that has `git` installed by navigating to the directory in which you want to download it and doing one of the following:
@@ -226,6 +238,7 @@ rule_resources:
* **Allocate the proper partitions based on `use_gpu`.** If you want to use the GPU version of cactus (*i.e.* you have set `use_gpu: True` in the config file), the partition for the rule **blast** must be GPU-enabled. If not, the pipeline will fail to run.
* The `blast: gpus:` option will be ignored if `use_gpu: False` is set.
* **mem is in MB** and **time is in minutes**.
+* **If using the [snakemake-executor-plugin-cannon](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/cannon.html) on the Harvard Cannon cluster, you can leave the `partition:` fields blank and a partition will be selected automatically based on the other requested resources.**

You will have to determine the proper resource usage for your dataset. Generally, the larger the genomes, the more time and memory each job will need, and the more you will benefit from providing more CPUs and GPUs.
@@ -251,7 +264,7 @@ snakemake -j <# of jobs to submit simultaneously> -e slurm -s </path/to/cactus_r
| ------------------------------------------------- | ----------- |
| `snakemake` | The call to the snakemake workflow program to execute the workflow. |
| `-j <# of jobs to submit simultaneously>` | The maximum number of jobs that will be submitted to your SLURM cluster at one time. |
-| `-e slurm` | Specify to use the SLURM executor plugin. See: [Getting started](#getting-started). |
+| `-e slurm` | Specify the SLURM executor plugin; use `-e cannon` instead if using the Cannon-specific plugin. See: [Getting started](#getting-started). |
| `-s </path/to/cactus_update.smk>` | The path to the workflow file. |
| `--configfile <path/to/your/snakemake-config.yml>` | The path to your config file. See: [Preparing the Snakemake config file](#preparing-the-snakemake-config-file). |
| `--dryrun` | Do not execute anything, just display what would be done. |
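A hypothetical invocation with placeholder values, assuming the Cannon plugin from the tip above is installed:

```bash
# Preview the submission plan before running anything
snakemake -j 20 -e cannon -s cactus_update.smk --configfile snakemake-config.yml --dryrun
```

Rerun without `--dryrun` once the plan looks right.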

docs/resources/Tutorials/whole-genome-alignment-cactus.md

Lines changed: 14 additions & 1 deletion
@@ -57,6 +57,18 @@ If the help menu displays, you already have Singularity installed. If not, you w
mamba install conda-forge::singularity
```

+!!! tip "Cannon cluster Snakemake plugin"
+
+    If you are on the Harvard Cannon cluster, you can use our Cannon-specific plugin, [snakemake-executor-plugin-cannon](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/cannon.html), instead of the generic snakemake-executor-plugin-slurm. It enables *automatic partition selection* based on the requested resources. Install it in your environment with:
+
+    ```bash
+    mamba install bioconda::snakemake-executor-plugin-cannon
+    ```
+
+    Then, when running the workflow, specify the Cannon executor with `-e cannon` instead of `-e slurm`.
+
+    If you are not on the Harvard Cannon cluster, stick with the generic SLURM plugin. You will just need to specify the partitions for each rule directly in the config file ([see below](#specifying-resources-for-each-rule)).
+
### Downloading the cactus-snakemake pipeline

The [pipeline](https://github.com/harvardinformatics/cactus-snakemake/) is currently available on GitHub. You can install it on the Harvard cluster or any computer that has `git` installed by navigating to the directory in which you want to download it and doing one of the following:
@@ -194,6 +206,7 @@ rule_resources:
* **Allocate the proper partitions based on `use_gpu`.** If you want to use the GPU version of cactus (*i.e.* you have set `use_gpu: True` in the config file), the partition for the rule **blast** must be GPU-enabled. If not, the pipeline will fail to run.
* The `blast: gpus:` option will be ignored if `use_gpu: False` is set.
* **mem_mb is in MB** and **time is in minutes**.
+* **If using the [snakemake-executor-plugin-cannon](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/cannon.html) on the Harvard Cannon cluster, you can leave the `partition:` fields blank and a partition will be selected automatically based on the other requested resources.**

You will have to determine the proper resource usage for your dataset. Generally, the larger the genomes, the more time and memory each job will need, and the more you will benefit from providing more CPUs and GPUs.
@@ -257,7 +270,7 @@ snakemake -j <# of jobs to submit simultaneously> -e slurm -s </path/to/cactus.s
| ------------------------------------------------- | ----------- |
| `snakemake` | The call to the snakemake workflow program to execute the workflow. |
| `-j <# of jobs to submit simultaneously>` | The maximum number of jobs that will be submitted to your SLURM cluster at one time. |
-| `-e slurm` | Specify to use the SLURM executor plugin. See: [Getting started](#getting-started). |
+| `-e slurm` | Specify the SLURM executor plugin; use `-e cannon` instead if using the Cannon-specific plugin. See: [Getting started](#getting-started). |
| `-s </path/to/cactus.smk>` | The path to the workflow file. |
| `--configfile <path/to/your/snakemake-config.yml>` | The path to your config file. See: [Preparing the Snakemake config file](#preparing-the-snakemake-config-file). |
| `--dryrun` | Do not execute anything, just display what would be done. |
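For example (placeholder job count and config path; substitute `-e slurm` on non-Cannon clusters):

```bash
# Dry run first, then rerun without --dryrun to actually submit the jobs
snakemake -j 25 -e cannon -s cactus.smk --configfile snakemake-config.yml --dryrun
```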
