Commit 94ad39a

Merge pull request #51 from harvardinformatics/cactus-replace-tutorial
Adding cactus replace tutorial
2 parents b3b7bab + 34bb997 commit 94ad39a

9 files changed, +478 -48 lines changed

docs/resources/Tutorials/add-to-whole-genome-alignment-cactus.md

Lines changed: 26 additions & 22 deletions
@@ -159,7 +159,7 @@ In order to run the last step of the workflow that converts the HAL format to a
 
 ??? info "The Cactus update input file"
 
-The various Cactus commands depend on a single input file with information about the genomes to align. This file is automatically generated by the pipeline at `[output_dir]/cactus-update-input.txt`. This file is a simple tab delimited file and should contains line:
+The various Cactus commands depend on a single input file with information about the genomes to align. This file is automatically generated by the pipeline at `[output_dir]/cactus-update-input.txt`. This file is a simple tab delimited file and should contain one line:
 
 ```
 [tip label to add to tree] [path/to/genome/fasta.file] [new branch length to add to the tree]
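For instance, for the test dataset that adds the Gorilla genome, the generated tab-delimited line might look something like the sketch below (the FASTA path and branch length are hypothetical placeholders):

```
Gorilla    genomes/gorilla.fa    0.008
```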
@@ -176,18 +176,20 @@ In order to run the last step of the workflow that converts the HAL format to a
 Once you have all the information listed above, you can enter it into the Snakemake configuration file along with some other information to know where to look for files and write output. The config file contains 2 sections, one for specifying the input and output options, and one for specifying resources for the various rules (see [below](#specifying-resources-for-each-rule)). The first part should look something like this:
 
 ```
-cactus_path: <path/to/cactus-singularity-image OR download>
+cactus_path: <path/to/cactus-singularity-image OR download OR a version string (e.g. 2.9.5)>
 
-cactus_gpu_path: <path/to/cactus-GPU-singularity-image OR download>
+cactus_gpu_path: <path/to/cactus-GPU-singularity-image OR download OR a version string (e.g. 2.9.5)>
 
 input_hal: <path/to/hal-file>
 
 new_genome_name: <tip label of new genome>
 
-new_branch_length: <new branch length to connect the new genome to the tree with>
+new_genome_fasta: <path/to/new/genome.fasta>
 
 new_anc_node: <label for new ancestral node connected to new genome>
 
+new_branch_length: <new branch length to connect the new genome to the tree with>
+
 parent_node: <parent node of existing branch>
 
 child_node: <child node of existing branch>
@@ -215,8 +217,9 @@ Simply replace the string surrounded by <> with the path or option desired. Belo
 | `cactus_gpu_path` | Path to the Cactus GPU Singularity image. If blank or 'download', the image of the latest Cactus version will be downloaded and used. If a version string is provided (e.g. 2.9.5), then that version will be downloaded and used. This will only be used if `use_gpu` is True. |
 | `input_hal` | Path to the previously generated HAL file to which you want to add a new genome. |
 | `new_genome_name` | The label to give your new genome in the tree and HAL file. |
-| `new_branch_length` | The length of the branch to create that will connect your new genome to the tree. |
+| `new_genome_fasta` | The path to the FASTA file containing the new genome to add to the alignment. |
 | `new_anc_node` | The name to give the new ancestral node that connects the new genome to an existing branch in the tree. |
+| `new_branch_length` | The length of the branch to create that will connect your new genome to the tree. |
 | `parent_node` | The name of the ancestral node of the existing branch to which the new branch is connected. |
 | `child_node` | The name of the descendant node of the existing branch to which the new branch is connected. |
 | `top_branch_length` | The existing branch defined by `parent_node` and `child_node` will be split by `new_anc_node`. This is the length of the new, top-most branch created by the split (defined by `parent_node` and `new_anc_node`). |
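As a rough orientation, a filled-in version of the input section above might look like the sketch below. Every value here is a hypothetical placeholder, and other keys in the config (such as `use_gpu`, `tmp_dir`, or the output settings) are omitted:

```
cactus_path: download                 # or a path to a local Cactus Singularity image, or a version string like 2.9.5
cactus_gpu_path: download
input_hal: primates.hal               # hypothetical existing alignment
new_genome_name: Gorilla              # tip label for the genome being added
new_genome_fasta: genomes/gorilla.fa  # hypothetical path to the new genome FASTA
new_anc_node: AncGorilla              # hypothetical label for the new ancestral node
new_branch_length: 0.008              # hypothetical branch length for the new tip
parent_node: Anc0                     # hypothetical parent node of the branch being split
child_node: Anc1                      # hypothetical child node of the branch being split
top_branch_length: 0.004              # hypothetical length of the parent_node-to-new_anc_node branch
```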
@@ -231,18 +234,19 @@ Simply replace the string surrounded by <> with the path or option desired. Belo
 Below these options in the config file are further options for specifying resource usage for each rule that the pipeline will run. For example:
 
 ```
-preprocess_partition: "shared"
-preprocess_cpu: 8
-preprocess_mem: 25000 # in MB
-preprocess_time: 30 # in minutes
-
-##########################
-
-blast_partition: "gpu_test" # If use_gpu is True, this must be a partition with GPUs
-blast_gpu: 1 # If use_gpu is False, this will be ignored
-blast_cpu: 48
-blast_mem: 50000 # in MB
-blast_time: 120 # in minutes
+rule_resources:
+  preprocess:
+    partition: shared
+    mem_mb: 25000
+    cpus: 8
+    time: 30
+
+  blast:
+    partition: gpu # If use_gpu is True, this must be a partition with GPUs
+    mem_mb: 50000
+    cpus: 48
+    gpus: 1 # If use_gpu is False, this will be ignored
+    time: 30
 ```
 
 **The rule _blast_ is the only one that uses GPUs if `use_gpu` is True.**
@@ -251,7 +255,7 @@ blast_time: 120 # in minutes
 
 * Be sure to use partition names appropriate your cluster. Several examples in this tutorial have partition names that are specific to the Harvard cluster, so be sure to change them.
 * **Allocate the proper partitions based on `use_gpu`.** If you want to use the GPU version of cactus (*i.e.* you have set `use_gpu: True` in the config file), the partition for the rule **blast** must be GPU enabled. If not, the pipeline will fail to run.
-* The `blast_gpu:` option will be ignored if `use_gpu: False` is set.
+* The `blast: gpus:` option will be ignored if `use_gpu: False` is set.
 * **mem is in MB** and **time is in minutes**.
 
 You will have to determine the proper resource usage for your dataset. Generally, the larger the genomes, the more time and memory each job will need, and the more you will benefit from providing more CPUs and GPUs.
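To illustrate the GPU notes above: when `use_gpu: False`, the **blast** rule runs on a regular CPU partition and its `gpus` setting is simply ignored. A hedged sketch of that configuration (the partition name and values are hypothetical and cluster-specific):

```
rule_resources:
  blast:
    partition: shared   # any CPU partition works when use_gpu is False
    mem_mb: 50000
    cpus: 48
    gpus: 1             # ignored because use_gpu is False
    time: 120
```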
@@ -269,7 +273,7 @@ First, we want to make sure everything is setup properly by using the `--dryrun`
 This is done with the following command, changing the snakefile `-s` and `--configfile` paths to the one you have created for your project:
 
 ```bash
-snakemake -j <# of jobs to submit simultaneously> -e slurm -s </path/to/cactus_uodate.smk> --configfile <path/to/your/snakmake-config.yml> --dryrun
+snakemake -j <# of jobs to submit simultaneously> -e slurm -s </path/to/cactus_update.smk> --configfile <path/to/your/snakmake-config.yml> --dryrun
 ```
 
 ??? info "Command breakdown"
@@ -319,7 +323,7 @@ If you see any red text, that likely means an error has occurred that must be ad
 If you're satisfied that the `--dryrun` has completed successfully and you are ready to start submitting Cactus jobs to the cluster, you can do so by simply removing the `--dryrun` option from the command above:
 
 ```bash
-snakemake -j <# of jobs to submit simultaneously> -e slurm -s </path/to/cactus.smk> --configfile <path/to/your/snakmake-config.yml>
+snakemake -j <# of jobs to submit simultaneously> -e slurm -s </path/to/cactus_update.smk> --configfile <path/to/your/snakmake-config.yml>
 ```
 
 This will start submitting jobs to SLURM. On your screen, you will see continuous updates regarding job status in blue text. In another terminal, you can also check on the status of your jobs by running `squeue -u <your user id>`.
@@ -351,7 +355,7 @@ Here is a breakdown of the files so you can investigate them and prepare similar
 
 You will first need to [run the test to generate the HAL file](whole-genome-alignment-cactus.md#test-dataset). Then, you can add the Gorilla sequence to it using this pipeline. We recommend running this test dataset before setting up your own project.
 
-First, open the config file, `tests/evolverMammals/evolverMammals-update-cfg.yaml` and make sure the partitions are set appropriately for your cluster. For this small test dataset, it is appropriate to use any "test" partitions you may have. Then, update the path to `tmp_dir` to point to a location where you have a lot of temporary space. Even this small dataset will fail if this directory does not have enough space.
+After you've generated the HAL file, to add a genome, open the config file, `tests/evolverMammals/evolverMammals-update-cfg.yaml` and make sure the partitions are set appropriately for your cluster. For this small test dataset, it is appropriate to use any "test" partitions you may have. Then, update the path to `tmp_dir` to point to a location where you have a lot of temporary space. Even this small dataset will fail if this directory does not have enough space.
 
 After that, run a dryrun of the test dataset by changing into the `tests/evolverMammals` directory and running:
 
@@ -372,7 +376,7 @@ The pipeline will output a [.paf](https://github.com/lh3/miniasm/blob/master/PAF
 
 The final alignment will also be presented in MAF format as `<final_prefix>.<maf_reference>.maf`, again where `<maf_reference>` is whatever you set in the Snakemake config. This file will include all sequences. Another MAF file, `<final_prefix>.<maf_reference>.nodupes.maf` will also be generated, which is the alignment in MAF format with no duplicate sequences. The de-duplicated MAF file is generated with `--dupeMode single`. See the [Cactus documentation regarding MAF export](https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/progressive.md#maf-export) for more info.
 
-A suit of tools called [HAL tools](https://github.com/ComparativeGenomicsToolkit/Hal) is included with the Cactus singularity image if you need to manipulate or analyze .hal files. There are many tools for manipulating MAF files, though they are not always easy to use. The makers of Cactus also develop [taffy](https://github.com/ComparativeGenomicsToolkit/taffy), which can manipulate MAF files by converting them to TAF files.
+A suite of tools called [HAL tools](https://github.com/ComparativeGenomicsToolkit/Hal) is included with the Cactus singularity image if you need to manipulate or analyze .hal files. There are many tools for manipulating MAF files, though they are not always easy to use. The makers of Cactus also develop [taffy](https://github.com/ComparativeGenomicsToolkit/taffy), which can manipulate MAF files by converting them to TAF files.
 
 ## Questions/troubleshooting
 
docs/resources/Tutorials/how-to-annotate-a-genome.md

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ While the first of these points is dependent upon your research program, the sec
 Below is a decision tree for picking an annotation method, based upon our evaluation of the performance of 12 different methods in our forthcoming paper in *Genome Research.
 
 <center>
-![Genome annotation method decision tree](../img/genome_annotation_decision_chart.png)
+<img src="../../img/genome_annotation_decision_chart.png" alt="Genome annotation method decision tree" />
 </center>
 
 The dashed lines that indicate "optional integration" refer to the combining of more than one genome annotation method, which we elaborate upon below.

docs/resources/Tutorials/installing-command-line-software-conda-mamba.md

Lines changed: 3 additions & 3 deletions
@@ -31,7 +31,7 @@ To install mamba, first navigate to the [Miniforge3 repository page](https://git
 ### Mac/Linux
 
 <center>
-![A screenshot of the miniforge repository's installation instructions](../img/mamba-install1.png)
+<img src="../../img/mamba-install1.png" alt="A screenshot of the miniforge repository's installation instructions" />
 </center>
 
 On Mac and Linux machines (the [Harvard cluster runs a version of Linux](https://www.rc.fas.harvard.edu/about/cluster-architecture/)), you'll want to open your Terminal or login to the server to type the download and install commands.
@@ -59,7 +59,7 @@ If necessary, Miniforge does provide an explicit Windows installer for conda/mam
 Once you have followed the above instructions and **restarted your terminal or reconnected to the server**, you should now see that mamba is activated because the `(base)` environment prefix appears before your prompt:
 
 <center>
-![A screenshot of a command prompt with (base) prepended to it](../img/prompt1.png)
+<img src="../../img/prompt1.png" alt="A screenshot of a command prompt with (base) prepended to it" />
 </center>
 
 mamba can be used to manage environments. **Environments** modify aspects of a user's file system that make it easier to install and run software, essentially giving the user full control over their own software and negating the need to access critical parts of the file system.
@@ -101,7 +101,7 @@ mamba env list
 Once you are in an environment, your prompt should be updated to be pre-fixed with that environment's name:
 
 <center>
-![A screenshot of a command prompt with (project-env) prepended to it](../img/prompt2.png)
+<img src="../../img/prompt2.png" alt="A screenshot of a command prompt with (project-env) prepended to it" />
 </center>
 
 !!! tip "Environments must be activated every time you log on"

docs/resources/Tutorials/pangenome-cactus-minigraph.md

Lines changed: 8 additions & 6 deletions
@@ -133,7 +133,7 @@ Cactus-minigraph requires that you select one sample as a reference sample [for
 Besides the sequence input, the pipeline needs some extra configuration to know where to look for files and write output. That is done in the Snakemake configuration file for a given run. It contains 2 sections, one for specifying the input and output options, and one for specifying resources for the various rules (see [below](#specifying-resources-for-each-rule)). The first part should look something like this:
 
 ```
-cactus_path: <path/to/cactus-singularity-image OR download>
+cactus_path: <path/to/cactus-singularity-image OR download OR version string>
 
 input_file: <path/to/cactus-input-file>
 
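For orientation, a filled-in version of this first section might look like the brief sketch below; both values are hypothetical placeholders, and the remaining keys of the section are not shown in this diff:

```
cactus_path: 2.9.5               # hypothetical: a version string; a local image path or "download" also work
input_file: pangenome-input.txt  # hypothetical path to the Cactus input file for this run
```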
@@ -162,17 +162,19 @@ Simply replace the string surrounded by <> with the path or option desired. Belo
 Below these options in the config file are further options for specifying resource usage for each rule that the pipeline will run. For example:
 
 ```
-minigraph_partition: "shared"
-minigraph_cpu: 8
-minigraph_mem: 25000
-minigraph_time: 30
+rule_resources:
+  minigraph:
+    partition: shared
+    mem_mb: 25000
+    cpus: 8
+    time: 30
 ```
 
 !!! warning "Notes on resource allocation"
 
 * Be sure to use partition names appropriate your cluster. Several examples in this tutorial have partition names that are specific to the Harvard cluster, so be sure to change them.
 * The steps in the cactus-minigraph pipeline are not GPU compatible, so there are no GPU options in this pipeline.
-* **mem is in MB** and **time is in minutes**.
+* **mem_mb is in MB** and **time is in minutes**.
 
 You will have to determine the proper resource usage for your dataset. Generally, the larger the genomes, the more time and memory each job will need, and the more you will benefit from providing more CPUs.
 
0 commit comments
