Add merge HOW TO

SPAAM-community · Mar 28, 2024 · 77c2919
1 parent d535f0f
commit 77c2919

Showing 2 changed files with 55 additions and 22 deletions.
diff --git a/docs/source/how_to/convert.md b/docs/source/how_to/convert.md
@@ -37,7 +37,7 @@ AMDirT convert --libraries ancientmetagenome-hostassociated_libraries_warinnerli
 
 See [Output](#output) for descriptions of all output files.
 
-> ⚠️ _When using a **pipeline input samplesheet**, you should always double check the sheet is correctly configured. We cannot guarantee accuracy between metadata and sequencing files._ 
+> ⚠️ _When using a **pipeline input samplesheet**, you should always double check the sheet is correctly configured. We cannot guarantee accuracy between metadata and sequencing files._
 
 Once you have validated it, you can directly supply it to the appropriate pipeline as follows (using nf-core/eager as an example):
 
@@ -51,33 +51,33 @@ The **citations BibTex** file contains all the citation information of your sele
 
 ## Output
 
-> ⚠️ _We highly recommend generating and reviewing `AncientMetagenomeDir_filtered_libraries.tsv` **before** downloading or running any pipelines to ensure you have in the download scripts and/or pipeline input sheets only the actual library types you wish to use (e.g. you may only want paired-end data, or non-UDG treated data)._ 
+> ⚠️ _We highly recommend generating and reviewing `AncientMetagenomeDir_filtered_libraries.tsv` **before** downloading or running any pipelines, to ensure the download scripts and/or pipeline input sheets contain only the library types you wish to use (e.g. you may only want paired-end data, or non-UDG treated data)._
 
-> ⚠️ _To use a **pipeline input samplesheet**, you should always double check the sheet is correctly configured. We cannot guarantee accuracy between metadata and sequencing files._ 
+> ⚠️ _To use a **pipeline input samplesheet**, you should always double check the sheet is correctly configured. We cannot guarantee accuracy between metadata and sequencing files._
 
 All possible output is as follows:
 
 - `<outdir>`: where all the pipeline samplesheets are placed (by default `.`)
-- `AncientMetagenomeDir_bibliography.bib`: 
-    - A BibTex format citation information file with all references (where available) present in the filtered sample table.
-- `AncientMetagenomeDir_filtered_libraries.tsv`: 
-    - The associated AncientMetagenomeDir curated metadata for all _libraries_ of the samples in the input table.
+- `AncientMetagenomeDir_bibliography.bib`:
+  - A BibTex format citation information file with all references (where available) present in the filtered sample table.
+- `AncientMetagenomeDir_filtered_libraries.tsv`:
+  - The associated AncientMetagenomeDir curated metadata for all _libraries_ of the samples in the input table.
 - `AncientMetagenomeDir_curl_download_script.sh`:
-    - A bash script containing curl commands for all libraries in the input samples list.
+  - A bash script containing curl commands for all libraries in the input samples list.
 - `AncientMetagenomeDir_aspera_download_script.sh`:
-    - A bash script containing Aspera commands for all libraries in the input samples list. See [How Tos](/how_to/miscellaneous) for Aspera configuration information.
-- `AncientMetagenomeDir_nf_core_fetchngs_input_table.tsv`: 
-    - An input sheet containing ERS/SRS accession numbers in a format compatible with the [nf-core/fetchngs](https://nf-co.re/fetchngs) input samplesheet.
+  - A bash script containing Aspera commands for all libraries in the input samples list. See [How Tos](/how_to/miscellaneous) for Aspera configuration information.
+- `AncientMetagenomeDir_nf_core_fetchngs_input_table.tsv`:
+  - An input sheet containing ERS/SRS accession numbers in a format compatible with the [nf-core/fetchngs](https://nf-co.re/fetchngs) input samplesheet.
 - `AncientMetagenomeDir_nf_core_eager_input_table.tsv`:
-    - An input sheet with metadata in a format compatible with the [nf-core/eager](https://nf-co.re/eager) input samplesheet.
-    - Contained paths are relative to the directory output when using the `curl` and `aspera` download scripts (i.e., input sheet assumes files are in the same directory as the input sheet itself).
-- `AncientMetagenomeDir_nf_core_taxprofiler_input_table.csv`: 
-    - An input sheet with metadata in a format compatible with the [nf-core/taxprofiler](https://nf-co.re/eager) input samplesheet.
-    - Contained paths are relative to the directory output when using the `curl` and `aspera` download scripts (i.e., input sheet assumes files are in the same directory as the input sheet itself).
+  - An input sheet with metadata in a format compatible with the [nf-core/eager](https://nf-co.re/eager) input samplesheet.
+  - Contained paths are relative to the directory output when using the `curl` and `aspera` download scripts (i.e., input sheet assumes files are in the same directory as the input sheet itself).
+- `AncientMetagenomeDir_nf_core_taxprofiler_input_table.csv`:
+  - An input sheet with metadata in a format compatible with the [nf-core/taxprofiler](https://nf-co.re/taxprofiler) input samplesheet.
+  - Contained paths are relative to the directory output when using the `curl` and `aspera` download scripts (i.e., input sheet assumes files are in the same directory as the input sheet itself).
 - `AncientMetagenomeDir_aMeta_input_table.tsv`:
-    - An input sheet with metadata in a format compatible with the [aMeta](https://github.com/NBISweden/aMeta) input samplesheet.
-    - Contained paths are relative to the directory output when using the `curl` and `aspera` download scripts (i.e., input sheet assumes files are in the same directory as the input sheet itself).
-- `AncientMetagenomeDir_nf_core_mag_input_{single,paired}_table.csv`: 
-    - An input sheet with metadata in a format compatible with the [nf-core/mag](https://nf-co.re/eager) input samplesheet.
-    - Contained paths are relative to the directory output when using the `curl` and `aspera` download scripts (i.e., input sheet assumes files are in the same directory as the input sheet itself).
-    - nf-core/mag does not support paired- and single-end data in the same run, therefore two sheets will be generated if your selected samples contain both types of libraries.
+  - An input sheet with metadata in a format compatible with the [aMeta](https://github.com/NBISweden/aMeta) input samplesheet.
+  - Contained paths are relative to the directory output when using the `curl` and `aspera` download scripts (i.e., input sheet assumes files are in the same directory as the input sheet itself).
+- `AncientMetagenomeDir_nf_core_mag_input_{single,paired}_table.csv`:
+  - An input sheet with metadata in a format compatible with the [nf-core/mag](https://nf-co.re/mag) input samplesheet.
+  - Contained paths are relative to the directory output when using the `curl` and `aspera` download scripts (i.e., input sheet assumes files are in the same directory as the input sheet itself).
+  - nf-core/mag does not support paired- and single-end data in the same run, therefore two sheets will be generated if your selected samples contain both types of libraries.
diff --git a/docs/source/how_to/merge.md b/docs/source/how_to/merge.md
@@ -0,0 +1,33 @@
+# merge
+
+## What
+
+Merges a user-supplied metadata table with the latest AncientMetagenomeDir master metadata tables, with on-the-fly [validation](validation.md).
+
+## When
+
+Use this command when you have a local AncientMetagenomeDir table (samples or libraries) containing only the new samples or libraries to add, and want to append them to the current master table before submitting a pull request.
+
+You would typically only do this when preparing a pull request to the AncientMetagenomeDir repository entirely locally.
+
+## How
+
+The following description assumes you have already prepared an AncientMetagenomeDir **samples** or **libraries** table that contains only the header and the new samples or libraries to be added.
+
+> ⚠️ _The header and columns should exactly match those of the corresponding AncientMetagenomeDir table._
+
+Given a new samples table `samples_for_new_pr.tsv` to be added to the single genome samples table `ancientsinglegenome-hostassociated`, you can run the following command:
+
+```bash
+AMDirT merge -n ancientsinglegenome-hostassociated -t samples samples_for_new_pr.tsv
+```
+
+Note that `merge` also performs schema validation, to ensure the contents of the new rows are valid against the AncientMetagenomeDir schema.
+
+## Output
+
+The output of the `merge` command is a new table containing the merged rows. It is named after the table you merged the new rows onto, and is placed by default in the directory you ran the command from (customisable with `-o`).
+
+In the example above, the resulting file would be: `ancientsinglegenome-hostassociated_samples.tsv`.
+
+The contents of this file can then theoretically be used to submit a pull request to the AncientMetagenomeDir repository.
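
A note on the header warning in `merge.md`: since the new table's header is expected to match the master table exactly, a quick local check before running `AMDirT merge` may save a failed run. The following is a minimal sketch only, not part of AMDirT. The `headers_match` helper is hypothetical, and the demo column names and file names are made up for illustration. It assumes you have a local copy of the relevant master table to compare against.

```bash
#!/usr/bin/env bash
# Hypothetical helper (NOT an AMDirT command): succeeds only if the first
# line (the header) of both TSV files is byte-for-byte identical.
headers_match() {
  local new_table="$1" master_table="$2"
  [ "$(head -n 1 "$new_table")" = "$(head -n 1 "$master_table")" ]
}

# Demo with two throwaway tables; the column names are illustrative only.
demo_dir="$(mktemp -d)"
printf 'project_name\tsample_name\nNew2024\tS1\n' > "$demo_dir/new.tsv"
printf 'project_name\tsample_name\nOld2020\tS0\n' > "$demo_dir/master.tsv"

if headers_match "$demo_dir/new.tsv" "$demo_dir/master.tsv"; then
  echo "headers match"
else
  echo "headers differ"
fi

# Clean up the throwaway demo files.
rm -r "$demo_dir"
```

In practice you would call `headers_match samples_for_new_pr.tsv <path to your local master table>`. The comparison is strict, so differing line endings or trailing whitespace in either header would be reported as a mismatch. This check only covers the header; the schema validation performed by `merge` itself is still what verifies the row contents.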