WamR-Pneumo predicts minimum inhibitory concentration (MIC) values from a list of antimicrobial resistance (AMR) determinants in pneumococci. The script runs three components from WADE sequentially: \
- MasterBlastR for the list of AMR determinants in loci.csv \
- 23S_rRNA that checks the VCF files for the number of mutated alleles \
- InterpretR_pneumo that combines MasterBlastR and 23S rRNA outputs to calculate MIC values
This tool can be run using RStudio (available at https://www.rstudio.com/).
This tool requires the R packages plyr, dplyr, tidyverse, tidyselect, stringr, and Biostrings, which can be loaded using:
library(plyr)
library(dplyr)
library(tidyverse)
library(tidyselect)
library(stringr)
library(Biostrings)
It also requires the BLAST+ executable from NCBI: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download
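Before running the tool, it can help to confirm that these requirements are in place. The following is a minimal sketch, not part of WamR-Pneumo itself. It assumes the BLAST+ programs are expected on the system PATH, and the choice of `blastn` as a program to look for is an assumption (only `makeblastdb` is named in this README).

```r
# Packages named in this README; Biostrings is installed from Bioconductor.
required_pkgs <- c("plyr", "dplyr", "tidyverse", "tidyselect", "stringr", "Biostrings")

# requireNamespace() returns FALSE (without attaching anything) when a package is not installed.
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing R packages: ", paste(missing_pkgs, collapse = ", "))
}

# Sys.which() returns "" for programs that are not found on the PATH.
# "blastn" is an assumed program name; adjust to the BLAST+ programs actually used.
blast_tools <- Sys.which(c("blastn", "makeblastdb"))
missing_tools <- names(blast_tools)[blast_tools == ""]
if (length(missing_tools) > 0) {
  message("BLAST+ programs not found on PATH: ", paste(missing_tools, collapse = ", "))
}
```

If either message appears, complete the matching installation step below before running the script.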
- Install R from https://www.r-project.org/
- Install RStudio from https://www.rstudio.com
- Install the required packages:
  install.packages(c("plyr", "dplyr", "tidyverse", "tidyselect", "stringr"))
- Install Biostrings (https://bioconductor.org/install/):
  if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
  BiocManager::install(c("ggtree", "Biostrings"))
- Install the BLAST+ executable from NCBI: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download
- Ensure WamR-Pneumo is located in a folder with the following subfolders:
  a. allele_lkup_dna - contains reference FASTA files
  b. output - results can be found here after running WamR-Pneumo
  c. reference - contains loci lists, lookup tables and MIC interpretation tables
  d. temp - storage of temporary files generated by this program
  e. wildgenes - contains reference FASTA files for SNP-based serotyping
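The folder layout can be checked from R before the first run. This is a small sketch only; `wamr_dir` is a hypothetical variable name and the path is the example location used elsewhere in this README.

```r
# Hypothetical install location; replace with the folder that holds WamR-Pneumo.
wamr_dir <- "C:/WamR_Pneumo"

# Subfolder names as listed in this README.
expected <- c("allele_lkup_dna", "output", "reference", "temp", "wildgenes")

# dir.exists() is vectorised, so one call checks all five subfolders.
present <- dir.exists(file.path(wamr_dir, expected))
if (!all(present)) {
  message("Missing subfolders: ", paste(expected[!present], collapse = ", "))
}
```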
- Set the working directory to the folder where WamR-Pneumo is located.
  line 21: curr_work_dir <- "C:\\WamR_Pneumo\\"
- This tool queries pre-assembled FASTA files. Assign the location of the contig files to ContigsDir. The files must have the ".fasta" extension (e.g. MySampleNo_contig.fasta).
  line 22: ContigsDir <- "C:\\WamR_Pneumo\\contigs\\"
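To confirm that the contig folder is set correctly, the files it contains can be listed from R. This is a sketch under the assumption that the example paths from this README are in use; it only reports what it finds and does not reproduce how WamR-Pneumo reads the files.

```r
# Example values from this README. Forward slashes are also accepted by R on Windows.
curr_work_dir <- "C:/WamR_Pneumo/"
ContigsDir <- "C:/WamR_Pneumo/contigs/"

# List only files whose names end in ".fasta".
# list.files() returns an empty vector if the folder is missing or empty.
contig_files <- list.files(ContigsDir, pattern = "\\.fasta$")
message(length(contig_files), " contig file(s) with the .fasta extension found in ", ContigsDir)
```

A count of zero usually means the path is wrong or the assemblies use a different extension such as ".fa" or ".fna".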
- This tool uses VCF files to count the number of 23S rRNA alleles with mutations. Assign the location of the VCF files to vcf_folder.
  line 23: vcf_folder <- "C:\\WamR_Pneumo\\vcf\\"
  To generate the VCF files, run a phylogenetic analysis using the "wildgenes/23S_R6.fasta" file as the mapping reference. Name each VCF output with the same sample number used for the corresponding assembly contig FASTA file. Do not change the FASTA header ">23S_rRNA_R6_sprr02", since the code looks for this text in the VCF output file to count the number of variant calls. If the code cannot locate the VCF file, it reports the count as zero alleles.
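How the script counts variant calls is not shown in this README. As a rough illustration only, the sketch below counts the VCF data records whose first column is the reference header text and returns zero when the file is missing. The helper name `count_23S_calls` and the demo VCF content are hypothetical, and the actual 23S_rRNA code may derive the allele count differently.

```r
# Hypothetical helper, not the function used inside WamR-Pneumo.
count_23S_calls <- function(vcf_path, chrom = "23S_rRNA_R6_sprr02") {
  # A missing file is treated as zero, matching the behaviour described in this README.
  if (!file.exists(vcf_path)) return(0L)
  vcf_lines <- readLines(vcf_path)
  # Drop VCF header lines, which start with "#".
  records <- vcf_lines[!startsWith(vcf_lines, "#")]
  # Count records whose first (CHROM) column equals the reference header text.
  sum(startsWith(records, paste0(chrom, "\t")))
}

# Tiny made-up VCF, for illustration only.
demo_vcf <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "23S_rRNA_R6_sprr02\t100\t.\tA\tG\t50\tPASS\t.",
  "23S_rRNA_R6_sprr02\t200\t.\tC\tT\t50\tPASS\t."
), demo_vcf)

count_23S_calls(demo_vcf)            # 2
count_23S_calls("no_such_file.vcf")  # 0
```

The sketch also shows why the FASTA header must stay unchanged: if the reference were renamed, no record would match and the count would be zero even when variants are present.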
- To use the multiple sample list option, a multiple sample list file must be located in the working directory (e.g. C:/WamR_Pneumo/list.csv).
  list.csv must have the following structure:
  SampleNo  Variable
  12345
  12346
  The "Variable" column is optional and can hold anything you wish to include in the outputs. Then enter the word "list" instead of a sample number at line 18.
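A sample list with this structure can also be written from R. This is a minimal sketch with made-up sample numbers; it assumes a plain comma-separated file is what the script expects, and it writes to a temporary folder so that nothing in the WamR-Pneumo folder is overwritten.

```r
# Made-up sample numbers and an empty optional column, for illustration only.
sample_list <- data.frame(
  SampleNo = c("12345", "12346"),
  Variable = c("", "")
)

# Written to a temporary folder here; in practice the file would be
# saved as list.csv in the WamR-Pneumo folder.
list_path <- file.path(tempdir(), "list.csv")
write.csv(sample_list, list_path, row.names = FALSE, quote = FALSE)

# Read the file back to confirm that both column names are present.
check <- read.csv(list_path, colClasses = "character")
stopifnot(identical(names(check), c("SampleNo", "Variable")))
```

Reading the sample numbers as character keeps any leading zeros intact.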
When running this program on some Windows machines, the makeblastdb program can give an error. If this happens, change the environment variable settings as follows:
- Go to Windows Settings and search for "Environment Variables".
- In the System Properties dialogue box, click the "Environment Variables" button.
- In the "User variables for..." box, click the "New..." button.
- Input the following:
  Variable Name: BLASTDB_LMDB_MAP_SIZE
  Variable Value: 1000000
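The same variable can be set for a single R session instead of through the Windows dialogue. This is a sketch that may be worth trying, on the assumption that the script launches makeblastdb from within the R session (this README does not confirm how the program is called). The setting is lost when R is closed.

```r
# Set the variable for the current R session only. Programs started from
# this session (for example with system()) inherit the value.
Sys.setenv(BLASTDB_LMDB_MAP_SIZE = "1000000")

# Confirm the value that child processes will see.
Sys.getenv("BLASTDB_LMDB_MAP_SIZE")
```

Run these lines before starting WamR-Pneumo in the same session.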
Copyright Government of Canada 2022.
Written by: National Microbiology Laboratory, Public Health Agency of Canada.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Walter Demczuk: Walter.Demczuk@phac-aspc.gc.ca
Shelley Peterson: Shelley.Peterson@phac-aspc.gc.ca