Merge pull request #3 from neurogenomics/bioc_review
Bioc review 2
HDash authored Nov 5, 2024
2 parents 5bad600 + 2b1f51d commit 66d666f
Showing 58 changed files with 431 additions and 331 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -6,6 +6,7 @@ Icon?
^doc$
^Meta$
^codecov\.yml$
^\.DS_Store$
^_pkgdown\.yml$
^docs$
^pkgdown$
3 changes: 1 addition & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Type: Package
Package: MotifPeeker
Title: Benchmarking Epigenomic Profiling Methods Using Motif Enrichment
Version: 0.99.6
Version: 0.99.7
Authors@R: c(
person(given = "Hiranyamaya",
family = "Dash",
@@ -64,7 +64,6 @@ Imports:
stats,
utils
Suggests:
BiocStyle,
BSgenome.Hsapiens.UCSC.hg19,
BSgenome.Hsapiens.UCSC.hg38,
downloadthis,
1 change: 1 addition & 0 deletions NAMESPACE
@@ -29,6 +29,7 @@ importFrom(BSgenome,getSeq)
importFrom(BiocFileCache,BiocFileCache)
importFrom(BiocFileCache,bfcinfo)
importFrom(BiocFileCache,bfcrpath)
importFrom(BiocParallel,bpnworkers)
importFrom(Biostrings,DNAString)
importFrom(Biostrings,letterFrequency)
importFrom(DT,datatable)
15 changes: 15 additions & 0 deletions NEWS.md
@@ -1,3 +1,18 @@
# MotifPeeker 0.99.7

## New Features
* Replace the `workers` argument with `BPPARAM`, giving users more control
over the BiocParallel implementation.

## Miscellaneous

* Remove `cat()` calls in functions.
* Implement helper `check_input()` to validate inputs before passing them to
other functions.
* Run examples and tests only if MEME Suite is detected (only for functions
which require MEME Suite).


# MotifPeeker 0.99.5 / 0.99.6

## Miscellaneous
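The switch from `workers` to `BPPARAM` described in the NEWS entry can be illustrated with plain BiocParallel calls. This is a minimal sketch, not code from the commit: the worker count of 2 and the Windows fallback to `SnowParam` are assumptions made for the example.

```r
library(BiocParallel)

# Default adopted by this commit: single-core execution.
serial <- SerialParam()

# Opt-in parallelism. MulticoreParam relies on forking, so SnowParam is
# used here as an assumed fallback on Windows.
parallel <- if (.Platform$OS.type == "windows") {
    SnowParam(workers = 2)
} else {
    MulticoreParam(workers = 2)
}

# bpnworkers() reports how many workers a param object will use.
n_serial <- bpnworkers(serial)

# Any BiocParallel apply function accepts the object through BPPARAM.
halves <- bplapply(seq_len(4), function(i) i / 2, BPPARAM = parallel)
```

Passing a param object rather than an integer lets the caller choose the backend (serial, forked, or socket-based) without the package having to guess.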
65 changes: 39 additions & 26 deletions R/MotifPeeker.R
@@ -11,28 +11,28 @@
#' hours to complete. To make computation faster, we highly recommend tuning the
#' following arguments:
#' \describe{
#' \item{\code{workers}}{Running motif discovery in parallel can
#' significantly reduce runtime, but it is very memory-intensive, consuming
#' upwards of 10GB of RAM per thread. Memory starvation can greatly slow the
#' process, so set \code{workers} with caution.}
#' \item{\code{denovo_motifs}}{The number of motifs to discover per sequence
#' group exponentially increases runtime. We recommend no more than 5
#' motifs to make a meaningful inference.}
#' \item{\code{trim_seq_width}}{Trimming sequences before running de-novo
#' motif discovery can significantly reduce the search space. Sequence
#' length can exponentially increase runtime. We recommend running the
#' script with \code{denovo_motif_discovery = FALSE} and studying the
#' motif-summit distance distribution under general metrics to find the
#' sequence length that captures most motifs. A good starting point is 150
#' but it can be reduced further if appropriate.}
#' \item{\code{BPPARAM=MulticoreParam(x)}}{Running motif discovery in
#' parallel can significantly reduce runtime, but it is very
#' memory-intensive, consuming 10+GB of RAM per thread. Memory starvation can
#' greatly slow the process, so set the number of cores with caution.}
#' \item{\code{denovo_motifs}}{The number of motifs to discover per sequence
#' group exponentially increases runtime. We recommend no more than 5
#' motifs to make a meaningful inference.}
#' \item{\code{trim_seq_width}}{Trimming sequences before running de-novo
#' motif discovery can significantly reduce the search space. Sequence
#' length can exponentially increase runtime. We recommend running the
#' script with \code{denovo_motif_discovery = FALSE} and studying the
#' motif-summit distance distribution under general metrics to find the
#' sequence length that captures most motifs. A good starting point is 150
#' but it can be reduced further if appropriate.}
#' }
#'
#' @param peak_files A character vector of path to peak files, or a vector of
#' GRanges objects generated using \code{\link{read_peak_file}}. Currently,
#' peak files from the following peak-calling tools are supported:
#' \itemize{
#' \item MACS2: \code{.narrowPeak} files
#' \item SEACR: \code{.bed} files
#' \item MACS2: \code{.narrowPeak} files
#' \item SEACR: \code{.bed} files
#' }
#' ENCODE file IDs can also be provided to automatically fetch peak file(s) from
#' the ENCODE database.
@@ -81,13 +81,22 @@
#' @param display A character vector specifying the display mode for the HTML
#' report once it is generated. (default = NULL) Options are:
#' \itemize{
#' \item \code{"browser"}: Open the report in the default web browser.
#' \item \code{"rstudio"}: Open the report in the RStudio Viewer.
#' \item \code{NULL}: Do not open the report.
#' \item \code{"browser"}: Open the report in the default web browser.
#' \item \code{"rstudio"}: Open the report in the RStudio Viewer.
#' \item \code{NULL}: Do not open the report.
#' }
#' @param workers An integer specifying the number of threads to use for
#' parallel processing. (default = 1)\cr
#' \strong{IMPORTANT:} For each worker, please ensure a minimum of 6GB of
#' @param BPPARAM A \code{\link[BiocParallel]{BiocParallelParam-class}} object
#' enabling parallel execution. (default = SerialParam(), single-CPU run)\cr\cr
#' Following are two examples of how to set up parallel processing:
#' \itemize{
#' \item \code{BPPARAM = BiocParallel::MulticoreParam(4)}: Uses 4
#' CPU cores for parallel processing.
#' \item \code{library("BiocParallel")} followed by
#' \code{register(MulticoreParam(4))} sets all subsequent BiocParallel
#' functions to use 4 CPU cores. \code{MotifPeeker()} must then be run
#' with \code{BPPARAM = BiocParallel::bpparam()} to pick up the registered
#' back-end.
#' }
#' \strong{IMPORTANT:} For each worker, please ensure a minimum of 8GB of
#' memory (RAM) is available as \code{denovo_motif_discovery} is
#' memory-intensive.
#' @param quiet A logical indicating whether to print markdown knit messages.
@@ -99,7 +108,7 @@
#' @inheritParams check_genome_build
#' @inheritParams read_motif_file
#' @inheritParams check_genome_build
#' @inheritParams get_bpparam
#' @inheritParams bpapply
#' @inheritParams memes::runFimo
#' @inheritParams denovo_motifs
#' @inheritParams find_motifs
@@ -111,6 +120,7 @@
#' @importFrom viridis scale_fill_viridis scale_color_viridis
#' @importFrom tools file_path_sans_ext
#' @importFrom rmarkdown render
#' @importFrom BiocParallel bpnworkers
#'
#' @return Path to the output directory.
#'
@@ -142,6 +152,7 @@
#' )
#'
#' \donttest{
#' if (memes::meme_is_installed()) {
#' # MotifPeeker takes time to run
#' MotifPeeker(
#' peak_files = peaks,
@@ -158,11 +169,11 @@
#' motif_db = NULL,
#' download_buttons = TRUE,
#' out_dir = tempdir(),
#' workers = 1,
#' debug = FALSE,
#' quiet = TRUE,
#' verbose = FALSE
#' )
#' }
#' }
#'
#' @export
@@ -186,7 +197,7 @@ MotifPeeker <- function(
out_dir = tempdir(),
save_runfiles = FALSE,
display = if (interactive()) "browser",
workers = 2,
BPPARAM = BiocParallel::SerialParam(), # Default to single-core
quiet = TRUE,
debug = FALSE,
verbose = FALSE
@@ -267,14 +278,16 @@ MotifPeeker <- function(
meme_path = meme_path,
out_dir = out_dir,
save_runfiles = save_runfiles,
workers = workers,
BPPARAM = BPPARAM,
debug = debug,
verbose = verbose
)

### Knit Rmd ###
rmd_file <- system.file("markdown",
"MotifPeeker.Rmd", package = "MotifPeeker")
messager("Starting run with", BiocParallel::bpnworkers(BPPARAM), "cores.",
v = verbose)
rmarkdown::render(
input = rmd_file,
output_dir = out_dir,
15 changes: 5 additions & 10 deletions R/bpapply.R
@@ -1,15 +1,15 @@
#' Use BiocParallel functions with appropriate parameters
#'
#' Light wrapper around \code{\link[BiocParallel]{BiocParallel}} functions that
#' automatically sets the appropriate parameters based on the number of workers
#' specified.
#' automatically applies the appropriate parallel function.
#'
#' @param apply_fun A \code{\link[BiocParallel]{BiocParallel}} function to use
#' for parallel processing. (default = \code{BiocParallel::bplapply})
#' @param BPPARAM A \code{\link[BiocParallel]{BiocParallelParam-class}} object
#' specifying run parameters. (default = bpparam())
#' @inheritParams BiocParallel::bplapply
#' @inheritDotParams BiocParallel::bplapply
#' @inheritDotParams BiocParallel::bpmapply
#' @inheritParams get_bpparam
#'
#' @import BiocParallel
#'
@@ -19,15 +19,15 @@
#' half_it <- function(arg1) return(arg1 / 2)
#' x <- seq_len(10)
#'
#' res <- MotifPeeker:::bpapply(x, half_it, workers = 2)
#' res <- MotifPeeker:::bpapply(x, half_it)
#' print(res)
#'
#' @keywords internal
bpapply <- function(
X,
FUN,
apply_fun = BiocParallel::bplapply,
workers = 1,
BPPARAM = BiocParallel::bpparam(),
progressbar = FALSE,
force_snowparam = FALSE,
verbose = FALSE,
@@ -38,11 +38,6 @@ bpapply <- function(
if (length(apply_fun_package) == 0 ||
apply_fun_package != "BiocParallel") stop(stp_msg)

BPPARAM <- get_bpparam(workers = workers,
progressbar = progressbar,
force_snowparam = force_snowparam,
verbose = verbose)

res <- apply_fun(X, FUN = FUN, BPPARAM = BPPARAM, ...)
return(res)
}
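After this change the wrapper simply forwards a caller-supplied `BPPARAM` to the chosen apply function. The sketch below restates that idea in a self-contained form. `bpapply_sketch` is a hypothetical name, the package check from the real function is omitted, and the default of `SerialParam()` is an assumption chosen so that the example runs on a single core.

```r
library(BiocParallel)

# Hypothetical thin wrapper: forward X, FUN and BPPARAM to a BiocParallel
# apply function without building the param object internally.
bpapply_sketch <- function(X,
                           FUN,
                           apply_fun = BiocParallel::bplapply,
                           BPPARAM = BiocParallel::SerialParam(),
                           ...) {
    apply_fun(X, FUN = FUN, BPPARAM = BPPARAM, ...)
}

half_it <- function(arg1) arg1 / 2
res <- bpapply_sketch(seq_len(10), half_it)
```

Because the param object arrives from the caller, the wrapper no longer needs its own `workers`-style argument.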
9 changes: 5 additions & 4 deletions R/check_ENCODE.R
@@ -9,8 +9,7 @@
#' thrown.
#' @inheritParams MotifPeeker
#'
#' @returns A character string specifying the path to the downloaded file. If
#' the input is not in ENCODE ID format, the input is returned as-is.
#' @returns A character string specifying the path to the downloaded file.
#'
#' @examples
#' if (requireNamespace("curl", quietly = TRUE) &&
@@ -20,10 +19,12 @@
#'
#' @export
check_ENCODE <- function(encode_id, expect_format, verbose = FALSE) {
if (!all(is.character(encode_id))) return(encode_id)
### Validate ENCODE ID ###
stp_msg <- "Input is not an ENCODE ID string."
id_pattern <- "^ENC(SR|BS|DO|GM|AB|LB|FF|PL)\\d{3}[A-Z]{3}$"
if (!all(grepl(id_pattern, encode_id))) return(encode_id)
if (!(all(is.character(encode_id)) && all(grepl(id_pattern, encode_id)))) {
stop(stp_msg)
}

### Verify existence of file on ENCODE ###
check_dep("curl")
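The ID pattern used in the validation step above can be tried out in isolation. A minimal sketch follows; the sample strings are illustrative inputs chosen to show matching and non-matching shapes, not accessions confirmed to exist on ENCODE.

```r
# Pattern copied from the diff: "ENC", a two-letter type code, three
# digits, then three upper-case letters.
id_pattern <- "^ENC(SR|BS|DO|GM|AB|LB|FF|PL)\\d{3}[A-Z]{3}$"

# Illustrative inputs (assumed for the example).
candidates <- c("ENCFF920TXI", "encff920txi", "MA1930.2", "peaks.narrowPeak")

is_encode_id <- grepl(id_pattern, candidates)
print(is_encode_id)
# → TRUE FALSE FALSE FALSE
```

With the change in this hunk, any input that fails the pattern makes `check_ENCODE()` stop with an error rather than returning the input unchanged.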
8 changes: 4 additions & 4 deletions R/check_JASPAR.R
@@ -6,18 +6,18 @@
#' @inheritParams link_JASPAR
#' @inheritParams MotifPeeker
#'
#' @returns A character string specifying the path to the downloaded file. If
#' the input is not in JASPAR ID format, the input is returned as-is.
#' @returns A character string specifying the path to the downloaded file.
#'
#' @examples
#' check_JASPAR("MA1930.2")
#'
#' @export
check_JASPAR <- function(motif_id, verbose = FALSE) {
### Validate JASPAR ID ###
if (!is.character(motif_id)) return(motif_id)
if (!startsWith(motif_id, "MA")) return(motif_id)
stp_msg <- "Input is not a JASPAR ID string."
if (!(is.character(motif_id) && startsWith(motif_id, "MA"))) stop(stp_msg)

### Fetch file ###
return(use_cache(link_JASPAR(motif_id, download = TRUE), verbose = verbose))
}

34 changes: 34 additions & 0 deletions R/check_input.R
@@ -0,0 +1,34 @@
#' Check for input validity and pass to appropriate function
#'
#' @param x The input to check.
#' @param type The type of input to check for. Supported types are:
#' \itemize{
#' \item \code{jaspar_id}: JASPAR identifier.
#' \item \code{motif}: `universalmotif` motif object.
#' \item \code{encode_id}: ENCODE identifier.
#' }
#' @param FUN The function to pass the input to.
#' @param inverse Logical indicating whether to invert the check, i.e. return
#' the input as-is if it is valid for the specified `type` and pass it to `FUN`
#' otherwise.
#' @param ... Additional arguments to pass to the `FUN` function.
#'
#' @returns The output of the `FUN` function if the input is valid for the
#' specified `type`, or else `x` unchanged. If `inverse = TRUE`, the behaviour
#' is flipped: `x` is returned if the input is valid, or else the output of
#' the `FUN` function.
#'
#' @keywords internal
check_input <- function(x, type, FUN, inverse = FALSE, ...) {
valid <- switch(
tolower(type),
jaspar_id = is.character(x) && startsWith(x, "MA"),
encode_id = {
id_pattern <- "^ENC(SR|BS|DO|GM|AB|LB|FF|PL)\\d{3}[A-Z]{3}$"
all(is.character(x)) && all(grepl(id_pattern, x))
},
motif = inherits(x, "universalmotif"),
stop("Invalid type specified.")
)

# Scalar branching: flip the check when `inverse = TRUE`, then dispatch.
if (inverse) valid <- !valid
if (valid) return(FUN(x, ...))
return(x)
}
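The validate-then-dispatch behaviour of the helper can be sketched in a self-contained form. This is a hedged restatement, not the package's code: `check_input_sketch` and `fake_fetch` are hypothetical names, the `motif` type is left out to avoid a dependency on `universalmotif`, and the `length(x) == 1L` guard is an added assumption.

```r
# Hypothetical restatement of the dispatch logic for two of the types.
check_input_sketch <- function(x, type, FUN, inverse = FALSE, ...) {
    valid <- switch(
        tolower(type),
        jaspar_id = is.character(x) && length(x) == 1L &&
            startsWith(x, "MA"),
        encode_id = {
            id_pattern <- "^ENC(SR|BS|DO|GM|AB|LB|FF|PL)\\d{3}[A-Z]{3}$"
            is.character(x) && all(grepl(id_pattern, x))
        },
        stop("Invalid type specified.")
    )
    # inverse = TRUE flips which branch receives FUN.
    if (inverse) valid <- !valid
    if (valid) FUN(x, ...) else x
}

# Stand-in for a real downloader (assumed for the example).
fake_fetch <- function(id) paste0("fetched:", id)

hit  <- check_input_sketch("MA1930.2", "jaspar_id", fake_fetch)
miss <- check_input_sketch("peaks.narrowPeak", "jaspar_id", fake_fetch)
inv  <- check_input_sketch("MA1930.2", "jaspar_id", fake_fetch,
                           inverse = TRUE)
```

Here `hit` is the output of `fake_fetch`, while `miss` and `inv` come back unchanged. Callers can therefore accept either an ID or an already-loaded object while the fetch functions themselves stay strict.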
11 changes: 8 additions & 3 deletions R/denovo_motifs.R
@@ -25,17 +25,21 @@
#' (default = 6)
#' @param out_dir A \code{character} vector of output directory to save STREME
#' results to. (default = \code{tempdir()})
#' @param BPPARAM A \code{\link[BiocParallel]{BiocParallelParam-class}} object
#' specifying run parameters. (default = SerialParam(), single core run)
#' @param debug A logical indicating whether to print debug messages while
#' running the function. (default = FALSE)
#' @param ... Additional arguments to pass to \code{STREME}. For more
#' information, refer to the official MEME Suite documentation on
#' \href{https://meme-suite.org/meme/doc/streme.html}{STREME}.
#' @inheritParams bpapply
#' @inheritParams motif_enrichment
#' @inheritParams MotifPeeker
#'
#' @returns A list of \code{\link[universalmotif]{universalmotif}} objects and
#' associated metadata.
#'
#' @examples
#' if (memes::meme_is_installed()) {
#' data("CTCF_TIP_peaks", package = "MotifPeeker")
#' if (requireNamespace("BSgenome.Hsapiens.UCSC.hg38", quietly = TRUE)) {
#' genome_build <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
@@ -48,6 +52,7 @@
#' out_dir = tempdir())
#' print(res[[1]]$consensus)
#' }
#' }
#' @export
denovo_motifs <- function(seqs,
trim_seq_width,
@@ -58,7 +63,7 @@ denovo_motifs <- function(seqs,
filter_n = 6,
out_dir = tempdir(),
meme_path = NULL,
workers = 1,
BPPARAM = BiocParallel::SerialParam(),
verbose = FALSE,
debug = FALSE,
...) {
@@ -94,7 +99,7 @@ denovo_motifs <- function(seqs,
### Filter motifs ###
out <- filter_repeats(streme_out, filter_n)
return(out)
}, workers = workers, verbose = verbose
}, BPPARAM = BPPARAM, verbose = verbose
)
messager("STREME run complete.", v = verbose)
return(res)