Bioc review 2 #3

Merged
merged 7 commits into from
Nov 5, 2024
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -6,6 +6,7 @@ Icon?
^doc$
^Meta$
^codecov\.yml$
^\.DS_Store$
^_pkgdown\.yml$
^docs$
^pkgdown$
3 changes: 1 addition & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Type: Package
Package: MotifPeeker
Title: Benchmarking Epigenomic Profiling Methods Using Motif Enrichment
Version: 0.99.6
Version: 0.99.7
Authors@R: c(
person(given = "Hiranyamaya",
family = "Dash",
@@ -64,7 +64,6 @@ Imports:
stats,
utils
Suggests:
BiocStyle,
BSgenome.Hsapiens.UCSC.hg19,
BSgenome.Hsapiens.UCSC.hg38,
downloadthis,
1 change: 1 addition & 0 deletions NAMESPACE
@@ -29,6 +29,7 @@ importFrom(BSgenome,getSeq)
importFrom(BiocFileCache,BiocFileCache)
importFrom(BiocFileCache,bfcinfo)
importFrom(BiocFileCache,bfcrpath)
importFrom(BiocParallel,bpnworkers)
importFrom(Biostrings,DNAString)
importFrom(Biostrings,letterFrequency)
importFrom(DT,datatable)
15 changes: 15 additions & 0 deletions NEWS.md
@@ -1,3 +1,18 @@
# MotifPeeker 0.99.7

## New Features
* Replace the `workers` argument with `BPPARAM`, giving users more control over
the BiocParallel implementation.

## Miscellaneous

* Remove `cat()` calls in functions.
* Implement helper `check_input()` to validate inputs before passing them to
other functions.
* Run examples and tests only if MEME Suite is detected (only for functions
which require MEME Suite).


# MotifPeeker 0.99.5 / 0.99.6

## Miscellaneous
65 changes: 39 additions & 26 deletions R/MotifPeeker.R
@@ -11,28 +11,28 @@
#' hours to complete. To make computation faster, we highly recommend tuning the
#' following arguments:
#' \describe{
#' \item{\code{workers}}{Running motif discovery in parallel can
#' significantly reduce runtime, but it is very memory-intensive, consuming
#' upwards of 10GB of RAM per thread. Memory starvation can greatly slow the
#' process, so set \code{workers} with caution.}
#' \item{\code{denovo_motifs}}{The number of motifs to discover per sequence
#' group exponentially increases runtime. We recommend no more than 5
#' motifs to make a meaningful inference.}
#' \item{\code{trim_seq_width}}{Trimming sequences before running de-novo
#' motif discovery can significantly reduce the search space. Sequence
#' length can exponentially increase runtime. We recommend running the
#' script with \code{denovo_motif_discovery = FALSE} and studying the
#' motif-summit distance distribution under general metrics to find the
#' sequence length that captures most motifs. A good starting point is 150
#' but it can be reduced further if appropriate.}
#' \item{\code{BPPARAM=MulticoreParam(x)}}{Running motif discovery in
#' parallel can significantly reduce runtime, but it is very
#' memory-intensive, consuming 10+GB of RAM per thread. Memory starvation can
#' greatly slow the process, so set the number of cores with caution.}
#' \item{\code{denovo_motifs}}{The number of motifs to discover per sequence
#' group exponentially increases runtime. We recommend no more than 5
#' motifs to make a meaningful inference.}
#' \item{\code{trim_seq_width}}{Trimming sequences before running de-novo
#' motif discovery can significantly reduce the search space. Sequence
#' length can exponentially increase runtime. We recommend running the
#' script with \code{denovo_motif_discovery = FALSE} and studying the
#' motif-summit distance distribution under general metrics to find the
#' sequence length that captures most motifs. A good starting point is 150
#' but it can be reduced further if appropriate.}
#' }
#'
#' @param peak_files A character vector of paths to peak files, or a vector of
#' GRanges objects generated using \code{\link{read_peak_file}}. Currently,
#' peak files from the following peak-calling tools are supported:
#' \itemize{
#' \item MACS2: \code{.narrowPeak} files
#' \item SEACR: \code{.bed} files
#' \item MACS2: \code{.narrowPeak} files
#' \item SEACR: \code{.bed} files
#' }
#' ENCODE file IDs can also be provided to automatically fetch peak file(s) from
#' the ENCODE database.
@@ -81,13 +81,22 @@
#' @param display A character vector specifying the display mode for the HTML
#' report once it is generated. (default = NULL) Options are:
#' \itemize{
#' \item \code{"browser"}: Open the report in the default web browser.
#' \item \code{"rstudio"}: Open the report in the RStudio Viewer.
#' \item \code{NULL}: Do not open the report.
#' \item \code{"browser"}: Open the report in the default web browser.
#' \item \code{"rstudio"}: Open the report in the RStudio Viewer.
#' \item \code{NULL}: Do not open the report.
#' }
#' @param workers An integer specifying the number of threads to use for
#' parallel processing. (default = 1)\cr
#' \strong{IMPORTANT:} For each worker, please ensure a minimum of 6GB of
#' @param BPPARAM A \code{\link[BiocParallel]{BiocParallelParam-class}} object
#' enabling parallel execution. (default = SerialParam(), single-CPU run)\cr\cr
#' The following are two examples of how to set up parallel processing:
#' \itemize{
#' \item \code{BPPARAM = BiocParallel::MulticoreParam(4)}: Uses 4
#' CPU cores for parallel processing.
#' \item \code{library("BiocParallel")} followed by
#' \code{register(MulticoreParam(4))} sets all subsequent BiocParallel
#' functions to use 4 CPU cores. \code{MotifPeeker()} must then be run
#' with \code{BPPARAM = BiocParallel::bpparam()} to pick up the registered
#' back-end.
#' }
#' \strong{IMPORTANT:} For each worker, please ensure a minimum of 8GB of
#' memory (RAM) is available as \code{denovo_motif_discovery} is
#' memory-intensive.
#' @param quiet A logical indicating whether to print markdown knit messages.
@@ -99,7 +108,7 @@
#' @inheritParams check_genome_build
#' @inheritParams read_motif_file
#' @inheritParams check_genome_build
#' @inheritParams get_bpparam
#' @inheritParams bpapply
#' @inheritParams memes::runFimo
#' @inheritParams denovo_motifs
#' @inheritParams find_motifs
@@ -111,6 +120,7 @@
#' @importFrom viridis scale_fill_viridis scale_color_viridis
#' @importFrom tools file_path_sans_ext
#' @importFrom rmarkdown render
#' @importFrom BiocParallel bpnworkers
#'
#' @return Path to the output directory.
#'
@@ -142,6 +152,7 @@
#' )
#'
#' \donttest{
#' if (memes::meme_is_installed()) {
#' # MotifPeeker takes time to run
#' MotifPeeker(
#' peak_files = peaks,
@@ -158,11 +169,11 @@
#' motif_db = NULL,
#' download_buttons = TRUE,
#' out_dir = tempdir(),
#' workers = 1,
#' debug = FALSE,
#' quiet = TRUE,
#' verbose = FALSE
#' )
#' }
#' }
#'
#' @export
@@ -186,7 +197,7 @@ MotifPeeker <- function(
out_dir = tempdir(),
save_runfiles = FALSE,
display = if (interactive()) "browser",
workers = 2,
BPPARAM = BiocParallel::SerialParam(), # Default to single-core
quiet = TRUE,
debug = FALSE,
verbose = FALSE
@@ -267,14 +278,16 @@ MotifPeeker <- function(
meme_path = meme_path,
out_dir = out_dir,
save_runfiles = save_runfiles,
workers = workers,
BPPARAM = BPPARAM,
debug = debug,
verbose = verbose
)

### Knit Rmd ###
rmd_file <- system.file("markdown",
"MotifPeeker.Rmd", package = "MotifPeeker")
messager("Starting run with", BiocParallel::bpnworkers(BPPARAM), "cores.",
v = verbose)
rmarkdown::render(
input = rmd_file,
output_dir = out_dir,
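As a rough illustration of the two `BPPARAM` options described in the documentation above, the sketch below builds a back-end explicitly and, alternatively, registers one and retrieves it with `bpparam()`. It assumes BiocParallel is installed and uses `SerialParam()` so that it runs on any platform; the commented-out `MulticoreParam(4)` line is the multi-core variant, which relies on forking and is not available on Windows.

```r
# Sketch only: runs if the BiocParallel package is installed.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
    library(BiocParallel)

    # Option 1: build the back-end explicitly and pass it on as `BPPARAM`.
    param <- SerialParam()          # single worker; works on every platform
    # param <- MulticoreParam(4)    # 4 forked workers (Unix-like systems only)
    n_explicit <- bpnworkers(param)

    # Option 2: register a back-end once, then retrieve it with bpparam().
    register(SerialParam())
    n_registered <- bpnworkers(bpparam())

    message("Explicit back-end workers: ", n_explicit)
    message("Registered back-end workers: ", n_registered)
}
```

Either object could then be supplied as the `BPPARAM` argument; `bpnworkers()` is the same accessor the diff uses to report the worker count before knitting.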
15 changes: 5 additions & 10 deletions R/bpapply.R
@@ -1,15 +1,15 @@
#' Use BiocParallel functions with appropriate parameters
#'
#' Light wrapper around \code{\link[BiocParallel]{BiocParallel}} functions that
#' automatically sets the appropriate parameters based on the number of workers
#' specified.
#' automatically applies the appropriate parallel function.
#'
#' @param apply_fun A \code{\link[BiocParallel]{BiocParallel}} function to use
#' for parallel processing. (default = \code{BiocParallel::bplapply})
#' @param BPPARAM A \code{\link[BiocParallel]{BiocParallelParam-class}} object
#' specifying run parameters. (default = bpparam())
#' @inheritParams BiocParallel::bplapply
#' @inheritDotParams BiocParallel::bplapply
#' @inheritDotParams BiocParallel::bpmapply
#' @inheritParams get_bpparam
#'
#' @import BiocParallel
#'
@@ -19,15 +19,15 @@
#' half_it <- function(arg1) return(arg1 / 2)
#' x <- seq_len(10)
#'
#' res <- MotifPeeker:::bpapply(x, half_it, workers = 2)
#' res <- MotifPeeker:::bpapply(x, half_it)
#' print(res)
#'
#' @keywords internal
bpapply <- function(
X,
FUN,
apply_fun = BiocParallel::bplapply,
workers = 1,
BPPARAM = BiocParallel::bpparam(),
progressbar = FALSE,
force_snowparam = FALSE,
verbose = FALSE,
@@ -38,11 +38,6 @@ bpapply <- function(
if (length(apply_fun_package) == 0 ||
apply_fun_package != "BiocParallel") stop(stp_msg)

BPPARAM <- get_bpparam(workers = workers,
progressbar = progressbar,
force_snowparam = force_snowparam,
verbose = verbose)

res <- apply_fun(X, FUN = FUN, BPPARAM = BPPARAM, ...)
return(res)
}
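The effect of this change is that the wrapper no longer builds a back-end from a worker count; it forwards whatever `BPPARAM` the caller supplies. A minimal sketch of that pattern follows, assuming BiocParallel is installed. `apply_with_backend` is a hypothetical stand-in written for illustration, not the package's internal function, and it omits the package-origin check on `apply_fun`.

```r
# Sketch only: `apply_with_backend` is a hypothetical, simplified wrapper.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
    apply_with_backend <- function(X, FUN,
                                   apply_fun = BiocParallel::bplapply,
                                   BPPARAM = BiocParallel::bpparam(),
                                   ...) {
        # The caller-supplied back-end is forwarded unchanged.
        apply_fun(X, FUN = FUN, BPPARAM = BPPARAM, ...)
    }

    half_it <- function(arg1) arg1 / 2
    res <- apply_with_backend(seq_len(10), half_it,
                              BPPARAM = BiocParallel::SerialParam())
    print(unlist(res))
}
```

Swapping `SerialParam()` for another `BiocParallelParam` object changes how the work is distributed without touching the wrapper itself.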
9 changes: 5 additions & 4 deletions R/check_ENCODE.R
@@ -9,8 +9,7 @@
#' thrown.
#' @inheritParams MotifPeeker
#'
#' @returns A character string specifying the path to the downloaded file. If
#' the input is not in ENCODE ID format, the input is returned as-is.
#' @returns A character string specifying the path to the downloaded file.
#'
#' @examples
#' if (requireNamespace("curl", quietly = TRUE) &&
@@ -20,10 +19,12 @@
#'
#' @export
check_ENCODE <- function(encode_id, expect_format, verbose = FALSE) {
if (!all(is.character(encode_id))) return(encode_id)
### Validate ENCODE ID ###
stp_msg <- "Input is not an ENCODE ID string."
id_pattern <- "^ENC(SR|BS|DO|GM|AB|LB|FF|PL)\\d{3}[A-Z]{3}$"
if (!all(grepl(id_pattern, encode_id))) return(encode_id)
if (!(all(is.character(encode_id)) && all(grepl(id_pattern, encode_id)))) {
stop(stp_msg)
}

### Verify existence of file on ENCODE ###
check_dep("curl")
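The validation step above now raises an error instead of silently returning the input. A small base-R sketch of that behaviour is given below; `validate_encode_id` is a hypothetical helper that reuses the ID pattern from the diff, and the sample strings are for illustration only (nothing is looked up on ENCODE).

```r
# Sketch only: `validate_encode_id` is a hypothetical helper mirroring the
# validation step; it performs no network access.
validate_encode_id <- function(encode_id) {
    id_pattern <- "^ENC(SR|BS|DO|GM|AB|LB|FF|PL)\\d{3}[A-Z]{3}$"
    if (!(is.character(encode_id) && all(grepl(id_pattern, encode_id)))) {
        stop("Input is not an ENCODE ID string.")
    }
    invisible(TRUE)
}

# A well-formed ID passes silently.
ok <- validate_encode_id("ENCFF109VAD")

# A file path does not match the pattern, so an error is raised and caught.
bad <- tryCatch(validate_encode_id("peaks.narrowPeak"),
                error = function(e) conditionMessage(e))
print(bad)
```

Callers that previously relied on the input being returned as-is would need to route non-ID inputs elsewhere before calling the function.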
8 changes: 4 additions & 4 deletions R/check_JASPAR.R
@@ -6,18 +6,18 @@
#' @inheritParams link_JASPAR
#' @inheritParams MotifPeeker
#'
#' @returns A character string specifying the path to the downloaded file. If
#' the input is not in JASPAR ID format, the input is returned as-is.
#' @returns A character string specifying the path to the downloaded file.
#'
#' @examples
#' check_JASPAR("MA1930.2")
#'
#' @export
check_JASPAR <- function(motif_id, verbose = FALSE) {
### Validate JASPAR ID ###
if (!is.character(motif_id)) return(motif_id)
if (!startsWith(motif_id, "MA")) return(motif_id)
stp_msg <- "Input is not a JASPAR ID string."
if (!(is.character(motif_id) && startsWith(motif_id, "MA"))) stop(stp_msg)

### Fetch file ###
return(use_cache(link_JASPAR(motif_id, download = TRUE), verbose = verbose))
}

34 changes: 34 additions & 0 deletions R/check_input.R
@@ -0,0 +1,34 @@
#' Check for input validity and pass to appropriate function
#'
#' @param x The input to check.
#' @param type The type of input to check for. Supported types are:
#' \itemize{
#' \item \code{jaspar_id}: JASPAR identifier.
#' \item \code{motif}: `universalmotif` motif object.
#' \item \code{encode_id}: ENCODE identifier.
#' }
#' @param FUN The function to pass the input to.
#' @param inverse Logical indicating whether to invert the check, i.e. return
#' the input as-is if it is valid for the specified `type` and pass it to `FUN`
#' otherwise.
#' @param ... Additional arguments to pass to the `FUN` function.
#'
#' @returns `x` if the input is invalid for the specified `type`, or else the
#' output of the `FUN` function. If `inverse = TRUE`, the function returns the
#' output of the `FUN` function if the input is invalid, or else `x`.
#'
#' @keywords internal
check_input <- function(x, type, FUN, inverse = FALSE, ...) {
valid <- switch(
tolower(type),
jaspar_id = is.character(x) && startsWith(x, "MA"),
encode_id = {
id_pattern <- "^ENC(SR|BS|DO|GM|AB|LB|FF|PL)\\d{3}[A-Z]{3}$"
all(is.character(x)) && all(grepl(id_pattern, x))
},
motif = inherits(x, "universalmotif"),
stop("Invalid type specified.")
)

valid <- isTRUE(valid) # treat NA as invalid
if (inverse) valid <- !valid
if (valid) return(FUN(x, ...))
return(x)
}
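To make the dispatch rule concrete, here is a base-R sketch of how such a helper routes inputs. `route_input` is a simplified, hypothetical stand-in for `check_input()` restricted to the `jaspar_id` type, and `fake_fetch` is a made-up handler standing in for a download function.

```r
# Sketch only: `route_input` and `fake_fetch` are hypothetical names used for
# illustration; they are not part of the package.
route_input <- function(x, FUN, inverse = FALSE, ...) {
    valid <- isTRUE(is.character(x) && startsWith(x, "MA"))
    if (inverse) valid <- !valid
    if (valid) FUN(x, ...) else x
}

fake_fetch <- function(id) paste0("fetched:", id)

# Valid ID: handed to FUN.
print(route_input("MA1930.2", fake_fetch))
# Invalid ID: returned unchanged.
print(route_input("motif.jaspar", fake_fetch))
# Inverted check: a valid ID is returned unchanged.
print(route_input("MA1930.2", fake_fetch, inverse = TRUE))
```

Keeping the validity test and the routing decision in one place means callers do not need their own "is this an ID or a file?" branches.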
11 changes: 8 additions & 3 deletions R/denovo_motifs.R
Original file line number Diff line number Diff line change
@@ -25,17 +25,21 @@
#' (default = 6)
#' @param out_dir A \code{character} string specifying the output directory to
#' save STREME results to. (default = \code{tempdir()})
#' @param BPPARAM A \code{\link[BiocParallel]{BiocParallelParam-class}} object
#' specifying run parameters. (default = SerialParam(), single core run)
#' @param debug A logical indicating whether to print debug messages while
#' running the function. (default = FALSE)
#' @param ... Additional arguments to pass to \code{STREME}. For more
#' information, refer to the official MEME Suite documentation on
#' \href{https://meme-suite.org/meme/doc/streme.html}{STREME}.
#' @inheritParams bpapply
#' @inheritParams motif_enrichment
#' @inheritParams MotifPeeker
#'
#' @returns A list of \code{\link[universalmotif]{universalmotif}} objects and
#' associated metadata.
#'
#' @examples
#' if (memes::meme_is_installed()) {
#' data("CTCF_TIP_peaks", package = "MotifPeeker")
#' if (requireNamespace("BSgenome.Hsapiens.UCSC.hg38", quietly = TRUE)) {
#' genome_build <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
@@ -48,6 +52,7 @@
#' out_dir = tempdir())
#' print(res[[1]]$consensus)
#' }
#' }
#' @export
denovo_motifs <- function(seqs,
trim_seq_width,
@@ -58,7 +63,7 @@ denovo_motifs <- function(seqs,
filter_n = 6,
out_dir = tempdir(),
meme_path = NULL,
workers = 1,
BPPARAM = BiocParallel::SerialParam(),
verbose = FALSE,
debug = FALSE,
...) {
@@ -94,7 +99,7 @@ denovo_motifs <- function(seqs,
### Filter motifs ###
out <- filter_repeats(streme_out, filter_n)
return(out)
}, workers = workers, verbose = verbose
}, BPPARAM = BPPARAM, verbose = verbose
)
messager("STREME run complete.", v = verbose)
return(res)