On-Disk manipulation of imzML datasets #28

Open
wants to merge 6 commits into
base: master
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -3,3 +3,5 @@
^\.git$
^\.gitignore$
^\.travis.yml$
^.*\.Rproj$
^\.Rproj\.user$
1 change: 1 addition & 0 deletions .gitignore
@@ -1 +1,2 @@
*.swp
.Rproj.user
7 changes: 4 additions & 3 deletions DESCRIPTION
@@ -1,15 +1,16 @@
Package: MALDIquantForeign
Version: 0.12
Version: 0.12.75
Date: 2019-01-30
Title: Import/Export Routines for 'MALDIquant'
Authors@R: c(person("Sebastian", "Gibb", role=c("aut", "cre"),
email="mail@sebastiangibb.de",
comment=c(ORCID="0000-0001-7406-4443")), person("Pietro",
"Franceschi", role=c("ctb"),
email="pietro.franceschi@fmach.it"))
Depends: R (>= 3.2.2), methods, MALDIquant (>= 1.16.4)
biocViews:
Depends: R (>= 3.2.2), methods, MALDIquant (>= 1.19.15)
Imports: base64enc, digest, readBrukerFlexData (>= 1.7), readMzXmlData
(>= 2.7), XML
(>= 2.7), XML, parallel
Suggests: knitr, testthat (>= 0.8), RNetCDF (>= 1.6.1)
Description: Functions for reading (tab, csv, Bruker fid, Ciphergen
XML, mzXML, mzML, imzML, Analyze 7.5, CDF, mMass MSD) and
17 changes: 17 additions & 0 deletions MALDIquantForeign.Rproj
@@ -0,0 +1,17 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 7
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX

BuildType: Package
PackageUseDevtools: Yes
PackageInstallArgs: --no-multiarch --with-keep.source
23 changes: 18 additions & 5 deletions R/import-functions.R
@@ -354,14 +354,22 @@ importMzMl <- function(path, ...) {
#' Import imzML files
#'
#' This function imports files in imzML file format
#' into \code{\link[MALDIquant]{MassSpectrum-class}} or
#' into \code{\link[MALDIquant]{MassSpectrum-class}},
#' \code{\link[MALDIquant]{MassSpectrumOnDisk-class}} or
#' \code{\link[MALDIquant]{MassPeaks-class}} objects.
#'
#' @param path \code{character}, path to directory or file which should be read
#' in.
#' @param coordinates \code{matrix}, 2 column matrix that contains the x- and
#' y-coordinates for spectra that should be imported. Other spectra would be
#' ignored.
#' @param attachOnly logical (defaults to \code{FALSE}), whether to attach the dataset via the
#' \code{OnDiskVector} class without loading it into memory. See \code{\link[MALDIquant]{MassSpectrumOnDisk-class}}.
#' @param duplicateFile logical, when \code{TRUE} (default), creates a temporary copy of the binary \code{ibd}
#' file in \code{tempdir()} and attaches the \code{\link[MALDIquant]{MassSpectrumOnDisk-class}} objects to it so
#' as not to affect the original \code{ibd} file.
#' @param mc.cores integer, number of cores used for parallel evaluation via \code{parallel::mclapply}.
#' Falls back to \code{mc.cores = 1} on Windows.
#' @param \ldots arguments to be passed to
#' \code{\link[MALDIquantForeign]{import}}.
#'
@@ -372,9 +380,11 @@ importMzMl <- function(path, ...) {
#' \code{\link[MALDIquant]{MassSpectrum-class}},
#' \code{\link[MALDIquant]{MassPeaks-class}}
#' @author Sebastian Gibb
#' @references \url{http://strimmerlab.org/software/maldiquant/}, \cr
#' @references \url{http://strimmerlab.org/software/maldiquant/}, \cr\cr
#' Definition of \code{imzML} format:
#' \url{http://www.imzml.org/}
#' \url{http://www.imzml.org/}\cr\cr
#' \code{"matter"}: Kylie A. Bemis (2018). matter: A framework for rapid prototyping with binary data on disk. R
#' package version 1.8.0. \url{https://github.com/kuwisdelu/matter}.
#' @examples
#'
#' library("MALDIquant")
@@ -391,9 +401,12 @@ importMzMl <- function(path, ...) {
#' coordinates = cbind(1:2, c(1, 1)))
#'
#' @rdname importImzMl-functions
#'
#' @export
importImzMl <- function(path, coordinates=NULL, ...) {
import(path=path, type="imzml", coordinates=coordinates, ...)
importImzMl <- function(path, coordinates=NULL, attachOnly=FALSE, duplicateFile=TRUE,
mc.cores = 1L, ...) {
import(path=path, type="imzml", coordinates=coordinates, attachOnly=attachOnly,
duplicateFile=duplicateFile, mc.cores=mc.cores, ...)
}

#' Import Ciphergen XML files
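
The `mc.cores` behaviour documented in this hunk (parallel evaluation through `parallel::mclapply`, with a fallback to one core on Windows) can be sketched in isolation. This is a minimal sketch and not the package's code: `resolveCores` and `parallelSquares` are hypothetical helper names introduced here for illustration; only `parallel::mclapply` and `.Platform$OS.type` come from R itself.

```r
library(parallel)

## Hypothetical helper: clamp the requested core count to 1 on Windows,
## where mclapply() cannot fork and rejects mc.cores > 1.
resolveCores <- function(mc.cores=1L, os=.Platform$OS.type) {
  if (identical(os, "windows")) 1L else as.integer(mc.cores)
}

## Hypothetical worker: square each element, possibly in parallel.
parallelSquares <- function(x, mc.cores=1L) {
  cores <- resolveCores(mc.cores)
  unlist(mclapply(X=seq_along(x), FUN=function(i) x[i]^2, mc.cores=cores))
}

res <- parallelSquares(1:4, mc.cores=2L)
```

Resolving the core count in one small function keeps the platform check out of the worker, and it makes the requested value easy to forward unchanged from a wrapper to the function that does the work.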
92 changes: 73 additions & 19 deletions R/importImzMl-functions.R
@@ -17,8 +17,8 @@
## along with MALDIquantForeign. If not, see <http://www.gnu.org/licenses/>

.importImzMl <- function(file, centroided=FALSE, massRange=c(0, Inf),
minIntensity=0, coordinates=NULL,
verbose=FALSE) {
minIntensity=0, coordinates=NULL, attachOnly=FALSE,
duplicateFile=TRUE, mc.cores = 1L, verbose=FALSE) {

.msg(verbose, "Reading spectrum from ", sQuote(file), " ...")

@@ -31,6 +31,15 @@
if (!file.exists(ibdFilename)) {
stop("File ", sQuote(ibdFilename), " doesn't exist!")
}

if (attachOnly) { # attach rather than load
if (duplicateFile) { # duplicate the ibd file to the temp dir in order to keep the original ibd intact
tf <- paste0(tempfile(), "_", basename(ibdFilename))
file.copy(from=ibdFilename, to=tf)
ibdFilename <- tf

}
}

s <- .parseMzMl(file=file, verbose=verbose)
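
The `duplicateFile` step in this hunk copies the `ibd` file into the temporary directory so that later writes never reach the original. A minimal sketch of that idea follows, using base R only. `duplicateToTemp` is a hypothetical helper name, and the `.ibd` file is a throwaway stand-in rather than real imzML data. Unlike the hunk, the sketch checks the return value of `file.copy`, which is `FALSE` when the copy fails.

```r
## Hypothetical helper: copy a binary file into tempdir() and return the
## path of the copy, so later writes never touch the original.
duplicateToTemp <- function(path) {
  stopifnot(file.exists(path))
  copy <- paste0(tempfile(), "_", basename(path))
  if (!file.copy(from=path, to=copy)) {
    stop("Could not copy ", sQuote(path), " to ", sQuote(copy))
  }
  copy
}

## Usage with a throwaway stand-in for an ibd file.
original <- tempfile(fileext=".ibd")
writeBin(c(1.5, 2.5, 3.5), original, size=8L, endian="little")
copy <- duplicateToTemp(original)

## Overwrite the copy; the original keeps its three values.
writeBin(c(0, 0, 0), copy, size=8L, endian="little")
originalValues <- readBin(original, double(), n=3L, size=8L, endian="little")
```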

@@ -89,35 +98,80 @@
}
n <- x[column, "length"]
e <- x[column, "encodedLength"]
readBin(file, double(), n=n, size=e/n, signed=TRUE, endian="little")

if(attachOnly){

OnDiskVector(path=unname(summary(ibd)[[1]]), n=n, offset=x[column, "offset"], size=8L)

}else{

readBin(file, double(), n=n, size=e/n, signed=TRUE, endian="little")
}

}

n <- length(sel)
spectra <- vector(mode="list", length=n)

isProcessed <- s$ims$type == "processed"
isSeekNeeded <- length(s$ims$ibd) > length(sel)

if(isProcessed && attachOnly){
message("The imzML file is of type 'processed'. The 'attachOnly' option is only available ",
"for 'continuous' type and therefore will be overridden. In-memory MassPeaks objects will be created.")
attachOnly <- FALSE
}


if (!isProcessed) {
mass <- .readValues(ibd, s$ims$ibd[[sel[1L]]], "mass", isSeekNeeded)
}

## read mass and intensity values
for (i in seq(along=sel)) {
.msg(verbose, "Reading binary data for spectrum ", i, "/", n, " ...")

m <- modifyList(s$metaData, s$spectra[[sel[i]]]$metaData)
m$file <- file

if (isProcessed) {
mass <- .readValues(ibd, s$ims$ibd[[sel[i]]], "mass", isSeekNeeded)
}
intensity <- .readValues(ibd, s$ims$ibd[[sel[i]]], "intensity", isSeekNeeded)
spectra[[i]] <- .createMassObject(mass=mass, intensity=intensity,
metaData=m, centroided=centroided,
massRange=massRange,
minIntensity=minIntensity,
verbose=verbose)
## read mass and intensity values - possibly in parallel
mc.cores <- ifelse(.Platform$OS.type == "windows", 1, mc.cores)

spectra <- parallel::mclapply(X = seq_along(sel),
mc.cores = mc.cores,
FUN = function(i) {

.msg(verbose, "Reading binary data for spectrum ", i, "/", n, " ...")

m <- modifyList(s$metaData, s$spectra[[sel[i]]]$metaData)
m$file <- file

if (isProcessed) {
mass <- .readValues(ibd, s$ims$ibd[[sel[i]]], "mass", isSeekNeeded)
}
intensity <- .readValues(ibd, s$ims$ibd[[sel[i]]], "intensity", isSeekNeeded)

if(attachOnly){
tmpSpectrum <- new("MassSpectrumOnDisk", mass=mass, intensity=intensity,
metaData=m)
}else{
tmpSpectrum <- .createMassObject(mass=mass, intensity=intensity,
metaData=m, centroided=centroided,
massRange=massRange,
minIntensity=minIntensity,
verbose=verbose)
}

tmpSpectrum
})




.msg(verbose, "Done. ")

if(attachOnly)
{
if(duplicateFile)
message("\nNOTE: imzML dataset was loaded via the attachOnly option and a duplicate file was generated. ",
"Any changes made to the spectra are written directly to the duplicate file.\n")
else
message("\nNOTE: imzML dataset was attached via the attachOnly option to the ORIGINAL FILE. ",
"Any changes made to the spectra are written directly to the original ibd file.\n")
}

spectra
}
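
The in-memory branch of `.readValues` reads `n` values from the binary file, and the on-disk branch records the same `offset` and `n` without reading. The lookup both branches rely on can be sketched with base R as follows. `readDoublesAt` is a hypothetical helper, the file is a stand-in with made-up values, and 64-bit little-endian doubles are assumed (matching the `size=8L` used in the diff).

```r
## Hypothetical helper: read `n` little-endian doubles starting `offset`
## bytes into a binary file, mimicking one array lookup in an ibd file.
readDoublesAt <- function(path, offset, n, size=8L) {
  con <- file(path, open="rb")
  on.exit(close(con))
  seek(con, where=offset, origin="start")
  readBin(con, double(), n=n, size=size, endian="little")
}

## Stand-in file: two arrays of three 64-bit values stored back to back.
path <- tempfile(fileext=".ibd")
writeBin(c(10, 20, 30, 1, 2, 3), path, size=8L, endian="little")

## The second array starts after 3 values * 8 bytes = 24 bytes.
first <- readDoublesAt(path, offset=0, n=3L)
second <- readDoublesAt(path, offset=24, n=3L)
```

Because each array is addressed only by byte offset and length, an on-disk object needs to store just those two numbers plus the file path, which is what makes attaching without loading possible for the 'continuous' type.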
19 changes: 15 additions & 4 deletions man/importImzMl-functions.Rd
