Commit

Documentation
mobiusklein committed Mar 25, 2024
1 parent 7e98fdf commit 56d67d4
Showing 27 changed files with 798 additions and 309 deletions.
25 changes: 19 additions & 6 deletions CHANGELOG.md
@@ -10,11 +10,24 @@ and this project adheres to [Semantic Versioning].
### Added
- `MGFReaderType` and `MGFWriterType` implement `MSDataFileMetadata`
- `ThermoRawFileReader` has been added to read Thermo RAW files when the .NET 8 runtime is available, using [`thermorawfilereader`](https://crates.io/crates/thermorawfilereader/0.2.1)
- `Source` and `Sink` algebraic types to represent things that spectra can be read from or written to.
- `mz_read` and `mz_write` are macros that open files for reading and writing in an unboxed context, where the opened reader or writer
only lives within a scoped closure.
- `MassSpectrometryReadWriteProcess` trait for orchestrating reading from a `Source`, writing to a `Sink`, and transforming the
data through an arbitrary function specified as part of the trait implementation. Like `mz_read`/`mz_write`, the reader and writer
only live within the scope enclosed by the trait method.
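
The `Source`/`Sink` types and the scoped-closure idea behind `mz_read`/`mz_write` can be illustrated with a standard-library-only sketch. Everything in it is hypothetical: the variant names, the convention that `"-"` means standard input or output (borrowed from how `mzconvert.rs` treats `"-"`), and the `with_reader` helper are assumptions rather than mzdata's definitions, and mzdata's real `Source`/`Sink` are generic over peak types. The sketch also erases the reader's type behind `&mut dyn BufRead` for brevity, whereas the changelog describes the macros as working in an unboxed context.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::PathBuf;

/// Hypothetical, simplified stand-in for a `Source`: where data is read from.
#[derive(Debug, PartialEq)]
pub enum Source {
    PathLike(PathBuf),
    /// Standard input, conventionally spelled "-".
    Stdin,
}

/// Hypothetical, simplified stand-in for a `Sink`: where data is written to.
#[derive(Debug, PartialEq)]
pub enum Sink {
    PathLike(PathBuf),
    /// Standard output, conventionally spelled "-".
    Stdout,
}

impl From<&str> for Source {
    fn from(s: &str) -> Self {
        if s == "-" { Source::Stdin } else { Source::PathLike(PathBuf::from(s)) }
    }
}

impl From<&str> for Sink {
    fn from(s: &str) -> Self {
        if s == "-" { Sink::Stdout } else { Sink::PathLike(PathBuf::from(s)) }
    }
}

/// Open `source` and lend the reader to `f`. The reader never escapes this
/// function, so it is dropped (closing any file) as soon as `f` returns.
pub fn with_reader<T>(
    source: &Source,
    f: impl FnOnce(&mut dyn BufRead) -> io::Result<T>,
) -> io::Result<T> {
    match source {
        Source::PathLike(path) => {
            let mut reader = BufReader::new(File::open(path)?);
            // Erase the concrete reader type for the closure.
            let r: &mut dyn BufRead = &mut reader;
            f(r)
        }
        Source::Stdin => {
            let stdin = io::stdin();
            let mut lock = stdin.lock();
            let r: &mut dyn BufRead = &mut lock;
            f(r)
        }
    }
}
```

Because the reader is only lent to the closure, it cannot outlive the call, which is the property the scoped macros are described as providing.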

### Changed
- `MGFWriterType` now generates a spectrum title when one is absent, rather than defaulting to
the spectrum's native ID.
- `CURIE` can now be compared to `Param`
- Renamed `ScanWriter` to `SpectrumWriter` and `ScanSource` to `SpectrumSource` for consistency with other trait naming conventions.
- `MZFileReader::open_file` now returns an `io::Result` in keeping with the idea that reading a `File` might fail as well,
even if it is already open, because it is the wrong type of file. This also allows file formats that cannot be read from
arbitrary `io::Read` objects to signal failure without crashing the whole system.
- `Collator`, `std::sync::mpsc::{Sender, SyncSender}` now implement `SpectrumWriter` when properly parameterized.
- `PeakDataLevel` has been refactored into two types: `PeakDataLevel` is an owning type and `RefPeakDataLevel`
is a borrowing type.
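
A minimal sketch of the owning/borrowing split described for `PeakDataLevel` and `RefPeakDataLevel`, assuming a toy `Peak` type and only two variants; mzdata's real enums have their own variants and peak types, which are not reproduced here. The borrowing form is a cheap `Copy` view over peaks owned elsewhere, and an owning value can be produced from it on demand.

```rust
/// Hypothetical peak type used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Peak {
    pub mz: f64,
    pub intensity: f32,
}

/// Owning form: holds its peaks outright, so it can outlive its source spectrum.
#[derive(Debug, Clone, PartialEq)]
pub enum PeakDataLevel {
    Missing,
    Centroid(Vec<Peak>),
}

/// Borrowing form: a view into peaks owned by someone else.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RefPeakDataLevel<'a> {
    Missing,
    Centroid(&'a [Peak]),
}

impl<'a> RefPeakDataLevel<'a> {
    pub fn peak_count(&self) -> usize {
        match self {
            RefPeakDataLevel::Missing => 0,
            RefPeakDataLevel::Centroid(peaks) => peaks.len(),
        }
    }

    /// Copy the borrowed peaks into an owning value.
    pub fn to_owned_level(&self) -> PeakDataLevel {
        match self {
            RefPeakDataLevel::Missing => PeakDataLevel::Missing,
            RefPeakDataLevel::Centroid(peaks) => PeakDataLevel::Centroid(peaks.to_vec()),
        }
    }
}

impl PeakDataLevel {
    /// Borrow an owning value as a view without copying any peaks.
    pub fn as_ref_level(&self) -> RefPeakDataLevel<'_> {
        match self {
            PeakDataLevel::Missing => RefPeakDataLevel::Missing,
            PeakDataLevel::Centroid(peaks) => RefPeakDataLevel::Centroid(peaks.as_slice()),
        }
    }
}
```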

### Deprecated

@@ -28,7 +41,7 @@ and this project adheres to [Semantic Versioning].
## [0.12.0] - 2024-01-29

### Changed
- Require a newer version of `mzsignal`, fixing the rather embarassing error of swapping FWHM
- Require a newer version of `mzsignal`, fixing the rather embarrassing error of swapping FWHM
and SNR during peak picking.
- Thicken the use of internal abstraction around `PrecursorSelection` for the future of allowing
more than one `SelectedIon` per `Precursor`.
@@ -58,7 +71,7 @@ and this project adheres to [Semantic Versioning].
## [0.8.0] - 2024-01-10

### Added
- Added `close` to the `ScanWriter` trait which "closes" the formatted structure of the file. As Rust lacks a notion of a "closed"
- Added `close` to the `SpectrumWriter` trait which "closes" the formatted structure of the file. As Rust lacks a notion of a "closed"
`io::Write`, the underlying writer isn't actually "closed" until the whole struct is dropped.
- Added `Drop` implementation for `MzMLWriterType` and `MzMLbWriterType` which ensures that the `close` method is called to make the
resulting file well-formed.
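
The `Drop`-calls-`close` arrangement can be sketched with a hypothetical `FooterWriter` that must write a closing tag for its output to be well-formed. This illustrates the pattern only and is not the `MzMLWriterType` implementation: `close` is made idempotent so that an explicit call followed by the implicit one in `Drop` writes the footer only once, and because `Drop::drop` cannot return a `Result`, any error at that point is discarded.

```rust
use std::io::{self, Write};

/// Hypothetical writer whose output is only well-formed once a footer is written.
pub struct FooterWriter<W: Write> {
    inner: W,
    closed: bool,
}

impl<W: Write> FooterWriter<W> {
    pub fn new(mut inner: W) -> io::Result<Self> {
        inner.write_all(b"<root>\n")?;
        Ok(Self { inner, closed: false })
    }

    pub fn write_item(&mut self, item: &str) -> io::Result<()> {
        writeln!(self.inner, "  <item>{}</item>", item)
    }

    /// Write the footer exactly once; later calls are no-ops.
    /// The flag is set first so a failed write is not retried from `Drop`.
    pub fn close(&mut self) -> io::Result<()> {
        if !self.closed {
            self.closed = true;
            self.inner.write_all(b"</root>\n")?;
            self.inner.flush()?;
        }
        Ok(())
    }
}

impl<W: Write> Drop for FooterWriter<W> {
    fn drop(&mut self) {
        // `drop` cannot propagate errors, so a failure here is ignored.
        let _ = self.close();
    }
}
```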
@@ -72,7 +85,7 @@ and this project adheres to [Semantic Versioning].
- `SpectrumGroupingIterator` and other such iterators support `RandomAccessSpectrumGroupingIterator`.

### Changed
- `ScanWriter` no longer applies a lifespan requirement on individual writing operations.
- `SpectrumWriter` no longer applies a lifespan requirement on individual writing operations.
- `filename` is no longer a required dependency; it is only needed to use `MzMLbReaderType::from_file`, which otherwise
panics. The dependency introduces unpredictable and difficult-to-diagnose compilation errors.
- `MGFWriterType` skips MS1 spectra automatically.
@@ -103,7 +116,7 @@ and this project adheres to [Semantic Versioning].
- Make some window size attributes smaller as they do not require double precision.
- Clean up the internal implementation of the various internal `SpectrumBuilder` types.
- Factor up `mzdata::spectrum::signal` to be less monolithic and a complete redesign of the traits used to convert `mzpeaks` to and from binary arrays.
- Massive refactoring of `mzdata::io::traits` to make more traits depend upon `ScanSource` instead of `SpectrumIterator` and to make things slightly less verbose.
- Massive refactoring of `mzdata::io::traits` to make more traits depend upon `SpectrumSource` instead of `SpectrumIterator` and to make things slightly less verbose.
- Switched the default `mzsignal` backend to `nalgebra` instead of `intel-mkl` for simplicity.

## [0.5.0] - 2021-09-22
@@ -112,13 +125,13 @@ and this project adheres to [Semantic Versioning].

- MzML writing via `mzdata::io::mzml::MzMLWriter`
- Added feature flags to allow the user to choose amongst more `flate2` backends (zlib _default_, zlib-ng-compat, miniz_oxide)
- Grouped iteration mode for connecting precursor and product spectra over an iterator stream using the `groups` method of `ScanSource`.
- Grouped iteration mode for connecting precursor and product spectra over an iterator stream using the `groups` method of `SpectrumSource`.

### Changed

- Re-structuring and renaming of the various iterator mechanisms for more
consistency. `ScanIterator` -> `SpectrumIterator`, et cetera. Minor refactoring
of this sort expected to come for `ScanSource` as responsibilities are worked out.
of this sort expected to come for `SpectrumSource` as responsibilities are worked out.

### Deprecated

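The grouped iteration mode from 0.5.0 connects precursor and product spectra over an iterator stream. The sketch below is a simplified, hypothetical version that groups purely by MS level order (a new group starts at each MS1 spectrum, and any leading product spectra form a group with no precursor); `Spectrum`, `SpectrumGroup`, and `Grouping` here are toy types, not mzdata's `SpectrumGroupingIterator`.

```rust
use std::iter::Peekable;

/// Hypothetical minimal spectrum record: an identifier and an MS level.
#[derive(Debug, Clone, PartialEq)]
pub struct Spectrum {
    pub id: String,
    pub ms_level: u8,
}

/// A precursor spectrum (if any) plus the product spectra that follow it.
#[derive(Debug, Default, PartialEq)]
pub struct SpectrumGroup {
    pub precursor: Option<Spectrum>,
    pub products: Vec<Spectrum>,
}

/// Lazily groups a stream of spectra, starting a new group at each MS1 spectrum.
pub struct Grouping<I: Iterator<Item = Spectrum>> {
    source: Peekable<I>,
}

impl<I: Iterator<Item = Spectrum>> Grouping<I> {
    pub fn new(source: I) -> Self {
        Self { source: source.peekable() }
    }
}

impl<I: Iterator<Item = Spectrum>> Iterator for Grouping<I> {
    type Item = SpectrumGroup;

    fn next(&mut self) -> Option<SpectrumGroup> {
        let mut group = SpectrumGroup::default();
        // End of stream: no more groups. Otherwise a level-1 (or lower) spectrum opens the group.
        if self.source.peek()?.ms_level <= 1 {
            group.precursor = self.source.next();
        }
        // Collect product spectra until the next precursor or the end of the stream.
        while let Some(s) = self.source.next_if(|s| s.ms_level > 1) {
            group.products.push(s);
        }
        Some(group)
    }
}
```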
2 changes: 1 addition & 1 deletion examples/from_stdin.rs
@@ -5,7 +5,7 @@ use std::io::{self, Seek};
use std::time::Instant;

use mzdata::io::{
infer_from_stream, MassSpectrometryFormat, PreBufferedStream, RestartableGzDecoder, ScanSource,
infer_from_stream, MassSpectrometryFormat, PreBufferedStream, RestartableGzDecoder, SpectrumSource,
};
use mzdata::{MGFReader, MzMLReader};

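`from_stdin.rs` relies on `PreBufferedStream` and `infer_from_stream`, which (as the removed `reader_then` code in `mzconvert.rs` shows) return a format together with a compression flag for a stream that cannot be rewound. A minimal standard-library sketch of that idea peeks at the buffered prefix with `BufRead::fill_buf` without consuming it. The detection rules here (gzip magic bytes, `BEGIN IONS` for MGF, an `<mzML` or `<indexedmzML` element for mzML) are assumptions, not necessarily what `infer_from_stream` checks, and since `fill_buf` returns at most one buffer's worth of data the result is a best-effort guess.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical subset of formats for this sketch.
#[derive(Debug, PartialEq)]
pub enum Format {
    Mgf,
    MzML,
    Unknown,
}

/// Guess the format from the buffered prefix of `reader` without consuming it.
/// Returns the format and whether the bytes look gzip-compressed.
pub fn sniff<R: Read>(reader: &mut BufReader<R>) -> io::Result<(Format, bool)> {
    // `fill_buf` exposes buffered bytes; since `consume` is never called, they stay readable.
    let head = reader.fill_buf()?;
    // The gzip magic number is 0x1f 0x8b.
    let compressed = head.starts_with(&[0x1f, 0x8b]);
    let text = String::from_utf8_lossy(head);
    let format = if text.contains("BEGIN IONS") {
        Format::Mgf
    } else if text.contains("<mzML") || text.contains("<indexedmzML") {
        Format::MzML
    } else {
        Format::Unknown
    };
    Ok((format, compressed))
}
```

A compressed stream would need to be wrapped in a decoder and sniffed again, which is presumably the role `RestartableGzDecoder` plays in the example.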
6 changes: 3 additions & 3 deletions examples/msn_target_mapping.rs
@@ -79,7 +79,7 @@ impl SelectedTarget {
}


pub struct MSnTargetTrackingIterator<R: ScanSource> {
pub struct MSnTargetTrackingIterator<R: SpectrumSource> {
source: SpectrumGroupingIterator<R, CentroidPeak, DeconvolutedPeak, MultiLayerSpectrum>,
time_width: f64,
error_tolerance: Tolerance,
@@ -88,7 +88,7 @@ pub struct MSnTargetTrackingIterator<R: ScanSource> {
targets: VecDeque<SelectionTargetSpecification>,
}

impl<R: ScanSource> MSnTargetTrackingIterator<R> {
impl<R: SpectrumSource> MSnTargetTrackingIterator<R> {
pub fn new(
source: SpectrumGroupingIterator<R, CentroidPeak, DeconvolutedPeak, MultiLayerSpectrum>,
time_width: f64,
@@ -232,7 +232,7 @@ impl<R: ScanSource> MSnTargetTrackingIterator<R> {
}
}

impl<R: ScanSource> Iterator for MSnTargetTrackingIterator<R> {
impl<R: SpectrumSource> Iterator for MSnTargetTrackingIterator<R> {
type Item = (SpectrumGroup, Vec<SelectedTarget>);

fn next(&mut self) -> Option<Self::Item> {
141 changes: 46 additions & 95 deletions examples/mzconvert.rs
@@ -1,28 +1,21 @@
use std::env;
use std::fs;
use std::io;
use std::path::PathBuf;
use std::process::exit;
use std::thread;
use std::time;

use std::sync::mpsc::sync_channel;

#[cfg(feature = "mzmlb")]
use mzdata::io::mzmlb;

#[cfg(feature = "thermorawfilereader")]
use mzdata::io::ThermoRawReader;

use mzdata::io::MassSpectrometryReadWriteProcess;
use mzdata::io::{
infer_format, infer_from_path, infer_from_stream, MassSpectrometryFormat, PreBufferedStream,
Sink, Source, MassSpectrometryReadWriteProcess,
checksum_file
};
use mzdata::meta::SourceFile;
use mzdata::params::ControlledVocabulary;
use mzdata::prelude::*;
use mzdata::{MGFReader, MGFWriter, MzMLReader, MzMLWriter};

use env_logger;
use mzpeaks::CentroidPeak;
use mzpeaks::DeconvolutedPeak;
use mzpeaks::{CentroidPeak, DeconvolutedPeak};

#[derive(Debug, Clone)]
pub struct MZConvert {
@@ -36,87 +29,14 @@ impl MZConvert {
}

pub fn main(&self) -> io::Result<()> {
self.reader_then()
}

fn reader_then(&self) -> io::Result<()> {
if self.inpath == "-" {
let mut stream = PreBufferedStream::new(io::stdin())?;
let (ms_format, _compressed) = infer_from_stream(&mut stream)?;
match ms_format {
MassSpectrometryFormat::MGF => self.writer_then(MGFReader::new(stream))?,
MassSpectrometryFormat::MzML => {
self.writer_then(MzMLReader::new(io::BufReader::new(stream)))?
}
_ => {
eprintln!("Could not infer input format from STDIN");
exit(1)
}
}
} else {
let (ms_format, _compressed) = infer_format(&self.inpath)?;
match ms_format {
MassSpectrometryFormat::MGF => {
let reader = MGFReader::open_path(&self.inpath)?;
self.writer_then(reader)?;
}
MassSpectrometryFormat::MzML => {
let reader = MzMLReader::open_path(&self.inpath)?;
self.writer_then(reader)?;
}
#[cfg(feature = "mzmlb")]
MassSpectrometryFormat::MzMLb => {
let reader = mzmlb::MzMLbReader::open_path(&self.inpath)?;
self.writer_then(reader)?;
}
#[cfg(feature = "thermorawfilereader")]
MassSpectrometryFormat::ThermoRaw => {
let reader = ThermoRawReader::open_path(&self.inpath)?;
self.writer_then(reader)?;
}
_ => {
eprintln!("Could not infer input format from {}", self.inpath);
exit(1)
}
}
};
Ok(())
let source = if self.inpath == "-" {
Source::Stdin
} else {Source::<_, _>::from(self.inpath.as_ref())};
let sink = Sink::<CentroidPeak, DeconvolutedPeak>::from(self.outpath.as_ref());
self.open_reader(source, sink)
}

fn writer_then<R: ScanSource + MSDataFileMetadata + Send + 'static>(
&self,
reader: R,
) -> io::Result<()> {
match infer_from_path(&self.outpath).0 {
MassSpectrometryFormat::MGF => {
let mut writer =
MGFWriter::new(io::BufWriter::new(fs::File::create(&self.outpath)?));
writer.copy_metadata_from(&reader);
self.task(reader, writer)?;
}
MassSpectrometryFormat::MzML => {
let mut writer =
MzMLWriter::new(io::BufWriter::new(fs::File::create(&self.outpath)?));
writer.copy_metadata_from(&reader);
self.task(reader, writer)?;
}
#[cfg(feature = "mzmlb")]
MassSpectrometryFormat::MzMLb => {
let mut writer = mzmlb::MzMLbWriterBuilder::new(&self.outpath)
.with_zlib_compression(9)
.create()?;
writer.copy_metadata_from(&reader);
self.task(reader, writer)?;
}
_ => {
eprintln!("Could not infer output format from {}", self.outpath);
exit(1)
}
}
Ok(())
}

fn task<R: ScanSource + Send + 'static, W: ScanWriter + Send + 'static>(
fn task<R: SpectrumSource + Send + 'static, W: SpectrumWriter + Send + 'static>(
&self,
reader: R,
mut writer: W,
@@ -150,17 +70,49 @@ impl MassSpectrometryReadWriteProcess<CentroidPeak, DeconvolutedPeak> for MZConv

fn task<
R: RandomAccessSpectrumIterator<CentroidPeak, DeconvolutedPeak>
+ ScanSource<CentroidPeak, DeconvolutedPeak>
+ SpectrumSource<CentroidPeak, DeconvolutedPeak>
+ Send
+ 'static,
W: ScanWriter<CentroidPeak, DeconvolutedPeak> + Send + 'static,
W: SpectrumWriter<CentroidPeak, DeconvolutedPeak> + Send + 'static,
>(
&self,
reader: R,
writer: W,
) -> Result<(), Self::ErrorType> {
self.task(reader, writer)
}

#[allow(unused)]
fn transform_writer<
R: RandomAccessSpectrumIterator<CentroidPeak, DeconvolutedPeak> + MSDataFileMetadata + SpectrumSource<CentroidPeak, DeconvolutedPeak> + Send + 'static,
W: SpectrumWriter<CentroidPeak, DeconvolutedPeak> + MSDataFileMetadata + Send + 'static,
>(
&self,
reader: R,
reader_format: mzdata::io::MassSpectrometryFormat,
mut writer: W,
writer_format: mzdata::io::MassSpectrometryFormat,
) -> Result<(R, W), Self::ErrorType> {
if self.inpath != "-" {
let pb: PathBuf = self.inpath.clone().into();
let checksum = checksum_file(&pb)?;
let has_already = reader.file_description().source_files.iter().flat_map(|f| f.get_param_by_name("SHA-1").map(|c| c.value == checksum)).all(|a| a);
if !has_already {
let mut sf = SourceFile::default();
sf.location = pb.parent().map(|p| format!("file://{}", p.to_string_lossy())).unwrap_or("file://".to_string());
sf.name = pb.file_name().map(|p| p.to_string_lossy().to_string()).unwrap_or("".to_string());
let par = ControlledVocabulary::MS.param_val(1000569u32, "SHA-1", checksum);
sf.add_param(par);
sf.file_format = reader_format.as_param();

if let Some(ref_sf) = reader.file_description().source_files.last() {
sf.id_format = ref_sf.id_format.clone()
}
writer.file_description_mut().source_files.push(sf);
}
};
Ok((reader, writer))
}
}

fn main() -> io::Result<()> {
@@ -174,7 +126,6 @@ fn main() -> io::Result<()> {
eprintln!("Please provide a path to write an MS file to, or '-'");
exit(1)
});

let start = time::Instant::now();
let job = MZConvert::new(inpath, outpath);
job.main()?;
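After this change, `MZConvert` no longer matches on input and output formats itself: it builds a `Source` and a `Sink` and calls `open_reader`, leaving the dispatch to `MassSpectrometryReadWriteProcess`, and supplies `task` and `transform_writer` as the methods it implements. A toy, standard-library-only sketch of that hook pattern follows. The `ReadWriteProcess` trait, its associated types, and the assumption that the writer hook runs before `task` are illustrative only; the real trait is generic over peak types, opens its reader and writer from a `Source` and `Sink`, and its exact call order is not shown in this diff.

```rust
use std::io;

/// Hypothetical, heavily simplified read -> transform -> write orchestration.
/// Readers yield lines; writers collect them.
pub trait ReadWriteProcess {
    type Reader: Iterator<Item = String>;
    type Writer: Extend<String>;

    /// Required: move data from the reader to the writer.
    fn task(&self, reader: Self::Reader, writer: Self::Writer) -> io::Result<Self::Writer>;

    /// Optional hook, in the spirit of `transform_writer`: adjust the writer
    /// (e.g. copy provenance metadata) before `task` runs. Default: pass through.
    fn transform_writer(
        &self,
        reader: Self::Reader,
        writer: Self::Writer,
    ) -> io::Result<(Self::Reader, Self::Writer)> {
        Ok((reader, writer))
    }

    /// Provided driver: run the hook, then the task.
    fn run(&self, reader: Self::Reader, writer: Self::Writer) -> io::Result<Self::Writer> {
        let (reader, writer) = self.transform_writer(reader, writer)?;
        self.task(reader, writer)
    }
}

/// Example process: upper-cases each line after writing a header line.
pub struct Upper;

impl ReadWriteProcess for Upper {
    type Reader = std::vec::IntoIter<String>;
    type Writer = Vec<String>;

    fn task(&self, reader: Self::Reader, mut writer: Self::Writer) -> io::Result<Self::Writer> {
        writer.extend(reader.map(|line| line.to_uppercase()));
        Ok(writer)
    }

    fn transform_writer(
        &self,
        reader: Self::Reader,
        mut writer: Self::Writer,
    ) -> io::Result<(Self::Reader, Self::Writer)> {
        writer.push("# converted".to_string());
        Ok((reader, writer))
    }
}
```

The design choice this mirrors is that an implementor only describes *what* to do with an open reader and writer, while the provided method owns the sequencing.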
2 changes: 1 addition & 1 deletion examples/readme.rs
@@ -1,6 +1,6 @@
use std::fs;
use mzdata::prelude::*;
use mzpeaks::{Tolerance, prelude::*};
use mzpeaks::Tolerance;
use mzdata::io::MzMLReader;
use mzdata::spectrum::SignalContinuity;

6 changes: 3 additions & 3 deletions src/io.rs
@@ -16,7 +16,7 @@ pub(crate) mod compression;

pub use crate::io::infer_format::{
infer_format, infer_from_path, infer_from_stream, open_file, MassSpectrometryFormat,
MassSpectrometryReadWriteProcess
MassSpectrometryReadWriteProcess, Sink, Source
};
pub use crate::io::mgf::{MGFError, MGFReader, MGFWriter};
#[cfg(feature = "async")]
@@ -26,10 +26,10 @@ pub use crate::io::mzml::{MzMLParserError, MzMLReader, MzMLWriter};
pub use crate::io::mzmlb::{MzMLbError, MzMLbReader};
pub use crate::io::offset_index::OffsetIndex;
pub use crate::io::traits::{
MZFileReader, RandomAccessSpectrumIterator, ScanSource, ScanWriter, SpectrumAccessError,
MZFileReader, RandomAccessSpectrumIterator, SpectrumSource, SpectrumWriter, SpectrumAccessError,
SpectrumGrouping, SpectrumIterator, StreamingSpectrumIterator,
};
pub use crate::io::utils::{DetailLevel, PreBufferedStream};
pub use crate::io::utils::{DetailLevel, PreBufferedStream, checksum_file};
pub use compression::RestartableGzDecoder;

#[cfg(feature = "thermorawfilereader")]
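`checksum_file` is now re-exported from `mzdata::io`, and `transform_writer` in `mzconvert.rs` uses it to append the input file to the output's source-file list unless a matching SHA-1 is already present. That check is written as `flat_map(...).all(|a| a)`. Because `Iterator::all` returns `true` for an empty iterator, it treats a reader whose source files carry no SHA-1 as already recorded, and it yields `false` as soon as any recorded checksum differs. The hypothetical sketch below expresses the "already recorded" test with `Iterator::any` instead, which is a deliberate change of semantics; the record type and helper names are invented for illustration, and no hashing is performed.

```rust
/// Hypothetical, simplified source-file record: a name and an optional SHA-1 hex digest.
#[derive(Debug, Clone, PartialEq)]
pub struct SourceFileRecord {
    pub name: String,
    pub sha1: Option<String>,
}

/// True when at least one existing record carries exactly `checksum`.
/// Records without a checksum never match, and an empty list yields `false`.
pub fn already_recorded(existing: &[SourceFileRecord], checksum: &str) -> bool {
    existing.iter().any(|f| f.sha1.as_deref() == Some(checksum))
}

/// Append a record for the input file unless one with the same checksum exists.
pub fn record_source(existing: &mut Vec<SourceFileRecord>, name: &str, checksum: &str) {
    if !already_recorded(existing, checksum) {
        existing.push(SourceFileRecord {
            name: name.to_string(),
            sha1: Some(checksum.to_string()),
        });
    }
}
```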