Preparing for release
mobiusklein committed Dec 25, 2023
1 parent c0aa08d commit 6e5848d
Showing 19 changed files with 641 additions and 259 deletions.
34 changes: 20 additions & 14 deletions CHANGELOG.md
@@ -8,32 +8,36 @@ and this project adheres to [Semantic Versioning].
## [Unreleased]

### Added
- Limited async MzML reader support

- Limited async mzML reader support
- mzMLb read and write support
- Reading mzML and MGF from `STDIN`. HDF5, and ergo mzMLb, is not supported on non-seekable I/O devices. See the `from_stdin` example.
- Parsing of mzML `run` and `spectrumList` metadata, although they are still not a part of the common data model
- Spectrum averaging now has an eager `averaging` adapter and a deferred `averaging_deferred` adapter, both implemented as iterator adapters on `SpectrumGroupIterator`.
The deferred adapter is preferred for distributing the process with `rayon`. See the `averaging_writer` example.
- Added ordered parallel iteration collation with `mzdata::spectrum::Collator` to make consuming `rayon` iterators easier while preserving the
original order. See the `averaging_writer` example.
- The mzML and mzMLb writers now write the total ion chromatogram and base peak chromatogram

### Changed

- `RandomAccessScanIterator` methods now return mutable references to self, making them actually useful in a chain.
- Make some window size attributes smaller as they do not require double precision.
- Clean up the internal implementation of the various internal `SpectrumBuilder` types.
- Factor up `mzdata::spectrum::signal` to be less monolithic.

### Deprecated

### Removed

### Fixed

### Security

- Factor up `mzdata::spectrum::signal` to be less monolithic and a complete redesign of the traits used to convert `mzpeaks` to and from binary arrays.
- Massive refactoring of `mzdata::io::traits` to make more traits depend upon `ScanSource` instead of `SpectrumIterator` and to make things slightly less verbose.
- Switched the default `mzsignal` backend to `nalgebra` instead of `intel-mkl` for simplicity.

## [0.5.0] - 2021-09-22

### Added

- MzML writing via `mzdata::io::mzml::MzMLWriter`
- Added feature flags to allow the user to choose amongst more `flate2` backends (zlib *default*, zlib-ng-compat, miniz_oxide)
- Added feature flags to allow the user to choose amongst more `flate2` backends (zlib _default_, zlib-ng-compat, miniz_oxide)
- Grouped iteration mode for connecting precursor and product spectra over an iterator stream using the `groups` method of `ScanSource`.

### Changed

- Re-structuring and renaming of the various iterator mechanisms for more
consistency. `ScanIterator` -> `SpectrumIterator`, et cetera. Minor refactoring
of this sort expected to come for `ScanSource` as responsibilities are worked out.
@@ -43,16 +47,18 @@ and this project adheres to [Semantic Versioning].
### Removed

### Fixed

- Fixed documentation in several places, particularly where it was substantially out of date.

### Security


<!-- Links -->

[keep a changelog]: https://keepachangelog.com/en/1.0.0/
[semantic versioning]: https://semver.org/spec/v2.0.0.html

<!-- Versions -->

[unreleased]: https://github.com/mobiusklein/mzdata/compare/v0.5.0...HEAD
[0.5.0]: https://github.com/mobiusklein/mzdata/compare/v0.1.0...v0.5.0
[0.1.0]: https://github.com/mobiusklein/mzdata/releases/tag/v0.1.0
30 changes: 15 additions & 15 deletions Cargo.lock

Generated file; diff not rendered by default.

4 changes: 2 additions & 2 deletions Cargo.toml
@@ -64,8 +64,8 @@ async = ["tokio", "quick-xml/async-tokio"]
[dependencies]
regex = "1"
lazy_static = "1.4.0"
serde = { version = "1.0.126", features = ["derive"] }
serde_json = "1.0.64"
serde = { version = "1.0.193", features = ["derive"] }
serde_json = "1.0.108"
quick-xml = { version = "0.30", features = [ "serialize" ] }
base64 = "0.21.3"
flate2 = {version = "1.0.20"}
34 changes: 23 additions & 11 deletions README.md
@@ -9,19 +9,31 @@ use std::fs;
use mzdata::prelude::*;
use mzpeaks::{Tolerance, prelude::*};
use mzdata::io::MzMLReader;
use mzdata::spectrum::{SignalContinuity};

let reader = MzMLReader::new(fs::File::open("./test/data/small.mzML").unwrap());
for spectrum in reader {
    println!("Scan {} => BP {}", spectrum.id(), spectrum.peaks().base_peak().mz);
    if spectrum.signal_continuity() < SignalContinuity::Profile {
        let peak_picked = spectrum.into_centroid().unwrap();
        println!("Matches for 579.155: {:?}", peak_picked.peaks.all_peaks_for(579.155, Tolerance::Da(0.02)));
use mzdata::spectrum::SignalContinuity;


fn main() {
    let mut ms1_count = 0;
    let mut msn_count = 0;
    let reader = MzMLReader::new(fs::File::open("./test/data/small.mzML").unwrap());
    for spectrum in reader {
        if spectrum.ms_level() == 1 {
            ms1_count += 1;
        } else {
            msn_count += 1;
        }
        println!("Scan {} => BP {}", spectrum.id(), spectrum.peaks().base_peak().mz);
        if spectrum.signal_continuity() < SignalContinuity::Profile {
            let peak_picked = spectrum.into_centroid().unwrap();
            println!("Matches for 579.155: {:?}", peak_picked.peaks.all_peaks_for(579.155, Tolerance::Da(0.02)));
        }
    }
    println!("MS1 Count: {}\nMSn Count: {}", ms1_count, msn_count);
    assert_eq!(ms1_count, 14);
    assert_eq!(msn_count, 34);
}
println!("MS1 Count: {}\nMSn Count: {}", ms1_count, msn_count);
assert_eq!(ms1_count, 14);
assert_eq!(msn_count, 34);


```

### Supported Formats
28 changes: 28 additions & 0 deletions examples/readme.rs
@@ -0,0 +1,28 @@
use std::fs;
use mzdata::prelude::*;
use mzpeaks::{Tolerance, prelude::*};
use mzdata::io::MzMLReader;
use mzdata::spectrum::SignalContinuity;


fn main() {
    let mut ms1_count = 0;
    let mut msn_count = 0;
    let reader = MzMLReader::new(fs::File::open("./test/data/small.mzML").unwrap());
    for spectrum in reader {
        if spectrum.ms_level() == 1 {
            ms1_count += 1;
        } else {
            msn_count += 1;
        }
        println!("Scan {} => BP {}", spectrum.id(), spectrum.peaks().base_peak().mz);
        if spectrum.signal_continuity() < SignalContinuity::Profile {
            let peak_picked = spectrum.into_centroid().unwrap();
            println!("Matches for 579.155: {:?}", peak_picked.peaks.all_peaks_for(579.155, Tolerance::Da(0.02)));
        }
    }
    println!("MS1 Count: {}\nMSn Count: {}", ms1_count, msn_count);
    assert_eq!(ms1_count, 14);
    assert_eq!(msn_count, 34);
}

18 changes: 11 additions & 7 deletions src/io.rs
@@ -1,21 +1,25 @@
mod infer_format;
pub mod mgf;
pub mod mzml;
#[cfg(feature = "mzmlb")]
pub mod mzmlb;
mod offset_index;
pub mod traits;
mod utils;
mod infer_format;

pub(crate) mod compression;

pub use crate::io::utils::{DetailLevel, PreBufferedStream};
pub use crate::io::mgf::{MGFReader, MGFError, MGFWriter};
pub use crate::io::mzml::{MzMLReader, MzMLParserError, MzMLWriter};
pub use crate::io::infer_format::{
    infer_format, infer_from_path, infer_from_stream, open_file, MassSpectrometryFormat,
};
pub use crate::io::mgf::{MGFError, MGFReader, MGFWriter};
#[cfg(feature = "async")]
pub use crate::io::mzml::AsyncMzMLReaderType;
pub use crate::io::mzml::{MzMLParserError, MzMLReader, MzMLWriter};
#[cfg(feature = "mzmlb")]
pub use crate::io::mzmlb::{MzMLbReader, MzMLbError};
pub use crate::io::mzmlb::{MzMLbError, MzMLbReader};
pub use crate::io::offset_index::OffsetIndex;
pub use crate::io::traits::{RandomAccessSpectrumIterator, ScanAccessError, SpectrumIterator, ScanSource};
pub use crate::io::infer_format::{open_file, infer_format, infer_from_path, MassSpectrometryFormat};
pub use crate::io::traits::{
    RandomAccessSpectrumIterator, ScanAccessError, ScanSource, SpectrumIterator,
};
pub use crate::io::utils::{DetailLevel, PreBufferedStream};
1 change: 1 addition & 0 deletions src/io/infer_format.rs
@@ -62,6 +62,7 @@ pub fn infer_from_path<P: Into<path::PathBuf>,>(path: P) -> (MassSpectrometryFor
/// stream is GZIP compressed. This assumes the stream is seekable.
pub fn infer_from_stream<R: Read + Seek>(stream: &mut R) -> io::Result<(MassSpectrometryFormat, bool)> {
    let mut buf = Vec::with_capacity(100);
    buf.resize(100, b'\0');
    let current_pos = stream.stream_position()?;
    stream.read_exact(buf.as_mut_slice())?;
    let is_stream_gzipped = is_gzipped(buf.as_slice());
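
Note on the `buf.resize(100, b'\0')` line above: `Vec::with_capacity` only reserves memory and leaves the length at zero, so `buf.as_mut_slice()` is empty and `read_exact` succeeds without reading anything. The following is a minimal standalone sketch of that behavior using only `std`; the helper names `read_header_capacity_only` and `read_header_resized` are hypothetical and are not part of `mzdata`.

```rust
use std::io::{Cursor, Read};

/// Hypothetical helper: allocates capacity only, as the code did before this commit.
fn read_header_capacity_only(stream: &mut impl Read, n: usize) -> std::io::Result<Vec<u8>> {
    let mut buf: Vec<u8> = Vec::with_capacity(n);
    // `buf.len()` is still 0, so this slice is empty and no bytes are consumed.
    stream.read_exact(buf.as_mut_slice())?;
    Ok(buf)
}

/// Hypothetical helper: gives the buffer a real length before reading.
fn read_header_resized(stream: &mut impl Read, n: usize) -> std::io::Result<Vec<u8>> {
    let mut buf = vec![0u8; n];
    // The slice now has `n` elements, so `read_exact` fills all of them or fails.
    stream.read_exact(buf.as_mut_slice())?;
    Ok(buf)
}

fn main() {
    let payload = vec![b'x'; 128];
    let empty = read_header_capacity_only(&mut Cursor::new(payload.clone()), 100).unwrap();
    let filled = read_header_resized(&mut Cursor::new(payload), 100).unwrap();
    println!("capacity only: {} bytes, resized: {} bytes", empty.len(), filled.len());
}
```

With the length set, the format sniffing that follows has actual header bytes to inspect. Note that `read_exact` returns an error for streams shorter than the buffer, which may matter for very small inputs.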
24 changes: 16 additions & 8 deletions src/io/mgf.rs
@@ -19,7 +19,8 @@ use thiserror::Error;

use lazy_static::lazy_static;
use mzpeaks::{
    CentroidPeak, DeconvolutedPeak, IntensityMeasurement, MZLocated, PeakCollection, peak::KnownCharge
    peak::KnownCharge, CentroidPeak, DeconvolutedPeak, IntensityMeasurement, MZLocated,
    PeakCollection,
};
use regex::Regex;

@@ -42,7 +43,7 @@ use crate::spectrum::spectrum::{
use crate::spectrum::PeakDataLevel;
use crate::spectrum::SignalContinuity;
use crate::spectrum::{
    Precursor, PrecursorSelection, SelectedIon, SpectrumLike, SpectrumDescription,
    Precursor, PrecursorSelection, SelectedIon, SpectrumDescription, SpectrumLike,
};
use crate::utils::neutral_mass;

@@ -273,7 +274,7 @@ impl<
let (key, value) = line.split_once('=').unwrap();

match key {
"TITLE" => builder.description.id = String::from(value),
"TITLE" => builder.description.id = value.to_string(),
"RTINSECONDS" => {
let scan_ev = builder
.description
@@ -285,7 +286,9 @@ impl<
"PEPMASS" => {
let mut parts = value.split_ascii_whitespace();
let mz: f64 = parts.next().unwrap().parse().unwrap();
let intensity: f32 = parts.next().unwrap().parse().unwrap();
let intensity: f32 = parts.next().and_then(|v| {
Some(v.parse().unwrap())
}).unwrap_or_default();
let charge: Option<i32> = parts.next().map(|c| c.parse().unwrap());
builder.description.precursor = Some(Precursor {
ion: SelectedIon {
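
Note on the `PEPMASS` change above: the new code treats the intensity token as optional and falls back to the `f32` default (0.0) when only an m/z is given, but it still panics on a token that fails to parse. A minimal sketch of the same optional-field handling with errors returned instead of unwrapped is shown below. `PepMass` and `parse_pepmass` are hypothetical names for illustration, not `mzdata` items, and the charge is parsed as a plain `i32` exactly as in the diff.

```rust
/// Hypothetical holder for the parsed fields; not a type from `mzdata`.
#[derive(Debug, PartialEq)]
struct PepMass {
    mz: f64,
    intensity: f32,
    charge: Option<i32>,
}

/// Parse the value part of a `PEPMASS=` line: `<mz> [intensity] [charge]`.
fn parse_pepmass(value: &str) -> Result<PepMass, String> {
    let mut parts = value.split_ascii_whitespace();

    // The m/z token is required.
    let mz = parts
        .next()
        .ok_or_else(|| "PEPMASS value is empty".to_string())?
        .parse::<f64>()
        .map_err(|e| format!("bad m/z: {e}"))?;

    // Intensity is optional; a missing token falls back to 0.0.
    let intensity = match parts.next() {
        Some(token) => token
            .parse::<f32>()
            .map_err(|e| format!("bad intensity: {e}"))?,
        None => 0.0,
    };

    // Charge is optional and stays `None` when absent.
    let charge = parts
        .next()
        .map(|token| token.parse::<i32>().map_err(|e| format!("bad charge: {e}")))
        .transpose()?;

    Ok(PepMass { mz, intensity, charge })
}

fn main() {
    println!("{:?}", parse_pepmass("579.155"));
    println!("{:?}", parse_pepmass("579.155 1200.5 2"));
    println!("{:?}", parse_pepmass("579.155 not-a-number"));
}
```

Returning a `Result` is a design choice of this sketch only; the commit itself keeps the `unwrap` calls.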
@@ -672,6 +675,14 @@ impl<
> MSDataFileMetadata for MGFReaderType<R, C, D>
{
    crate::impl_metadata_trait!();

    fn spectrum_count_hint(&self) -> Option<u64> {
        if self.index.init {
            Some(self.index.len() as u64)
        } else {
            None
        }
    }
}

pub type MGFReader<R> = MGFReaderType<R, CentroidPeak, DeconvolutedPeak>;
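
Note on `spectrum_count_hint` above: the reader reports a count only once its offset index has been built, and otherwise returns `None` so callers do not mistake an empty index for an empty file. The pattern can be sketched in isolation as follows. The `Index`, `Reader`, and `CountHint` items are hypothetical stand-ins, and the default method returning `None` is an assumption of this sketch rather than something the diff shows about `MSDataFileMetadata`.

```rust
/// Hypothetical stand-in for an offset index; `init` mirrors the flag used in the diff.
struct Index {
    init: bool,
    offsets: Vec<u64>,
}

/// Hypothetical trait; the default implementation is an assumption of this sketch.
trait CountHint {
    fn spectrum_count_hint(&self) -> Option<u64> {
        None
    }
}

/// Hypothetical reader that owns an index.
struct Reader {
    index: Index,
}

impl CountHint for Reader {
    fn spectrum_count_hint(&self) -> Option<u64> {
        // Only trust the index length after the index has been populated.
        self.index.init.then(|| self.index.offsets.len() as u64)
    }
}

fn main() {
    let reader = Reader {
        index: Index { init: true, offsets: vec![0, 512, 2048] },
    };
    // A caller can use the hint to pre-size a buffer and fall back to 0 without one.
    let capacity = reader.spectrum_count_hint().unwrap_or(0) as usize;
    let collected: Vec<u64> = Vec::with_capacity(capacity);
    println!("hint: {:?}, reserved: {}", reader.spectrum_count_hint(), collected.capacity());
}
```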
@@ -834,10 +845,7 @@ TITLE="#,
Ok(())
}

pub fn write<S: SpectrumLike<C, D> + 'static>(
&mut self,
spectrum: &S,
) -> io::Result<usize> {
pub fn write<S: SpectrumLike<C, D> + 'static>(&mut self, spectrum: &S) -> io::Result<usize> {
let description = spectrum.description();
if description.ms_level == 1 {
log::warn!(