Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature spec fix #93

Merged
merged 3 commits into from
Jul 15, 2024
Merged

Feature spec fix #93

merged 3 commits into from
Jul 15, 2024

Conversation

devonjkohler
Copy link
Contributor

Motivation and Context

Skyline and Spectronaut converters included zeroes instead of NAs which was messing with summarization. Zeroes were being treated as real values. These zeroes have been replaced with NAs for the time being.

Going forward a deeper look into MNAR vs MCAR missingness is needed.

Changes

  • Skyline converter Os are now NAs
  • Spectronaut converter Os are now NAs
  • Spectronaut converter (Qvalue filtering now off by default per Meena)

Testing

Nah

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

@@ -66,7 +66,7 @@ SkylinetoMSstatsFormat = function(
score_threshold = qvalue_cutoff,
direction = "smaller",
behavior = "fill",
fill_value = 0,
fill_value = NA_real_,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the fill_value NA_real_ here but NA for Spectronaut?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk! Just following the behavior from further up in the Skyline code.

Apparently:

"In R, NA_real_ is a typed missing value that represents a double, or value with a decimal point, that is missing. NA is the general symbol used to represent missing values in R, and R automatically converts NA to the appropriate typed missing value for the target vector. "

I'm guessing this really doesn't have an impact? Maybe we just go with NA?

@mstaniak do you have a reason to use NA_real_?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some packages (for example dplyr-related) check data types very strictly, including NA types. It should be NA_real_ everywhere

@@ -65,6 +65,8 @@ SpectronauttoMSstatsFormat = function(
score_filtering = list(pgq = pq_filter,
psm_q = qval_filter),
columns_to_fill = list("IsotopeLabelType" = "L"))
input[, Intensity := ifelse(Intensity == 0, NA, Intensity)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason we only want to do this with Spectronaut and Skyline but not other converters? i.e. wondering if this could be part of the MSstatsPreprocess function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes we could do this for others, but I am guessing it's not an issue for them? This would require a closer look at the actual data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, fine with staying within Spectronaut and Skyline first and doing a deeper investigation later (maybe as part of the benchmarking work with Anshuman)

@devonjkohler devonjkohler merged commit c358bf7 into devel Jul 15, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants