Retrieve spectra by USI #14

douweschulte · 2024-10-28T16:54:31Z

I wanted to handle USIs in the annotator so I needed support to get the spectrum by USI. Let me know if I do something totally off.

I want to add some tiny things (mostly error handling) but the main stuff of the PR is fixed.

douweschulte · 2024-10-28T16:55:13Z

Sorry apparently the history has been messed up a bit in my branch, I will fix that stuff (along with the other edits) tomorrow.

mobiusklein · 2024-10-28T17:45:54Z

Thank you for writing this. I'll take a closer look later today.

Could you please compare your spectrum retrieval code with https://github.com/mobiusklein/mzdata/blob/main/src/io/proxi.rs to see if they're compatible, or we can remove my existing PROXI codec since it's only for parsing, not making network requests. I also tangled with serde_json and number parsing ambiguity.

All the same, please gate this functionality behind the proxi feature since that's the proper name for the API you're using.

douweschulte · 2024-10-28T18:16:59Z

Will do! Thanks for the pointers.

douweschulte · 2024-10-29T12:31:14Z

I merged the two files, mostly leaning upon your present serializations and deserializations as they seemed more complete than mine. I fixed the parsing errors that arose from most PROXI servers but stuck with one from PRIDE, where the CURIE string has text in the accession number (see below). I can either work on the CURIE handling to allow text in the accession numbers, or try to ignore this in parsing.

{
    "accession": "NCIT:C19067",
    "name": "project title",
    "value": "Sequencing the anti-MUC1 hybridoma antibody 139H2"
}

Additionally some of the PROXI server are more strict in handling the interpretation part (need the charge for the peptide) so I choose to strip the interpretation from the USI before sending it to any server. Do you think there is a better way of doing this? I could add the current USI again when the result is formed so that this behaviour is hidden for end users.

I want to add an async version as well to allow for different access patterns so that will be added still. Oh and I will fix the git history at some point.

mobiusklein · 2024-10-29T14:18:17Z

Additionally some of the PROXI server are more strict in handling the interpretation part (need the charge for the peptide) so I choose to strip the interpretation from the USI before sending it to any server. Do you think there is a better way of doing this? I could add the current USI again when the result is formed so that this behaviour is hidden for end users.

The PROXI docs are incomplete/outdated and leaves much as implementation details. For example , if I remember correctly, the MassIVE endpoint would include the USI in the response while the PRIDE wouldn't. Additionally, different implementations might indicate failure differently. If you've a will to do it, you could send the USI as-is, and if it is rejected try to re-send it again without the interpretation, but what might be best right now is to understand the failure modes rather than trying to hide them from the end-user, and then when those modes are well understood, try to add error handling.

I fixed the parsing errors that arose from most PROXI servers but stuck with one from PRIDE, where the CURIE string has text in the accession number (see below). I can either work on the CURIE handling to allow text in the accession numbers, or try to ignore this in parsing.

I was not planning to handle additional controlled vocabularies beyond the two used for mzML in mzdata, but I see two paths forward here:

Change the accession field in CURIE and Param with an opaque token data structure that we may need to bend over backwards to make const-compatible, causing massive breaking changes everywhere.
Write a run-time or compile-time look-up table/perfect hashing function to map NCIT accession codes that we can tie to a ControlledVocabulary method to encode/decode the accession. A PFH is guaranteed to exist as the keyspace is much, much smaller than the capacity of a u32, but a look up table is easier to invert to render it back to the text version. I know that look up tables can be generated at compile time and are definitely const-compatible.

I'll expand on the drawbacks later today.

douweschulte · 2024-10-29T16:21:46Z

I removed the USI interpretation stripping and tortured some apis a bit to get more known error messages. I do agree with the idea of having as much control as library user as possible, it is very possible still to remove the interpretation field before running the get_spectrum command.

On the NCIT, I think I understand what you mean with the perfect hashing scheme, but I would not trust my own implementation at this point. Do you need this behaviour in more places, or do you think you will need it in the future? Otherwise I suggest to create a CURIEOrString type to contain any nonstandard strings, which would only make this file slightly more complex. Or possibly ignore any CURIES that cannot be parsed, which would remove some metadata, but this one is only used as a general title so might not be very much used.

douweschulte added 6 commits October 16, 2024 11:25

Fixed rare crash while mgf parsing

b08af06

Fixed silent error that returns an unfinished spectrum

eae7777

Some refactoring

76c920e

Handle charge when the sign is at the tail

fb2233a

Merge branch 'main' of https://github.com/mobiusklein/mzdata

e5294c7

First start on retrieving USIs

96539c8

douweschulte added 2 commits October 29, 2024 11:58

Snapshot for rust crash

f0f3c24

Finished error handling PROXI responses

0a12b06

douweschulte added 2 commits October 29, 2024 16:56

Added async USI get_spectrum

6c7433d

Removed interpretation stripping and added more known errors

7be52cf

Bit more documentation on the USIs wich require a charge

f2e3ba6

douweschulte mentioned this pull request Oct 29, 2024

Added method to retrieve the raw spectrum from any USI #15

Merged

douweschulte closed this Oct 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retrieve spectra by USI #14

Retrieve spectra by USI #14

douweschulte commented Oct 28, 2024

douweschulte commented Oct 28, 2024

mobiusklein commented Oct 28, 2024

douweschulte commented Oct 28, 2024 •

edited

Loading

douweschulte commented Oct 29, 2024 •

edited

Loading

mobiusklein commented Oct 29, 2024

douweschulte commented Oct 29, 2024

Retrieve spectra by USI #14

Retrieve spectra by USI #14

Conversation

douweschulte commented Oct 28, 2024

douweschulte commented Oct 28, 2024

mobiusklein commented Oct 28, 2024

douweschulte commented Oct 28, 2024 • edited Loading

douweschulte commented Oct 29, 2024 • edited Loading

mobiusklein commented Oct 29, 2024

douweschulte commented Oct 29, 2024

douweschulte commented Oct 28, 2024 •

edited

Loading

douweschulte commented Oct 29, 2024 •

edited

Loading