Skip to content

MAINT: rounding residue partial charges #5035

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

tylerjereddy
Copy link
Member

@tylerjereddy tylerjereddy commented Apr 18, 2025

  • Fixes Query: amino acid residue charges #5032.

  • Given that 1e-4 is already optimistic for electron charge unit precision in MD forcefields, this patch proposes to round residue net charges to 4 decimal places to avoid the confusion described in the matching ticket.

  • Some obvious questions might include: why are we only doing this for AMBER topologies? why only round at the residue level and not i.e., upstream at atom level and also at "segment" levels? This is mostly for scoping, because this initial patch is targeted at the specific issue raised, and because it probably doesn't make sense to proceed much farther until things are discussed.

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5035.org.readthedocs.build/en/5035/

* Fixes MDAnalysisgh-5032.

* Given that 1e-4 is already optimistic for electron charge unit
precision in MD forcefields, this patch proposes to round residue
net charges to 4 decimal places to avoid the confusion described
in the matching ticket.

* Some obvious questions might include: why are we only doing this
for AMBER topologies? why only round at the residue level and
not i.e., upstream at atom level and also at "segment" levels? This
is mostly for scoping, because this initial patch is targeted
at the specific issue raised, and because it probably doesn't make
sense to proceed much farther until things are discussed.
attr = Charges(charges)
# in practice, the precision of partial charges
# in MD simulation forcefields is typically limited
# to a few decimal places (i.e., 1e-4 is already optimistic)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a citation perhaps, but only if there's an appetite for moving forward

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our recent work shows that changes on the order of 1e-3 can have large impacts on the free energy. I've seen (unreported) cases where 5e-4 does subtle changes, probably 1e-5 is the best bet?

Probably @lilyminium is the best person to weigh in on this though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some quick thoughts below, apologies I've just skimmed this issue/PR and I'm in the middle of travel.

For the same reasons Oliver and Irfan have mentioned I'm not super keen on auto-rounding, especially to something as low as 4 dp -- just to throw out an example, OpenFF's FFs will assign charges with more dp than that and can be written out to AMBER formats (and we ensure integer charge at the molecule level but not necessarily residue level). Could rounding be a kwarg that's passed to the TOPParser instead? Then I'd be 100% in favour of having this capability in the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just quick comment, sorry to be brief, Lily: I don't think that a TOPparser argument is the answer. If we round atom charges on reading the topology, there's still no guarantee that they sum to integer values for each residue.

However, building in a rounding argument into any of our functions that aggregate (such as total_charge(decimals=N) (modelled after np.round() makes sense to me.

I don't see how to do this transparently for managed attributes such as residues.charges without maintaining some per-Universe state — which I don't like because something like Universe(..., decimals=4) is ambiguous (where do we round?) and also makes downstream analysis code dependent on how a universe was created. At the end of the day, I find np.round(residues.charges, decimals=4) a very clear way to express what the specific use case is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the clarification Oliver -- I was just looking at the changed lines in the PR where the kwarg is passed into the Charge attr on top-parsing. So as I understand it atom charges aren't rounded in this PR, but the residue-rounding is set on parsing. I agree that's not super intuitive and that making the user just round charges themselves is probably easier and clearer.

Copy link

codecov bot commented Apr 18, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.63%. Comparing base (e64755c) to head (36662df).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #5035   +/-   ##
========================================
  Coverage    93.62%   93.63%           
========================================
  Files          177      177           
  Lines        21978    21983    +5     
  Branches      3110     3111    +1     
========================================
+ Hits         20578    20583    +5     
  Misses         946      946           
  Partials       454      454           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@orbeckst
Copy link
Member

orbeckst commented May 9, 2025

To start a discussion here: I am generally not too keen on altering data, e.g., rounding. I'd rather have the user decide what they want to do with it. I feel that the described use case – from what I gathered — was a somewhat direct use of numerical data without further consideration of the underlying meaning of the numbers.

I'd be strictly against rounding the atomic charges, which are taken directly from the input data, and should be presented as close to their original representation as possible — regardless how ridiculous we might find a charge spelled out to 8 decimals. (I understand that this is not the proposal here, I just wanted to make my stance clear.)

The per-residue total charge is something that MDAnalysis computes so I acknowledge that we do have the freedom to decide how we want to present it. I'd be more comfortable with giving the user the option to round — mainly that allows us to write a doc entry to say that "non-integer charges can happen and if you don't like it, set round=4" or something like that. However, personally I'd keep the default at no rounding.

It would be good to hear others. This also impinges on the "consistent dtype across the universe" plan that @hmacdope in particular has been pursuing.

@tylerjereddy
Copy link
Member Author

Got it, thanks. I don't have a strong view, was just passing this concern along from the scientists. Maybe I'll wait for others to chime in before adjusting to offer an option to round (or just documenting that folks may want to round on their end, whatever the team prefers).

@tylerjereddy
Copy link
Member Author

Is this converging on a docs-only suggestion that rounding may be needed to get sensible charges for some user applications?

I'm skeptical this will do much to prevent the original footgun I described. What may be more convenient for our API doesn't really do much for improving end user correctness.

That said, if that's the consensus I'll try to add a few sentences in the docs, but seems pretty low value/low visibility.

@tylerjereddy
Copy link
Member Author

Also, I think Irfan and I are in favor of rounding to some degree (or offering the API option to), while Oli and Lily are not, so this is currently somewhat tied, no?

@tylerjereddy
Copy link
Member Author

I think the fact that the devs are disagreeing a bit here also somewhat reinforces the potential for a less experienced user to experience a footgun.

@orbeckst
Copy link
Member

I think it should be a user choice to do any rounding. The whole work on the guessers has been driven by reducing well-meaning but not always correct choices. The idea of dtypes throughout wants to make the precision of operations transparent — admittedly, I’m not sure when this will get done, but we have it on the road map. Building in rounding by default does not fit in well with the direction in which we wanted to go.

How would you want to give the user the option? Could this be done with a guesser (eg something that has the assumption built in that residues have integer charges) so that these assumptions are made clear and require a conscientious choice? That would also help with reproducibility, I think.

@IAlibay
Copy link
Member

IAlibay commented May 18, 2025

In this particular case, where the issue is down to the idea that floats should sum to an int, how about we implement a residue / segment / etc level "formal_charge" attribute that either sums the per atom formal charges (if available) or approximates them from the partial charges if the sum is within a tolerance of an int? (edit: if it doesn't turn return a NaN)

@orbeckst
Copy link
Member

orbeckst commented May 18, 2025 via email

@tylerjereddy
Copy link
Member Author

User choice and/or separate attribute that auto-rounds seems ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Query: amino acid residue charges
4 participants