-
Notifications
You must be signed in to change notification settings - Fork 712
MAINT: rounding residue partial charges #5035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
* Fixes MDAnalysisgh-5032. * Given that 1e-4 is already optimistic for electron charge unit precision in MD forcefields, this patch proposes to round residue net charges to 4 decimal places to avoid the confusion described in the matching ticket. * Some obvious questions might include: why are we only doing this for AMBER topologies? why only round at the residue level and not i.e., upstream at atom level and also at "segment" levels? This is mostly for scoping, because this initial patch is targeted at the specific issue raised, and because it probably doesn't make sense to proceed much farther until things are discussed.
attr = Charges(charges) | ||
# in practice, the precision of partial charges | ||
# in MD simulation forcefields is typically limited | ||
# to a few decimal places (i.e., 1e-4 is already optimistic) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs a citation perhaps, but only if there's an appetite for moving forward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our recent work shows that changes on the order of 1e-3 can have large impacts on the free energy. I've seen (unreported) cases where 5e-4 does subtle changes, probably 1e-5 is the best bet?
Probably @lilyminium is the best person to weigh in on this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some quick thoughts below, apologies I've just skimmed this issue/PR and I'm in the middle of travel.
For the same reasons Oliver and Irfan have mentioned I'm not super keen on auto-rounding, especially to something as low as 4 dp -- just to throw out an example, OpenFF's FFs will assign charges with more dp than that and can be written out to AMBER formats (and we ensure integer charge at the molecule level but not necessarily residue level). Could rounding be a kwarg that's passed to the TOPParser instead? Then I'd be 100% in favour of having this capability in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just quick comment, sorry to be brief, Lily: I don't think that a TOPparser argument is the answer. If we round atom charges on reading the topology, there's still no guarantee that they sum to integer values for each residue.
However, building in a rounding argument into any of our functions that aggregate (such as total_charge(decimals=N)
(modelled after np.round() makes sense to me.
I don't see how to do this transparently for managed attributes such as residues.charges
without maintaining some per-Universe state — which I don't like because something like Universe(..., decimals=4)
is ambiguous (where do we round?) and also makes downstream analysis code dependent on how a universe was created. At the end of the day, I find np.round(residues.charges, decimals=4)
a very clear way to express what the specific use case is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the clarification Oliver -- I was just looking at the changed lines in the PR where the kwarg is passed into the Charge attr on top-parsing. So as I understand it atom charges aren't rounded in this PR, but the residue-rounding is set on parsing. I agree that's not super intuitive and that making the user just round charges themselves is probably easier and clearer.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #5035 +/- ##
========================================
Coverage 93.62% 93.63%
========================================
Files 177 177
Lines 21978 21983 +5
Branches 3110 3111 +1
========================================
+ Hits 20578 20583 +5
Misses 946 946
Partials 454 454 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
To start a discussion here: I am generally not too keen on altering data, e.g., rounding. I'd rather have the user decide what they want to do with it. I feel that the described use case – from what I gathered — was a somewhat direct use of numerical data without further consideration of the underlying meaning of the numbers. I'd be strictly against rounding the atomic charges, which are taken directly from the input data, and should be presented as close to their original representation as possible — regardless how ridiculous we might find a charge spelled out to 8 decimals. (I understand that this is not the proposal here, I just wanted to make my stance clear.) The per-residue total charge is something that MDAnalysis computes so I acknowledge that we do have the freedom to decide how we want to present it. I'd be more comfortable with giving the user the option to round — mainly that allows us to write a doc entry to say that "non-integer charges can happen and if you don't like it, set It would be good to hear others. This also impinges on the "consistent dtype across the universe" plan that @hmacdope in particular has been pursuing. |
Got it, thanks. I don't have a strong view, was just passing this concern along from the scientists. Maybe I'll wait for others to chime in before adjusting to offer an option to round (or just documenting that folks may want to round on their end, whatever the team prefers). |
Is this converging on a docs-only suggestion that rounding may be needed to get sensible charges for some user applications? I'm skeptical this will do much to prevent the original footgun I described. What may be more convenient for our API doesn't really do much for improving end user correctness. That said, if that's the consensus I'll try to add a few sentences in the docs, but seems pretty low value/low visibility. |
Also, I think Irfan and I are in favor of rounding to some degree (or offering the API option to), while Oli and Lily are not, so this is currently somewhat tied, no? |
I think the fact that the devs are disagreeing a bit here also somewhat reinforces the potential for a less experienced user to experience a footgun. |
I think it should be a user choice to do any rounding. The whole work on the guessers has been driven by reducing well-meaning but not always correct choices. The idea of dtypes throughout wants to make the precision of operations transparent — admittedly, I’m not sure when this will get done, but we have it on the road map. Building in rounding by default does not fit in well with the direction in which we wanted to go. How would you want to give the user the option? Could this be done with a guesser (eg something that has the assumption built in that residues have integer charges) so that these assumptions are made clear and require a conscientious choice? That would also help with reproducibility, I think. |
In this particular case, where the issue is down to the idea that floats should sum to an int, how about we implement a residue / segment / etc level "formal_charge" attribute that either sums the per atom formal charges (if available) or approximates them from the partial charges if the sum is within a tolerance of an int? (edit: if it doesn't turn return a NaN) |
That may be a viable way forward!Am 5/18/25 um 18:43 schrieb Irfan Alibay ***@***.***>:
IAlibay left a comment (MDAnalysis/mdanalysis#5035)
In this particular case, where the issue is down to the idea that floats should sum to an int, how about we implement a residue / segment / etc level "formal_charge" attribute that either sums the per atom formal charges (if available) or approximates them from the partial charges if the sum is within a tolerance of an int?
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: ***@***.***>
|
User choice and/or separate attribute that auto-rounds seems ok. |
Fixes Query: amino acid residue charges #5032.
Given that 1e-4 is already optimistic for electron charge unit precision in MD forcefields, this patch proposes to round residue net charges to 4 decimal places to avoid the confusion described in the matching ticket.
Some obvious questions might include: why are we only doing this for AMBER topologies? why only round at the residue level and not i.e., upstream at atom level and also at "segment" levels? This is mostly for scoping, because this initial patch is targeted at the specific issue raised, and because it probably doesn't make sense to proceed much farther until things are discussed.
PR Checklist
package/CHANGELOG
file updated?package/AUTHORS
? (If it is not, add it!)Developers Certificate of Origin
I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.
📚 Documentation preview 📚: https://mdanalysis--5035.org.readthedocs.build/en/5035/