Skip to content

Segid columns change to 73-76 for PDBParser #5001

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Mar 31, 2025

Conversation

yuyuan871111
Copy link
Contributor

@yuyuan871111 yuyuan871111 commented Mar 30, 2025

Fixes #4948 (comment) issue 1.

Changes made in this Pull Request:

  • extract the segid from columns 73-76 in PDB files (instead of columns 67-76)
  • add warnings related to reading the segids

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5001.org.readthedocs.build/en/5001/

Copy link

codecov bot commented Mar 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.41%. Comparing base (66e7e5c) to head (830f7cd).
Report is 1 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5001      +/-   ##
===========================================
- Coverage    93.42%   93.41%   -0.02%     
===========================================
  Files          177      189      +12     
  Lines        21859    22931    +1072     
  Branches      3078     3079       +1     
===========================================
+ Hits         20422    21421     +999     
- Misses         986     1059      +73     
  Partials       451      451              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@yuyuan871111 yuyuan871111 changed the title Segid columns change to 73-76 for PDBParser (GSoC purpose) Segid columns change to 73-76 for PDBParser Mar 30, 2025
@orbeckst orbeckst requested a review from Copilot March 30, 2025 15:59
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request updates the PDB segid handling to extract the segment ID from columns 73-76, along with updating the related documentation and adding warnings for when the segid does not match related values.

  • Update segid extraction from PDB files in the parser
  • Add warning messages for mismatched or missing segids in the topology parser
  • Update documentation in the PDB coordinate reader to reflect the new segid column locations

Reviewed Changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.

File Description
package/MDAnalysis/topology/PDBParser.py Extract segid from columns 73-76 and add warnings accordingly.
package/MDAnalysis/coordinates/PDB.py Update documentation to reflect the new segid column positions.
Files not reviewed (1)
  • package/AUTHORS: Language not supported

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for splitting the PR.

  • CHANGELOG entry
  • test case (something that fails with the old code and passes with the new one ... eg a file that has something else in 67-72)
  • I am not a fan of too many warnings: When running in workflows, they are virtually invisible, when running interactively they can be very distracting when they show up for most "normal" cases. I suggest to use logging instead. We then also want some testing for the cases that trigger the warnings/logging events. We have code that shows how to test that messages are logged (or warnings emitted).

@orbeckst orbeckst self-assigned this Mar 30, 2025
@orbeckst orbeckst requested review from cbouy and talagayev March 30, 2025 16:19
@orbeckst
Copy link
Member

This PR is relevant for a GSOC application so it would actually be really good if @cbouy (and @talagayev ) could get involved.

I'll take over the managing of the PR.

(@marinegor and @lilyminium quick reviews would be greatly appreciated, too!)

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good, just some minor issues to fix.

(Eventually we should use logging more consistently in the PDBParser but that can be for another PR.)

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@orbeckst orbeckst merged commit 7fb3534 into MDAnalysis:develop Mar 31, 2025
23 of 24 checks passed
@orbeckst
Copy link
Member

Congratulations @yuyuan871111 — first contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add force read chain to segment when reading a PDB file / set groups by ids
2 participants