PDBParser chainid segid / set groups #4965

Open
wants to merge 19 commits into base: develop

yuyuan871111

@yuyuan871111 yuyuan871111 commented Mar 13, 2025

Fixes #4948 #2874

Changes made in this Pull Request:

  • extract the segid from columns 73-76 of PDB files (instead of columns 67-76)
  • handle conflicts between chainIDs and segids
    • emit a warning when chainIDs and segids differ
    • emit a warning if chainIDs take priority over segids when loading into the AtomGroup/ResidueGroup/SegmentGroup of the universe
    • add an argument force_chainids_to_segids to PDBReader that forces the chain ID to be read into the segment ID (PDB-specific method)
    • add a Universe method set_groups that updates the topology and groups in the universe when atomwise resids/segids are given (a more general method that works across file types)
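
The column change in the first item can be illustrated with a small standalone sketch. It does not use MDAnalysis and is not the code in this PR: `make_atom_line` and `read_chain_and_segid` are hypothetical helpers, and the only facts assumed are the fixed-width PDB layout (chain ID in column 22, segid in columns 73-76) and Python's 0-based slicing.

```python
import warnings


def make_atom_line(chain, segid, filler=""):
    """Build a fixed-width PDB ATOM record (hypothetical demo helper).

    `filler` is written into columns 67-72, which sit between the
    temperature factor and the segid.
    """
    return (
        f"ATOM  {1:5d} {'CA':<4s} {'ALA':>3s} {chain:1s}{1:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}"
        f"{filler:<6s}{segid:<4s}{'C':>2s}"
    )


def read_chain_and_segid(line):
    """Return (chainID, segid) from one ATOM record; warn if they differ."""
    chain_id = line[21].strip()    # column 22
    segid = line[72:76].strip()    # columns 73-76
    if chain_id and segid and chain_id != segid:
        warnings.warn(f"chainID {chain_id!r} and segid {segid!r} differ")
    return chain_id, segid


line = make_atom_line("B", "SEGB", filler="1ABC")
wide = line[66:76].strip()         # columns 67-76: also picks up the filler
chain_id, segid = read_chain_and_segid(line)
```

With text present in columns 67-72, the wide slice returns the filler together with the segid, while the narrow slice returns only the four segid characters. The mismatch between `B` and `SEGB` triggers the warning.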

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--4965.org.readthedocs.build/en/4965/


@github-actions github-actions bot left a comment

Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.


codecov bot commented Mar 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.42%. Comparing base (35d9d2e) to head (a594aaf).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4965      +/-   ##
===========================================
- Coverage    93.42%   93.42%   -0.01%     
===========================================
  Files          177      189      +12     
  Lines        21859    22969    +1110     
  Branches      3078     3084       +6     
===========================================
+ Hits         20422    21459    +1037     
- Misses         986     1059      +73     
  Partials       451      451              


@yuyuan871111
Author

The user guide might need to be updated as well if this PR is accepted:
https://userguide.mdanalysis.org/stable/formats/reference/pdb.html#reading-in

@yuyuan871111
Author

Hi @lilyminium, I updated the code based on your comment in issue #4948 (comment).

Let me know if it works for you.

import MDAnalysis as mda

u = mda.Universe("test.pdb")
# Option 1: by default, guess the segments from the chainIDs
u.guess_or_set_segments()
# Option 2: pass any atomwise list to use as segids
u.guess_or_set_segments(custom_segids=u.atoms.chainIDs)

# This could possibly be added to guess_TopologyAttrs, as you suggested:
# u.guess_TopologyAttrs(to_guess=["segids"], force_guess=["segids"])
# (or it could stay separate; I am not sure which is better)
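
To make the idea of building segments from an atomwise list concrete, here is a minimal pure-Python sketch of the grouping step. It is an illustration only: `group_atoms_by_id` is a hypothetical helper, not the `guess_or_set_segments` implementation proposed in this PR, and it assumes nothing about MDAnalysis internals.

```python
def group_atoms_by_id(atomwise_ids):
    """Map each distinct ID to the indices of the atoms that carry it.

    Keys appear in order of first occurrence (dicts preserve insertion
    order in Python 3.7+).
    """
    groups = {}
    for index, seg_id in enumerate(atomwise_ids):
        groups.setdefault(seg_id, []).append(index)
    return groups


# One ID per atom, e.g. what u.atoms.chainIDs would provide.
chain_ids = ["A", "A", "B", "B", "A"]
segments = group_atoms_by_id(chain_ids)
print(segments)  # {'A': [0, 1, 4], 'B': [2, 3]}
```

Each key would correspond to one segment, and the index lists say which atoms belong to it. Note that atoms with the same ID end up in the same group even when they are not contiguous in the file.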

@lilyminium
Member

Thanks for trying that out, @yuyuan871111!

As @tylerjereddy mentioned, I think there are some design decisions still to be resolved. I'll reply on the parent issue to keep the discussion in one place.

@yuyuan871111 yuyuan871111 changed the title from "Pdbparser chainid segid" to "PDBParser chainid segid / set groups" on Mar 21, 2025
@yuyuan871111
Author

yuyuan871111 commented Mar 21, 2025

Hi @lilyminium @tylerjereddy

I believe the design decisions have been finalized in #4948 (comment). Can we proceed with the pull request?

@yuyuan871111
Author

Hi @orbeckst @yuxuanzhuang,

As we have discussed this improvement in #4948, would you mind giving some feedback on the code?

I know it is a busy time for MDAnalysis core members (GSoC applications, MDAnalysis UserGuide/documentation updates), but I hope this improvement for MDAnalysis users will not become a stale PR. Please let me know if you have any suggestions; I am happy to clarify anything that is unclear or to improve the code.

@orbeckst
Member

No concerns with narrowing the SEGID columns read from PDB files to 73-76; see #4948 (comment).

Successfully merging this pull request may close these issues.

Add force read chain to segment when reading a PDB file / set groups by ids
4 participants