
Allow select_atoms to select chain #2875

Closed
xiki-tempula opened this issue Jul 28, 2020 · 1 comment · Fixed by #2927

Comments

@xiki-tempula
Contributor

xiki-tempula commented Jul 28, 2020

Is your feature request related to a problem?

The PDB standard defines column 22 as the chain ID.
The CHARMM convention defines the segment ID as a 4-character ID starting at column 73.

Currently, MDAnalysis uses the chain ID as the segment ID when no segment ID is present, and ignores the chain ID when a segment ID is given.
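A minimal sketch of the two fixed-column fields and the fallback described above, written in plain Python rather than taken from MDAnalysis's PDB parser: the chain ID is read from column 22 and the segment ID from columns 73-76 (1-based), and a blank segment ID is replaced by the chain ID. The function name `parse_ids` and the sample record are illustrative assumptions.

```python
from typing import Tuple


def parse_ids(record: str) -> Tuple[str, str]:
    """Return (chain_id, seg_id) from a fixed-width PDB ATOM/HETATM line.

    PDB columns are 1-based: column 22 is index 21, and columns 73-76
    are the slice [72:76].
    """
    # Pad short lines so missing trailing columns read as blanks.
    record = record.ljust(80)
    chain_id = record[21].strip()
    seg_id = record[72:76].strip()
    # Fallback described in the issue: a blank segment ID is replaced by
    # the chain ID; a non-blank segment ID is kept as-is.
    if not seg_id:
        seg_id = chain_id
    return chain_id, seg_id


base = "ATOM      1  N   MET A   1"
print(parse_ids(base))                     # ('A', 'A')
print(parse_ids(base.ljust(72) + "PROT"))  # ('A', 'PROT')
```

In the second case a selection by segment ID only sees "PROT", so without a separate chain keyword the "A" in column 22 is unreachable, which is the gap this issue describes.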

Ideally, one could select a chain based on its chain ID:

u.select_atoms('chainid A'), or u.select_atoms('chain A') if we follow the PyMOL convention.
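To illustrate the requested keyword, here is a toy selector over plain dictionaries (not MDAnalysis's selection engine) that treats `chainid` and the PyMOL-style `chain` as aliases. The `toy_select` helper, the dict-based atoms, and the field name `chainid` are hypothetical choices for this sketch.

```python
from typing import Dict, List

Atom = Dict[str, str]

# "chainid" and the PyMOL-style "chain" are treated as aliases for the
# same per-atom field (the field name "chainid" is assumed here).
CHAIN_KEYWORDS = {"chainid", "chain"}


def toy_select(atoms: List[Atom], selection: str) -> List[Atom]:
    """Evaluate one 'chainid X [Y ...]' or 'chain X [Y ...]' clause."""
    keyword, *values = selection.split()
    if keyword.lower() not in CHAIN_KEYWORDS:
        raise ValueError(f"unsupported keyword: {keyword!r}")
    wanted = set(values)
    return [atom for atom in atoms if atom["chainid"] in wanted]


atoms = [
    {"name": "N", "chainid": "A", "segid": "PROT"},
    {"name": "CA", "chainid": "B", "segid": "PROT"},
]
print([a["name"] for a in toy_select(atoms, "chainid A")])  # ['N']
print([a["name"] for a in toy_select(atoms, "chain A B")])  # ['N', 'CA']
```

Both atoms share the segment ID "PROT", so only the chain keyword can tell them apart.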

Related to #2874

@orbeckst
Member

orbeckst commented Jul 30, 2020

Chain vs Segment

chain

A chain is a polymer term, specifically from PDB files (see the chainID field of ATOM records, and the TER and SEQRES records for clarification), and originally means one polymer, as expressed for SEQRES (my emphasis):

SEQRES records contain a listing of the consecutive chemical components covalently linked in a linear fashion to form a polymer. The chemical components included in this listing may be standard or modified amino acid and nucleic acid residues. It may also include other residues that are linked to the standard backbone in the polymer. Chemical components or groups covalently linked to side-chains (in peptides) or sugars and/or bases (in nucleic acid polymers) will not be listed here.

Each SEQRES entry has a corresponding chainID in the ATOM records, and each chain should be terminated with a TER record (although in the wild the TER record is often omitted).
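Because TER records are supposed to close each chain but are often missing, a reader has to combine both signals. A minimal sketch, assuming plain PDB text lines as input: TER closes the current chain, and a change of chainID (column 22) also starts a new one. The function name `split_chains` is hypothetical and this is not MDAnalysis's parser logic.

```python
from typing import List


def split_chains(lines: List[str]) -> List[List[str]]:
    """Group ATOM records into chains.

    A TER record closes the current chain; because TER is often missing
    in practice, a change of chainID (column 22) also starts a new chain.
    """
    chains: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        if line.startswith("TER"):
            if current:
                chains.append(current)
            current = []
        elif line.startswith("ATOM"):
            chain_id = line.ljust(22)[21]
            if current and current[-1].ljust(22)[21] != chain_id:
                chains.append(current)
                current = []
            current.append(line)
    if current:
        chains.append(current)
    return chains


a1 = "ATOM      1  N   MET A   1"
a2 = "ATOM      2  CA  MET A   1"
b1 = "ATOM      3  N   GLY B   1"
print([len(c) for c in split_chains([a1, a2, "TER", b1])])  # [2, 1]
print([len(c) for c in split_chains([a1, a2, b1])])         # [2, 1]
```

The second call omits TER and still yields two chains because the chainID changes from A to B.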

Segment

A segment originates (as far as I know) from PSF files and is generally used to mark up a collection of molecules. It is often used to label a single protein, all lipids, all waters, or the whole solvent. The charmmtutorial.org page CHARMM:The Basics: Molecule Metadata treats "chain" and "segment" as equivalent:

Residues are further grouped into chains, or segments, which represent major functional units of the protein.

but then shows an example where all water molecules are in a segment with SEGID W.

In practice, segments are used as a convenient container for collections of "residues", where residues can either be building blocks of a polymer or individual molecules such as lipids or waters or bare ions.

selection keyword

A quick survey indicates that chain is probably a good keyword to use.

VMD

VMD's selections have the keywords:

  • chain (str): the one-character chain identifier
  • fragment (num): a set of connected residues
  • segname (str): segment name
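VMD's fragment keyword depends on bond connectivity rather than on a label stored in the file. A minimal union-find sketch of how such connected groups can be computed, assuming atoms are numbered 0..n-1 and bonds are given as index pairs (the same idea works with residues as nodes); the `fragments` function is illustrative and not how VMD implements it.

```python
from typing import Dict, List, Sequence, Tuple


def fragments(n_atoms: int, bonds: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Return the connected components (fragments) of a bond graph."""
    parent = list(range(n_atoms))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[int, List[int]] = {}
    for i in range(n_atoms):
        groups.setdefault(find(i), []).append(i)
    # Order fragments by their lowest atom index for stable output.
    return sorted(groups.values(), key=lambda g: g[0])


print(fragments(5, [(0, 1), (1, 2), (3, 4)]))  # [[0, 1, 2], [3, 4]]
```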

CHARMM

MDAnalysis selections were modelled after CHARMM, so, unsurprisingly, CHARMM has the same keyword (see charmmtutorial.org: Atom Selection and c42b1 select):

  • segid (num): segment with numerical segment ID

PyMOL

See pymolwiki.org: Selection_Algebra

  • chain (char): Chain identifier
  • segi (char): Segment identifier (label_asym_id from mmCIF)
  • model (str): Atoms from object "1ubq" (e.g., "model 1ubq")

Related operators

  • bysegi (expr): Expands expr to complete segments
  • bychain (expr): Expands expr to complete chains
  • bymolecule (expr): Expands expr to complete molecules (connected with bonds)
  • byfragment (expr): ?
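The bysegi/bychain operators expand a selection to every atom that shares a label with an already-selected atom. A simplified sketch, assuming one chain (or segment) label per atom and treating a chain as all atoms with the same label; PyMOL's exact semantics may differ, and `expand_by` is a hypothetical name.

```python
from typing import List, Sequence


def expand_by(labels: Sequence[str], selected: Sequence[int]) -> List[int]:
    """Expand a selection to every atom sharing a label with it.

    labels[i] is the chain (or segment) identifier of atom i, so passing
    chain IDs mimics bychain and passing segment IDs mimics bysegi.
    """
    wanted = {labels[i] for i in selected}
    return [i for i, label in enumerate(labels) if label in wanted]


print(expand_by(["A", "A", "B", "B", "C"], [1, 4]))  # [0, 1, 4]
```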

mdtraj

mdtraj does not seem to store chain or segment identifiers; at least based on mdtraj: Atom Selection Reference, it only lets users select the internal chainid:

  • chainid (num): Chain index (0-based)
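The difference between a chain identifier (a label stored in the file) and a 0-based chain index (a position) can be sketched as follows. Numbering chains by order of first appearance is an assumption made for illustration, not necessarily how mdtraj assigns its indices, and `chain_indices` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def chain_indices(chain_ids: Sequence[str]) -> List[int]:
    """Map per-atom chain identifiers to 0-based indices by first appearance."""
    index_of: Dict[str, int] = {}
    for cid in chain_ids:
        # len(index_of) is evaluated before insertion, giving 0, 1, 2, ...
        index_of.setdefault(cid, len(index_of))
    return [index_of[cid] for cid in chain_ids]


print(chain_indices(["A", "A", "B", "C", "B"]))  # [0, 0, 1, 2, 1]
```

In this sketch a label that reappears later collapses onto its earlier index, so a label and an index are not interchangeable selection criteria.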

Feel free to correct me on any of the above.

lilyminium added a commit that referenced this issue Dec 10, 2020
Fixes #2925 
Fixes #2875
Fixes #3054 

Changes made in this Pull Request:
 - added a class factory to subclass `core.selection.Selection` for each TopologyAttr
 - added tokens to `core.selection.SameSelection`
 - added `FloatRangeSelection` and `BoolSelection`
 - added negatives, scientific notation and "to" delimiter for ranges
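The class-factory approach in this PR can be illustrated with a toy version: one base class, one generated subclass per per-atom attribute, and a range subclass that accepts both `lo:hi` and `lo to hi`. The names `Selection` and `FloatRangeSelection` echo the PR description, but everything here (including `make_selection_class`, `select`, and the dict-based atoms) is a hypothetical stand-in, not MDAnalysis's `core.selection` code.

```python
from typing import Dict, List, Tuple, Type

Atom = Dict[str, object]


class Selection:
    """Base class: each generated subclass sets the per-atom field it reads."""

    field: str = ""

    def __init__(self, values: List[str]) -> None:
        self.values = values

    def apply(self, atoms: List[Atom]) -> List[Atom]:
        return [a for a in atoms if str(a[self.field]) in self.values]


def parse_range(tokens: List[str]) -> Tuple[float, float]:
    """Accept ['lo', 'to', 'hi'] or ['lo:hi']; float() handles '-1' and '1e-3'."""
    if len(tokens) == 3 and tokens[1] == "to":
        return float(tokens[0]), float(tokens[2])
    lo, hi = tokens[0].split(":")
    return float(lo), float(hi)


class FloatRangeSelection(Selection):
    """Selects atoms whose numeric field lies in an inclusive range."""

    def apply(self, atoms: List[Atom]) -> List[Atom]:
        lo, hi = parse_range(self.values)
        return [a for a in atoms if lo <= float(a[self.field]) <= hi]


def make_selection_class(keyword: str, field: str,
                         base: Type[Selection] = Selection) -> Type[Selection]:
    """Class factory: build one Selection subclass per attribute."""
    return type(f"{keyword}Selection", (base,), {"field": field})


# Registry mapping selection keywords to the generated classes.
KEYWORDS: Dict[str, Type[Selection]] = {
    "chainID": make_selection_class("chainID", "chainID"),
    "segid": make_selection_class("segid", "segid"),
    "mass": make_selection_class("mass", "mass", FloatRangeSelection),
}


def select(atoms: List[Atom], selection: str) -> List[Atom]:
    keyword, *values = selection.split()
    return KEYWORDS[keyword](values).apply(atoms)


atoms: List[Atom] = [
    {"name": "N", "chainID": "A", "segid": "PROT", "mass": 14.007},
    {"name": "O", "chainID": "B", "segid": "PROT", "mass": 15.999},
    {"name": "H", "chainID": "B", "segid": "PROT", "mass": 1.008},
]
print([a["name"] for a in select(atoms, "chainID B")])     # ['O', 'H']
print([a["name"] for a in select(atoms, "mass 14 to 16")])  # ['N', 'O']
```

Registering a new attribute then only needs one factory call, which is the appeal of generating the subclasses rather than writing each by hand.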
cbouy pushed a commit to cbouy/mdanalysis that referenced this issue Jan 12, 2021
* Add arbitrary TopologyAttr selection (MDAnalysis#2927)

PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this issue Mar 30, 2021