-
Notifications
You must be signed in to change notification settings - Fork 713
Implementing gemmi
-based mmcif reader (with easy extension to PDB/PDBx and mmJSON)
#4712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Changes from all commits
aa2a88f
218cf43
7f78e02
6682d6e
817f3a0
3cc8c80
f1bf325
d21c220
2a1be15
77645e6
91e6942
9a0c086
9c731df
8b40ec7
bdcbd73
401a4d3
9c336bd
def88e4
cabfd37
4c9d930
184491a
3de8565
ca6ebbb
45077ad
9a1a59a
27c10d6
950cfcf
8d1a8b5
ef29338
caca17e
f0e49cc
b7ada7c
e80632c
10f3124
98353fe
35fa187
e68fcce
263e9f1
ba47d53
9ffb6f2
fcfc6c0
0de720e
236b286
b562115
816b23f
92ae164
88c64a3
59b7e29
f2c23c8
776676e
71e60f4
36b7125
b058941
ef30fa7
95572c1
a8a9436
6706bbe
8cf9da4
f13156b
dda981c
ebdf849
47043f6
9770d7b
fd7f70d
1493056
3d7fbb9
9b9286e
b8f3c04
0f38a2d
b915aab
0d61248
34d76ca
b242aa5
4fc3a78
14fa756
e3a9a1f
d492b4e
1880e4a
927d7a0
ad0f0be
e03c3e5
4d79205
32d7cf9
0df8c3a
05c6ea1
88dab79
a82fe52
e3f1714
db46016
32cd103
805089e
55c3dbb
cd201d0
81f0b5b
22d1cca
d1ba434
a03b56f
bd4c255
3d61dc5
aed9b54
53c51f4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -21,8 +21,8 @@ inputs: | |||||
default: 'codecov' | ||||||
cython: | ||||||
default: 'cython' | ||||||
filelock: | ||||||
default: 'filelock' | ||||||
fasteners: | ||||||
default: 'fasteners' | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please add optional deps down in the optional deps section below. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. that's on me not merging develop-2 |
||||||
griddataformats: | ||||||
default: 'griddataformats' | ||||||
gsd: | ||||||
|
@@ -60,6 +60,8 @@ inputs: | |||||
default: 'dask' | ||||||
distopia: | ||||||
default: 'distopia>=0.4.0' | ||||||
gemmi: | ||||||
default: 'gemmi' | ||||||
h5py: | ||||||
default: 'h5py>=2.10' | ||||||
hole2: | ||||||
|
@@ -130,6 +132,7 @@ runs: | |||||
${{ inputs.dask }} | ||||||
${{ inputs.distopia }} | ||||||
${{ inputs.gsd }} | ||||||
${{ inputs.gemmi }} | ||||||
${{ inputs.h5py }} | ||||||
${{ inputs.hole2 }} | ||||||
${{ inputs.joblib }} | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,147 @@ | ||
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*- | ||
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 | ||
# | ||
""" | ||
MMCIF structure files in MDAnalysis --- :mod:`MDAnalysis.coordinates.MMCIF` | ||
========================================================================== | ||
|
||
MDAnalysis reads coordinates from MMCIF (macromolecular Crystallographic Information File) files, also known as PDBx/mmCIF format, | ||
using the ``gemmi`` library as a backend. MMCIF is a more modern and flexible alternative to the PDB format, | ||
capable of storing detailed structural and experimental data about biological macromolecules. | ||
|
||
MMCIF files use a structured, tabular format with key-value pairs to store both coordinate and atom information. | ||
The format supports multiple models/frames, though this implementation currently only reads the first model | ||
and provides warning messages for multi-model files. | ||
|
||
Basic usage | ||
----------- | ||
|
||
Reading an MMCIF file is straightforward: | ||
|
||
.. code-block:: python | ||
|
||
import MDAnalysis as mda | ||
u = mda.Universe("structure.cif") | ||
|
||
|
||
The reader will automatically detect if the structure contains placeholder unit cell information | ||
(usually it's the case for cryoEM structures, and cell parameters are (1, 1, 1, 90, 90, 90)) | ||
and set dimensions to None in that case. | ||
|
||
Capabilities | ||
------------ | ||
|
||
The MMCIF reader implementation uses the gemmi library to parse files and extract coordinates | ||
and unit cell information. Currently only reading capability is supported, with the following | ||
features: | ||
|
||
- Single frame/model reading | ||
- Unit cell dimensions detection | ||
- Support for compressed .cif.gz files | ||
- Automatic handling of placeholder unit cells for cryoEM structures | ||
|
||
Examples | ||
-------- | ||
|
||
Basic structure loading:: | ||
|
||
.. code-block:: python | ||
yuxuanzhuang marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
# Load structure from MMCIF | ||
u = mda.Universe("structure.cif") | ||
|
||
# or from cif.gz file | ||
u = mda.Universe("structure.cif.gz") | ||
|
||
Classes | ||
------- | ||
|
||
.. autoclass:: MMCIFReader | ||
:members: | ||
:inherited-members: | ||
|
||
See Also | ||
yuxuanzhuang marked this conversation as resolved.
Show resolved
Hide resolved
|
||
-------- | ||
- wwPDB MMCIF Resources: <http://mmcif.wwpdb.org>_ | ||
- Gemmi library documentation: <https://gemmi.readthedocs.io>_ | ||
|
||
.. versionadded:: 2.9.0 | ||
""" | ||
|
||
import logging | ||
import warnings | ||
|
||
import numpy as np | ||
|
||
from . import base | ||
|
||
try: | ||
import gemmi | ||
|
||
HAS_GEMMI = True | ||
except ImportError: | ||
HAS_GEMMI = False | ||
|
||
logger = logging.getLogger("MDAnalysis.coordinates.MMCIF") | ||
|
||
|
||
def get_coordinates(model: "gemmi.Model") -> np.ndarray: | ||
"""Get coordinates of all atoms in the `gemmi.Model` object. | ||
|
||
Parameters | ||
---------- | ||
model | ||
input `gemmi.Model`, e.g. `gemmi.read_structure('file.cif')[0]` | ||
|
||
Returns | ||
------- | ||
np.ndarray, shape [n, 3], where `n` is the number of atoms in the structure. | ||
""" | ||
return np.array( | ||
[[*at.pos.tolist()] for chain in model for res in chain for at in res] | ||
) | ||
|
||
|
||
class MMCIFReader(base.SingleFrameReaderBase): | ||
"""Reads from an MMCIF file using ``gemmi`` library as a backend. | ||
|
||
Notes | ||
----- | ||
|
||
If the structure represents an ensemble, only the first structure in the ensemble | ||
is read here (and a warning is thrown). Also, if the structure has a placeholder "CRYST1" | ||
record (1, 1, 1, 90, 90, 90), it's set to ``None`` instead. | ||
|
||
.. versionadded:: 2.9.0 | ||
""" | ||
|
||
format = ["cif", "cif.gz", "mmcif"] | ||
units = {"time": None, "length": "Angstrom"} | ||
|
||
def _read_first_frame(self): | ||
structure = gemmi.read_structure(self.filename) | ||
cell_dims = np.array( | ||
[ | ||
getattr(structure.cell, name) | ||
for name in ("a", "b", "c", "alpha", "beta", "gamma") | ||
] | ||
) | ||
if len(structure) > 1: | ||
warnings.warn( # FIXME: add tests for this | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. #FIXME :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wait, you do have |
||
f"File {self.filename} has {len(structure)=} models, but only the first one will be read" | ||
) | ||
|
||
model = structure[0] | ||
coords = get_coordinates(model) | ||
self.n_atoms = len(coords) | ||
self.ts = self._Timestep.from_coordinates(coords, **self._ts_kwargs) | ||
if np.allclose(cell_dims, np.array([1.0, 1.0, 1.0, 90.0, 90.0, 90.0])): | ||
warnings.warn( | ||
"1 A^3 CRYST1 record," | ||
" this is usually a placeholder." | ||
" Unit cell dimensions will be set to None." | ||
) | ||
self.ts.dimensions = None | ||
else: | ||
self.ts.dimensions = cell_dims | ||
self.ts.frame = 0 | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's on me not merging develop-1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is not fixed yet?