Commit 76b96d4

Dihedral Plots: RDKit Mol Object (#243)
* add RDKit Mol object to dihedral analysis plots
* add tests, and close #238
* add svgutils and cairosvg methods to plot svg mol object
* reimplement DF input option and fix most tests to reflect name changes and altered function definitions
* add svgutils and cairosvg to dependencies, install, requirements lists, remove broken test, add reminder to update func list in docs
* split plot_violins into new build_svg function
* change, better function names for dihedrals workflow module
* docs and cleanup, plot width docs, dict comprehension for ab_pairs
* intersphinx mapping
* tests: new fixtures and tests for bond_indices and ab_pairs
* tests: new fixtures and tests for bond_indices and ab_pairs, skip 3.7
* test_build_universe method
* confirm build universe test
* rewrite docs to cover new functions and kwarg changes
* fix tests to accommodate kwarg updates in dihedrals module
* explanation of why figdir is a kwarg at top level of dihedrals module but a positional argument elsewhere - workflows base **kwargs, issue #244, see in-line comment in dihedrals.py
* temporary fix for figdir issue which should currently be a positional argument, but would require redundant rewrite of workflows base module, pending issue #244
* upcoming CHANGES
* remove default scope specification for defined functions
* reimplement try/except method for rdkit conversion topology element guessing
* generate combined plots pdf for automated dihedral analysis
* updates for implementation of pypdf in workflows dihedrals module: CHANGES, testing environment, requirements, sphinx source configuration
* documentation for dihedral_violins function in workflows dihedrals module
* documentation for get_paired_indices function in workflows dihedrals module
* documentation and kwarg definition for get_paired_indices function and ab_pairs dictionary object in workflows dihedrals module
* kwarg definition for plot_title for dihedral_violins function in workflows dihedrals module
* move in-line comments explaining figdir kwarg for workflows dihedrals module
* reorganize kwargs for plot_dihedral_violins in top-level automated_dihedral_analysis function call in workflows dihedrals module
* add assert method to make figdir kwarg required in workflows dihedrals module
* change MDA guess_atom_element to MDA guess_types for RDKit conversion in workflows dihedrals module
* fix registry import error for workflows base, close #245
* remove guess_atom_element import
* reimplement assert figdir required for workflows dihedrals module
* add pypdf to setup.py install_requires for dihedrals workflow
* change imports to follow PEP 8
* modify dihedrals workflow docs to explain figdir kwarg requirement
* use first solvent specified to build MDAnalysis Universe
* modify single solvent plotting method, add solvent count assertion
* comment expected fixture scope changes, reference issue #235
* remove solute.unwrap, not needed
* reference issue #260 to fix jupyter notebook figure output
* finalize single solvent figure modifications and add test

---------

Co-authored-by: Oliver Beckstein <orbeckst@gmail.com>
1 parent 3b93aad commit 76b96d4

9 files changed, +528 -248 lines changed

CHANGES

+6
@@ -27,6 +27,12 @@ Changes
 
 Enhancements
 
+* convert figure components to SVG, save as individual PDFs,
+  and generate PDF of all figures combined in one file,
+  for workflows dihedrals module (#243)
+* add RDKit mol object to dihedrals plot with dihedral atom
+  indices labeled and dihedral atom group bonds highlighted
+  for workflows dihedrals module (#243)
 * new workflows registry that contains each EnsembleAnalysis for which
   a workflows module exists, for use with workflows base module (#229)
 * new workflows base module that provides iterative workflow use for
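
The individual PDFs named in the first new entry are checked in this commit's tests under names such as `SM25_C10-C5-S4-O11_violins.pdf`, inside a `SM25` subdirectory of `figdir`. The following is a minimal sketch of that naming pattern, assuming it generalizes to `<molname>_<dihedral name>_violins.pdf`. The helper `violin_pdf_path` is hypothetical and is not part of mdpow.

```python
from pathlib import PurePosixPath


def violin_pdf_path(figdir, molname, dihedral_name):
    """Hypothetical helper: expected location of one violin-plot PDF.

    Assumes the layout seen in the tests:
    <figdir>/<molname>/<molname>_<dihedral name>_violins.pdf
    """
    return PurePosixPath(figdir) / molname / f"{molname}_{dihedral_name}_violins.pdf"


path = violin_pdf_path("figures", "SM25", "C10-C5-S4-O11")
print(path)
```

The directory name `figures` is a placeholder; the tests pass a temporary directory as `figdir`.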

devtools/conda-envs/test_env.yaml

+3
@@ -18,6 +18,9 @@ dependencies:
   - pymbar >=4
   - rdkit
   - seaborn
+  - svgutils
+  - cairosvg
+  - pypdf
 
   # Testing
   - pytest
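
Because the plotting path now relies on three additional packages, it can be useful to confirm that they are importable before running the dihedrals workflow. The following sketch uses only the standard library; the function `missing_modules` is a hypothetical helper, not something provided by mdpow.

```python
import importlib.util


def missing_modules(names):
    """Return the names in `names` that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The three packages added to the test environment in this commit.
missing = missing_modules(["svgutils", "cairosvg", "pypdf"])
if missing:
    print("missing plotting dependencies:", ", ".join(missing))
```

An empty list means all three packages were found in the active environment.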

doc/requirements.txt

+3
@@ -10,3 +10,6 @@ mdanalysis
 rdkit
 seaborn
 matplotlib
+svgutils
+cairosvg
+pypdf

doc/sphinx/source/conf.py

+3
@@ -248,6 +248,9 @@
     'https://www.rdkit.org/docs/': None,
     'https://pandas.pydata.org/docs/': None,
     'https://seaborn.pydata.org': None,
+    'https://cairosvg.org/documentation/': None,
+    'https://svgutils.readthedocs.io/en/latest/': None,
+    'https://pypdf.readthedocs.io/en/stable/': None,
 }
 
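
The entries in this hunk use the URL-keyed form of `intersphinx_mapping`. Sphinx also accepts a named form, in which each key is a short identifier mapped to a `(target, inventory)` tuple. A sketch of the three new entries in that form is shown here; the identifiers `cairosvg`, `svgutils`, and `pypdf` are chosen for illustration and do not appear in the commit.

```python
# Sketch for doc/sphinx/source/conf.py using the named intersphinx format.
# The keys are illustrative identifiers, not taken from the commit.
intersphinx_mapping = {
    "cairosvg": ("https://cairosvg.org/documentation/", None),
    "svgutils": ("https://svgutils.readthedocs.io/en/latest/", None),
    "pypdf": ("https://pypdf.readthedocs.io/en/stable/", None),
}
```

With named entries, a cross-reference can be restricted to one inventory by prefixing the target with the identifier.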

mdpow/tests/test_automated_dihedral_analysis.py

+115-48
@@ -1,72 +1,87 @@
 import os
 import sys
 import yaml
-import pybol
-import pytest
+import py.path
 import pathlib
 import logging
 
 import scipy
-import numpy as np
-import pandas as pd
-
+from scipy.stats import circvar, circmean
 import seaborn
-
+import numpy as np
 from numpy.testing import assert_almost_equal
-from scipy.stats import circvar, circmean
+import pandas as pd
+import pybol
+import pytest
 
 from . import RESOURCES
-
-import py.path
-
-from mdpow.workflows import dihedrals
-
 from pkg_resources import resource_filename
+from mdpow.workflows import dihedrals
 
 RESOURCES = pathlib.PurePath(resource_filename(__name__, 'testing_resources'))
 MANIFEST = RESOURCES / "manifest.yml"
 
-@pytest.fixture(scope="function")
+resname = "UNK"
+molname = "SM25"
+
+@pytest.fixture
 def molname_workflows_directory(tmp_path, molname='SM25'):
     m = pybol.Manifest(str(MANIFEST))
     m.assemble('workflows', tmp_path)
     return tmp_path / molname
 
 class TestAutomatedDihedralAnalysis(object):
 
-    @pytest.fixture(scope="function")
+    @pytest.fixture
     def SM25_tmp_dir(self, molname_workflows_directory):
         dirname = molname_workflows_directory
         return dirname
 
-    @pytest.fixture(scope="function")
-    def atom_indices(self, SM25_tmp_dir):
-        atom_group_indices = dihedrals.dihedral_indices(dirname=SM25_tmp_dir, resname=self.resname)
+    @pytest.fixture
+    def mol_sol_data(self, SM25_tmp_dir):
+        u = dihedrals.build_universe(dirname=SM25_tmp_dir)
+        mol, solute = dihedrals.rdkit_conversion(u=u, resname=resname)
+        return mol, solute
+
+    @pytest.fixture
+    def atom_indices(self, mol_sol_data):
+        mol, _ = mol_sol_data
+        atom_group_indices = dihedrals.get_atom_indices(mol=mol)
 
         # testing optional user input of alternate SMARTS string
         # for automated dihedral atom group selection
-        atom_group_indices_alt = dihedrals.dihedral_indices(dirname=SM25_tmp_dir,
-                                                            resname=self.resname,
-                                                            SMARTS='[!$(*#*)&!D1]-!@[!$(*#*)&!D1]')
+        atom_group_indices_alt = dihedrals.get_atom_indices(mol=mol, SMARTS='[!$(*#*)&!D1]-!@[!$(*#*)&!D1]')
         return atom_group_indices, atom_group_indices_alt
         # fixture output, tuple:
         # atom_indices[0]=atom_group_indices
         # atom_indices[1]=atom_group_indices_alt
 
-    @pytest.fixture(scope="function")
+    @pytest.fixture
+    def bond_indices(self, mol_sol_data, atom_indices):
+        mol, _ = mol_sol_data
+        atom_index, _ = atom_indices
+        bond_indices = dihedrals.get_bond_indices(mol=mol, atom_indices=atom_index)
+        return bond_indices
+
+    @pytest.fixture
+    def dihedral_groups(self, mol_sol_data, atom_indices):
+        _, solute = mol_sol_data
+        atom_index, _ = atom_indices
+        dihedral_groups = dihedrals.get_dihedral_groups(solute=solute, atom_indices=atom_index)
+        return dihedral_groups
+
+    @pytest.fixture
     def dihedral_data(self, SM25_tmp_dir, atom_indices):
         atom_group_indices, _ = atom_indices
-        df = dihedrals.dihedral_groups_ensemble(atom_group_indices=atom_group_indices,
+        df = dihedrals.dihedral_groups_ensemble(atom_indices=atom_group_indices,
                                                 dirname=SM25_tmp_dir,
                                                 solvents=('water',))
-        df_aug = dihedrals.periodic_angle(df)
+        df_aug = dihedrals.periodic_angle_padding(df)
         return df, df_aug
         # fixture output, tuple:
         # dihedral_data[0]=df
         # dihedral_data[1]=df_aug
 
-    resname = 'UNK'
-
     # tuple-tuples of dihedral atom group indices
     # collected using mdpow.workflows.dihedrals.SMARTS_DEFAULT
     check_atom_group_indices = ((0, 1, 2, 3),(0, 1, 12, 13),(1, 2, 3, 11),(1, 2, 3, 10),
@@ -79,6 +94,23 @@ def dihedral_data(self, SM25_tmp_dir, atom_indices):
     # see: fixture - atom_indices().atom_group_indices_alt
     check_atom_group_indices_alt = ((1, 2), (1, 12), (2, 3), (3, 4), (12, 13), (13, 14))
 
+    check_atom_name_index_pairs = {'O1-C2-N3-S4': (0, 1, 2, 3),
+                                   'O1-C2-C13-C14': (0, 1, 12, 13),
+                                   'C2-N3-S4-O12': (1, 2, 3, 11),
+                                   'C2-N3-S4-O11': (1, 2, 3, 10),
+                                   'C2-N3-S4-C5': (1, 2, 3, 4),
+                                   'C2-C13-C14-C15': (1, 12, 13, 14),
+                                   'N3-S4-C5-C6': (2, 3, 4, 5),
+                                   'N3-S4-C5-C10': (2, 3, 4, 9),
+                                   'N3-C2-C13-C14': (2, 1, 12, 13),
+                                   'S4-N3-C2-C13': (3, 2, 1, 12),
+                                   'C6-C5-S4-O12': (5, 4, 3, 11),
+                                   'C6-C5-S4-O11': (5, 4, 3, 10),
+                                   'C10-C5-S4-O12': (9, 4, 3, 11),
+                                   'C10-C5-S4-O11': (9, 4, 3, 10),
+                                   'C13-C14-C15-C16': (12, 13, 14, 15),
+                                   'C13-C14-C15-C20': (12, 13, 14, 19)}
+
     check_groups = [np.array(['O1', 'C2', 'N3', 'S4'], dtype=object),
                     np.array(['O1', 'C2', 'C13', 'C14'], dtype=object),
                     np.array(['C2', 'N3', 'S4', 'O12'], dtype=object),
@@ -132,29 +164,49 @@ def test_build_universe(self, SM25_tmp_dir):
     # between RDKIT versions; issue raised (#239) to identify and
     # resolve exact package/version responsible
     def test_dihedral_indices(self, atom_indices):
+
         atom_group_indices = atom_indices[0]
         assert set(atom_group_indices) == set(self.check_atom_group_indices)
 
     # Possible ordering issue (#239)
     def test_SMARTS(self, atom_indices):
-        atom_group_indices_alt = atom_indices[1]
+        _, atom_group_indices_alt = atom_indices
         assert atom_group_indices_alt == self.check_atom_group_indices_alt
 
     # Use set comparison because ordering of indices appears to change
     # between RDKIT versions; issue raised (#239) to identify and
     # resolve exact package/version responsible
-    def test_dihedral_groups(self, SM25_tmp_dir):
-        groups = dihedrals.dihedral_groups(dirname=SM25_tmp_dir, resname=self.resname)
+    def test_dihedral_groups(self, dihedral_groups):
+        groups = dihedral_groups
 
         values = [g.all() for g in groups]
         reference = [g.all() for g in self.check_groups]
 
         assert set(values) == set(reference)
 
+    # atom indices are determined by RDKit Mol object
+    # bond indices are determined by atom indices and are subsequently self-consistent
+    # dihedral group names are determined by the MDAnalysis solute object from RDKit-derived atom indices
+    # this test checks if indexing schemes for RDKit and MDAnalysis are consistent
+    def test_RDKit_MDAnalysis_atom_index_consistency(self, atom_indices, bond_indices, dihedral_groups):
+        atom_index, _ = atom_indices
+        bond_index = bond_indices
+        groups = dihedral_groups
+
+        name_index_pairs = dihedrals.get_paired_indices(atom_indices=atom_index, bond_indices=bond_index,
+                                                        dihedral_groups=groups)
+
+        atom_name_index_pairs = {}
+
+        for key in name_index_pairs.keys():
+            atom_name_index_pairs[key] = name_index_pairs[key][0]
+
+        assert atom_name_index_pairs == self.check_atom_name_index_pairs
+
     # Possible ordering issue (#239)
     def test_dihedral_groups_ensemble(self, dihedral_data):
 
-        df = dihedral_data[0]
+        df, _ = dihedral_data
 
         dh1_result = df.loc[df['selection'] == 'O1-C2-N3-S4']['dihedral']
         dh1_mean = circmean(dh1_result, high=180, low=-180)
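
The test compares circular statistics rather than arithmetic ones because dihedral angles wrap around at ±180 degrees. The following is a minimal standard-library sketch of a circular mean in degrees, written in the spirit of the `circmean(..., high=180, low=-180)` call but not claimed to reproduce SciPy's implementation or its boundary conventions. The function name `circular_mean_deg` is hypothetical.

```python
import math


def circular_mean_deg(angles, low=-180.0, high=180.0):
    """Circular mean of angles given in degrees, wrapped into [low, high)."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles)
    # Direction of the summed unit vectors.
    mean = math.degrees(math.atan2(sin_sum, cos_sum))
    # Wrap into the requested interval of width high - low.
    return (mean - low) % (high - low) + low


print(circular_mean_deg([10.0, 30.0]))     # approximately 20
print(circular_mean_deg([170.0, -170.0]))  # at the +/-180 boundary, not 0
```

The second call shows why an arithmetic mean is misleading here: 170 and -170 average to 0 arithmetically, although both angles lie next to the ±180 boundary.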
@@ -172,19 +224,21 @@ def test_dihedral_groups_ensemble(self, dihedral_data):
         dh2_var == pytest.approx(self.DG_C13141520_var)
 
     def test_save_df(self, dihedral_data, SM25_tmp_dir):
-        dihedrals.save_df(df=dihedral_data[0], df_save_dir=SM25_tmp_dir, molname='SM25')
+        df, _ = dihedral_data
+        dihedrals.save_df(df=df, df_save_dir=SM25_tmp_dir, resname='UNK', molname='SM25')
         assert (SM25_tmp_dir / 'SM25' / 'SM25_full_df.csv.bz2').exists(), 'Compressed csv file not saved'
 
     def test_save_df_info(self, dihedral_data, SM25_tmp_dir, caplog):
+        df, _ = dihedral_data
         caplog.clear()
         caplog.set_level(logging.INFO, logger='mdpow.workflows.dihedrals')
-        dihedrals.save_df(df=dihedral_data[0], df_save_dir=SM25_tmp_dir, molname='SM25')
+        dihedrals.save_df(df=df, df_save_dir=SM25_tmp_dir, resname='UNK', molname='SM25')
         assert f'Results DataFrame saved as {SM25_tmp_dir}/SM25/SM25_full_df.csv.bz2' in caplog.text, 'Save location not logged or returned'
 
     # Possible ordering issue (#239)
     def test_periodic_angle(self, dihedral_data):
 
-        df_aug = dihedral_data[1]
+        _, df_aug = dihedral_data
 
         aug_dh2_result = df_aug.loc[df_aug['selection'] == 'C13-C14-C15-C20']['dihedral']
 
@@ -195,37 +249,50 @@ def test_periodic_angle(self, dihedral_data):
         aug_dh2_var == pytest.approx(self.ADG_C13141520_var)
 
     # Possible ordering issue (#239)
+    # Tests using similar instances of the automated analyses
+    # will use module or class-scoped fixtures, pending #235
     def test_save_fig(self, SM25_tmp_dir):
         dihedrals.automated_dihedral_analysis(dirname=SM25_tmp_dir, figdir=SM25_tmp_dir,
-                                              resname=self.resname, molname='SM25',
+                                              resname=resname, molname='SM25',
                                               solvents=('water',))
         assert (SM25_tmp_dir / 'SM25' / 'SM25_C10-C5-S4-O11_violins.pdf').exists(), 'PDF file not generated'
 
     # Possible ordering issue (#239)
+    # Tests using similar instances of the automated analyses
+    # will use module or class-scoped fixtures, pending #235
     def test_save_fig_info(self, SM25_tmp_dir, caplog):
         caplog.clear()
         caplog.set_level(logging.INFO, logger='mdpow.workflows.dihedrals')
         dihedrals.automated_dihedral_analysis(dirname=SM25_tmp_dir, figdir=SM25_tmp_dir,
-                                              resname=self.resname, molname='SM25',
+                                              resname=resname, molname='SM25',
                                               solvents=('water',))
         assert f'Figure saved as {SM25_tmp_dir}/SM25/SM25_C10-C5-S4-O11_violins.pdf' in caplog.text, 'PDF file not saved'
 
-    def test_DataFrame_input(self, SM25_tmp_dir):
-        test_df = pd.DataFrame([['C1-C2-C3-C4', 'water', 'Coulomb', 0, 0, 60.0],
-                                ['C1-C2-C3-C5', 'water', 'Coulomb', 0, 0, 60.0]],
-                               [1,2],['selection', 'solvent', 'interaction', 'lambda', 'time', 'dihedral'])
-        plot = dihedrals.automated_dihedral_analysis(dirname=SM25_tmp_dir, figdir=SM25_tmp_dir,
-                                                     resname=self.resname,
-                                                     solvents=('water',), dataframe=test_df)
-        assert isinstance(plot, seaborn.axisgrid.FacetGrid)
+    # Tests using similar instances of the automated analyses
+    # will use module or class-scoped fixtures, pending #235
+    def test_DataFrame_input(self, SM25_tmp_dir, dihedral_data):
+        df, _ = dihedral_data
+        dihedrals.automated_dihedral_analysis(dirname=SM25_tmp_dir, figdir=SM25_tmp_dir,
+                                              resname=resname, molname=molname,
+                                              solvents=('water',), dataframe=df)
+        assert (SM25_tmp_dir / 'SM25' / 'SM25_C10-C5-S4-O11_violins.pdf').exists(), 'PDF file not generated'
 
-    def test_DataFrame_input_info(self, SM25_tmp_dir, caplog):
+    # Tests using similar instances of the automated analyses
+    # will use module or class-scoped fixtures, pending #235
+    def test_DataFrame_input_info(self, SM25_tmp_dir, dihedral_data, caplog):
         caplog.clear()
         caplog.set_level(logging.INFO, logger='mdpow.workflows.dihedrals')
-        test_df = pd.DataFrame([['C1-C2-C3-C4', 'water', 'Coulomb', 0, 0, 60.0],
-                                ['C1-C2-C3-C5', 'water', 'Coulomb', 0, 0, 60.0]],
-                               [1,2],['selection', 'solvent', 'interaction', 'lambda', 'time', 'dihedral'])
+        df, _ = dihedral_data
         dihedrals.automated_dihedral_analysis(dirname=SM25_tmp_dir, figdir=SM25_tmp_dir,
-                                              resname=self.resname,
-                                              solvents=('water',), dataframe=test_df)
+                                              resname=resname, molname=molname,
+                                              solvents=('water',), dataframe=df)
         assert 'Proceeding with results DataFrame provided.' in caplog.text, 'No dataframe provided or dataframe not recognized'
+
+    # testing resources only contain analyses with single solvent input
+    def test_single_solvent(self, dihedral_data):
+        df, _ = dihedral_data
+        # all analysis data in one violin plot
+        g = dihedrals.dihedral_violins(df=df, width=0.9, solvents=('water',), plot_title='test')
+        # number of solvents in DataFrame used to generate plot
+        number_of_solvents = g.data['solvent'].nunique()
+        assert number_of_solvents == 1
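
The reference dictionary `check_atom_name_index_pairs` in this test module pairs dihedral names built from atom names with tuples of zero-based atom indices. In the entries shown, the number in each atom name equals the index plus one (for example, `O1` pairs with index 0). The following sketch checks that regularity on a subset of the reference data. It is an observation about this SM25 fixture only, not a guarantee for other topologies, and the helpers `name_numbers` and `is_consistent` are hypothetical.

```python
import re

# Subset of check_atom_name_index_pairs copied from the test module.
pairs = {
    "O1-C2-N3-S4": (0, 1, 2, 3),
    "C2-N3-S4-O12": (1, 2, 3, 11),
    "C10-C5-S4-O11": (9, 4, 3, 10),
    "C13-C14-C15-C20": (12, 13, 14, 19),
}


def name_numbers(dihedral_name):
    """Extract the trailing number of each atom name, e.g. 'O1-C2' -> (1, 2)."""
    return tuple(
        int(re.search(r"(\d+)$", atom).group(1))
        for atom in dihedral_name.split("-")
    )


def is_consistent(dihedral_name, indices):
    """True if every atom-name number equals its zero-based index plus one."""
    return name_numbers(dihedral_name) == tuple(i + 1 for i in indices)


print(all(is_consistent(name, idx) for name, idx in pairs.items()))
```

A mismatch here would indicate that the names and the indices came from differently ordered atom lists, which is the kind of inconsistency the new test is meant to detect.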

mdpow/tests/test_workflows_base.py

+15-12
@@ -2,19 +2,16 @@
 import os
 import sys
 import yaml
-import pybol
-import pytest
 import pathlib
 import logging
 
+import pybol
+import pytest
 import pandas as pd
 
-from mdpow.workflows import base
-
-from pkg_resources import resource_filename
-
 from . import RESOURCES, MANIFEST, STATES
-
+from pkg_resources import resource_filename
+from mdpow.workflows import base
 
 @pytest.fixture(scope='function')
 def molname_workflows_directory(tmp_path):
@@ -62,17 +59,23 @@ def test_project_paths_csv_input(self, csv_input_data):
 
         pd.testing.assert_frame_equal(project_paths, csv_df)
 
-    def test_automated_project_analysis(self, project_paths_data, caplog):
+    def test_dihedral_analysis_figdir_requirement(self, project_paths_data, caplog):
+        caplog.clear()
+        caplog.set_level(logging.ERROR, logger='mdpow.workflows.base')
+
         project_paths = project_paths_data
         # change resname to match topology (every SAMPL7 resname is 'UNK')
         # only necessary for this dataset, not necessary for normal use
         project_paths['resname'] = 'UNK'
 
-        base.automated_project_analysis(project_paths, solvents=('water',),
-                                        ensemble_analysis='DihedralAnalysis')
+        with pytest.raises(AssertionError,
+                           match="figdir MUST be set, even though it is a kwarg. Will be changed with #244"):
+
+            base.automated_project_analysis(project_paths, solvents=('water',),
+                                            ensemble_analysis='DihedralAnalysis')
 
-        assert 'all analyses completed' in caplog.text, ('automated_dihedral_analysis '
-                'did not iteratively run to completion for the provided project')
+            assert 'all analyses completed' in caplog.text, ('automated_dihedral_analysis '
+                    'did not iteratively run to completion for the provided project')
 
     def test_automated_project_analysis_KeyError(self, project_paths_data, caplog):
         caplog.clear()
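
The renamed test expects an `AssertionError` when `figdir` is not supplied, which matches the commit message's note that an assert makes the `figdir` kwarg required. A minimal sketch of that pattern follows. The function `run_analysis` is a hypothetical stand-in, not mdpow code; only the error message is taken from the test.

```python
def run_analysis(**kwargs):
    """Hypothetical stand-in for a workflow function that needs `figdir`."""
    figdir = kwargs.get("figdir")
    # Message copied from the test; the surrounding function is illustrative.
    assert figdir is not None, (
        "figdir MUST be set, even though it is a kwarg. Will be changed with #244"
    )
    return f"figures go to {figdir}"


try:
    run_analysis(solvents=("water",))  # figdir omitted on purpose
except AssertionError as err:
    message = str(err)

print(message)
print(run_analysis(figdir="figs", solvents=("water",)))
```

Two general Python points apply to this pattern: `assert` statements are skipped when the interpreter runs with `-O`, and any statement placed after the raising call inside a `with pytest.raises(...)` block is not executed.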

mdpow/workflows/base.py

+2-2
@@ -22,10 +22,10 @@
 
 import os
 import re
-import pandas as pd
-
 import logging
 
+import pandas as pd
+
 logger = logging.getLogger('mdpow.workflows.base')
 
 def project_paths(parent_directory=None, csv=None, csv_save_dir=None):
