Scientific software developer in the Washington, D.C. area.
[]({% post_url 2025-01-11-Drawing-Molecules-With-Indigo %})
Here's how to draw molecules with EPAM Indigo Toolkit, a free and open-source cheminformatics package.
[How to Write Cheminformatics Blog Posts]({% post_url 2024-11-21-Cheminformatics-Blogging-How-To %})
As the YouTubers would say, “A lot of you have been asking me about how to write cheminformatics blog posts.” Well, not a lot, but at least a couple! Here's my process.
[]({% post_url 2024-10-15-Color-from-Conjugation %})
It's usually because of a long chain of conjugated bonds. I search 20K data points to find a series of molecules where extending the conjugated chain increases the absorption wavelength.
[Tautomer Generation Algorithms and InChI Representations]({% post_url 2024-05-01-Tautomer-Sources-Comparison %})
Which cheminformatics algorithms produce the most tautomers? And how well does InChI capture all tautomers of a given structure in a single representation?
Molecular Isotopic Distributions: [Permutations]({% post_url 2023-12-26-Molecular-isotopes-1-permutations %}) and [Combinations]({% post_url 2024-01-20-Molecular-isotopes-2-combinations %})
These posts use two different methods to calculate molecular isotopic mass distributions.
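As a rough illustration of the combinations idea (a minimal sketch, not the code from the posts): for an element with two isotopes, the number of heavy atoms in a molecule follows a binomial distribution, so there is no need to enumerate every ordering of isotopes. The function name and the approximate chlorine abundances below are assumptions chosen for the example.

```python
from math import comb

# Approximate natural abundances of the two stable chlorine isotopes
# (assumed values, used only for illustration).
ABUNDANCE_CL35 = 0.7576
ABUNDANCE_CL37 = 0.2424


def chlorine_isotopologue_distribution(n_atoms: int) -> dict[int, float]:
    """Map the number of 37Cl atoms to its probability for n_atoms chlorines."""
    return {
        k: comb(n_atoms, k) * ABUNDANCE_CL37**k * ABUNDANCE_CL35 ** (n_atoms - k)
        for k in range(n_atoms + 1)
    }


# A molecule with two chlorine atoms, such as Cl2
distribution = chlorine_isotopologue_distribution(2)
for n_heavy, probability in distribution.items():
    print(f"{n_heavy} x 37Cl: {probability:.4f}")
```

A permutation-style approach would instead list all 2^n isotope assignments and group the ones with equal mass, which gives the same distribution at a much higher cost as n grows.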
[RDKit Contribution MolsMatrixToGridImage()]({% post_url 2023-12-02-MolsMatrixToGridImage-simplifies-code %})
I contributed MolsMatrixToGridImage to the RDKit 2023.09.1 release to draw row-and-column grids of molecules.
Uses Python, RDKit, seaborn, and matplotlib
[]({% post_url 2023-10-28-Display-Molecular-Formulas %})
How to display molecular formulas such as C3H4O2 in molecular grids, tables, and graphs. Also works for other HTML-, Markdown-, or LaTeX-formatted text.
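One simple way to get subscripted formulas, sketched here with only the standard library, is to wrap each run of digits in the target markup. The helper names are hypothetical and the post's own approach may differ; this version does not handle charges or isotope labels.

```python
import re


def formula_to_html(formula: str) -> str:
    """Wrap each run of digits in <sub> tags for HTML or Markdown output."""
    return re.sub(r"(\d+)", r"<sub>\1</sub>", formula)


def formula_to_latex(formula: str) -> str:
    """Turn each run of digits into a LaTeX subscript inside \\mathrm{}."""
    return "$\\mathrm{" + re.sub(r"(\d+)", r"_{\1}", formula) + "}$"


print(formula_to_html("C3H4O2"))   # C<sub>3</sub>H<sub>4</sub>O<sub>2</sub>
print(formula_to_latex("C3H4O2"))  # $\mathrm{C_{3}H_{4}O_{2}}$
```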
Uses Python and RDKit
[]({% post_url 2023-10-20-Molecular-Formula-Generation %})
In cheminformatics, the typical way of representing a molecule is with a SMILES string such as CCO for ethanol. However, there are still cases where the molecular formula, such as C2H6O, is useful.
[Refitting Data From Wiener’s Classic Cheminformatics Paper]({% post_url 2023-04-25-Refitting-Data-from-Wiener %})
Uses Python, SciPy, Polars, NumPy, seaborn, matplotlib, and mol_frame
How well did cheminformatics pioneers Egloff and Wiener fit their models to boiling points of alkanes in the 1940s? This blog post revisits their fits using digital tools.
[Revisiting a Classic Cheminformatics Paper: The Wiener Index]({% post_url 2023-03-10-Revisiting-a-Classic-Cheminformatics-Paper-The-Wiener-Index %})
Uses Python, RDKit, Polars, matplotlib, seaborn, py2opsin, and mol_frame
This post revisits Harry Wiener's article "Structural Determination of Paraffin Boiling Points", extracts data for molecules from it, recalculates cheminformatics parameters and boiling points, and plots the data.
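For readers new to the Wiener index: it is the sum of the bond counts along the shortest paths between all pairs of carbon atoms in the hydrogen-suppressed skeleton. The sketch below computes it with a plain breadth-first search on a hand-written adjacency list; it is an illustration in pure Python, not the RDKit-based code from the post.

```python
from collections import deque


def wiener_index(adjacency: dict[int, list[int]]) -> int:
    """Sum shortest-path distances (in bonds) over all pairs of atoms."""
    total = 0
    for start in adjacency:
        # Breadth-first search gives shortest paths in an unweighted graph
        distances = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in distances:
                    distances[neighbor] = distances[atom] + 1
                    queue.append(neighbor)
        total += sum(distances.values())
    return total // 2  # every pair was counted once from each end


# Carbon skeletons as adjacency lists
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The branched isomer has the smaller index, which is the kind of structural difference Wiener correlated with boiling point.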
[RDKit Utility to Check Whether Starting Materials for Synthesizing Your Target Molecules Are Commercially Available]({% post_url 2023-02-07-Are-the-Starting-Materials-for-Synthesizing-Your-Target-Molecules-Commercially-Available %})
Uses Python, RDKit, PubChem's API, asyncio, and Semaphore
Given target molecules and reactions to synthesize them, determine whether the starting materials are commercially available using PubChem's API, and thus whether the target is synthetically accessible.
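The role of asyncio and a semaphore in this kind of lookup can be sketched as follows. This is a hedged illustration only: the network call is replaced by a short sleep, `FAKE_CATALOG` is a hypothetical stand-in for a remote service, and the concurrency limit is an arbitrary assumption. None of the names come from the post or from PubChem's API.

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 2  # assumed limit, chosen only for illustration

# Hypothetical stand-in for a remote catalog; a real check would query an API.
FAKE_CATALOG = {"CCO", "CC(=O)O"}


async def is_available(smiles, semaphore, stats):
    """Pretend to look up one starting material, a few lookups at a time."""
    async with semaphore:  # blocks while the limit is already reached
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for the network round trip
        stats["active"] -= 1
        return smiles, smiles in FAKE_CATALOG


async def check_all(smiles_list):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
    stats = {"active": 0, "peak": 0}
    pairs = await asyncio.gather(
        *(is_available(s, semaphore, stats) for s in smiles_list)
    )
    return dict(pairs), stats


availability, stats = asyncio.run(check_all(["CCO", "CC(=O)O", "c1ccccc1", "CN"]))

# A target counts as accessible only if every starting material is available
target_accessible = all(availability.values())
print(availability, target_accessible, stats["peak"])
```

The semaphore keeps the number of simultaneous requests under a fixed cap, which is a common way to stay within a public API's rate limits while still overlapping the waiting time.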
[RDKit Utility to Create a Mass Spectrometry Fragmentation Tree]({% post_url 2023-01-02-Mass-Spectrometry-Fragmentation-Tree %})
Uses Python and RDKit
Given a mass spec fragmentation hierarchy, with species as SMILES strings, display the fragmentation tree in a grid, labeling each species with its name and either mass or mass-to-charge ratio m/z.
[RDKit Utility to Find the Maximum Common Substructure, and Groups Off It, Between a Set of Molecules]({% post_url 2022-12-25-RDKit-Find-Groups-Off-Common-Core %})
Uses Python and RDKit
Given a collection of molecules as SMILES strings, find the maximum common substructure (MCS) match between them, and the groups off that common core for each molecule, displaying the results using a grid.
[Chemistry machine learning for drug discovery with DeepChem]({% post_url 2022-12-13-Chemistry-machine-learning-for-drug-discovery-with-DeepChem %})
Uses Python, DeepChem, seaborn, Matplotlib, and pandas
Use the DeepChem deep learning package to predict compounds' lipophilicity (how well they are absorbed into the lipids of biological membranes), which is important for oral delivery of drugs.
[RDKit Utility to Visualize Retrosynthetic Analysis Hierarchically]({% post_url 2022-11-11-RDKit-Recap-decomposition-tree %})
Uses Python and RDKit
Given a target molecule, use the Recap algorithm to decompose it into a set of fragments that could be combined to make the parent molecule using common reactions. Display the fragmentation hierarchically.
[RDKit Utility to Find and Highlight the Maximum Common Substructure Amongst Molecules]({% post_url 2022-10-09-RDKit-find-and-highlight-the-maximum-common-substructure-between-molecules %})
Uses Python and RDKit
Given a collection of molecules as SMILES strings, find the maximum common substructure (MCS) match between them as a SMARTS string, display the match pattern as a molecule, and highlight the match pattern in each molecule using a grid.
Uses Python, NumPy, SymPy, ChemPy, Flask, JavaScript, and Bootstrap
Find a given number of points which satisfy constraints given in a constraints file for an n-dimensional space defined on the unit hypercube, then write them to an output file.
Optionally, identify the components (dimensions) in the constraints file using chemical formulas, and Sampler will use ChemPy to calculate their molar masses, then output the component weight fraction.
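The core task, finding points in the unit hypercube that satisfy a set of constraints, can be illustrated with naive rejection sampling. This is a sketch under stated assumptions and not the project's actual algorithm: the function name, the convention that each constraint is a callable `g` with `g(x) >= 0` meaning satisfied, and the example constraints are all hypothetical.

```python
import random


def sample_feasible_points(n_points, n_dims, constraints, seed=0, max_tries=100_000):
    """Draw uniform points in [0, 1)^n_dims and keep those meeting every constraint."""
    rng = random.Random(seed)
    points = []
    for _ in range(max_tries):
        candidate = [rng.random() for _ in range(n_dims)]
        # Assumed convention: a constraint g is satisfied when g(x) >= 0
        if all(g(candidate) >= 0 for g in constraints):
            points.append(candidate)
            if len(points) == n_points:
                return points
    raise RuntimeError("Could not find enough feasible points; region may be tiny")


# Example constraints in two dimensions: x0 + x1 <= 1 and x0 >= 0.1
constraints = [
    lambda x: 1.0 - x[0] - x[1],
    lambda x: x[0] - 0.1,
]

points = sample_feasible_points(5, 2, constraints)
for point in points:
    print([round(value, 3) for value in point])
```

Rejection sampling is easy to reason about but becomes slow when the feasible region is a small fraction of the hypercube, which is why the sketch caps the number of attempts.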
Uses Ruby, Sinatra, PostgreSQL, and JavaScript
Understand how the elements are related to each other. Emphasizes electronic configuration of the elements.
- Conceived, proposed, and coded the MolsMatrixToGridImage feature, which uses a two-dimensional (nested) data structure as input to create molecular grid images. The project maintainer merged the feature into the main codebase, and it shipped in the 2023.09.1 release. It was the subject of an article on the site Macs In Chemistry, which included:
“If you need to display molecules and associated data in a grid then Jeremy Monat’s MolsMatrixToGridImage is exactly what you need. To underline just how useful this is and to highlight how it simplifies code he has written a very nice blog post.”
- Implemented improved combinatorial function, making computations ~75x faster for Gen2DFingerprint
- Improved documentation by enhancing API documentation, adding to the guide for contributors, adding an example of how to include a bond index, illustrating molecular drawing capability in tutorial, adding SMILES (chemical notation) for R groups, and more
- Technical writer for funded 2022 Season of Docs project: Creating documentation for how to solve equations
- Core developer wrote "I think you are doing excellent work on the SymPy documentation. Thank you!"
- Led selection of new Sphinx theme for SymPy documentation; the new theme was implemented
- Contributed code for documentation to explain usage of a core class for users and developers, and improve accessibility
- Lead developer wrote “You've been doing great work with the Sphinx theme and other documentation work”
- Initiated and provided scientific and coding direction to issue to improve interpretation of chemical formulas
- Spurred a developer to improve code
- Package author wrote “Great work guys!”
- Initiated issue to improve accessibility and internationalization of documentation generated by Sphinx; was addressed within a day by Sphinx’s main developer