From 975b9daf3752105c12351a9281ae93e8be2ce4d7 Mon Sep 17 00:00:00 2001 From: ivanmilevtues Date: Tue, 17 Jun 2025 01:15:14 +0200 Subject: [PATCH] Added high-level diagrams --- .codeboarding/ConfGF Runner.md | 133 +++++++++++ .../Conformation Generation & Evaluation.md | 197 +++++++++++++++++ .codeboarding/Distance Score Model.md | 99 +++++++++ .codeboarding/General Utilities.md | 189 ++++++++++++++++ .codeboarding/Molecular Data Processing.md | 169 ++++++++++++++ .codeboarding/on_boarding.md | 207 ++++++++++++++++++ 6 files changed, 994 insertions(+) create mode 100644 .codeboarding/ConfGF Runner.md create mode 100644 .codeboarding/Conformation Generation & Evaluation.md create mode 100644 .codeboarding/Distance Score Model.md create mode 100644 .codeboarding/General Utilities.md create mode 100644 .codeboarding/Molecular Data Processing.md create mode 100644 .codeboarding/on_boarding.md diff --git a/.codeboarding/ConfGF Runner.md b/.codeboarding/ConfGF Runner.md new file mode 100644 index 0000000..effbfe1 --- /dev/null +++ b/.codeboarding/ConfGF Runner.md @@ -0,0 +1,133 @@ +```mermaid + +graph LR + + DefaultRunner["DefaultRunner"] + + TorchUtilities["TorchUtilities"] + + DistanceGeometryUtilities["DistanceGeometryUtilities"] + + DatasetProcessing["DatasetProcessing"] + + DefaultRunner -- "utilizes" --> TorchUtilities + + DefaultRunner -- "depends on" --> DistanceGeometryUtilities + + DefaultRunner -- "processes data with" --> DatasetProcessing + +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Component Details + + + +The ConfGF Runner subsystem serves as the central orchestration component 
for the entire ConfGF pipeline. Its primary purpose is to manage the lifecycle of molecular conformation generation, encompassing the training and evaluation of the distance score model, and the generation of molecular conformations using both distance-based and position-based Langevin Dynamics. It coordinates interactions with various utility modules for data handling, model execution, and evaluation, ensuring a cohesive workflow from raw SMILES input to generated 3D molecular structures. + + + +### DefaultRunner + +The central orchestrator for the ConfGF model, handling training, evaluation, saving, and the core generation processes using Langevin Dynamics for both position and distance-based sampling. + + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.runner.default_runner.DefaultRunner` (18:395) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.train` (110:183) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.evaluate` (79:107) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.distance_Langevin_Dynamics` (195:218) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.position_Langevin_Dynamics` (222:258) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.save` (40:51) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.ConfGF_generator` (260:280) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.ConfGFDist_generator` (283:304) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.generate_samples_from_smiles` (307:351) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.generate_samples_from_testset` (354:395) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.convert_score_d` (187:191) + + + + + +### TorchUtilities + +Provides essential PyTorch-related utility functions, such as norm clipping and data repetition, crucial for numerical stability and data manipulation within the ConfGF framework. 
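To make the idea of norm clipping concrete, here is a minimal pure-Python sketch. It is not the code of `confgf.utils.torch.clip_norm`, whose signature is not shown in this document; the helper name `clip_by_norm` and its `limit` parameter are assumptions made for illustration only.

```python
import math


def clip_by_norm(vec, limit):
    """Rescale vec so that its Euclidean norm does not exceed limit.

    Hypothetical helper: vectors already within the limit are returned unchanged.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= limit or norm == 0.0:
        return list(vec)
    scale = limit / norm
    return [x * scale for x in vec]


# A vector of norm 5 is scaled down to norm 1; the direction is preserved.
clipped = clip_by_norm([3.0, 4.0], limit=1.0)
```

Clipping of this kind bounds the size of a single update, which is one common way to keep iterative sampling or training numerically stable.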
+ + + + + +**Related Classes/Methods**: + + + +- `confgf.utils.torch.clip_norm` (8:11) + +- `confgf.utils.torch.repeat_data` (14:16) + + + + + +### DistanceGeometryUtilities + +Offers utility functions for handling distance geometry, including calculating distances from positions and embedding 3D structures, which are fundamental for molecular conformation generation. + + + + + +**Related Classes/Methods**: + + + +- `confgf.utils.distgeom.get_d_from_pos` (59:60) + +- `confgf.utils.distgeom.Embed3D` (39:57) + + + + + +### DatasetProcessing + +Manages the conversion of SMILES strings into a data format suitable for the ConfGF model, acting as an interface between raw molecular data and the model's input requirements. + + + + + +**Related Classes/Methods**: + + + +- `confgf.dataset.dataset.smiles_to_data` (77:124) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Conformation Generation & Evaluation.md b/.codeboarding/Conformation Generation & Evaluation.md new file mode 100644 index 0000000..0f7dd72 --- /dev/null +++ b/.codeboarding/Conformation Generation & Evaluation.md @@ -0,0 +1,197 @@ +```mermaid + +graph LR + + Core_Generation_Runner["Core Generation Runner"] + + Chemical_Structure_Utilities["Chemical Structure Utilities"] + + Conformation_Evaluation["Conformation Evaluation"] + + Distance_Geometry_Operations["Distance Geometry Operations"] + + PyTorch_Helper_Utilities["PyTorch Helper Utilities"] + + Molecular_Data_Preparation["Molecular Data Preparation"] + + Core_Generation_Runner -- "utilizes" --> Molecular_Data_Preparation + + Core_Generation_Runner -- "leverages" --> PyTorch_Helper_Utilities + + Core_Generation_Runner -- "interacts with" --> Distance_Geometry_Operations + + Core_Generation_Runner -- "interacts with" --> Conformation_Evaluation + + Core_Generation_Runner -- "interacts with" --> Chemical_Structure_Utilities + + 
Chemical_Structure_Utilities -- "depends on" --> PyTorch_Helper_Utilities + + Conformation_Evaluation -- "depends on" --> Chemical_Structure_Utilities + + Conformation_Evaluation -- "depends on" --> PyTorch_Helper_Utilities + + Distance_Geometry_Operations -- "depends on" --> PyTorch_Helper_Utilities + + Molecular_Data_Preparation -- "utilizes" --> Distance_Geometry_Operations + + Molecular_Data_Preparation -- "utilizes" --> Chemical_Structure_Utilities + +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Component Details + + + +The Conformation Generation & Evaluation subsystem is responsible for creating 3D molecular conformations from distance matrices using distance geometry and assessing their quality. It integrates core generation runners that orchestrate Langevin Dynamics simulations for both position and distance-based approaches, leveraging PyTorch utilities for numerical stability and data handling. The generated conformations are then evaluated using metrics like RMSD and Maximum Mean Discrepancy (MMD), with essential chemical structure utilities supporting molecular data manipulation and analysis throughout the process. Molecular data preparation, including SMILES to data conversion, forms the initial step in this pipeline. + + + +### Core Generation Runner + +This component orchestrates the molecular conformation generation process, including Langevin Dynamics simulations for both position and distance-based approaches. It serves as the primary interface for generating samples from SMILES strings or test sets. 
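The Langevin Dynamics update that this component orchestrates can be sketched in one dimension. This is an illustration under stated assumptions, not the `DefaultRunner` code: the learned score network is replaced by the analytic score of a Gaussian, noise-level annealing is omitted, and `langevin_sample` and `gaussian_score` are hypothetical names.

```python
import math
import random


def gaussian_score(x, mean=2.0, std=0.5):
    """Score (gradient of the log-density) of a 1-D Gaussian; stands in for a learned model."""
    return -(x - mean) / (std * std)


def langevin_sample(score_fn, x0, step_size, n_steps, rng):
    """Run unadjusted Langevin dynamics: drift along the score plus Gaussian noise."""
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + 0.5 * step_size * score_fn(x) + math.sqrt(step_size) * noise
    return x


rng = random.Random(0)
samples = [langevin_sample(gaussian_score, 0.0, 0.01, 500, rng) for _ in range(300)]
sample_mean = sum(samples) / len(samples)  # should land near the target mean of 2.0
```

Each step moves the sample toward regions of higher density and injects noise so that the chain explores the distribution rather than collapsing to its mode.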
+ + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.position_Langevin_Dynamics` (222:258) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.ConfGF_generator` (260:280) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.ConfGFDist_generator` (283:304) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.generate_samples_from_smiles` (307:351) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.generate_samples_from_testset` (354:395) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.convert_score_d` (187:191) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.distance_Langevin_Dynamics` (195:218) + + + + + +### Chemical Structure Utilities + +Provides essential functions for manipulating and analyzing chemical structures, such as setting atom positions in RDKit molecules, calculating RMSD, and retrieving atom symbols. These utilities are fundamental for handling molecular data. + + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.utils.chem.set_rdmol_positions` (49:57) + +- `ConfGF.confgf.utils.chem.set_rdmol_positions_` (60:68) + +- `ConfGF.confgf.utils.chem.GetBestRMSD` (134:138) + +- `ConfGF.confgf.utils.chem.get_atom_symbol` (71:72) + + + + + +### Conformation Evaluation + +This component is responsible for evaluating the quality and diversity of generated molecular conformations. It includes functions for computing RMSD confusion matrices and evaluating distance-based metrics like Maximum Mean Discrepancy (MMD) using Gaussian kernels. 
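As a rough illustration of a Maximum Mean Discrepancy (MMD) estimate with a Gaussian kernel, the sketch below compares two small sets of scalar distances. It uses a single fixed bandwidth and the biased estimator, which are simplifying assumptions; the function names and the sample values are invented for the example and are not taken from `confgf.utils.evaluation`.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two 1-D samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Made-up distance samples: one set close to the reference, one shifted away.
reference = [1.50, 1.52, 1.48, 1.51]
generated_close = [1.49, 1.51, 1.50, 1.53]
generated_far = [2.10, 2.05, 2.20, 2.15]

mmd_close = mmd_squared(reference, generated_close)
mmd_far = mmd_squared(reference, generated_far)
```

A smaller value indicates that the generated distances are distributed more like the reference distances, so `mmd_close` comes out below `mmd_far` here.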
+ + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.utils.evaluation.get_rmsd_confusion_matrix` (11:32) + +- `ConfGF.confgf.utils.evaluation.evaluate_conf` (35:38) + +- `ConfGF.confgf.utils.evaluation.evaluate_distance` (41:133) + +- `ConfGF.confgf.utils.evaluation.compute_mmd` (162:180) + +- `ConfGF.confgf.utils.evaluation.guassian_kernel` (135:160) + + + + + +### Distance Geometry Operations + +Handles operations related to molecular distance geometry, including embedding 3D coordinates from distance matrices and calculating distance matrices from atomic positions. These are crucial for working with molecular conformations in a distance space. + + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.utils.distgeom.Embed3D` (39:57) + +- `ConfGF.confgf.utils.distgeom.embed_3D` (3:36) + +- `ConfGF.confgf.utils.distgeom.get_d_from_pos` (59:60) + + + + + +### PyTorch Helper Utilities + +Contains general utility functions that leverage PyTorch, such as clipping tensor norms and repeating data, which are commonly used in deep learning contexts to ensure numerical stability and data preparation. + + + + + +**Related Classes/Methods**: + + + +- `confgf.utils.torch.clip_norm` (8:11) + +- `confgf.utils.torch.repeat_data` (14:16) + + + + + +### Molecular Data Preparation + +Manages the conversion of chemical identifiers, specifically SMILES strings, into structured data formats suitable for processing by the molecular generation models. 
+ + + + + +**Related Classes/Methods**: + + + +- `confgf.dataset.dataset.smiles_to_data` (77:124) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Distance Score Model.md b/.codeboarding/Distance Score Model.md new file mode 100644 index 0000000..dcee892 --- /dev/null +++ b/.codeboarding/Distance Score Model.md @@ -0,0 +1,99 @@ +```mermaid + +graph LR + + DistanceScoreMatch["DistanceScoreMatch"] + + GraphIsomorphismNetwork["GraphIsomorphismNetwork"] + + MultiLayerPerceptron["MultiLayerPerceptron"] + + DistanceScoreMatch -- "uses" --> GraphIsomorphismNetwork + + DistanceScoreMatch -- "uses" --> MultiLayerPerceptron + + GraphIsomorphismNetwork -- "uses" --> MultiLayerPerceptron + +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Component Details + + + +The Distance Score Model subsystem is built around DistanceScoreMatch, the core graph neural network of the ConfGF framework, which learns to predict scores related to inter-atomic distances. It relies on two supporting components: a GraphIsomorphismNetwork that extracts features from molecular graphs, and MultiLayerPerceptron blocks that transform input features and produce the output scores. + + + +### DistanceScoreMatch + +Implements the core graph neural network model, DistanceScoreMatch, which learns to predict scores related to inter-atomic distances. This model is central to the ConfGF framework, utilizing graph convolutions and transformations to process molecular graphs and output distance-based scores. It orchestrates graph extension, distance calculation, and score prediction. 
+ + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.models.scorenet.DistanceScoreMatch` (12:200) + +- `ConfGF.confgf.models.scorenet.DistanceScoreMatch.extend_graph` (48:91) + +- `ConfGF.confgf.models.scorenet.DistanceScoreMatch.get_distance` (94:99) + +- `ConfGF.confgf.models.scorenet.DistanceScoreMatch.get_score` (103:122) + +- `ConfGF.confgf.models.scorenet.DistanceScoreMatch.forward` (124:200) + + + + + +### GraphIsomorphismNetwork + +The GraphIsomorphismNetwork (GIN) component implements a type of Graph Neural Network designed to process and learn representations from graph-structured data. It consists of multiple convolutional layers that iteratively update node and graph features. This component is crucial for extracting meaningful features from the molecular graphs, which are then used by the DistanceScoreMatch model for score estimation. + + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.layers.gin.GraphIsomorphismNetwork` (64:130) + +- `ConfGF.confgf.layers.gin.GINEConv` (14:61) + + + + + +### MultiLayerPerceptron + +The MultiLayerPerceptron (MLP) component is a fundamental feed-forward neural network used as a versatile building block throughout the ConfGF subsystem. It comprises multiple linear layers with optional activation functions and dropout. MLPs are utilized for various transformations, including processing input features, generating output scores, and as sub-components within the GraphIsomorphismNetwork's convolutional layers. 
+ + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.layers.common.MultiLayerPerceptron` (47:94) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/General Utilities.md b/.codeboarding/General Utilities.md new file mode 100644 index 0000000..2a58491 --- /dev/null +++ b/.codeboarding/General Utilities.md @@ -0,0 +1,189 @@ +```mermaid + +graph LR + + Learning_Rate_Management["Learning Rate Management"] + + Langevin_Dynamics_Runner["Langevin Dynamics Runner"] + + Graph_Neural_Network_Layers["Graph Neural Network Layers"] + + Common_Neural_Network_Modules["Common Neural Network Modules"] + + Torch_Utility_Functions["Torch Utility Functions"] + + Geometric_Utility_Functions["Geometric Utility Functions"] + + Chemical_Data_Utilities["Chemical Data Utilities"] + + Langevin_Dynamics_Runner -- "utilizes" --> Torch_Utility_Functions + + Langevin_Dynamics_Runner -- "utilizes" --> Geometric_Utility_Functions + + Graph_Neural_Network_Layers -- "incorporates" --> Common_Neural_Network_Modules + + Chemical_Data_Utilities -- "utilizes" --> Torch_Utility_Functions + + Geometric_Utility_Functions -- "utilizes" --> Torch_Utility_Functions + +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Component Details + + + +This graph illustrates the structure and interdependencies within the `General Utilities` subsystem of the ConfGF project. 
It highlights foundational helper functions for PyTorch operations, learning rate management, geometric calculations, and chemical data processing, along with core neural network components. The relationships show how different utility modules support the Langevin Dynamics simulations and how common neural network modules are integrated into graph neural network layers. + + + +### Learning Rate Management + +Handles the creation and management of learning rate schedulers, including a custom exponential decay with a minimum learning rate, crucial for stable model training. + + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.utils.torch.get_scheduler` (61:75) + +- `ConfGF.confgf.utils.torch.ExponentialLR_with_minLr` (28:46) + + + + + +### Langevin Dynamics Runner + +Orchestrates the Langevin dynamics simulations for molecular structures, managing both distance-based and position-based updates and converting distance scores into position scores. + + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.distance_Langevin_Dynamics` (195:218) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.position_Langevin_Dynamics` (222:258) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.convert_score_d` (187:191) + + + + + +### Graph Neural Network Layers + +Provides the core building blocks for Graph Isomorphism Networks, including the GINE convolution and the overall network structure for processing graph-structured data. + + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.layers.gin.GraphIsomorphismNetwork.__init__` (66:91) + +- `ConfGF.confgf.layers.gin.GraphIsomorphismNetwork.forward` (95:130) + +- `ConfGF.confgf.layers.gin.GINEConv` (14:61) + + + + + +### Common Neural Network Modules + +Contains fundamental neural network components like Multi-Layer Perceptrons and various readout mechanisms (sum and mean) used across different models. 
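The sum and mean readouts mentioned above pool per-node features into one vector per graph. The following pure-Python sketch shows the idea on plain lists; it is not the `SumReadout` or `MeanReadout` code, and the function names and the `node_to_graph` batch index are assumptions made for illustration.

```python
def sum_readout(node_features, node_to_graph, num_graphs):
    """Add up the feature vectors of all nodes that belong to the same graph."""
    dim = len(node_features[0])
    pooled = [[0.0] * dim for _ in range(num_graphs)]
    for features, graph_id in zip(node_features, node_to_graph):
        for i, value in enumerate(features):
            pooled[graph_id][i] += value
    return pooled


def mean_readout(node_features, node_to_graph, num_graphs):
    """Average the feature vectors of all nodes that belong to the same graph."""
    sums = sum_readout(node_features, node_to_graph, num_graphs)
    counts = [0] * num_graphs
    for graph_id in node_to_graph:
        counts[graph_id] += 1
    # max(count, 1) avoids dividing by zero for a graph without nodes.
    return [[v / max(c, 1) for v in row] for row, c in zip(sums, counts)]


# Three nodes: the first two belong to graph 0, the last one to graph 1.
features = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
batch = [0, 0, 1]
graph_sums = sum_readout(features, batch, num_graphs=2)
graph_means = mean_readout(features, batch, num_graphs=2)
```

Sum pooling preserves information about graph size, while mean pooling normalizes it away, which is why both are often offered side by side.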
+ + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.layers.common.MultiLayerPerceptron` (47:94) + +- `ConfGF.confgf.layers.common.SumReadout` (28:43) + +- `ConfGF.confgf.layers.common.MeanReadout` (10:25) + + + + + +### Torch Utility Functions + +A collection of general-purpose utility functions for PyTorch operations, such as norm clipping and other tensor manipulations. + + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.utils.torch.clip_norm` (8:11) + + + + + +### Geometric Utility Functions + +Provides functions for geometric calculations, specifically for deriving distances from positional data, essential for molecular structure analysis. + + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.utils.distgeom.get_d_from_pos` (59:60) + + + + + +### Chemical Data Utilities + +Offers functionalities for processing chemical data, including operations like removing duplicate molecules and converting molecular representations to SMILES strings. + + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.utils.chem.remove_duplicate_mols` (83:97) + +- `ConfGF.confgf.utils.chem.mol_to_smiles` (75:76) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Molecular Data Processing.md b/.codeboarding/Molecular Data Processing.md new file mode 100644 index 0000000..a79c3ab --- /dev/null +++ b/.codeboarding/Molecular Data Processing.md @@ -0,0 +1,169 @@ +```mermaid + +graph LR + + Chemical_Processing_Utilities["Chemical Processing Utilities"] + + Molecular_Data_Transformers["Molecular Data Transformers"] + + Dataset_Management["Dataset Management"] + + Conformation_Generation_Runner["Conformation Generation Runner"] + + Evaluation_Utilities["Evaluation Utilities"] + + Conformation_Generation_Runner -- "prepares data using" --> Dataset_Management + + Dataset_Management -- "transforms data using" --> 
Molecular_Data_Transformers + + Dataset_Management -- "utilizes chemical information from" --> Chemical_Processing_Utilities + + Molecular_Data_Transformers -- "uses chemical properties from" --> Chemical_Processing_Utilities + + Evaluation_Utilities -- "retrieves atom symbols from" --> Chemical_Processing_Utilities + +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Component Details + + + +The Molecular Data Processing subsystem in ConfGF is responsible for the comprehensive handling of molecular data, from initial input to its transformation into a graph-based representation suitable for neural network models. This involves converting raw SMILES strings or RDKit molecular objects, managing various datasets, and enriching the molecular graphs with higher-order edges and essential chemical properties. It acts as the foundational layer for preparing all molecular inputs for subsequent model operations, ensuring data consistency and model readiness. + + + +### Chemical Processing Utilities + +Provides fundamental chemical utility functions, such as converting molecular objects to SMILES strings and retrieving atom symbols, which are crucial for various data processing and representation tasks within the ConfGF system. 
+ + + + + +**Related Classes/Methods**: + + + +- `confgf.utils.chem.remove_duplicate_mols` (83:97) + +- `confgf.utils.chem.mol_to_smiles` (75:76) + +- `confgf.utils.chem.get_atom_symbol` (71:72) + + + + + +### Molecular Data Transformers + +Contains classes and methods for transforming molecular data structures, including adding higher-order edges, calculating edge lengths, and assigning edge names, which are essential steps in preparing data for graph-based models. + + + + + +**Related Classes/Methods**: + + + +- `confgf.utils.transforms.AddHigherOrderEdges.get_higher_order_adj_matrix` (18:34) + +- `confgf.utils.transforms.AddHigherOrderEdges.__call__` (36:57) + +- `confgf.utils.transforms.AddEdgeName.__call__` (88:108) + +- `confgf.utils.transforms.AddEdgeLength` (59:67) + +- `confgf.utils.transforms.AddPlaceHolder` (71:76) + + + + + +### Dataset Management + +Manages the loading, preprocessing, and conversion of chemical datasets (like ISO17 and GEOM) into a format suitable for the ConfGF model. It includes functionalities to convert SMILES strings or RDKit molecular objects into data structures and to handle dataset initialization. 
+ + + + + +**Related Classes/Methods**: + + + +- `confgf.dataset.dataset.smiles_to_data` (77:124) + +- `confgf.dataset.dataset.preprocess_iso17_dataset` (127:158) + +- `confgf.dataset.dataset.preprocess_GEOM_dataset` (164:296) + +- `confgf.dataset.dataset.get_GEOM_testset` (299:401) + +- `confgf.dataset.dataset.rdmol_to_data` (27:75) + +- `confgf.dataset.dataset.GEOMDataset.__init__` (409:414) + +- `confgf.dataset.dataset.GEOMDataset_PackedConf.__init__` (446:448) + +- `confgf.dataset.dataset.GEOMDataset._atom_types` (427:432) + +- `confgf.dataset.dataset.GEOMDataset._edge_types` (434:439) + +- `confgf.dataset.dataset.GEOMDataset_PackedConf._pack_data_by_mol` (450:481) + + + + + +### Conformation Generation Runner + +Orchestrates the process of generating molecular conformations from SMILES strings using different ConfGF generator models (ConfGF and ConfGFDist). It handles data preparation, model invocation, and result processing. + + + + + +**Related Classes/Methods**: + + + +- `confgf.runner.default_runner.DefaultRunner.generate_samples_from_smiles` (307:351) + +- `confgf.runner.default_runner.DefaultRunner.ConfGF_generator` (260:280) + +- `confgf.runner.default_runner.DefaultRunner.ConfGFDist_generator` (283:304) + + + + + +### Evaluation Utilities + +Provides tools for evaluating the quality of generated molecular conformations, primarily by computing distance metrics like Maximum Mean Discrepancy (MMD) between generated and reference bond lengths. 
+ + + + + +**Related Classes/Methods**: + + + +- `confgf.utils.evaluation.evaluate_distance` (41:133) + +- `confgf.utils.evaluation.compute_mmd` (162:180) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/on_boarding.md b/.codeboarding/on_boarding.md new file mode 100644 index 0000000..ee4eaf2 --- /dev/null +++ b/.codeboarding/on_boarding.md @@ -0,0 +1,207 @@ +```mermaid + +graph LR + + ConfGF_Runner["ConfGF Runner"] + + Distance_Score_Model["Distance Score Model"] + + Molecular_Data_Processing["Molecular Data Processing"] + + Conformation_Generation_Evaluation["Conformation Generation & Evaluation"] + + General_Utilities["General Utilities"] + + ConfGF_Runner -- "orchestrates" --> Distance_Score_Model + + ConfGF_Runner -- "manages" --> Molecular_Data_Processing + + ConfGF_Runner -- "utilizes" --> Conformation_Generation_Evaluation + + ConfGF_Runner -- "uses" --> General_Utilities + + Distance_Score_Model -- "processes data from" --> Molecular_Data_Processing + + Distance_Score_Model -- "leverages" --> General_Utilities + + Molecular_Data_Processing -- "prepares data for" --> Distance_Score_Model + + Molecular_Data_Processing -- "uses" --> General_Utilities + + Conformation_Generation_Evaluation -- "is used by" --> ConfGF_Runner + + Conformation_Generation_Evaluation -- "relies on" --> General_Utilities + + click ConfGF_Runner href "https://github.com/DeepGraphLearning/ConfGF/blob/main/.codeboarding//ConfGF Runner.md" "Details" + + click Distance_Score_Model href "https://github.com/DeepGraphLearning/ConfGF/blob/main/.codeboarding//Distance Score Model.md" "Details" + + click Molecular_Data_Processing href "https://github.com/DeepGraphLearning/ConfGF/blob/main/.codeboarding//Molecular Data Processing.md" "Details" + + click Conformation_Generation_Evaluation href "https://github.com/DeepGraphLearning/ConfGF/blob/main/.codeboarding//Conformation 
Generation & Evaluation.md" "Details" + + click General_Utilities href "https://github.com/DeepGraphLearning/ConfGF/blob/main/.codeboarding//General Utilities.md" "Details" + +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Component Details + + + +The ConfGF project implements a framework for generating molecular conformations using a score-based generative model. The main flow involves training a Distance Score Model to learn the distribution of inter-atomic distances, which is then used by the ConfGF Runner to generate 3D molecular structures through Langevin Dynamics. Molecular data is prepared and transformed by the Molecular Data Processing component, and the quality of generated conformations is assessed by the Conformation Generation & Evaluation module. Various General Utilities support these core functionalities. + + + +### ConfGF Runner + +The central orchestration component responsible for managing the entire ConfGF pipeline, including training the distance score model, evaluating its performance, and generating molecular conformations using Langevin Dynamics. It coordinates interactions between data handling, model execution, and evaluation modules. 
+ + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.runner.default_runner.DefaultRunner` (18:395) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.train` (110:183) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.evaluate` (79:107) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.distance_Langevin_Dynamics` (195:218) + +- `ConfGF.confgf.runner.default_runner.DefaultRunner.position_Langevin_Dynamics` (222:258) + + + + + +### Distance Score Model + +Implements the core graph neural network model, DistanceScoreMatch, which learns to predict scores related to inter-atomic distances. This model is central to the ConfGF framework, utilizing graph convolutions and transformations to process molecular graphs and output distance-based scores. + + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.models.scorenet.DistanceScoreMatch` (12:200) + +- `ConfGF.confgf.models.scorenet.DistanceScoreMatch.extend_graph` (48:91) + +- `ConfGF.confgf.models.scorenet.DistanceScoreMatch.get_score` (103:122) + +- `ConfGF.confgf.models.scorenet.DistanceScoreMatch.forward` (124:200) + +- `ConfGF.confgf.layers.gin.GraphIsomorphismNetwork` (64:130) + +- `ConfGF.confgf.layers.gin.GINEConv` (14:61) + +- `ConfGF.confgf.layers.common.MultiLayerPerceptron` (47:94) + + + + + +### Molecular Data Processing + +Handles the entire lifecycle of molecular data, from loading raw SMILES strings or RDKit molecules to transforming them into graph representations suitable for the neural network. This includes managing datasets, adding higher-order edges, and incorporating chemical properties. 
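Adding higher-order edges means connecting atoms that are several bonds apart, not only direct neighbors. The sketch below computes such edges with a breadth-first search on a plain edge list. BFS is swapped in here for simplicity; it is not necessarily how `AddHigherOrderEdges` works, and `higher_order_edges` with its parameters is a hypothetical helper.

```python
from collections import deque


def higher_order_edges(num_nodes, edges, max_order):
    """Return {(i, j): hop_distance} for all node pairs within max_order hops."""
    neighbors = [[] for _ in range(num_nodes)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    result = {}
    for source in range(num_nodes):
        hops = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if hops[node] == max_order:
                continue  # do not expand beyond the requested order
            for nxt in neighbors[node]:
                if nxt not in hops:
                    hops[nxt] = hops[node] + 1
                    queue.append(nxt)
        for target, distance in hops.items():
            if target != source:
                result[(source, target)] = distance
    return result


# A four-atom chain 0-1-2-3: atoms 0 and 3 are three bonds apart.
chain = [(0, 1), (1, 2), (2, 3)]
extended = higher_order_edges(4, chain, max_order=3)
```

Storing the hop count alongside each pair lets later steps tell original bonds (order 1) apart from the added edges (order 2 and above).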
+ + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.dataset.dataset.smiles_to_data` (77:124) + +- `ConfGF.confgf.dataset.dataset.rdmol_to_data` (27:75) + +- `ConfGF.confgf.dataset.dataset.GEOMDataset` (407:439) + +- `ConfGF.confgf.dataset.dataset.GEOMDataset_PackedConf` (444:492) + +- `ConfGF.confgf.utils.transforms.AddHigherOrderEdges` (8:57) + +- `ConfGF.confgf.utils.transforms.AddEdgeLength` (59:67) + +- `ConfGF.confgf.utils.chem.mol_to_smiles` (75:76) + +- `ConfGF.confgf.utils.chem.get_atom_symbol` (71:72) + + + + + +### Conformation Generation & Evaluation + +Provides functionalities for generating 3D molecular conformations from distance matrices using distance geometry and for evaluating the quality of these generated conformations. It includes metrics like RMSD and MMD to assess structural accuracy. + + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.utils.distgeom.Embed3D` (39:57) + +- `ConfGF.confgf.utils.distgeom.embed_3D` (3:36) + +- `ConfGF.confgf.utils.distgeom.get_d_from_pos` (59:60) + +- `ConfGF.confgf.utils.evaluation.get_rmsd_confusion_matrix` (11:32) + +- `ConfGF.confgf.utils.evaluation.evaluate_conf` (35:38) + +- `ConfGF.confgf.utils.evaluation.compute_mmd` (162:180) + +- `ConfGF.confgf.utils.chem.set_rdmol_positions` (49:57) + + + + + +### General Utilities + +A collection of foundational helper functions that support various operations across the ConfGF project. This includes general PyTorch utilities for learning rate scheduling and tensor manipulation, as well as common readout functions for graph neural networks. 
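One form of learning rate scheduling, exponential decay with a lower bound, can be written in a few lines. This is a sketch of the general technique only; the function name, its parameters, and the example values are assumptions and do not reproduce the schedulers in `confgf.utils.torch`.

```python
def exponential_lr_with_floor(initial_lr, gamma, epoch, min_lr):
    """Decay initial_lr by a factor of gamma per epoch, never dropping below min_lr."""
    return max(initial_lr * gamma ** epoch, min_lr)


# Halve the rate each epoch, starting at 1e-3 and stopping at 1e-4.
schedule = [exponential_lr_with_floor(1e-3, 0.5, epoch, 1e-4) for epoch in range(6)]
```

The floor keeps late-stage updates from becoming vanishingly small, so training can still make progress after many epochs of decay.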
+ + + + + +**Related Classes/Methods**: + + + +- `ConfGF.confgf.utils.torch.get_scheduler` (61:75) + +- `ConfGF.confgf.utils.torch.clip_norm` (8:11) + +- `ConfGF.confgf.layers.common.SumReadout` (28:43) + +- `ConfGF.confgf.layers.common.MeanReadout` (10:25) + +- `ConfGF.confgf.utils.chem.remove_duplicate_mols` (83:97) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file