Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Make changes from feedback from HUPO #26

Merged
merged 4 commits into from
Jan 9, 2025
Merged

Conversation

tonywu1999
Copy link
Contributor

@tonywu1999 tonywu1999 commented Jan 9, 2025

User description

Motivation and Context

Please include relevant motivation and context of the problem along with a short summary of the solution.

Changes

Please provide a detailed bullet point list of your changes.

Testing

Please describe any unit tests you added or modified to verify your changes.

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • Ran styler::style_pkg(transformers = styler::tidyverse_style(indent_by = 4))
  • Ran devtools::document()

PR Type

Enhancement, Tests, Documentation


Description

  • Added input validation for getSubnetworkFromIndra to handle large datasets.

  • Enhanced visualizeNetworks with new layout and updated color scheme.

  • Introduced new utility functions for populating protein-related information.

  • Expanded test coverage for input validation in getSubnetworkFromIndra.


Changes walkthrough 📝

Relevant files
Enhancement
4 files
NAMESPACE
Added new imports for layout and validation functions       
+1/-0     
getSubnetworkFromIndra.R
Added input validation for large datasets                               
+1/-0     
utils_getSubnetworkFromIndra.R
Introduced input validation utility for subnetwork function
+10/-0   
visualizeNetworks.R
Enhanced network visualization with new layout and colors
+5/-3     
Documentation
10 files
annotateProteinInfoFromIndra.R
Updated parameter name in function documentation                 
+1/-1     
annotateProteinInfoFromIndra.Rd
Updated documentation for `annotateProteinInfoFromIndra` 
+4/-1     
dot-populateHgncIdsInDataFrame.Rd
Added documentation for HGNC ID population utility             
+17/-0   
dot-populateHgncNamesInDataFrame.Rd
Added documentation for HGNC name population utility         
+17/-0   
dot-populateKinaseInfoInDataFrame.Rd
Added documentation for kinase information population utility
+17/-0   
dot-populatePhophataseInfoInDataFrame.Rd
Added documentation for phosphatase information population utility
+17/-0   
dot-populateTranscriptionFactorInfoInDataFrame.Rd
Added documentation for transcription factor information utility
+17/-0   
dot-populateUniprotIdsInDataFrame.Rd
Added documentation for Uniprot ID population utility       
+20/-0   
dot-validateAnnotateProteinInfoFromIndraInput.Rd
Added documentation for input validation utility                 
+17/-0   
getSubnetworkFromIndra.Rd
Updated documentation for `getSubnetworkFromIndra`             
+4/-3     
Tests
1 files
test-getSubnetworkFromIndra.R
Added test for input validation in `getSubnetworkFromIndra`
+12/-0   

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

Copy link

github-actions bot commented Jan 9, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Input Validation

The new .validateGetSubnetworkFromIndraInput function introduces a hard limit of 400 proteins. Ensure this limit is appropriate and aligns with the intended use cases, as it may restrict larger datasets.

#' Validate input for MSstatsBioNet getSubnetworkFromIndra
#' @param input dataframe from MSstats groupComparison output
#' @keywords internal
#' @noRd
.validateGetSubnetworkFromIndraInput <- function(input) {
    if (nrow(input) >= 400) {
        stop("Invalid Input Error: INDRA query must contain less than 400 proteins.  Consider adding a p-value cutoff")
    }
}
Visualization Color Scheme

The updated color scheme in visualizeNetworks changes the visual representation of nodes. Confirm that the new colors (#ADD8E6, #D3D3D3, #FFA590) are suitable and improve clarity for users.

    VISUAL_STYLE_MAPPINGS <- list(
        mapVisualProperty("Node Label", "id", "p"),
        mapVisualProperty(
            "Node Fill Color", "logFC_color", "c",
            c(-logfcCutoff, 0.0, logfcCutoff),
            c("#ADD8E6", "#ADD8E6", "#D3D3D3", "#FFA590", "#FFA590")
        )
    )
    createVisualStyle(
        VISUAL_STYLE_NAME,
        DEFAULT_VISUAL_STYLE,
        VISUAL_STYLE_MAPPINGS
    )
    setVisualStyle(VISUAL_STYLE_NAME)
    layoutNetwork("cose")
} else {

Copy link

github-actions bot commented Jan 9, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Score
Possible issue
Add validation to ensure required columns exist in the input before processing

Ensure that the input parameter in .validateGetSubnetworkFromIndraInput is validated
for required columns like HgncId before checking the row count, as missing columns
could lead to runtime errors.

R/utils_getSubnetworkFromIndra.R [6-9]

+if (!"HgncId" %in% colnames(input)) {
+    stop("Invalid Input Error: Input must contain a column named 'HgncId'.")
+}
 if (nrow(input) >= 400) {
     stop("Invalid Input Error: INDRA query must contain less than 400 proteins.  Consider adding a p-value cutoff")
 }
Suggestion importance[1-10]: 9

Why: The suggestion addresses a critical issue by ensuring that the required column 'HgncId' exists in the input data frame before further processing. This prevents potential runtime errors and improves the robustness of the function.

9
General
Add a test case to ensure the function validates the presence of required columns in the input data frame

Include a test case to validate that the getSubnetworkFromIndra function throws an
error when the input data frame is missing required columns like HgncId.

tests/testthat/test-getSubnetworkFromIndra.R [27-37]

-test_that("Exception is thrown for 400+ proteins in dataframe", {
-    input_400 <- data.frame(
-        Protein = paste0("Protein", 1:400),
-        issue = NA,
+test_that("Exception is thrown for missing required columns in dataframe", {
+    input_missing_col <- data.frame(
+        Protein = paste0("Protein", 1:10),
         adj.pvalue = 0.05
     )
     expect_error(
-        getSubnetworkFromIndra(input_400),
-        "Invalid Input Error: INDRA query must contain less than 400 proteins.  Consider adding a p-value cutoff"
+        getSubnetworkFromIndra(input_missing_col),
+        "Invalid Input Error: Input must contain a column named 'HgncId'."
     )
 })
Suggestion importance[1-10]: 8

Why: Adding this test case enhances the test coverage and ensures that the function correctly handles cases where required columns are missing, thereby improving reliability and robustness.

8
Adjust the color mapping to ensure it aligns with the expected range of values for visualization

Verify that the color mapping in mapVisualProperty for "Node Fill Color" is
consistent with the expected range of logFC_color, as the added color values may not
align with the intended visualization logic.

R/visualizeNetworks.R [59]

-c("#ADD8E6", "#ADD8E6", "#D3D3D3", "#FFA590", "#FFA590")
+c("#ADD8E6", "#D3D3D3", "#FFA590")
Suggestion importance[1-10]: 7

Why: The suggestion improves the visualization logic by aligning the color mapping with the expected range of values, enhancing the clarity and accuracy of the visual output. However, it is less critical than addressing functional errors.

7

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.93%. Comparing base (90fc510) to head (6d04c6a).

Additional details and impacted files
@@            Coverage Diff             @@
##            devel      #26      +/-   ##
==========================================
+ Coverage   85.66%   85.93%   +0.26%     
==========================================
  Files           6        6              
  Lines         314      320       +6     
==========================================
+ Hits          269      275       +6     
  Misses         45       45              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tonywu1999 tonywu1999 merged commit fa70cfe into devel Jan 9, 2025
3 checks passed
@tonywu1999 tonywu1999 deleted the hupo-changes branch January 9, 2025 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants