Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(getSubnetworkFromIndra): Create separate helper function to get correlation matrix #34

Merged
merged 2 commits into from
Jan 23, 2025

Conversation

tonywu1999
Copy link
Contributor

@tonywu1999 tonywu1999 commented Jan 23, 2025

PR Type

Enhancement


Description

  • Refactored correlation matrix computation into a separate helper function.

  • Added a new internal function .getCorrelationMatrixFromProteinLevelData.

  • Removed redundant imports and streamlined dependencies.

  • Updated the NAMESPACE file to reflect changes in imports.


Changes walkthrough 📝

Relevant files
Configuration changes
NAMESPACE
Removed unused import and updated dependencies                     

NAMESPACE

  • Removed unused import quantification from MSstats.
  • Updated NAMESPACE to reflect streamlined dependencies.
  • +0/-1     
    Enhancement
    utils_getSubnetworkFromIndra.R
    Refactored correlation matrix computation logic                   

    R/utils_getSubnetworkFromIndra.R

  • Extracted correlation matrix computation into
    .getCorrelationMatrixFromProteinLevelData.
  • Removed inline correlation computation logic from
    .constructEdgesDataFrame.
  • Added a new internal function
    .getCorrelationMatrixFromProteinLevelData.
  • Removed redundant imports quantification and pivot_wider.
  • +17/-8   

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Possible Issue

    The new function .getCorrelationMatrixFromProteinLevelData assumes that the column originalRUN will always exist in protein_level_data. If this column is missing, the function will throw an error. This should be validated or handled gracefully.

    #' Construct correlation matrix from MSstats
    #' @param protein_level_data output of dataProcess
    #' @importFrom tidyr pivot_wider
    #' @return correlations matrix
    #' @keywords internal
    #' @noRd
    .getCorrelationMatrixFromProteinLevelData <- function(protein_level_data) {
        wide_data <- pivot_wider(protein_level_data[,c("Protein", "LogIntensities", "originalRUN")], names_from = Protein, values_from = LogIntensities)
        wide_data <- wide_data[, -which(names(wide_data) == "originalRUN")]
        if (any(colSums(!is.na(wide_data)) == 0)) {
            warning("protein_level_data contains proteins with all missing values, unable to calculate correlations for those proteins.")
        }
        correlations <- cor(wide_data, use = "pairwise.complete.obs")
        return(correlations)
    }
    Edge Case Handling

    The cor function in .getCorrelationMatrixFromProteinLevelData uses pairwise.complete.obs. This may lead to unexpected results if there are many missing values. Consider documenting or validating this behavior.

    #' Construct correlation matrix from MSstats
    #' @param protein_level_data output of dataProcess
    #' @importFrom tidyr pivot_wider
    #' @return correlations matrix
    #' @keywords internal
    #' @noRd
    .getCorrelationMatrixFromProteinLevelData <- function(protein_level_data) {
        wide_data <- pivot_wider(protein_level_data[,c("Protein", "LogIntensities", "originalRUN")], names_from = Protein, values_from = LogIntensities)
        wide_data <- wide_data[, -which(names(wide_data) == "originalRUN")]
        if (any(colSums(!is.na(wide_data)) == 0)) {
            warning("protein_level_data contains proteins with all missing values, unable to calculate correlations for those proteins.")
        }
        correlations <- cor(wide_data, use = "pairwise.complete.obs")
        return(correlations)
    }

    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Add validation for required columns

    Ensure that the pivot_wider function in .getCorrelationMatrixFromProteinLevelData
    handles cases where the protein_level_data input lacks the required columns
    ("Protein", "LogIntensities", "originalRUN") to prevent runtime errors.

    R/utils_getSubnetworkFromIndra.R [251]

    +if (!all(c("Protein", "LogIntensities", "originalRUN") %in% colnames(protein_level_data))) {
    +    stop("protein_level_data must contain 'Protein', 'LogIntensities', and 'originalRUN' columns.")
    +}
     wide_data <- pivot_wider(protein_level_data[,c("Protein", "LogIntensities", "originalRUN")], names_from = Protein, values_from = LogIntensities)
    Suggestion importance[1-10]: 9

    Why: The suggestion adds a critical validation step to ensure that the required columns are present in the input data before proceeding, which prevents runtime errors. This is highly relevant and improves the robustness of the function.

    9
    Handle NULL correlations gracefully

    Ensure that the apply function used to assign correlations to edges in
    .constructEdgesDataFrame handles cases where correlations is NULL, to prevent
    runtime errors.

    R/utils_getSubnetworkFromIndra.R [194-199]

    -edges$correlation <- apply(edges, 1, function(edge) {
    -    if (edge["source"] %in% rownames(correlations) && edge["target"] %in% colnames(correlations)) {
    -        return(correlations[edge["source"], edge["target"]])
    -    } else {
    -        return(NA)
    -    }
    -})
    +edges$correlation <- if (!is.null(correlations)) {
    +    apply(edges, 1, function(edge) {
    +        if (edge["source"] %in% rownames(correlations) && edge["target"] %in% colnames(correlations)) {
    +            return(correlations[edge["source"], edge["target"]])
    +        } else {
    +            return(NA)
    +        }
    +    })
    +} else {
    +    rep(NA, nrow(edges))
    +}
    Suggestion importance[1-10]: 9

    Why: The suggestion ensures that the function can handle cases where the correlation matrix is NULL, avoiding runtime errors and maintaining functionality. This is a critical improvement for robustness.

    9
    Handle empty matrix after column removal

    Add a check to handle cases where the wide_data matrix becomes empty after removing
    the "originalRUN" column, to avoid errors in subsequent correlation calculations.

    R/utils_getSubnetworkFromIndra.R [252]

     wide_data <- wide_data[, -which(names(wide_data) == "originalRUN")]
    +if (ncol(wide_data) == 0) {
    +    stop("wide_data is empty after removing 'originalRUN' column. Cannot calculate correlations.")
    +}
    Suggestion importance[1-10]: 8

    Why: The suggestion addresses a potential issue where the matrix could become empty after removing a specific column, which would cause errors in subsequent operations. This is a valuable enhancement to ensure the function's stability.

    8

    @codecov-commenter
    Copy link

    codecov-commenter commented Jan 23, 2025

    Codecov Report

    Attention: Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.

    Project coverage is 82.79%. Comparing base (d98b6ed) to head (59d2c3c).

    Files with missing lines Patch % Lines
    R/utils_getSubnetworkFromIndra.R 0.00% 8 Missing ⚠️
    Additional details and impacted files
    @@            Coverage Diff             @@
    ##            devel      #34      +/-   ##
    ==========================================
    - Coverage   83.33%   82.79%   -0.54%     
    ==========================================
      Files           7        7              
      Lines         462      465       +3     
    ==========================================
      Hits          385      385              
    - Misses         77       80       +3     

    ☔ View full report in Codecov by Sentry.
    📢 Have feedback on the report? Share it here.

    @tonywu1999 tonywu1999 merged commit fc732dd into devel Jan 23, 2025
    3 checks passed
    @tonywu1999 tonywu1999 deleted the refactor branch January 23, 2025 14:35
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    None yet
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants