feat(getSubnetworkFromIndra): Add correlation metrics into INDRA results #32

Merged
merged 5 commits into devel from correlation
Jan 22, 2025

Conversation

tonywu1999 (Contributor) commented Jan 22, 2025

PR Type

Enhancement, Documentation, Bug fix


Description

  • Added correlation metrics to INDRA subnetwork results.

  • Introduced new parameters for correlation filtering and protein-level data.

  • Updated documentation to reflect new functionality and parameters.

  • Fixed issues related to correlation handling in edge filtering.


Changes walkthrough 📝

Relevant files

Dependencies

DESCRIPTION (+2/-1): Added `tidyr` dependency to Imports.

  • Added tidyr to the Imports section.

NAMESPACE (+2/-0): Added imports for `MSstats` and `tidyr`.

  • Imported quantification from MSstats.
  • Imported pivot_wider from tidyr.

Enhancement

R/getSubnetworkFromIndra.R (+12/-3): Added correlation handling in `getSubnetworkFromIndra`.

  • Added protein_level_data and correlation_cutoff parameters.
  • Updated function logic to handle correlation filtering.
  • Enhanced edge filtering with correlation metrics.

R/utils_getSubnetworkFromIndra.R (+23/-3): Enhanced edge construction and filtering with correlation.

  • Modified .constructEdgesDataFrame to calculate and include correlations.
  • Updated .filterEdgesDataFrame to filter edges by correlation cutoff.
  • Added logic to handle protein-level data for correlation computation.

Documentation

man/getSubnetworkFromIndra.Rd (+12/-1): Updated documentation for new parameters and features.

  • Updated documentation to include protein_level_data and correlation_cutoff.
  • Reflected new functionality in parameter descriptions.

    tonywu1999 changed the title from "Correlation" to "feat(getSubnetworkFromIndra): Add correlation metrics into INDRA results" on Jan 22, 2025
    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Correlation Calculation Logic

    The correlation calculation in .constructEdgesDataFrame assumes that the wide_data matrix is correctly structured and that the cor function will not encounter issues with missing or invalid data. This logic should be validated for edge cases, such as missing values or mismatched protein IDs.

    if (!is.null(protein_level_data)) {
        protein_level_data <- protein_level_data[
            protein_level_data$Protein %in% edges$source | 
                protein_level_data$Protein %in% edges$target, ]
        wide_data = pivot_wider(protein_level_data[,c("Protein", "LogIntensities", "originalRUN")], names_from = Protein, values_from = LogIntensities)
        wide_data <- wide_data[, -which(names(wide_data) == "originalRUN")]
        correlations = cor(wide_data, use = "pairwise.complete.obs")
        edges$correlation = apply(edges, 1, function(edge) correlations[edge["source"], edge["target"]])
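The reshape-then-correlate step in the excerpt above can be tried on toy data. The sketch below is an illustration under stated assumptions, not the package's code: the input values are invented, and base R `tapply` stands in for `pivot_wider` so the example runs without tidyr.

```r
# Hypothetical long-format input using the same column names as the excerpt.
protein_level_data <- data.frame(
    Protein        = c("P1", "P1", "P1", "P2", "P2", "P2", "P3", "P3"),
    originalRUN    = c("r1", "r2", "r3", "r1", "r2", "r3", "r1", "r2"),
    LogIntensities = c(10, 11, 12, 20, 22, 24, 5, 4)
)

# Base-R stand-in for pivot_wider: rows are runs, columns are proteins.
# Run/protein combinations that are absent (P3 in r3) become NA, and
# duplicated combinations would be averaged by `mean`.
wide_data <- tapply(
    protein_level_data$LogIntensities,
    list(protein_level_data$originalRUN, protein_level_data$Protein),
    mean
)

# Each pair of proteins is correlated over the runs where both are observed.
correlations <- cor(wide_data, use = "pairwise.complete.obs")
```

In this toy input P3 shares only two runs with the other proteins, so its pairwise correlation is exactly ±1 by construction; requiring a minimum number of shared runs may be worth considering before trusting such values. Unlike the `tapply` stand-in, `pivot_wider` does not average duplicates: when a Protein/run pair appears more than once it returns list-columns with a warning.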
    Filtering Logic for Correlation

    The filtering logic in .filterEdgesDataFrame for correlation cutoffs assumes that the correlation column is always present when protein_level_data is provided. This assumption should be validated to avoid runtime errors.

    if ("correlation" %in% colnames(edges)) {
        edges <- edges[which(abs(edges$correlation) >= correlation_cutoff), ]
    }
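One detail of the filter above is easy to miss: wrapping the comparison in `which()` silently drops edges whose correlation is `NA`. The following sketch, with an invented `edges` data frame and cutoff, shows the difference from plain logical indexing.

```r
# Hypothetical edges; the last one has no computable correlation.
edges <- data.frame(
    source      = c("A", "B", "C", "D"),
    target      = c("B", "C", "D", "A"),
    correlation = c(0.9, -0.8, 0.1, NA)
)
correlation_cutoff <- 0.3  # assumed value, for illustration only

filtered <- edges
if ("correlation" %in% colnames(edges)) {
    # which() returns only the positions that are TRUE, so NA rows are dropped.
    filtered <- edges[which(abs(edges$correlation) >= correlation_cutoff), ]
}

# Without which(), the NA comparison produces an extra all-NA row.
no_which <- edges[abs(edges$correlation) >= correlation_cutoff, ]
```

Here `filtered` keeps the A-B and B-C edges, while `no_which` carries an additional row of `NA` values. Whether edges without a correlation should be dropped or retained is a design decision the source does not settle.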

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category | Suggestion | Score
    Possible issue
    Handle missing protein indices in correlation matrix

    Ensure that the correlations matrix is properly indexed and handles cases where
    source or target proteins are missing or mismatched in the matrix to avoid runtime
    errors.

    R/utils_getSubnetworkFromIndra.R [192]

    -edges$correlation = apply(edges, 1, function(edge) correlations[edge["source"], edge["target"]])
    +edges$correlation = apply(edges, 1, function(edge) {
    +    if (edge["source"] %in% rownames(correlations) && edge["target"] %in% colnames(correlations)) {
    +        return(correlations[edge["source"], edge["target"]])
    +    } else {
    +        return(NA)
    +    }
    +})
    Suggestion importance[1-10]: 9

    Why: This suggestion addresses a potential runtime error by ensuring that the correlation matrix is properly indexed and handles cases where source or target proteins are missing. This is a critical improvement for robustness and correctness.
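A vectorised alternative to the row-wise `apply` is sketched below. It is not part of the PR: the matrix and edge names are invented, and the approach relies on base R matrix indexing, where a row of the index matrix that contains `NA` yields `NA` in the result.

```r
# Hypothetical correlation matrix for two proteins.
correlations <- matrix(
    c(1, 0.5, 0.5, 1),
    nrow = 2,
    dimnames = list(c("P1", "P2"), c("P1", "P2"))
)

# Hypothetical edges; "P9" is absent from the correlation matrix.
edges <- data.frame(
    source = c("P1", "P2", "P1"),
    target = c("P2", "P1", "P9"),
    stringsAsFactors = FALSE
)

# match() gives NA for proteins that are missing from the matrix.
row_idx <- match(edges$source, rownames(correlations))
col_idx <- match(edges$target, colnames(correlations))

# Indexing with a two-column matrix picks one cell per edge;
# any edge with an NA index receives NA instead of raising an error.
edges$correlation <- correlations[cbind(row_idx, col_idx)]
```

This avoids the character coercion that `apply` performs on a data frame and handles missing proteins without an explicit `if`/`else`.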

    Validate input structure for pivot_wider

    Ensure that the pivot_wider function call handles cases where the input data does
    not have the expected structure, such as missing required columns, to prevent
    runtime errors.

    R/utils_getSubnetworkFromIndra.R [189]

    +if (!all(c("Protein", "LogIntensities", "originalRUN") %in% colnames(protein_level_data))) {
    +    stop("protein_level_data must contain 'Protein', 'LogIntensities', and 'originalRUN' columns.")
    +}
     wide_data = pivot_wider(protein_level_data[,c("Protein", "LogIntensities", "originalRUN")], names_from = Protein, values_from = LogIntensities)
    Suggestion importance[1-10]: 9

    Why: This suggestion ensures that the input data for pivot_wider has the expected structure, preventing runtime errors and improving the function's robustness.

    Validate wide_data for missing values

    Add a check to ensure that the wide_data variable does not contain columns with all
    missing values before calculating correlations, as this could lead to errors or
    invalid results.

    R/utils_getSubnetworkFromIndra.R [191]

    +if (any(colSums(!is.na(wide_data)) == 0)) {
    +    stop("wide_data contains columns with all missing values, unable to calculate correlations.")
    +}
     correlations = cor(wide_data, use = "pairwise.complete.obs")
    Suggestion importance[1-10]: 8

    Why: Adding a check for columns with all missing values in wide_data prevents errors during correlation computation, enhancing the reliability of the function.
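For context on this suggestion, the sketch below uses an invented `wide_data` matrix to show what `cor` does with an all-missing column under `use = "pairwise.complete.obs"`: it returns `NA` for the affected pairs rather than stopping. Dropping such columns, as done at the end, is one possible alternative to `stop()` and is an assumption of this example, not something the PR implements.

```r
# Hypothetical wide matrix: P3 has no observed values at all.
wide_data <- cbind(
    P1 = c(1, 2, 3, 4),
    P2 = c(2, 4, 6, 9),
    P3 = c(NA_real_, NA_real_, NA_real_, NA_real_)
)

# Same detection expression as in the suggestion.
all_missing <- colSums(!is.na(wide_data)) == 0

# Pairs involving P3 have no complete observations, so they come back as NA.
correlations <- cor(wide_data, use = "pairwise.complete.obs")

# Alternative to stop(): keep only columns with at least one observed value.
usable <- wide_data[, !all_missing, drop = FALSE]
```

Because the later filter uses `which()`, edges with an `NA` correlation would be dropped anyway, so a hard `stop()` may be stricter than needed.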

    General
    Validate edges after correlation filtering

    Add a safeguard to ensure that the edges data frame does not contain invalid or
    missing values after filtering by correlation, as this could lead to downstream
    issues.

    R/utils_getSubnetworkFromIndra.R [228]

     edges <- edges[which(abs(edges$correlation) >= correlation_cutoff), ]
    +if (nrow(edges) == 0) {
    +    warning("No edges remain after applying the correlation cutoff.")
    +}
    Suggestion importance[1-10]: 7

    Why: Adding a safeguard to check for empty edges after filtering by correlation cutoff is a good practice to avoid downstream issues, though the impact is less critical compared to other suggestions.


    codecov-commenter commented Jan 22, 2025

    Codecov Report

    Attention: Patch coverage is 29.62963% with 19 lines in your changes missing coverage. Please review.

    Project coverage is 83.58%. Comparing base (fbac585) to head (69ce251).

    Files with missing lines | Patch % | Lines
    R/utils_getSubnetworkFromIndra.R | 20.83% | 19 Missing ⚠️
    Additional details and impacted files
    @@            Coverage Diff             @@
    ##            devel      #32      +/-   ##
    ==========================================
    - Coverage   87.09%   83.58%   -3.51%     
    ==========================================
      Files           7        7              
      Lines         434      457      +23     
    ==========================================
    + Hits          378      382       +4     
    - Misses         56       75      +19     


    tonywu1999 merged commit ef1dc04 into devel on Jan 22, 2025
    3 checks passed
    tonywu1999 deleted the correlation branch on January 22, 2025 at 22:10