Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(getSubnetworkFromIndra): Add additional filters #30

Merged
merged 4 commits into from
Jan 22, 2025
Merged

Conversation

tonywu1999
Copy link
Contributor

@tonywu1999 tonywu1999 commented Jan 22, 2025

PR Type

Enhancement, Tests


Description

  • Added new filtering parameters to getSubnetworkFromIndra function.

  • Implemented filtering logic for statement types, paper count, and evidence count.

  • Updated documentation to reflect new parameters and functionality.

  • Enhanced unit tests to validate new filtering features.


Changes walkthrough 📝

Relevant files
Enhancement
getSubnetworkFromIndra.R
Add filtering parameters and logic to main function           

R/getSubnetworkFromIndra.R

  • Added new parameters for filtering: statement_types,
    paper_count_cutoff, and evidence_count_cutoff.
  • Updated function logic to apply new filters to INDRA response and
    edges.
  • Adjusted node construction to consider filtered edges.
  • +14/-2   
    utils_getSubnetworkFromIndra.R
    Add utility functions for filtering INDRA response and edges

    R/utils_getSubnetworkFromIndra.R

  • Introduced .filterIndraResponse to filter INDRA statements by type and
    evidence count.
  • Added .filterEdgesDataFrame to filter edges by paper count.
  • Updated edge metadata to include paper count.
  • +41/-1   
    Documentation
    getSubnetworkFromIndra.Rd
    Update documentation for new filtering parameters               

    man/getSubnetworkFromIndra.Rd

  • Updated documentation to include descriptions for new parameters.
  • Adjusted usage examples to reflect new filtering options.
  • +16/-1   
    Tests
    test-getSubnetworkFromIndra.R
    Update and enhance tests for filtering features                   

    tests/testthat/test-getSubnetworkFromIndra.R

  • Modified tests to include statement_types parameter in function calls.
  • Adjusted expected results to validate new filtering logic.
  • Added TODO for p-value filtering test adjustments.
  • +5/-4     

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Parameter Default Values

    The new parameters statement_types, paper_count_cutoff, and evidence_count_cutoff have default values. Ensure these defaults align with expected use cases and do not unintentionally exclude relevant data.

    #' @param statement_types list of interaction types to filter on.  Equivalent to
    #' statement type in INDRA.  Default is c("IncreaseAmount", "DecreaseAmount").
    #' @param paper_count_cutoff number of papers to filter on. Default is 1.
    #' @param evidence_count_cutoff number of evidence to filter on for each
    #' paper. E.g. A paper may have 5 sentences describing the same interaction vs 1
    #' sentence.  Default is 1.
    Filtering Logic

    The filtering logic in .filterIndraResponse and .filterEdgesDataFrame should be validated to ensure it correctly handles edge cases, such as empty responses or unexpected data formats.

    .filterIndraResponse <- function(res, interaction_types, evidence_count_cutoff) {
        filtered_response = Filter(
            function(statement) statement$data$stmt_type %in% interaction_types, 
            res)
        filtered_response = Filter(
            function(statement) statement$data$evidence_count >= evidence_count_cutoff, 
            filtered_response
        )
        return(filtered_response)
    }
    Test Coverage

    The updated tests include new parameters, but additional tests should be added to cover edge cases for the filtering logic, such as invalid parameter values or empty datasets.

        suppressWarnings(subnetwork <- getSubnetworkFromIndra(input, statement_types = c("Activation", "Phosphorylation")))
        expect_equal(nrow(subnetwork$nodes), 2)
        expect_equal(nrow(subnetwork$edges), 1)
    })
    
    test_that("getSubnetworkFromIndra with pvalue filter works correctly", {
        input <- data.table::fread(
            system.file("extdata/groupComparisonModel.csv", package = "MSstatsBioNet")
        )
        local_mocked_bindings(.callIndraCogexApi = function(x) {
            return(readRDS(system.file("extdata/indraResponse.rds", package = "MSstatsBioNet")))
        })
        suppressWarnings(
            subnetwork <- getSubnetworkFromIndra(input, pvalueCutoff = 0.45, statement_types = c("Activation", "Phosphorylation"))
        )
        # TODO: Adjust test file to reflect p-value filtering
        expect_equal(nrow(subnetwork$nodes), 2)
        expect_equal(nrow(subnetwork$edges), 1)

    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Validate field existence and type

    Ensure that the statement$data$evidence_count field exists and is numeric before
    applying the filter in .filterIndraResponse to avoid runtime errors if the field is
    missing or improperly formatted.

    R/utils_getSubnetworkFromIndra.R [50-52]

     filtered_response = Filter(
    -    function(statement) statement$data$evidence_count >= evidence_count_cutoff, 
    +    function(statement) is.numeric(statement$data$evidence_count) && statement$data$evidence_count >= evidence_count_cutoff, 
         filtered_response
     )
    Suggestion importance[1-10]: 9

    Why: This suggestion ensures that the evidence_count field exists and is numeric before applying the filter, which is crucial to prevent runtime errors if the field is missing or improperly formatted. It directly enhances the robustness of the code.

    9
    Validate stmt_type field existence

    Add a check to ensure that statement$data$stmt_type is not NULL or missing before
    applying the filter in .filterIndraResponse to prevent potential errors.

    R/utils_getSubnetworkFromIndra.R [48-49]

     filtered_response = Filter(
    -    function(statement) statement$data$stmt_type %in% interaction_types, 
    +    function(statement) !is.null(statement$data$stmt_type) && statement$data$stmt_type %in% interaction_types, 
         res)
    Suggestion importance[1-10]: 9

    Why: Adding a check for the existence of stmt_type ensures that the filter does not fail due to missing or null values, improving the reliability of the function and preventing potential runtime errors.

    9
    Validate paper_count initialization

    Ensure that the paperCount field is initialized and numeric in
    .constructEdgesDataFrame to avoid errors when filtering or aggregating data.

    R/utils_getSubnetworkFromIndra.R [134]

    -edge$data$paper_count <- 1
    +edge$data$paper_count <- ifelse(is.numeric(edge$data$paper_count), edge$data$paper_count, 1)
    Suggestion importance[1-10]: 8

    Why: This suggestion ensures that the paper_count field is properly initialized and numeric, which is important for avoiding errors during filtering or aggregation. It improves the robustness of the code.

    8
    General
    Add tests for new parameters

    Update the test cases to include assertions for the new paper_count_cutoff and
    evidence_count_cutoff parameters to ensure they are correctly applied and tested.

    tests/testthat/test-getSubnetworkFromIndra.R [8-9]

    -suppressWarnings(subnetwork <- getSubnetworkFromIndra(input, statement_types = c("Activation", "Phosphorylation")))
    +suppressWarnings(subnetwork <- getSubnetworkFromIndra(input, statement_types = c("Activation", "Phosphorylation"), paper_count_cutoff = 2, evidence_count_cutoff = 3))
    Suggestion importance[1-10]: 7

    Why: Updating the test cases to include assertions for the new parameters ensures that the added functionality is properly tested. While this is a good enhancement, it is not critical as the existing tests already cover the function's behavior to some extent.

    7

    @codecov-commenter
    Copy link

    Codecov Report

    All modified and coverable lines are covered by tests ✅

    Project coverage is 87.41%. Comparing base (dd20f47) to head (80b393f).

    Additional details and impacted files
    @@            Coverage Diff             @@
    ##            devel      #30      +/-   ##
    ==========================================
    + Coverage   86.73%   87.41%   +0.68%     
    ==========================================
      Files           7        7              
      Lines         407      429      +22     
    ==========================================
    + Hits          353      375      +22     
      Misses         54       54              

    ☔ View full report in Codecov by Sentry.
    📢 Have feedback on the report? Share it here.

    @tonywu1999 tonywu1999 merged commit c33b317 into devel Jan 22, 2025
    4 checks passed
    @tonywu1999 tonywu1999 deleted the hupo-changes branch January 22, 2025 23:09
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants