fix(getSubnetworkFromIndra): Fix bugs related to multiple matches for an HGNC ID #28

tonywu1999 · 2025-01-13T15:47:42Z

User description

Motivation and Context

Got this error [Error in vapply ... values must be length 1, but FUN(X[[1]]) result is length 0](https://stackoverflow.com/questions/42426914/error-in-vapply-values-must-be-length-1-but-funx11-result-is-length-0)

Changes

Please provide a detailed bullet point list of your changes.

Testing

Please describe any unit tests you added or modified to verify your changes.

Checklist Before Requesting a Review

I have read the MSstats contributing guidelines
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules
Ran styler::style_pkg(transformers = styler::tidyverse_style(indent_by = 4))
Ran devtools::document()

PR Type

Bug fix, Tests

Description

Fixed handling of multiple matches for HGNC IDs in getSubnetworkFromIndra.
Improved error handling for invalid UniProt mnemonic ID inputs.
Enhanced unit tests to validate new error messages and edge cases.
Updated error messages for clarity and user guidance.

Changes walkthrough 📝

Relevant files

Bug fix

utils_annotateProteinInfoFromIndra.R `Enhanced error handling for UniProt mnemonic ID inputs` R/utils_annotateProteinInfoFromIndra.R Added error handling for converting non-character inputs to character. Improved validation of UniProt mnemonic ID inputs. Introduced `tryCatch` to handle conversion errors gracefully.	+18/-4
utils_getSubnetworkFromIndra.R `Fixed HGNC ID matching and improved error messages` R/utils_getSubnetworkFromIndra.R Fixed logic for matching HGNC IDs to ensure unique matches. Added error message for multiple or zero matches in input data. Updated error message for input size exceeding 400 proteins.	+19/-3

Tests

test-getSubnetworkFromIndra.R `Updated and enhanced unit tests for error handling` tests/testthat/test-getSubnetworkFromIndra.R Updated test to reflect new error message for input size limit. Enhanced tests to validate error handling for missing columns.	+1/-1

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

… an HGNC ID

github-actions · 2025-01-13T15:48:38Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Error Handling The new error handling for converting `uniprotMnemonicIds` to character strings should be validated to ensure it properly handles edge cases and does not introduce unexpected behavior. tryCatch({ # Attempt to convert all elements to character if not already character uniprotMnemonicIds <- lapply(uniprotMnemonicIds, function(x) { if (!is.character(x)) { as.character(x) } else { x } }) # Check if conversion was successful if (any(!sapply(uniprotMnemonicIds, is.character))) { stop("All elements in the list must be character strings representing UniProt mnemonic IDs.") } }, error = function(e) { stop("An error occurred converting uniprot mnemonic IDs to character strings: ", e$message) Input Validation The new validation for matching HGNC IDs in `.addAdditionalMetadataToIndraEdge` should be reviewed to confirm it correctly identifies and handles cases with 0 or multiple matches. if (nrow(matched_rows_source) != 1 \|\| nrow(matched_rows_target) != 1) { stop(paste0( "INDRA Exception: Unexpected number of matches for the following HGNC IDs in the input data: ", edge$source_id, " or ", edge$target_id, ". Each ID must match exactly one entry in the input data, but 0 or multiple matches were found. Please check the input data for duplicates or missing entries." ))

github-actions · 2025-01-13T15:49:04Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Score
Possible issue	Validate that the `Protein` column in the input dataframe contains valid data before processing Add a check to ensure that the `Protein` column in the `input` dataframe is not empty or entirely NA before attempting to match rows, as this could lead to runtime errors or incorrect results. R/utils_getSubnetworkFromIndra.R [69-70] +if (all(is.na(input$Protein) \| input$Protein == "")) { + stop("Input data must contain valid entries in the 'Protein' column.") +} matched_rows_source <- input[input$HgncId == edge$source_id & !is.na(input$Protein), ] matched_rows_target <- input[input$HgncId == edge$target_id & !is.na(input$Protein), ] Suggestion importance[1-10]: 9 Why: This suggestion adds a crucial validation step to ensure the `Protein` column contains valid data before proceeding with matching rows. This prevents runtime errors and ensures the function operates on valid input, significantly improving reliability.	9
Possible issue	Improve error handling for non-convertible elements in the input list to avoid unexpected failures Ensure that the `tryCatch` block properly handles cases where `uniprotMnemonicIds` contains non-convertible elements, as the current implementation may not account for all edge cases, such as complex objects or NULL values. R/utils_annotateProteinInfoFromIndra.R [18-34] tryCatch({ # Attempt to convert all elements to character if not already character uniprotMnemonicIds <- lapply(uniprotMnemonicIds, function(x) { if (!is.character(x)) { - as.character(x) + tryCatch(as.character(x), error = function(e) NULL) } else { x } }) # Check if conversion was successful - if (any(!sapply(uniprotMnemonicIds, is.character))) { + if (any(is.null(uniprotMnemonicIds) \| !sapply(uniprotMnemonicIds, is.character))) { stop("All elements in the list must be character strings representing UniProt mnemonic IDs.") } }, error = function(e) { stop("An error occurred converting uniprot mnemonic IDs to character strings: ", e$message) }) Suggestion importance[1-10]: 8 Why: The suggestion enhances the robustness of the `tryCatch` block by handling cases where elements in `uniprotMnemonicIds` cannot be converted to character strings. This improvement addresses potential edge cases and ensures the function behaves predictably, which is critical for input validation.	8
General	Add a test to ensure the function validates the `Protein` column in the input dataframe Add a test case to verify that the function throws an error when the `Protein` column in the input dataframe is entirely NA or empty, ensuring robust validation of input data. tests/testthat/test-getSubnetworkFromIndra.R [27-37] -test_that("Exception is thrown for 400+ proteins in dataframe", { - input_400 <- data.frame( - Protein = paste0("Protein", 1:400), - HgncId = paste0("HGNCID", 1:400), +test_that("Exception is thrown for empty or NA Protein column", { + input_invalid <- data.frame( + Protein = rep(NA, 10), + HgncId = paste0("HGNCID", 1:10), issue = NA, adj.pvalue = 0.05 ) expect_error( - getSubnetworkFromIndra(input_400), - "Invalid Input Error: INDRA query must contain less than 400 proteins. Consider lowering your p-value cutoff" + getSubnetworkFromIndra(input_invalid), + "Input data must contain valid entries in the 'Protein' column." ) }) Suggestion importance[1-10]: 8 Why: Adding a test case to validate the `Protein` column ensures that the function's input validation logic is robust and correctly handles cases where the column is empty or entirely NA. This strengthens the overall reliability of the code.	8
General	Enhance error messages to provide more specific details about problematic rows in the input data Refactor the error message in `.addAdditionalMetadataToIndraEdge` to include more specific details about the problematic rows, such as their indices or content, to aid debugging. R/utils_getSubnetworkFromIndra.R [73-79] stop(paste0( "INDRA Exception: Unexpected number of matches for the following HGNC IDs in the input data: ", edge$source_id, " or ", edge$target_id, - ". Each ID must match exactly one entry in the input data, but 0 or multiple matches were found. Please check the input data for duplicates or missing entries." + ". Each ID must match exactly one entry in the input data, but 0 or multiple matches were found. Problematic rows: ", + paste(which(input$HgncId %in% c(edge$source_id, edge$target_id)), collapse = ", "), + ". Please check the input data for duplicates or missing entries." )) Suggestion importance[1-10]: 7 Why: The suggestion improves the error message by including specific details about problematic rows, aiding debugging and making it easier for users to identify and resolve issues in their input data.	7

codecov-commenter · 2025-01-13T15:49:49Z

Codecov Report

Attention: Patch coverage is 62.96296% with 10 lines in your changes missing coverage. Please review.

Project coverage is 86.69%. Comparing base (4c5c730) to head (6504dc4).

Files with missing lines	Patch %	Lines
R/utils_getSubnetworkFromIndra.R	46.15%	7 Missing ⚠️
R/utils_annotateProteinInfoFromIndra.R	78.57%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##            devel      #28      +/-   ##
==========================================
- Coverage   88.28%   86.69%   -1.59%     
==========================================
  Files           7        7              
  Lines         384      406      +22     
==========================================
+ Hits          339      352      +13     
- Misses         45       54       +9

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

tonywu1999 added 2 commits January 13, 2025 10:28

fix(getSubnetworkFromIndra): Fix bugs related to multiple matches for…

9019da8

… an HGNC ID

fix unit test

6504dc4

github-actions bot added the Review effort [1-5]: 3 label Jan 13, 2025

tonywu1999 merged commit 3a08390 into devel Jan 13, 2025
4 checks passed

tonywu1999 deleted the bugs branch January 13, 2025 16:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(getSubnetworkFromIndra): Fix bugs related to multiple matches for an HGNC ID #28

fix(getSubnetworkFromIndra): Fix bugs related to multiple matches for an HGNC ID #28

tonywu1999 commented Jan 13, 2025 •

edited by github-actions bot

Loading

github-actions bot commented Jan 13, 2025

github-actions bot commented Jan 13, 2025

codecov-commenter commented Jan 13, 2025

fix(getSubnetworkFromIndra): Fix bugs related to multiple matches for an HGNC ID #28

fix(getSubnetworkFromIndra): Fix bugs related to multiple matches for an HGNC ID #28

Conversation

tonywu1999 commented Jan 13, 2025 • edited by github-actions bot Loading

User description

Motivation and Context

Changes

Testing

Checklist Before Requesting a Review

PR Type

Description

Changes walkthrough 📝

github-actions bot commented Jan 13, 2025

PR Reviewer Guide 🔍

github-actions bot commented Jan 13, 2025

PR Code Suggestions ✨

codecov-commenter commented Jan 13, 2025

Codecov Report

tonywu1999 commented Jan 13, 2025 •

edited by github-actions bot

Loading