pkgmatch: Find R Packages Matching Either Descriptions or Other R Packages #671
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type
🚀 Editor check started 👋
Checks for pkgmatch (v0.4.2), git hash: f12ad732
Package License: MIT + file LICENSE
1. Package Dependencies
Details of Package Dependency Usage
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
Tallies of functions used in each package are listed below. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)' and examining the 'external_calls' table.
base: lapply (46), data.frame (31), which (27), names (24), vapply (23), length (19), nrow (17), grep (16), c (15), list (15), paste0 (14), seq_len (12), character (11), gsub (11), as.integer (10), by (10), ncol (10), tryCatch (10), unname (10), url (10), order (9), grepl (7), colnames (6), for (6), integer (6), readRDS (6), unlist (6), version (6), basename (5), colSums (5), format (5), ifelse (5), raw (5), seq (5), seq_along (5), tempdir (5), all (4), as.Date (4), asNamespace (4), do.call (4), is.null (4), strsplit (4), system (4), attr (3), difftime (3), getOption (3), log (3), matrix (3), proc.time (3), read.dcf (3), sqrt (3), table (3), unique (3), as.character (2), cbind (2), floor (2), is.na (2), ls (2), match (2), min (2), nzchar (2), options (2), regmatches (2), rowSums (2), sum (2), units (2), any (1), apply (1), as.matrix (1), class (1), cut (1), drop (1), file (1), gregexpr (1), list.files (1), logical (1), mean (1), new.env (1), parseNamespaceFile (1), paste (1), rank (1), rbind (1), readline (1), regexpr (1), rep (1), sort (1), switch (1), Sys.Date (1), Sys.getenv (1), Sys.time (1), system.file (1), tolower (1), unclass (1), vector (1)
pkgmatch: bm25_tokens_list (8), get_embeddings (7), not_null_index (7), bm25_idf (6), get_pkg_text (6), get_pkg_code (5), pkgmatch_bm25 (5), cosine_similarity (4), dl_prev_data (4), pkgmatch_embeddings_from_pkgs (4), rm_fns_from_pkg_txt (4), bm25_tokens (3), get_all_fn_descs (3), get_cache_file_name (3), get_embeddings_from_ollama (3), jina_model (3), pkgmatch_bm25_from_idf (3), pkgmatch_load_data (3), pkgmatch_treesitter_fn_tags (3), append_cols (2), attach_ns (2), bm25_idf_internal (2), bm25_tokens_internal (2), bm25_tokens_list_internal (2), days_in_this_month (2), dl_one_tarball (2), extract_tarball (2), get_calls (2), get_calls_in_functions (2), get_embeddings_intern (2), get_fn_defs_namespace (2), get_fn_descs_from_ns (2), get_local_pkg_dep_fns (2), get_local_pkg_deps (2), get_pkg_readme (2), get_pkg_text_internal (2), get_pkg_text_namespace (2), input_is_pkg (2), is_docker_sudo (2), is_windows (2), list_new_cran_updates (2), load_data_internal (2), m_list_remote_files (2), ollama_dl_jina_model (2), opt_is_quiet (2), pkg_fns_from_r_search (2), pkg_fns_from_r_search_internal (2), pkg_is_installed (2), pkg_name_from_path (2), pkgmatch_bm25_fn_calls (2), pkgmatch_bm25_fn_calls_internal (2), pkgmatch_bm25_from_idf_internal (2), pkgmatch_bm25_internal (2), pkgmatch_cache_path (2), pkgmatch_update_cran (2), append_data_to_bm25 (1), append_data_to_embeddings (1), append_data_to_fn_calls (1), apply_col_names (1), attach_base_rcmd_ns (1), attach_local_dep_namespaces (1), attach_this_pkg_namespace (1), convert_paths_to_pkgs (1), desc_template (1), extract_data_from_local_dir (1), fn_names_base (1), fn_names_rcmd (1), get_fn_defs_local (1), get_pkg_exported_fns (1), get_pkg_text_local (1), has_ollama (1), has_ollama_docker (1), has_ollama_local (1), head.pkgmatch (1), input_is_path (1), input_mentions_functions (1), make_cran_version_column (1), modify_by_lm_prop (1), ollama_check (1), ollama_has_jina_model (1), ollama_is_running (1), ollama_models (1), order_output (1), pkg_install_path (1), pkgmatch_browse (1), pkgmatch_cache_update_interval (1), pkgmatch_dl_data (1), pkgmatch_embeddings_from_text (1), pkgmatch_rerank (1), pkgmatch_similar_fns (1), pkgmatch_similar_pkgs (1), pkgmatch_update_data (1), pkgmatch_update_ropensci (1), rcmd_pkgs (1), rcpp_bm25 (1), registry_daily_chunk (1), rename_files_in_r (1), ros_registry (1), similar_pkgs_from_pkg (1), similar_pkgs_from_pkg_internal (1), similarity_embeddings (1), tok_lists_to_idfs (1), tressitter_calls_in_package (1)
fs: path (19), dir_ls (9), path_temp (7), dir_create (5), path_ext (3), file_exists (1), file_info (1), path_ext_set (1), path_real (1)
utils: installed.packages (4), lsf.str (4), data (3), packageDescription (3), prompt (3), tar (2), untar (2), browseURL (1), getFromNamespace (1), tail (1), timestamp (1)
checkmate: assert_character (7), assert_integerish (3), assert_matrix (2), assert_names (2), assert_numeric (2), check_file_exists (2), assert_list (1), assert_logical (1)
dplyr: left_join (8), rename (3), mutate (2), last_col (1), n (1), relocate (1), summarise (1)
memoise: memoise (13)
stats: dt (5), start (3), end (2), line (2)
treesitter: query_captures (3), node_text (2), parser (1), parser_parse (1), tree_root_node (1)
httr2: req_headers (2), request (2), resp_body_json (2), req_perform (1)
pbapply: pblapply (5)
tools: parse_Rd (2), Rd_db (2), CRAN_package_db (1)
brio: read_lines (2)
gert: git_clone (2)
jsonlite: read_json (2)
rvest: html_table (1), read_html (1)
tibble: new_tibble (2)
tokenizers: count_words (1), tokenize_words (1)
hms: hms (1)
piggyback: pb_download (1)
treesitter.r: language (1)
NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties
The package has:
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the […]. The final measure ([…])
2a. Network visualisation
Interactive network visualisation of calls between objects in package
3.
id | name | conclusion | sha | run_number | date |
---|---|---|---|---|---|
11727438106 | docker | skipped | f12ad7 | 23 | 2024-11-07 |
11727438100 | pkgcheck | NA | f12ad7 | 96 | 2024-11-07 |
11727438103 | R-CMD-check | success | f12ad7 | 292 | 2024-11-07 |
11727438110 | test-coverage | success | f12ad7 | 292 | 2024-11-07 |
11727438101 | Update pkgmatch data | NA | f12ad7 | 66 | 2024-11-07 |
3b. goodpractice results
R CMD check with rcmdcheck
rcmdcheck found no errors, warnings, or notes
Test coverage with covr
Package coverage: 79.93%
Cyclocomplexity with cyclocomp
The following function has cyclocomplexity >= 15:
function | cyclocomplexity |
---|---|
get_pkg_readme | 17 |
Static code analyses with lintr
lintr found no issues with this package!
Package Versions
package | version |
---|---|
pkgstats | 0.2.0.47 |
pkgcheck | 0.1.2.63 |
Editor-in-Chief Instructions:
This package is in top shape and may be passed on to a handling editor
@ropensci-review-bot assign @MargaretSiple-NOAA as editor
Assigned! @MargaretSiple-NOAA is now the editor
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository: [](https://github.com/ropensci/software-review/issues/671) Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
(just a note: I put 'seeking reviewers' before putting in my editor checks but I have not forgotten them. I started them today and will continue tomorrow.)
Editor checks:
Editor comments
Nice package, @mpadge! This will be useful for anyone working on package development or review. I wrote a few notes on what I saw during the editor check process; they're longer than I usually write, but I figure these will probably make the reviewers' lives easier, so it doesn't hurt to mention them early. A few notes of mine from the editor check process:
Thanks @MargaretSiple-NOAA for really useful feedback! The issue linked above has details of changes made in response. I'd say the most important of those for reviewers of this package is that I've added
@ropensci-review-bot Add @agricolamz as reviewer
@agricolamz added to the reviewers list. Review due date is 2025-01-02. Thanks @agricolamz for accepting to review! Please refer to our reviewer guide. rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@agricolamz: If you haven't done so, please fill this form for us to update our reviewers records.
@ropensci-review-bot set due date for @agricolamz to 2025-01-31
Review due date for @agricolamz is now 31-January-2025
Hi @mpadge, I was just revising my editor checks above after your revisions, and everything is looking good except some issues I had with
Thanks @MargaretSiple-NOAA, I've fixed the issue linked above, so
Thanks for addressing that, @mpadge! The tests now run without issues. Two more quick things:
@ropensci-review-bot add @Selbosh as reviewer
@Selbosh added to the reviewers list. Review due date is 2025-02-17. Thanks @Selbosh for accepting to review! Please refer to our reviewer guide. rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
Thank you @mpadge, I am glad my feedback can be helpful, and I hope it didn't come across as too harsh.
Thanks @MargaretSiple-NOAA. Just to let you know, I plan on waiting for the second review before tackling the issues from @Selbosh's review, to ensure both reviews pertain to the package in a similar state.
Hi all, sorry, but my review will be two more days late. I'm planning to finish it on Sunday.
No problem, George. I'll adjust the due date just for our records, but Sunday is fine.
@ropensci-review-bot set due date for @agricolamz to 2025-02-23
Review due date for @agricolamz is now 23-February-2025
Package Review
Dear all, I am really sorry for the delay on my part.
Documentation
The package includes all the following forms of documentation:
Functionality
Estimated hours spent reviewing: 9
Review Comments
I avoided reading the review by @Selbosh while writing mine, and now I see that some of my comments are the same. Thank you very much for the package. It was really a pleasure using it. I think that it is really promising.
Major comments
First, it seems that the package quietly downloads something, and I don’t know what it is or where it is stored (now I know that it is in After I installed the
I don't think that 273 MB will cause any damage, but I expect that most users will disagree that silence implies consent.
First, I decided to check some popular packages.
As you can see from the plots, my expectations were not met. That was striking for me, since I based my prompts on the documentation phrasing. It is possible that my function call hit this 8,096 token limit, so the model took the last part of the package, omitting the very first entries. But this is just a hypothesis. I must admit: the suggested packages are sometimes really close to the prompt query. If you want to fix this, I would suggest merging the information from the LLM and BM25 with information about the number of downloads or any other measure of a package's popularity.
I also tested the As you see, the package correctly identifies the code, but for some reason it struggles with the text. The
Smaller comments/suggestions
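@agricolamz's suggestion of merging the LLM and BM25 results with a measure of package popularity could look something like the base-R sketch below. It is an editorial illustration, not code from the review or from pkgmatch. It assumes each search method returns a ranked character vector of package names, fuses the rankings with reciprocal rank fusion (RRF, using the commonly chosen constant `k = 60`), and then scales each fused score by log-transformed download counts. `rrf_combine()`, `add_popularity()`, the `weight` parameter, and the download figures are all hypothetical.

```r
# Hypothetical sketch (not pkgmatch code): fuse ranked lists of package
# names with reciprocal rank fusion, then weight the result by popularity.
rrf_combine <- function(rank_lists, k = 60) {
    pkgs <- unique(unlist(rank_lists))
    score <- vapply(pkgs, function(p) {
        # Each ranked list contributes 1 / (k + rank), or 0 if p is absent
        sum(vapply(rank_lists, function(r) {
            pos <- match(p, r)
            if (is.na(pos)) 0 else 1 / (k + pos)
        }, numeric(1)))
    }, numeric(1))
    sort(score, decreasing = TRUE)
}

add_popularity <- function(scores, downloads, weight = 0.1) {
    # 'downloads' is a named numeric vector; unknown packages count as 0
    d <- unname(downloads[names(scores)])
    d[is.na(d)] <- 0
    # Log-scale and rescale to [0, 1] so very popular packages don't dominate
    pop <- log1p(d) / max(log1p(d), 1)
    sort(scores * (1 + weight * pop), decreasing = TRUE)
}

# Toy rankings standing in for an embedding search and a BM25 search
emb_rank <- c("pkgA", "pkgB", "pkgC")
bm25_rank <- c("pkgA", "pkgC", "pkgD")
fused <- rrf_combine(list(emb_rank, bm25_rank))
names(fused) # → "pkgA" "pkgC" "pkgB" "pkgD"
adjusted <- add_popularity(fused, c(pkgB = 1e6, pkgD = 10), weight = 1)
names(adjusted) # → "pkgA" "pkgB" "pkgC" "pkgD"
```

Because the popularity term only rescales fused scores, it can reorder close matches (here `pkgB` overtakes `pkgC`) but cannot surface a package that neither search retrieved. `weight` controls how strongly downloads count.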
Thanks @agricolamz for really putting the package through its paces, and detailing your insights in response. That's exactly what I was hoping review would achieve. I'll address all of your concerns as soon as possible. In the meantime ... @MargaretSiple-NOAA @Selbosh @agricolamz Please note that I'll only have a chance to start my responses in the second week of March, from the 10th. Thanks 😃
Dear @mpadge, it is my pleasure! I hope my comments will help to improve the package!
Thank you @agricolamz and @Selbosh for two great and thorough reviews. @mpadge I'll check back in after the 10th.
@MargaretSiple-NOAA @Selbosh @agricolamz I've started updating the package in response to the really helpful reviews. You can see several issues listed above. I hope to finish them within the next couple of weeks, and will then write a full response here. Thank you all 👍 😄
Dear all, I'm wrapping up my EiC rotation, and before I "go" I'd like to thank you very much for your work!
Thank YOU @maurolepore for keeping us all on track! @mpadge, FYI, I am going to be at sea for fieldwork from May 23 to July 11 this year, so if you are still working on it for a while longer, we may have a period of inactivity during which we could put it on hold or [some other option that I don't know yet].
As a field linguist myself, I wish you good luck!
Thanks both. In that case I'll endeavour to get my responses back before you head off. I'll aim to do that quickly enough to finish this before May 23rd. If not, we'll maybe consider [some other option that I don't know yet] 😄 ... but hopefully not 🚀
@MargaretSiple-NOAA Update: I'm not quite going to get my responses done by May 23rd. If I may, please let me know what you think of this as a plan:
Let me know whether that sounds good, and David and George, hopefully you'll both have a small amount of time in June to have a look over my changes. Thanks all!
Hi all, I don't think that I will be available through June, but I will try to respond in my spare time. Good luck in the field, @MargaretSiple-NOAA!
Submitting Author Name: Mark Padgham
Submitting Author GitHub Handle: @mpadge
Repository: https://github.com/ropensci-review-tools/pkgmatch
Version submitted: 0.4.2
Submission type: Standard
Editor: @MargaretSiple-NOAA
Reviewers: @agricolamz, @Selbosh
Due date for @agricolamz: 2025-02-23
Due date for @Selbosh: 2025-02-21
Archive: TBD
Version accepted: TBD
Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
Data retrieval, because the package includes code to generate language model (LM) embeddings from all R packages retrieved from both CRAN and rOpenSci package repositories. Wrapper, because LM embeddings are generated by wrapping an interface to the ollama software. Plus I've inserted a new, one-off category of "rOpenSci tools" for internal, staff-curated packages.
Beyond internal rOpenSci use, target audiences are (1) entirely general audience of those interested in searching R packages using either text or code input, and (2) package developers, who can use this package to identify similar packages or functions to code they might be working on.
No, not at all. There are to my knowledge two other R packages for interfacing with LMs: tidyllm and elmer. Both of these are general interfaces to LM API endpoints, while this package specifically uses LM outputs to identify best-matching packages.
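One common way to use LM embeddings to find best-matching items is to rank candidates by cosine similarity to the embedding of the query. The minimal base-R sketch below illustrates that idea under stated assumptions: embeddings are stored one per column in a matrix whose column names are package names, and the toy 3-dimensional vectors stand in for real model output. `cosine_rank()` is a hypothetical helper, not the package's own internal `cosine_similarity()` function.

```r
# Hypothetical sketch (not pkgmatch's implementation): rank packages by
# cosine similarity between a query embedding and package embeddings.
cosine_rank <- function(query, emb) {
    # 'emb' holds one embedding per column; columns are named by package
    stopifnot(is.matrix(emb), length(query) == nrow(emb))
    dots <- as.vector(crossprod(emb, query))
    sims <- dots / (sqrt(colSums(emb^2)) * sqrt(sum(query^2)))
    names(sims) <- colnames(emb)
    sort(sims, decreasing = TRUE)
}

# Toy embeddings; real LM embeddings typically have hundreds of dimensions
emb <- cbind(pkgA = c(1, 0, 0), pkgB = c(0.7, 0.7, 0), pkgC = c(0, 0, 1))
query <- c(1, 0.1, 0)
ranked <- cosine_rank(query, emb)
names(ranked) # → "pkgA" "pkgB" "pkgC"
```

Cosine similarity ignores vector length, so packages are compared on the direction of their embeddings rather than on how much text was embedded.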
Not applicable.
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass.
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
Do you intend for this package to go on CRAN?
Do you intend for this package to go on Bioconductor?
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
Code of conduct