dataset: Create Data Frames that are Easier to Exchange and Reuse #681
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon.
@antaldaniel I am so sorry, you didn't use the right template... because we broke the right template last time we edited it! I put it back in shape, thanks for helping us catch it. You can find it by opening a new issue (it'll be at the top of the list) or use it from here: https://github.com/ropensci/software-review/blob/main/.github/ISSUE_TEMPLATE/A-submit-software-for-review.md So sorry, and thanks for your patience.
Checks for dataset (v0.3.4002), git hash: 7bf85ac7

Important: All failing checks above must be addressed prior to proceeding. (Checks marked with 👀 may be optionally addressed.)

Package License: GPL (>= 3)

1. Package Dependencies

Details of Package Dependency Usage

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself) and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

Locations of each call within this package may be generated locally by running `s <- pkgstats::pkgstats(<path/to/repo>)` and examining the 'external_calls' table. Tallies of functions used in each package:

base: ifelse (41), as.character (40), is.null (39), list (20), c (14), vapply (8), lapply (7), names (7), data.frame (6), logical (6), paste (6), paste0 (6), character (5), inherits (5), which (5), contributors (4), date (4), seq_along (4), substr (4), Sys.time (4), for (3), format (3), invisible (3), length (3), t (3), with (3), all (2), attr (2), class (2), drop (2), gsub (2), labels (2), nrow (2), args (1), as.data.frame (1), as.Date (1), as.POSIXct (1), cbind (1), do.call (1), double (1), if (1), nchar (1), ncol (1), rbind (1), Sys.Date (1), tolower (1), vector (1)

dataset: get_bibentry (26), creator (11), dataset_title (11), subject (10), publisher (7), rights (7), get_creator (6), identifier (6), description (5), language (5), new_Subject (5), provenance (5), dataset_df (4), get_publisher (4), get_type (4), agent (3), convert_column (3), n_triple (3), publication_year (3), var_definition (3), var_namespace (3), var_unit (3), as_dataset_df (2), as_dublincore (2), datacite (2), default_provenance (2), definition_attribute (2), geolocation (2), get_author (2), get_person_iri (2), idcol_find (2), is_person (2), is.dataset_df (2), n_triples (2), namespace_attribute (2), new_my_tibble (2), prov_author (2), unit_attribute (2), as_character (1), as_character.haven_labelled_defined (1), as_datacite (1), as_numeric (1), as_numeric.haven_labelled_defined (1), as.character.haven_labelled_defined (1), create_iri (1), dataset_to_triples (1), defined (1), describe (1), dublincore (1), dublincore_to_triples (1), fix_contributor (1), fix_publisher (1), get_definition_attribute (1), get_namespace_attribute (1), get_unit_attribute (1), id_to_column (1), is_dataset_df (1), is_doi (1), is.datacite (1), is.datacite.datacite (1), is.defined (1), is.dublincore (1), is.dublincore.dublincore (1), is.subject (1), label_attribute (1), names.dataset_df (1), new_datacite (1), new_datetime_defined (1), new_dublincore (1), new_labelled_defined (1), print.dataset_df (1), set_default_bibentry (1), set_definition_attribute (1), set_namespace_attribute (1), set_unit_attribute (1), set_var_labels (1), subject_create (1), summary.dataset_df (1), summary.haven_labelled_defined (1), tbl_sum.dataset_df (1), var_definition.default (1), var_label.dataset_df (1), var_label.defined (1), var_namespace.default (1)

assertthat: assert_that (29)

graphics: title (11)

utils: person (5), bibentry (2), citation (1)

labelled: var_label (4), to_labelled (1)

rlang: caller_env (1), env_is_user_facing (1)

stats: df (1), family (1)

cli: cat_line (1)

haven: labelled (1)

tibble: new_tibble (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with the author that these 'Imports' are listed appropriately.

2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties
The package has:

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

All parameters are explained as tooltips in the locally-rendered HTML version of this report.

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in the package.
package | version |
---|---|
pkgstats | 0.2.0.48 |
pkgcheck | 0.1.2.77 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with ✖️ have been resolved.
@maelle let me know if this works now :)
Yes, thank you!
@ropensci-review-bot assign @maelle as editor
Assigned! @maelle is now the editor
Thanks again for your submission! Editor checks:
Editor comments

Documentation

My main comment before I can proceed to looking for reviewers is that the case for the package could be made better. On the one hand, it'd be interesting to read how dataset compares to other approaches to the same "problem", such as (if I follow correctly)

On the other hand, how would a user take advantage of dataset?

In short, could you exemplify "release" and "re-use" in a vignette or more, as use cases, potentially using as roles the types of users you mention in the submission under "target audience"? For instance https://wbdataset.dataobservatory.eu/ is a good example, but it is mentioned in a vignette. A tiny comment: I find "reuse" harder to parse than "re-use", but that might be a personal preference.

Installation instructions

I'd recommend documenting the two methods of installation (CRAN and GitHub) in distinct chunks so readers can copy-paste the entire code chunk of interest. Instead of devtools you could recommend using pak:

install.packages("pak")
pak::pak("dataobservatory-eu/dataset")

Default git branch

You might want to rename the master branch to main, as some people can be offended by the word "master"; see https://www.tidyverse.org/blog/2021/10/renaming-default-branch/, which includes links to context and practical advice on renaming the default branch.

Contributing guide

The contributing guide does not seem customized. Since you are looking for co-developers, and mentioned one of the articles could be relevant to potential contributors, I'd recommend having some text related to design and wishes for feedback in the contributing guide. The contributing guide mentions "AppVeyor", which is not used any more as far as I can tell.

Continuous integration

Project management

From the open issues, which ones are meant to be tackled soon?

Code style

I'd recommend running styler (on R scripts, including tests) to make spacing more consistent. For instance in https://github.com/dataobservatory-eu/dataset/blob/7bf85ac7abe9477d02b429b4d335179d94993a77/R/agent.R#L56C1-L57C55 the space before

Code

The code could be simplified so that reviewers might more easily follow the logic. Code like (and other similar pipelines) reminds me of Jenny Bryan's advice in her talk Code smells and feels. So instead of the complex logic, you'd define methods. In some files like

Since dataset imports rlang, you could use the %||% operator, so that

creators <- ifelse(is.null(dataset_bibentry$author), ":tba", dataset_bibentry$author)

would become

creators <- dataset_bibentry$author %||% ":tba"

There are many other occurrences of this pattern.

Example dataset

The iris dataset is very well-known, but it is also infamous because of its eugenics links.

Tests

Should the line https://github.com/dataobservatory-eu/dataset/blob/7bf85ac7abe9477d02b429b4d335179d94993a77/tests/testthat/test-agent.R#L9 be removed, as it is not used in the test? I don't understand why the iris object needs to be duplicated in lines like https://github.com/dataobservatory-eu/dataset/blob/7bf85ac7abe9477d02b429b4d335179d94993a77/tests/testthat/test-creator.R#L23

should be

expect_type(as_datacite(iris_dataset, "list"), "list")

What is https://github.com/dataobservatory-eu/dataset/blob/master/tests/testthat/test-dataset_prov.bak? When using

Thank you! Happy to discuss any of the items.
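The `ifelse(is.null(...))` pattern maelle flags can be replaced with a null-coalescing operator. A minimal sketch follows; `%||%` is exported by rlang (and included in base R from 4.4.0), and `dataset_bibentry` here is a stand-in list, not the package's internal object:

```r
# Null-coalescing operator: return `lhs` unless it is NULL, otherwise `rhs`.
# Defined here for illustration; behaves like rlang's `%||%`.
`%||%` <- function(lhs, rhs) {
  if (is.null(lhs)) rhs else lhs
}

# Stand-in for the bibentry object mentioned in the review
dataset_bibentry <- list(author = NULL)

# Instead of:
#   creators <- ifelse(is.null(dataset_bibentry$author), ":tba",
#                      dataset_bibentry$author)
creators <- dataset_bibentry$author %||% ":tba"
creators
#> [1] ":tba"
```

Besides being shorter, `%||%` avoids `ifelse()`'s vectorised semantics, which were never needed for this scalar default-value check.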
@ropensci-review-bot check package
Thanks, about to send the query.
🚀 Editor check started 👋
Coverage is also something noted in my comments.
@maelle Thank you. It is unfortunate that you have joined this review after two years, because I find many of your comments very useful. However, I must say that some of them would be unusual to address in a vignette, and I would find an article format more useful, i.e., your question about frictionless, datapack, datapackage.org, and researchobject. About two years of research went into this package, which is not usually vignette material, and reviewers of my other packages disliked extended vignettes. I think that the frictionless package family follows a very different approach, and when I started the development of this package and this review, it did not even seem relevant. I now see that in some use cases both can be useful and a choice could be offered, and I will argue that, but I think this belongs more in a paper considering statistical exchange formats and their best representation in R. A small question: what is the package coverage you are aiming at? I think that the package already has very high coverage, and it exceeded the requirements when I started the review.
@antaldaniel thank you for your answer!
Thank you @maelle, and I will look into why the coverage is not updating. However, do you have an explicit coverage target?
Yes, it's 75%: https://devguide.ropensci.org/pkg_building.html#testing -- But I'd also recommend looking at the coverage report to find the uncovered lines and making a judgement call on how important/risky these lines are (versus how hard they'd be to test), just so you're sure there's nothing dangerous in the remaining 25%. 🙂 The idea really is to cover "key functionality" (phrasing from the dev guide). In my comments I recommend updating the test-coverage workflow; the fix might be as simple as that.
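To inspect coverage locally against that 75% guideline, a typical workflow with the covr package (which the coverage report is generated with) looks roughly like this, run from the package root:

```r
# install.packages("covr")  # if not already installed
library(covr)

# Run the package's tests and record line-by-line coverage
cov <- package_coverage()

# Overall percentage, to compare with the 75% guideline
percent_coverage(cov)

# Per-line interactive report, useful for judging how important or risky
# the uncovered lines are
report(cov)
```

`zero_coverage(cov)` is also handy for listing only the lines that are never executed by the tests.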
@maelle Thank you again for your useful comments and your PRs. I reorganised the issues and created a new milestone. The milestone currently breaks your review comments up into 7 issues, but they could of course be broken down further. I set myself a deadline of 16 February to resolve them, though it may happen earlier. I will tag you in the issues when they are ready for review and will also add a new comment here.
@antaldaniel thank you! I am having trouble reading your comment; could you please format quotes as quotes, i.e., using the ">" sign in front of the paragraph?
This is now explained in detail in the new vignette. Repositories like Zenodo apply at most, and usually with limitations, the 5-star FAIR data model, which helps with finding data but does not make the data readily reusable. We apply more of the 8-star model, which means that we attach metadata about the contents of the data, not only the file. On Zenodo, you can upload three different datasets about GDP per capita in countries: one in dollars, another in euros, and yet another in PPP-adjusted euros. These datasets are not very reusable, because if a user accidentally joins them, it will result in logical-semantic errors; for example, euro values will be compared or added to dollar values. Or, in the natural sciences, units of measure may mismatch. Our datasets contain the machine-readable definition of each variable, its unit of measure, and other metadata. To my knowledge, apart from rdflib, no other R packages help with that. We designed a set of packages that work with rdflib.
We aim for full compliance with the GSIM system, which is partly the Statistical Data and Metadata eXchange (so our datasets can be exchanged, enriched, etc. with IMF, Eurostat, or French national statistical office data) and DDI (the standard for microdata).
You use the dataset_df constructor, which enriches a data.frame or a tibble object. The print method uses this information. I also implemented a plot function (see the README).
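As a hypothetical sketch of what that enrichment looks like, the following builds a `dataset_df` from semantically `defined()` columns. The function names appear in the package's exported API (see the checks above), but the argument names and all metadata values here are illustrative assumptions and may differ between versions:

```r
library(dataset)

# Invented example data; `defined()` is assumed to attach a label, a unit
# of measure, and a machine-readable definition (IRI) to a vector.
gdp <- defined(
  c(3897, 7365),
  label      = "GDP per capita",
  unit       = "euro",
  definition = "http://data.europa.eu/83i/aa/GDP"  # hypothetical IRI
)

my_dataset <- dataset_df(
  geo = defined(c("AD", "LI"), label = "Country code"),
  gdp = gdp,
  dataset_bibentry = dublincore(
    title   = "Example GDP Dataset",  # invented metadata
    creator = person("Jane", "Doe")
  )
)

# The print method shows the bibliographic metadata alongside the data
print(my_dataset)

# The unit travels with the column, guarding against euro/dollar mix-ups
var_unit(my_dataset$gdp)
```

The point of the design is that the unit and definition are carried by the column itself, so a join of mismatched series can be detected rather than silently producing semantic errors.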
Release means that you can create datasets that you can send to the EU Open Data Portal, which has far stricter requirements than Zenodo. It is meant for people who want to machine-exchange data with professional scientific or statistical data providers, even at the API level.
Done
The guide has been rewritten.
It has been removed.
All issues are tackled and await your review.
styler is now used, with tidyverse settings.
These issues are resolved. I hope that all such cases are handled.
I prefer to remain as close to base R as possible and later remove some dependencies, possibly even rlang, so I prefer to use less of rlang, not more. However, for now I created an internal function for `%||%`, and there are a few examples of its use.
I think that the way rOpenSci handles this issue could be improved, because the dataset was never used in eugenics. I find boycotting an innocent author, i.e., Edgar Anderson, unethical. There are better practices for handling problematic provenance descriptions (because in this case it is the provenance description that is associated with eugenics, not the dataset or its author), and scientific solutions to such problems from digital humanities and digital heritage management could be adopted. But this could be the topic of a peer-reviewed article itself. So, upon your request, I removed the iris dataset from all vignettes and examples. They now use the Orange dataset.
Solved.
These minor issues are solved, and there is much new text.
@ropensci-review-bot assign @maurolepore as editor
Assigned! @maurolepore is now the editor
Dear @antaldaniel, thanks again for sharing your work with rOpenSci, for your patience, and for incorporating so many suggestions so far. I'm now the handling editor; please direct all your questions to me. This week I'm wrapping up my EiC rotation, but starting next week I'll be available to work with you to move this towards approval as smoothly and quickly as you and I can make it. I plan to first digest this thread, then do some preliminary checks, then assign myself as the first reviewer, which we sometimes do, and the editorial board agreed to it for this specific submission. With that deeper dive into the package I'll be better able to share my findings with the editorial board and discuss the next step. Would you be available to respond to my suggestions between May 12 and 23? Not a hard commitment, just to sense your availability and decide if this is indeed the right time for you.
Dear @maurolepore, thank you very much. I have an important deadline on 15 May, so whatever you write I will be able to review, and at least partly implement, by 23 May, most likely starting only on 16 May. Because the package is already on CRAN, and I have fixed a lot of minor bugs and made improvements, if it is not against your preferences I would release the current version (the README makes it very clear that this is an experimental version under review at rOpenSci). The changes that may bother those few users who are experimenting with the package are resolved (there are new, better-tested subsetting, printing, etc. methods for the data-frame-like and the vector classes; nothing groundbreaking or conceptual, but if somebody uses the package as it is, it is a significant improvement). So is it OK for you if I release this new, improved version on CRAN, wait for your feedback till 12 May, then reply (and, where possible, implement) by 23 May?
@antaldaniel sure, go ahead and release what you have.
This is to mark the end of my EiC rotation and to hand it over to the next EiC with a short summary.
@ropensci-review-bot assign @maurolepore as reviewer
@maurolepore added to the reviewers list. Review due date is 2025-05-31. Thanks @maurolepore for accepting to review! Please refer to our reviewer guide. rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@maurolepore: If you haven't done so, please fill in this form for us to update our reviewers records.
@ropensci-review-bot check package
Thanks, about to send the query.
🚀 Editor check started 👋
Checks for dataset (v0.3.4022), git hash: 8768983c

Important: All failing checks above must be addressed prior to proceeding. (Checks marked with 👀 may be optionally addressed.)

Package License: GPL (>= 3)

1. Package Dependencies

Details of Package Dependency Usage

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself) and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

Locations of each call within this package may be generated locally by running `s <- pkgstats::pkgstats(<path/to/repo>)` and examining the 'external_calls' table. Tallies of functions used in each package:

base: is.null (45), ifelse (39), c (27), list (19), lapply (18), vapply (16), contributors (9), for (9), seq_along (9), names (8), paste0 (8), if (7), inherits (7), logical (7), mapply (7), substr (7), which (7), attr (6), labels (6), unique (6), unlist (6), do.call (5), format (5), date (4), length (4), Sys.time (4), col (3), data.frame (3), invisible (3), nzchar (3), t (3), vector (3), all (2), as.data.frame (2), as.list (2), drop (2), gsub (2), is.na (2), nchar (2), nrow (2), paste (2), tryCatch (2), any (1), args (1), as.POSIXct (1), cbind (1), class (1), double (1), gregexpr (1), identical (1), ncol (1), NextMethod (1), rbind (1), rep (1), setdiff (1), strsplit (1), sub (1), Sys.Date (1), tolower (1), typeof (1), units (1), with (1)

dataset: as.character (41), get_bibentry (26), creator (11), identifier (10), subject (10), var_unit (10), namespace_attribute (9), dataset_df (8), dataset_title (8), publisher (7), var_definition (7), description (5), language (5), provenance (5), rights (5), get_author (4), get_creator (4), get_publisher (4), get_type (4), agent (3), as_dataset_df (3), convert_column (3), definition_attribute (3), n_triple (3), new_Subject (3), publication_year (3), var_namespace (3), as_dublincore (2), datacite (2), default_provenance (2), defined (2), dublincore (2), geolocation (2), get_contributor (2), get_person_iri (2), idcol_find (2), is_person (2), is.dataset_df (2), n_triples (2), new_my_tibble (2), prov_author (2), unit_attribute (2), all.identical (1), as_character (1), as_character.haven_labelled_defined (1), as_datacite (1), as_factor (1), as_factor.haven_labelled_defined (1), as_numeric (1), as_numeric.haven_labelled_defined (1), as.character.haven_labelled_defined (1), as.list.haven_labelled_defined (1), as.numeric (1), as.numeric.haven_labelled_defined (1), as.vector.haven_labelled_defined (1), bind_defined_rows (1), c.haven_labelled_defined (1), compare_creators (1), create_iri (1), dataset_to_triples (1), describe (1), dublincore_to_triples (1), fix_contributor (1), fix_publisher (1), format.haven_labelled_defined (1), get_definition_attribute (1), get_namespace_attribute (1), get_unit_attribute (1), head.haven_labelled_defined (1), id_to_column (1), is_dataset_df (1), is_doi (1), is.datacite (1), is.datacite.datacite (1), is.defined (1), is.dublincore (1), is.dublincore.dublincore (1), is.subject (1), label_attribute (1), length.haven_labelled_defined (1), names.dataset_df (1), new_datacite (1), new_datetime_defined (1), new_dublincore (1), new_labelled_defined (1), Ops.haven_labelled_defined (1), plot.dataset_df (1), print.dataset_df (1), print.haven_labelled_defined (1), set_default_bibentry (1), set_definition_attribute (1), set_namespace_attribute (1), set_unit_attribute (1), set_var_labels (1), strip_defined (1), subject_create (1), summary.dataset_df (1), summary.haven_labelled_defined (1), tail.haven_labelled_defined (1), tbl_sum.dataset_df (1), var_definition.default (1), var_label.dataset_df (1), var_label.defined (1), var_namespace.default (1), var_unit.default (1), vec_cast.character.haven_labelled_defined (1)

assertthat: assert_that (25)

utils: person (5), bibentry (2), citation (1), head (1), tail (1)

labelled: var_label (8), to_labelled (1)

graphics: title (7)

stats: df (5), end (1), start (1)

vctrs: vec_data (7)

rlang: caller_env (1), env_is_user_facing (1)

haven: labelled (1)

tibble: new_tibble (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with the author that these 'Imports' are listed appropriately.

2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties
The package has:

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

All parameters are explained as tooltips in the locally-rendered HTML version of this report.

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in the package.
message | number of times |
---|---|
Avoid 1:nrow(...) expressions, use seq_len. | 1 |
Avoid library() and require() calls in packages | 21 |
Avoid using sapply, consider vapply instead, that's type safe | 1 |
Lines should not be more than 80 characters. This line is 100 characters. | 10 |
Lines should not be more than 80 characters. This line is 101 characters. | 6 |
Lines should not be more than 80 characters. This line is 102 characters. | 6 |
Lines should not be more than 80 characters. This line is 103 characters. | 8 |
Lines should not be more than 80 characters. This line is 104 characters. | 7 |
Lines should not be more than 80 characters. This line is 105 characters. | 7 |
Lines should not be more than 80 characters. This line is 106 characters. | 6 |
Lines should not be more than 80 characters. This line is 107 characters. | 9 |
Lines should not be more than 80 characters. This line is 108 characters. | 3 |
Lines should not be more than 80 characters. This line is 109 characters. | 14 |
Lines should not be more than 80 characters. This line is 110 characters. | 5 |
Lines should not be more than 80 characters. This line is 111 characters. | 1 |
Lines should not be more than 80 characters. This line is 112 characters. | 6 |
Lines should not be more than 80 characters. This line is 113 characters. | 1 |
Lines should not be more than 80 characters. This line is 114 characters. | 8 |
Lines should not be more than 80 characters. This line is 115 characters. | 6 |
Lines should not be more than 80 characters. This line is 117 characters. | 3 |
Lines should not be more than 80 characters. This line is 118 characters. | 5 |
Lines should not be more than 80 characters. This line is 119 characters. | 2 |
Lines should not be more than 80 characters. This line is 120 characters. | 4 |
Lines should not be more than 80 characters. This line is 122 characters. | 1 |
Lines should not be more than 80 characters. This line is 123 characters. | 2 |
Lines should not be more than 80 characters. This line is 125 characters. | 1 |
Lines should not be more than 80 characters. This line is 126 characters. | 3 |
Lines should not be more than 80 characters. This line is 131 characters. | 3 |
Lines should not be more than 80 characters. This line is 133 characters. | 2 |
Lines should not be more than 80 characters. This line is 134 characters. | 1 |
Lines should not be more than 80 characters. This line is 136 characters. | 1 |
Lines should not be more than 80 characters. This line is 140 characters. | 1 |
Lines should not be more than 80 characters. This line is 142 characters. | 1 |
Lines should not be more than 80 characters. This line is 143 characters. | 1 |
Lines should not be more than 80 characters. This line is 148 characters. | 1 |
Lines should not be more than 80 characters. This line is 151 characters. | 1 |
Lines should not be more than 80 characters. This line is 153 characters. | 1 |
Lines should not be more than 80 characters. This line is 157 characters. | 1 |
Lines should not be more than 80 characters. This line is 166 characters. | 1 |
Lines should not be more than 80 characters. This line is 171 characters. | 2 |
Lines should not be more than 80 characters. This line is 174 characters. | 1 |
Lines should not be more than 80 characters. This line is 292 characters. | 3 |
Lines should not be more than 80 characters. This line is 312 characters. | 1 |
Lines should not be more than 80 characters. This line is 81 characters. | 17 |
Lines should not be more than 80 characters. This line is 82 characters. | 15 |
Lines should not be more than 80 characters. This line is 83 characters. | 21 |
Lines should not be more than 80 characters. This line is 84 characters. | 15 |
Lines should not be more than 80 characters. This line is 85 characters. | 8 |
Lines should not be more than 80 characters. This line is 86 characters. | 11 |
Lines should not be more than 80 characters. This line is 87 characters. | 15 |
Lines should not be more than 80 characters. This line is 88 characters. | 15 |
Lines should not be more than 80 characters. This line is 89 characters. | 12 |
Lines should not be more than 80 characters. This line is 90 characters. | 11 |
Lines should not be more than 80 characters. This line is 91 characters. | 10 |
Lines should not be more than 80 characters. This line is 92 characters. | 12 |
Lines should not be more than 80 characters. This line is 93 characters. | 12 |
Lines should not be more than 80 characters. This line is 94 characters. | 8 |
Lines should not be more than 80 characters. This line is 95 characters. | 15 |
Lines should not be more than 80 characters. This line is 96 characters. | 9 |
Lines should not be more than 80 characters. This line is 97 characters. | 2 |
Lines should not be more than 80 characters. This line is 98 characters. | 17 |
Lines should not be more than 80 characters. This line is 99 characters. | 8 |
unexpected 'in' | 1 |
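Two of the lints in the table above are worth unpacking. `1:nrow(x)` silently counts 1, 0 when a data frame has zero rows, while `seq_len()` yields an empty sequence; and `vapply()` is type-stable where `sapply()` is not. In plain base R:

```r
df <- data.frame(x = numeric(0))  # zero-row data frame

1:nrow(df)         # counts down: a loop over this runs twice by mistake
#> [1] 1 0
seq_len(nrow(df))  # empty sequence: a loop over this is safely skipped
#> integer(0)

# sapply() decides its return type from the input, so it can surprise you
# on edge cases; vapply() declares the expected type and length up front.
sapply(list(), sum)              # returns an empty list
vapply(list(), sum, numeric(1))  # returns numeric(0), as declared
```

This is why `for (i in seq_len(nrow(df)))` and `vapply()` are the defensive idioms these linters recommend.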
4. Other Checks
Details of other checks (click to open)
✖️ The following 10 function names are duplicated in other packages:

- as_character: from metan, radiant.data, retroharmonize, sjlabelled
- as_numeric: from descstat, metan, qdapRegex, radiant.data, retroharmonize, sjlabelled, zenplots
- describe: from AzureVision, Bolstad2, describer, dlookr, explore, Hmisc, iBreakDown, ingredients, lambda.r, MSbox, onewaytests, prettyR, psych, psyntur, questionr, radiant.data, RCPA3, Rlab, scan, scorecard, sylly, tidycomm
- description: from dataMaid, dataPreparation, dataReporter, dcmodify, memisc, metaboData, PerseusR, ritis, rmutil, rsyncrosim, stream, synchronicity, timeSeries, tis, validate
- get_bibentry: from eurostat
- identifier: from Ramble
- is.defined: from nonmemica
- language: from sylly, wakefield
- provenance: from provenance
- subject: from DGM, emayili, gmailr, sendgridr
Package Versions
package | version |
---|---|
pkgstats | 0.2.0.54 |
pkgcheck | 0.1.2.126 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with ✖️ have been resolved.
Dear @antaldaniel Thanks again for your submission. Here's my review. Before you proceed, please wait to hear back from me: I need to first discuss with the editorial board, and I'll aim to share next steps this week.

I see there is a lot of love and effort invested in this package. With such a large investment, I anticipate some of my feedback will likely feel uncomfortable. I appreciate your willingness to receive feedback; please read it imagining a friendly tone and the best of intentions. We're all here helping each other kindly to make research software better. Thanks for sharing your work with rOpenSci.

Semantic comment-tagsI tag my comments (ml01, ml02, and so on) to help track them. Please respond in sequence and using
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide Briefly describe any working relationship you have (had) with the package authors.
DocumentationThe package includes all the following forms of documentation:
Functionality
Estimated hours spent reviewing: 6
Review Comments
I understand "further usage of this dataset as unacceptable" (@mpadge ropensci/dev_guide#868 (comment)):
This is surprising to me because this comment suggests it was removed:
Did you remove it from the default
|
@ropensci-review-bot submit review #681 (comment) time 6 |
Logged review for maurolepore (hours: 6) |
@maurolepore Thank you very much for the extensive comments. I will start from here, because I was a bit concerned that you perhaps saw a wrong version, as all examples ran smoothly for me, as well as the checks, and
... indeed, you should not meet the iris dataset. The agreement was that the dataset would be hidden (after all, it is part of base R, and about 200 unit tests use it), but it would be removed from the examples and vignettes. As far as I can see, the iris dataset only appears once in the documentation and in unit tests. I'll write out all your comments as issues and take a serious look, because when I signalled that I thought all issues were resolved (at least from the previous two reviews), I certainly did not see the The original guidelines for rOpenSci when I first submitted this package for review were to submit preferably at a conceptual stage, before the first CRAN release, and that was my intention. So I understand that there are a lot of unpolished solutions present, but the idea was to somehow create a tool that can receive semantically rich datasets into the R ecosystem, and can give them back to such systems, mainly those APIs that use SDMX, i.e., the statistical data and metadata exchange standard. |
Thanks @antaldaniel for the clarifications. The editorial board also confirmed that the priority is to remove For the record, we typically do two reviews before the author makes changes, but we accept exceptions and we agree this is one, so please do go ahead with your changes as planned. For changes to the documentation it can really help to get an external reviewer. As someone who has written some long pieces of work, I learned to appreciate the value of fresh eyes. External reviewers (including AI) can typically reword and summarize text much more easily than the first author. Professional writers do it, and this work seems very professional to me. For my own writing I find this short paper very helpful: The science of scientific writing. It really applies to writing not just science but anything. It shows how to structure text to maximize the chances that the reader will interpret it as the writer intended. Interestingly, that structure may sometimes not be the most natural, yet it is the most effective. Funny enough, I learned about the main author, George Gopen, from reading R for Data Science:
|
@maurolepore Thank you for your patience. Regarding these errors, I made a very clumsy mistake: I work on several computers, and the one on the main branch was not the version that should have been available to you. I think there are certainly no more visible iris references and no failing examples, and where you suggested, I also very extensively improved the tests; I now have ~400 unit tests. I will turn to the revisions of the documentation that you suggested, which is a bit bigger task than I thought, but I will try my best, and if I am already investing in this, I will also consider trying to create a publication. I think the idea that the R system should be able to integrate more into the semantic web and the infrastructure of statistical exchange has its merits, even if this package is still in an early stage. I will let you know as soon as I am finished with the rewrites. |
@maurolepore Just noting that this issue pops up on our dashboard as being labelled with multiple stages. Can you please rectify that? thanks 👍 |
@antaldaniel: please post your response with Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html |
Submitting Author Name: Daniel Antal
Submitting Author Github Handle: @antaldaniel
Repository: https://github.com/dataobservatory-eu/dataset
Version submitted: 0.3.4002
Submission type: Standard
Editor: @maurolepore
Reviewers: @maurolepore
Archive: TBD
Version accepted: TBD
Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The package works with various semantic interoperability standards; it therefore allows users to retrieve RDF-annotated, rich, platform-independent data and reconstruct it as an R data.frame with rich metadata attributes, or to release interoperable, RDF-annotated datasets on linked open data platforms from native R objects.
Production-side statisticians. Scientists who want to update their sources from various data repositories and exchanges. Scientists and research data managers who want to release new scientific or professional datasets that follow modern interoperability standards.
The package aims to complement the rdflib and dataspice packages.
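To make the intended workflow concrete, here is a minimal sketch. The function names (dataset_df, dublincore, get_bibentry) are taken from the package's exported API as listed in the checks above, but the argument names and exact signatures are assumptions for illustration only and may differ from the released package:

```r
library(dataset)

# Hypothetical sketch: wrap an ordinary data.frame together with
# Dublin Core bibliographic metadata (argument names are assumptions).
df <- dataset_df(
  data.frame(geo = c("AT", "BE"), value = c(1.1, 2.2)),
  dataset_bibentry = dublincore(
    title     = "Example Dataset",
    creator   = person("Jane", "Doe"),
    publisher = "Example Org"
  )
)

# Retrieve the embedded bibliographic metadata record.
get_bibentry(df)
```

The point of the design is that the metadata travels with the data.frame as attributes, so the same object can later be serialized for a linked open data platform without re-entering the descriptive metadata.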
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research? Not applicable.
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Explain reasons for any pkgcheck items which your package is unable to pass.

Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
Do you intend for this package to go on CRAN?
Do you intend for this package to go on Bioconductor?
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
Code of conduct