
Commit 61737d7

Updates to `build_` functions

- switch to `source` and `destination` argument syntax
- use `corella::darwin_core_terms` to add links to schema
1 parent: b96d00e

9 files changed: +96 -82 lines

DESCRIPTION (+2 -2)

```diff
@@ -17,9 +17,9 @@ Authors@R:
 role = c("aut")))
 Description: galaxias helps users describe, package and share biodiversity
 information using the 'Darwin Core' data standard, which is the format used
-and accepted by the Global Biodiversity Information Facility (GBIF) and it's
+and accepted by the Global Biodiversity Information Facility (GBIF) and its'
 partner nodes. It is functionally similar to `devtools`, but with a focus on
-building Darwin Core Archives (DwCA's) rather than R packages.
+building 'Darwin Core Archives' rather than R packages.
 Depends:
 R (>= 4.3.0),
 corella
```

R/build_archive.R (+20 -23)

```diff
@@ -4,54 +4,51 @@
 #' and metadata. This function assumes that all of these file types have been
 #' pre-constructed, and can be found inside a single folder, with no additional
 #' or redundant information. This function is similar to `devtools::build()`,
-#' in the sense that it takes a repository and wraps it for publication, without
-#' assessing the contents in any meaningful way. It differs from
-#' `devtools::build()` in that it builds a Darwin Core Archive, rather than an
-#' R package.
+#' in the sense that it takes a repository and wraps it for publication, It
+#' differs from `devtools::build()` in that it builds a Darwin Core Archive,
+#' rather than an R package.
 #' @details
 #' This function looks for three types of objects in the specified `directory`:
 #'
 #' * One or more `csv` files such as `occurrences.csv` &/or `events.csv`.
 #' These will be manipulated versions of the raw dataset, which have been
-#' altered to use Darwin Core terms as column headers. See the `corella`
-#' package for details.
-#' * A metadata statement, stored in xml using the filename `eml.xml`. The
-#' function `use_metadata()` from the `paperbark` package is a good starting
-#' point here, followed by `build_metadata()` to save it in xml.
+#' altered to use Darwin Core terms as column headers. See
+#' [corella::corella-package()] for details.
+#' * A metadata statement, stored in `EML` using the filename `eml.xml`. The
+#' function [use_metadata()] is a good starting point here, followed by
+#' [build_metadata()] once you have populated your metadata statement.
 #' * A 'schema' document, also stored in xml, called `meta.xml`. This is
-#' usually constructed using `build_schema()`.
+#' usually constructed using [build_schema()].
 #'
 #' You will get an error if these files are not present. The resulting file
 #' shares the name of the working directory (with a .zip file extension),
 #' and is placed in the parent directory
-#' @param x (string) A directory containing all the files to be stored in the
-#' archive. Defaults to the `data` folder within the current working directory.
-#' @param file (string) A file name to save the resulting zip file.
+#' @param source (string) A directory containing all the files to be stored in
+#' the archive. Defaults to the `data` folder within the current working
+#' directory.
+#' @param destination (string) A file name to save the resulting zip file.
 #' @return Invisibly returns the location of the built zip file; but typically
 #' called for the side-effect of building a 'Darwin Core Archive' (i.e. a zip
 #' file).
 #' @importFrom zip zip
 #' @export
-build_archive <- function(x = "data", file) {
-x <- get_default_directory(x)
-
-progress_update("Retrieving metadata...")
-files_in <- find_data(x)
+build_archive <- function(source = "data", destination) {
+progress_update("Retrieving data...")
+files_in <- get_default_directory(source) |>
+find_data()

 progress_update("Creating zip folder...")
-file_out <- get_default_file(file)
+file_out <- get_default_file(destination)

 progress_update("Building Darwin Core Archive...")
 zip::zip(zipfile = file_out,
 files = files_in,
 mode = "cherry-pick")

-cli::cli_alert_success("Darwin Core Archive successfully built. \nSaved as {.file {file_out}}.")
+cli::cli_alert_success("Darwin Core Archive successfully built. \nSaved as `{.file {file_out}}`.")
 cli::cli_progress_done()

-# invisible(return(file_out)) # might need this to save
-
-
+invisible(file_out)
 }

 #' Simple function to specify a zip file if no arg given
```
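The commit replaces the commented-out `invisible(return(file_out))` with `invisible(file_out)`, so `build_archive()` now hands back the zip path without printing it. `get_default_file()` itself is not shown in this diff, so the sketch below is a minimal, hypothetical version based only on the documented behaviour (the archive shares the working directory's name and is placed in its parent directory). `default_zip_path()` and `archive_path_stub()` are illustrative names, not galaxias functions.

```r
# Hypothetical helper, NOT the galaxias implementation of get_default_file():
# name the archive after the working directory and place it in the parent
# directory, as the build_archive() documentation describes.
default_zip_path <- function(wd = getwd()) {
  file.path(dirname(wd), paste0(basename(wd), ".zip"))
}

# Hypothetical stand-in for the end of build_archive(): resolve the output
# path, then return it invisibly, mirroring the new `invisible(file_out)` line.
archive_path_stub <- function(destination = NULL) {
  file_out <- if (is.null(destination)) default_zip_path() else destination
  invisible(file_out)
}

default_zip_path("/home/user/my_dataset")    # "/home/user/my_dataset.zip"

archive_path_stub("my_dataset.zip")          # prints nothing at the console
path <- archive_path_stub("my_dataset.zip")  # but the value can be captured
path
```

With this pattern, `path <- build_archive()` can keep the archive location for later use, while a bare `build_archive()` call prints only the cli messages.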

R/build_metadata.R (+13 -12)

```diff
@@ -1,34 +1,35 @@
 #' Create a metadata statement for a Darwin Core Archive
 #'
 #' A metadata statement lists the owner of the dataset, how it was collected,
-#' and how it may used (i.e. its' licence). This function simply converts
-#' metadata stored in a markdown file to xml, and stores it in the folder
-#' specified using the `directory` argument.
+#' and how it can be used (i.e. its' licence). This function simply reads
+#' converts metadata stored in a markdown file, converts it to xml, and saves it
+#' in the `destination` file.
 #'
 #' This function is a fairly shallow wrapper on top of functionality build
 #' in the `paperbark` package, particularly `read_md()` and `write_eml()`. You can
 #' use that package to gain greater control, or to debug problems, should you
 #' wish.
-#' @param path Path to a metadata statement stored in markdown format (.md).
-#' @param file A file where the result should be saved. Defaults to
+#' @param source A metadata file stored in markdown format (`.md`). Defaults
+#' to `metadata.md`, which is the same as is created by [use_metdata()]
+#' @param destination A file where the result should be saved. Defaults to
 #' `data/eml.xml`.
 #' @returns Does not return an object to the workspace; called for the side
 #' effect of building a file named `meta.xml` in the `data` directory.
 #' @importFrom paperbark read_md
 #' @importFrom paperbark write_eml
 #' @export
-build_metadata <- function(x = "data",
-file = "./data/eml.xml"){
-if(!file.exists(x)){
-cli::cli_abort("{.file {x}} doesn't exist in specified location.")
+build_metadata <- function(source = "metadata.md",
+destination = "./data/eml.xml"){
+if(!file.exists(source)){
+cli::cli_abort("`{source}` doesn't exist in specified location.")
 }
 # import file, ensure EML metadata is added, convert to XML
 progress_update("Reading file...")
-metadata_file <- read_md(x)
+metadata_tbl <- read_md(source)

 progress_update("Writing file...")
-write_eml(built_file, file = file)
+write_eml(metadata_tbl, file = destination)

-cli::cli_alert_success("Metadata successfully built. Saved as {.file /data/eml.xml}.")
+cli::cli_alert_success("Metadata successfully built. Saved as `{destination}`.")
 cli::cli_progress_done()
 }
```
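In the removed version, `build_metadata()` defaulted `x` to `"data"` (a directory) even though its documentation described a markdown file, and it passed an undefined `built_file` to `write_eml()`. The new version defaults `source` to `metadata.md`, checks that the file exists, and writes the object it actually read (`metadata_tbl`). A minimal sketch of that guard follows, with base R's `stop()` standing in for `cli::cli_abort()` so it runs without extra packages; `check_source_file()` is a hypothetical name.

```r
# Hypothetical sketch of the guard at the top of build_metadata().
# stop() stands in for cli::cli_abort(); the message text mirrors the diff.
check_source_file <- function(source = "metadata.md") {
  if (!file.exists(source)) {
    stop(sprintf("`%s` doesn't exist in specified location.", source),
         call. = FALSE)
  }
  invisible(source)
}

# An existing file passes silently...
md <- tempfile(fileext = ".md")
writeLines("# My dataset", md)
check_source_file(md)

# ...while a missing one fails before any reading or writing happens.
missing_md <- tempfile(fileext = ".md")
result <- tryCatch(check_source_file(missing_md), error = conditionMessage)
result
```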

R/build_schema.R (+25 -16)

```diff
@@ -4,27 +4,26 @@
 #' It works by detecting column names on csv files in a specified directory;
 #' these should all be Darwin Core terms for this function to produce reliable
 #' results.
-#' @param x (string) A directory containing all the files to be stored in the
-#' archive. Defaults to the `data` folder within the current working directory.
-#' @param file (string) A file name for the resulting schema document.
+#' @param source A directory (**not** a file) containing files to be documented
+#' in the schema document. Defaults to the `data` folder within the current
+#' working directory. Note that files that do not match the Darwin Core naming
+#' convention and/or do not end in `.csv` are ignored.
+#' @param destination A file name for the resulting schema document. Defaults
+#' to `./data/meta.xml` for consistency with the Darwin Core standard.
 #' @returns Does not return an object to the workspace; called for the side
 #' effect of building a file named `meta.xml` in the specified directory.
 #' @importFrom paperbark write_eml
 #' @importFrom glue glue
 #' @importFrom rlang abort
 #' @export
-build_schema <- function(x = "data",
-file = "./data/meta.xml") {
-x <- get_default_directory(x)
-
-files <- detect_dwc_files(x)
-fields <- detect_dwc_fields(files)
-result <- add_front_matter(fields)
-
-progress_update("Writing file...")
-write_eml(result, file = file)
-
-cli::cli_alert_success("Schema successfully built. Saved as {.file /data/meta.xml}.")
+build_schema <- function(source = "data",
+destination = "./data/meta.xml") {
+get_default_directory(source) |>
+detect_dwc_files() |>
+detect_dwc_fields() |>
+add_front_matter() |>
+write_eml(file = destination)
+cli::cli_alert_success("Schema successfully built. Saved as {destination}.")
 cli::cli_progress_done()
 }
```
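The refactored `build_schema()` chains its steps with R's native pipe (available since R 4.1; the package already depends on R >= 4.3.0), so `get_default_directory(source) |> detect_dwc_files()` is equivalent to `detect_dwc_files(get_default_directory(source))`. The helpers themselves are internal and not shown in this commit, so the sketch below only approximates the directory-scanning and header-reading steps in base R. Assumptions: every `.csv` file is kept (the real helper also applies a Darwin Core naming convention that is not visible here), and `detect_csv_fields()` is a hypothetical name.

```r
# Hypothetical stand-in for the start of build_schema()'s pipeline:
# find csv files in a directory and read each file's column names.
detect_csv_fields <- function(source = "data") {
  # `source` must be a directory, matching the updated @param documentation
  if (!dir.exists(source)) {
    stop("`source` must be a directory, not a file.", call. = FALSE)
  }
  files <- list.files(source, pattern = "\\.csv$", full.names = TRUE)
  # Read only the first row of each file, keeping header names as written
  fields <- lapply(files, function(f) {
    names(utils::read.csv(f, nrows = 1, check.names = FALSE))
  })
  stats::setNames(fields, basename(files))
}

# A throwaway directory resembling the one built in the package tests
dir_demo <- tempfile("dwc_demo")
dir.create(dir_demo)
write.csv(data.frame(eventID = 1, eventDate = "2024-01-01"),
          file.path(dir_demo, "events.csv"), row.names = FALSE)
writeLines("not a data file", file.path(dir_demo, "README.txt"))

# README.txt is skipped because it does not end in .csv
detect_csv_fields(dir_demo)
```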

```diff
@@ -195,7 +194,17 @@ create_field_rows <- function(x){
 index_list <- as.list(seq_along(field_names))
 names(index_list) <- rep("index", n_fields)
 # get sequence of urls
-term_list <- as.list(glue("http://rs.tdwg.org/dwc/terms/{field_names}"))
+dwc_df <- corella::darwin_core_terms
+term_list <- map(field_names,
+.f = \(a){
+term_lookup <- dwc_df$term == a
+if(any(term_lookup)){
+dwc_df$url[which(term_lookup)[1]]
+}else{
+"no-dwc-term-found"
+}
+})
+# term_list <- as.list(glue("http://rs.tdwg.org/dwc/terms/{field_names}")) # obsolete
 names(term_list) <- rep("term", n_fields)
 # combine
 tibble(level = 3,
```
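This hunk stops building term URLs by string templating (`http://rs.tdwg.org/dwc/terms/{field_names}`) and instead looks each column name up in `corella::darwin_core_terms`, falling back to `"no-dwc-term-found"` when a column is not a Darwin Core term; a non-standard column therefore no longer receives a made-up URL. The sketch below performs the same lookup with base R's `match()`, which is vectorised and, like `which(term_lookup)[1]`, returns the first match. The three-row `dwc_df` is a toy stand-in: the assumption that the real table has `term` and `url` columns comes only from the code in the diff, and `lookup_term_urls()` is a hypothetical name.

```r
# Toy stand-in for corella::darwin_core_terms (assumed `term` and `url` columns)
dwc_df <- data.frame(
  term = c("eventID", "eventDate", "scientificName"),
  url  = c("http://rs.tdwg.org/dwc/terms/eventID",
           "http://rs.tdwg.org/dwc/terms/eventDate",
           "http://rs.tdwg.org/dwc/terms/scientificName")
)

# Vectorised equivalent of the map() lookup in the diff
lookup_term_urls <- function(field_names, dwc_df) {
  idx <- match(field_names, dwc_df$term)  # first match, or NA if none
  urls <- dwc_df$url[idx]
  urls[is.na(idx)] <- "no-dwc-term-found"
  as.list(urls)  # a list, like map() returns, so names can be set afterwards
}

term_list <- lookup_term_urls(c("eventID", "myCustomField"), dwc_df)
names(term_list) <- rep("term", length(term_list))
str(term_list)
```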

man/build_archive.Rd (+14 -14): generated file, diff not rendered.

man/build_metadata.Rd (+8 -7): generated file, diff not rendered.

man/build_schema.Rd (+7 -4): generated file, diff not rendered.

tests/testthat/test-build.R (+6 -3)

```diff
@@ -8,11 +8,14 @@ test_that("build_ functions work correctly in sequence", {

 # add data
 # add events.csv
-tibble(eventID = 1, eventDate = "2024-01-01") |>
+tibble(eventID = 1,
+eventDate = "2024-01-01") |>
 write.csv(file = "data/events.csv",
 row.names = FALSE)
 # add occurrences.csv
-tibble(basisOfRecord = "humanObservation", individualCount = 1) |>
+tibble(basisOfRecord = "humanObservation",
+individualCount = 1,
+scientificName = "Litoria peronii") |>
 write.csv(file = "data/occurrences.csv",
 row.names = FALSE)
 # expect_error(build_archive()) # no schema or metadata
@@ -22,7 +25,7 @@ test_that("build_ functions work correctly in sequence", {
 build_schema()
 expect_true(file.exists("data/meta.xml"))
 result <- readLines("data/meta.xml")
-expect_equal(length(result), 15) # correct number of entries
+expect_equal(length(result), 16) # correct number of entries
 expect_true(all(grepl("^\\s*<", result))) # all open with `<`
 # NOTE: still has problems with attributes containing `amp` instead of `&`
 # expect_error(build_archive()) # no metadata yet
```
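The updated test adds a `scientificName` column to `occurrences.csv` and raises the expected length of `meta.xml` from 15 to 16 lines. Counting fields across the two test tables shows that the increase matches exactly one new column, assuming (as the test suggests but does not state) that each field contributes one line to the schema document:

```r
# The two test tables as written after this commit
events <- data.frame(eventID = 1, eventDate = "2024-01-01")
occurrences <- data.frame(basisOfRecord = "humanObservation",
                          individualCount = 1,
                          scientificName = "Litoria peronii")

fields_before <- 4  # eventID, eventDate, basisOfRecord, individualCount
fields_after <- ncol(events) + ncol(occurrences)

# Under the one-line-per-field assumption, meta.xml grows by this many lines
fields_after - fields_before
```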

vignettes/quick_start_guide.Rmd (+1 -1)

````diff
@@ -135,7 +135,7 @@ Darwin Core may be an unfamiliar format, so it can be useful to 'check' your
 data for common issues. We suggest first using `check_archive()`:


-Alternatively, you can use the GBIF 'validate' API to check your data (not functional!)
+Alternatively, you can use the GBIF 'validate' API to check your data:

 ```{r, eval=FALSE}
 validate_archive()
````
