Commit ba222ba

committed
Add occurrences example vignette #25
1 parent c490e53 commit ba222ba

File tree: 3 files changed, +129 -0 lines changed

_pkgdown.yml (+7)

```diff
@@ -8,6 +8,8 @@ template:
   bootswatch: cerulean
 development:
   mode: release
+project:
+  render: ['*.qmd']
 navbar:
   structure:
     left:
@@ -22,6 +24,11 @@ navbar:
     quickstart:
       text: Quick start guide
       href: articles/quick_start_guide.html
+    articles:
+      text: Examples
+      menu:
+      - text: Standardise occurrence-based data
+        href: articles/occurrence-example.html
     news:
       text: News
       href: news/index.html
```

vignettes/dummy-dataset-sb.xlsx

10.8 KB
Binary file not shown.

vignettes/occurrences-example.Rmd

+122 lines (new file)
---
title: "Standardise an Occurrences dataset"
author: "Dax Kellie & Martin Westgate"
date: '2025-02-04'
output:
  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Standardise an Occurrences dataset}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
Data describing species observations are referred to as *occurrence* data. In Living Atlases like the Atlas of Living Australia (ALA), this is the default type of data stored.

Occurrence-based datasets assume that all observations are independent of each other. The benefit of this assumption is that observational data can remain simple in structure: every observation is made at a specific place and time. This simplicity allows all occurrence-based data to be aggregated and used together.

Let's see how to build an occurrence-based dataset using galaxias.
## The dataset

Let's use a small example dataset of bird observations taken at 4 different site locations. This dataset contains many different types of data, like landscape type and age class. Importantly for standardising to Darwin Core, it contains the scientific name (`species`), coordinate location (`lat` & `lon`) and date of observation (`date`).
```{r}
#| warning: false
#| message: false
library(galaxias)
library(dplyr)
library(readxl)

obs <- read_xlsx("dummy-dataset-sb.xlsx",
                 sheet = 1) |>
  janitor::clean_names()

obs |>
  gt::gt() |>
  gt::opt_interactive(page_size_default = 5)
```
## Standardise to Darwin Core

To determine what we need to do to standardise our dataset, let's use `suggest_workflow()`. The output tells us that we already have one matching Darwin Core term in our data (`sex`), but that we are missing all of the minimum required Darwin Core terms.
```{r}
obs |>
  suggest_workflow()
```
The suggested workflow tells us which functions we can use to rename, modify or add the columns we're missing. `set_` functions are specialised wrappers around `dplyr::mutate()`, with additional functionality to support Darwin Core.

For simplicity, let's start with the easy part: renaming the columns we already have so that they use accepted Darwin Core terms. `set_` functions will check that each column is correctly formatted. We'll save our modified dataframe as `obs_dwc`.
```{r}
obs_dwc <- obs |>
  set_scientific_name(scientificName = species) |>
  set_coordinates(decimalLatitude = lat,
                  decimalLongitude = lon) |>
  set_datetime(eventDate = lubridate::ymd(date)) # specify year-month-day format
```
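Under the hood, each of these `set_` calls behaves much like a `dplyr::mutate()` that also validates the result. As a rough sketch (illustrative only, and without the Darwin Core checks the real functions run):

```r
library(dplyr)

# Roughly what set_scientific_name(scientificName = species) does
# to the data itself, minus the validation step:
obs_plain <- obs |>
  mutate(scientificName = species)
```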
Running `suggest_workflow()` again will reflect our progress and show us what's left to do. The output tells us that we still need to add several columns to our dataset to meet minimum Darwin Core requirements.
```{r}
obs_dwc |>
  suggest_workflow()
```
Here's a rundown of the columns we need to add:

* `occurrenceID`: A unique identifier for each record. This ensures that we can identify the specific record for any future updates or corrections. We can use `random_id()`, `composite_id()` or `sequential_id()` to add these unique IDs to our dataframe.
* `basisOfRecord`: The type of record (e.g. human observation, specimen, machine observation). See a list of acceptable values with `corella::basisOfRecord_values()`.
* `geodeticDatum`: The Coordinate Reference System (CRS) projection of your data (for example, the CRS used by Google Maps is "WGS84").
* `coordinateUncertaintyInMeters`: The area of uncertainty around your observation. You may know this value based on your method of data collection, or you can use `with_uncertainty()` to provide a default value based on the method used.
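The three ID helpers differ in how they build identifiers. A brief sketch (the arguments shown are illustrative; see the corella documentation for the exact signatures):

```r
# random_id(): a random UUID per record
obs |> set_occurrences(occurrenceID = random_id())

# sequential_id(): a simple incrementing identifier
obs |> set_occurrences(occurrenceID = sequential_id())

# composite_id(): combines columns and/or other helpers into one ID,
# e.g. something like composite_id(site, sequential_id())
```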
Now let's add these columns using `set_occurrences()` and `set_coordinates()`. We can also add the suggested function `set_individual_traits()`, which will automatically identify the matched column name `sex` and check the column's format.
```{r}
obs_dwc <- obs_dwc |>
  set_occurrences(
    occurrenceID = random_id(), # adds a random UUID
    basisOfRecord = "humanObservation"
  ) |>
  set_coordinates(
    geodeticDatum = "WGS84",
    coordinateUncertaintyInMeters = 30
    # coordinateUncertaintyInMeters = with_uncertainty(method = "phone")
  ) |>
  set_individual_traits()
```
Running `suggest_workflow()` once more will confirm that our dataset is ready to be used in a Darwin Core Archive!
```{r}
obs_dwc |>
  suggest_workflow()
```
To submit our dataset, let's select the columns with valid occurrence term names and save the resulting dataframe to the file `occurrences.csv`. Importantly, we will save our csv in a folder called `data-processed`, which galaxias looks for automatically when building a Darwin Core Archive.
```{r}
obs_dwc <- obs_dwc |>
  select(any_of(occurrence_terms())) # keep only matching Darwin Core terms

obs_dwc |>
  gt::gt() |>
  gt::opt_interactive(page_size_default = 5)
```
```{r}
#| eval: false
# Save in ./data-processed (create the folder first if it doesn't exist)
dir.create("data-processed", showWarnings = FALSE)
readr::write_csv(obs_dwc, file = "./data-processed/occurrences.csv")
```
All done! See the [Quick start guide vignette](quick_start_guide.html) for how to build a Darwin Core Archive.
