* Move package load, and put data.table comment into ^[]
* Remove mention of compare section (merged into model)
* Reduce number of code lines a little
* Use ref for the table as hoped!
* Add primary and secondary sentence
* Add text about Figure 2.2
* Add clarification on Figure 1.4
* Improvements to Figure 2.1
* Using gt and dplyr here
* Update to use the retrospective data
* Improve writing about histograms, and fix colour typo
* Downplay censoring less
* Rewrite ref:obs-est caption
* Add dplyr to Suggests
* Update vignettes/epidist.Rmd
Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
---------
Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Former-commit-id: 980991b
Former-commit-id: def0c7a7d77cc490c9dacd694702162e5f2e27ce
vignettes/epidist.Rmd (+40, -38)
@@ -30,35 +30,36 @@ knitr::opts_chunk$set(
30
30
)
31
31
```
32
32
33
-
```{r load-requirements}
34
-
library(epidist)
35
-
library(data.table)
36
-
library(purrr)
37
-
library(ggplot2)
38
-
```
39
-
40
-
Many quantities in epidemiology can be thought of as the time between two events, or "delays".
33
+
Many quantities in epidemiology can be thought of as the time between two events or "delays".
41
34
Important examples include:
42
35
43
36
* the incubation period (time between infection and symptom onset),
44
37
* serial interval (time between symptom onset of infector and symptom onset of infectee), and
45
38
* generation interval (time between infection of infector and infection of infectee).
46
39
40
+
We encompass all of these delays as the time between a "primary event" and "secondary event".
47
41
Unfortunately, estimating delays accurately from observational data is usually difficult.
48
42
In our experience, the two main issues are:
49
43
50
44
1. interval censoring, and
51
45
2. right truncation.
52
46
53
47
Don't worry if you've not come across these terms before.
54
-
In Section \@ref(data), we will explain what they mean by simulating data like that we might observe during an ongoing infectious disease outbreak.
55
-
Next, in Section \@ref(fit), we show how `epidist` can be used to estimate delays using a statistical model which properly accounts for these two issues.
56
-
Finally, in Section \@ref(compare), we demonstrate that the fitted delay distribution accurately recovers the underlying truth.
48
+
First, in Section \@ref(data), we explain interval censoring and right truncation by simulating data like that we might observe during an ongoing infectious disease outbreak.
49
+
Then, in Section \@ref(fit), we show how `epidist` can be used to accurately estimate delay distributions by using a statistical model which properly accounts for these two issues.
57
50
58
51
If you would like more technical details, the `epidist` package implements models following best practices as described in @park2024estimating and @charniga2024best.
59
52
60
-
Finally, to run this vignette yourself, you will need the `data.table`, `purrr` and `ggplot2` packages installed.
61
-
Note that to work with outputs from `epidist` you do not need to use `data.table`: any tool of your preference is suitable.
53
+
To run this vignette yourself, as well as the `epidist` package, you will need the `data.table`^[Note that to work with outputs from `epidist` you do not need to use `data.table`: any tool of your preference is suitable!], `purrr`, `ggplot2`, `gt`, and `dplyr` packages installed.
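The two issues named in the introduction, interval censoring and right truncation, can be illustrated without any package code. The following is a minimal base R sketch, not the vignette's own simulation: the lognormal parameters, the uniform primary event times, and the names `ptime_daily`, `stime_daily`, and `obs_time` are all assumptions chosen for illustration.

```r
set.seed(1)
n <- 10000

# Assumed "true" delay distribution, for illustration only
meanlog <- 1.8
sdlog <- 0.5

ptime <- runif(n, 0, 30)            # continuous primary event times
delay <- rlnorm(n, meanlog, sdlog)  # true continuous delays
stime <- ptime + delay              # continuous secondary event times

# Interval censoring: only the day of each event is recorded
ptime_daily <- floor(ptime)
stime_daily <- floor(stime)
delay_daily <- stime_daily - ptime_daily

# Right truncation: at the observation time, only cases whose
# secondary event has already occurred are observed
obs_time <- 30
observed <- stime <= obs_time

mean(delay)                  # mean of all true delays
mean(delay_daily[observed])  # mean of the censored, truncated delays
```

Long delays are more likely to end after `obs_time`, so they are under-represented among the observed cases, and the naive mean of the observed delays falls below the mean of the true delays.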
aes(x = ptime, xend = stime, y = case, yend = case), col = "grey"
133
133
) +
134
134
geom_point(aes(x = ptime), col = "#56B4E9") +
135
135
geom_point(aes(x = stime), col = "#009E73") +
@@ -156,7 +156,7 @@ Here we suppose that the interval is daily, meaning that only the date of the pr
156
156
obs_cens <- obs |> observe_process()
157
157
```
158
158
159
-
(ref:cens) Interval censoring of the primary and secondary event times obscures the delay times. While daily censoring is most common, `epidist` supports the primary and secondary events having other delay intervals.
159
+
(ref:cens) Interval censoring of the primary and secondary event times obscures the delay times. A common example of this is when events are reported as daily aggregates. While daily censoring is most common, `epidist` supports the primary and secondary events having other censoring intervals.
160
160
161
161
```{r cens, fig.cap="(ref:cens)"}
162
162
ggplot(obs_cens, aes(x = delay, y = delay_daily)) +
@@ -201,29 +201,31 @@ With our censored, truncated, and sampled data, we are now ready to try to recov
201
201
202
202
# Fit the model and compare estimates {#fit}
203
203
204
-
If we had access to the data `obs`, then it would be simple to estimate the delay distribution.
204
+
If we had access to the complete and unaltered `obs`, it would be simple to estimate the delay distribution.
205
205
However, with only access to `obs_cens_trunc_samp`, the delay distribution we observe is biased (Figure \@ref(fig:obs-est)) and we must use a statistical model.
206
206
207
-
(ref:obs-est) The histogram of delays from `obs` matches closely the underlying distribution, whereas those from `obs_cens_trunc_samp`are systematically biased.
207
+
(ref:obs-est) The histogram of delays from the complete, retrospective data `obs_cens` matches the underlying distribution quite closely, whereas the histogram from `obs_cens_trunc_samp` shows more significant systematic bias. In this instance, the bias caused by censoring is smaller than that caused by right truncation. Nonetheless, we always recommend adjusting for censoring when it is present [@charniga2024best, Table 2].
@@ -254,15 +256,15 @@ The `fit` object is a `brmsfit` object containing MCMC samples from each of the
254
256
Users familiar with Stan and `brms` can work with `fit` directly.
255
257
Any tool that supports `brms` fitted model objects will be compatible with `fit`.
256
258
259
+
(ref:pars) All of the parameters that are included in the model. Many of these parameters (e.g. `swindow` and `pwindow`) are the so-called latent variables in the model, and have lengths corresponding to the `sample_size`.
260
+
257
261
```{r pars}
258
262
pars <- fit$fit@par_dims |>
259
263
map(.f = function(x) if (identical(x, integer(0))) return(1) else return(x))
"All of the parameters that are included in the model.
264
-
Many of these parameters are the so called latent variables in the model."
265
-
)
266
+
gt::gt() |>
267
+
gt::tab_caption("(ref:pars)")
266
268
```
267
269
268
270
The `epidist` package also provides functions to make common post-processing tasks easy.
@@ -272,9 +274,10 @@ For example, samples of the fitted lognormal `meanlog` and `sdlog` parameter can
272
274
draws <- extract_lognormal_draws(fit)
273
275
```
274
276
275
-
Figure \@ref(fig:fitted-lognormal) shows...
277
+
Figure \@ref(fig:fitted-lognormal) shows the lognormal delay distribution obtained using the average of the `meanlog` and `sdlog` draws.
278
+
Whereas in Figure \@ref(fig:obs-est) the histogram of censored, truncated, sampled data was substantially different to the underlying delay distribution, using `epidist` we have obtained a much closer match to the truth.
276
279
277
-
(ref:fitted-lognormal) Figure caption.
280
+
(ref:fitted-lognormal) A fitted delay distribution (in pink) as compared with the true delay distribution (in black).
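As a rough illustration of what "using the average of the `meanlog` and `sdlog` draws" might involve, here is a minimal base R sketch. It does not use the real output of `extract_lognormal_draws()`, whose structure is not shown in this diff: the vectors `meanlog_draws` and `sdlog_draws` are hypothetical stand-ins for posterior draws, and the "true" parameters are assumed values.

```r
set.seed(2)

# Hypothetical posterior draws (stand-ins, not real epidist output)
meanlog_draws <- rnorm(4000, mean = 1.8, sd = 0.05)
sdlog_draws <- abs(rnorm(4000, mean = 0.5, sd = 0.03))

# Evaluate the lognormal density at the average of the draws
x <- seq(0.1, 30, by = 0.1)
fitted_density <- dlnorm(
  x, meanlog = mean(meanlog_draws), sdlog = mean(sdlog_draws)
)

# Assumed true delay distribution, for comparison
true_density <- dlnorm(x, meanlog = 1.8, sdlog = 0.5)

# Largest pointwise gap between the fitted and true densities
max(abs(fitted_density - true_density))
```

Plugging in averaged draws gives a single summary curve. It does not show posterior uncertainty, which would require evaluating the density for each draw separately.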