-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
240 lines (184 loc) · 6.96 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
---
output: github_document
always_allow_html: true
---
```{r, include = FALSE}
knitr::opts_chunk$set(
warning = FALSE, message = FALSE,
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# CohortCharacteristics <a href="https://darwin-eu.github.io/CohortCharacteristics/"><img src="man/figures/logo.png" align="right" height="130"/></a>
[](https://CRAN.R-project.org/package=CohortCharacteristics) [](https://app.codecov.io/github/darwin-eu/CohortCharacteristics?branch=main) [](https://github.com/darwin-eu/CohortCharacteristics/actions) [](https://lifecycle.r-lib.org/articles/stages.html#experimental)
## Package overview
CohortCharacteristics contains functions for summarising characteristics of cohorts of patients identified in an OMOP CDM dataset. Once a cohort table has been created, CohortCharacteristics provides a number of functions to help provide a summary of the characteristics of the individuals within the cohort.
```{r, echo=FALSE}
citation("CohortCharacteristics")
```
## Package installation
You can install the latest version of CohortCharacteristics from CRAN:
```{r, eval=FALSE}
install.packages("CohortCharacteristics")
```
Or install the development version from github:
```{r, eval=FALSE}
install.packages("pak")
pak::pkg_install("darwin-eu/CohortCharacteristics")
```
```{r}
library(CohortCharacteristics)
```
## Content
The package contain three types of functions:
- **summarise**\* type functions. These functions produce <summarised_result> standard output. See [omopgenerics](https://darwin-eu.github.io/omopgenerics/articles/summarised_result.html) for more information on this standardised output format. These functions are the ones that do the work in terms of extracting the necessary data from the cdm and summarising it.
- **table**\* type functions. These functions work with the output of the summarise ones. They will produce a table visualisation created using the [visOmopResults](https://cran.r-project.org/package=visOmopResults) package.
- **plot**\* type functions. These functions work with the output of the summarise ones. They will produce a plot visualisation created using the [visOmopResults](https://cran.r-project.org/package=visOmopResults) package.
## Examples
### Mock data
Although the package provides some simple mock data for testing (`mockCohortCharacteristics()`), for these examples we will use the GiBleed dataset that can be downloaded using the CDMConnector package that will give us some more real results.
```{r}
library(CDMConnector)
library(duckdb)
library(dplyr, warn.conflicts = FALSE)
requireEunomia()
con <- dbConnect(duckdb(), eunomiaDir())
cdm <- cdmFromCon(con = con, cdmSchema = "main", writeSchema = "main")
```
Let's create a simple cohort:
```{r, warning=FALSE}
library(DrugUtilisation)
cdm <- generateIngredientCohortSet(cdm = cdm, name = "my_cohort", ingredient = c("warfarin", "acetaminophen"))
```
### Cohort counts
We can get counts using the function `summariseCohortCount()`:
```{r}
result <- summariseCohortCount(cdm$my_cohort)
result |>
glimpse()
```
You can easily create a table using the associated table function, `tableCohortCount()`:
```{r}
tableCohortCount(result, type = "flextable")
```
We could create a simple plot with `plotCohortCount()`:
```{r}
result |>
filter(variable_name == "Number subjects") |>
plotCohortCount(x = "cohort_name", colour = "cohort_name")
```
All the other function work using the same dynamic, first `summarise`, then `plot`/`table`.
### Cohort attrition
```{r}
result <- summariseCohortAttrition(cdm$my_cohort)
```
```{r}
tableCohortAttrition(result, type = "flextable")
```
```{r, eval = FALSE}
result |>
filter(group_level == "161_acetaminophen") |>
plotCohortAttrition()
```
```{r, echo = FALSE, results='asis'}
fil <- 'man/figures/attrition.svg'
file <- file.path(getwd(), fil)
result |>
filter(group_level == "161_acetaminophen") |>
plotCohortAttrition() |>
DiagrammeRsvg::export_svg() |>
writeLines(con = file)
paste0('<img src="', fil, '" width="100%" />') |>
cat()
```
### Characteristics
```{r}
result <- summariseCharacteristics(cdm$my_cohort)
```
```{r}
tableCharacteristics(result, type = "flextable")
```
```{r}
result |>
filter(variable_name == "Age") |>
plotCharacteristics(plotType = "boxplot", colour = "cohort_name")
```
### Timing between cohorts
```{r}
result <- summariseCohortTiming(cdm$my_cohort)
```
```{r}
tableCohortTiming(result, type = "flextable")
```
```{r}
plotCohortTiming(
result,
uniqueCombinations = TRUE,
facet = "cdm_name",
colour = c("cohort_name_reference", "cohort_name_comparator"),
timeScale = "years"
)
```
```{r}
plotCohortTiming(
result,
plotType = "densityplot",
uniqueCombinations = FALSE,
facet = "cdm_name",
colour = c("cohort_name_comparator"),
timeScale = "years"
)
```
### Overlap between cohort
```{r}
result <- summariseCohortOverlap(cdm$my_cohort)
```
```{r}
tableCohortOverlap(result, type = "flextable")
```
```{r}
plotCohortOverlap(result, uniqueCombinations = TRUE)
```
### Large scale characteristics
```{r}
result <- cdm$my_cohort |>
summariseLargeScaleCharacteristics(
window = list(c(-90, -1), c(0, 0), c(1, 90)),
eventInWindow = "condition_occurrence"
)
```
```{r}
tableLargeScaleCharacteristics(result, type = "flextable")
```
```{r}
result |>
filter(group_level == "161_acetaminophen") |>
plotLargeScaleCharacteristics()
```
```{r}
result |>
filter(group_level == "161_acetaminophen", variable_level != "0 to 0") |>
plotComparedLargeScaleCharacteristics(
reference = c(variable_level = "-90 to -1"), colour = "variable_name"
)
```
### Disconnect
Disconnect from your database using `CDMConnector::cdmDisconnect()` to close the connection or with `mockDisconnect()` to close connection and delete the created mock data:
```{r}
mockDisconnect(cdm)
```
### Recommendations
Although it is technically possible, we do not recommend to pipe table or plot functions with the summarise ones. The main reason is that summarise functions take some time to run, a large scale characterisation in a big cdm object can take a few hours. If we pipe the output to a table/plot function we loose the summarise result object. In fact, some times we would send code around to be ran in others database and what we want to export is the summarised_result objects and not the table or plot which we would like to build after compiling results from different cdm objects.
**Not recommended**:
```{r, eval = FALSE}
cdm$my_cohort |>
summariseCharacteristics() |>
tableCharacteristics()
```
**Recommended**:
```{r, eval = FALSE}
x <- summariseCharacteristics(cdm$my_cohort)
tableCharacteristics(x)
```