-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpresentation.Rmd
379 lines (309 loc) · 11.1 KB
/
presentation.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
---
output:
ioslides_presentation:
widescreen: true
incremental: true
bibliography: references.bib
link-citations: true
urlcolor: blue
title: Statistical Analysis of the <br> Grit Scale Dataset
subtitle: Submitted to the Department of Statistics <br> of the Athens University of Economics and Business
author: Maximilian Schneider
date: "`r format(Sys.time(), '%d.%m.%Y')`"
---
<style>
.build > * {
-webkit-transition: opacity 0ms ease-in-out;
-webkit-transition-delay: 0ms;
-moz-transition: opacity 0ms ease-in-out 0ms;
-o-transition: opacity 0ms ease-in-out 0ms;
transition: opacity 0ms ease-in-out 0ms;
}
slides > slide.title-slide {
background-image: url('figures/background_title_page_1920x1080.jpg');
background-size: 100% 100%;
}
slides > slide.dark {
background-image: url('figures/background_title_page_1920x1080.jpg') !important;
background-size: 100% 100% !important;
}
slides > slide.title-slide hgroup h1,
slides > slide.title-slide hgroup h2,
slides > slide.title-slide p,
slides > slide,
slides > slide hgroup h2 {
color: black;
}
slides > slide.dark hgroup h2 {
color: black;
font-size: 72px;
letter-spacing: 10;
}
div.footnotes {
position: absolute;
bottom: 0;
margin-bottom: 10px;
width: 80%;
font-size: 0.6em;
}
table {
caption-side: bottom
}
</style>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.1/jquery.min.js"></script>
<script>
$(document).ready(function() {
$('slide:not(.backdrop):not(.title-slide)').append('<div class=\"footnotes\">');
$('footnote').each(function(index) {
var text = $(this).html();
var fnNum = (index+1).toString();
$(this).html(fnNum.sup());
var footnote = fnNum + '. ' + text + '<br/>';
var oldContent = $(this).parents('slide').children('div.footnotes').html();
var newContent = oldContent + footnote;
$(this).parents('slide').children('div.footnotes').html(newContent);
});
});
</script>
```{r setup, include=FALSE}
# rmarkdown settings
knitr::opts_chunk$set(fig.align = "center", out.width = '80%', echo = FALSE, cache = FALSE,
message = FALSE, warning = FALSE)
# packages
library(data.table)
library(purrr)
library(magrittr)
library(ggplot2); theme_set(theme_bw())
library(mgcv)
# code
source("R/functions.R")
source("R/prepare-data.R")
source("R/application.R")
source("R/plots.R")
```
<!-- TODO
* Figure captions
* Note reference categories
-->
# The Big Five <br> Personality Dimensions
<div class="notes">
> * Scientific discipline: psychology
> * Describing the character of a person
> * Five continuous scales
</div>
## The Big Five personality dimensions<footnote>https://en.wikipedia.org/wiki/Big_Five_personality_traits</footnote>
<div class="notes">
> * Extraversion (outgoing -- solitary / energetic -- reserved)
> * Neuroticism (sensitive -- resilient / nervous -- confident)
> * Agreeableness (friendly -- critical / compassionate -- rational)
> * Conscientiousness (efficient -- extravagant / organized -- careless)
> * Openness to experience (inventive -- consistent / curious -- cautious)
</div>
* Extraversion
* Neuroticism
* Agreeableness
* Conscientiousness
* **Openness to experience**
# The Grid Scale Dataset | <footnote>Duckworth, Angela Lee, and Patrick D Quinn. 2009. “Development and Validation of the Short Grit Scale (GRIT–s).” Journal of Personality Assessment 91 (2): 166–74.</footnote>
<div class="notes">
> * Online questionnaire data
> * Study: "Development and Validation of the Short Grit Scale"
> * Angela Duckworth and Patrick Quinn
> * International Personality Item Pool (IPIP), Demographic data, Technical information, test on vocabulary
</div>
## Openness score
<div class="notes">
> * Likert-type scale, Encoding: 0 (Disagree strongly) to 4 (Agree strongly), NA
> * Questions 2, 4 and 6 other way around -- after transformation all items positively correlated
> * Score: sum of single items divided by four times the number of items answered
> * Possible scores: multiples of $\frac{1}{40}$ (all ten Openness items completed), $\frac{1}{36}$ (nine Openness items completed), $\frac{1}{32}$ ... -- negative skew
</div>
```{r plot-openness}
p2
```
## Eloquence score
<div class="notes">
> * Basis: List of 16 words, 3 unreal
> * "Mark all the words whose definitions you are sure to know."
> * Sum of correct answers
> * Assumption: No NAs
</div>
```{r plot-eloquence}
p1
```
## Other variables
<div class="notes">
> * Scales: treated as nominal
</div>
```{r}
ggtable(dt_bigf, sort(vars_cat)[1:4])
```
## Other variables
<div class="notes">
> * Error in race variable: misc - Indigenous Australian, Native American or White
</div>
```{r}
ggtable(dt_bigf, sort(vars_cat)[5:8])
```
## Other variables
<div class="notes">
> * Black line is the mean
> * Age starting at 13
> * Family size: number of kids of mother -- starting at 1
> * 8 automatically collected technical information: sanity check
</div>
```{r}
cowplot::plot_grid(ggtable(dt_bigf, sort(vars_cat)[9:10]),
cowplot::plot_grid(tmpp3, tmpp4, ncol = 1),
ncol = 2)
```
## Pairwise associations with Openness
<div class="notes">
> * Consider age and eloquence as non-linear
> * Family size as linear
</div>
```{r openness-continuous}
cowplot::plot_grid(p4, p3, p5, nrow = 3)
```
# Linear Regression Model for Openness
<div class="notes">
> * Full enumeration $\rightarrow$ avoid step-wise procedures (i.e., arbitrary choices) and problems with collinearity: only variables which contain unique information will be used
> * AIC: no theory $\rightarrow$ focus on prediction
</div>
## Model selection
<div class="notes">
> * No clear best model, see next slide
</div>
```{r aics}
res[1:5, .(AIC, Probability = scales::percent(probs, 0.01))] %>%
knitr::kable(booktabs = TRUE,
caption = "Top 5 models according to AIC and their probabilities to minimize the (estimated) information loss<footnote>Burnham, K. P.; Anderson, D. R. (2002), Model Selection and Multimodel Inference: A practical information-theoretic approach (2nd ed.), Springer-Verlag.</footnote>")
```
## A posteriori inclusion probabilities
<div class="notes">
> * Only urban and married have a probability lower than 50%
> * Model using all covariates with a post incl prob > 50% is the model with the lowest AIC
> * Second best dropped voted, third best dropped family size $\rightarrow$ good models are similar
</div>
```{r post-inclusion-probs}
p6
```
## Model diagnostics and limitations
<div class="notes">
> * Low explanatory power of the model
> * Trend in the residuals due to discrete dependent variable
> * Residuals with negative skew
</div>
```{r}
n <- summary(m.lam.fin)$n
cowplot::plot_grid(
data.table(fitted(m.lam.fin), m.lam.fin$model$openness) %>%
ggplot(aes(V1, V2)) +
geom_abline(intercept = 0, slope = 1, color = "grey", size = 1.3) +
geom_jitter(size = 0.5, width = 0, height = 0.005, alpha = 0.4) +
geom_smooth(se = FALSE, color = "#0075be", size = 1) +
lims(x = c(0, 1.01), y = c(0, 1.01)) +
labs(x = "Fitted values", y = "Openness score"),
data.table(qnorm(seq(1/(2 * n), (2 * n - 1)/(2 * n), 1/n)),
sort(residuals(m.lam.fin, type = "scaled.pearson"))) %>%
ggplot(aes(V1, V2)) +
geom_abline(intercept = 0, slope = 1, color = "grey", size = 1.3) +
lapply(qnorm(c(0.025, 0.05, 0.25, 0.75, 0.95, 0.975)),
function(x) geom_vline(xintercept = x)) +
geom_line() +
geom_point(alpha = 0.1, size = 0.5) +
labs(x = "Theoretical Quantiles", y = "Standardized Residuals"),
data.table(fitted(m.lam.fin),
residuals(m.lam.fin, type = "scaled.pearson") %>% abs() %>% sqrt()) %>%
ggplot(aes(V1, V2)) +
geom_point(alpha = 0.4, size = 0.5) +
geom_smooth(se = FALSE, color = "#0075be", size = 1) +
labs(x = "Fitted values", y = bquote(sqrt("Standardized Residuals"))),
data.table(residuals(m.lam.fin, type = "scaled.pearson")) %>%
ggplot(aes(V1)) +
geom_histogram(aes(y = ..density..), bins = 115, color = "#606161", fill = "#606161") +
geom_function(fun = dnorm, color = "#0075be", size = 1) +
xlim(-4, 4) +
labs(x = "Standardized Residuals", y = "Density"),
nrow = 2)
```
## Model diagnostics and limitations
<div class="notes">
> * Missing data: beyond scope of this course
> * Outlier: n is large
> * Skewness: bootstrap possible
> * Test for normality doesn't make sense: even small departures are significant
</div>
<div style="float: left; width: 50%;">
> * $n_\text{full}$: `r dt_bigf.fix[, .N]` (missing: `r dt_bigf[, .N] - dt_bigf.fix[, .N]` -- `r scales::percent((dt_bigf[, .N] - dt_bigf.fix[, .N]) / dt_bigf[, .N], 0.01)`)
> * $n_\text{final}$: `r n` (missing: `r dt_bigf[, .N] - n` -- `r scales::percent((dt_bigf[, .N] - n) / dt_bigf[, .N], 0.01)`)
> * Effective degrees of freedom: `r sum(m.lam.fin$edf) %>% round(2)`
> * Variance explained: `r scales::percent(summary(m.lam.fin)$dev.expl, 0.01)`
> * Skewed residuals
</div>
<div style="float: right; width: 50%;">
> * Online questionnaire data
> * "Analysis in the void": No context from psychology; why do we want to model Openness like this? \
$\rightarrow$ Difficult to evaluate importance of violation of assumptions
> * Missing covariates possible
</div>
## Statistical significance
<div class="notes">
> * With enough observations everything is statistically significant
> * Not significant: engnat, family size, orientation, voted
</div>
<div style="float: left; width: 50%;">
```{r}
dt_f <- anova(m.lam.fin) %$%
cbind(c(pTerms.table[, "F"], s.table[, "F"]), c(pTerms.pv, s.table[, "p-value"]) * 11) %>%
round(3) %>%
as.data.table(keep.rownames = TRUE)
dt_f[, V2 := ifelse(V2 > 0.5, ">0.5", V2)]
dt_f[, V2 := ifelse(V2 == "0", "<0.001", V2)]
setnames(dt_f, c("Covariate", "F-statistic", "p-value"))
setkey(dt_f, "Covariate")
knitr::kable(dt_f[1:6,], booktabs = TRUE)
```
</div>
<div style="float: right; width: 50%;">
```{r}
knitr::kable(dt_f[7:11], booktabs = TRUE,
caption = "Drop one F-tests with Bonferroni correction (approximations for smooth terms<footnote>Wood, S.N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.). Chapman and Hall/CRC.</footnote>)")
```
</div>
## Coefficients
<div class="notes">
> * Do they make sense? / Are they as expected?
> * Ref Cat: Education: Less than high school, Engnat: no, Gender: Male, Hand: right, Orientation: heterosexual, Voted: no
> * Small effect sizes
> * Big CI: low number of observations
</div>
```{r coefficients1}
p7.1
```
## Coefficients
<div class="notes">
> * Race: misc - Indigenous Australian, Native American or White, Religion: agnostic
</div>
```{r coefficients2}
p7.2
```
## Smooth effects
<div class="notes">
> * Centered around zero - interpret changes
> * Age - concerning: in there search for a thing called "character" something consistent over time is desired (cohort effect?)
> * Eloquence makes sense - is part of the definition of Openness
</div>
```{r smooths}
cowplot::plot_grid(p10.1, p10.2, ncol = 1)
```
# Appendix
## Sensitivity - REML
```{r}
#p8
cowplot::plot_grid(ggcoef(m.lam.fin.sel), ggcoef(m.lam.fin))
```
## Sensitivity - splines
```{r}
cowplot::plot_grid(ggcoef(m.lm.fin, -c(27, 31)), ggcoef(m.lam.fin))
```