# Ceteris-paribus Profiles
**Learning objectives:**
- Definition of CP profiles
- CP visualization and methodology
- Examine CP profiles using the Titanic imputed dataset
- Pros and cons
## Introduction
- *Ceteris paribus* is a Latin phrase meaning "other things held constant" or "all else unchanged".
- *Ceteris paribus* profiles, or CP profiles, show how one variable affects the outcome, holding all other variables fixed, for one observation.
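One common way to write this down (a sketch; the symbols $f$, $x_*$, $j$, and $z$ are notation assumed here for illustration, not defined elsewhere in these notes): let $f$ be the model, $x_*$ the observation of interest, and $x_*^{j|=z}$ a copy of $x_*$ in which the $j$-th variable is set to $z$ and all other variables are left unchanged. The CP profile for variable $j$ is then the function of $z$

$$
h^{f,j}_{x_*}(z) = f\left(x_*^{j|=z}\right).
$$

Evaluating it at the observed value $z = x_*^{j}$ returns the model's prediction for $x_*$ itself, which is why every CP profile passes through the observation's own prediction.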
## Intuition
- In essence, a CP profile shows how the conditional expectation of the dependent variable (response) depends on the values of one particular explanatory variable.
- Figure 10.1 Panel A presents a 3D visualization, where $x$ is `age`, $y$ is the response (predicted probability), and $z$ is the `class` factor, for the `titanic_lmr` logistic regression model.
- In the same figure, Panel B illustrates CP profiles for individual variables: `age` (continuous) and `class` (categorical).
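The idea can be sketched by hand in base R. The example below is a minimal illustration, not the DALEX implementation: it assumes the built-in `mtcars` data and a small logistic regression as stand-ins for the Titanic models, and the names `obs`, `wt_grid`, `cp_data`, and `cp_profile` are hypothetical.

```r
# Fit a small logistic regression on a built-in dataset (illustrative stand-in).
model <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# Observation of interest: a single row of the data.
obs <- mtcars["Mazda RX4", c("hp", "wt")]

# Grid of values for the variable being profiled.
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)

# Replicate the observation once per grid value, then overwrite only `wt`;
# every other variable (here `hp`) keeps its observed value.
cp_data <- obs[rep(1, length(wt_grid)), ]
cp_data$wt <- wt_grid

# The CP profile is the model response evaluated along the grid.
cp_profile <- data.frame(
  wt = wt_grid,
  prediction = predict(model, newdata = cp_data, type = "response")
)

head(cp_profile)
```

Plotting `prediction` against `wt` gives the one-dimensional curve that Panel B shows for a continuous variable.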

## Example: Titanic imputed dataset
Load packages
```{r 10-load-pckages}
suppressPackageStartupMessages({
  library(DALEX)
  library(ggplot2)  # ggtitle(), scale_y_continuous(), facet_wrap() are used below
  library(rms)
  library(randomForest)
})
```
Use `archivist` to load the imputed Titanic dataset, the logistic regression and random forest models, and the 'Henry' observation.
```{r 10-archivist}
titanic_imputed <- archivist::aread("pbiecek/models/27e5c")
titanic_lmr <- archivist::aread("pbiecek/models/58b24")
titanic_rf <- archivist::aread("pbiecek/models/4e0fc")
henry <- archivist::aread("pbiecek/models/a6538")
```
Let's build two explainers corresponding to the logistic regression and random forest models.
```{r 10-explainers, warning=FALSE}
# Column 9 is excluded from `data`; the response is supplied separately via `y`.
explain_lmr <- DALEX::explain(model = titanic_lmr, data = titanic_imputed[, -9],
  y = titanic_imputed$survived == "yes", label = "Logistic Regression", verbose = FALSE)
# Set the task type explicitly for the logistic regression explainer.
explain_lmr$model_info$type <- "classification"
explain_rf <- DALEX::explain(model = titanic_rf, data = titanic_imputed[, -9],
  y = titanic_imputed$survived == "yes", label = "Random Forest", verbose = FALSE)
```
Create CP profiles for the 'Henry' observation.
```{r 10-cp-profiles, warning=FALSE}
cp_titanic_rf <- predict_profile(explainer = explain_rf, new_observation = henry)
cp_titanic_lmr <- predict_profile(explainer = explain_lmr, new_observation = henry)

ggplot2::theme_set(theme_ema())

cpplot_age_rf <- plot(cp_titanic_rf, variables = "age") +
  ggtitle("Ceteris Paribus for titanic_rf", "") +
  scale_y_continuous("model response", limits = c(0, 1))
cpplot_age_lmr <- plot(cp_titanic_lmr, variables = "age") +
  ggtitle("Ceteris Paribus for titanic_lmr", "") +
  scale_y_continuous("model response", limits = c(0, 1))

cpplot_class_rf <- plot(cp_titanic_rf, variables = "class",
                        variable_type = "categorical", categorical_type = "bars") +
  ggtitle("Ceteris Paribus for titanic_rf", "")
cpplot_class_lmr <- plot(cp_titanic_lmr, variables = "class",
                         variable_type = "categorical", categorical_type = "bars") +
  ggtitle("Ceteris Paribus for titanic_lmr", "")
```
```{r 10-cp-profiles-plot-1}
library(patchwork)
cpplot_age_lmr + cpplot_age_rf
```
Both CP profiles show the predicted survival probability for passenger Henry (1st class, age = 47, male): the logistic regression predicts 0.43 and the random forest predicts 0.246.
```{r 10-cp-profiles-plot-2}
cpplot_class_lmr + cpplot_class_rf
```
Both models agree on the direction of the change in prediction, but not on its magnitude.
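For a categorical variable such as `class`, the profile is evaluated once per factor level instead of along a numeric grid. The sketch below shows that idea in base R. It is an illustration under stated assumptions, not the DALEX internals: `mtcars` and a linear model stand in for the Titanic data and models, and `cars`, `cyl_levels`, `cp_data`, and `cp_profile` are hypothetical names.

```r
# Treat the number of cylinders as a categorical variable (illustrative stand-in).
cars <- transform(mtcars, cyl = factor(cyl))
model <- lm(mpg ~ wt + cyl, data = cars)

# Observation of interest.
obs <- cars["Mazda RX4", c("wt", "cyl")]

# For a categorical variable, the "grid" is simply the set of factor levels.
cyl_levels <- levels(cars$cyl)

# Replicate the observation once per level and overwrite only `cyl`.
cp_data <- obs[rep(1, length(cyl_levels)), ]
cp_data$cyl <- factor(cyl_levels, levels = cyl_levels)

# One prediction per level; these are the bar heights in a bar-style CP plot.
cp_profile <- data.frame(
  cyl = cyl_levels,
  prediction = unname(predict(model, newdata = cp_data))
)

cp_profile
```

The row for the observed level reproduces the model's prediction for the observation itself. The other rows show what the model would predict if only that one variable changed.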
```{r 10-cp-profiles-plot-3}
plot(cp_titanic_rf) +
facet_wrap(~`_vname_`, ncol = 4, scales = "free_x") +
ggtitle("Ceteris Paribus for titanic_rf", "")
```
CP profiles for all continuous variables. What can we infer from the behavior of these variables?
## Pros and cons
Pros
- Easy to understand and communicate visually (1D CP profile)
- It is possible to show profiles for many variables or models in a single plot
Cons
- Correlated explanatory variables may lead to unrealistic and misleading results
- Not practical in case of a model with hundreds or thousands of variables
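The correlation issue can be made concrete with a small base-R sketch. It assumes `mtcars` as a stand-in dataset, and using the Mahalanobis distance as a rough "how unusual is this point" measure is an illustrative choice, not something taken from the chapter. Weight (`wt`) and displacement (`disp`) are strongly correlated, so changing `wt` while holding `disp` fixed produces combinations unlike any real car in the data.

```r
vars <- c("wt", "disp")

# The two variables are strongly correlated.
cor(mtcars$wt, mtcars$disp)

# Observation of interest: a light car with a small engine.
obs <- mtcars["Toyota Corolla", vars]

# CP grid: vary `wt` over its observed range, hold `disp` at the observed value.
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)
cp_data <- data.frame(wt = wt_grid, disp = obs$disp)

# Squared Mahalanobis distance from the centre of the observed (wt, disp) cloud.
center <- colMeans(mtcars[, vars])
covmat <- cov(mtcars[, vars])
d_observed <- mahalanobis(mtcars[, vars], center, covmat)
d_cp <- mahalanobis(cp_data, center, covmat)

# Number of CP grid points that are more extreme than every real observation.
sum(d_cp > max(d_observed))
```

Any grid point counted here asks the model to predict for a heavy car with a very small engine, a region where it has seen no data, so the corresponding part of the CP profile should be read with caution.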
## Additional references
[Ceteris Paribus profiles](https://advanced-ds-in-r.netlify.app/posts/2021-03-24-imlglobal/#ceteris-paribus-profiles)
## Meeting Videos {.unnumbered}
### Cohort 1 {.unnumbered}
`r knitr::include_url("https://www.youtube.com/embed/URL")`
<details>
<summary>Meeting chat log</summary>
```
LOG
```
</details>