-
Notifications
You must be signed in to change notification settings - Fork 52
/
Copy pathJ-PAL_Power_built_in_commands.do
406 lines (266 loc) · 22.4 KB
/
J-PAL_Power_built_in_commands.do
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
** Power calculations using built-in Stata commands
*Created by: Sabhya Gupta with input from Jack Cavanagh, Maya Duru, Mike Gibson, Sarah Kopper
*All errors are the author's alone
*To report errors/make suggestions or ask questions contact: Sabhya Gupta - sagupta@povertyactionlab.org
*Last edited: 06/24/2021
********************************************************************************
/* This do-file:
*Computes sample size and effect size for a given power and treatment to control size ratio
*Includes variations with controls, clusters and take-up rate
*Uses the Balsakhi dataset to illustrate how to calculate power
About the data: The Balsakhi program was a remedial education program that was conducted in Indian schools to increase literacy and numeracy skills.
You can learn more about the Balsakhi dataset from the documentation and data here at https://doi.org/10.7910/DVN/UV7ERB
Variables:
- Outcome of interest is in the "normalised total score." This is represented by:
- "pre_totnorm" at baseline
- "post_totnorm" at the endline
- Treatment: bal
- 0=control
- 1=treatment
- Clustering variable (by school): divid
Note: key inputs for calculating power like the mean and the standard deviation at baseline, ICC, etc. are calculated
using the specified dataset, but they can also be specified manually.
About the file:
- This file contains sample code for the following:
0. Housekeeping and load data
1. No covariates
1a. Sample size for a given effect size
1b. MDE for a given sample size
2. With covariates (not applicable for binary data)
2a. Sample size for a specified effect - with covariates
2b. MDE for a given sample size - with covariates
3. Sample size with Partial Take-up
4. Overview of how MDE and sample size change as we add covariates and take-up changes
5. Clustered designs
5a. Compute number of clusters for a given effect size and size of cluster
5b. Compute cluster size given the number of clusters and effect size
5c. Compute effect size for a given cluster size and number of clusters
- With the exception of module 4, each module is self-contained and can be run on its own after running module 0
- To use this file for your own power calculations, change the dataset/variables/values marked as "SPECIFY"
** Binary outcome variable:
This code assumes a continuous outcome. If the outcome variable is a binary variable, use the "power twoproportions" command.
- The baseline mean is the proportion of 1s in the outcome variable
- The effect size is the change in the treatment of the proportion of the outcome variable from the control group
- Standard deviation is a function of the proportion of the outcome variable in the control dataset
- Documentation can be found by typing "help power twoproportions" or "help power twoproportions cluster"
The section on covariates is not applicable to binary outcome variables due to the different model specification for binary variables.
See McConnell and Vera-Hernandez (2015) (https://www.ifs.org.uk/uploads/publications/wps/WP201517_update_Sep15.pdf)
for a discussion of how the power calculations change with covariates when the outcome variable is binary.
*/
****************************************************************************************
***************************** 0. Housekeeping and load data ****************************
****************************************************************************************
cd "" //SPECIFY - working directory
capture log close "power_built_in_commands"
log using "power_built_in_commands", replace
use "baroda_0102_1obs.dta", clear //SPECIFY - the dataset
global outcome "pre_totnorm" //SPECIFY the outcome and treatment variable
global treatment "bal"
****************************************************************************************
*********************************** 1. No covariates ***********************************
****************************************************************************************
* The following code assumes the unit of randomization is the same as the unit of observation
****************************************************************************************
************************* 1a. Sample size for a given effect size **********************
****************************************************************************************
local power = 0.8 //SPECIFY - desired power
local nratio = 1 //SPECIFY - the ratio of experimental group to control group (1=equal allocation)
local alpha = 0.05 //SPECIFY - the significance level
sum $outcome if !missing($outcome) //sum the outcome at baseline and record the mean and the standard deviation
local sd = `r(sd)'
local baseline = `r(mean)'
local effect = `sd'*0.3 //SPECIFY - the expected effect. Here we specify 0.3 standard deviations, but this should be updated based on what is reasonable for the study
local treat = `baseline' + `effect'
power twomeans `baseline' `treat', power(`power') sd(`sd') nratio(`nratio') table
local effect = round(`effect',0.0001)
local samplesize = r(N)
di as error "The minimum sample size needed is `samplesize' to detect an effect size of `effect' with a probability of `power' if the effect is true and the ratio of units in treatment and control is `nratio'"
* How does the sample size change when standard deviation and the effect size changes?
power twomeans `baseline' `treat', power(`power') sd(0.5(0.1)2) nratio(`nratio') table //SPECIFY sd range
power twomeans `baseline', power(`power') sd(`sd') nratio(`nratio') diff(0.1(0.15)2) table //SPECIFY diff range to indicate the different possible effect sizes
****************************************************************************************
**************************** 1b. MDE for a given sample size ****************************
****************************************************************************************
local power = 0.8
local nratio = 1
local alpha = 0.05
local N = _N //SPECIFY - the total sample size. This is taken from the Balsakhi dataset but can be changed based on the study
quietly sum $outcome if !missing($outcome) //sum the baseline level and record the mean and the standard deviation
local sd = `r(sd)'
local baseline = `r(mean)'
power twomeans `baseline', n(`N') power(`power') sd(`sd') nratio(`nratio') table
local mde= round(`r(delta)',0.0001)
di as error "The MDE is `mde' given a sample size of `N', ratio of units in treatment and control of `nratio', and power `power'"
* How does MDE change when sample size and the ratio of allocation between the two groups changes
power twomeans `baseline', power(`power') sd(`sd') n(10000(2000)20000) nratio(`nratio') table //SPECIFY N range to indicate the different possible sample sizes
power twomeans `baseline', n(`N') power(`power') sd(`sd') nratio(1(-0.2)0.1) table //SPECIFY range of ratios of treatment to sample size
//NRatio = 1 means an equal allocation between treatment and control groups. A decrease in the ratio means that a larger proportion of the sample size is allocated to the control group
****************************************************************************************
********************************* 2. Adding covariates *********************************
****************************************************************************************
/* To see how potential controls affect power, we would ideally have access to a sample data set
(e.g. historical or pilot data). With these data, we would want to:
1. Regress Y_i (the outcome) on X_i (the controls)
2. Use the residual standard deviation of the outcome variable from this regression to evaluate
how much variance is explained by the set of covariates we plan to include
- In practice, this residual SD becomes the new SD we include in our parametric power calculations
With access to historical data, for example, this would involve regressing last year's test scores
on test scores from the year before. Using balsakhi data, this would be as follows.
Note that this section is not applicable for power calculations with a binary outcome variable.
See McConnell and Vera-Hernandez 2015 (https://www.ifs.org.uk/uploads/publications/wps/WP201517_update_Sep15.pdf)
for a discussion of covariates for binary outcomes and accompanying sample code */
****************************************************************************************
************** 2a. Sample size for a given effect size - with covariates **************
****************************************************************************************
local power = 0.8 //SPECIFY - desired power
local nratio = 1 //SPECIFY - the ratio of experimental group to control group
local alpha =0.05 //SPECIFY - the significance level
local covariates "female std sessiond" //SPECIFY the covariates - use baseline values of covariates
local number_covariates: word count `covariates'
regress $outcome `covariates' //SPECIFY outcome and control variables
local res_sd =round(sqrt(`e(rss)'/`e(df_r)'),0.0001) //this is the new standard deviation for the power calculation or the residual sd not explained by the control(s).
//This will be used for power calculation.
quietly sum $outcome if !missing($outcome) //sum the outcome at baseline and record the mean and the standard deviation
local baseline = `r(mean)'
local sd = `r(sd)'
local effect_cov = `sd'*0.3 //SPECIFY - the expected effect. Here we specify 0.3 standard deviations, but this should be updated based on what is reasonable for the study
local treat = `baseline' + `effect_cov'
power twomeans `baseline' `treat', power(`power') sd(`res_sd') nratio(`nratio') alpha(`alpha') table
local effect_cov = round(`effect_cov',0.0001)
local samplesize_cov = `r(N)'
di as error "The minimum sample size needed is `samplesize_cov' to detect an effect of `effect_cov' with a probability of `power' if the effect is true, the ratio of units in treatment and control is `nratio', and the residual standard deviation is `res_sd' after accounting for covariates: `covariates'"
****************************************************************************************
****************** 2b. MDE for a given sample size - with covariates ******************
****************************************************************************************
local power = 0.8 //SPECIFY - desired power
local nratio = 1 //SPECIFY - the ratio of experimental group to control group
local alpha =0.05 //SPECIFY - the significance level
local N_cov= _N //SPECIFY - the total sample size.
//This is taken from the Balsakhi dataset but can be changed based on the study
local covariates "female std sessiond" //SPECIFY the covariates - use baseline values of covariates
regress $outcome `covariates' //SPECIFY outcome and control variables
local res_sd = round(sqrt(`e(rss)'/`e(df_r)'),0.0001) //this is the new standard deviation for the power calculation or the residual sd not explained by the control(s).
//This will be used for power calculation.
quietly sum $outcome if !missing($outcome) //sum the outcome at baseline and record the mean and the standard deviation
local baseline = `r(mean)'
power twomeans `baseline', n(`N_cov') power(`power') sd(`res_sd') nratio(`nratio') alpha(`alpha') table
local mde_cov= round(`r(delta)',0.0001)
di as error "The MDE is `mde_cov' given a sample size of `N_cov', ratio of treatment and control group of `nratio', power `power', and the residual standard deviation of `res_sd' after accounting for covariates: `covariates'"
****************************************************************************************
********************* 3. Sample size with partial take-up ****************************
****************************************************************************************
/* When there is inperfect compliance in the treatment or the control group, the expected effect is
reduced by a factor of the effective take-up, where effective take-up = take-up in treatment - take-up in control */
local power = 0.8 //SPECIFY - desired power
local nratio = 1 //SPECIFY - the ratio of experimental group to control group
local alpha = 0.05 //SPECIFY - the significance level
local takeup_treat = 0.9 //SPECIFY - take-up in the treatment
local takeup_control = 0.1 //SPECIFY - take-up in the control
quietly sum $outcome if !missing($outcome) //sum the outcome at baseline and record the mean and the standard deviation with perfect take-up
local sd_tu = `r(sd)'
local baseline = `r(mean)'
local effect= `sd_tu'*0.3 //SPECIFY - the expected effect with perfect take-up. Here we specify 0.3 standard deviations, but this should be updated based on what is reasonable for the study
local tu = `takeup_treat' - `takeup_control' //effective take-up
local effect_tu = `effect'*`tu' //effect size after adjusting for take-up. This will be the effect size you expect to measure with a true effect size of `effect' and a take-up rate of `tu’. effect_tu < effect for imperfect take-up rates.
local treat_tu = `baseline' + `effect_tu' //treatment mean after adjusting for take-up
power twomeans `baseline' `treat_tu', power(`power') sd(`sd_tu') nratio(`nratio') table
local samplesize_tu = `r(N)'
local effect_tu = round(`effect_tu',0.01)
di as error "A minimum sample size of `samplesize_tu' is needed to detect an effect of `effect_tu' (the true effect of `effect’ adjusted for the effective take-up of `tu') with a probability of `power' if the effect is true and the ratio of units in treatment and control is `nratio'"
****************************************************************************************
* 4. Overview of how MDE and sample size change as we add covariates and take-up changes
****************************************************************************************
*Note: This module calls on locals in modules 1-3, so you'll have to run them too
//how sample_size changes when we change the design
matrix input sample_size = (1,0,`effect',`samplesize', `sd'\ 1,`number_covariates',`effect_cov',`samplesize_cov', `res_sd' \ `tu',0,`effect_tu',`samplesize_tu', `sd_tu')
matrix colnames sample_size = take_up_rate number_covariates effect_given_take_up sample_size standard_dev
matrix list sample_size
//how MDE changes when we add more covariates
matrix input mde = (0,`mde',`N',`sd'\ `number_covariates',`mde_cov',`N_cov', `res_sd')
matrix colnames mde = number_covariates MDE N standard_dev
matrix list mde
****************************************************************************************
********************************* 5. Clustered designs *********************************
****************************************************************************************
/* The code presented so far has been for when the unit of randomization is the same
as the unit of observation. The following code is for clustered designs, when there are
multiple units of observation contained in a single unit of randomization (e.g., randomization
is at the school level but outcomes measured at the student level) */
global cluster_var "divid" //SPECIFY - the cluster variable
****************************************************************************************
****** 5a.Compute number of clusters for a given effect size and size of clusters ******
****************************************************************************************
local power = 0.8 //SPECIFY - desired power
local nratio = 1 //SPECIFY - the ratio of experimental group to control group
local alpha = 0.05 //SPECIFY - the significance level
quietly sum $outcome if !missing($outcome) //sum the outcome at baseline and record the mean and the standard deviation
local sd = `r(sd)'
local baseline = `r(mean)'
local cluster_size_control = 50 //SPECIFY - number of people in each cluster.
//This should be specified by the researcher
local mratio=1 //SPECIFY - the ratio of the cluster size in the treatment and the control
local cluster_size_treatment = `cluster_size_control'*`mratio'
local kratio = 1 //SPECIFY - The ratio of the number of treatment clusters to the number of control clusters
local effect_cluster = `sd'*0.3 //SPECIFY - the expected effect. Here we specify 0.3 standard deviations, but this should be updated based on what is reasonable for the study
//Here the expected change in the treatment group is half of the standard deviation
local treat= `baseline' + `effect_cluster' //define treatment mean
loneway $outcome $cluster_var //The loneway command calculates the one-way ANOVA by a group variable.
//It gives the within-group variation and the between group variation of a variable.
//It also produces the intra-cluster correlation coefficient (ICC)
local rho = `r(rho)'
power twomeans `baseline' `treat', cluster m1(`cluster_size_control') mratio(`mratio') kratio(`kratio') power(`power') sd(`sd') rho(`rho') alpha(`alpha') table
local effect_cluster = round(`effect_cluster',0.0001)
local n_clus_t = `r(K2)'
local n_clus_c = `r(K1)'
di as error "A minimum of `n_clus_c' control clusters and `n_clus_t' treatment clusters is needed to detect an effect of `effect_cluster' with a probability of `power' if the effect is true, given the size of each control cluster as `cluster_size_control' units and ratio of the treatment to control cluster size of `mratio'"
****************************************************************************************
********* 5b. Compute cluster size given the number of clusters and effect size ********
****************************************************************************************
local power = 0.8 //SPECIFY - desired power
local nratio = 1 //SPECIFY - the ratio of experimental group to control group
local alpha = 0.05 //SPECIFY - the significance level
quietly sum $outcome if !missing($outcome) //sum the outcome at baseline and record the mean and the standard deviation
local sd = `r(sd)'
local baseline = `r(mean)'
bysort $cluster_var: gen control_cluster = _n==1
count if control_cluster & $treatment==0 //count the number of control clusters
local num_clusters_control = `r(N)' //SPECIFY number of clusters in the control group -
//Taken from Balsakhi but can be specified by researcher
local kratio = 1 //SPECIFY - The ratio of the number of treatment clusters to the number of control clusters
local num_clusters_treatment = `num_clusters_control'*`kratio'
local effect_cluster = `sd'*0.3 //SPECIFY - the expected effect. Here we specify 0.3 standard deviations, but this should be updated based on what is reasonable for the study
//Here the expected change in the treatment group is half of the standard deviation
local treat = `baseline' + `effect_cluster' //define treatment mean
loneway $outcome $cluster_var //The loneway command calculates the one-way ANOVA by a group variable.
//It gives the within-group variation and the between group variation of a variable.
//It also produces the intra-cluster correlation coefficient (ICC)
local rho = `r(rho)'
power twomeans `baseline' `treat', cluster k1(`num_clusters_control') kratio(`kratio') power(`power') sd(`sd') rho(`rho')
local clus_size_t = `r(M2)'
local clus_size_c = `r(M1)'
di as error "The minimum size of each cluster should be `clus_size_c' in the control and `clus_size_t' in the treatment to etect an effect of `effect' with a probability of `power' if the effect is true, given `num_clusters_control' clusters in the control and the ratio of the number of treatment and control clusters as `kratio'"
drop control_cluster
****************************************************************************************
******* 5c. Compute effect size for a given cluster size and number of clusters ********
****************************************************************************************
local power = 0.8 //SPECIFY - desired power
local nratio = 1 //SPECIFY - the ratio of experimental group to control group
local alpha =0.05 //SPECIFY - the significance level
quietly sum $outcome if !missing($outcome) //sum the outcome at baseline and record the mean and the standard deviation
local sd = `r(sd)'
local baseline = `r(mean)'
bysort $cluster_var: gen control_cluster = _n==1
count if control_cluster & $treatment==0 //count the number of control clusters
local num_clusters_control=`r(N)' //SPECIFY number of clusters in the control group - Taken from dataset but can be specified by researcher
local kratio = 1 //SPECIFY - The ratio of the number of treatment clusters to the number of control clusters
local cluster_size_control = 50 //SPECIFY - number of people in each cluster. This should be specified by the researcher
local mratio=1 //SPECIFY - the ratio of the cluster size in the treatment and the control
loneway $outcome $cluster_var //The loneway command calculates the one-way ANOVA by a group variable.
//It gives the within-group variation and the between group variation of a variable.
//It also produces the intra-cluster correlation coefficient (ICC)
local rho = `r(rho)'
power twomeans `baseline', cluster k1(`num_clusters_control') kratio(`kratio') mratio(`mratio') m1(`cluster_size_control') power(`power') sd(`sd') rho(`rho') alpha(`alpha') table
local mde_cluster = round(`r(delta)',0.0001)
di as error "The MDE is `mde_cluster' given `num_clusters_control' clusters in the control, ratio of the number of treatment and control clusters as `kratio', `cluster_size_control' units in the control, the ratio of units in each treatment and control cluster of `mratio', and power `power'."
drop control_cluster
cap log close