Lead developer and maintainer: Simo Goshev
Group: BC Research Services
This new Stata command wraps the user-contributed command hotdeck to
offer functionality for hot deck imputation of scales. imputeHD
replaces the dataset in memory with a complete dataset ready for use with
Stata's mi suite of commands.
To load imputeHD, include the following line in your do file:
qui do "https://raw.githubusercontent.com/goshevs/imputeHD/master/ado/imputeHD.ado"
syntax scale_stubs [if] [in], Ivar(varlist) Timevar(varname) ///
[ BYvars(varlist) NImputations(integer 5) ///
MCItems(string asis) SCOREtype(string asis) ///
HDoptions(string asis) MERGOptions(string asis) ///
SAVEmidata(string asis) KEEPHDimp ]
imputeHD takes the following arguments:
| argument | description |
|----------|-------------|
| scale_stubs | stubs of scales to be imputed (must be unique) |
| Ivar | unique cluster/panel identifier (e.g. person, firm, or country id) |
| Timevar | time/wave/period identifier |
Optional arguments:
| argument | description |
|----------|-------------|
| BYvars | variables that define the imputation strata (e.g. study arm, level of education, etc.) |
| NImputations | number of imputations; default: 5 |
| MCItems | stubs of scales whose items should be mean centered |
| SCOREtype | type of score to be computed out of the scale items and then imputed (takes sum or mean) |
| HDoptions | options to be passed to command hotdeck |
| MERGOptions | merge options to be passed on to merge upon merging the imputed data with the original data; imputed dataset is master, original dataset is using |
| SAVEmidata | path/file/name of file to save the merged imputed data only |
| KEEPHDimp | keep imputation files produced by hotdeck |
Format of input data
Input data for imputeHD should be in long format. In addition, all extraneous items
of the scales in scale_stubs should be removed from the dataset.
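As a rough illustration of the preparation described above, the sketch below reshapes a small wide dataset to long format and drops a leftover variable that shares the scale stub. Everything in it is hypothetical: the toy data, the item names er1 and er2, the wide-format suffixes _t1 and _t2, and the precomputed er_total variable. It also assumes that a leftover variable sharing the stub could be mistaken for a scale item, which is why it is dropped.

```stata
* Toy wide dataset (hypothetical): two items of scale "er" at two timepoints,
* plus a precomputed total that is not a scale item
clear
input resp_id er1_t1 er1_t2 er2_t1 er2_t2 er_total
1 3 4 2 . 9
2 5 . 1 2 8
end

* Remove the extraneous variable that shares the scale stub
drop er_total

* Reshape to long: one row per respondent and timepoint
reshape long er1_t er2_t, i(resp_id) j(timepoint)

* Restore plain item names after the reshape
rename (er1_t er2_t) (er1 er2)

list, sepby(resp_id)
```

After these steps the data have one row per resp_id and timepoint, with only the items er1 and er2 carrying the er stub.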
Requesting scale scores
If requesting the use of scale scores, the names of the variables for these scores follow a fixed naming convention.
For example, if creating a mean score for a scale with scale_stub er, the mean score
variable will be named er_meanScore.
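For intuition about what a mean or sum score of scale items is, the sketch below computes both by hand on toy data using Stata's egen functions. This is an illustration only and not necessarily how imputeHD computes its scores, in particular with respect to missing items. The data, the item names er1 to er3, and the variable names er_mean_byhand and er_sum_byhand are hypothetical.

```stata
* Toy long-format data (hypothetical): three items of scale "er"
clear
input resp_id timepoint er1 er2 er3
1 1 3 4 5
1 2 2 . 4
end

* Row mean over the non-missing items
egen er_mean_byhand = rowmean(er1 er2 er3)

* Row sum over the non-missing items; missing only if all items are missing
egen er_sum_byhand = rowtotal(er1 er2 er3), missing

list
```

Note that rowmean() and rowtotal() silently skip missing items, so a by-hand score for a partially observed record is based on fewer items than a fully observed one.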
Subsetting should be implemented with care
Subsetting using if and in should be done at the level of respondent. Otherwise,
observations in the if or in set will be imputed by imputeHD.
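One way to confirm that a subsetting variable is defined at the level of respondent is to assert that it is constant within each respondent before passing it to if. The sketch below does this on toy data; the variable names resp_id, timepoint, and subset are illustrative, and the check itself is a suggestion rather than part of imputeHD.

```stata
* Toy long-format data (hypothetical): subset flag per respondent and timepoint
clear
input resp_id timepoint subset
1 1 1
1 2 1
2 1 0
2 2 0
end

* Sort by subset within respondent and compare the smallest and largest values;
* the assertion fails if the flag varies (or is partly missing) within a respondent
bysort resp_id (subset): assert subset[1] == subset[_N]
```

If the assertion fails, Stata stops with an error, which signals that some respondents would be only partially included in the subset.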
If you are working with sensitive data, please ensure that you point Stata to a secure directory that it can use as a temporary directory. Please see this reference for instructions on how to do this.
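Before running an imputation on sensitive data, it can help to check which temporary directory Stata is currently using. The short sketch below only inspects the current setting through the c(tmpdir) system value and a tempfile path; how to change the directory is platform specific and is left to the reference mentioned above. The macro name probe is arbitrary.

```stata
* Show the directory Stata currently uses for temporary files
display "`c(tmpdir)'"

* Temporary files are created under that directory; display a sample path
tempfile probe
display "`probe'"
```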
** Stubs of scale items to impute
local myScales "er fnc wb ptsd ss"
** Characteristics to use as stratifiers in the imputation
local myScrnChr "age_cat_1 education female_n"
*** Generate a subsetting variable for entire record
sort resp_id timepoint
bys resp_id: gen subset = round(runiform()) if _n == 1
bys resp_id: replace subset = subset[1]
imputeHD `myScales' if subset, i(resp_id) t(timepoint) mci(`myScales') score(mean) ///
by(study_arm_1 `myScrnChr') ni(10) hd(seed(12345)) ///