|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Regression with panel data (an aside)\n", |
| 8 | + "\n", |
| 9 | + "In many studies in strategy and OT, we use text analysis as part of econometric models with panel data.\n", |
| 10 | + "Since the curriculum does not cover panel models elsewhere, we will take a small aside to discuss some of them.\n", |
| 11 | + "\n", |
| 12 | + "**Note:** I'm using Stata here, so none of this content is interactive.\n", |
| 13 | + "\n", |
| 14 | + "This is partially adapted from the Stata `xtreg` documentation, because we are covering the material very quickly.\n", |
| 15 | + "You can find more detail in the [`xtreg` manual entry](https://www.stata.com/manuals13/xtxtreg.pdf)." |
| 16 | + ] |
| 17 | + }, |
| 18 | + { |
| 19 | + "cell_type": "markdown", |
| 20 | + "metadata": {}, |
| 21 | + "source": [ |
| 22 | + "# Read data" |
| 23 | + ] |
| 24 | + }, |
| 25 | + { |
| 26 | + "cell_type": "raw", |
| 27 | + "metadata": {}, |
| 28 | + "source": [ |
| 29 | + ". do panel.do\n", |
| 30 | + "\n", |
| 31 | + ". // Week 7b: Panel regression (an aside)\n", |
| 32 | + ". // Adapted from the Stata docs, so we have\n", |
| 33 | + ". // a dataset that's publicly available.\n", |
| 34 | + ". \n", |
| 35 | + ". use http://www.stata-press.com/data/r13/nlswork\n", |
| 36 | + "(National Longitudinal Survey. Young Women 14-26 years of age in 1968)" |
| 37 | + ] |
| 38 | + }, |
| 39 | + { |
| 40 | + "cell_type": "markdown", |
| 41 | + "metadata": {}, |
| 42 | + "source": [ |
| 43 | + "In Stata, the `use` command reads data, including from URLs." |
| 44 | + ] |
| 45 | + }, |
| 46 | + { |
| 47 | + "cell_type": "markdown", |
| 48 | + "metadata": {}, |
| 49 | + "source": [ |
| 50 | + "# Setting the panel variables\n", |
| 51 | + "\n", |
| 52 | + "To tell the model commands about the panel structure, we use the `xtset` command, naming the panel (unit) variable first and the time variable second.\n", |
| 53 | + "Note that this does not add year indicators to your models; to include them, add `i.year` to a model's variable list and Stata will create and use the indicators for you.\n", |
| 54 | + "\n", |
| 55 | + "`xtset idcode year`" |
| 56 | + ] |
| 57 | + }, |
| 58 | + { |
| 59 | + "cell_type": "raw", |
| 60 | + "metadata": {}, |
| 61 | + "source": [ |
| 62 | + " panel variable: idcode (unbalanced)\n", |
| 63 | + " time variable: year, 68 to 88, but with gaps\n", |
| 64 | + " delta: 1 unit" |
| 65 | + ] |
| 66 | + }, |
| 67 | + { |
| 68 | + "cell_type": "markdown", |
| 69 | + "metadata": {}, |
| 70 | + "source": [ |
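| 71 | + "The output of `xtset` confirms the panel and time variables: the panel is unbalanced (not every `idcode` appears in every year), and `year` runs from 68 to 88 with gaps." |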
| 72 | + ] |
| 73 | + }, |
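| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "As a minimal sketch of the `i.year` point (not part of the original do-file), the lines below declare the panel and then add year indicators to a fixed-effects model. The choice of regressors is only an illustration.\n", |
| | + "\n", |
| | + "```stata\n", |
| | + "* Load the same public dataset used above\n", |
| | + "use http://www.stata-press.com/data/r13/nlswork, clear\n", |
| | + "\n", |
| | + "* Declare the panel: unit variable first, time variable second\n", |
| | + "xtset idcode year\n", |
| | + "\n", |
| | + "* xtset adds nothing to the model itself; i.year asks Stata to\n", |
| | + "* create and include the year indicators (illustrative regressors)\n", |
| | + "xtreg ln_wage tenure not_smsa south i.year, fe\n", |
| | + "```" |
| | + ] |
| | + }, |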
| 74 | + { |
| 75 | + "cell_type": "markdown", |
| 76 | + "metadata": {}, |
| 77 | + "source": [ |
| 78 | + "# Using local macros for collecting variable names\n", |
| 79 | + "\n", |
| 80 | + "A good practice with Stata is using a local macro to collect variable names.\n", |
| 81 | + "That way, if we're running multiple models, we can keep them in sync.\n", |
| 82 | + "It's especially helpful when we decide to add a control or other variable, and we want the change to apply to all models." |
| 83 | + ] |
| 84 | + }, |
| 85 | + { |
| 86 | + "cell_type": "markdown", |
| 87 | + "metadata": {}, |
| 88 | + "source": [ |
| 89 | + "```stata\n", |
| 90 | + "local controls ///\n", |
| 91 | + " grade ///\n", |
| 92 | + " age ///\n", |
| 93 | + " ttl_exp ///\n", |
| 94 | + " tenure\n", |
| 95 | + "\n", |
| 96 | + "\n", |
| 97 | + "local ivs ///\n", |
| 98 | + " not_smsa ///\n", |
| 99 | + " south\n", |
| 100 | + "```" |
| 101 | + ] |
| 102 | + }, |
| 103 | + { |
| 104 | + "cell_type": "markdown", |
| 105 | + "metadata": {}, |
| 106 | + "source": [ |
| 107 | + "Note that we're using Stata's line continuation marker, `///`.\n", |
| 108 | + "This tells Stata to ignore the rest of the line, including the line break, and to process the next line as part of the same command.\n", |
| 109 | + "\n", |
| 110 | + "This has two practical benefits.\n", |
| 111 | + "First, we can avoid having a command that is one very long line that is hard to read and edit.\n", |
| 112 | + "Second, we can add a `///` in front of one of these variables; Stata then treats that variable as a comment and skips it, allowing us to easily \"turn off\" a variable in our analyses.\n", |
| 113 | + "\n", |
| 114 | + "**Note:** Line continuations with `///` work in do-files, but not when typed into the Stata command window." |
| 115 | + ] |
| 116 | + }, |
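| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "A short sketch of the \"turn off\" trick, assuming we want to drop `age` temporarily. The macro name `n_controls` is hypothetical and is only there to check the result.\n", |
| | + "\n", |
| | + "```stata\n", |
| | + "* Run this from a do-file, not the command window\n", |
| | + "local controls ///\n", |
| | + " grade ///\n", |
| | + " /// age ///\n", |
| | + " ttl_exp ///\n", |
| | + " tenure\n", |
| | + "\n", |
| | + "* Count the names left in the macro; age is commented out\n", |
| | + "local n_controls : word count `controls'\n", |
| | + "display `n_controls'\n", |
| | + "```\n", |
| | + "\n", |
| | + "Removing the leading `///` turns the variable back on in every model that uses the macro." |
| | + ] |
| | + }, |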
| 117 | + { |
| 118 | + "cell_type": "markdown", |
| 119 | + "metadata": {}, |
| 120 | + "source": [ |
| 121 | + "# Regressions compared" |
| 122 | + ] |
| 123 | + }, |
| 124 | + { |
| 125 | + "cell_type": "raw", |
| 126 | + "metadata": {}, |
| 127 | + "source": [ |
| 128 | + ". \n", |
| 129 | + ". reg ln_wage `controls' `ivs'\n", |
| 130 | + "\n", |
| 131 | + " Source | SS df MS Number of obs = 28,091\n", |
| 132 | + "-------------+---------------------------------- F(6, 28084) = 2626.73\n", |
| 133 | + " Model | 2305.54089 6 384.256816 Prob > F = 0.0000\n", |
| 134 | + " Residual | 4108.32299 28,084 .14628696 R-squared = 0.3595\n", |
| 135 | + "-------------+---------------------------------- Adj R-squared = 0.3593\n", |
| 136 | + " Total | 6413.86388 28,090 .228332641 Root MSE = .38247\n", |
| 137 | + "\n", |
| 138 | + "------------------------------------------------------------------------------\n", |
| 139 | + " ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]\n", |
| 140 | + "-------------+----------------------------------------------------------------\n", |
| 141 | + " grade | .0670419 .0010237 65.49 0.000 .0650355 .0690483\n", |
| 142 | + " age | -.0038303 .0005265 -7.28 0.000 -.0048622 -.0027984\n", |
| 143 | + " ttl_exp | .0287283 .0009252 31.05 0.000 .0269148 .0305417\n", |
| 144 | + " tenure | .0195421 .0008321 23.48 0.000 .017911 .0211731\n", |
| 145 | + " not_smsa | -.1637396 .0051791 -31.62 0.000 -.1738909 -.1535883\n", |
| 146 | + " south | -.1135945 .0047533 -23.90 0.000 -.1229112 -.1042777\n", |
| 147 | + " _cons | .8004553 .0173735 46.07 0.000 .7664024 .8345081\n", |
| 148 | + "------------------------------------------------------------------------------" |
| 149 | + ] |
| 150 | + }, |
| 151 | + { |
| 152 | + "cell_type": "markdown", |
| 153 | + "metadata": {}, |
| 154 | + "source": [ |
| 155 | + "The model above is a pooled OLS model, which treats every observation as independent.\n", |
| 156 | + "As we'll see below, several of these parameter estimates (for example `tenure`, `not_smsa`, and `south`) are considerably larger in magnitude than they are when we account for the non-independence of repeated observations on the same person.\n", |
| 157 | + "\n", |
| 158 | + "Note the syntax for using the local macros we created earlier: we wrap the name in a backtick `` ` `` on the left (the key to the left of the number 1 on a US keyboard) and an apostrophe `'` on the right (the key to the right of the semicolon key on a US keyboard)." |
| 159 | + ] |
| 160 | + }, |
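| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "To see what the macro syntax expands to, a small sketch like this can help. It redefines the two macros on single lines and prints the command text rather than running it.\n", |
| | + "\n", |
| | + "```stata\n", |
| | + "* Run these lines together: a local macro exists only within\n", |
| | + "* the do-file (or selected lines) in which it is defined\n", |
| | + "local controls grade age ttl_exp tenure\n", |
| | + "local ivs not_smsa south\n", |
| | + "\n", |
| | + "* Stata substitutes the macro contents before running a command,\n", |
| | + "* so this prints the command text that reg would receive\n", |
| | + "display \"reg ln_wage `controls' `ivs'\"\n", |
| | + "```" |
| | + ] |
| | + }, |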
| 161 | + { |
| 162 | + "cell_type": "raw", |
| 163 | + "metadata": {}, |
| 164 | + "source": [ |
| 165 | + ". xtreg ln_wage `controls' `ivs', fe\n", |
| 166 | + "note: grade omitted because of collinearity\n", |
| 167 | + "\n", |
| 168 | + "Fixed-effects (within) regression Number of obs = 28,091\n", |
| 169 | + "Group variable: idcode Number of groups = 4,697\n", |
| 170 | + "\n", |
| 171 | + "R-sq: Obs per group:\n", |
| 172 | + " within = 0.1491 min = 1\n", |
| 173 | + " between = 0.3526 avg = 6.0\n", |
| 174 | + " overall = 0.2517 max = 15\n", |
| 175 | + "\n", |
| 176 | + " F(5,23389) = 819.94\n", |
| 177 | + "corr(u_i, Xb) = 0.2348 Prob > F = 0.0000\n", |
| 178 | + "\n", |
| 179 | + "------------------------------------------------------------------------------\n", |
| 180 | + " ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]\n", |
| 181 | + "-------------+----------------------------------------------------------------\n", |
| 182 | + " grade | 0 (omitted)\n", |
| 183 | + " age | -.0026787 .000863 -3.10 0.002 -.0043703 -.0009871\n", |
| 184 | + " ttl_exp | .0287709 .0014474 19.88 0.000 .0259339 .0316079\n", |
| 185 | + " tenure | .0114355 .0009229 12.39 0.000 .0096265 .0132445\n", |
| 186 | + " not_smsa | -.0921689 .0096641 -9.54 0.000 -.1111112 -.0732266\n", |
| 187 | + " south | -.0633396 .0110819 -5.72 0.000 -.0850608 -.0416184\n", |
| 188 | + " _cons | 1.591678 .0186849 85.19 0.000 1.555054 1.628302\n", |
| 189 | + "-------------+----------------------------------------------------------------\n", |
| 190 | + " sigma_u | .36167618\n", |
| 191 | + " sigma_e | .29477563\n", |
| 192 | + " rho | .60086475 (fraction of variance due to u_i)\n", |
| 193 | + "------------------------------------------------------------------------------\n", |
| 194 | + "F test that all u_i=0: F(4696, 23389) = 6.63 Prob > F = 0.0000\n", |
| 195 | + "\n", |
| 196 | + ". estimates store fe" |
| 197 | + ] |
| 198 | + }, |
| 199 | + { |
| 200 | + "cell_type": "markdown", |
| 201 | + "metadata": {}, |
| 202 | + "source": [ |
| 203 | + "This is a fixed effects model.\n", |
| 204 | + "Note that `grade` does not vary within units, so the model drops it.\n", |
| 205 | + "Also, note that the header reports the within, between, and overall R-squared values, along with some panel statistics such as the number of groups and the observations per group.\n", |
| 206 | + "\n", |
| 207 | + "The output also includes an F test of the null that all unit effects are zero, which is rejected in this case.\n", |
| 208 | + "Note that, when using robust standard errors (as we often do), that test is suppressed.\n", |
| 209 | + "\n", |
| 210 | + "The command at the bottom, `estimates store fe`, stores the model estimates under the name `fe`.\n", |
| 211 | + "We could have named it anything, but `fe` is descriptive." |
| 212 | + ] |
| 213 | + }, |
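| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "The remark about robust standard errors can be tried with a sketch like the following. Clustering on the panel variable with `vce(cluster idcode)` is one common choice, assumed here for illustration; the stored name `fe_cluster` is hypothetical.\n", |
| | + "\n", |
| | + "```stata\n", |
| | + "use http://www.stata-press.com/data/r13/nlswork, clear\n", |
| | + "xtset idcode year\n", |
| | + "local controls grade age ttl_exp tenure\n", |
| | + "local ivs not_smsa south\n", |
| | + "\n", |
| | + "* Same FE model, with standard errors clustered on the unit\n", |
| | + "xtreg ln_wage `controls' `ivs', fe vce(cluster idcode)\n", |
| | + "\n", |
| | + "* Store under a different name so the original fe results are kept\n", |
| | + "estimates store fe_cluster\n", |
| | + "```" |
| | + ] |
| | + }, |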
| 214 | + { |
| 215 | + "cell_type": "raw", |
| 216 | + "metadata": {}, |
| 217 | + "source": [ |
| 218 | + ". xtreg ln_wage `controls' `ivs', re\n", |
| 219 | + "\n", |
| 220 | + "Random-effects GLS regression Number of obs = 28,091\n", |
| 221 | + "Group variable: idcode Number of groups = 4,697\n", |
| 222 | + "\n", |
| 223 | + "R-sq: Obs per group:\n", |
| 224 | + " within = 0.1483 min = 1\n", |
| 225 | + " between = 0.4701 avg = 6.0\n", |
| 226 | + " overall = 0.3569 max = 15\n", |
| 227 | + "\n", |
| 228 | + " Wald chi2(6) = 8304.62\n", |
| 229 | + "corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000\n", |
| 230 | + "\n", |
| 231 | + "------------------------------------------------------------------------------\n", |
| 232 | + " ln_wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]\n", |
| 233 | + "-------------+----------------------------------------------------------------\n", |
| 234 | + " grade | .0691836 .0017689 39.11 0.000 .0657166 .0726506\n", |
| 235 | + " age | -.0038386 .0006544 -5.87 0.000 -.0051212 -.0025559\n", |
| 236 | + " ttl_exp | .0301313 .0011215 26.87 0.000 .0279331 .0323294\n", |
| 237 | + " tenure | .0134656 .0008442 15.95 0.000 .011811 .0151202\n", |
| 238 | + " not_smsa | -.128591 .0072246 -17.80 0.000 -.142751 -.114431\n", |
| 239 | + " south | -.0932646 .007231 -12.90 0.000 -.107437 -.0790921\n", |
| 240 | + " _cons | .7544109 .0273445 27.59 0.000 .7008168 .8080051\n", |
| 241 | + "-------------+----------------------------------------------------------------\n", |
| 242 | + " sigma_u | .26027808\n", |
| 243 | + " sigma_e | .29477563\n", |
| 244 | + " rho | .43808743 (fraction of variance due to u_i)\n", |
| 245 | + "------------------------------------------------------------------------------\n", |
| 246 | + "\n", |
| 247 | + ". estimates store re" |
| 248 | + ] |
| 249 | + }, |
| 250 | + { |
| 251 | + "cell_type": "markdown", |
| 252 | + "metadata": {}, |
| 253 | + "source": [ |
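| 254 | + "This is a random effects model.\n", |
| 255 | + "Note how the estimates differ when we assume the unit effects are uncorrelated with the regressors (the output reminds us of that assumption with `corr(u_i, X) = 0 (assumed)`), and that `grade` is estimated again because RE also uses between-unit variation." |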
| 256 | + ] |
| 257 | + }, |
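| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "The `rho` row in the FE and RE tables is the share of variance attributed to the unit effects, `sigma_u^2 / (sigma_u^2 + sigma_e^2)`. As a quick check, this sketch recomputes it from the printed values; the scalar names `rho_fe` and `rho_re` are hypothetical.\n", |
| | + "\n", |
| | + "```stata\n", |
| | + "* Values copied from the sigma_u and sigma_e rows of the two tables\n", |
| | + "scalar rho_fe = .36167618^2 / (.36167618^2 + .29477563^2)\n", |
| | + "scalar rho_re = .26027808^2 / (.26027808^2 + .29477563^2)\n", |
| | + "\n", |
| | + "* Compare with the printed rho rows (.60086475 and .43808743)\n", |
| | + "display rho_fe\n", |
| | + "display rho_re\n", |
| | + "```" |
| | + ] |
| | + }, |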
| 258 | + { |
| 259 | + "cell_type": "markdown", |
| 260 | + "metadata": {}, |
| 261 | + "source": [ |
| 262 | + "# Testing whether the RE model is consistent\n", |
| 263 | + "\n", |
| 264 | + "A Hausman test checks whether the FE and RE estimates differ systematically.\n", |
| 265 | + "If they do not, we can use the more efficient RE model.\n", |
| 266 | + "\n", |
| 267 | + "**Note:** Using this test assumes that a fixed-effects model would be appropriate.\n", |
| 268 | + "If you want a time-invariant variable in the regression, it will be dropped by FE.\n", |
| 269 | + "If you want a nearly time-invariant variable, almost all of its variance will be wiped out, but the model will still give you a parameter estimate.\n", |
| 270 | + "Reviewers often ask for this test, and you may need to argue carefully if FE isn't appropriate for your study." |
| 271 | + ] |
| 272 | + }, |
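| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "Put together, the whole sequence can be sketched as a self-contained do-file. It reuses the commands shown in this notebook; reading the results back from `r()` is an addition for illustration.\n", |
| | + "\n", |
| | + "```stata\n", |
| | + "use http://www.stata-press.com/data/r13/nlswork, clear\n", |
| | + "xtset idcode year\n", |
| | + "local controls grade age ttl_exp tenure\n", |
| | + "local ivs not_smsa south\n", |
| | + "\n", |
| | + "* Fit and store both models; quietly suppresses the tables\n", |
| | + "quietly xtreg ln_wage `controls' `ivs', fe\n", |
| | + "estimates store fe\n", |
| | + "quietly xtreg ln_wage `controls' `ivs', re\n", |
| | + "estimates store re\n", |
| | + "\n", |
| | + "* Consistent estimator (FE) first, efficient estimator (RE) second\n", |
| | + "hausman fe re\n", |
| | + "\n", |
| | + "* The test statistic and p-value are left behind in r()\n", |
| | + "display r(chi2)\n", |
| | + "display r(p)\n", |
| | + "```" |
| | + ] |
| | + }, |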
| 273 | + { |
| 274 | + "cell_type": "raw", |
| 275 | + "metadata": {}, |
| 276 | + "source": [ |
| 277 | + ". \n", |
| 278 | + ". hausman fe re\n", |
| 279 | + "\n", |
| 280 | + " ---- Coefficients ----\n", |
| 281 | + " | (b) (B) (b-B) sqrt(diag(V_b-V_B))\n", |
| 282 | + " | fe re Difference S.E.\n", |
| 283 | + "-------------+----------------------------------------------------------------\n", |
| 284 | + " age | -.0026787 -.0038386 .0011599 .0005626\n", |
| 285 | + " ttl_exp | .0287709 .0301313 -.0013603 .000915\n", |
| 286 | + " tenure | .0114355 .0134656 -.0020301 .0003729\n", |
| 287 | + " not_smsa | -.0921689 -.128591 .0364221 .0064187\n", |
| 288 | + " south | -.0633396 -.0932646 .029925 .0083977\n", |
| 289 | + "------------------------------------------------------------------------------\n", |
| 290 | + " b = consistent under Ho and Ha; obtained from xtreg\n", |
| 291 | + " B = inconsistent under Ha, efficient under Ho; obtained from xtreg\n", |
| 292 | + "\n", |
| 293 | + " Test: Ho: difference in coefficients not systematic\n", |
| 294 | + "\n", |
| 295 | + " chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)\n", |
| 296 | + " = 121.50\n", |
| 297 | + " Prob>chi2 = 0.0000" |
| 298 | + ] |
| 299 | + } |
| 300 | + ], |
| 301 | + "metadata": { |
| 302 | + "kernelspec": { |
| 303 | + "display_name": "Python 3", |
| 304 | + "language": "python", |
| 305 | + "name": "python3" |
| 306 | + }, |
| 307 | + "language_info": { |
| 308 | + "codemirror_mode": { |
| 309 | + "name": "ipython", |
| 310 | + "version": 3 |
| 311 | + }, |
| 312 | + "file_extension": ".py", |
| 313 | + "mimetype": "text/x-python", |
| 314 | + "name": "python", |
| 315 | + "nbconvert_exporter": "python", |
| 316 | + "pygments_lexer": "ipython3", |
| 317 | + "version": "3.7.5" |
| 318 | + } |
| 319 | + }, |
| 320 | + "nbformat": 4, |
| 321 | + "nbformat_minor": 4 |
| 322 | +} |