## Chapter 2 - Simple Regression - Computer Exercises

```
-------------------------------------------------------------------------------------
name:  SN
log:  ~Wooldridge\intro-econx\iproblem2.smcl
log type:  smcl
opened on:  27 Jan 2019, 01:15:29

. **********************************************
. * Solomon Negash - Solutions to Computer Exercises
. * Wooldridge (2016). Introductory Econometrics: A Modern Approach. 6th ed.
. * STATA Program, version 15.1.

. * Chapter 2  - The Simple Regression Model
. * Computer Exercises (Problems)
. ******************** SETUP *********************

. *Problem 2.1. Papke1995 (401k)
. use 401K.dta, clear
. d, short
Contains data from 401K.dta
obs:         1,534
vars:             8                          9 Jun 1998 08:20
size:        39,884
Sorted by:

. //a. Average prate & mrate
. mean pra mra
Mean estimation                   Number of obs   =      1,534
--------------------------------------------------------------
|       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
prate |   87.36291   .4268091      86.52572     88.2001
mrate |   .7315124   .0199033      .6924718     .770553
--------------------------------------------------------------

. //b&c. Regress prate on mrate; interpret the intercept & coef.
. reg prate mrate
Source |       SS           df       MS      Number of obs   =     1,534
-------------+----------------------------------   F(1, 1532)      =    123.68
Model |  32001.7271         1  32001.7271   Prob > F        =    0.0000
Residual |  396383.812     1,532   258.73617   R-squared       =    0.0747
Total |  428385.539     1,533  279.442622   Root MSE        =    16.085
------------------------------------------------------------------------------
prate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
mrate |   5.861079   .5270107    11.12   0.000      4.82734    6.894818
_cons |   83.07546   .5632844   147.48   0.000     81.97057    84.18035
------------------------------------------------------------------------------

. //d. predict at mrate=3.5
. display _b[_cons] + _b[mrate]*3.5
103.58923

. //e. How much of the variation in prate is explained by mrate? Is it a lot?
. display " R-squared  = " e(r2)
R-squared  = .0747031

. *Problem 2.2. ceosal2.dta
. use ceosal2.dta , clear
. d, short
Contains data from ceosal2.dta
obs:           177
vars:            15                          17 Aug 1999 23:14
size:         6,549
Sorted by:

. //a. Average salary & average tenure
. mean lsalary ceoten comten
Mean estimation                   Number of obs   =        177
--------------------------------------------------------------
|       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
lsalary |   6.582848   .0455542      6.492945     6.67275
ceoten |   7.954802    .537489      6.894049    9.015555
comten |   22.50282   .9241289      20.67902    24.32662
--------------------------------------------------------------

. //b. CEOs in their first year (ceoten=0) and the longest tenure
. count if ceoten==0
5
. sum ceoten
Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
ceoten |        177    7.954802    7.150826          0         37
. display r(max)
37

. //c. ols lsalary on ceoten, ...
. reg lsalary ceoten
Source |       SS           df       MS      Number of obs   =       177
-------------+----------------------------------   F(1, 175)       =      2.33
Model |  .850907024         1  .850907024   Prob > F        =    0.1284
Residual |   63.795306       175  .364544606   R-squared       =    0.0132
Total |  64.6462131       176  .367308029   Root MSE        =    .60378
------------------------------------------------------------------------------
lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ceoten |   .0097236   .0063645     1.53   0.128    -.0028374    .0222846
_cons |   6.505498   .0679911    95.68   0.000      6.37131    6.639686
------------------------------------------------------------------------------

. *Problem 2.3. sleep75.dta (Biddle&Hamermesh1990)
. use sleep75.dta , clear
. d sleep totwrk, short
storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------------
sleep           int     %9.0g                 mins sleep at night, per wk
totwrk          int     %9.0g                 mins worked per week

. //a. ols sleep on totwrk & report in equation form. Interpret the intercept.
. reg sleep totwrk
Source |       SS           df       MS      Number of obs   =       706
-------------+----------------------------------   F(1, 704)       =     81.09
Model |  14381717.2         1  14381717.2   Prob > F        =    0.0000
Residual |   124858119       704  177355.282   R-squared       =    0.1033
Total |   139239836       705  197503.313   Root MSE        =    421.14
------------------------------------------------------------------------------
sleep |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
totwrk |  -.1507458   .0167403    -9.00   0.000    -.1836126    -.117879
_cons |   3586.377   38.91243    92.17   0.000     3509.979    3662.775
------------------------------------------------------------------------------

. //b. If totwrk increases by 2 hours, by how much is sleep estimated to fall?
. display _b[totwrk]*2*60
-18.089499

. *Problem 2.4. Wage2: ols salary on iq
. use wage2.dta, clear

. //a. average Salary, average IQ and sample sd of IQ
. sum wage IQ
Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
wage |        935    957.9455    404.3608        115       3078
IQ |        935    101.2824    15.05264         50        145

. //b. effect of a 15-point increase in IQ on wage (constant dollars)
. reg wage IQ
Source |       SS           df       MS      Number of obs   =       935
-------------+----------------------------------   F(1, 933)       =     98.55
Model |  14589782.6         1  14589782.6   Prob > F        =    0.0000
Residual |   138126386       933  148045.429   R-squared       =    0.0955
Total |   152716168       934  163507.675   Root MSE        =    384.77
------------------------------------------------------------------------------
wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
IQ |   8.303064   .8363951     9.93   0.000     6.661631    9.944498
_cons |   116.9916   85.64153     1.37   0.172    -51.08078    285.0639
------------------------------------------------------------------------------
. display "wage= " %5.3f _b[_cons] "+" %5.3f _b[IQ] "IQ; N=" _N ",Rsq=" %5.4f e(r2)
wage= 116.992+8.303IQ; N=935,Rsq=0.0955
. display _b[IQ]*15
124.54596

. //c. effect of a 15-point increase in IQ on wage (percentage)
. reg lwage IQ
Source |       SS           df       MS      Number of obs   =       935
-------------+----------------------------------   F(1, 933)       =    102.62
Model |  16.4150939         1  16.4150939   Prob > F        =    0.0000
Residual |  149.241189       933  .159958402   R-squared       =    0.0991
Total |  165.656283       934  .177362188   Root MSE        =    .39995
------------------------------------------------------------------------------
lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
IQ |   .0088072   .0008694    10.13   0.000      .007101    .0105134
_cons |   5.886994   .0890206    66.13   0.000     5.712291    6.061698
------------------------------------------------------------------------------
. display "lwage= " %5.3f _b[_cons] "+" %5.3f _b[IQ] "IQ; N=" _N ",Rsq=" %5.4f e(r2)
lwage= 5.887+0.009IQ; N=935,Rsq=0.0991
. display "0" _b[IQ]*15
0.13210734

. *Problem 2.5. rdchem: r&d on sales
. use rdchem.dta , clear

. //a. Model for elasticity?
. *log(rd)=b0+b1log(sales) ; b1 is the elasticity of rd with respect to sales

. //b. Estimate b1?
. reg lrd lsale
Source |       SS           df       MS      Number of obs   =        32
-------------+----------------------------------   F(1, 30)        =    302.72
Model |  84.8395785         1  84.8395785   Prob > F        =    0.0000
Residual |  8.40768588        30  .280256196   R-squared       =    0.9098
Total |  93.2472644        31  3.00797627   Root MSE        =    .52939
------------------------------------------------------------------------------
lrd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
lsales |   1.075731   .0618275    17.40   0.000     .9494619    1.201999
_cons |  -4.104722   .4527678    -9.07   0.000    -5.029398   -3.180047
------------------------------------------------------------------------------

. *Problem 2.6. meap93: math pass rate (math10) & spending per student (expend)
. use meap93.dta, clear
. //a. Diminishing effect
. //b. math10 = b0 + b1log(expend) + u -->
. *d(math10) = b1*dlog(expend) ~ (b1/100)*(% change in expend) ==> a 10% rise in expend changes math10 by b1/10 points
. //c. ols math10 on lexpend,
. reg math10 lexpend
Source |       SS           df       MS      Number of obs   =       408
-------------+----------------------------------   F(1, 406)       =     12.41
Model |  1329.42517         1  1329.42517   Prob > F        =    0.0005
Residual |  43487.7553       406  107.112698   R-squared       =    0.0297
Total |  44817.1805       407  110.115923   Root MSE        =     10.35
------------------------------------------------------------------------------
math10 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
lexpend |   11.16439   3.169011     3.52   0.000     4.934677    17.39411
_cons |   -69.3411   26.53013    -2.61   0.009    -121.4947   -17.18753
------------------------------------------------------------------------------
. display "math10= " %5.3f _b[_cons] "+" %5.3f _b[lexpend] "log(expend); ///
> N=" _N ",Rsq=" %5.4f e(r2)
math10= -69.341+11.164log(expend); N=408,Rsq=0.0297

. //d. How big is the effect? If spending increases by 10%?
. display  _b[lexpend]/10 "%"
1.1164395%

. //e. Why is "math10>100" not much of a worry in this data set?

. *Problem 2.7. charity: gifts and mailings; imported from R (wooldridge package)
. use charity.dta, clear

. //a. & b.
. sum gift mailsyear
Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
gift |      4,268     7.44447    15.06256          0        250
mailsyear |      4,268    2.049555      .66758        .25        3.5
. count if gift==0
2,561
. display 100*r(N)/4268 "%"
60.004686%

. //c. Regress gift on mails per year,
. reg gift mailsyear
Source |       SS           df       MS      Number of obs   =     4,268
-------------+----------------------------------   F(1, 4266)      =     59.65
Model |  13349.7251         1  13349.7251   Prob > F        =    0.0000
Residual |  954750.114     4,266  223.804528   R-squared       =    0.0138
Total |   968099.84     4,267  226.880675   Root MSE        =     14.96
------------------------------------------------------------------------------
gift |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
mailsyear |   2.649546   .3430598     7.72   0.000     1.976971    3.322122
_cons |    2.01408   .7394696     2.72   0.006     .5643347    3.463825
------------------------------------------------------------------------------
. display "gift= " %5.3f _b[_cons] "+" %5.3f _b[mails] "mails; N=" _N ",Rsq=" %5.4f e(r2)

. //d. Does the charity make profit if per unit cost of mailing is one guilder?
. display  _b[mails] - 1
1.6495464

. //e. The smallest predicted gift (i.e., mail=0)
. margins, at(mail=0)
Adjusted predictions                            Number of obs     =      4,268
Model VCE    : OLS
Expression   : Linear prediction, predict()
at           : mailsyear       =           0
------------------------------------------------------------------------------
|            Delta-method
|     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons |    2.01408   .7394696     2.72   0.006     .5643347    3.463825
------------------------------------------------------------------------------

. *Problem 2.8.
. clear

. //a.
. set obs 500
number of observations (_N) was 0, now 500
. g x_ = uniform()
. g x = x_ *10
. sum x
Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
x |        500    4.872077      2.9848   .0033314   9.980742

. //b.
. g u_ = runiform()
. g u = u_ *6
. sum u
Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
u |        500    3.053768    1.768815   .0115764   5.998999

. //c.
. g y = 1 + 2*x + u
. reg y x
Source |       SS           df       MS      Number of obs   =       500
-------------+----------------------------------   F(1, 498)       =   5806.78
Model |  18178.7235         1  18178.7235   Prob > F        =    0.0000
Residual |  1559.04137       498  3.13060515   R-squared       =    0.9210
Total |  19737.7649       499   39.554639   Root MSE        =    1.7694
------------------------------------------------------------------------------
y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x |   2.022163   .0265368    76.20   0.000     1.970025    2.074301
_cons |   3.945788   .1515815    26.03   0.000      3.64797    4.243606
------------------------------------------------------------------------------

. //d.
. qui reg y x
. predict uh, residual
. g xuh=x*uh
. //verify if E(uh)=E(x'uh)=0 ; compare results with E(u)=E(x'u)=0. Discuss.
. sum xuh uh u
Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
xuh |        500   -1.27e-08    10.14709  -28.35288   26.61722
uh |        500   -1.04e-09    1.767578  -3.085353   3.014317
u |        500    3.053768    1.768815   .0115764   5.998999

. //e.
. g xu = x * u
. sum xu xuh uh u
Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
xu |        500    15.07525    13.89241   .0110519   56.60181
xuh |        500   -1.27e-08    10.14709  -28.35288   26.61722
uh |        500   -1.04e-09    1.767578  -3.085353   3.014317
u |        500    3.053768    1.768815   .0115764   5.998999

. //f. Rerun 2 or 3 times and compare results and conclude!

. *Problem 2.9. CountyMurders only 1996
. use countymurders.dta, clear
. keep if year==1996
(35,152 observations deleted)

. //a. how many counties had zero murders in 1996?
. count if murder==0 //counties with zero murder
1,051
. count if execs>0 //counties with at least one execution
31
. sum execs if murder>0
Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
execs |      1,146    .0296684    .1937704          0          3
. display r(max)
3

. //b. ols murder = f (execs); report results the usual way with N & R^2 included
. reg murders execs
Source |       SS           df       MS      Number of obs   =     2,197
-------------+----------------------------------   F(1, 2195)      =    100.77
Model |  152381.693         1  152381.693   Prob > F        =    0.0000
Residual |  3319359.01     2,195  1512.23645   R-squared       =    0.0439
Total |   3471740.7     2,196  1580.93839   Root MSE        =    38.887
------------------------------------------------------------------------------
murders |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
execs |   58.55548   5.833255    10.04   0.000      47.1162    69.99476
_cons |   5.457241    .834838     6.54   0.000     3.820086    7.094396
------------------------------------------------------------------------------
. display "murders= " %5.2f _b[_cons] "+" %5.2f _b[execs] "execs; N= " _N ",Rsq=" %5.4f e(r2)
murders=  5.46+58.56execs; N= 2197,Rsq=0.0439

. //c. Interpret the slope coef.
. //d. The smallest number of murders this model can predict occurs when executions are zero.
. display _b[_cons] + _b[execs]*0
5.4572409
. predict u, residual
. sum u if murder==0 & execs==0
Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
u |      1,050   -5.457241           0  -5.457241  -5.457241

. //e. Why is OLS not suitable? Endogeneity issues: omitted variables, measurement error, simultaneity.

. *Problem 2.10.
. use catholic.dta, clear

. //a. Sample size, mean & SD of math12 & read12.
. sum math12 read12
Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
math12 |      7,430    52.13362    9.459117       29.5      71.37
read12 |      7,430     51.7724    9.407761      29.15      68.09

. //b. Ols math12 on read12.
. reg math12 read12
Source |       SS           df       MS      Number of obs   =     7,430
-------------+----------------------------------   F(1, 7428)      =   7568.58
Model |  335470.113         1  335470.113   Prob > F        =    0.0000
Residual |   329238.93     7,428  44.3240347   R-squared       =    0.5047
Total |  664709.043     7,429  89.4749015   Root MSE        =    6.6576
------------------------------------------------------------------------------
math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
read12 |   .7142915   .0082105    87.00   0.000     .6981966    .7303863
_cons |   15.15304    .432036    35.07   0.000     14.30612    15.99995
------------------------------------------------------------------------------
. display "math12= " %5.2f _b[_cons] "+" %5.2f _b[read12] "read12; N= " _N ",Rsq=" %5.4f e(r2)

. //c. Interpret the intercept.
. //d. Comment on the values of b1 and R^2.
. //e. I would run the reverse regression to refute the comment.
. reg read12 math12
Source |       SS           df       MS      Number of obs   =     7,430
-------------+----------------------------------   F(1, 7428)      =   7568.58
Model |  331837.266         1  331837.266   Prob > F        =    0.0000
Residual |  325673.561     7,428  43.8440443   R-squared       =    0.5047
Total |  657510.828     7,429  88.5059668   Root MSE        =    6.6215
------------------------------------------------------------------------------
read12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
math12 |   .7065563   .0081216    87.00   0.000     .6906358    .7224769
_cons |   14.93706   .4303184    34.71   0.000     14.09352    15.78061
------------------------------------------------------------------------------
. *Spurious correlation or causality?

. log close
name:  SN
log:  ~Wooldridge\intro-econx\iproblem2.smcl
log type:  smcl
closed on:  27 Jan 2019, 01:15:29
-------------------------------------------------------------------------------------
```
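
The Problem 2.8 log leaves the discussion in parts (d) to (f) open, so the following is a minimal Python sketch of the same simulation, not the author's code. It generates x uniform on [0, 10] and u uniform on [0, 6], builds y = 1 + 2x + u, and fits the line with the closed-form OLS formulas. The names `ols_simple` and `simulate` and the seed `12345` are illustrative assumptions, and Python's random number generator differs from Stata's, so the numbers will not match the log.

```python
import random


def ols_simple(x, y):
    """Closed-form simple-regression estimates: returns (intercept, slope)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def simulate(n=500, seed=12345):
    """Replicate the Problem 2.8 design (seed value is an arbitrary assumption)."""
    rng = random.Random(seed)
    x = [10 * rng.random() for _ in range(n)]  # x ~ Uniform[0, 10]
    u = [6 * rng.random() for _ in range(n)]   # u ~ Uniform[0, 6], so E(u) = 3, not 0
    y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]

    b0, b1 = ols_simple(x, y)
    uhat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]  # OLS residuals

    def mean(v):
        return sum(v) / len(v)

    return {
        "b0": b0,
        "b1": b1,
        "mean_u": mean(u),
        "mean_x_u": mean([xi * ui for xi, ui in zip(x, u)]),
        "mean_uhat": mean(uhat),
        "mean_x_uhat": mean([xi * ei for xi, ei in zip(x, uhat)]),
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key:>12} = {value: .6g}")
```

The residual moments `mean_uhat` and `mean_x_uhat` are zero up to rounding in every draw because the OLS first-order conditions impose them, whereas `mean_u` and `mean_x_u` are not zero since the errors are centered at 3. That nonzero error mean is absorbed by the intercept, which is why the log reports `_cons` near 4 rather than 1 while the slope stays close to 2. Calling `simulate` with different seeds is one way to carry out part (f).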