Chapter 3. Multiple Regression Analysis
import statsmodels.api as sm
import statsmodels.formula.api as smf
from wooldridge import *
Example 3.1. Determinants of College GPA
dataWoo()
J.M. Wooldridge (2016) Introductory Econometrics: A Modern Approach,
Cengage Learning, 6th edition.
401k 401ksubs admnrev affairs airfare
alcohol apple approval athlet1 athlet2
attend audit barium beauty benefits
beveridge big9salary bwght bwght2 campus
card catholic cement census2000 ceosal1
ceosal2 charity consump corn countymurders
cps78_85 cps91 crime1 crime2 crime3
crime4 discrim driving earns econmath
elem94_95 engin expendshares ezanders ezunem
fair fertil1 fertil2 fertil3 fish
fringe gpa1 gpa2 gpa3 happiness
hprice1 hprice2 hprice3 hseinv htv
infmrt injury intdef intqrt inven
jtrain jtrain2 jtrain3 kielmc lawsch85
loanapp lowbrth mathpnl meap00_01 meap01
meap93 meapsingle minwage mlb1 mroz
murder nbasal nyse okun openness
pension phillips pntsprd prison prminwge
rdchem rdtelec recid rental return
saving sleep75 slp75_81 smoke traffic1
traffic2 twoyear volat vote1 vote2
voucher wage1 wage2 wagepan wageprc
wine
df = dataWoo('gpa1')
dataWoo('gpa1', description=True)
name of dataset: gpa1
no of variables: 29
no of observations: 141
+----------+--------------------------------+
| variable | label |
+----------+--------------------------------+
| age | in years |
| soph | =1 if sophomore |
| junior | =1 if junior |
| senior | =1 if senior |
| senior5 | =1 if fifth year senior |
| male | =1 if male |
| campus | =1 if live on campus |
| business | =1 if business major |
| engineer | =1 if engineering major |
| colGPA | MSU GPA |
| hsGPA | high school GPA |
| ACT | 'achievement' score |
| job19 | =1 if job <= 19 hours |
| job20 | =1 if job >= 20 hours |
| drive | =1 if drive to campus |
| bike | =1 if bicycle to campus |
| walk | =1 if walk to campus |
| voluntr | =1 if do volunteer work |
| PC | =1 of pers computer at sch |
| greek | =1 if fraternity or sorority |
| car | =1 if own car |
| siblings | =1 if have siblings |
| bgfriend | =1 if boy- or girlfriend |
| clubs | =1 if belong to MSU club |
| skipped | avg lectures missed per week |
| alcohol | avg # days per week drink alc. |
| gradMI | =1 if Michigan high school |
| fathcoll | =1 if father college grad |
| mothcoll | =1 if mother college grad |
+----------+--------------------------------+
Christopher Lemmon, a former MSU undergraduate, collected these data
from a survey he took of MSU students in Fall 1994.
gpa_multiple = smf.ols(formula='colGPA ~ hsGPA + ACT + 1', data=df).fit()
print(gpa_multiple.summary())
OLS Regression Results
==============================================================================
Dep. Variable: colGPA R-squared: 0.176
Model: OLS Adj. R-squared: 0.164
Method: Least Squares F-statistic: 14.78
Date: Mon, 11 Dec 2023 Prob (F-statistic): 1.53e-06
Time: 18:36:22 Log-Likelihood: -46.573
No. Observations: 141 AIC: 99.15
Df Residuals: 138 BIC: 108.0
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1.2863 0.341 3.774 0.000 0.612 1.960
hsGPA 0.4535 0.096 4.733 0.000 0.264 0.643
ACT 0.0094 0.011 0.875 0.383 -0.012 0.031
==============================================================================
Omnibus: 3.056 Durbin-Watson: 1.885
Prob(Omnibus): 0.217 Jarque-Bera (JB): 2.469
Skew: 0.199 Prob(JB): 0.291
Kurtosis: 2.488 Cond. No. 298.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
gpa_simple = smf.ols(formula='colGPA ~ ACT + 1', data=df).fit()
print(gpa_simple.summary())
OLS Regression Results
==============================================================================
Dep. Variable: colGPA R-squared: 0.043
Model: OLS Adj. R-squared: 0.036
Method: Least Squares F-statistic: 6.207
Date: Mon, 11 Dec 2023 Prob (F-statistic): 0.0139
Time: 18:36:22 Log-Likelihood: -57.177
No. Observations: 141 AIC: 118.4
Df Residuals: 139 BIC: 124.3
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 2.4030 0.264 9.095 0.000 1.881 2.925
ACT 0.0271 0.011 2.491 0.014 0.006 0.049
==============================================================================
Omnibus: 3.174 Durbin-Watson: 1.909
Prob(Omnibus): 0.205 Jarque-Bera (JB): 2.774
Skew: 0.248 Prob(JB): 0.250
Kurtosis: 2.525 Cond. No. 209.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
from statsmodels.iolib.summary2 import summary_col
print(summary_col([gpa_multiple, gpa_simple], stars=True, float_format='%0.2f',
                  info_dict={'N': lambda x: "{0:d}".format(int(x.nobs)),
                             'R2': lambda x: "{:.2f}".format(x.rsquared)}))
=================================
colGPA I colGPA II
---------------------------------
ACT 0.01 0.03**
(0.01) (0.01)
Intercept 1.29*** 2.40***
(0.34) (0.26)
R-squared 0.18 0.04
R-squared Adj. 0.16 0.04
hsGPA 0.45***
(0.10)
N 141 141
R2 0.18 0.04
=================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
Example 3.2. Wage equation
df = dataWoo('wage1')
wage_multiple = smf.ols(formula='lwage ~ educ + exper + tenure + 1', data=df).fit()
print(wage_multiple.summary())
OLS Regression Results
==============================================================================
Dep. Variable: lwage R-squared: 0.316
Model: OLS Adj. R-squared: 0.312
Method: Least Squares F-statistic: 80.39
Date: Mon, 11 Dec 2023 Prob (F-statistic): 9.13e-43
Time: 18:36:22 Log-Likelihood: -313.55
No. Observations: 526 AIC: 635.1
Df Residuals: 522 BIC: 652.2
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 0.2844 0.104 2.729 0.007 0.080 0.489
educ 0.0920 0.007 12.555 0.000 0.078 0.106
exper 0.0041 0.002 2.391 0.017 0.001 0.008
tenure 0.0221 0.003 7.133 0.000 0.016 0.028
==============================================================================
Omnibus: 11.534 Durbin-Watson: 1.769
Prob(Omnibus): 0.003 Jarque-Bera (JB): 20.941
Skew: 0.021 Prob(JB): 2.84e-05
Kurtosis: 3.977 Cond. No. 135.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Example 3.3. Participation in 401(k) pension plans
df = dataWoo('401k')
pension_multiple = smf.ols(formula='prate ~ mrate + age + 1', data=df).fit()
print(pension_multiple.summary())
OLS Regression Results
==============================================================================
Dep. Variable: prate R-squared: 0.092
Model: OLS Adj. R-squared: 0.091
Method: Least Squares F-statistic: 77.79
Date: Mon, 11 Dec 2023 Prob (F-statistic): 6.67e-33
Time: 18:36:22 Log-Likelihood: -6422.3
No. Observations: 1534 AIC: 1.285e+04
Df Residuals: 1531 BIC: 1.287e+04
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 80.1190 0.779 102.846 0.000 78.591 81.647
mrate 5.5213 0.526 10.499 0.000 4.490 6.553
age 0.2431 0.045 5.440 0.000 0.155 0.331
==============================================================================
Omnibus: 375.579 Durbin-Watson: 1.910
Prob(Omnibus): 0.000 Jarque-Bera (JB): 805.992
Skew: -1.387 Prob(JB): 9.57e-176
Kurtosis: 5.217 Cond. No. 32.5
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Example 3.4. Determinants of College GPA, R-squared
df = dataWoo('gpa1')
gpa_multiple = smf.ols(formula='colGPA ~ hsGPA + ACT + 1', data=df).fit()
print(gpa_multiple.summary())
OLS Regression Results
==============================================================================
Dep. Variable: colGPA R-squared: 0.176
Model: OLS Adj. R-squared: 0.164
Method: Least Squares F-statistic: 14.78
Date: Mon, 11 Dec 2023 Prob (F-statistic): 1.53e-06
Time: 18:36:23 Log-Likelihood: -46.573
No. Observations: 141 AIC: 99.15
Df Residuals: 138 BIC: 108.0
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1.2863 0.341 3.774 0.000 0.612 1.960
hsGPA 0.4535 0.096 4.733 0.000 0.264 0.643
ACT 0.0094 0.011 0.875 0.383 -0.012 0.031
==============================================================================
Omnibus: 3.056 Durbin-Watson: 1.885
Prob(Omnibus): 0.217 Jarque-Bera (JB): 2.469
Skew: 0.199 Prob(JB): 0.291
Kurtosis: 2.488 Cond. No. 298.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Example 3.5. Arrest records
df = dataWoo('crime1')
crime_multiple = smf.ols(formula='narr86 ~ pcnv + ptime86 + qemp86 + 1', data=df).fit()
print(crime_multiple.summary())
OLS Regression Results
==============================================================================
Dep. Variable: narr86 R-squared: 0.041
Model: OLS Adj. R-squared: 0.040
Method: Least Squares F-statistic: 39.10
Date: Mon, 11 Dec 2023 Prob (F-statistic): 9.91e-25
Time: 18:36:23 Log-Likelihood: -3394.7
No. Observations: 2725 AIC: 6797.
Df Residuals: 2721 BIC: 6821.
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 0.7118 0.033 21.565 0.000 0.647 0.776
pcnv -0.1499 0.041 -3.669 0.000 -0.230 -0.070
ptime86 -0.0344 0.009 -4.007 0.000 -0.051 -0.018
qemp86 -0.1041 0.010 -10.023 0.000 -0.124 -0.084
==============================================================================
Omnibus: 2394.860 Durbin-Watson: 1.836
Prob(Omnibus): 0.000 Jarque-Bera (JB): 106169.153
Skew: 4.002 Prob(JB): 0.00
Kurtosis: 32.513 Cond. No. 8.27
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
crime_multiple_2 = smf.ols(formula='narr86 ~ avgsen + pcnv + ptime86 + qemp86 + 1', data=df).fit()
print(crime_multiple_2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: narr86 R-squared: 0.042
Model: OLS Adj. R-squared: 0.041
Method: Least Squares F-statistic: 29.96
Date: Mon, 11 Dec 2023 Prob (F-statistic): 2.01e-24
Time: 18:36:23 Log-Likelihood: -3393.5
No. Observations: 2725 AIC: 6797.
Df Residuals: 2720 BIC: 6826.
Df Model: 4
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 0.7068 0.033 21.319 0.000 0.642 0.772
avgsen 0.0074 0.005 1.572 0.116 -0.002 0.017
pcnv -0.1508 0.041 -3.692 0.000 -0.231 -0.071
ptime86 -0.0374 0.009 -4.252 0.000 -0.055 -0.020
qemp86 -0.1033 0.010 -9.940 0.000 -0.124 -0.083
==============================================================================
Omnibus: 2396.990 Durbin-Watson: 1.837
Prob(Omnibus): 0.000 Jarque-Bera (JB): 106841.658
Skew: 4.006 Prob(JB): 0.00
Kurtosis: 32.611 Cond. No. 10.2
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
print(summary_col([crime_multiple, crime_multiple_2], stars=True, float_format='%0.2f',
                  info_dict={'N': lambda x: "{0:d}".format(int(x.nobs)),
                             'R2': lambda x: "{:.2f}".format(x.rsquared)}))
=================================
narr86 I narr86 II
---------------------------------
Intercept 0.71*** 0.71***
(0.03) (0.03)
R-squared 0.04 0.04
R-squared Adj. 0.04 0.04
avgsen 0.01
(0.00)
pcnv -0.15*** -0.15***
(0.04) (0.04)
ptime86 -0.03*** -0.04***
(0.01) (0.01)
qemp86 -0.10*** -0.10***
(0.01) (0.01)
N 2725 2725
R2 0.04 0.04
=================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
Example 3.6. Wage equation
df = dataWoo('wage1')
wage_simple = smf.ols(formula='lwage ~ educ + 1', data=df).fit()
print(wage_simple.summary())
OLS Regression Results
==============================================================================
Dep. Variable: lwage R-squared: 0.186
Model: OLS Adj. R-squared: 0.184
Method: Least Squares F-statistic: 119.6
Date: Mon, 11 Dec 2023 Prob (F-statistic): 3.27e-25
Time: 18:36:23 Log-Likelihood: -359.38
No. Observations: 526 AIC: 722.8
Df Residuals: 524 BIC: 731.3
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 0.5838 0.097 5.998 0.000 0.393 0.775
educ 0.0827 0.008 10.935 0.000 0.068 0.098
==============================================================================
Omnibus: 11.804 Durbin-Watson: 1.801
Prob(Omnibus): 0.003 Jarque-Bera (JB): 13.811
Skew: 0.268 Prob(JB): 0.00100
Kurtosis: 3.586 Cond. No. 60.2
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.