5. Analyzing RCT reemployment experiment#
5.1. Analyzing RCT with Precision by Adjusting for Baseline Covariates#
5.1.1. Jonathan Roth’s DGP#
Here we set up a DGP with heterogeneous effects. In this example, which is due to Jonathan Roth, we have

\[
E[Y(0) \mid Z] = -Z, \qquad E[Y(1) \mid Z] = Z, \qquad Z \sim N(0,1).
\]

The CATE is

\[
E[Y(1) - Y(0) \mid Z] = 2Z,
\]

and the ATE is

\[
E[2Z] = 0.
\]

We would like to estimate the ATE as precisely as possible.
An economic motivation for this example could be given as follows: let \(D\) be the treatment of going to college and \(Z\) academic skills. Suppose that academic skills cause lower earnings \(Y(0)\) in jobs that don't require a college degree, and higher earnings \(Y(1)\) in jobs that do require a college degree. This type of scenario is reflected in the DGP set up above.
# Import relevant packages
import numpy as np
import random
import math
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
# Set Seed
# to make the results replicable (generating random numbers)
np.random.seed(12345676) # set MC seed
n = 1000 # sample size
Z = np.random.normal(0, 1, n).reshape((n, 1))        # generate Z
Y0 = -Z + np.random.normal(0, 1, n).reshape((n, 1))  # conditional average baseline response is -Z
Y1 = Z + np.random.normal(0, 1, n).reshape((n, 1))   # conditional average response under treatment is +Z, so the CATE is 2Z
D = (np.random.uniform(0, 1, n) < .2).reshape((n, 1))  # treatment indicator; only 20% get treated
np.mean(D)
0.181
Y = Y1*D + Y0*(1-D) # observed Y
D = D - np.mean(D) # demean D
Z = Z - np.mean(Z) # demean Z
5.2. Analyze the RCT data with Precision Adjustment#
Consider three approaches to estimating the ATE:

the classical 2-sample approach, with no adjustment (CL),

classical linear regression adjustment (CRA),

interactive regression adjustment (IRA).

We carry out inference using the robust sandwich formulas (Eicker-Huber-White); a sketch of this computation follows the model fits below.

Observe that CRA delivers estimates that are less efficient than CL (as pointed out by Freedman), whereas IRA delivers a more efficient approach (as pointed out by Lin). In order for CRA to be more efficient than CL, we would need the CRA to be a correct model of the conditional expectation function of \(Y\) given \(D\) and \(Z\), which is not the case here.
Z_times_D = Z*D
X = np.hstack((D, Z, Z_times_D))
data = pd.DataFrame(X, columns=["D", "Z", "Z_times_D"])
# note: Y is not included in the data frame; the formula interface below
# picks it up from the surrounding (global) namespace when the models are fit
data
|   | D | Z | Z_times_D |
|---|---|---|---|
| 0 | -0.181 | 1.408649 | -0.254966 |
| 1 | -0.181 | 0.466085 | -0.084361 |
| 2 | -0.181 | 0.365742 | -0.066199 |
| 3 | -0.181 | -1.038993 | 0.188058 |
| 4 | -0.181 | 0.222988 | -0.040361 |
| ... | ... | ... | ... |
| 995 | -0.181 | 0.161225 | -0.029182 |
| 996 | -0.181 | -0.472047 | 0.085440 |
| 997 | -0.181 | 1.010122 | -0.182832 |
| 998 | -0.181 | -0.177596 | 0.032145 |
| 999 | -0.181 | 0.392989 | -0.071131 |

1000 rows × 3 columns
# Import packages for OLS regression
import statsmodels.api as sm
import statsmodels.formula.api as smf
CL_model = "Y ~ D"              # classical 2-sample approach, no adjustment
CRA_model = "Y ~ D + Z"         # classical regression adjustment
IRA_model = "Y ~ D + Z + Z*D"   # interactive regression adjustment
CL = smf.ols(CL_model, data=data).fit()
CRA = smf.ols(CRA_model, data=data).fit()
IRA = smf.ols(IRA_model, data=data).fit()
# Check the (non-robust) t-values of the regressors
print(CL.tvalues)
print(CRA.tvalues)
print(IRA.tvalues)
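One way to carry out the robust (sandwich, Eicker-Huber-White) inference described above is statsmodels' heteroskedasticity-robust covariance option. A minimal sketch, reusing the model formulas and data defined above:

# Refit the three models with heteroskedasticity-robust (HC1, i.e. sandwich) covariance
CL_robust = smf.ols(CL_model, data=data).fit(cov_type="HC1")
CRA_robust = smf.ols(CRA_model, data=data).fit(cov_type="HC1")
IRA_robust = smf.ols(IRA_model, data=data).fit(cov_type="HC1")
# Robust t-values, to compare with the non-robust ones printed above
print(CL_robust.tvalues)
print(CRA_robust.tvalues)
print(IRA_robust.tvalues)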
5.3. Using classical standard errors (non-robust) is misleading here.#
We typically don't teach non-robust standard errors in econometrics courses, but the default statistical inference for the lm() procedure in R, summary.lm(), still uses these 100-year-old homoskedasticity-based formulas, perhaps in part due to historical legacy; the same is true of the default summary() output from statsmodels shown below.

Here the non-robust standard errors suggest that there is not much difference between the different approaches, contrary to the conclusions reached using the robust standard errors.
# we are interested in the coefficients on variable "D".
print(CL.summary())
print(CRA.summary())
print(IRA.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Y R-squared: 0.001
Model: OLS Adj. R-squared: 0.000
Method: Least Squares F-statistic: 1.306
Date: Sat, 13 Mar 2021 Prob (F-statistic): 0.253
Time: 11:22:53 Log-Likelihood: -1737.3
No. Observations: 1000 AIC: 3479.
Df Residuals: 998 BIC: 3488.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -0.0312 0.044 -0.717 0.473 -0.117 0.054
D 0.1292 0.113 1.143 0.253 -0.093 0.351
==============================================================================
Omnibus: 0.706 Durbin-Watson: 1.972
Prob(Omnibus): 0.702 Jarque-Bera (JB): 0.578
Skew: -0.006 Prob(JB): 0.749
Kurtosis: 3.117 Cond. No. 2.60
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results
==============================================================================
Dep. Variable: Y R-squared: 0.229
Model: OLS Adj. R-squared: 0.228
Method: Least Squares F-statistic: 148.3
Date: Sat, 13 Mar 2021 Prob (F-statistic): 4.15e-57
Time: 11:22:53 Log-Likelihood: -1607.7
No. Observations: 1000 AIC: 3221.
Df Residuals: 997 BIC: 3236.
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -0.0312 0.038 -0.816 0.415 -0.106 0.044
D 0.1546 0.099 1.556 0.120 -0.040 0.350
Z -0.6608 0.038 -17.173 0.000 -0.736 -0.585
==============================================================================
Omnibus: 32.600 Durbin-Watson: 1.952
Prob(Omnibus): 0.000 Jarque-Bera (JB): 79.232
Skew: -0.067 Prob(JB): 6.24e-18
Kurtosis: 4.372 Cond. No. 2.60
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results
==============================================================================
Dep. Variable: Y R-squared: 0.483
Model: OLS Adj. R-squared: 0.481
Method: Least Squares F-statistic: 310.0
Date: Sat, 13 Mar 2021 Prob (F-statistic): 4.01e-142
Time: 11:22:53 Log-Likelihood: -1408.2
No. Observations: 1000 AIC: 2824.
Df Residuals: 996 BIC: 2844.
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -0.0421 0.031 -1.343 0.180 -0.104 0.019
D 0.1060 0.081 1.301 0.194 -0.054 0.266
Z -0.6167 0.032 -19.518 0.000 -0.679 -0.555
Z:D 1.9111 0.086 22.102 0.000 1.741 2.081
==============================================================================
Omnibus: 2.203 Durbin-Watson: 2.015
Prob(Omnibus): 0.332 Jarque-Bera (JB): 2.274
Skew: 0.006 Prob(JB): 0.321
Kurtosis: 3.233 Cond. No. 2.77
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
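To see this contrast directly, we can place the non-robust and robust standard errors for the coefficient on D side by side, using the HC1 fits from the earlier sketch (CL_robust, CRA_robust, IRA_robust):

# Non-robust vs. heteroskedasticity-robust (HC1) standard errors for the coefficient on D
for name, fit, robust_fit in [("CL", CL, CL_robust), ("CRA", CRA, CRA_robust), ("IRA", IRA, IRA_robust)]:
    print(name, "non-robust SE:", fit.bse["D"], "robust SE:", robust_fit.bse["D"])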
5.4. Verify Asymptotic Approximations Hold in Finite-Sample Simulation Experiment#
np.random.seed(12345676) # set MC seed
n = 1000
B = 1000
# storage for the simulated D-coefficient estimates
CLs = np.repeat(0., B)
CRAs = np.repeat(0., B)
IRAs = np.repeat(0., B)
# models
CL_model = "Y ~ D"              # classical 2-sample approach, no adjustment
CRA_model = "Y ~ D + Z"         # classical regression adjustment
IRA_model = "Y ~ D + Z + Z*D"   # interactive regression adjustment
# simulation
for i in range(B):
    Z = np.random.normal(0, 1, n).reshape((n, 1))
    Y0 = -Z + np.random.normal(0, 1, n).reshape((n, 1))
    Y1 = Z + np.random.normal(0, 1, n).reshape((n, 1))
    D = (np.random.uniform(0, 1, n) < .2).reshape((n, 1))
    D = D - np.mean(D)
    Z = Z - np.mean(Z)
    Y = Y1*D + Y0*(1-D)
    Z_times_D = Z*D
    X = np.hstack((D, Z, Z_times_D))
    data = pd.DataFrame(X, columns=["D", "Z", "Z_times_D"])
    CLs[i] = smf.ols(CL_model, data=data).fit().params[1]
    CRAs[i] = smf.ols(CRA_model, data=data).fit().params[1]
    IRAs[i] = smf.ols(IRA_model, data=data).fit().params[1]
# check standard deviations of the three estimators
# (since the true ATE is zero, the root mean square of the simulated
#  estimates approximates the standard deviation of each estimator)
print("Standard deviations for estimators")
print(np.sqrt(np.mean(CLs**2)))
print(np.sqrt(np.mean(CRAs**2)))
print(np.sqrt(np.mean(IRAs**2)))
Standard deviations for estimators
0.09610043229856047
0.13473909989882688
0.09467905369858437
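These Monte Carlo standard deviations line up with the discussion above: CRA is less precise than CL, while IRA is the most precise. As a further check that the asymptotic approximation is a good guide to this finite-sample behavior, one can compare the robust standard errors from a single sample with the Monte Carlo standard deviations. A minimal sketch, assuming the HC1 fits from the earlier sketch (CL_robust, CRA_robust, IRA_robust) are still in memory:

# Robust (HC1) standard errors from one sample vs. Monte Carlo standard deviations
for name, robust_fit, mc in [("CL", CL_robust, CLs), ("CRA", CRA_robust, CRAs), ("IRA", IRA_robust, IRAs)]:
    print(name, "robust SE:", robust_fit.bse["D"], "MC standard deviation:", np.sqrt(np.mean(mc**2)))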