OLS and lasso for gender wage gap inference

3. OLS and lasso for gender wage gap inference

In the previous lab, we analyzed data from the March Supplement of the U.S. Current Population Survey (2015) and answered the question of how to use job-relevant characteristics, such as education and experience, to best predict wages. Now, we focus on the following inference question:

What is the difference in predicted wages between men and women with the same job-relevant characteristics?

Thus, we analyze whether men and women are paid differently (the gender wage gap). The gender wage gap may partly reflect discrimination against women in the labor market and may partly reflect a selection effect, namely that women are relatively more likely to take on occupations that pay somewhat less (for example, school teaching).

To investigate the gender wage gap, we consider the following log-linear regression model

\[ \begin{align} \log(Y) &= \beta'X + \epsilon\\ &= \beta_1 D + \beta_2' W + \epsilon, \end{align} \]

where \(Y\) is the hourly wage, \(D\) is an indicator for being female (\(1\) if female and \(0\) otherwise), and \(W\) is a vector of worker characteristics that explain variation in wages. Because wages enter in logarithms, \(\beta_1\) measures the relative (approximately percentage) difference in pay between women and men.
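To make the log-scale reading concrete: a coefficient b on the log wage corresponds to an exact relative difference of exp(b) - 1, which is close to b itself when b is small. The sketch below compares the two. The helper name log_gap_to_pct and the values of b are illustrative assumptions, not estimates from the data.

```r
# Convert a log-wage gap (in log points) to an exact percentage difference.
# log_gap_to_pct is a hypothetical helper introduced only for this illustration.
log_gap_to_pct <- function(b) 100 * (exp(b) - 1)

b <- c(-0.05, -0.10, -0.30)  # illustrative log-point gaps

comparison <- data.frame(
  log_points = b,
  approx_pct = 100 * b,           # reading the coefficient directly as a percentage
  exact_pct  = log_gap_to_pct(b)  # exact relative difference
)
print(comparison)
```

For gaps of a few log points the two columns are nearly the same, which is why the coefficient is usually read directly as a percentage. The approximation becomes looser as the gap grows.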

3.1. Data analysis

We consider the same subsample of the U.S. Current Population Survey (2015) as in the previous lab. Let us load the data set.

install.packages("librarian", quiet = TRUE)
librarian::shelf(tidyverse, sandwich, hdm, quiet = TRUE)
data <- read_csv("https://github.com/d2cml-ai/14.388_R/raw/main/Data/wage2015_subsample_inference.csv",
                 show_col_types = FALSE)
dim(data)
attach(data)

5150
21

To start our (causal) analysis, we compare the sample means given gender:

variables <- c("lwage","sex","shs","hsg","scl","clg","ad","ne","mw","so","we","exp1")

Z <- data |> select(all_of(variables))

data_female <- data |> filter(sex == 1)
data_male <- data |> filter(sex == 0)

Z_mean <- Z |>
  mutate(sex = 3) |> # copy of the data with sex = 3, used for the "All" group
  bind_rows(Z) |> # append the original data (sex = 0 or 1)
  group_by(sex) |>
  summarise(across(where(is.numeric), mean)) |>
  ungroup() |>
  mutate(sex = case_when(sex == 1 ~ "Female", sex == 0 ~ "Male", TRUE ~ "All"))

colnames(Z_mean) <- c("Sex","Log Wage","Less than High School","High School Graduate","Some College","College Graduate","Advanced Degree", "Northeast","Midwest","South","West","Experience")
Z_mean |>
  pivot_longer(!Sex, names_to = "Variable") |>
  pivot_wider(names_from = Sex, values_from = value)

A tibble: 11 × 4
Variable	Male	Female	All
<chr>	<dbl>	<dbl>	<dbl>
Log Wage	2.98782963	2.94948490	2.97078670
Less than High School	0.03180706	0.01266929	0.02330097
High School Graduate	0.29430269	0.18086501	0.24388350
Some College	0.27333100	0.28396680	0.27805825
College Graduate	0.29395316	0.34731324	0.31766990
Advanced Degree	0.10660608	0.17518567	0.13708738
Northeast	0.22195037	0.23503713	0.22776699
Midwest	0.25900035	0.26037571	0.25961165
South	0.29814750	0.29445173	0.29650485
West	0.22090178	0.21013543	0.21611650
Experience	13.78399161	13.73132372	13.76058252

In particular, the table above shows that women's average log wage is about \(0.038\) lower than men's:

mean(data_female$lwage)-mean(data_male$lwage)

-0.0383447336744154

Thus, the unconditional gender wage gap is about \(3.8\)% for the group of never married workers (women get paid less on average in our sample). We also observe that never married working women are relatively more educated than working men and have slightly less work experience.

This unconditional (predictive) effect of gender equals the coefficient \(\beta\) in the univariate OLS regression of \(\log(Y)\) on \(D\) and an intercept:

\[ \begin{align} \log(Y) &= \alpha + \beta D + \epsilon. \end{align} \]

We verify this by running an OLS regression in R.

nocontrol_fit <- lm(lwage ~ sex, data = Z)
nocontrol_est <- summary(nocontrol_fit)$coef["sex",1]
HCV_coefs <- vcovHC(nocontrol_fit, type = "HC") # HC = "heteroskedasticity consistent"
nocontrol_se <- sqrt(diag(HCV_coefs))[2] # robust standard error of the coefficient on sex

# print unconditional effect of gender and the corresponding standard error
cat("The estimated coefficient on the dummy for gender is", nocontrol_est, "\nand the corresponding robust standard error is", nocontrol_se)

The estimated coefficient on the dummy for gender is -0.03834473 
and the corresponding robust standard error is 0.01590194

Note that the standard error is computed with the R package sandwich to be robust to heteroskedasticity.
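The two facts used above can be checked on simulated data with base R only: the OLS slope on a binary regressor equals the difference in group means, and the heteroskedasticity-consistent variance (White's estimator, which is what sandwich documents for type "HC", equivalently "HC0") is the usual bread-meat-bread product. This is a minimal sketch under those assumptions. The data are simulated and the variable names are illustrative.

```r
set.seed(123)
n <- 2000
d <- rbinom(n, 1, 0.45)                           # simulated binary regressor
y <- 3 - 0.04 * d + rnorm(n, sd = 0.5 + 0.2 * d)  # heteroskedastic outcome

fit <- lm(y ~ d)
slope <- unname(coef(fit)["d"])
diff_means <- mean(y[d == 1]) - mean(y[d == 0])

# HC0 sandwich by hand: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
X <- model.matrix(fit)
e <- resid(fit)
bread <- solve(crossprod(X))
meat <- crossprod(X * e)   # multiplies row i of X by e_i, then forms X' diag(e^2) X
hc0 <- bread %*% meat %*% bread
se_hc0 <- sqrt(diag(hc0))[2]

cat("slope:", slope, " difference in means:", diff_means, "\n")
cat("HC0 standard error of the slope:", se_hc0, "\n")
```

With a single binary regressor, the hand-computed variance reduces to the sum of the two within-group terms, sum(e^2) / n_g^2 for each group g, which gives a quick sanity check on the matrix formula.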

Next, we run an OLS regression of \(\log(Y)\) on \((D,W)\) to control for the effect of the covariates summarized in \(W\):

\[ \begin{align} \log(Y) &=\beta_1 D + \beta_2' W + \epsilon. \end{align} \]

Here, we are considering the flexible model from the previous lab. Hence, \(W\) controls for experience, education, region, and occupation and industry indicators plus transformations and two-way interactions.
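How R expands the formula operators that build such interaction terms can be inspected directly with terms(). The sketch below uses placeholder variable names (a, b, c, d, y), not columns from the wage data.

```r
# (a + b) * (c + d): all main effects plus the interactions between the two groups
labels_star <- attr(terms(y ~ (a + b) * (c + d)), "term.labels")
print(labels_star)

# (a + b + c)^2: all main effects plus every pairwise interaction
labels_pow <- attr(terms(y ~ (a + b + c)^2), "term.labels")
print(labels_pow)
```

A numeric column contributes a single term per entry in this list, while a factor column is expanded into one indicator per level when the model matrix is built.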

Let us run the OLS regression with controls.

# ols regression with controls

flex <- lwage ~ sex + (exp1 + exp2 + exp3 + exp4) * (shs + hsg + scl + clg+ occ2 + ind2 + mw + so + we)

# Note: in an R formula, (a + b) * (c + d) expands to all main effects plus the
# interactions between the two groups:
#   a + b + c + d + a:c + a:d + b:c + b:d
# So the formula above contains the experience terms, the other controls, and
# every experience-by-control interaction.

control_fit <- lm(flex, data = data)
control_est <- summary(control_fit)$coef[2,1]

summary(control_fit)

cat("Coefficient for OLS with controls", control_est)

HCV_coefs <- vcovHC(control_fit, type = 'HC');
control_se <- sqrt(diag(HCV_coefs))[2] # Estimated std errors

Call:
lm(formula = flex, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.1282 -0.3065 -0.0151  0.2945  3.5341 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.5459818  0.1293099  27.422  < 2e-16 ***
sex         -0.1024693  0.0146380  -7.000 2.89e-12 ***
exp1         0.0395328  0.0481220   0.822  0.41139    
exp2        -0.0470984  0.5325375  -0.088  0.92953    
exp3        -0.0771429  0.2154298  -0.358  0.72029    
exp4         0.0192883  0.0284494   0.678  0.49781    
shs         -0.6909668  0.8988057  -0.769  0.44207    
hsg         -0.5816131  0.1944598  -2.991  0.00279 ** 
scl         -0.3297645  0.1234560  -2.671  0.00758 ** 
clg         -0.0873795  0.0668027  -1.308  0.19092    
occ2        -0.0266617  0.0050951  -5.233 1.74e-07 ***
ind2        -0.0161333  0.0062181  -2.595  0.00950 ** 
mw           0.1079623  0.0834048   1.294  0.19557    
so           0.0385942  0.0747084   0.517  0.60546    
we          -0.0035042  0.0854161  -0.041  0.96728    
exp1:shs    -0.0727104  0.1901283  -0.382  0.70216    
exp1:hsg    -0.0250250  0.0546885  -0.458  0.64727    
exp1:scl    -0.0609705  0.0417514  -1.460  0.14426    
exp1:clg    -0.0380453  0.0300321  -1.267  0.20528    
exp1:occ2    0.0031061  0.0017876   1.738  0.08234 .  
exp1:ind2    0.0004029  0.0020728   0.194  0.84590    
exp1:mw     -0.0270885  0.0301189  -0.899  0.36849    
exp1:so     -0.0078718  0.0265858  -0.296  0.76717    
exp1:we     -0.0024977  0.0305114  -0.082  0.93476    
exp2:shs     0.9029215  1.3741164   0.657  0.51115    
exp2:hsg     0.1877001  0.5146826   0.365  0.71536    
exp2:scl     0.5113091  0.4400572   1.162  0.24532    
exp2:clg     0.2030427  0.3705629   0.548  0.58376    
exp2:occ2   -0.0343464  0.0186214  -1.844  0.06517 .  
exp2:ind2   -0.0059163  0.0210536  -0.281  0.77871    
exp2:mw      0.2042858  0.3188136   0.641  0.52170    
exp2:so      0.0495460  0.2765429   0.179  0.85782    
exp2:we      0.1190125  0.3228731   0.369  0.71244    
exp3:shs    -0.3393592  0.4077661  -0.832  0.40531    
exp3:hsg    -0.0373823  0.1921295  -0.195  0.84574    
exp3:scl    -0.1409625  0.1751678  -0.805  0.42101    
exp3:clg    -0.0065430  0.1607228  -0.041  0.96753    
exp3:occ2    0.0141678  0.0071314   1.987  0.04701 *  
exp3:ind2    0.0042756  0.0079665   0.537  0.59150    
exp3:mw     -0.0669346  0.1233227  -0.543  0.58732    
exp3:so     -0.0212880  0.1047326  -0.203  0.83894    
exp3:we     -0.0616049  0.1249834  -0.493  0.62210    
exp4:shs     0.0390455  0.0426584   0.915  0.36007    
exp4:hsg     0.0016746  0.0243824   0.069  0.94525    
exp4:scl     0.0121324  0.0230861   0.526  0.59924    
exp4:clg    -0.0068572  0.0222925  -0.308  0.75840    
exp4:occ2   -0.0019077  0.0008949  -2.132  0.03308 *  
exp4:ind2   -0.0008135  0.0009948  -0.818  0.41355    
exp4:mw      0.0069432  0.0155866   0.445  0.65601    
exp4:so      0.0031201  0.0129355   0.241  0.80941    
exp4:we      0.0080471  0.0158007   0.509  0.61057    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.498 on 5099 degrees of freedom
Multiple R-squared:  0.2452,	Adjusted R-squared:  0.2378 
F-statistic: 33.14 on 50 and 5099 DF,  p-value: < 2.2e-16

Coefficient for OLS with controls -0.1024693

The estimated regression coefficient \(\beta_1\approx-0.1025\) measures how our linear prediction of the log wage changes if we switch the gender variable \(D\) from 0 to 1, holding the controls \(W\) fixed. We can call this the predictive effect (PE), as it measures the impact of a variable on the prediction we make. Overall, we see that the unconditional wage gap of about \(4\)% for women increases to about \(10\)% after controlling for worker characteristics.

Next, we use the Frisch-Waugh-Lovell (FWL) theorem from lecture, partialling out the linear effect of the controls via OLS.

# Partialling-out using ols

# models
flex_y <- lwage ~ (exp1 + exp2 + exp3 + exp4) * (shs + hsg + scl +clg + occ2+ ind2 + mw + so + we) # model for Y
flex_d <- sex ~ (exp1 + exp2 + exp3 + exp4)*(shs + hsg + scl + clg + occ2 + ind2 + mw + so + we) # model for D

# partialling-out the linear effect of W from Y
t_Y <- lm(flex_y, data=data)$res
# partialling-out the linear effect of W from D
t_D <- lm(flex_d, data=data)$res

# regression of Y on D after partialling-out the effect of W
partial_fit <- lm(t_Y ~ t_D)
partial_est <- summary(partial_fit)$coef[2,1]

cat("Coefficient for D via partialling-out", partial_est)

# standard error
HCV_coefs <- vcovHC(partial_fit, type = 'HC')
partial_se <- sqrt(diag(HCV_coefs))[2]

# confidence interval
confint(partial_fit)[2,]

Coefficient for D via partialling-out -0.1024693

2.5 %: -0.131029157306723
97.5 %: -0.0739094617947904

Again, the estimated coefficient measures the linear predictive effect (PE) of \(D\) on \(Y\) after taking out the linear effect of \(W\) on both of these variables. This coefficient is numerically identical to the estimated coefficient from the OLS regression with controls, as the FWL theorem implies.
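The same equivalence can be reproduced on a small simulated example in base R, which makes it easy to experiment with the three steps (residualize the outcome, residualize the regressor of interest, regress residual on residual). The data-generating process and variable names below are illustrative assumptions.

```r
set.seed(42)
n <- 1000
W <- cbind(w1 = rnorm(n), w2 = rnorm(n))         # simulated controls
d <- as.numeric(0.8 * W[, "w1"] + rnorm(n) > 0)  # binary regressor correlated with w1
y <- -0.1 * d + 0.5 * W[, "w1"] - 0.3 * W[, "w2"] + rnorm(n)

# Full regression: coefficient on d with the controls included
full_est <- unname(coef(lm(y ~ d + W))["d"])

# FWL: partial W out of y and out of d, then regress residual on residual
res_y <- resid(lm(y ~ W))
res_d <- resid(lm(d ~ W))
fwl_est <- unname(coef(lm(res_y ~ res_d))["res_d"])

cat("full regression:", full_est, "\n")
cat("partialling-out:", fwl_est, "\n")
```

The two estimates agree up to floating-point error. The equality is an algebraic property of least squares and does not depend on how the data were generated.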

We know that the partialling-out approach works well when the dimension of \(W\) is low in relation to the sample size \(n\). When the dimension of \(W\) is relatively high, we need to use variable selection or penalization for regularization purposes.

In the following, we illustrate the partialling-out approach using lasso instead of OLS.

# Partialling-out using lasso

library(hdm)

# models
flex_y <- lwage ~ (exp1 + exp2 + exp3 + exp4) * (shs + hsg + scl + clg + occ2 + ind2 + mw + so + we) # model for Y
flex_d <- sex ~ (exp1 + exp2 + exp3 + exp4) * (shs + hsg + scl + clg + occ2 + ind2 + mw + so + we) # model for D

# partialling-out the linear effect of W from Y
t_Y <- rlasso(flex_y, data = data)$res
# partialling-out the linear effect of W from D
t_D <- rlasso(flex_d, data = data)$res

# regression of Y on D after partialling-out the effect of W
partial_lasso_fit <- lm(t_Y ~ t_D)
partial_lasso_est <- summary(partial_lasso_fit)$coef[2,1]

cat("Coefficient for D via partialling-out using lasso", partial_lasso_est)

# standard error
HCV_coefs <- vcovHC(partial_lasso_fit, type = 'HC')
partial_lasso_se <- sqrt(diag(HCV_coefs))[2]

Coefficient for D via partialling-out using lasso -0.1036713

Using lasso for partialling-out here gives results similar to those from OLS.
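The hdm package also offers rlassoEffect, which wraps the residualize-then-regress steps in one call when method = "partialling out". The sketch below compares it with the manual steps on simulated data. It assumes hdm is installed, the data and coefficient values are illustrative, and the agreement between the two estimates relies on both paths using the same rlasso defaults.

```r
library(hdm)  # assumed to be installed, as in the lab

set.seed(1)
n <- 500; p <- 50
W <- matrix(rnorm(n * p), n, p)   # simulated controls (illustrative only)
d <- 0.5 * W[, 1] + rnorm(n)      # regressor of interest, related to the first control
y <- -0.1 * d + W[, 1] + 0.5 * W[, 2] + rnorm(n)

# Manual partialling-out with lasso, mirroring the steps used in the lab
res_y <- rlasso(y ~ W)$residuals
res_d <- rlasso(d ~ W)$residuals
manual_est <- unname(coef(lm(res_y ~ res_d))[2])

# One-call version
fit <- rlassoEffect(x = W, y = y, d = d, method = "partialling out")

cat("manual partialling-out:", manual_est, "\n")
cat("rlassoEffect estimate: ", fit$alpha, " (se ", fit$se, ")\n", sep = "")
```

The standard error reported by rlassoEffect is computed inside the package, so it need not coincide exactly with the vcovHC-based standard error used in the lab.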

Next, we summarize the results.

table<- matrix(0, 4, 2)
table[1,1]<- nocontrol_est  
table[1,2]<- nocontrol_se   
table[2,1]<- control_est
table[2,2]<- control_se    
table[3,1]<- partial_est  
table[3,2]<- partial_se  
table[4,1]<-  partial_lasso_est
table[4,2]<- partial_lasso_se 
colnames(table)<- c("Estimate","Std. Error")
rownames(table)<- c("Without controls", "full reg", "partial reg", "partial reg via lasso")	
table

A matrix: 4 × 2 of type dbl
	Estimate	Std. Error
Without controls	-0.03834473	0.01590194
full reg	-0.10246931	0.01458860
partial reg	-0.10246931	0.01458860
partial reg via lasso	-0.10367131	0.01475760
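The robust standard errors in the table can be turned into approximate 95% confidence intervals with the normal approximation, estimate ± z × standard error. The sketch below does this for the four rows, copying the estimates and standard errors from the table above. The object names are illustrative.

```r
# Estimates and robust standard errors copied from the summary table
est <- c("Without controls"      = -0.03834473,
         "full reg"              = -0.10246931,
         "partial reg"           = -0.10246931,
         "partial reg via lasso" = -0.10367131)
se <- c(0.01590194, 0.01458860, 0.01458860, 0.01475760)

z <- qnorm(0.975)  # two-sided 95% normal critical value
ci <- cbind(Estimate = est, Lower = est - z * se, Upper = est + z * se)
print(round(ci, 4))
```

Unlike the confint() call used earlier, which relies on the classical (homoskedastic) standard errors of the lm fit, these intervals use the heteroskedasticity-robust standard errors.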

It is worth noticing that controlling for worker characteristics increases the gender wage gap from less than 4% to about 10%. The controls we used in our analysis include 5 educational attainment indicators (less than high school graduates, high school graduates, some college, college graduate, and advanced degree), 4 region indicators (midwest, south, west, and northeast), a quartic term (first, second, third, and fourth power) in experience, and occupation and industry (22 occupation and 23 industry categories). Note that in the regression output above occ2 and ind2 each have a single coefficient, so in this run they enter as numeric codes rather than as sets of indicators.

Keep in mind that the predictive effect (PE) does not only measure discrimination (the causal effect of being female); it may also reflect selection effects arising from unobserved differences in covariates between men and women in our sample.

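Next we try an “extra” flexible model, where we take all two-way interactions of the controls. In the run shown below this gives 92 regressors besides the intercept, 12 of which are dropped because of collinearity. With occupation and industry expanded into indicators, the same formula gives about 1000 controls.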

# extra flexible model

extraflex <- lwage ~ sex + (exp1 + exp2 + exp3 + exp4 + shs + hsg + scl + clg + occ2 + ind2 + mw + so + we)^2

control_fit <- lm(extraflex, data=data)
summary(control_fit)
control_est <- summary(control_fit)$coef[2,1]

cat("Number of Extra-Flex Controls", length(control_fit$coef) - 1, "\n")

cat("Coefficient for OLS with extra flex controls", control_est)

HCV_coefs <- vcovHC(control_fit, type = 'HC');

n <- nrow(data) # sample size
p <- length(control_fit$coef) # number of coefficients, including the intercept

control_se <- sqrt(diag(HCV_coefs))[2] * sqrt(n / (n - p)) # Estimated std errors

# This is a crude adjustment for the effect of dimensionality on OLS standard errors,
# motivated by Cattaneo, Jansson, and Newey (2018). A more careful analysis would
# implement their procedure directly.

Call:
lm(formula = extraflex, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.1158 -0.3041 -0.0173  0.2893  3.5892 

Coefficients: (12 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.6297982  0.1552865  23.375  < 2e-16 ***
sex         -0.0967413  0.0146627  -6.598 4.60e-11 ***
exp1         0.0094395  0.0613320   0.154  0.87769    
exp2         0.7798590  1.3068326   0.597  0.55070    
exp3        -0.9937088  1.5435940  -0.644  0.51976    
exp4         0.5284032  0.9687560   0.545  0.58547    
shs          0.4280438  1.0441100   0.410  0.68185    
hsg         -0.5051460  0.2742205  -1.842  0.06552 .  
scl         -0.2951033  0.1897037  -1.556  0.11987    
clg         -0.1203554  0.1176922  -1.023  0.30653    
occ2        -0.0331114  0.0071125  -4.655 3.32e-06 ***
ind2        -0.0203271  0.0076790  -2.647  0.00814 ** 
mw           0.1000058  0.1185443   0.844  0.39892    
so           0.0590348  0.1071524   0.551  0.58170    
we           0.1070730  0.1173880   0.912  0.36174    
exp1:exp2           NA         NA      NA       NA    
exp1:exp3           NA         NA      NA       NA    
exp1:exp4   -0.0154773  0.0324014  -0.478  0.63290    
exp1:shs    -0.3655660  0.2241355  -1.631  0.10295    
exp1:hsg    -0.0872036  0.0723121  -1.206  0.22790    
exp1:scl    -0.0877190  0.0528018  -1.661  0.09672 .  
exp1:clg    -0.0448492  0.0319558  -1.403  0.16054    
exp1:occ2    0.0017415  0.0018413   0.946  0.34431    
exp1:ind2    0.0006614  0.0021592   0.306  0.75937    
exp1:mw     -0.0177427  0.0315136  -0.563  0.57345    
exp1:so      0.0068205  0.0278941   0.245  0.80684    
exp1:we     -0.0106886  0.0315461  -0.339  0.73476    
exp2:exp3           NA         NA      NA       NA    
exp2:exp4    0.0253399  0.0549125   0.461  0.64449    
exp2:shs     3.2517652  1.7043879   1.908  0.05646 .  
exp2:hsg     0.8260811  0.6758047   1.222  0.22163    
exp2:scl     0.8244532  0.5379554   1.533  0.12544    
exp2:clg     0.3103536  0.3859653   0.804  0.42138    
exp2:occ2   -0.0256483  0.0189029  -1.357  0.17489    
exp2:ind2   -0.0064101  0.0215917  -0.297  0.76657    
exp2:mw      0.1118794  0.3277397   0.341  0.73284    
exp2:so     -0.0296200  0.2863447  -0.103  0.91762    
exp2:we      0.1610522  0.3289716   0.490  0.62446    
exp3:exp4   -0.0017785  0.0037133  -0.479  0.63199    
exp3:shs    -1.0955552  0.5316414  -2.061  0.03938 *  
exp3:hsg    -0.2845745  0.2478395  -1.148  0.25093    
exp3:scl    -0.2716782  0.2082679  -1.304  0.19213    
exp3:clg    -0.0575082  0.1656527  -0.347  0.72848    
exp3:occ2    0.0121038  0.0071930   1.683  0.09249 .  
exp3:ind2    0.0039878  0.0081100   0.492  0.62294    
exp3:mw     -0.0323487  0.1257788  -0.257  0.79704    
exp3:so     -0.0089914  0.1081201  -0.083  0.93373    
exp3:we     -0.0692433  0.1265415  -0.547  0.58427    
exp4:shs     0.1229824  0.0581239   2.116  0.03440 *  
exp4:hsg     0.0332181  0.0308733   1.076  0.28200    
exp4:scl     0.0296681  0.0269001   1.103  0.27012    
exp4:clg     0.0004832  0.0228505   0.021  0.98313    
exp4:occ2   -0.0017545  0.0008997  -1.950  0.05122 .  
exp4:ind2   -0.0007445  0.0010089  -0.738  0.46057    
exp4:mw      0.0025865  0.0158330   0.163  0.87024    
exp4:so      0.0031431  0.0133641   0.235  0.81407    
exp4:we      0.0083434  0.0159458   0.523  0.60083    
shs:hsg             NA         NA      NA       NA    
shs:scl             NA         NA      NA       NA    
shs:clg             NA         NA      NA       NA    
shs:occ2     0.0117950  0.0108832   1.084  0.27851    
shs:ind2    -0.0060826  0.0097555  -0.624  0.53298    
shs:mw      -0.0553633  0.1711424  -0.323  0.74634    
shs:so      -0.0917162  0.1595041  -0.575  0.56531    
shs:we       0.3294645  0.1718442   1.917  0.05527 .  
hsg:scl             NA         NA      NA       NA    
hsg:clg             NA         NA      NA       NA    
hsg:occ2     0.0160702  0.0048987   3.280  0.00104 ** 
hsg:ind2     0.0010522  0.0055110   0.191  0.84859    
hsg:mw      -0.1136003  0.0795354  -1.428  0.15327    
hsg:so      -0.1896137  0.0734389  -2.582  0.00985 ** 
hsg:we      -0.0144124  0.0782891  -0.184  0.85395    
scl:clg             NA         NA      NA       NA    
scl:occ2     0.0063711  0.0046063   1.383  0.16668    
scl:ind2     0.0019307  0.0053230   0.363  0.71683    
scl:mw      -0.0835733  0.0750574  -1.113  0.26556    
scl:so      -0.1016680  0.0690405  -1.473  0.14093    
scl:we       0.0625579  0.0741913   0.843  0.39916    
clg:occ2     0.0033082  0.0044378   0.745  0.45603    
clg:ind2     0.0049356  0.0050629   0.975  0.32969    
clg:mw      -0.0442317  0.0692762  -0.638  0.52319    
clg:so      -0.0777453  0.0613241  -1.268  0.20494    
clg:we      -0.0332471  0.0667061  -0.498  0.61822    
occ2:ind2    0.0002593  0.0002069   1.254  0.21006    
occ2:mw      0.0072283  0.0034031   2.124  0.03372 *  
occ2:so      0.0007315  0.0032890   0.222  0.82402    
occ2:we      0.0009160  0.0035135   0.261  0.79434    
ind2:mw     -0.0029011  0.0038057  -0.762  0.44592    
ind2:so      0.0003376  0.0036861   0.092  0.92704    
ind2:we     -0.0065410  0.0039548  -1.654  0.09820 .  
mw:so               NA         NA      NA       NA    
mw:we               NA         NA      NA       NA    
so:we               NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4962 on 5069 degrees of freedom
Multiple R-squared:  0.2551,	Adjusted R-squared:  0.2433 
F-statistic:  21.7 on 80 and 5069 DF,  p-value: < 2.2e-16

Number of Extra-Flex Controls 92 
Coefficient for OLS with extra flex controls -0.09674132

library(hdm)

# models
extraflex_y <- lwage ~ (exp1 + exp2 + exp3 + exp4 + shs + hsg + scl + clg + occ2 + ind2 + mw + so + we)^2 # model for Y
extraflex_d <- sex ~ (exp1 + exp2 + exp3 + exp4 + shs + hsg + scl + clg + occ2 + ind2 + mw + so + we)^2 # model for D

# partialling-out the linear effect of W from Y
t_Y <- rlasso(extraflex_y, data=data)$res
# partialling-out the linear effect of W from D
t_D <- rlasso(extraflex_d, data=data)$res

# regression of Y on D after partialling-out the effect of W
partial_lasso_fit <- lm(t_Y~t_D)
partial_lasso_est <- summary(partial_lasso_fit)$coef[2,1]

cat("Coefficient for D via partialling-out using lasso", partial_lasso_est)

# standard error
HCV_coefs <- vcovHC(partial_lasso_fit, type = 'HC')
partial_lasso_se <- sqrt(diag(HCV_coefs))[2]

Coefficient for D via partialling-out using lasso -0.09916518

table<- matrix(0, 2, 2)
table[1,1]<- control_est
table[1,2]<- control_se    
table[2,1]<-  partial_lasso_est
table[2,2]<- partial_lasso_se 
colnames(table)<- c("Estimate","Std. Error")
rownames(table)<- c("full reg","partial reg via lasso")	
table

A matrix: 2 × 2 of type dbl
	Estimate	Std. Error
full reg	-0.09674132	0.01476431
partial reg via lasso	-0.09916518	0.01481612

In the run shown here, \(p/n = 93/5150 \approx 2\)%, so \(p/n\) is still small and unregularized partialling out (OLS) and regularized partialling out with lasso (double lasso) give very similar results. When occupation and industry enter as indicators, the extra flexible model has about 1000 controls and \(p/n\) is about 20%; in that regime \(p/n\) is no longer small and differences between the two approaches start to appear. The results based on double lasso have rigorous guarantees in this non-small \(p/n\) regime under approximate sparsity. The results based on OLS still have guarantees in the \(p/n < 1\) regime under the assumptions laid out in Cattaneo, Jansson, and Newey (2018), without approximate sparsity, although other regularity conditions are needed.
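The size of the crude dimensionality adjustment sqrt(n / (n - p)) applied to the OLS standard error is easy to compute. The sketch below uses n = 5150 from the dim(data) output and p = 93, which is the 92 regressors printed above plus the intercept. For contrast it also evaluates the factor at p = 1000, the approximate number of controls quoted in the text. The function name se_inflation is an illustrative assumption.

```r
# Crude inflation factor for OLS standard errors when p is not negligible relative to n.
# se_inflation is a hypothetical helper introduced only for this illustration.
se_inflation <- function(n, p) sqrt(n / (n - p))

n <- 5150
p_run  <- 93    # coefficients in the run shown above (92 regressors + intercept)
p_text <- 1000  # approximate number of controls quoted in the text

ratios  <- c(run = p_run / n, text = p_text / n)
factors <- c(run = se_inflation(n, p_run), text = se_inflation(n, p_text))

print(round(ratios, 3))   # p / n in each case
print(round(factors, 3))  # multiplicative adjustment to the standard error
```

When p/n is only a few percent the adjustment barely moves the standard error, while at p/n near 20% it is no longer negligible.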
