13. Deep Neural Networks for Wage Prediction#

So far we have considered many machine learning methods such as Lasso and Random Forests for building a predictive model. In this lab, we extend our toolbox by returning to our wage prediction problem and showing how a neural network can be used for prediction.

13.1. Data preparation#

Again, we consider data from the U.S. March Supplement of the Current Population Survey (CPS) in 2015.

install.packages("librarian", quiet = TRUE)
librarian::shelf(keras, tidyverse)
data <- read_csv("https://raw.githubusercontent.com/d2cml-ai/14.388_R/main/Data/wage2015_subsample_inference.csv", show_col_types = FALSE)
Z <- data |> select(-c(lwage, wage)) # regressors

First, we split the data into training and test sets and normalize it. Note that the test set is standardized with the means and standard deviations computed on the training sample, so that no information from the test data enters the preprocessing.

# split the data into training and testing sets
set.seed(1234)
training <- sample(nrow(data), nrow(data)*(3/4), replace=FALSE)

data_train <- data[training,1:16]
data_test <- data[-training,1:16]
# normalize the data
mean <- apply(data_train, 2, mean)
std <- apply(data_train, 2, sd)
data_train <- scale(data_train, center = mean, scale = std)
data_test <- scale(data_test, center = mean, scale = std)
data_train <- as.data.frame(data_train)
data_test <- as.data.frame(data_test)
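As a quick sanity check, the standardized training columns should now have mean approximately zero and standard deviation one; a minimal way to verify this:

# training columns should have mean ~0 and sd ~1 after standardization
summary(colMeans(data_train))
summary(apply(data_train, 2, sd))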

Then, we construct the inputs for our network.

X_basic <- "sex + exp1 + shs + hsg + scl + clg + mw + so + we"
formula_basic <- as.formula(paste("lwage", "~", X_basic))
model_X_basic_train <- model.matrix(formula_basic, data_train)
model_X_basic_test <- model.matrix(formula_basic, data_test)

Y_train <- data_train$lwage
Y_test <- data_test$lwage
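Note that model.matrix adds an intercept column, so the basic specification yields ten input features (the intercept plus the nine regressors). A quick way to confirm the input dimension the network will receive:

# design matrix dimensions: rows = observations, columns = input features
dim(model_X_basic_train)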

13.2. Neural Networks#

First, we need to determine the structure of our network. We use the R package keras to build a simple sequential neural network with two ReLU-activated hidden dense layers and a linear output layer.

library(keras)

build_model <- function() {
  model <- keras_model_sequential() %>%
    layer_dense(units = 20, activation = "relu", # ReLU activation function
                input_shape = dim(model_X_basic_train)[2]) %>%
    layer_dense(units = 10, activation = "relu") %>%
    layer_dense(units = 1) # linear output layer

  model %>% compile(
    optimizer = optimizer_adam(learning_rate = 0.005), # Adam optimizer
    loss = "mse",
    metrics = c("mae")
  )
  model
}

Let us have a look at the structure of our network in detail.

model <- build_model()
summary(model)
Loaded Tensorflow version 2.8.2

Model: "sequential"
________________________________________________________________________________
 Layer (type)                       Output Shape                    Param #     
================================================================================
 dense_2 (Dense)                    (None, 20)                      220         
 dense_1 (Dense)                    (None, 10)                      210         
 dense (Dense)                      (None, 1)                       11          
================================================================================
Total params: 441
Trainable params: 441
Non-trainable params: 0
________________________________________________________________________________

We have \(441\) trainable parameters in total.
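This total is consistent with the layer dimensions: with ten input features, the first hidden layer has \(10 \times 20 + 20 = 220\) weights and biases, the second \(20 \times 10 + 10 = 210\), and the output layer \(10 \times 1 + 1 = 11\), which indeed sums to \(441\).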

Now, let us train the network. Note that training takes substantial computation time. To speed it up, we can use a GPU as an accelerator; the size of the improvement varies with a number of factors, including the model architecture, the batch size, and the complexity of the input pipeline.

# training the network 
num_epochs <- 1000
model %>% fit(model_X_basic_train, Y_train,
                    epochs = num_epochs, batch_size = 100, verbose = 0)
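
Rather than fixing the number of epochs in advance, one can also hold out part of the training data and stop once the validation loss stops improving. The snippet below is a minimal sketch of that alternative; the validation fraction (0.2) and the patience of 50 epochs are illustrative choices, not values used in this lab.

# optional sketch: monitor a validation split and stop training early
model_es <- build_model()
model_es %>% fit(
  model_X_basic_train, Y_train,
  epochs = num_epochs, batch_size = 100, verbose = 0,
  validation_split = 0.2,  # hold out 20% of the training data for validation
  callbacks = list(callback_early_stopping(monitor = "val_loss", patience = 50))
)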

After training the neural network, we can evaluate the performance of our model on the test sample.

# evaluating performance
model %>% evaluate(model_X_basic_test, Y_test, verbose = 0)
loss
0.836172640323639
mae
0.694816708564758
# calculating the performance measures
pred.nn <- model %>% predict(model_X_basic_test)
# regressing the squared prediction errors on a constant yields the test MSE
# (the intercept) together with its standard error
MSE.nn <- summary(lm((Y_test - pred.nn)^2 ~ 1))$coef[1:2]
R2.nn <- 1 - MSE.nn[1] / var(Y_test)
# printing R^2
cat("R^2 of the neural network:", R2.nn)
R^2 of the neural network: 0.1429969
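
As a quick check, the same point estimates can be computed directly; this small illustrative verification should reproduce the test MSE and \(R^2\) reported above.

# direct computation of the test MSE and R^2
mse_direct <- mean((Y_test - pred.nn)^2)
1 - mse_direct / var(Y_test)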