# !wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
# !dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
# !apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
# !apt update -q
# !apt install cuda gcc-6 g++-6 -y -q
# !ln -s /usr/bin/gcc-6 /usr/local/cuda/bin/gcc
# !ln -s /usr/bin/g++-6 /usr/local/cuda/bin/g++
# !curl -sSL "https://julialang-s3.julialang.org/bin/linux/x64/1.7/julia-1.7.3-linux-x86_64.tar.gz" -o julia.tar.gz
# !tar -xzf julia.tar.gz -C /usr --strip-components 1
# !rm -rf julia.tar.gz*
# !julia -e 'using Pkg; pkg"add IJulia; precompile"'
10. Double Lasso for the convergence hypothesis
10.1. Introduction
We provide an additional empirical example of partialling-out with Lasso to estimate the regression coefficient \(\beta_1\) in the high-dimensional linear regression model:

$$ Y = \beta_1 D + \beta_2' W + \epsilon. $$
Specifically, we are interested in how the rate at which an economy grows (\(Y\)) is related to its initial wealth level (\(D\)), controlling for the country's institutional, educational, and other similar characteristics (\(W\)).
The relationship is captured by \(\beta_1\), the speed of convergence/divergence, which measures the rate at which poor countries catch up with \((\beta_1 < 0)\) or fall behind \((\beta_1 > 0)\) rich countries, after controlling for \(W\). Our inference question is: do poor countries grow faster than rich countries, controlling for educational and other characteristics? In other words, is the speed of convergence negative, \(\beta_1 < 0\)? This is the Convergence Hypothesis predicted by the Solow Growth Model, a structural economic model. Under some strong assumptions, which we won't state here, the predictive exercise we carry out can be given a causal interpretation.
The outcome \(Y\) is the realized annual growth rate of a country’s wealth (Gross Domestic Product per capita). The target regressor (\(D\)) is the initial level of the country’s wealth. The target parameter \(\beta_1\) is the speed of convergence, which measures the speed at which poor countries catch up with rich countries. The controls (\(W\)) include measures of education levels, quality of institutions, trade openness, and political stability in the country.
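The partialling-out (Frisch–Waugh–Lovell) logic that both methods below implement can be summarized as: residualize \(Y\) and \(D\) on \(W\) (with the conditional expectations estimated by Lasso in the code that follows), then regress the residuals on each other:

$$ \tilde{Y} = Y - E[Y \mid W], \qquad \tilde{D} = D - E[D \mid W], \qquad \tilde{Y} = \beta_1 \tilde{D} + \epsilon. $$

Regressing \(\tilde{Y}\) on \(\tilde{D}\) by OLS then recovers \(\beta_1\).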
10.2. Data analysis
We consider the data set GrowthData, which is included in the R package hdm. First, let us load the data set to get familiar with the data.
# import Pkg; Pkg.add("RData")
# import Pkg; Pkg.add("CodecBzip2")
# import Pkg; Pkg.add("DataStructures")
# import Pkg; Pkg.add("NamedArrays")
# import Pkg; Pkg.add("PrettyTables")
# import Pkg; Pkg.add("Lasso")
using RData, LinearAlgebra, GLM, DataFrames, Statistics, Random, Distributions, DataStructures, NamedArrays, PrettyTables
import CodecBzip2
# Importing .Rdata file
url = "https://github.com/d2cml-ai/14.388_jl/raw/github_data/data/GrowthData.RData"
download(url, "GrowthData.RData")
growth_read = load("GrowthData.RData")
rm("GrowthData.RData")
# Since growth_read is a dictionary, we check that it contains the key "GrowthData", the one we need for our analysis
haskey(growth_read, "GrowthData")
true
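If you are unsure what a loaded .RData file contains, you can also list all stored objects first; a quick inspection sketch:

# Every object stored in the .RData file appears as a dictionary key
collect(keys(growth_read))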
# Now we save that dataframe with a new name
growth = growth_read["GrowthData"]
|   | Outcome | intercept | gdpsh465 | bmp1l | freeop | freetar | h65 | hm65 |
|---|---------|-----------|----------|-------|--------|---------|-----|------|
|   | Float64 | Int32 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
1 | -0.0243358 | 1 | 6.59167 | 0.2837 | 0.153491 | 0.043888 | 0.007 | 0.013 |
2 | 0.100473 | 1 | 6.82979 | 0.6141 | 0.313509 | 0.061827 | 0.019 | 0.032 |
3 | 0.0670515 | 1 | 8.89508 | 0.0 | 0.204244 | 0.009186 | 0.26 | 0.325 |
4 | 0.0640892 | 1 | 7.56528 | 0.1997 | 0.248714 | 0.03627 | 0.061 | 0.07 |
5 | 0.0279295 | 1 | 7.1624 | 0.174 | 0.299252 | 0.037367 | 0.017 | 0.027 |
6 | 0.0464074 | 1 | 7.21891 | 0.0 | 0.258865 | 0.02088 | 0.023 | 0.038 |
7 | 0.0673323 | 1 | 7.8536 | 0.0 | 0.182525 | 0.014385 | 0.039 | 0.063 |
8 | 0.0209777 | 1 | 7.70391 | 0.2776 | 0.215275 | 0.029713 | 0.024 | 0.035 |
9 | 0.0335512 | 1 | 9.06346 | 0.0 | 0.109614 | 0.002171 | 0.402 | 0.488 |
10 | 0.0391465 | 1 | 8.15191 | 0.1484 | 0.110885 | 0.028579 | 0.145 | 0.173 |
11 | 0.0761265 | 1 | 6.92952 | 0.0296 | 0.165784 | 0.020115 | 0.046 | 0.066 |
12 | 0.127951 | 1 | 7.23778 | 0.2151 | 0.078488 | 0.011581 | 0.022 | 0.031 |
13 | -0.0243261 | 1 | 8.11582 | 0.4318 | 0.137482 | 0.026547 | 0.059 | 0.073 |
14 | 0.0782934 | 1 | 7.2717 | 0.1689 | 0.164598 | 0.044446 | 0.029 | 0.045 |
15 | 0.112912 | 1 | 7.12125 | 0.1832 | 0.188016 | 0.045678 | 0.033 | 0.051 |
16 | 0.0523082 | 1 | 6.97728 | 0.0962 | 0.204611 | 0.077852 | 0.037 | 0.043 |
17 | 0.0363909 | 1 | 7.64969 | 0.0227 | 0.136287 | 0.04673 | 0.081 | 0.105 |
18 | 0.0297382 | 1 | 8.05674 | 0.0208 | 0.197853 | 0.037224 | 0.083 | 0.097 |
19 | -0.0566436 | 1 | 8.78094 | 0.2654 | 0.189867 | 0.031747 | 0.068 | 0.089 |
20 | 0.0192048 | 1 | 6.28786 | 0.4207 | 0.130682 | 0.109921 | 0.053 | 0.039 |
21 | 0.085206 | 1 | 6.13773 | 0.1371 | 0.123818 | 0.015897 | 0.028 | 0.025 |
22 | 0.133982 | 1 | 8.12888 | 0.0 | 0.16721 | 0.003311 | 0.129 | 0.196 |
23 | 0.173025 | 1 | 6.68085 | 0.4713 | 0.228424 | 0.029328 | 0.062 | 0.09 |
24 | 0.109699 | 1 | 7.17702 | 0.0178 | 0.18524 | 0.015453 | 0.02 | 0.026 |
25 | 0.0159899 | 1 | 6.64898 | 0.4762 | 0.171181 | 0.058937 | 0.018 | 0.028 |
26 | 0.0622498 | 1 | 6.87936 | 0.2927 | 0.179508 | 0.035842 | 0.188 | 0.169 |
27 | 0.109871 | 1 | 7.3473 | 0.1017 | 0.247626 | 0.037392 | 0.08 | 0.133 |
28 | 0.0921063 | 1 | 6.72503 | 0.0266 | 0.179933 | 0.046376 | 0.015 | 0.02 |
29 | 0.083376 | 1 | 8.45105 | 0.0 | 0.358556 | 0.016468 | 0.09 | 0.133 |
30 | 0.0762335 | 1 | 8.60245 | 0.0 | 0.416234 | 0.014721 | 0.148 | 0.194 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
We determine the type and the dimension of our data set.
typeof(growth), size(growth)
(DataFrame, (90, 63))
The sample contains \(90\) countries and about \(60\) controls (the data set has \(63\) columns, including the outcome and an intercept). Thus \(p \approx 60\), \(n = 90\), and \(p/n\) is not small. We therefore expect the least squares method to provide a poor estimate of \(\beta_1\), and the method based on partialling-out with Lasso to provide a high-quality estimate of \(\beta_1\).
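A quick sketch confirming these dimensions from the loaded DataFrame:

n, ncols = size(growth)    # 90 observations, 63 columns
p = ncols - 2              # controls, excluding the outcome and intercept columns
p / n                      # ≈ 0.68, which is not small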
To check this hypothesis, we analyze the relationship between the outcome variable \(Y\) and the country characteristics by running a linear regression in the first step.
y = growth[!, "Outcome"]
y = DataFrame([y], [:y])
90 rows × 1 column

|   | y |
|---|---|
|   | Float64 |
1 | -0.0243358 |
2 | 0.100473 |
3 | 0.0670515 |
4 | 0.0640892 |
5 | 0.0279295 |
6 | 0.0464074 |
7 | 0.0673323 |
8 | 0.0209777 |
9 | 0.0335512 |
10 | 0.0391465 |
11 | 0.0761265 |
12 | 0.127951 |
13 | -0.0243261 |
14 | 0.0782934 |
15 | 0.112912 |
16 | 0.0523082 |
17 | 0.0363909 |
18 | 0.0297382 |
19 | -0.0566436 |
20 | 0.0192048 |
21 | 0.085206 |
22 | 0.133982 |
23 | 0.173025 |
24 | 0.109699 |
25 | 0.0159899 |
26 | 0.0622498 |
27 | 0.109871 |
28 | 0.0921063 |
29 | 0.083376 |
30 | 0.0762335 |
⋮ | ⋮ |
X = select(growth, Not(["Outcome"]))
90 rows × 62 columns (omitted printing of 53 columns)
|   | intercept | gdpsh465 | bmp1l | freeop | freetar | h65 | hm65 | hf65 | p65 |
|---|-----------|----------|-------|--------|---------|-----|------|------|-----|
|   | Int32 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
1 | 1 | 6.59167 | 0.2837 | 0.153491 | 0.043888 | 0.007 | 0.013 | 0.001 | 0.29 |
2 | 1 | 6.82979 | 0.6141 | 0.313509 | 0.061827 | 0.019 | 0.032 | 0.007 | 0.91 |
3 | 1 | 8.89508 | 0.0 | 0.204244 | 0.009186 | 0.26 | 0.325 | 0.201 | 1.0 |
4 | 1 | 7.56528 | 0.1997 | 0.248714 | 0.03627 | 0.061 | 0.07 | 0.051 | 1.0 |
5 | 1 | 7.1624 | 0.174 | 0.299252 | 0.037367 | 0.017 | 0.027 | 0.007 | 0.82 |
6 | 1 | 7.21891 | 0.0 | 0.258865 | 0.02088 | 0.023 | 0.038 | 0.006 | 0.5 |
7 | 1 | 7.8536 | 0.0 | 0.182525 | 0.014385 | 0.039 | 0.063 | 0.014 | 0.92 |
8 | 1 | 7.70391 | 0.2776 | 0.215275 | 0.029713 | 0.024 | 0.035 | 0.013 | 0.69 |
9 | 1 | 9.06346 | 0.0 | 0.109614 | 0.002171 | 0.402 | 0.488 | 0.314 | 1.0 |
10 | 1 | 8.15191 | 0.1484 | 0.110885 | 0.028579 | 0.145 | 0.173 | 0.114 | 1.0 |
11 | 1 | 6.92952 | 0.0296 | 0.165784 | 0.020115 | 0.046 | 0.066 | 0.025 | 0.73 |
12 | 1 | 7.23778 | 0.2151 | 0.078488 | 0.011581 | 0.022 | 0.031 | 0.014 | 1.0 |
13 | 1 | 8.11582 | 0.4318 | 0.137482 | 0.026547 | 0.059 | 0.073 | 0.045 | 1.0 |
14 | 1 | 7.2717 | 0.1689 | 0.164598 | 0.044446 | 0.029 | 0.045 | 0.013 | 0.84 |
15 | 1 | 7.12125 | 0.1832 | 0.188016 | 0.045678 | 0.033 | 0.051 | 0.015 | 0.91 |
16 | 1 | 6.97728 | 0.0962 | 0.204611 | 0.077852 | 0.037 | 0.043 | 0.03 | 1.0 |
17 | 1 | 7.64969 | 0.0227 | 0.136287 | 0.04673 | 0.081 | 0.105 | 0.056 | 0.99 |
18 | 1 | 8.05674 | 0.0208 | 0.197853 | 0.037224 | 0.083 | 0.097 | 0.069 | 1.0 |
19 | 1 | 8.78094 | 0.2654 | 0.189867 | 0.031747 | 0.068 | 0.089 | 0.046 | 0.94 |
20 | 1 | 6.28786 | 0.4207 | 0.130682 | 0.109921 | 0.053 | 0.039 | 0.011 | 0.74 |
21 | 1 | 6.13773 | 0.1371 | 0.123818 | 0.015897 | 0.028 | 0.025 | 0.007 | 0.72 |
22 | 1 | 8.12888 | 0.0 | 0.16721 | 0.003311 | 0.129 | 0.196 | 0.063 | 1.0 |
23 | 1 | 6.68085 | 0.4713 | 0.228424 | 0.029328 | 0.062 | 0.09 | 0.032 | 1.0 |
24 | 1 | 7.17702 | 0.0178 | 0.18524 | 0.015453 | 0.02 | 0.026 | 0.013 | 0.9 |
25 | 1 | 6.64898 | 0.4762 | 0.171181 | 0.058937 | 0.018 | 0.028 | 0.007 | 0.4 |
26 | 1 | 6.87936 | 0.2927 | 0.179508 | 0.035842 | 0.188 | 0.169 | 0.208 | 1.0 |
27 | 1 | 7.3473 | 0.1017 | 0.247626 | 0.037392 | 0.08 | 0.133 | 0.027 | 0.78 |
28 | 1 | 6.72503 | 0.0266 | 0.179933 | 0.046376 | 0.015 | 0.02 | 0.01 | 0.78 |
29 | 1 | 8.45105 | 0.0 | 0.358556 | 0.016468 | 0.09 | 0.133 | 0.044 | 1.0 |
30 | 1 | 8.60245 | 0.0 | 0.416234 | 0.014721 | 0.148 | 0.194 | 0.1 | 1.0 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
size(X)
(90, 62)
typeof(y), typeof(X)
(DataFrame, DataFrame)
data = [y X]
90 rows × 63 columns (omitted printing of 55 columns)
|   | y | intercept | gdpsh465 | bmp1l | freeop | freetar | h65 | hm65 |
|---|---|-----------|----------|-------|--------|---------|-----|------|
|   | Float64 | Int32 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
1 | -0.0243358 | 1 | 6.59167 | 0.2837 | 0.153491 | 0.043888 | 0.007 | 0.013 |
2 | 0.100473 | 1 | 6.82979 | 0.6141 | 0.313509 | 0.061827 | 0.019 | 0.032 |
3 | 0.0670515 | 1 | 8.89508 | 0.0 | 0.204244 | 0.009186 | 0.26 | 0.325 |
4 | 0.0640892 | 1 | 7.56528 | 0.1997 | 0.248714 | 0.03627 | 0.061 | 0.07 |
5 | 0.0279295 | 1 | 7.1624 | 0.174 | 0.299252 | 0.037367 | 0.017 | 0.027 |
6 | 0.0464074 | 1 | 7.21891 | 0.0 | 0.258865 | 0.02088 | 0.023 | 0.038 |
7 | 0.0673323 | 1 | 7.8536 | 0.0 | 0.182525 | 0.014385 | 0.039 | 0.063 |
8 | 0.0209777 | 1 | 7.70391 | 0.2776 | 0.215275 | 0.029713 | 0.024 | 0.035 |
9 | 0.0335512 | 1 | 9.06346 | 0.0 | 0.109614 | 0.002171 | 0.402 | 0.488 |
10 | 0.0391465 | 1 | 8.15191 | 0.1484 | 0.110885 | 0.028579 | 0.145 | 0.173 |
11 | 0.0761265 | 1 | 6.92952 | 0.0296 | 0.165784 | 0.020115 | 0.046 | 0.066 |
12 | 0.127951 | 1 | 7.23778 | 0.2151 | 0.078488 | 0.011581 | 0.022 | 0.031 |
13 | -0.0243261 | 1 | 8.11582 | 0.4318 | 0.137482 | 0.026547 | 0.059 | 0.073 |
14 | 0.0782934 | 1 | 7.2717 | 0.1689 | 0.164598 | 0.044446 | 0.029 | 0.045 |
15 | 0.112912 | 1 | 7.12125 | 0.1832 | 0.188016 | 0.045678 | 0.033 | 0.051 |
16 | 0.0523082 | 1 | 6.97728 | 0.0962 | 0.204611 | 0.077852 | 0.037 | 0.043 |
17 | 0.0363909 | 1 | 7.64969 | 0.0227 | 0.136287 | 0.04673 | 0.081 | 0.105 |
18 | 0.0297382 | 1 | 8.05674 | 0.0208 | 0.197853 | 0.037224 | 0.083 | 0.097 |
19 | -0.0566436 | 1 | 8.78094 | 0.2654 | 0.189867 | 0.031747 | 0.068 | 0.089 |
20 | 0.0192048 | 1 | 6.28786 | 0.4207 | 0.130682 | 0.109921 | 0.053 | 0.039 |
21 | 0.085206 | 1 | 6.13773 | 0.1371 | 0.123818 | 0.015897 | 0.028 | 0.025 |
22 | 0.133982 | 1 | 8.12888 | 0.0 | 0.16721 | 0.003311 | 0.129 | 0.196 |
23 | 0.173025 | 1 | 6.68085 | 0.4713 | 0.228424 | 0.029328 | 0.062 | 0.09 |
24 | 0.109699 | 1 | 7.17702 | 0.0178 | 0.18524 | 0.015453 | 0.02 | 0.026 |
25 | 0.0159899 | 1 | 6.64898 | 0.4762 | 0.171181 | 0.058937 | 0.018 | 0.028 |
26 | 0.0622498 | 1 | 6.87936 | 0.2927 | 0.179508 | 0.035842 | 0.188 | 0.169 |
27 | 0.109871 | 1 | 7.3473 | 0.1017 | 0.247626 | 0.037392 | 0.08 | 0.133 |
28 | 0.0921063 | 1 | 6.72503 | 0.0266 | 0.179933 | 0.046376 | 0.015 | 0.02 |
29 | 0.083376 | 1 | 8.45105 | 0.0 | 0.358556 | 0.016468 | 0.09 | 0.133 |
30 | 0.0762335 | 1 | 8.60245 | 0.0 | 0.416234 | 0.014721 | 0.148 | 0.194 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
# OLS regression: the term.() trick builds the formula programmatically,
# regressing y on every column except y itself and the explicit intercept
# column (lm adds its own intercept)
reg_ols = lm(term(:y) ~ sum(term.(names(data[!, Not(["y", "intercept"])]))), data, dropcollinear=false)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}
y ~ 1 + gdpsh465 + bmp1l + freeop + freetar + h65 + hm65 + hf65 + p65 + pm65 + pf65 + s65 + sm65 + sf65 + fert65 + mort65 + lifee065 + gpop1 + fert1 + mort1 + invsh41 + geetot1 + geerec1 + gde1 + govwb1 + govsh41 + gvxdxe41 + high65 + highm65 + highf65 + highc65 + highcm65 + highcf65 + human65 + humanm65 + humanf65 + hyr65 + hyrm65 + hyrf65 + no65 + nom65 + nof65 + pinstab1 + pop65 + worker65 + pop1565 + pop6565 + sec65 + secm65 + secf65 + secc65 + seccm65 + seccf65 + syr65 + syrm65 + syrf65 + teapri65 + teasec65 + ex1 + im1 + xr65 + tot1
Coefficients:
─────────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
─────────────────────────────────────────────────────────────────────────────────────
(Intercept) 0.247161 0.784502 0.32 0.7551 -1.35982 1.85414
gdpsh465 -0.00937799 0.0298877 -0.31 0.7560 -0.0706002 0.0518442
bmp1l -0.0688627 0.0325307 -2.12 0.0433 -0.135499 -0.00222666
freeop 0.080069 0.207864 0.39 0.7030 -0.345721 0.505859
freetar -0.488963 0.418163 -1.17 0.2521 -1.34553 0.367605
h65 -2.3621 0.857292 -2.76 0.0102 -4.11818 -0.606016
hm65 0.707143 0.523145 1.35 0.1873 -0.364471 1.77876
hf65 1.69345 0.503189 3.37 0.0022 0.662713 2.72418
p65 0.265527 0.164294 1.62 0.1173 -0.0710144 0.602068
pm65 0.136953 0.151217 0.91 0.3728 -0.172802 0.446708
pf65 -0.331267 0.165123 -2.01 0.0546 -0.669506 0.00697217
s65 0.0390793 0.185522 0.21 0.8347 -0.340946 0.419105
sm65 -0.0306685 0.116794 -0.26 0.7948 -0.26991 0.208573
sf65 -0.179917 0.118099 -1.52 0.1389 -0.421832 0.0619974
fert65 0.00688083 0.0270514 0.25 0.8011 -0.0485315 0.0622932
mort65 -0.233455 0.817422 -0.29 0.7773 -1.90787 1.44096
lifee065 -0.0149145 0.193336 -0.08 0.9391 -0.410944 0.381116
gpop1 0.970185 1.81221 0.54 0.5966 -2.74196 4.68233
fert1 0.00883819 0.0350415 0.25 0.8027 -0.062941 0.0806174
mort1 0.0665629 0.68481 0.10 0.9233 -1.33621 1.46933
invsh41 0.0744611 0.108447 0.69 0.4980 -0.147682 0.296605
geetot1 -0.715105 1.68014 -0.43 0.6736 -4.15671 2.7265
geerec1 0.630005 2.44738 0.26 0.7987 -4.38322 5.64323
gde1 -0.443575 1.67121 -0.27 0.7926 -3.86689 2.97974
govwb1 0.337452 0.437987 0.77 0.4475 -0.559723 1.23463
govsh41 0.463175 1.92544 0.24 0.8117 -3.4809 4.40725
gvxdxe41 -0.793382 2.0594 -0.39 0.7030 -5.01187 3.4251
high65 -0.752484 0.905722 -0.83 0.4131 -2.60777 1.1028
highm65 -0.390262 0.681249 -0.57 0.5713 -1.78574 1.00521
highf65 -0.417749 0.56149 -0.74 0.4631 -1.56791 0.732412
highc65 -2.21579 1.48081 -1.50 0.1458 -5.24909 0.817506
highcm65 0.279718 0.658239 0.42 0.6741 -1.06862 1.62806
highcf65 0.392077 0.766025 0.51 0.6128 -1.17705 1.96121
human65 2.33727 3.30727 0.71 0.4856 -4.43736 9.1119
humanm65 -1.20925 1.61852 -0.75 0.4612 -4.52463 2.10613
humanf65 -1.10395 1.68469 -0.66 0.5176 -4.55489 2.34699
hyr65 54.9139 23.8873 2.30 0.0292 5.98298 103.845
hyrm65 12.935 23.1714 0.56 0.5811 -34.5295 60.3996
hyrf65 9.09258 17.6696 0.51 0.6109 -27.1019 45.2871
no65 0.0372099 0.131973 0.28 0.7801 -0.233125 0.307545
nom65 -0.0211977 0.0649598 -0.33 0.7466 -0.154262 0.111866
nof65 -0.0168578 0.0670026 -0.25 0.8032 -0.154106 0.120391
pinstab1 -0.0499711 0.0309208 -1.62 0.1173 -0.11331 0.0133673
pop65 1.0318e-7 1.31795e-7 0.78 0.4403 -1.6679e-7 3.7315e-7
worker65 0.034079 0.156191 0.22 0.8289 -0.285864 0.354022
pop1565 -0.465535 0.471334 -0.99 0.3318 -1.43102 0.499949
pop6565 -1.35745 0.634942 -2.14 0.0414 -2.65807 -0.0568295
sec65 -0.0108928 0.307662 -0.04 0.9720 -0.641109 0.619324
secm65 0.00334366 0.151192 0.02 0.9825 -0.30636 0.313047
secf65 -0.00230433 0.157972 -0.01 0.9885 -0.325894 0.321286
secc65 -0.491528 0.729041 -0.67 0.5057 -1.9849 1.00184
seccm65 0.259602 0.355653 0.73 0.4715 -0.468921 0.988124
seccf65 0.220652 0.373333 0.59 0.5592 -0.544086 0.985391
syr65 -0.755581 7.97676 -0.09 0.9252 -17.0952 15.5841
syrm65 0.310901 3.89673 0.08 0.9370 -7.67119 8.29299
syrf65 0.759281 4.11063 0.18 0.8548 -7.66097 9.17953
teapri65 3.95452e-5 0.000770041 0.05 0.9594 -0.00153781 0.0016169
teasec65 0.000249667 0.00117123 0.21 0.8327 -0.00214949 0.00264882
ex1 -0.580408 0.241847 -2.40 0.0233 -1.07581 -0.0850066
im1 0.591445 0.250298 2.36 0.0253 0.0787333 1.10416
xr65 -0.000103777 5.41676e-5 -1.92 0.0656 -0.000214734 7.18049e-6
tot1 -0.127901 0.112595 -1.14 0.2656 -0.358542 0.10274
─────────────────────────────────────────────────────────────────────────────────────
# output: estimated regression coefficient corresponding to the target regressor
est_ols = coef(reg_ols)[2]
# output: std. error
std_ols = stderror(reg_ols)[2]
# output: 95% confidence interval
lower_ci = coeftable(reg_ols).cols[5][2]
upper_ci = coeftable(reg_ols).cols[6][2]
0.05184424349589355
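As a quick sanity check, the reported interval should approximately match the usual normal-approximation confidence interval (GLM uses t critical values, so the numbers differ slightly):

# Normal-approximation 95% CI: estimate ± 1.96 × std. error
(est_ols - 1.96 * std_ols, est_ols + 1.96 * std_ols)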
10.3. Summary of OLS results
table_1 = NamedArray(zeros(1, 4))
setnames!(table_1, ["OLS"], 1)
table_1
1×4 Named Matrix{Float64}
A ╲ B │ 1 2 3 4
──────┼───────────────────
OLS │ 0.0 0.0 0.0 0.0
table_1[1] = est_ols
table_1[2] = std_ols
table_1[3] = lower_ci
table_1[4] = upper_ci
table_1
1×4 Named Matrix{Float64}
A ╲ B │ 1 2 3 4
──────┼───────────────────────────────────────────────────
OLS │ -0.00937799 0.0298877 -0.0706002 0.0518442
table_1_pandas = DataFrame( table_1, [ :"Estimator", :"Std. Error", :"lower bound CI", :"upper bound CI"])
model_1 = DataFrame(model = ["OLS"])
table_1_pandas = hcat(model_1,table_1_pandas)
table_1_pandas
1 row × 5 columns
|   | model | Estimator | Std. Error | lower bound CI | upper bound CI |
|---|-------|-----------|------------|----------------|----------------|
|   | String | Float64 | Float64 | Float64 | Float64 |
1 | OLS | -0.00937799 | 0.0298877 | -0.0706002 | 0.0518442 |
header = (["Model", "Estimator", "Std. Error", "lower bound CI", "upper bound CI"])
pretty_table(table_1_pandas; backend = Val(:html), header = header, formatters=ft_round(6), alignment=:c)
| Model | Estimator | Std. Error | lower bound CI | upper bound CI |
|-------|-----------|------------|----------------|----------------|
OLS | -0.009378 | 0.029888 | -0.0706 | 0.051844 |
Least squares provides a rather noisy estimate (high standard error) of the speed of convergence, and does not allow us to answer the question about the convergence hypothesis, since the confidence interval includes zero.
In contrast, we can use the partialling-out approach based on lasso regression ("Double Lasso"): residualize \(Y\) on \(W\) with Lasso, residualize \(D\) on \(W\) with Lasso, and then regress the first set of residuals on the second, as implemented below.
# Importing the .RData file again (the local copy was removed above, so we re-download it)
url = "https://github.com/d2cml-ai/14.388_jl/raw/github_data/data/GrowthData.RData"
download(url, "GrowthData.RData")
growth_read = load("GrowthData.RData")
rm("GrowthData.RData")
# Now we save that dataframe with a new name
growth = growth_read["GrowthData"]
# Create main variables
Y = growth[!, "Outcome"]
Y = DataFrame([Y], [:Y])
D = growth[!, "gdpsh465"]
D = DataFrame([D], [:D])
W = select(growth, Not(["Outcome", "intercept", "gdpsh465"]))
data = [Y D W]
90 rows × 62 columns (omitted printing of 53 columns)
|   | Y | D | bmp1l | freeop | freetar | h65 | hm65 | hf65 | p65 |
|---|---|---|-------|--------|---------|-----|------|------|-----|
|   | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
1 | -0.0243358 | 6.59167 | 0.2837 | 0.153491 | 0.043888 | 0.007 | 0.013 | 0.001 | 0.29 |
2 | 0.100473 | 6.82979 | 0.6141 | 0.313509 | 0.061827 | 0.019 | 0.032 | 0.007 | 0.91 |
3 | 0.0670515 | 8.89508 | 0.0 | 0.204244 | 0.009186 | 0.26 | 0.325 | 0.201 | 1.0 |
4 | 0.0640892 | 7.56528 | 0.1997 | 0.248714 | 0.03627 | 0.061 | 0.07 | 0.051 | 1.0 |
5 | 0.0279295 | 7.1624 | 0.174 | 0.299252 | 0.037367 | 0.017 | 0.027 | 0.007 | 0.82 |
6 | 0.0464074 | 7.21891 | 0.0 | 0.258865 | 0.02088 | 0.023 | 0.038 | 0.006 | 0.5 |
7 | 0.0673323 | 7.8536 | 0.0 | 0.182525 | 0.014385 | 0.039 | 0.063 | 0.014 | 0.92 |
8 | 0.0209777 | 7.70391 | 0.2776 | 0.215275 | 0.029713 | 0.024 | 0.035 | 0.013 | 0.69 |
9 | 0.0335512 | 9.06346 | 0.0 | 0.109614 | 0.002171 | 0.402 | 0.488 | 0.314 | 1.0 |
10 | 0.0391465 | 8.15191 | 0.1484 | 0.110885 | 0.028579 | 0.145 | 0.173 | 0.114 | 1.0 |
11 | 0.0761265 | 6.92952 | 0.0296 | 0.165784 | 0.020115 | 0.046 | 0.066 | 0.025 | 0.73 |
12 | 0.127951 | 7.23778 | 0.2151 | 0.078488 | 0.011581 | 0.022 | 0.031 | 0.014 | 1.0 |
13 | -0.0243261 | 8.11582 | 0.4318 | 0.137482 | 0.026547 | 0.059 | 0.073 | 0.045 | 1.0 |
14 | 0.0782934 | 7.2717 | 0.1689 | 0.164598 | 0.044446 | 0.029 | 0.045 | 0.013 | 0.84 |
15 | 0.112912 | 7.12125 | 0.1832 | 0.188016 | 0.045678 | 0.033 | 0.051 | 0.015 | 0.91 |
16 | 0.0523082 | 6.97728 | 0.0962 | 0.204611 | 0.077852 | 0.037 | 0.043 | 0.03 | 1.0 |
17 | 0.0363909 | 7.64969 | 0.0227 | 0.136287 | 0.04673 | 0.081 | 0.105 | 0.056 | 0.99 |
18 | 0.0297382 | 8.05674 | 0.0208 | 0.197853 | 0.037224 | 0.083 | 0.097 | 0.069 | 1.0 |
19 | -0.0566436 | 8.78094 | 0.2654 | 0.189867 | 0.031747 | 0.068 | 0.089 | 0.046 | 0.94 |
20 | 0.0192048 | 6.28786 | 0.4207 | 0.130682 | 0.109921 | 0.053 | 0.039 | 0.011 | 0.74 |
21 | 0.085206 | 6.13773 | 0.1371 | 0.123818 | 0.015897 | 0.028 | 0.025 | 0.007 | 0.72 |
22 | 0.133982 | 8.12888 | 0.0 | 0.16721 | 0.003311 | 0.129 | 0.196 | 0.063 | 1.0 |
23 | 0.173025 | 6.68085 | 0.4713 | 0.228424 | 0.029328 | 0.062 | 0.09 | 0.032 | 1.0 |
24 | 0.109699 | 7.17702 | 0.0178 | 0.18524 | 0.015453 | 0.02 | 0.026 | 0.013 | 0.9 |
25 | 0.0159899 | 6.64898 | 0.4762 | 0.171181 | 0.058937 | 0.018 | 0.028 | 0.007 | 0.4 |
26 | 0.0622498 | 6.87936 | 0.2927 | 0.179508 | 0.035842 | 0.188 | 0.169 | 0.208 | 1.0 |
27 | 0.109871 | 7.3473 | 0.1017 | 0.247626 | 0.037392 | 0.08 | 0.133 | 0.027 | 0.78 |
28 | 0.0921063 | 6.72503 | 0.0266 | 0.179933 | 0.046376 | 0.015 | 0.02 | 0.01 | 0.78 |
29 | 0.083376 | 8.45105 | 0.0 | 0.358556 | 0.016468 | 0.09 | 0.133 | 0.044 | 1.0 |
30 | 0.0762335 | 8.60245 | 0.0 | 0.416234 | 0.014721 | 0.148 | 0.194 | 0.1 | 1.0 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
10.4. Method 1: Using Lasso
using Lasso, GLM
# Partial out W from Y: fit Lasso of Y on all controls
# (α is Lasso.jl's elastic-net mixing parameter; α = 1 is pure lasso)
lasso_model = fit(LassoModel, term(:Y) ~ sum(term.(names(data[!, Not(["Y", "D"])]))), data; α = 0.8)
r_Y = residuals(lasso_model)
r_Y = DataFrame([r_Y], [:r_Y])
# Partial out W from D: fit Lasso of D on the same controls
lasso_model = fit(LassoModel, term(:D) ~ sum(term.(names(data[!, Not(["Y", "D"])]))), data; α = 0.8)
r_D = residuals(lasso_model)
r_D = DataFrame([r_D], [:r_D])
# OLS of the residualized Y on the residualized D
data_aux = [r_Y r_D]
fm_1 = @formula(r_Y ~ r_D)
partial_lasso_fit = lm(fm_1, data_aux)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}
r_Y ~ 1 + r_D
Coefficients:
────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────────────
(Intercept) -2.96067e-18 0.00417979 -0.00 1.0000 -0.00830646 0.00830646
r_D -0.0500007 0.0152165 -3.29 0.0015 -0.0802403 -0.0197611
────────────────────────────────────────────────────────────────────────────────
# output: estimated regression coefficient corresponding to the target regressor
est_lasso = coef(partial_lasso_fit)[2]
# output: std. error
std_lasso = stderror(partial_lasso_fit)[2]
# output: 95% confidence interval
lower_ci_lasso = coeftable(partial_lasso_fit).cols[5][2]
upper_ci_lasso = coeftable(partial_lasso_fit).cols[6][2]
-0.01976108418900318
# Regress residuals
partial_lasso_fit = lm(fm_1, data_aux)
partial_lasso_est = coef(partial_lasso_fit)[2]
println("Coefficient for D via partialling-out using lasso is: ", partial_lasso_est )
Coefficient for D via partialling-out using lasso is: -0.050000699175125084
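Since est_lasso was extracted from an identical fit, the two numbers should coincide exactly; a quick consistency check:

# Both coefficients come from identical lm fits, so this should print true
println(est_lasso == partial_lasso_est)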
10.5. Summary of Lasso results
Finally, let us have a look at the results.
table_2 = NamedArray(zeros(1, 4))
setnames!(table_2, ["DOUBLE LASSO"], 1)
table_2
1×4 Named Matrix{Float64}
A ╲ B │ 1 2 3 4
─────────────┼───────────────────
DOUBLE LASSO │ 0.0 0.0 0.0 0.0
table_2[1] = est_lasso
table_2[2] = std_lasso
table_2[3] = lower_ci_lasso
table_2[4] = upper_ci_lasso
table_2
1×4 Named Matrix{Float64}
A ╲ B │ 1 2 3 4
─────────────┼───────────────────────────────────────────────
DOUBLE LASSO │ -0.0500007 0.0152165 -0.0802403 -0.0197611
table_2_pandas = DataFrame( table_2, [ :"Estimator", :"Std. Error", :"lower bound CI", :"upper bound CI"])
model_2 = DataFrame(model = ["DOUBLE LASSO"])
table_2_pandas = hcat(model_2,table_2_pandas)
table_2_pandas
1 row × 5 columns
|   | model | Estimator | Std. Error | lower bound CI | upper bound CI |
|---|-------|-----------|------------|----------------|----------------|
|   | String | Float64 | Float64 | Float64 | Float64 |
1 | DOUBLE LASSO | -0.0500007 | 0.0152165 | -0.0802403 | -0.0197611 |
header = (["Model", "Estimator", "Std. Error", "lower bound CI", "upper bound CI"])
pretty_table(table_2_pandas; backend = Val(:html), header = header, formatters=ft_round(6), alignment=:c)
| Model | Estimator | Std. Error | lower bound CI | upper bound CI |
|-------|-----------|------------|----------------|----------------|
DOUBLE LASSO | -0.050001 | 0.015217 | -0.08024 | -0.019761 |
table_3 = vcat(table_1, table_2)
setnames!(table_3, ["OLS", "DOUBLE LASSO"], 1)
table_3
2×4 Named Matrix{Float64}
vcat ╲ B │ 1 2 3 4
─────────────┼───────────────────────────────────────────────────
OLS │ -0.00937799 0.0298877 -0.0706002 0.0518442
DOUBLE LASSO │ -0.0500007 0.0152165 -0.0802403 -0.0197611
table_3_pandas = DataFrame( table_3, [ :"Estimator", :"Std. Error", :"lower bound CI", :"upper bound CI"])
model_3 = DataFrame(model = ["OLS", "DOUBLE LASSO"])
table_3_pandas = hcat(model_3, table_3_pandas)
table_3_pandas
2 rows × 5 columns
|   | model | Estimator | Std. Error | lower bound CI | upper bound CI |
|---|-------|-----------|------------|----------------|----------------|
|   | String | Float64 | Float64 | Float64 | Float64 |
1 | OLS | -0.00937799 | 0.0298877 | -0.0706002 | 0.0518442 |
2 | DOUBLE LASSO | -0.0500007 | 0.0152165 | -0.0802403 | -0.0197611 |
header = (["Model", "Estimator", "Std. Error", "lower bound CI", "upper bound CI"])
pretty_table(table_3_pandas; backend = Val(:html), header = header, formatters=ft_round(6), alignment=:c)
| Model | Estimator | Std. Error | lower bound CI | upper bound CI |
|-------|-----------|------------|----------------|----------------|
OLS | -0.009378 | 0.029888 | -0.0706 | 0.051844 |
DOUBLE LASSO | -0.050001 | 0.015217 | -0.08024 | -0.019761 |
The least squares method provides a rather noisy estimate of the speed of convergence; we cannot answer the question of whether poor countries grow faster than rich countries. Least squares does not work well when the ratio \(p/n\) is large.
In sharp contrast, partialling-out via Lasso provides a more precise estimate. The Lasso-based point estimate is \(-5\%\), and the \(95\%\) confidence interval for the (annual) rate of convergence, approximately \([-8\%, -2\%]\), includes only negative numbers. This empirical evidence does support the convergence hypothesis.
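For reference, the percentages quoted above can be reproduced (up to rounding) from the quantities stored earlier:

# Express the Double Lasso point estimate and 95% CI as annual percentages
println("Point estimate: ", round(100 * est_lasso, digits = 1), "%")
println("95% CI: [", round(100 * lower_ci_lasso, digits = 1), "%, ",
    round(100 * upper_ci_lasso, digits = 1), "%]")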
using Plots
xerror = table_3_pandas[!,2] .- table_3_pandas[!,4]
2-element Named Vector{Float64}
vcat │
─────────────┼──────────
OLS │ 0.0612222
DOUBLE LASSO │ 0.0302396
# Plot the point estimates with symmetric error bars spanning the 95% CIs
scatter(table_3_pandas[!,2], ["OLS", "Double Lasso"], label = "Estimator", xerrors = xerror,
    xtick = -1:0.008:1, linestyle = :dash, seriescolor=:orange)
plot!(size=(950,400))
10.6. Using hdmjl
10.6.1. First, we call the hdmjl.jl script to load all the functions we need, mainly the rlasso function
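The include below assumes the script sits in an hdmjl/ folder next to this notebook; a small optional guard (the path is an assumption about your local layout) makes the failure mode explicit:

# Fail early with a clear message if the script is missing
isfile("hdmjl/hdmjl.jl") || error("hdmjl/hdmjl.jl not found; download the script first")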
include("hdmjl/hdmjl.jl")
10.6.2. Use rlasso_arg to declare the parameters and the matrices needed; only the first two arguments need to change
res_Y_0 = rlasso_arg( W, Y, nothing, true, true, true, false, false,
nothing, 1.1, nothing, 5000, 15, 10^(-5), -Inf, true, Inf, true )
rlasso_arg(90×60 DataFrame
Row │ bmp1l freeop freetar h65 hm65 hf65 p65 pm65 ⋯
│ Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float6 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ 0.2837 0.153491 0.043888 0.007 0.013 0.001 0.29 0.3 ⋯
2 │ 0.6141 0.313509 0.061827 0.019 0.032 0.007 0.91 1.0
3 │ 0.0 0.204244 0.009186 0.26 0.325 0.201 1.0 1.0
4 │ 0.1997 0.248714 0.03627 0.061 0.07 0.051 1.0 1.0
5 │ 0.174 0.299252 0.037367 0.017 0.027 0.007 0.82 0.8 ⋯
6 │ 0.0 0.258865 0.02088 0.023 0.038 0.006 0.5 0.5
7 │ 0.0 0.182525 0.014385 0.039 0.063 0.014 0.92 0.9
8 │ 0.2776 0.215275 0.029713 0.024 0.035 0.013 0.69 0.6
9 │ 0.0 0.109614 0.002171 0.402 0.488 0.314 1.0 1.0 ⋯
10 │ 0.1484 0.110885 0.028579 0.145 0.173 0.114 1.0 1.0
11 │ 0.0296 0.165784 0.020115 0.046 0.066 0.025 0.73 0.8
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
81 │ 0.0 0.293138 0.005517 0.245 0.251 0.238 1.0 1.0
82 │ 0.0 0.30472 0.011658 0.246 0.26 0.19 1.0 1.0 ⋯
83 │ 0.0363 0.288405 0.011589 0.183 0.222 0.142 1.0 1.0
84 │ 0.0 0.345485 0.006503 0.188 0.248 0.136 1.0 1.0
85 │ 0.0 0.28844 0.005995 0.256 0.301 0.199 1.0 1.0
86 │ 0.0 0.371898 0.014586 0.255 0.336 0.17 0.98 0.9 ⋯
87 │ 0.005 0.296437 0.013615 0.108 0.117 0.093 1.0 1.0
88 │ 0.0 0.265778 0.008629 0.288 0.337 0.237 1.0 1.0
89 │ 0.0 0.282939 0.005048 0.188 0.236 0.139 1.0 1.0
90 │ 0.0 0.150366 0.024377 0.257 0.338 0.215 1.0 1.0 ⋯
53 columns and 69 rows omitted, 90×1 DataFrame
Row │ Y
│ Float64
─────┼────────────
1 │ -0.0243358
2 │ 0.100473
3 │ 0.0670515
4 │ 0.0640892
5 │ 0.0279295
6 │ 0.0464074
7 │ 0.0673323
8 │ 0.0209777
9 │ 0.0335512
10 │ 0.0391465
11 │ 0.0761265
⋮ │ ⋮
81 │ 0.038095
82 │ 0.034213
83 │ 0.0527591
84 │ 0.0384156
85 │ 0.0318948
86 │ 0.031196
87 │ 0.0340957
88 │ 0.0469005
89 │ 0.0397734
90 │ 0.0406415
69 rows omitted, nothing, true, true, true, false, false, nothing, 1.1, nothing, 5000, 15, 1.0000000000000003e-5, -Inf, true, Inf, true)
10.6.3. Then we call the rlasso function with the arguments declared above
res_Y = rlasso(res_Y_0)
Dict{String, Any} with 19 entries:
"tss" => 0.23435
"dev" => [-0.0696852, 0.0551231, 0.021702, 0.0187397, -0.0174199, 0.…
"model" => [0.114953 -0.0666109 … -42.3159 -0.023963; 0.445353 0.09340…
"loadings" => [0.0105688 0.00313556 … 6.58751 0.00342864]
"sigma" => [0.0477366]
"lambda0" => 74.3078
"lambda" => 60×2 DataFrame…
"intercept" => 0.0581009
"Xy" => [-0.417365, 0.0764848, -0.0198564, -0.0288688, -0.00603343,…
"iter" => 4
"residuals" => [-0.0609987, 0.0887764, 0.00895057, 0.0210787, -0.017023, -…
"rss" => 0.23435
"index" => Bool[1, 0, 0, 0, 0, 0, 0, 0, 0, 0 … 0, 0, 0, 0, 0, 0, 0, …
"beta" => 60×2 DataFrame…
"options" => Dict{String, Any}("intercept"=>true, "post"=>true, "meanx"=…
"x1" => [0.114953; 0.445353; … ; -0.168747; -0.168747;;]
"pen" => Dict{String, Any}("lambda0"=>74.3078, "lambda"=>[0.786932; …
"startingval" => [-0.0364456, 0.0729772, 0.0168573, 0.0172799, -0.0214212, 0…
"coefficients" => 61×2 DataFrame…
10.6.4. The output is a dictionary, so you can access its entries by key; here we need the residuals
res_Y = res_Y["residuals"]
90-element Vector{Float64}:
-0.06099874009956701
0.08877641223870311
0.008950566442299697
0.021078676644940578
-0.017022974171323174
-0.011693476557700297
0.009231424442299706
-0.01614625852288254
-0.0245496795577003
-0.007740475439663712
0.02026232962756845
0.08610442802511419
-0.04979783062530005
⋮
0.04302905133674367
0.007127644442299706
-0.0200058925577003
-0.0238879135577003
-0.002598750661576797
-0.019685277557700297
-0.0262061255577003
-0.0269049305577003
-0.023627432161540035
-0.011200453557700302
-0.018327543557700297
-0.0174593775577003
10.6.5. We do the same for the second equation
res_D_0 = rlasso_arg( W, D, nothing, true, true, true, false, false,
nothing, 1.1, nothing, 5000, 15, 10^(-5), -Inf, true, Inf, true )
res_D = rlasso(res_D_0)["residuals"]
# Convert the vector into a matrix because this lm method requires X to be a matrix
res_D = reshape(res_D, length(res_D), 1)
90×1 Matrix{Float64}:
0.5222476356191383
0.13027816139145787
0.07232102747813007
-0.13196899833674824
0.09840468850707951
0.35730568468428503
0.2940976463777745
0.7977842044326772
-0.012818468998378796
0.0894831204466463
0.2479204691420016
-0.15386797901857718
0.519165734469782
⋮
0.3977750029233086
-0.21166662271143571
0.32571865064338956
0.18176367161336326
-0.3787033106715221
-0.30497902649692843
-0.11113678995670373
0.11492064458763807
-0.08623958408114252
-0.013143270340858715
0.08676507620589669
0.08653048463271396
10.6.6. Regress the residuals
lm(res_D, res_Y)
LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}}}}:
Coefficients:
───────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
───────────────────────────────────────────────────────────────────
x1 -0.0498115 0.0138578 -3.59 0.0005 -0.0773467 -0.0222762
───────────────────────────────────────────────────────────────────
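The hdmjl-based slope can be extracted and compared with Method 1; a short sketch using the objects defined above:

# Extract the coefficient from the residual-on-residual regression
est_hdm = coef(lm(res_D, res_Y))[1]
println("Method 1 (Lasso.jl): ", partial_lasso_est)
println("Method 2 (hdmjl):    ", est_hdm)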