12. ColliderBias Hollywood#
Here is a simple mnemonic example to illustate the collider or M-bias.
Here the idea is that people who get to Hollywood have to have high congenility = talent + beauty. Funnily enough this induces a negative correlation between talents and looks, when we condition on the set of actors or celebrities. This simple example explains an anecdotal observation that “talent and beaty are negatively correlated” for celebrities.
install.packages("dagitty")
library(dagitty)
g <- dagitty( "dag{ T -> C <- B }" )
plot(g)
Plot coordinates for graph not supplied! Generating coordinates, see ?coordinates for how to set your own.
#collider bias
n=1000000
T = rnorm(n) #talent
B = rnorm(n) #beaty
C = T+B + rnorm(n) #congeniality
T.H= subset(T, C>0) # condition on C>0
B.H= subset(B, C>0) # condition on C>0
summary(lm(T~ B)) #regression of T on B
summary(lm(T~ B +C)) #regression of T on B and C
summary(lm(T.H~ B.H)) #regression of T on B, conditional on C>0.
Call:
lm(formula = T ~ B)
Residuals:
Min 1Q Median 3Q Max
-5.2137 -0.6738 -0.0004 0.6747 5.2290
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0004180 0.0010000 0.418 0.676
B 0.0004454 0.0009999 0.445 0.656
Residual standard error: 1 on 999998 degrees of freedom
Multiple R-squared: 1.984e-07, Adjusted R-squared: -8.016e-07
F-statistic: 0.1984 on 1 and 999998 DF, p-value: 0.656
Call:
lm(formula = T ~ B + C)
Residuals:
Min 1Q Median 3Q Max
-3.3345 -0.4762 -0.0001 0.4762 3.4964
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0005995 0.0007068 -0.848 0.396
B -0.5004839 0.0008660 -577.921 <2e-16 ***
C 0.5003564 0.0004999 1000.852 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7068 on 999997 degrees of freedom
Multiple R-squared: 0.5004, Adjusted R-squared: 0.5004
F-statistic: 5.009e+05 on 2 and 999997 DF, p-value: < 2.2e-16
Call:
lm(formula = T.H ~ B.H)
Residuals:
Min 1Q Median 3Q Max
-3.9738 -0.5843 -0.0201 0.5620 4.8936
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.585032 0.001362 429.6 <2e-16 ***
B.H -0.269361 0.001362 -197.8 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8552 on 500352 degrees of freedom
Multiple R-squared: 0.07254, Adjusted R-squared: 0.07254
F-statistic: 3.914e+04 on 1 and 500352 DF, p-value: < 2.2e-16
We can also use package Dagitty to illustrate collider bias, also known as M-bias.
## If we want to infer causal effec of B on T,
## we can apply the command to figure out
## variables we should condition on:
adjustmentSets( g, "T", "B" )
## empty set -- we should not condition on the additional
## variable C.
## Generate data where C = .5T + .5B
set.seed( 123); d <- simulateSEM( g, .5 )
confint( lm( T ~ B, d ) )["B",] # includes 0
confint( lm( T ~ B + C, d ) )["B",] # does not include 0
{}
- 2.5 %
- -0.100051394448281
- 97.5 %
- 0.0712473972196271
- 2.5 %
- -0.427145241356265
- 97.5 %
- -0.256291434974066