12. ColliderBias Hollywood#

Here is a simple mnemonic example to illustate the collider or M-bias.

Here the idea is that people who get to Hollywood have to have high congenility = talent + beauty. Funnily enough this induces a negative correlation between talents and looks, when we condition on the set of actors or celebrities. This simple example explains an anecdotal observation that “talent and beaty are negatively correlated” for celebrities.

install.packages("dagitty")
library(dagitty)
g <- dagitty( "dag{ T -> C <- B }" )
plot(g)
Plot coordinates for graph not supplied! Generating coordinates, see ?coordinates for how to set your own.
../_images/69d8ec1f7d6f5f681477ce05a9675196e706bd172389ca141c0347eae9123b0a.png
#collider bias
n=1000000
T = rnorm(n)   #talent
B = rnorm(n)   #beaty
C = T+B + rnorm(n) #congeniality
T.H= subset(T, C>0) # condition on C>0
B.H= subset(B, C>0) # condition on C>0

summary(lm(T~ B))  #regression of T on B
summary(lm(T~ B +C)) #regression of T on B and C
summary(lm(T.H~ B.H)) #regression of T on B, conditional on C>0.
Call:
lm(formula = T ~ B)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.2137 -0.6738 -0.0004  0.6747  5.2290 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0004180  0.0010000   0.418    0.676
B           0.0004454  0.0009999   0.445    0.656

Residual standard error: 1 on 999998 degrees of freedom
Multiple R-squared:  1.984e-07,	Adjusted R-squared:  -8.016e-07 
F-statistic: 0.1984 on 1 and 999998 DF,  p-value: 0.656
Call:
lm(formula = T ~ B + C)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3345 -0.4762 -0.0001  0.4762  3.4964 

Coefficients:
              Estimate Std. Error  t value Pr(>|t|)    
(Intercept) -0.0005995  0.0007068   -0.848    0.396    
B           -0.5004839  0.0008660 -577.921   <2e-16 ***
C            0.5003564  0.0004999 1000.852   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7068 on 999997 degrees of freedom
Multiple R-squared:  0.5004,	Adjusted R-squared:  0.5004 
F-statistic: 5.009e+05 on 2 and 999997 DF,  p-value: < 2.2e-16
Call:
lm(formula = T.H ~ B.H)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9738 -0.5843 -0.0201  0.5620  4.8936 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.585032   0.001362   429.6   <2e-16 ***
B.H         -0.269361   0.001362  -197.8   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8552 on 500352 degrees of freedom
Multiple R-squared:  0.07254,	Adjusted R-squared:  0.07254 
F-statistic: 3.914e+04 on 1 and 500352 DF,  p-value: < 2.2e-16

We can also use package Dagitty to illustrate collider bias, also known as M-bias.

## If we want to infer causal effec of B on T,
## we can apply the command to figure out 
## variables we should condition on:

adjustmentSets( g, "T", "B" ) 

## empty set -- we should not condition on the additional
## variable C.

## Generate data where C = .5T + .5B
set.seed( 123); d <- simulateSEM( g, .5 )
confint( lm( T ~ B, d ) )["B",] # includes 0
confint( lm( T ~ B + C, d ) )["B",] # does not include 0
 {}
2.5 %
-0.100051394448281
97.5 %
0.0712473972196271
2.5 %
-0.427145241356265
97.5 %
-0.256291434974066