Chapter 2 Regression Approach to Treatment Effect Estimation

Suppose we want to use a regression model to estimate the treatment effect on SAT scores while controlling for the covariate ‘SES’.

Creating a small toy data set:

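# Six students: the first three are controls (Grp = 0), the last three are treated (Grp = 1)
ID = c(1:6)
Grp = rep(c(0, 1), each = 3)
Score = c(550, 600, 650, 600, 720, 630)
SES = c(1, 2, 2, 2, 3, 2)

SATdat = data.frame(ID, Grp, Score, SES)
# Treat SES as a categorical variable with levels 1, 2, 3
SATdat$SES = factor(SATdat$SES)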

2.1 Regression with no confounder

lm1 = lm(Score ~ Grp, data = SATdat)
summary(lm1)
## 
## Call:
## lm(formula = Score ~ Grp, data = SATdat)
## 
## Residuals:
##          1          2          3          4          5          6 
## -5.000e+01 -7.638e-14  5.000e+01 -5.000e+01  7.000e+01 -2.000e+01 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   600.00      32.66  18.371 5.17e-05 ***
## Grp            50.00      46.19   1.083     0.34    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 56.57 on 4 degrees of freedom
## Multiple R-squared:  0.2266, Adjusted R-squared:  0.03323 
## F-statistic: 1.172 on 1 and 4 DF,  p-value: 0.3399

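This regression is equivalent to a simple two-sample t-test. The estimated difference (50) and the t statistic (1.08) are the same; the degrees of freedom and p-value differ slightly because t.test() uses the Welch test by default, which does not assume equal variances: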

t.test(SATdat$Score[SATdat$Grp==1], SATdat$Score[SATdat$Grp==0])
## 
##  Welch Two Sample t-test
## 
## data:  SATdat$Score[SATdat$Grp == 1] and SATdat$Score[SATdat$Grp == 0]
## t = 1.0825, df = 3.8173, p-value = 0.3426
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -80.69522 180.69522
## sample estimates:
## mean of x mean of y 
##       650       600

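The difference is not statistically significant (p ≈ 0.34). With only three observations per group, the sample is likely too small to detect a difference of this size.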
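As a quick check on the comparison above, the sketch below recomputes the group means by hand and reruns the t-test with `var.equal = TRUE`, which requests the pooled-variance test instead of the Welch default. Under that setting the t statistic, degrees of freedom, and p-value should agree with the `Grp` row of the regression. The object names `grp_means`, `mean_diff`, `tt`, and `lm_row` are illustrative and not part of the original analysis.

```r
# Rebuild the chapter's data so this block runs on its own
SATdat = data.frame(
  Grp   = rep(c(0, 1), each = 3),
  Score = c(550, 600, 650, 600, 720, 630)
)
lm1 = lm(Score ~ Grp, data = SATdat)

# The Grp coefficient is the difference between the two group means
grp_means = tapply(SATdat$Score, SATdat$Grp, mean)
mean_diff = unname(grp_means["1"] - grp_means["0"])
mean_diff
coef(lm1)["Grp"]

# Pooled-variance t-test (var.equal = TRUE) instead of the Welch default
tt = t.test(SATdat$Score[SATdat$Grp == 1],
            SATdat$Score[SATdat$Grp == 0],
            var.equal = TRUE)

# Compare with the Grp row of the regression table
lm_row = summary(lm1)$coefficients["Grp", ]
c(t = unname(tt$statistic), df = unname(tt$parameter), p = tt$p.value)
lm_row[c("t value", "Pr(>|t|)")]
```

The pooled test uses 4 degrees of freedom, the same as the residual degrees of freedom of `lm1`, which is why its p-value matches the regression rather than the Welch result.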

2.2 Including confounder SES

ses.lm = lm(Score ~ Grp+SES, data = SATdat)
summary(ses.lm)
## 
## Call:
## lm(formula = Score ~ Grp + SES, data = SATdat)
## 
## Residuals:
##          1          2          3          4          5          6 
##  1.066e-14 -2.500e+01  2.500e+01 -1.500e+01 -4.016e-15  1.500e+01 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   550.00      29.15  18.865   0.0028 **
## Grp           -10.00      29.15  -0.343   0.7643   
## SES2           75.00      35.71   2.100   0.1705   
## SES3          180.00      50.50   3.565   0.0705 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.15 on 2 degrees of freedom
## Multiple R-squared:  0.8973, Adjusted R-squared:  0.7432 
## F-statistic: 5.824 on 3 and 2 DF,  p-value: 0.1501
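The sign of the `Grp` estimate flips from 50 to -10 once SES is in the model. One way to see where -10 comes from is to tabulate the mean score in every Grp-by-SES cell. The sketch below does this with `tapply()`; the names `cell_means` and `diff_ses2` are illustrative. In this toy data set only SES = 2 contains both treated and control students, so that stratum alone determines the adjusted group difference.

```r
# Rebuild the chapter's data so this block runs on its own
SATdat = data.frame(
  Grp   = rep(c(0, 1), each = 3),
  Score = c(550, 600, 650, 600, 720, 630),
  SES   = factor(c(1, 2, 2, 2, 3, 2))
)
ses.lm = lm(Score ~ Grp + SES, data = SATdat)

# Mean Score in each Grp x SES cell; NA marks cells with no observations
cell_means = with(SATdat, tapply(Score, list(Grp = Grp, SES = SES), mean))
cell_means

# Treated minus control within SES = 2, the only level observed in both groups
diff_ses2 = unname(cell_means["1", "2"] - cell_means["0", "2"])
diff_ses2
coef(ses.lm)["Grp"]
```

The unadjusted difference of 50 mixes the treatment comparison with the fact that the treated group has higher SES on average; the within-stratum comparison removes that imbalance.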

Because the model contains no Grp-by-SES interaction, the estimated effect (-10) is assumed to be the same difference between groups within each SES level:

2.2.1 Low SES (SES = 1)

predict(ses.lm, newdata = data.frame(Grp=0, SES=factor(1)))
##   1 
## 550
predict(ses.lm, newdata = data.frame(Grp=1, SES=factor(1)))
##   1 
## 540

2.2.2 Middle SES (SES = 2)

predict(ses.lm, newdata = data.frame(Grp=0, SES=factor(2)))
##   1 
## 625
predict(ses.lm, newdata = data.frame(Grp=1, SES=factor(2)))
##   1 
## 615

2.2.3 High SES (SES = 3)

predict(ses.lm, newdata = data.frame(Grp=0, SES=factor(3)))
##   1 
## 730
predict(ses.lm, newdata = data.frame(Grp=1, SES=factor(3)))
##   1 
## 720
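The six predictions above can also be produced in a single call. The sketch below builds every Grp-by-SES combination with `expand.grid()`, predicts for all of them at once, and then subtracts the control prediction from the treated prediction within each SES level. The names `grid` and `effect` are illustrative helpers, not part of the original analysis.

```r
# Rebuild the chapter's data so this block runs on its own
SATdat = data.frame(
  Grp   = rep(c(0, 1), each = 3),
  Score = c(550, 600, 650, 600, 720, 630),
  SES   = factor(c(1, 2, 2, 2, 3, 2))
)
ses.lm = lm(Score ~ Grp + SES, data = SATdat)

# All combinations of group and SES level (Grp varies fastest)
grid = expand.grid(Grp = c(0, 1), SES = factor(1:3))
grid$Pred = predict(ses.lm, newdata = grid)
grid

# Treated minus control prediction within each SES level
effect = grid$Pred[grid$Grp == 1] - grid$Pred[grid$Grp == 0]
names(effect) = levels(grid$SES)
effect
```

Every entry of `effect` equals the `Grp` coefficient, because an additive model forces the same group difference at every SES level. Note that some of these cells (for example, treated students with SES = 1) are not observed in the data, so those predictions are extrapolations from the model.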