Chapter 2 Regression Approach to Treatment Effect Estimation
Suppose one would like to use a regression model to estimate a treatment effect on SAT scores while controlling for the covariate SES (socioeconomic status).
Simulating data:
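ID = c(1:6)                                # student identifier
Grp = rep(c(0, 1), each = 3)               # group indicator (0/1)
Score = c(550, 600, 650, 600, 720, 630)    # SAT score
SES = c(1, 2, 2, 2, 3, 2)                  # SES level (1 = low)
SATdat = data.frame(ID, Grp, Score, SES)
SATdat$SES = factor(SATdat$SES)            # treat SES as categorical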
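Before fitting any model, it can help to check how SES is distributed across the two groups. The sketch below rebuilds the simulated data so that it runs on its own and cross-tabulates Grp against SES. The object name ses_by_grp is introduced here for illustration only.

```r
# Rebuild the simulated data so this block runs on its own
SATdat = data.frame(
  ID    = 1:6,
  Grp   = rep(c(0, 1), each = 3),
  Score = c(550, 600, 650, 600, 720, 630),
  SES   = factor(c(1, 2, 2, 2, 3, 2))
)

# Rows are the two groups, columns are the SES levels
ses_by_grp = table(Grp = SATdat$Grp, SES = SATdat$SES)
ses_by_grp
```

The only student at SES level 1 is in group 0 and the only student at SES level 3 is in group 1, so the groups are not balanced on SES.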
2.1 Regression with no confounder
lm1 = lm(Score ~ Grp, data = SATdat)
summary(lm1)
##
## Call:
## lm(formula = Score ~ Grp, data = SATdat)
##
## Residuals:
##          1          2          3          4          5          6
## -5.000e+01 -7.638e-14  5.000e+01 -5.000e+01  7.000e+01 -2.000e+01
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   600.00      32.66  18.371 5.17e-05 ***
## Grp            50.00      46.19   1.083     0.34
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 56.57 on 4 degrees of freedom
## Multiple R-squared: 0.2266, Adjusted R-squared: 0.03323
## F-statistic: 1.172 on 1 and 4 DF, p-value: 0.3399
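With a single binary predictor, the intercept is the mean score of the Grp = 0 group and the Grp coefficient is the difference between the two group means. A minimal sketch checking this against the data, rebuilt here so the block is self-contained (grp_means and mean_diff are illustrative names):

```r
SATdat = data.frame(
  Grp   = rep(c(0, 1), each = 3),
  Score = c(550, 600, 650, 600, 720, 630)
)
lm1 = lm(Score ~ Grp, data = SATdat)

# Mean score in each group
grp_means = tapply(SATdat$Score, SATdat$Grp, mean)
grp_means

# Difference in means (Grp 1 minus Grp 0) next to the regression slope
mean_diff = as.numeric(grp_means["1"] - grp_means["0"])
mean_diff
as.numeric(coef(lm1)["Grp"])
```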
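The regression on Grp gives the same estimate and t statistic as a simple two-sample t-test (t.test() defaults to the Welch test, so its degrees of freedom and p-value differ slightly from the regression's):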
t.test(SATdat$Score[SATdat$Grp==1], SATdat$Score[SATdat$Grp==0])
##
## Welch Two Sample t-test
##
## data: SATdat$Score[SATdat$Grp == 1] and SATdat$Score[SATdat$Grp == 0]
## t = 1.0825, df = 3.8173, p-value = 0.3426
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -80.69522 180.69522
## sample estimates:
## mean of x mean of y
##       650       600
The difference is not statistically significant; with only three observations per group, the sample is too small to detect it.
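The Welch test does not pool the two group variances, which is why its degrees of freedom (3.8173) differ from the regression's 4. Setting var.equal = TRUE in t.test() requests the pooled-variance test, which should reproduce the regression's t statistic and p-value for Grp. A minimal sketch, with the data rebuilt so the block is self-contained (tt is an illustrative name):

```r
SATdat = data.frame(
  Grp   = rep(c(0, 1), each = 3),
  Score = c(550, 600, 650, 600, 720, 630)
)
lm1 = lm(Score ~ Grp, data = SATdat)

# Pooled-variance (Student) two-sample t-test
tt = t.test(SATdat$Score[SATdat$Grp == 1],
            SATdat$Score[SATdat$Grp == 0],
            var.equal = TRUE)

# Compare t statistic and p-value with the Grp row of the regression table
c(t = unname(tt$statistic), p = tt$p.value)
summary(lm1)$coefficients["Grp", c("t value", "Pr(>|t|)")]
```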
2.2 Including confounder SES
ses.lm = lm(Score ~ Grp + SES, data = SATdat)
summary(ses.lm)
##
## Call:
## lm(formula = Score ~ Grp + SES, data = SATdat)
##
## Residuals:
##          1          2          3          4          5          6
##  1.066e-14 -2.500e+01  2.500e+01 -1.500e+01 -4.016e-15  1.500e+01
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   550.00      29.15  18.865   0.0028 **
## Grp           -10.00      29.15  -0.343   0.7643
## SES2           75.00      35.71   2.100   0.1705
## SES3          180.00      50.50   3.565   0.0705 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29.15 on 2 degrees of freedom
## Multiple R-squared: 0.8973, Adjusted R-squared: 0.7432
## F-statistic: 5.824 on 3 and 2 DF, p-value: 0.1501
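In this data set only SES level 2 contains students from both groups, so the adjusted Grp estimate comes entirely from the comparison within that level. A minimal sketch of that comparison, with the data rebuilt so the block runs on its own (ses2, ses2_means and ses2_diff are illustrative names):

```r
SATdat = data.frame(
  Grp   = rep(c(0, 1), each = 3),
  Score = c(550, 600, 650, 600, 720, 630),
  SES   = factor(c(1, 2, 2, 2, 3, 2))
)
ses.lm = lm(Score ~ Grp + SES, data = SATdat)

# Keep only the SES level that contains both groups
ses2 = subset(SATdat, SES == "2")

# Group means within SES 2 and their difference (Grp 1 minus Grp 0)
ses2_means = tapply(ses2$Score, ses2$Grp, mean)
ses2_diff  = as.numeric(ses2_means["1"] - ses2_means["0"])

ses2_means
ses2_diff
as.numeric(coef(ses.lm)["Grp"])
```

The sign of the estimate flips from +50 to -10 once SES is included, because the one high-scoring SES 3 student sits in group 1 and the one low-scoring SES 1 student sits in group 0.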
The estimated effect (-10) is assumed to be the same difference between groups within each SES level:
2.2.1 Low SES
predict(ses.lm, newdata = data.frame(Grp=0, SES=factor(1)))
## 1
## 550
predict(ses.lm, newdata = data.frame(Grp=1, SES=factor(1)))
## 1
## 540
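The two predictions differ by exactly the Grp coefficient. Note that no student in the simulated data has both Grp = 1 and SES = 1, so the value 540 is an extrapolation that relies on the model's assumption of a constant group difference across SES levels. A sketch checking both points, with the data rebuilt so the block is self-contained (low_ses, low_pred, pred_diff and n_grp1_low are illustrative names):

```r
SATdat = data.frame(
  Grp   = rep(c(0, 1), each = 3),
  Score = c(550, 600, 650, 600, 720, 630),
  SES   = factor(c(1, 2, 2, 2, 3, 2))
)
ses.lm = lm(Score ~ Grp + SES, data = SATdat)

# Predictions for both groups at SES 1, using the model's factor levels
low_ses = data.frame(Grp = c(0, 1),
                     SES = factor(1, levels = levels(SATdat$SES)))
low_pred = predict(ses.lm, newdata = low_ses)

# Difference between groups at SES 1 versus the Grp coefficient
pred_diff = as.numeric(low_pred[2] - low_pred[1])
pred_diff
as.numeric(coef(ses.lm)["Grp"])

# Number of observed students with Grp = 1 and SES = 1
n_grp1_low = sum(SATdat$Grp == 1 & SATdat$SES == "1")
n_grp1_low
```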