Chapter 10 Regression Discontinuity Designs (RDD)
- This lab goes over several estimators for RDD designs.
- Part of the R codes were adapted from Chapter 13 of: R Companion to Real Econometrics: https://bookdown.org/carillitony/bailey/chp11.html#fn20
Install and load the following packages:
library(ggplot2)
library(tidyverse)
library(rdd)
10.1 Data
Let’s estimate an RDD model using the data from a 2009 paper by Carpenter and Dobkin about the effect of increasing the drinking age on mortality rates.
Carpenter, Christopher and Carlos Dobkin. “The Effects of Alcohol Consumption on Mortality: Regression Discontinuity from the Minimum Drinking Age,” American Economic Journal: Applied Econometrics, 2009, 1:1, 164-182.
The data are available at https://github.com/jrnold/masteringmetrics/tree/master/masteringmetrics/data in mlda.rda
First, Let’s look at motor vehicle deaths (outcome) as a function of age (assignment variable). The cutoff age is 21 years old.
load("mlda.rda")
= na.omit(mlda) # removed 2 rows with missing data
mlda head(mlda)
## # A tibble: 6 × 19
## agecell all allfitted internal interna…¹ exter…² exter…³ alcohol alcoh…⁴ homic…⁵ homic…⁶ suicide
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 19.1 92.8 91.7 16.6 16.7 76.2 75.0 0.639 0.794 16.3 16.3 11.2
## 2 19.2 95.1 91.9 18.3 16.9 76.8 75.0 0.677 0.838 16.9 16.3 12.2
## 3 19.2 92.1 92.0 18.9 17.1 73.2 75.0 0.866 0.878 15.2 16.3 11.7
## 4 19.3 88.4 92.2 16.1 17.3 72.3 74.9 0.867 0.915 16.7 16.3 11.3
## 5 19.4 88.7 92.3 17.4 17.4 71.3 74.9 1.02 0.949 14.9 16.3 11.0
## 6 19.5 90.2 92.5 17.9 17.6 72.3 74.9 1.17 0.981 15.6 16.3 12.2
## # … with 7 more variables: suicidefitted <dbl>, mva <dbl>, mvafitted <dbl>, drugs <dbl>,
## # drugsfitted <dbl>, externalother <dbl>, externalotherfitted <dbl>, and abbreviated variable
## # names ¹internalfitted, ²external, ³externalfitted, ⁴alcoholfitted, ⁵homicide, ⁶homicidefitted
dim(mlda)
## [1] 48 19
- assignment: minimum legal drinking age (agecell)
- cutoff: 21 years old
- outcome: Motor Vehicle Accidents Death (mva) per 100,000 individuals
Plot the outcome against assignment variable (ggplot2):
%>%
mlda select(agecell, mva) %>%
mutate(Treat = as.factor(ifelse(agecell >= 21, 1, 0))) %>%
ggplot(aes(x = agecell, y = mva, color = Treat)) +
geom_point() +
geom_vline(xintercept = 21) +
labs(y = "Deaths in Moving Vehicle Accidents", x = "Age")
Alternatively, there is a generic way of plotting using plot():
plot(mlda$agecell, mlda$mva, xlab = "Age", ylab = "Deaths in Moving Vehicle Accidents")
abline(v=21, col = 'red')
10.2 Segmented regression
- There appears to be a discontinuity at age 21. Let’s estimate the RD model:
- mva = b0 + b1*(agecell-21) + b2*Treat + b3*interaction1 + e
- where Treat=1 if agecell>=21 and Treat=0 if agecell<21
- let’s call age_centered = agecell-21
- interaction1 = *age_centered*Treat
$Treat = ifelse(mlda$agecell>=21, 1, 0) # dummy variable for policy
mlda$age_centered = mlda$agecell - 21 # "centered" age
mlda$int1 = mlda$Treat*mlda$age_centered # interaction with "centered" age mlda
Segmented regression:
= lm(mva ~ age_centered + Treat + int1, data = mlda)
lm1 summary(lm1)
##
## Call:
## lm(formula = mva ~ age_centered + Treat + int1, data = mlda)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4124 -0.7774 -0.2913 0.8495 3.2378
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.9292 0.5308 56.390 < 2e-16 ***
## age_centered -2.5676 0.4661 -5.508 0.000001772 ***
## Treat 4.5340 0.7506 6.041 0.000000294 ***
## int1 -1.1624 0.6592 -1.763 0.0848 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.299 on 44 degrees of freedom
## Multiple R-squared: 0.7222, Adjusted R-squared: 0.7032
## F-statistic: 38.13 on 3 and 44 DF, p-value: 2.671e-12
# interpretation?
Now, note that age must be centered at the cutoff for ease of interpretation:
$int2 = mlda$Treat*mlda$agecell # interaction with age
mlda
= lm(mva ~ agecell + Treat + int2, data = mlda)
lm.wrong summary(lm.wrong)
##
## Call:
## lm(formula = mva ~ agecell + Treat + int2, data = mlda)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4124 -0.7774 -0.2913 0.8495 3.2378
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 83.8492 9.3328 8.984 1.63e-11 ***
## agecell -2.5676 0.4661 -5.508 1.77e-06 ***
## Treat 28.9450 13.8638 2.088 0.0426 *
## int2 -1.1624 0.6592 -1.763 0.0848 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.299 on 44 degrees of freedom
## Multiple R-squared: 0.7222, Adjusted R-squared: 0.7032
## F-statistic: 38.13 on 3 and 44 DF, p-value: 2.671e-12
# interpretation?
::stargazer(lm1, lm.wrong, type = 'text') stargazer
##
## ==========================================================
## Dependent variable:
## ----------------------------
## mva
## (1) (2)
## ----------------------------------------------------------
## age_centered -2.568***
## (0.466)
##
## agecell -2.568***
## (0.466)
##
## Treat 4.534*** 28.945**
## (0.751) (13.864)
##
## int1 -1.162*
## (0.659)
##
## int2 -1.162*
## (0.659)
##
## Constant 29.929*** 83.849***
## (0.531) (9.333)
##
## ----------------------------------------------------------
## Observations 48 48
## R2 0.722 0.722
## Adjusted R2 0.703 0.703
## Residual Std. Error (df = 44) 1.299 1.299
## F Statistic (df = 3; 44) 38.125*** 38.125***
## ==========================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
PLOTTING: it is easier and fancier to do the plotting using ggplot2, but it’s also feasible to plot using generic plot()
# RDD - unequal slope
%>%
mlda select(agecell, mva) %>%
mutate(Treat = as.factor(ifelse(agecell >= 21, 1, 0))) %>%
ggplot(aes(x = agecell, y = mva, color = Treat)) +
geom_point() +
geom_smooth(method = "lm") +
labs(y = "Deaths in Moving Vehicle Accidents", x = "Age")
## `geom_smooth()` using formula = 'y ~ x'
10.3 IK estimator
Imbens-Kalyanaraman optimal bandwidth calculation:
library(rdd)
= RDestimate(formula = mva ~ agecell, data = mlda, cutpoint = 21)
rd1 summary(rd1)
##
## Call:
## RDestimate(formula = mva ~ agecell, data = mlda, cutpoint = 21)
##
## Type:
## sharp
##
## Estimates:
## Bandwidth Observations Estimate Std. Error z value Pr(>|z|)
## LATE 1.2109 30 4.977 1.0590 4.700 2.607e-06 ***
## Half-BW 0.6054 14 4.956 1.3767 3.600 3.182e-04 ***
## Double-BW 2.4218 48 4.566 0.7086 6.444 1.162e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## F-statistics:
## F Num. DoF Denom. DoF p
## LATE 13.32 3 26 3.692e-05
## Half-BW 12.76 3 10 1.879e-03
## Double-BW 26.99 3 44 9.322e-10
- In an RDD framework, we always need to balance our choice of large and small bandwidth. Larger bandwidth make the comparison between the treatment group and control group “less similar”.
- We see that the coefficient called “LATE”, is our parameter of interest of the MLDA program on mortality.
- We observe that deaths increase by 4.977 at the cutoff point of 21. The estimate of the treatment is slightly larger than the RDD estimate in lm1 (4.5340).
smoothed curve:
plot(rd1)