Chapter 8 Interupted Time Series (ITS) (1)

  • In this lab, we will see how to use Segmented Regression to analyze ITS data.
  • Source: “Global Health Research: Design + Methods”. By Dr. Eric Green

Install and load the following packages:

library(ggplot2)
library(forecast)
library(sandwich)
library(lmtest)

8.1 Data

Generating sample data:

dat <- data.frame(month=seq(from=1, to=24), 
                  post=c(rep(0, 12), rep(1, 12)),
                  monthpost=c(rep(0, 12), seq(from=1, to=12)), 
                  outcome=c(50, 60, 45, 47, 50, 55, 53, 51, 48, 50, 47, 48,
                            55, 60, 63, 67, 65, 69, 72, 68, 74, 71, 76, 70))
head(dat)
##   month post monthpost outcome
## 1     1    0         0      50
## 2     2    0         0      60
## 3     3    0         0      45
## 4     4    0         0      47
## 5     5    0         0      50
## 6     6    0         0      55

8.1.1 Line plot

p1 <- ggplot(data = dat, mapping = aes(x = month, y = outcome)) +
  geom_line(color = 'red') + 
  geom_point(fill="white", shape=21) + theme_bw() + 
  geom_vline(xintercept = 12.5, color = 'blue', linetype = "longdash")
p1

8.2 OLS Segmented Regression

lmFit1 <- lm(outcome ~ month + post + monthpost, data=dat) 
summary(lmFit1)
## 
## Call:
## lm(formula = outcome ~ month + post + monthpost, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.5326 -1.6935 -0.3939  2.3747  8.1247 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  52.5606     2.2450  23.412 5.24e-16 ***
## month        -0.3427     0.3050  -1.123 0.274605    
## post          9.6876     2.9940   3.236 0.004144 ** 
## monthpost     1.7832     0.4314   4.134 0.000515 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.648 on 20 degrees of freedom
## Multiple R-squared:  0.8867, Adjusted R-squared:  0.8697 
## F-statistic: 52.15 on 3 and 20 DF,  p-value: 1.227e-09
  • change in intercept: b2 = 9.6876 p = 0.004144 **
  • change in slope: b3 = 1.7832 p = 0.000515 ***

8.2.1 Regression diagnostics

par(mfrow=c(2,2), mar = c(2, 4, 4, 4))
plot(lmFit1)

par(mfrow=c(1,1))

nothing abnormal.

8.2.2 Overplot prediction

dat$predicted = predict(lmFit1, data = dat)
p1 + 
  geom_line(data = dat[dat$post==0,], aes(x=month, y=predicted), color="green") +
  geom_line(data = dat[dat$post==1,], aes(x=month, y=predicted), color="green")

8.3 OLS Segmented Regression with Polynomial term

outcome=Time+Time^{2} +Treat+delta_Time+delta_Time^{2}

lmFit2 <- lm(outcome ~ month + I(month^2) + post + monthpost + I(monthpost^2), data=dat) 
summary(lmFit2)
## 
## Call:
## lm(formula = outcome ~ month + I(month^2) + post + monthpost + 
##     I(monthpost^2), data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.5190 -1.5350  0.1249  1.5789  8.4660 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    51.31818    3.55724  14.426 2.47e-11 ***
## month           0.18981    1.25812   0.151  0.88176    
## I(month^2)     -0.04096    0.09421  -0.435  0.66891    
## post            4.52947    4.37383   1.036  0.31411    
## monthpost       4.76623    1.65520   2.880  0.00998 ** 
## I(monthpost^2) -0.15385    0.13324  -1.155  0.26332    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.442 on 18 degrees of freedom
## Multiple R-squared:  0.9092, Adjusted R-squared:  0.884 
## F-statistic: 36.04 on 5 and 18 DF,  p-value: 9.126e-09
  • change in intercept: b3 = 4.52947 p = 0.31411 **
  • change in slope: b4 = 4.76623 p = 0.00998 **
  • change in slope^2: b5 = -0.15385 p = 0.26332

8.3.1 Overplot prediction

dat$predicted2 = predict(lmFit2, data = dat)
p1 + 
  geom_line(data = dat[dat$post==0,], aes(x=month, y=predicted2), color="green") +
  geom_line(data = dat[dat$post==1,], aes(x=month, y=predicted2), color="green")

8.4 Autocorrelation

ACF and PACF of residuals:

par(mfrow=c(2,2), mar = c(2, 4, 4, 4))
acf(dat$outcome, lag.max = 23) # for Y
pacf(dat$outcome, lag.max = 23) # for Y
acf(resid(lmFit2)) # for e
pacf(resid(lmFit2)) # for e

checkresiduals(lmFit2)

## 
##  Breusch-Godfrey test for serial correlation of order up to 9
## 
## data:  Residuals
## LM test = 15.996, df = 9, p-value = 0.06696
  • the Breusch-Godfrey test for jointly testing up to 9th order autocorrelation.
  • The residual plot shows some changing variation over time, but is not remarkable (p = 0.06696).
  • The histogram shows that the residuals seem to be slightly skewed, which may also affect the standard errors of the residuals.
  • The autocorrelation plot (ACF) shows no significant spike beyond the dashed blue line. Even up to lag 8, there is not quite enough evidence for the Breusch-Godfrey to be significant at the 5% level. The autocorrelations are not particularly large, and will be unlikely to have any noticeable impact on the forecasts or the prediction intervals.
  • We are good with segmented regression…
  • But for illustration purposes…

8.5 Segmented Regression with HAC correction for standard errors

8.5.1 OLS:

round(vcov(lmFit2), 2) # original
##                (Intercept) month I(month^2)  post monthpost I(monthpost^2)
## (Intercept)          12.65 -4.04       0.27 -2.96     -2.42          -0.27
## month                -4.04  1.58      -0.12  1.66      1.19           0.12
## I(month^2)            0.27 -0.12       0.01 -0.16     -0.10          -0.01
## post                 -2.96  1.66      -0.16 19.13     -1.79           0.43
## monthpost            -2.42  1.19      -0.10 -1.79      2.74          -0.02
## I(monthpost^2)       -0.27  0.12      -0.01  0.43     -0.02           0.02
summary(lmFit2)
## 
## Call:
## lm(formula = outcome ~ month + I(month^2) + post + monthpost + 
##     I(monthpost^2), data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.5190 -1.5350  0.1249  1.5789  8.4660 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    51.31818    3.55724  14.426 2.47e-11 ***
## month           0.18981    1.25812   0.151  0.88176    
## I(month^2)     -0.04096    0.09421  -0.435  0.66891    
## post            4.52947    4.37383   1.036  0.31411    
## monthpost       4.76623    1.65520   2.880  0.00998 ** 
## I(monthpost^2) -0.15385    0.13324  -1.155  0.26332    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.442 on 18 degrees of freedom
## Multiple R-squared:  0.9092, Adjusted R-squared:  0.884 
## F-statistic: 36.04 on 5 and 18 DF,  p-value: 9.126e-09

8.5.2 HAC:

round(vcovHAC(lmFit2), 2)  # HAC corrected
##                (Intercept) month I(month^2)  post monthpost I(monthpost^2)
## (Intercept)           9.81 -2.55       0.15 -0.65     -0.99          -0.15
## month                -2.55  1.14      -0.08  0.87      0.84           0.08
## I(month^2)            0.15 -0.08       0.01 -0.08     -0.07          -0.01
## post                 -0.65  0.87      -0.08  2.26      0.66           0.10
## monthpost            -0.99  0.84      -0.07  0.66      0.95           0.06
## I(monthpost^2)       -0.15  0.08      -0.01  0.10      0.06           0.01
coeftest(lmFit2, vcov = vcovHAC(lmFit2))
## 
## t test of coefficients:
## 
##                 Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)    51.318182   3.132281 16.3836 2.919e-12 ***
## month           0.189810   1.067497  0.1778 0.8608590    
## I(month^2)     -0.040959   0.079998 -0.5120 0.6148763    
## post            4.529471   1.504238  3.0111 0.0075014 ** 
## monthpost       4.766234   0.974371  4.8916 0.0001175 ***
## I(monthpost^2) -0.153846   0.084921 -1.8116 0.0867621 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1