Chapter 14 Week11_2: Lavaan Lab 11 Model Local Fitting and Model Modifications

In this lab, we will learn:

  • how to examine SEM local fit using residuals
  • how to modify SEM models for improved fit using modification indices

Load up the lavaan library:

library(lavaan)

14.1 PART I: Local Fit with Residuals

Let’s read in the new dataset ChiStatSimDat.csv:

cfaData <- read.csv("ChiStatSimDat.csv", header = TRUE)

Write out syntax for a two-factor CFA model:

fixedIndTwoFacSyntax <- "
    #Factor Specification   
    posAffect =~ glad + happy + cheerful 
    satisfaction =~ satisfied + content + comfortable
"

Fit the two-factor model:

fixedIndTwoFacRun = lavaan::sem(model = fixedIndTwoFacSyntax, 
                        data = cfaData, 
                        fixed.x=FALSE)

14.1.1 Unstandardized residuals

resid(fixedIndTwoFacRun)$cov
##             glad   happy  cherfl satsfd contnt cmfrtb
## glad         0.000                                   
## happy        0.017  0.000                            
## cheerful    -0.028 -0.010  0.000                     
## satisfied   -0.001  0.011  0.038  0.000              
## content     -0.030 -0.114  0.113  0.017  0.000       
## comfortable  0.078  0.063  0.344 -0.090  0.012  0.000

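What does this mean? What is the metric? Each entry is the observed covariance minus the model-implied covariance, so it is in the raw covariance metric of the variables and its size depends on their scales.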
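As a rough check on what resid() returns, the unstandardized residuals can be rebuilt by subtracting the model-implied covariance matrix from the observed one. The sketch below is meant to run on its own, so it fits an illustrative two-factor model to lavaan's built-in HolzingerSwineford1939 data rather than to cfaData; the names demoSyntax and demoFit are made up for this example.

```r
library(lavaan)

# Illustrative model on lavaan's built-in data (NOT the lab's dataset);
# demoSyntax and demoFit are hypothetical names used only in this sketch.
demoSyntax <- "
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
"
demoFit <- lavaan::cfa(demoSyntax, data = HolzingerSwineford1939)

# Observed covariance matrix as used by lavaan
S <- unclass(lavInspect(demoFit, "sampstat")$cov)

# Model-implied covariance matrix
SigmaHat <- unclass(fitted(demoFit)$cov)

# Unstandardized residuals by hand: observed minus implied
byHand <- S - SigmaHat

# Same quantity as reported by resid()
fromLavaan <- unclass(resid(demoFit, type = "raw")$cov)

# Largest absolute discrepancy between the two (should be numerically zero)
max(abs(byHand - fromLavaan))
```

With the lab's own model, the same three calls can be applied to fixedIndTwoFacRun.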

14.1.2 Standardized residuals

resid(fixedIndTwoFacRun, type = "standardized")$cov
##             glad   happy  cherfl satsfd contnt cmfrtb
## glad         0.000                                   
## happy        3.950  0.000                            
## cheerful    -3.732 -0.764  0.000                     
## satisfied   -0.035  0.431  1.189  0.000              
## content     -2.164 -5.979  3.164  3.410  0.000       
## comfortable  2.079  1.383  7.821 -4.755  0.762  0.000

14.1.3 Normalized residuals

resid(fixedIndTwoFacRun, type = "normalized")$cov
##             glad   happy  cherfl satsfd contnt cmfrtb
## glad         0.000                                   
## happy        0.117  0.000                            
## cheerful    -0.254 -0.081  0.000                     
## satisfied   -0.004  0.073  0.337  0.000              
## content     -0.214 -0.738  0.903  0.105  0.000       
## comfortable  0.686  0.502  3.187 -0.751  0.087  0.000
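
  • Different residuals, same story.
  • The covariance residual between cheerful and comfortable is the largest, and it is positive.
  • A positive residual means the observed covariance exceeds the model-implied one, so the model under-predicts this covariance.
  • Fix this by freeing a parameter that accounts for the extra covariance (see Part II).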
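
Locating the largest residual by eye gets harder with more indicators. The sketch below re-enters the normalized residuals printed above into a plain matrix and picks out the largest absolute entry with base R; the object names (normRes, largestPair, largestValue) are illustrative and not part of the lab.

```r
vars <- c("glad", "happy", "cheerful", "satisfied", "content", "comfortable")

# Off-diagonal lower triangle of the normalized residuals printed above,
# entered column by column.
normLower <- c( 0.117, -0.254, -0.004, -0.214,  0.686,   # glad with the rest
               -0.081,  0.073, -0.738,  0.502,           # happy
                0.337,  0.903,  3.187,                   # cheerful
                0.105, -0.751,                           # satisfied
                0.087)                                   # content

normRes <- matrix(0, nrow = 6, ncol = 6, dimnames = list(vars, vars))
normRes[lower.tri(normRes)] <- normLower

# Position (row, column) of the largest absolute residual
pos <- arrayInd(which.max(abs(normRes)), dim(normRes))

largestPair  <- c(rownames(normRes)[pos[1]], colnames(normRes)[pos[2]])
largestValue <- normRes[pos]

largestPair
largestValue
```

In a live session the same lookup can be run directly on the matrix returned by resid(fixedIndTwoFacRun, type = "normalized")$cov instead of re-typing the values.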

14.2 PART II: Modification Indices

modindices(fixedIndTwoFacRun)

Filter the output to show only rows with a modification index of 10 or higher:

modindices(fixedIndTwoFacRun, minimum.value = 10)
##             lhs op         rhs     mi    epc sepc.lv sepc.all sepc.nox
## 17    posAffect =~     content 17.031 -0.718  -0.801   -0.587   -0.587
## 18    posAffect =~ comfortable 14.926  0.500   0.558    0.487    0.487
## 20 satisfaction =~       happy 10.239 -0.390  -0.450   -0.338   -0.338
## 21 satisfaction =~    cheerful 20.960  0.456   0.526    0.501    0.501
## 22         glad ~~       happy 20.960  0.256   0.256    1.245    1.245
## 23         glad ~~    cheerful 10.239 -0.106  -0.106   -0.443   -0.443
## 29        happy ~~     content 18.588 -0.132  -0.132   -0.525   -0.525
## 33     cheerful ~~ comfortable 45.215  0.253   0.253    0.521    0.521
## 34    satisfied ~~     content 14.926  0.303   0.303    1.151    1.151
## 35    satisfied ~~ comfortable 17.030 -0.181  -0.181   -0.414   -0.414

Sort the output by modification index, with the largest values first:

modindices(fixedIndTwoFacRun, minimum.value = 10, sort = TRUE)
##             lhs op         rhs     mi    epc sepc.lv sepc.all sepc.nox
## 33     cheerful ~~ comfortable 45.215  0.253   0.253    0.521    0.521
## 21 satisfaction =~    cheerful 20.960  0.456   0.526    0.501    0.501
## 22         glad ~~       happy 20.960  0.256   0.256    1.245    1.245
## 29        happy ~~     content 18.588 -0.132  -0.132   -0.525   -0.525
## 17    posAffect =~     content 17.031 -0.718  -0.801   -0.587   -0.587
## 35    satisfied ~~ comfortable 17.030 -0.181  -0.181   -0.414   -0.414
## 18    posAffect =~ comfortable 14.926  0.500   0.558    0.487    0.487
## 34    satisfied ~~     content 14.926  0.303   0.303    1.151    1.151
## 20 satisfaction =~       happy 10.239 -0.390  -0.450   -0.338   -0.338
## 23         glad ~~    cheerful 10.239 -0.106  -0.106   -0.443   -0.443
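
  • op ~~ : a residual covariance between two indicators (a covariance between two unique factors)
  • op =~ : a cross-loading
  • The suggested parameters include both cross-loadings and residual covariances. The largest modification index (45.215) belongs to the residual covariance cheerful ~~ comfortable, the same pair flagged by the residuals in Part I.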
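
A modification index can be read as an approximate 1-df chi-square statistic: the expected drop in the model chi-square if that single parameter were freed. The sketch below, using only base R and three mi values copied from the table above, compares them with the conventional 0.05 critical value; the variable names are illustrative.

```r
# Critical value of a 1-df chi-square test at alpha = 0.05 (about 3.84)
alpha    <- 0.05
critical <- qchisq(1 - alpha, df = 1)

# Three modification indices copied from the output above
miValues <- c("cheerful ~~ comfortable"  = 45.215,
              "satisfaction =~ cheerful" = 20.960,
              "glad ~~ cheerful"         = 10.239)

# Approximate p-value for freeing each parameter on its own
pValues <- pchisq(miValues, df = 1, lower.tail = FALSE)

round(critical, 2)
signif(pValues, 3)
```

Seen this way, minimum.value = 10 is a considerably stricter screen than the 3.84 cutoff, which helps when many candidate parameters are inspected at once.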

14.2.1 Modified Model 1:

mod1 <- "
    posAffect =~ glad + happy + cheerful 
    satisfaction =~ satisfied + content + comfortable  
    
    # residual covariance
    cheerful ~~ comfortable
"
mod1_fit <- lavaan::sem(mod1, data = cfaData, 
                        std.lv = TRUE, fixed.x=FALSE)

summary(mod1_fit, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-12 ended normally after 29 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        14
## 
##   Number of observations                           200
## 
## Model Test User Model:
##                                                       
##   Test statistic                                60.048
##   Degrees of freedom                                 7
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                              1168.055
##   Degrees of freedom                                15
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.954
##   Tucker-Lewis Index (TLI)                       0.901
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1386.429
##   Loglikelihood unrestricted model (H1)      -1356.405
##                                                       
##   Akaike (AIC)                                2800.859
##   Bayesian (BIC)                              2847.035
##   Sample-size adjusted Bayesian (BIC)         2802.682
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.195
##   90 Percent confidence interval - lower         0.151
##   90 Percent confidence interval - upper         0.241
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.058
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv
##   posAffect =~                                                 
##     glad              1.111    0.064   17.475    0.000    1.111
##     happy             1.221    0.073   16.802    0.000    1.221
##     cheerful          0.801    0.060   13.392    0.000    0.801
##   satisfaction =~                                              
##     satisfied         1.161    0.071   16.389    0.000    1.161
##     content           1.262    0.075   16.835    0.000    1.262
##     comfortable       0.750    0.069   10.821    0.000    0.750
##   Std.all
##          
##     0.940
##     0.919
##     0.790
##          
##     0.910
##     0.925
##     0.679
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv
##  .cheerful ~~                                                  
##    .comfortable       0.270    0.043    6.230    0.000    0.270
##   posAffect ~~                                                 
##     satisfaction      0.876    0.022   39.298    0.000    0.876
##   Std.all
##          
##     0.536
##          
##     0.876
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv
##    .glad              0.163    0.031    5.200    0.000    0.163
##    .happy             0.274    0.043    6.406    0.000    0.274
##    .cheerful          0.386    0.043    8.990    0.000    0.386
##    .satisfied         0.281    0.046    6.106    0.000    0.281
##    .content           0.271    0.050    5.365    0.000    0.271
##    .comfortable       0.657    0.070    9.380    0.000    0.657
##     posAffect         1.000                               1.000
##     satisfaction      1.000                               1.000
##   Std.all
##     0.116
##     0.155
##     0.375
##     0.172
##     0.145
##     0.539
##     1.000
##     1.000

Model comparison:

anova(mod1_fit, fixedIndTwoFacRun)
## Chi-Squared Difference Test
## 
##                   Df    AIC    BIC   Chisq Chisq diff Df diff
## mod1_fit           7 2800.9 2847.0  60.048                   
## fixedIndTwoFacRun  8 2852.4 2895.3 113.638      53.59       1
##                   Pr(>Chisq)    
## mod1_fit                        
## fixedIndTwoFacRun   2.47e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
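
The chi-square difference test reported by anova() can be retraced by hand from the two model chi-squares. This is a minimal sketch with base R, using the rounded values printed above, so the p-value matches the output only approximately; the variable names are illustrative.

```r
# Values copied from the anova() table above
chisqRestricted <- 113.638; dfRestricted <- 8   # fixedIndTwoFacRun
chisqFreed      <- 60.048;  dfFreed      <- 7   # mod1_fit

# Difference in chi-square and in degrees of freedom
chisqDiff <- chisqRestricted - chisqFreed
dfDiff    <- dfRestricted - dfFreed

# Upper-tail probability of the difference under a chi-square with dfDiff df
pValue <- pchisq(chisqDiff, df = dfDiff, lower.tail = FALSE)

chisqDiff
dfDiff
signif(pValue, 3)
```

A small p-value indicates that freeing the residual covariance significantly improves fit over the original two-factor model.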

Keep modifying mod1:

modindices(mod1_fit, minimum.value = 10, sort = TRUE)
##             lhs op         rhs     mi    epc sepc.lv sepc.all sepc.nox
## 22 satisfaction =~    cheerful 27.353  0.612   0.612    0.604    0.604
## 23         glad ~~       happy 27.353  0.275   0.275    1.304    1.304
## 34    satisfied ~~     content 23.194  0.369   0.369    1.340    1.340
## 19    posAffect =~ comfortable 23.194  0.711   0.711    0.644    0.644
## 30        happy ~~     content 22.974 -0.157  -0.157   -0.576   -0.576
## 33     cheerful ~~     content 16.477  0.113   0.113    0.348    0.348
## 24         glad ~~    cheerful 11.873 -0.098  -0.098   -0.392   -0.392
## 21 satisfaction =~       happy 11.872 -0.506  -0.506   -0.381   -0.381
## 35    satisfied ~~ comfortable 11.662 -0.127  -0.127   -0.297   -0.297
## 18    posAffect =~     content 11.661 -0.695  -0.695   -0.509   -0.509

14.2.2 Modified Model 2_1:

mod2_1 <- "
    posAffect =~ glad + happy + cheerful 
    # cross-loading: cheerful also loads on satisfaction
    satisfaction =~ cheerful + satisfied + content + comfortable
    
    # residual covariance
    cheerful ~~ comfortable
"
mod2_1_fit <- lavaan::sem(mod2_1, data = cfaData, 
                          std.lv = TRUE, fixed.x=FALSE)

summary(mod2_1_fit, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-12 ended normally after 27 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        15
## 
##   Number of observations                           200
## 
## Model Test User Model:
##                                                       
##   Test statistic                                34.265
##   Degrees of freedom                                 6
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                              1168.055
##   Degrees of freedom                                15
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.975
##   Tucker-Lewis Index (TLI)                       0.939
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1373.538
##   Loglikelihood unrestricted model (H1)      -1356.405
##                                                       
##   Akaike (AIC)                                2777.076
##   Bayesian (BIC)                              2826.551
##   Sample-size adjusted Bayesian (BIC)         2779.029
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.153
##   90 Percent confidence interval - lower         0.106
##   90 Percent confidence interval - upper         0.205
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.031
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv
##   posAffect =~                                                 
##     glad              1.128    0.063   17.786    0.000    1.128
##     happy             1.223    0.073   16.752    0.000    1.223
##     cheerful          0.389    0.084    4.624    0.000    0.389
##   satisfaction =~                                              
##     cheerful          0.483    0.089    5.400    0.000    0.483
##     satisfied         1.151    0.071   16.168    0.000    1.151
##     content           1.284    0.074   17.339    0.000    1.284
##     comfortable       0.815    0.072   11.361    0.000    0.815
##   Std.all
##          
##     0.954
##     0.921
##     0.374
##          
##     0.465
##     0.902
##     0.941
##     0.712
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv
##  .cheerful ~~                                                  
##    .comfortable       0.262    0.043    6.055    0.000    0.262
##   posAffect ~~                                                 
##     satisfaction      0.835    0.027   30.624    0.000    0.835
##   Std.all
##          
##     0.529
##          
##     0.835
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv
##    .glad              0.126    0.035    3.618    0.000    0.126
##    .happy             0.269    0.047    5.747    0.000    0.269
##    .cheerful          0.380    0.041    9.270    0.000    0.380
##    .satisfied         0.303    0.047    6.458    0.000    0.303
##    .content           0.215    0.049    4.409    0.000    0.215
##    .comfortable       0.646    0.070    9.287    0.000    0.646
##     posAffect         1.000                               1.000
##     satisfaction      1.000                               1.000
##   Std.all
##     0.090
##     0.152
##     0.353
##     0.186
##     0.115
##     0.493
##     1.000
##     1.000

Model comparison:

anova(mod2_1_fit, mod1_fit)
## Chi-Squared Difference Test
## 
##            Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
## mod2_1_fit  6 2777.1 2826.6 34.265                                  
## mod1_fit    7 2800.9 2847.0 60.048     25.783       1  3.821e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

14.2.3 Modified Model 2_2:

mod2_2 <- "
    posAffect =~ glad + happy + cheerful 
    satisfaction =~ satisfied + content + comfortable 
    
    # residual covariances
    cheerful ~~ comfortable
    glad ~~ happy
"
mod2_2_fit <- lavaan::sem(mod2_2, data = cfaData, 
                          std.lv = TRUE, fixed.x=FALSE)

summary(mod2_2_fit, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-12 ended normally after 31 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        15
## 
##   Number of observations                           200
## 
## Model Test User Model:
##                                                       
##   Test statistic                                34.265
##   Degrees of freedom                                 6
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                              1168.055
##   Degrees of freedom                                15
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.975
##   Tucker-Lewis Index (TLI)                       0.939
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1373.538
##   Loglikelihood unrestricted model (H1)      -1356.405
##                                                       
##   Akaike (AIC)                                2777.076
##   Bayesian (BIC)                              2826.551
##   Sample-size adjusted Bayesian (BIC)         2779.029
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.153
##   90 Percent confidence interval - lower         0.106
##   90 Percent confidence interval - upper         0.205
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.031
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv
##   posAffect =~                                                 
##     glad              1.020    0.069   14.784    0.000    1.020
##     happy             1.107    0.079   13.947    0.000    1.107
##     cheerful          0.875    0.061   14.323    0.000    0.875
##   satisfaction =~                                              
##     satisfied         1.151    0.071   16.168    0.000    1.151
##     content           1.284    0.074   17.339    0.000    1.284
##     comfortable       0.815    0.072   11.361    0.000    0.815
##   Std.all
##          
##     0.863
##     0.833
##     0.843
##          
##     0.902
##     0.941
##     0.712
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv
##  .cheerful ~~                                                  
##    .comfortable       0.262    0.043    6.055    0.000    0.262
##  .glad ~~                                                      
##    .happy             0.250    0.055    4.563    0.000    0.250
##   posAffect ~~                                                 
##     satisfaction      0.923    0.020   46.285    0.000    0.923
##   Std.all
##          
##     0.584
##          
##     0.569
##          
##     0.923
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv
##    .glad              0.356    0.053    6.679    0.000    0.356
##    .happy             0.540    0.074    7.249    0.000    0.540
##    .cheerful          0.312    0.043    7.197    0.000    0.312
##    .satisfied         0.303    0.047    6.458    0.000    0.303
##    .content           0.215    0.049    4.409    0.000    0.215
##    .comfortable       0.646    0.070    9.287    0.000    0.646
##     posAffect         1.000                               1.000
##     satisfaction      1.000                               1.000
##   Std.all
##     0.255
##     0.306
##     0.290
##     0.186
##     0.115
##     0.493
##     1.000
##     1.000

Model comparison:

anova(mod2_2_fit, mod1_fit)
## Chi-Squared Difference Test
## 
##            Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
## mod2_2_fit  6 2777.1 2826.6 34.265                                  
## mod1_fit    7 2800.9 2847.0 60.048     25.783       1  3.821e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

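Note that mod2_1 and mod2_2 fit identically (chi-square = 34.265 on 6 df, with the same AIC and BIC), so fit alone cannot choose between them; the choice has to rest on substantive grounds. Here we keep modifying mod2_2: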

modindices(mod2_2_fit, minimum.value = 10, sort = TRUE)
##      lhs op     rhs    mi    epc sepc.lv sepc.all sepc.nox
## 30 happy ~~ content 14.77 -0.117  -0.117   -0.344   -0.344

14.2.4 Modified Model 3:

mod3 <- "
    posAffect =~ glad + happy + cheerful 
    satisfaction =~ satisfied + content + comfortable 
    
    # residual covariances
    cheerful ~~ comfortable
    glad ~~ happy
    content ~~ happy
"
mod3_fit <- lavaan::sem(mod3, data = cfaData, std.lv = TRUE, fixed.x = FALSE)

summary(mod3_fit, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-12 ended normally after 33 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        16
## 
##   Number of observations                           200
## 
## Model Test User Model:
##                                                       
##   Test statistic                                16.311
##   Degrees of freedom                                 5
##   P-value (Chi-square)                           0.006
## 
## Model Test Baseline Model:
## 
##   Test statistic                              1168.055
##   Degrees of freedom                                15
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.990
##   Tucker-Lewis Index (TLI)                       0.971
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1364.561
##   Loglikelihood unrestricted model (H1)      -1356.405
##                                                       
##   Akaike (AIC)                                2761.122
##   Bayesian (BIC)                              2813.895
##   Sample-size adjusted Bayesian (BIC)         2763.205
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.106
##   90 Percent confidence interval - lower         0.052
##   90 Percent confidence interval - upper         0.166
##   P-value RMSEA <= 0.05                          0.046
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.025
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv
##   posAffect =~                                                 
##     glad              1.017    0.069   14.710    0.000    1.017
##     happy             1.152    0.077   14.869    0.000    1.152
##     cheerful          0.871    0.061   14.298    0.000    0.871
##   satisfaction =~                                              
##     satisfied         1.141    0.071   16.002    0.000    1.141
##     content           1.293    0.074   17.585    0.000    1.293
##     comfortable       0.814    0.071   11.427    0.000    0.814
##   Std.all
##          
##     0.860
##     0.873
##     0.838
##          
##     0.895
##     0.948
##     0.711
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv
##  .cheerful ~~                                                  
##    .comfortable       0.266    0.042    6.285    0.000    0.266
##  .glad ~~                                                      
##    .happy             0.187    0.055    3.384    0.001    0.187
##  .happy ~~                                                     
##    .content          -0.134    0.031   -4.358    0.000   -0.134
##   posAffect ~~                                                 
##     satisfaction      0.928    0.020   47.200    0.000    0.928
##   Std.all
##          
##     0.583
##          
##     0.484
##          
##    -0.484
##          
##     0.928
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv
##    .glad              0.364    0.054    6.787    0.000    0.364
##    .happy             0.412    0.073    5.661    0.000    0.412
##    .cheerful          0.321    0.042    7.645    0.000    0.321
##    .satisfied         0.325    0.046    7.060    0.000    0.325
##    .content           0.187    0.048    3.901    0.000    0.187
##    .comfortable       0.647    0.068    9.474    0.000    0.647
##     posAffect         1.000                               1.000
##     satisfaction      1.000                               1.000
##   Std.all
##     0.260
##     0.237
##     0.298
##     0.200
##     0.101
##     0.494
##     1.000
##     1.000

Model comparison:

anova(mod3_fit, mod2_2_fit)
## Chi-Squared Difference Test
## 
##            Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
## mod3_fit    5 2761.1 2813.9 16.311                                  
## mod2_2_fit  6 2777.1 2826.6 34.265     17.954       1  2.263e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Keep modifying mod3:

modindices(mod3_fit, minimum.value = 10, sort = TRUE)
## [1] lhs      op       rhs      mi       epc      sepc.lv  sepc.all
## [8] sepc.nox
## <0 rows> (or 0-length row.names)

No remaining modification is expected to decrease the model chi-square by 10 or more.
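
To review the whole modification path at a glance, the sketch below collects the chi-square, df, and AIC values printed in the anova() tables above into one data frame and adds the exact-fit p-value for each model. It uses base R only, and the names modelPath, exactFitP, and bestByAic are illustrative.

```r
# Values copied from the anova() tables above
modelPath <- data.frame(
  model = c("fixedIndTwoFacRun", "mod1_fit", "mod2_2_fit", "mod3_fit"),
  chisq = c(113.638, 60.048, 34.265, 16.311),
  df    = c(8, 7, 6, 5),
  aic   = c(2852.4, 2800.9, 2777.1, 2761.1),
  stringsAsFactors = FALSE
)

# P-value of the exact-fit chi-square test for each model
modelPath$exactFitP <- pchisq(modelPath$chisq, df = modelPath$df,
                              lower.tail = FALSE)

# Model with the lowest AIC
bestByAic <- modelPath$model[which.min(modelPath$aic)]

print(transform(modelPath, exactFitP = round(exactFitP, 3)))
bestByAic
```

Each freed parameter lowers both the chi-square and the AIC, yet even mod3_fit still rejects exact fit at the 0.05 level (p = 0.006 in its summary).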