Chapter 13 Lavaan Lab 10: Model Fit Part II (Fit Indices)
In this lab, we will learn:
- how to calculate and interpret global fit indices for SEM models.
- how to compare non-nested models using AIC and BIC.
Load up the lavaan library:
library(lavaan)
13.1 PART I: Fit Indices
Let’s read this dataset in:
<- read.csv("cfaInclassData.csv", header = T) cfaData
Write out syntax for a two-factor CFA model:
<- "
fixedIndTwoFacSyntax #Factor Specification
posAffect =~ glad + happy + cheerful
satisfaction =~ satisfied + content + comfortable
"
Fit the model as usual:
fixedIndTwoFacRun = lavaan::sem(model = fixedIndTwoFacSyntax,
                                data = cfaData,
                                fixed.x = FALSE,
                                estimator = 'MLMV')
Request fit indices by adding fit.measures = T in the summary() function:
summary(fixedIndTwoFacRun, standardized = T, fit.measures = T)
## lavaan 0.6-12 ended normally after 24 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 13
##
## Number of observations 1000
##
## Model Test User Model:
## Standard Robust
## Test Statistic 2.957 2.935
## Degrees of freedom 8 8
## P-value (Chi-square) 0.937 0.938
## Scaling correction factor 1.032
## Shift parameter 0.068
## simple second-order correction
##
## Model Test Baseline Model:
##
## Test statistic 2020.010 849.820
## Degrees of freedom 15 15
## P-value 0.000 0.000
## Scaling correction factor 2.397
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000 1.000
## Tucker-Lewis Index (TLI) 1.005 1.011
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -7483.272 -7483.272
## Loglikelihood unrestricted model (H1) -7481.793 -7481.793
##
## Akaike (AIC) 14992.544 14992.544
## Bayesian (BIC) 15056.345 15056.345
## Sample-size adjusted Bayesian (BIC) 15015.056 15015.056
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 0.000
## 90 Percent confidence interval - lower 0.000 0.000
## 90 Percent confidence interval - upper 0.009 0.008
## P-value RMSEA <= 0.05 1.000 1.000
##
## Robust RMSEA NA
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.007 0.007
##
## Parameter Estimates:
##
## Standard errors Robust.sem
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## posAffect =~
## glad 1.000 0.694 0.706
## happy 1.067 0.054 19.782 0.000 0.740 0.758
## cheerful 1.112 0.055 20.147 0.000 0.772 0.785
## satisfaction =~
## satisfied 1.000 0.773 0.767
## content 1.068 0.050 21.399 0.000 0.826 0.762
## comfortable 0.918 0.044 21.056 0.000 0.709 0.746
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## posAffect ~~
## satisfaction 0.262 0.027 9.539 0.000 0.488 0.488
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .glad 0.484 0.028 17.146 0.000 0.484 0.501
## .happy 0.405 0.028 14.260 0.000 0.405 0.425
## .cheerful 0.371 0.028 13.371 0.000 0.371 0.384
## .satisfied 0.419 0.028 14.828 0.000 0.419 0.412
## .content 0.491 0.033 14.729 0.000 0.491 0.419
## .comfortable 0.400 0.027 14.876 0.000 0.400 0.443
## posAffect 0.482 0.042 11.544 0.000 1.000 1.000
## satisfaction 0.597 0.049 12.189 0.000 1.000 1.000
13.1.1 RMSEA
Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.009
  P-value RMSEA <= 0.05                          1.000
reproducing RMSEA:
- T = 2.957
- df = 8
- N = 1000
- RMSEA = sqrt(max(T - df, 0) / ((N - 1) * df)) = sqrt(max(2.957 - 8, 0) / (999 * 8)) = 0
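The same arithmetic can be done as a quick check in R (the numbers are copied from the output above):

T_stat <- 2.957  # chi-square test statistic (avoid the name T, which is TRUE in R)
df <- 8          # model degrees of freedom
N <- 1000        # number of observations

sqrt(max(T_stat - df, 0) / ((N - 1) * df))

## [1] 0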
13.1.2 SRMR
Standardized Root Mean Square Residual:

  SRMR                                           0.007
reproducing SRMR:
S = cov(cfaData[,-1])
colnames = colnames(S)
SIGMA = fitted(fixedIndTwoFacRun)$cov[colnames, colnames]
p = ncol(S)

# use cov2cor() to convert S and SIGMA to correlation matrices, then take the difference (standardized residuals):
resd = cov2cor(S) - cov2cor(SIGMA)

# keep only the nonduplicated elements:
resd2 = lav_matrix_vech(resd)

sqrt(sum(resd2^2)/(p*(p+1)/2))
## [1] 0.006847801
- A small average standardized residual…looks good
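The same quantity can also be pulled directly from the fitted object with lavaan's fitMeasures(); it should match the reported 0.007 up to rounding:

fitMeasures(fixedIndTwoFacRun, "srmr")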
13.1.3 Null Model M0
Model Test Baseline Model:

  Test statistic                              2020.010
  Degrees of freedom                                15
  P-value                                        0.000
- This is the chi-square for the baseline (null) model used in the CFI/TLI comparative fit measures.
- We know what this means now!
- chi-square of the null model: 2020.010
- df of the null model: 15
reproducing M0:
<- "
baselineM0 glad ~~ glad
happy ~~ happy
cheerful ~~ cheerful
satisfied ~~ satisfied
content ~~ content
comfortable ~~ comfortable
"
<- lavaan::sem(baselineM0, data = cfaData, fixed.x = FALSE)
base_fit base_fit
## lavaan 0.6-12 ended normally after 12 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 6
##
## Number of observations 1000
##
## Model Test User Model:
##
## Test statistic 2020.010
## Degrees of freedom 15
## P-value (Chi-square) 0.000
13.1.4 CFI/TLI
User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.005
- 100% improvement over the null (baseline) model … great fit
- Here TLI is larger than 1 because this is a rare situation in which the chi-square (2.957) is smaller than its df (8).
- numerator of TLI = (2020.010/15 - 2.957/8) = 134.2977
- denominator of TLI = (2020.010/15 - 1) = 133.6673
- TLI = 134.2977 / 133.6673 = 1.005
- A TLI larger than 1 is treated no differently from TLI = 1.
great overall fit.
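reproducing CFI/TLI (a quick sketch using the chi-square values reported above and the standard CFI/TLI formulas):

chisq_null <- 2020.010; df_null <- 15  # baseline (null) model
chisq_user <- 2.957;    df_user <- 8   # user model

# TLI: relative improvement in chi-square/df over the null model
(chisq_null/df_null - chisq_user/df_user) / (chisq_null/df_null - 1)

## [1] 1.004716

# CFI: improvement in noncentrality (chi-square minus df), bounded at 1
1 - max(chisq_user - df_user, 0) / max(chisq_user - df_user, chisq_null - df_null, 0)

## [1] 1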
13.1.5 Loglikelihood
Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -7483.272
  Loglikelihood unrestricted model (H1)      -7481.793
- H0 is the loglikelihood of your model (user model … lavaan is kind to clarify this)
- H1 is the saturated model loglikelihood.
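Both loglikelihoods can also be pulled straight from the fitted object with lavaan's fitMeasures(); the values should match the -7483.272 and -7481.793 shown in the summary above:

fitMeasures(fixedIndTwoFacRun, c("logl", "unrestricted.logl"))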
13.1.6 AIC/BIC
  Akaike (AIC)                               14992.544
  Bayesian (BIC)                             15056.345
  Sample-size adjusted Bayesian (BIC)        15015.056
- Penalized -2 LogL
- If these are lower than some other model -> prefer this model.
- If these are higher than some other model -> prefer the other model.
reproducing AIC/BIC:
logLik = -7483.272
q = 13 # (4 loadings + 6 unique factor variances + 3 factor var/covs)
N = 1000

(AIC = -2*logLik + 2*q)

## [1] 14992.54

(BIC = -2*logLik + log(N)*q)
## [1] 15056.34
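AIC and BIC only become meaningful when comparing models fit to the same data (including non-nested models). As a purely illustrative sketch, not part of this lab's analysis plan, we could fit a hypothetical one-factor alternative and prefer whichever model shows the lower values:

# hypothetical one-factor alternative, fit to the same data for illustration only
oneFacSyntax <- "
affect =~ glad + happy + cheerful + satisfied + content + comfortable
"
oneFacRun <- lavaan::sem(oneFacSyntax, data = cfaData, fixed.x = FALSE, estimator = 'MLMV')

# lower AIC/BIC -> preferred model (same data and same observed variables required)
fitMeasures(fixedIndTwoFacRun, c("aic", "bic"))
fitMeasures(oneFacRun, c("aic", "bic"))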
13.2 PART II: Exercise
For this portion, we will run the CFA analyses on a new simulated dataset based on Todd Little’s positive affect example.
Read in the new dataset:
<- read.csv("ChiStatSimDat.csv", header = T) affectData_new
Examine the dataset:
head(affectData_new)
## glad cheerful happy satisfied content comfortable
## 1 -0.88164396 0.08934146 -0.02412456 -0.570588909 0.3525594 0.59309666
## 2 -0.05250507 0.68355268 0.74157736 0.146592063 0.5393570 1.12686970
## 3 1.87921754 0.84984042 2.87784843 0.279073394 1.1723201 0.26855073
## 4 -0.22371928 0.10069522 -0.19760745 -0.063127288 -1.1782493 -0.59893739
## 5 0.17341853 0.26949455 -0.49326121 0.237083760 -1.0066170 -0.04426791
## 6 -1.03348018 0.04250862 -1.00909832 0.009657695 0.4366756 0.05996283
Examine the covariance matrix:
cov(affectData_new)
## glad cheerful happy satisfied content comfortable
## glad 1.4049090 0.9262102 1.3859992 1.1088378 1.200695 0.8715664
## cheerful 0.9262102 1.1043581 1.0344850 0.8850550 1.053458 0.9513925
## happy 1.3859992 1.0344850 1.7737439 1.2253205 1.233110 0.9317963
## satisfied 1.1088378 0.8850550 1.2253205 1.6365917 1.496492 0.8626294
## content 1.2006950 1.0534583 1.2331099 1.4964919 1.872718 1.0693608
## comfortable 0.8715664 0.9513925 0.9317963 0.8626294 1.069361 1.3162770
All positive! (Remember that the indicators need to be positively correlated with one another for these CFA models?)
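If correlations are easier to eyeball than covariances, cov2cor() from base R converts the matrix; this is just a convenience check, not a required step:

round(cov2cor(cov(affectData_new)), 2)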
13.2.1 PART I: Plot the distributions of all indicators
library(PerformanceAnalytics)
chart.Correlation(affectData_new)
## Warning in par(usr): argument 1 does not name a graphical parameter
## (the same warning is repeated several times)
all indicators look roughly normal
13.2.2 PART II: Write out the model syntax for two-factor model
<- "
twofa.model #Factor Specification
posAffect =~ glad + happy + cheerful
satisfaction =~ satisfied + content + comfortable
"
13.2.3 PART III: Fit the two-factor model
new_fit = lavaan::sem(twofa.model, data = affectData_new, fixed.x = FALSE)
summary(new_fit, standardized = T, fit.measures = T)
## lavaan 0.6-12 ended normally after 29 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 13
##
## Number of observations 200
##
## Model Test User Model:
##
## Test statistic 113.638
## Degrees of freedom 8
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 1168.055
## Degrees of freedom 15
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.908
## Tucker-Lewis Index (TLI) 0.828
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1413.224
## Loglikelihood unrestricted model (H1) -1356.405
##
## Akaike (AIC) 2852.449
## Bayesian (BIC) 2895.327
## Sample-size adjusted Bayesian (BIC) 2854.141
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.257
## 90 Percent confidence interval - lower 0.216
## 90 Percent confidence interval - upper 0.300
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.070
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## posAffect =~
## glad 1.000 1.115 0.943
## happy 1.095 0.048 22.882 0.000 1.221 0.919
## cheerful 0.763 0.046 16.739 0.000 0.851 0.812
## satisfaction =~
## satisfied 1.000 1.152 0.902
## content 1.110 0.054 20.462 0.000 1.278 0.936
## comfortable 0.715 0.057 12.548 0.000 0.823 0.719
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## posAffect ~~
## satisfaction 1.104 0.131 8.425 0.000 0.859 0.859
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .glad 0.154 0.031 4.966 0.000 0.154 0.110
## .happy 0.274 0.043 6.398 0.000 0.274 0.155
## .cheerful 0.374 0.042 8.851 0.000 0.374 0.341
## .satisfied 0.302 0.047 6.492 0.000 0.302 0.186
## .content 0.230 0.048 4.742 0.000 0.230 0.123
## .comfortable 0.632 0.068 9.253 0.000 0.632 0.482
## posAffect 1.244 0.142 8.790 0.000 1.000 1.000
## satisfaction 1.326 0.164 8.093 0.000 1.000 1.000
13.2.4 PART IV: Interpret the chisquare statistic and fit indices
Model Test User Model:

  Test statistic                               113.638
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.000
- The test statistic 113.638 is much larger than df = 8, and the p-value (0.000) is below 0.05.
- This chi-square is too large: the exact-fit hypothesis is rejected and the model fits poorly.
Root Mean Square Error of Approximation:

  RMSEA                                          0.257
  90 Percent confidence interval - lower         0.216
  90 Percent confidence interval - upper         0.300
  P-value RMSEA <= 0.05                          0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.070

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.908
  Tucker-Lewis Index (TLI)                       0.828
- RMSEA = 0.257 >> 0.1. The lower bound of the confidence interval is also larger than 0.1. This indicates a poor fit.
- P-value RMSEA <= 0.05 is significant: the close-fit null hypothesis is rejected.
- SRMR = 0.07 < 0.08, meaning that the average standardized residual between S and Sigma is below the conventional 0.08 cutoff; still, SRMR is known to be lenient (a low SRMR does not guarantee a good model, although a high SRMR does indicate a bad one). To see which residuals drive the misfit, see the check sketched after this list.
- CFI = 0.908, a 90.8% improvement over the null model … marginal fit
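Where does the misfit come from? As a quick follow-up sketch (not a required step of the lab), lavaan's residuals() with type = "cor" returns the residual correlation matrix; large entries flag the indicator pairs the model reproduces poorly.

# inspect residual correlations; entries well above ~0.1 point to local misfit
residuals(new_fit, type = "cor")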