Chapter 16 Lavaan Lab 13: SEM for Nonnormal and Categorical Data
Load up the lavaan library:
library(lavaan)
16.1 PART I: Nonnormality Diagnosis
Let’s first load the simulated nonnormal data and examine how far each item departs from normality:
nnorm_dat <- read.csv("nonnormal.csv", header = T)
head(nnorm_dat)
## odd1 odd2 odd3 odd4 odd5 odd6 odd7 odd8
## 1 1.455398 1.1968135 -0.52181159 -1.9651704 -4.2353119 1.0959168 1.1434283 -0.4393833
## 2 6.517492 2.7962582 0.56780460 4.6117031 5.0850284 1.4194502 1.3925536 5.2085814
## 3 2.047266 -2.1148975 -1.35879880 0.6664211 3.1903653 2.0040147 0.5134188 0.2114613
## 4 8.251226 4.9818940 10.29910350 7.5773213 0.2595174 1.8953510 -2.5800594 -2.0036514
## 5 3.553213 4.7921231 2.44286008 1.4219472 0.5007659 1.2665319 0.4181851 1.1586593
## 6 2.728876 -0.3436493 0.07283766 -1.8742132 -3.0228869 0.5607386 -0.7374122 -0.3249854
par(mfrow = c(2, 2)) # arrange the plots in a 2 x 2 grid
hist(nnorm_dat$odd5)
hist(nnorm_dat$odd6)
hist(nnorm_dat$odd7)
hist(nnorm_dat$odd8)
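Histograms give a quick visual impression; as an optional extra check, a normal Q-Q plot makes skew and heavy tails easier to see. A minimal sketch for a single item (odd6 is chosen arbitrarily here):
qqnorm(nnorm_dat$odd6, main = "Normal Q-Q plot: odd6") # points bending away from the line indicate skew / heavy tails
qqline(nnorm_dat$odd6) # reference line for a normal distribution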
Use the describe() function from the psych package to get univariate descriptive statistics:
#install.packages("psych")
library(psych)
describe(nnorm_dat)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## odd1 1 1000 1.23 2.19 1.18 1.17 1.49 -9.50 18.41 27.91 0.87 7.77 0.07
## odd2 2 1000 0.91 2.18 0.87 0.89 1.65 -12.49 17.81 30.30 0.45 7.07 0.07
## odd3 3 1000 1.05 2.21 1.02 1.03 1.41 -18.10 22.44 40.55 0.58 18.62 0.07
## odd4 4 1000 0.65 2.19 0.55 0.62 1.64 -18.04 12.43 30.47 -0.39 8.35 0.07
## odd5 5 1000 0.46 2.24 0.56 0.53 1.66 -16.83 7.79 24.62 -0.99 6.17 0.07
## odd6 6 1000 0.54 2.32 0.63 0.62 1.59 -24.87 12.98 37.84 -2.37 24.73 0.07
## odd7 7 1000 0.26 1.91 0.23 0.28 1.32 -12.30 16.58 28.87 0.07 12.23 0.06
## odd8 8 1000 0.28 1.89 0.24 0.28 1.33 -8.26 12.64 20.90 0.39 6.23 0.06
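As a quick screen, we can flag the most problematic items. This is a sketch using commonly cited rules of thumb (e.g., |skew| > 2 or kurtosis > 7 as cause for concern); desc is just a scratch variable name:
desc <- describe(nnorm_dat) # univariate descriptives as a data frame
desc[abs(desc$skew) > 2 | desc$kurtosis > 7, c("skew", "kurtosis")] # items exceeding the cutoffs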
Use mardia() from the psych package to test multivariate normality:
par(mfrow = c(1, 1)) # reset the plotting layout to a single panel
mardia(nnorm_dat)
## Call: mardia(x = nnorm_dat)
##
## Mardia tests of multivariate skew and kurtosis
## Use describe(x) the to get univariate tests
## n.obs = 1000 num.vars = 8
## b1p = 40.96 skew = 6827.1 with probability <= 0
## small sample skew = 6852.15 with probability <= 0
## b2p = 323.37 kurtosis = 304.21 with probability <= 0
In any case, these data are clearly far from normal, so …
16.2 PART II: Robust corrections
Write out syntax for a one-factor CFA model:
cfaSyn <- "
odd =~ odd1 + odd2 + odd3 + odd4 + odd5 + odd6 + odd7 + odd8
"
Fit the one-factor model:
mlrFit <- lavaan::sem(cfaSyn,
                      data = nnorm_dat,
                      fixed.x = FALSE,
                      estimator = "mlr")
summary(mlrFit, fit.measures = T)
## lavaan 0.6-12 ended normally after 34 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 16
##
## Number of observations 1000
##
## Model Test User Model:
## Standard Robust
## Test Statistic 430.290 125.915
## Degrees of freedom 20 20
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 3.417
## Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
## Test statistic 1692.974 454.468
## Degrees of freedom 28 28
## P-value 0.000 0.000
## Scaling correction factor 3.725
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.754 0.752
## Tucker-Lewis Index (TLI) 0.655 0.652
##
## Robust Comparative Fit Index (CFI) 0.772
## Robust Tucker-Lewis Index (TLI) 0.681
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -16785.305 -16785.305
## Scaling correction factor 5.604
## for the MLR correction
## Loglikelihood unrestricted model (H1) -16570.161 -16570.161
## Scaling correction factor 4.389
## for the MLR correction
##
## Akaike (AIC) 33602.611 33602.611
## Bayesian (BIC) 33681.135 33681.135
## Sample-size adjusted Bayesian (BIC) 33630.318 33630.318
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.143 0.073
## 90 Percent confidence interval - lower 0.132 0.066
## 90 Percent confidence interval - upper 0.155 0.079
## P-value RMSEA <= 0.05 0.000 0.000
##
## Robust RMSEA 0.135
## 90 Percent confidence interval - lower 0.113
## 90 Percent confidence interval - upper 0.157
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.077 0.077
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## odd =~
## odd1 1.000
## odd2 1.095 0.146 7.499 0.000
## odd3 1.035 0.172 6.000 0.000
## odd4 0.946 0.150 6.317 0.000
## odd5 0.778 0.198 3.920 0.000
## odd6 0.939 0.178 5.282 0.000
## odd7 0.810 0.158 5.135 0.000
## odd8 0.890 0.144 6.198 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .odd1 3.312 0.486 6.814 0.000
## .odd2 3.003 0.295 10.197 0.000
## .odd3 3.310 0.410 8.074 0.000
## .odd4 3.490 0.482 7.234 0.000
## .odd5 4.105 0.418 9.811 0.000
## .odd6 4.092 0.645 6.345 0.000
## .odd7 2.665 0.437 6.099 0.000
## .odd8 2.394 0.232 10.313 0.000
## odd 1.469 0.355 4.138 0.000
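To pull the naive and robust versions of the key fit indices side by side, here is a sketch using lavaan's fitMeasures() labels:
fitMeasures(mlrFit, c("chisq", "chisq.scaled",
                      "cfi", "cfi.robust",
                      "rmsea", "rmsea.robust",
                      "srmr")) # naive vs. scaled/robust fit indices
Note how the scaled chi-square (125.9) is far smaller than the naive statistic (430.3), reflecting how much the uncorrected test is inflated by nonnormality.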
16.3 PART III: Categorical Data Analysis in Lavaan
Let’s load the simulated data in which the ODD items are ordinal:
<- read.csv("oddData.csv", header = T)
odd head(odd)
## id odd1 odd2 odd3 odd4 odd5 odd6 odd7 odd8
## 1 4 1 1 1 1 1 0 0 0
## 2 7 2 0 1 0 0 0 1 0
## 3 12 2 0 1 0 1 0 0 0
## 4 14 1 1 2 1 1 1 0 1
## 5 37 1 1 1 0 0 0 0 0
## 6 39 1 1 1 0 0 1 1 0
Write out syntax for a one-factor CFA model:
oddOneFac <- '
# Specify the overall ODD factor
odd =~ odd1 + odd2 + odd3 + odd4 + odd5 + odd6 + odd7 + odd8
'
- Fit the one-factor model.
- Label the ordinal variables using the ordered argument: ordered = c(<names of the ordinal indicators>).
- Declaring indicators as ordered makes lavaan switch to a categorical estimator (WLSMV, i.e., DWLS with robust corrections) by default.
oneFacFit <- lavaan::sem(oddOneFac,
                         data = odd,
                         # declare these items as ordered (ordinal) variables
                         ordered = c('odd1','odd2','odd3','odd4','odd5','odd6','odd7','odd8'),
                         fixed.x = FALSE,
                         parameterization = 'theta') # or 'delta'
summary(oneFacFit, fit.measures = T)
## lavaan 0.6-12 ended normally after 34 iterations
##
## Estimator DWLS
## Optimization method NLMINB
## Number of model parameters 24
##
## Number of observations 221
##
## Model Test User Model:
## Standard Robust
## Test Statistic 44.892 63.690
## Degrees of freedom 20 20
## P-value (Chi-square) 0.001 0.000
## Scaling correction factor 0.732
## Shift parameter 2.377
## simple second-order correction
##
## Model Test Baseline Model:
##
## Test statistic 888.285 643.487
## Degrees of freedom 28 28
## P-value 0.000 0.000
## Scaling correction factor 1.398
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.971 0.929
## Tucker-Lewis Index (TLI) 0.959 0.901
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.075 0.100
## 90 Percent confidence interval - lower 0.046 0.073
## 90 Percent confidence interval - upper 0.105 0.128
## P-value RMSEA <= 0.05 0.076 0.002
##
## Robust RMSEA NA
## 90 Percent confidence interval - lower NA
## 90 Percent confidence interval - upper NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.085 0.085
##
## Parameter Estimates:
##
## Standard errors Robust.sem
## Information Expected
## Information saturated (h1) model Unstructured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## odd =~
## odd1 1.000
## odd2 0.850 0.180 4.725 0.000
## odd3 0.759 0.152 4.981 0.000
## odd4 0.602 0.138 4.373 0.000
## odd5 0.404 0.099 4.079 0.000
## odd6 0.607 0.128 4.735 0.000
## odd7 0.848 0.190 4.456 0.000
## odd8 0.871 0.186 4.694 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .odd1 0.000
## .odd2 0.000
## .odd3 0.000
## .odd4 0.000
## .odd5 0.000
## .odd6 0.000
## .odd7 0.000
## .odd8 0.000
## odd 0.000
##
## Thresholds:
## Estimate Std.Err z-value P(>|z|)
## odd1|t1 -1.884 0.203 -9.262 0.000
## odd1|t2 0.613 0.136 4.500 0.000
## odd2|t1 -0.970 0.136 -7.122 0.000
## odd2|t2 1.138 0.147 7.721 0.000
## odd3|t1 -1.355 0.153 -8.869 0.000
## odd3|t2 1.018 0.134 7.587 0.000
## odd4|t1 0.076 0.103 0.737 0.461
## odd4|t2 1.699 0.167 10.168 0.000
## odd5|t1 0.031 0.093 0.335 0.737
## odd5|t2 1.417 0.129 10.962 0.000
## odd6|t1 -0.174 0.103 -1.680 0.093
## odd6|t2 1.568 0.153 10.276 0.000
## odd7|t1 0.632 0.134 4.711 0.000
## odd7|t2 2.692 0.325 8.287 0.000
## odd8|t1 0.981 0.151 6.510 0.000
## odd8|t2 2.547 0.283 9.003 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .odd1 1.000
## .odd2 1.000
## .odd3 1.000
## .odd4 1.000
## .odd5 1.000
## .odd6 1.000
## .odd7 1.000
## .odd8 1.000
## odd 1.330 0.382 3.483 0.000
##
## Scales y*:
## Estimate Std.Err z-value P(>|z|)
## odd1 0.655
## odd2 0.714
## odd3 0.753
## odd4 0.822
## odd5 0.906
## odd6 0.819
## odd7 0.715
## odd8 0.705
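To interpret the loadings on a common metric, the fully standardized solution is often easier to read; a minimal sketch using lavaan's standardizedSolution():
standardizedSolution(oneFacFit) # standardized loadings, thresholds, and (residual) variances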
16.4 PART IV: What if you have it all?
- Unfortunately, you cannot use missing = 'fiml' for categorical data, because FIML requires maximum likelihood estimation while ordered indicators are fit with DWLS/WLSMV:
FitMessy <- lavaan::sem(oddOneFac,
                        data = odd,
                        ordered = c('odd1','odd2','odd3','odd4','odd5','odd6','odd7','odd8'),
                        fixed.x = FALSE,
                        estimator = "DWLS"
                        # missing = 'fiml' # not allowed with a categorical estimator
                        )
FitMessy
## lavaan 0.6-12 ended normally after 16 iterations
##
## Estimator DWLS
## Optimization method NLMINB
## Number of model parameters 24
##
## Number of observations 221
##
## Model Test User Model:
##
## Test statistic 44.892
## Degrees of freedom 20
## P-value (Chi-square) 0.001
#summary(FitMessy, fit.measures = T)
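If the ordinal ODD data did contain missing values, one commonly used alternative is pairwise-present estimation with the categorical (DWLS/WLSMV) estimator via missing = 'pairwise'. A sketch, not run on these complete data (FitMessyCat is just an illustrative object name):
FitMessyCat <- lavaan::sem(oddOneFac,
                           data = odd,
                           ordered = c('odd1','odd2','odd3','odd4','odd5','odd6','odd7','odd8'),
                           fixed.x = FALSE,
                           missing = 'pairwise') # pairwise-present estimation for ordinal indicators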
- But you can use missing = 'fiml' together with MLR for (continuous) nonnormal data:
FitMessy <- lavaan::sem(oddOneFac,
                        data = nnorm_dat,
                        # ordered = c('odd1','odd2','odd3','odd4','odd5','odd6','odd7','odd8'),
                        fixed.x = FALSE,
                        estimator = "mlr",
                        missing = 'fiml')
FitMessy
## lavaan 0.6-12 ended normally after 34 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 24
##
## Number of observations 1000
## Number of missing patterns 1
##
## Model Test User Model:
## Standard Robust
## Test Statistic 430.290 125.915
## Degrees of freedom 20 20
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 3.417
## Yuan-Bentler correction (Mplus variant)
#summary(FitMessy, fit.measures = T)
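As with the complete-data MLR fit, the robust fit indices for this FIML + MLR model can be pulled out directly; a sketch using fitMeasures() labels:
fitMeasures(FitMessy, c("chisq.scaled", "df.scaled",
                        "cfi.robust", "tli.robust",
                        "rmsea.robust", "srmr")) # robust/scaled fit indices for the FIML + MLR fit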