Chapter 8 Week7_1: Lavaan Lab 5 One-factor CFA Model
In this lab, we will learn how to:
- Identify the One-factor CFA Model
- Scale the One-factor CFA Model
- Estimate the One-factor CFA Model
- Interpret the One-factor CFA Model
8.1 Data Prep
We will use cfaInClassData.csv in this lab.
This is a simulated dataset based on Todd Little’s positive affect example.
The hypothesis is that a latent variable ‘positive affect’ is measured by three indicators (glad, cheerful, and happy).
Let’s read this dataset in:
cfaData <- read.csv("cfaInclassData.csv", header = TRUE)
and examine the dataset:
head(cfaData)
## ID glad cheerful happy satisfied content comfortable
## 1 1 0.13521092 0.5413297 -0.1041445 -0.5777446 0.8645383 0.02935020
## 2 2 -0.29116043 0.2434081 0.6671535 2.0763730 -0.7382832 1.05439183
## 3 3 0.71975913 0.2218277 0.4722337 2.1685984 -0.2727574 0.09053090
## 4 4 0.44432030 0.9295414 0.8574083 -1.0575363 -1.3841364 -0.07940091
## 5 5 2.84476524 3.1710123 3.5145040 1.5725274 2.3406754 1.59866763
## 6 6 -0.03317526 -0.8434011 -0.1485924 -0.5469343 -1.5750953 -0.69629828
str(cfaData)
## 'data.frame': 1000 obs. of 7 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ glad : num 0.135 -0.291 0.72 0.444 2.845 ...
## $ cheerful : num 0.541 0.243 0.222 0.93 3.171 ...
## $ happy : num -0.104 0.667 0.472 0.857 3.515 ...
## $ satisfied : num -0.578 2.076 2.169 -1.058 1.573 ...
## $ content : num 0.865 -0.738 -0.273 -1.384 2.341 ...
## $ comfortable: num 0.0294 1.0544 0.0905 -0.0794 1.5987 ...
dim(cfaData) #n = 1000, 7 variables
## [1] 1000 7
Let’s examine their means and standard deviations:
round(apply(cfaData[,-1], 2, mean), 2) # mean-centered
## glad cheerful happy satisfied content comfortable
## 0.00 0.00 0.01 -0.04 -0.04 -0.04
round(apply(cfaData[,-1], 2, sd), 2)
## glad cheerful happy satisfied content comfortable
## 0.98 0.98 0.98 1.01 1.08 0.95
Let’s load the lavaan library and run some CFAs!
library(lavaan)
8.2 PART I: One-Factor CFA, Fixed Loading
8.2.1 Fixed Loading, AKA Marker Variable method.
FYI, the three equations for the three indicators are:
- Glad = lambda1*posAffect(eta) + u1
- Cheerful = lambda2*posAffect(eta) + u2
- Happy = lambda3*posAffect(eta) + u3
Let’s first follow the equations above and write the syntax (disturbances are automatically included):
mod1.wrong <- "
glad ~ posAffect
cheerful ~ posAffect
happy ~ posAffect
"
fit1.wrong = lavaan::sem(model = mod1.wrong, data = cfaData, fixed.x=FALSE)
Oops - an error message!
Error in lav_data_full(data = data, group = group, cluster = cluster, :
  lavaan ERROR: missing observed variables in dataset: posAffect
This is because posAffect is a latent variable and we have to use =~ to define a latent variable:
mod1.wrong <- '
posAffect =~ Glad + Cheerful + Happy
'
fit1.wrong = lavaan::sem(model = mod1.wrong, data = cfaData, fixed.x=FALSE)
Error in lavaan::lavaan(model = mod1.wrong, data = cfaData, fixed.x = FALSE, :
  lavaan ERROR: missing observed variables in dataset: Glad Cheerful Happy
Another error. Why?
The variable names in the model syntax have to match the column names EXACTLY, including letter case.
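A quick way to catch this kind of mismatch before fitting is to compare the names used in the model with the column names of the data. The sketch below is only an illustration: `demo` is a placeholder data frame that mimics some of the column names of cfaData, and `wanted`, `missing_vars`, and `suggested` are hypothetical helper names, not part of lavaan.

```r
# Placeholder data frame with some of cfaData's column names (values are dummies)
demo <- data.frame(ID = 1:2, glad = 0, cheerful = 0, happy = 0)

# Indicator names as (mis)typed in the model syntax
wanted <- c("Glad", "Cheerful", "Happy")

# Names used in the model that are not columns of the data
missing_vars <- setdiff(wanted, names(demo))
missing_vars

# A case-insensitive match points to the columns that were probably intended
suggested <- names(demo)[match(tolower(missing_vars), tolower(names(demo)))]
suggested
```

With the real data, replacing `demo` by `cfaData` should flag the same three names.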
Let’s try again:
mod1 <- '
posAffect =~ glad + cheerful + happy
'
Let’s explain the lavaan model syntax!
- mod1 is used to name our model.
- Since posAffect is a latent variable (it’s not in the data), we cannot follow the equations above and write syntax like glad ~ posAffect
- Instead, we specify a CFA measurement model in mod1.
- NEW SYNTAX ALERT: the =~ operator means “is manifested by”
- In the code above, our latent construct ‘posAffect’ is manifested by glad, cheerful, and happy
- By default, lavaan fixes the loading of the first indicator (glad) at 1 (Fixed Loading Method)
Next we fit the model and store the result in an object named ‘fit1’:
fit1 = lavaan::sem(mod1, data = cfaData, fixed.x=FALSE)
This summary will show us the loadings (I also requested standardized results):
summary(fit1, standardized = T)
## lavaan 0.6-12 ended normally after 20 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 6
##
## Number of observations 1000
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## posAffect =~
## glad 1.000 0.693 0.705
## cheerful 1.117 0.059 18.782 0.000 0.774 0.787
## happy 1.066 0.057 18.786 0.000 0.739 0.757
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .glad 0.485 0.030 16.238 0.000 0.485 0.503
## .cheerful 0.367 0.030 12.062 0.000 0.367 0.380
## .happy 0.407 0.030 13.751 0.000 0.407 0.427
## posAffect 0.480 0.043 11.270 0.000 1.000 1.000
df = 0 (why?)
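One way to see why df = 0 is to count the pieces of information in the data against the parameters being estimated. The sketch below assumes the usual covariance-structure count (no mean structure): three indicators give 3 variances and 3 covariances, and the model estimates the 6 parameters reported in the summary. The object names are illustrative.

```r
# Information available: unique variances and covariances among p indicators
p <- 3
n_moments <- p * (p + 1) / 2   # 3 variances + 3 covariances

# Free parameters: 2 loadings (the marker loading is fixed at 1),
# 3 unique factor variances, and 1 latent factor variance
n_free <- 2 + 3 + 1

# Degrees of freedom = information available - free parameters
df <- n_moments - n_free
df
```

Under this count the model is just-identified, which is also why the test statistic is 0: the model reproduces the sample covariance matrix exactly.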
Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  posAffect =~
    glad              1.000                               0.693    0.705
    cheerful          1.117    0.059   18.782    0.000    0.774    0.787
    happy             1.066    0.057   18.786    0.000    0.739    0.757
What does this mean?
- A 1-unit change in posAffect corresponds to:
  - a 1-unit change in “glad” (marker indicator)
  - a 1.117-unit change in “cheerful” (1.117 times the effect on “glad”)
  - a 1.066-unit change in “happy” (1.066 times the effect on “glad”)
Unique factor variances:
Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .glad              0.485    0.030   16.238    0.000    0.485    0.503
   .cheerful          0.367    0.030   12.062    0.000    0.367    0.380
   .happy             0.407    0.030   13.751    0.000    0.407    0.427
- The leftover unique factor variances remain substantial (38% to 50% of each indicator’s variance, per the Std.all column),
- meaning that none of the indicators is a perfect measure of posAffect,
- but they all contribute significantly to the measurement of posAffect (the standardized loadings above are all larger than 0.6)
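As a quick check on the standardized solution, squaring each standardized loading (Std.all) should give the share of an indicator’s variance explained by posAffect, and adding the standardized unique variance should give roughly 1. The sketch below uses the rounded values printed in the summary, so small rounding differences are expected; the object names are illustrative.

```r
# Standardized loadings and standardized unique variances (Std.all column)
std_loading <- c(glad = 0.705, cheerful = 0.787, happy = 0.757)
std_unique  <- c(glad = 0.503, cheerful = 0.380, happy = 0.427)

# Proportion of each indicator's variance explained by posAffect
r2 <- std_loading^2
round(r2, 3)

# Explained + unexplained proportions should sum to about 1
round(r2 + std_unique, 2)
```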
Followed by the latent factor variance:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    posAffect         0.480    0.043   11.270    0.000    1.000    1.000
8.2.2 Change marker indicator
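Suppose you’d like to fix the 2nd loading to 1 instead. Simply adding 1* in front of cheerful:
mod1b_wrong <- '
posAffect =~ glad + 1*cheerful + happy
'
won’t work as intended, because lavaan still fixes the first loading (glad) at 1 by default, so both glad and cheerful end up fixed at 1.
You will get something like this: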
Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  posAffect =~
    glad              1.000                               0.734    0.733
    cheerful          1.000                               0.734    0.759
    happy             1.009    0.046   22.052    0.000    0.741    0.759
You’ll have to change the order of the indicators to move cheerful to the front of the variable list:
mod1b <- '
posAffect =~ cheerful + glad + happy
'
Or use NA* to mark the loadings that should be freely estimated and 1* to mark the marker variable whose loading is fixed at 1:
mod1b <- '
posAffect =~ NA*glad + 1*cheerful + NA*happy
'
Here we fit the model and store the result in an object named ‘fit1b’:
fit1b = lavaan::sem(mod1b, data = cfaData, fixed.x=FALSE)
summary(fit1b, standardized = T)
## lavaan 0.6-12 ended normally after 19 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 6
##
## Number of observations 1000
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## posAffect =~
## glad 0.895 0.048 18.782 0.000 0.693 0.705
## cheerful 1.000 0.774 0.787
## happy 0.954 0.050 19.130 0.000 0.739 0.757
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .glad 0.485 0.030 16.238 0.000 0.485 0.503
## .cheerful 0.367 0.030 12.062 0.000 0.367 0.380
## .happy 0.407 0.030 13.751 0.000 0.407 0.427
## posAffect 0.599 0.048 12.616 0.000 1.000 1.000
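- The loadings can be obtained by dividing those in fit1 by 1.117 (i.e., they change proportionally).
- The unique factor variances remain unchanged, as do the standardized estimates (Std.lv and Std.all).
- The latent factor variance does change, from 0.480 to 0.599 (about 0.480 × 1.117²), because the factor now takes its scale from cheerful instead of glad.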
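The two marker-variable solutions are rescalings of one another. The sketch below reproduces the fit1b estimates from the fit1 estimates by dividing every loading by the loading of the new marker (cheerful) and multiplying the factor variance by the square of that loading. It uses the rounded values printed in the summaries, and the object names are illustrative.

```r
# Unstandardized estimates from fit1 (glad as marker)
loadings_fit1 <- c(glad = 1.000, cheerful = 1.117, happy = 1.066)
psi_fit1 <- 0.480                      # latent factor variance in fit1

# Loading of the new marker variable
lambda_cheerful <- loadings_fit1[["cheerful"]]

# Rescale: loadings shrink by lambda_cheerful, factor variance grows by its square
loadings_fit1b <- loadings_fit1 / lambda_cheerful
psi_fit1b <- psi_fit1 * lambda_cheerful^2

round(loadings_fit1b, 3)   # compare with the fit1b loadings
round(psi_fit1b, 3)        # compare with the fit1b factor variance
```

The unique factor variances do not enter this calculation because they are on the scale of the indicators, not of the factor.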
8.3 PART II: One-Factor CFA, Fixed Factor Variance
8.3.1 Fixed Factor Method
Keep using the same syntax but assign a new name mod2:
mod2 <- '
posAffect =~ glad + cheerful + happy
'
To fix the variance of the latent variable to 1, add std.lv=T to the sem() call:
fit2 <- lavaan::sem(mod2, data = cfaData, fixed.x=FALSE, std.lv=T)
summary(fit2, standardized = TRUE)
## lavaan 0.6-12 ended normally after 11 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 6
##
## Number of observations 1000
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## posAffect =~
## glad 0.693 0.031 22.540 0.000 0.693 0.705
## cheerful 0.774 0.031 25.233 0.000 0.774 0.787
## happy 0.739 0.030 24.226 0.000 0.739 0.757
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .glad 0.485 0.030 16.238 0.000 0.485 0.503
## .cheerful 0.367 0.030 12.062 0.000 0.367 0.380
## .happy 0.407 0.030 13.751 0.000 0.407 0.427
## posAffect 1.000 1.000 1.000
Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  posAffect =~
    glad              0.693    0.031   22.540    0.000    0.693    0.705
    cheerful          0.774    0.031   25.233    0.000    0.774    0.787
    happy             0.739    0.030   24.226    0.000    0.739    0.757
- A 1-SD change in the factor (posAffect) corresponds to:
  - a 0.693-unit change in glad (on its raw scale)
  - a 0.774-unit change in cheerful (on its raw scale)
  - a 0.739-unit change in happy (on its raw scale)
Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .glad              0.485    0.030   16.238    0.000    0.485    0.503
   .cheerful          0.367    0.030   12.062    0.000    0.367    0.380
   .happy             0.407    0.030   13.751    0.000    0.407    0.427
    posAffect         1.000                               1.000    1.000
- We see that posAffect now has a variance (and therefore an SD) of 1
- All loadings were freely estimated; none is fixed at 1
- The unique factor variances are the same as before
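The fixed-factor loadings are the marker-variable loadings expressed per SD of the factor rather than per unit of the marker scale. As a sketch under that reading, multiplying each fit1 loading by the square root of the fit1 factor variance should recover the fit2 loadings (which match the Std.lv column in every summary). The inputs are the rounded values printed above, and the object names are illustrative.

```r
# Unstandardized estimates from fit1 (marker-variable method)
loadings_marker <- c(glad = 1.000, cheerful = 1.117, happy = 1.066)
psi_marker <- 0.480                    # latent factor variance in fit1

# Rescale the factor to SD = 1: multiply loadings by the factor SD
loadings_fixed_factor <- loadings_marker * sqrt(psi_marker)

round(loadings_fixed_factor, 3)        # compare with the fit2 loadings
```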