In this section, we will analyse the same data as we did this morning: children of female respondents to the US National Longitudinal Survey of Youth (a subsample from the National Longitudinal Survey of Youth). Starting in 1986, children of the female respondents of the original NLSY Youth sample were assessed in 1986, and again in 1988, 1990 and 1992. For inclusion in the subsample considered here, children had to be between 6 and 8 years old at the first measurement. We also restrict the analysis to 221 children who were assessed at all four occasions.
The difference is that we will now fit latent growth curve models
under an SEM approach. As usual, let’s start by installing and/or
loading the packages we will need for the session. In this session, we
will use three packages: haven
, tidyverse
, and
lavaan
.
## We are also going to use the tidyverse package, so let's load it now
# Install the tidyverse package; you only need to do that once
## install.packages("tidyverse") # (uncomment if you need to install)
# Load the tidyverse package
library(tidyverse)
## We also need the haven package to import datasets into R
## install.packages("haven") # (uncomment if you need to install)
library(haven)
# Finally, we also need the lavaan package---the SEM package for R
## install.packages("lavaan") # (uncomment if you need to install)
# Load the lavaan package
library(lavaan)
You should already have downloaded the read_long.dta
data file from the website and stored it in
your computer. Now, use the read_stata()
function to import
the dataset into R
.
Because the data is available online, we can also use the following
function to import data into R
, which is more efficient.
(Though do note that most datasets are not readily available
online).
# import data into R
reading <- read_stata("https://thiagoroliveira.com/LDA-2025/data/read_long.dta")
The variables available in the reading
dataset are the
following:
Name | Description |
---|---|
childid |
Identifier for each respondent |
t |
Occasion (0 = 1986, 1 = 1988, 2 = 1990, 3 = 1992) |
male |
Gender (1 = male) |
homecog |
Amount of cognitive support at home |
read |
Reading scores |
year |
Year |
This morning, we produced some analyses fitting growth curve models under a multilevel approach. Now, let’s do the same under an SEM approach.
Under an SEM framework, longitudinal datasets need to be set as wide. Therefore, let’s start reshaping our dataset into a wide format.
# reshaping the dataset into a wider format
reading_wide <-
reading %>%
pivot_wider(
id_cols = c(childid, male, homecog), # variables that do NOT vary with time
names_from = t, # time variable
values_from = c(read, year), # time-varying variables
names_sep = "_t" # e.g., read_t0, read_t1, etc.
)
# double check it worked
head(reading_wide)
## # A tibble: 6 × 11
## childid male homecog read_t0 read_t1 read_t2 read_t3 year_t0 year_t1 year_t2
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 9 2.10 2.90 4.5 4.5 1986 1988 1990
## 2 2 0 9 2.30 4.5 4.20 4.60 1986 1988 1990
## 3 3 0 10 2.30 3.80 4.30 6.20 1986 1988 1990
## 4 4 1 8 1.80 2.60 4.10 4 1986 1988 1990
## 5 5 1 10 3.5 4.80 5.80 7.5 1986 1988 1990
## 6 6 0 9 3.5 5.70 7 6.90 1986 1988 1990
## # ℹ 1 more variable: year_t3 <dbl>
To perform any analysis under the SEM framework using
lavaan
, we can use the sem()
function. We
essentially need to type out all possible relationships theorised by our
structural model. For example:
# example of how to use lavaan
my_model <- "latent_variable_1 =~ x1 + x2 + x3 + x4 # specify freely estimated measurement model
latent_variable_2 =~ 1*y1 + 1*y2 + 1*y3 + 1*y4 # specify measurement model with loadings constrained to 1
latent_variable_3 =~ 0*z1 + 1*z2 + 2*z3 + 3*z4 # specify measurement model with loadings constrained to 0, 1, 2, 3
latent_variable_1 ~~ latent_variable_2 # variables are allowed to correlate
latent_variable_3 ~ a1 + b1 + c1 # specify a regression model
"
The =~
specification implies “is defined by” and is used
to specify measurement models. The ~~
specification implies
“is allowed to correlate with” and is used to specify covariances. The
~
specification implies “is regressed by” and is used to
specify regression models. Finally, we use the *
to
constrain paramaters to certain values.
Now, use the sem
function to estimate four models:
# specify the LGCM
lgcm_ri <- '
# Latent intercept factor
intercept =~ 1*read_t0 + 1*read_t1 + 1*read_t2 + 1*read_t3
'
# estimate the LGCM
fit_lgcm_ri <- sem(lgcm_ri, data = reading_wide, missing = "fiml")
# check results
summary(fit_lgcm_ri, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 15 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 9
##
## Number of observations 221
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 91.823
## Degrees of freedom 5
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 601.612
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.854
## Tucker-Lewis Index (TLI) 0.825
##
## Robust Comparative Fit Index (CFI) 0.854
## Robust Tucker-Lewis Index (TLI) 0.825
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1034.943
## Loglikelihood unrestricted model (H1) -989.032
##
## Akaike (AIC) 2087.886
## Bayesian (BIC) 2118.470
## Sample-size adjusted Bayesian (SABIC) 2089.949
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.280
## 90 Percent confidence interval - lower 0.232
## 90 Percent confidence interval - upper 0.332
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 1.000
##
## Robust RMSEA 0.280
## 90 Percent confidence interval - lower 0.232
## 90 Percent confidence interval - upper 0.332
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.267
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## intercept =~
## read_t0 1.000 0.903 0.769
## read_t1 1.000 0.903 0.909
## read_t2 1.000 0.903 0.873
## read_t3 1.000 0.903 0.801
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .read_t0 2.516 0.079 31.840 0.000 2.516 2.142
## .read_t1 4.041 0.067 60.419 0.000 4.041 4.064
## .read_t2 5.021 0.070 72.123 0.000 5.021 4.852
## .read_t3 5.803 0.076 76.534 0.000 5.803 5.148
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .read_t0 0.564 0.063 9.012 0.000 0.564 0.409
## .read_t1 0.173 0.028 6.104 0.000 0.173 0.175
## .read_t2 0.255 0.035 7.235 0.000 0.255 0.238
## .read_t3 0.455 0.053 8.625 0.000 0.455 0.358
## intercept 0.816 0.086 9.495 0.000 1.000 1.000
# specify the LGCM
lgcm_slope <- '
# Latent intercept factor
intercept =~ 1*read_t0 + 1*read_t1 + 1*read_t2 + 1*read_t3
# Latent slope factor
slope =~ 0*read_t0 + 1*read_t1 + 2*read_t2 + 3*read_t3
# Latent variables allowed to covary
intercept ~~ slope
'
# estimate the LGCM
fit_lgcm_slope <- sem(lgcm_slope, data = reading_wide, missing = "fiml")
# check results
summary(fit_lgcm_slope, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 23 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 11
##
## Number of observations 221
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 17.584
## Degrees of freedom 3
## P-value (Chi-square) 0.001
##
## Model Test Baseline Model:
##
## Test statistic 601.612
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.976
## Tucker-Lewis Index (TLI) 0.951
##
## Robust Comparative Fit Index (CFI) 0.976
## Robust Tucker-Lewis Index (TLI) 0.951
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -997.824
## Loglikelihood unrestricted model (H1) -989.032
##
## Akaike (AIC) 2017.647
## Bayesian (BIC) 2055.027
## Sample-size adjusted Bayesian (SABIC) 2020.168
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.148
## 90 Percent confidence interval - lower 0.086
## 90 Percent confidence interval - upper 0.219
## P-value H_0: RMSEA <= 0.050 0.006
## P-value H_0: RMSEA >= 0.080 0.964
##
## Robust RMSEA 0.148
## 90 Percent confidence interval - lower 0.086
## 90 Percent confidence interval - upper 0.219
## P-value H_0: Robust RMSEA <= 0.050 0.006
## P-value H_0: Robust RMSEA >= 0.080 0.964
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.077
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## intercept =~
## read_t0 1.000 0.744 0.790
## read_t1 1.000 0.744 0.791
## read_t2 1.000 0.744 0.691
## read_t3 1.000 0.744 0.594
## slope =~
## read_t0 0.000 0.000 0.000
## read_t1 1.000 0.270 0.287
## read_t2 2.000 0.540 0.501
## read_t3 3.000 0.810 0.646
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## intercept ~~
## slope 0.024 0.024 0.994 0.320 0.119 0.119
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .read_t0 2.516 0.063 39.713 0.000 2.516 2.671
## .read_t1 4.041 0.063 63.875 0.000 4.041 4.297
## .read_t2 5.021 0.072 69.294 0.000 5.021 4.661
## .read_t3 5.803 0.084 68.840 0.000 5.803 4.631
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .read_t0 0.334 0.054 6.209 0.000 0.334 0.376
## .read_t1 0.210 0.029 7.274 0.000 0.210 0.238
## .read_t2 0.220 0.033 6.689 0.000 0.220 0.189
## .read_t3 0.218 0.055 3.966 0.000 0.218 0.139
## intercept 0.554 0.074 7.507 0.000 1.000 1.000
## slope 0.073 0.014 5.140 0.000 1.000 1.000
# specify the LGCM
lgcm_quadratic <- '
# Latent intercept factor
intercept =~ 1*read_t0 + 1*read_t1 + 1*read_t2 + 1*read_t3
# Latent slope factor
slope =~ 0*read_t0 + 1*read_t1 + 2*read_t2 + 3*read_t3
# Latent quadratic factor
quadratic =~ 0*read_t0 + 1*read_t1 + 4*read_t2 + 9*read_t3
# Fix variance of quad to zero (fixed effect only)
quadratic ~~ 0*quadratic
# Latent variables allowed to covary
intercept ~~ slope
'
# estimate the LGCM
fit_lgcm_quadratic <- sem(lgcm_quadratic, data = reading_wide, missing = "fiml")
## Warning: lavaan->lav_object_post_check():
## covariance matrix of latent variables is not positive definite ; use
## lavInspect(fit, "cov.lv") to investigate.
# check results
summary(fit_lgcm_quadratic, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 34 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 13
##
## Number of observations 221
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 0.142
## Degrees of freedom 1
## P-value (Chi-square) 0.706
##
## Model Test Baseline Model:
##
## Test statistic 601.612
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.009
##
## Robust Comparative Fit Index (CFI) 1.000
## Robust Tucker-Lewis Index (TLI) 1.009
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -989.103
## Loglikelihood unrestricted model (H1) -989.032
##
## Akaike (AIC) 2004.205
## Bayesian (BIC) 2048.382
## Sample-size adjusted Bayesian (SABIC) 2007.184
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.130
## P-value H_0: RMSEA <= 0.050 0.774
## P-value H_0: RMSEA >= 0.080 0.150
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.130
## P-value H_0: Robust RMSEA <= 0.050 0.774
## P-value H_0: Robust RMSEA >= 0.080 0.150
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.003
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## intercept =~
## read_t0 1.000 0.753 0.858
## read_t1 1.000 0.753 0.751
## read_t2 1.000 0.753 0.687
## read_t3 1.000 0.753 0.619
## slope =~
## read_t0 0.000 0.000 0.000
## read_t1 1.000 0.574 0.573
## read_t2 2.000 1.148 1.046
## read_t3 3.000 1.722 1.415
## quadratic =~
## read_t0 0.000 0.000 0.000
## read_t1 1.000 0.000 0.000
## read_t2 4.000 0.000 0.000
## read_t3 9.000 0.000 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## intercept ~~
## slope -0.022 0.127 -0.173 0.863 -0.051 -0.051
## quadratic -0.002 0.032 -0.053 0.957 -Inf -Inf
## slope ~~
## quadratic -0.044 0.021 -2.134 0.033 -Inf -Inf
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .read_t0 2.516 0.059 42.622 0.000 2.516 2.867
## .read_t1 4.041 0.067 59.912 0.000 4.041 4.030
## .read_t2 5.021 0.074 68.019 0.000 5.021 4.575
## .read_t3 5.803 0.082 70.878 0.000 5.803 4.768
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## quadratic 0.000 NaN NaN
## .read_t0 0.203 0.116 1.743 0.081 0.203 0.263
## .read_t1 0.244 0.044 5.483 0.000 0.244 0.242
## .read_t2 0.126 0.049 2.563 0.010 0.126 0.105
## .read_t3 0.494 0.134 3.682 0.000 0.494 0.334
## intercept 0.568 0.126 4.492 0.000 1.000 1.000
## slope 0.330 0.124 2.654 0.008 1.000 1.000
# specify the LGCM
lgcm_quadratic_latent <- '
# Latent intercept factor
intercept =~ 1*read_t0 + 1*read_t1 + 1*read_t2 + 1*read_t3
# Latent slope factor
slope =~ 0*read_t0 + 1*read_t1 + 2*read_t2 + 3*read_t3
# Latent quadratic factor
quadratic =~ 0*read_t0 + 1*read_t1 + 4*read_t2 + 9*read_t3
# Latent variables allowed to covary
intercept ~~ slope
intercept ~~ quadratic
slope ~~ quadratic
'
# estimate the LGCM
fit_lgcm_quadratic_latent <- sem(lgcm_quadratic_latent, data = reading_wide, missing = "fiml")
## Warning: lavaan->lav_object_post_check():
## covariance matrix of latent variables is not positive definite ; use
## lavInspect(fit, "cov.lv") to investigate.
# check results
summary(fit_lgcm_quadratic_latent, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 65 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 14
##
## Number of observations 221
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Model Test Baseline Model:
##
## Test statistic 601.612
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Robust Comparative Fit Index (CFI) 1.000
## Robust Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -989.032
## Loglikelihood unrestricted model (H1) -989.032
##
## Akaike (AIC) 2006.063
## Bayesian (BIC) 2053.638
## Sample-size adjusted Bayesian (SABIC) 2009.271
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: RMSEA <= 0.050 NA
## P-value H_0: RMSEA >= 0.080 NA
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: Robust RMSEA <= 0.050 NA
## P-value H_0: Robust RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## intercept =~
## read_t0 1.000 0.761 0.867
## read_t1 1.000 0.761 0.760
## read_t2 1.000 0.761 0.691
## read_t3 1.000 0.761 0.627
## slope =~
## read_t0 0.000 0.000 0.000
## read_t1 1.000 0.602 0.601
## read_t2 2.000 1.204 1.094
## read_t3 3.000 1.805 1.487
## quadratic =~
## read_t0 0.000 0.000 0.000
## read_t1 1.000 0.067 0.067
## read_t2 4.000 0.269 0.244
## read_t3 9.000 0.605 0.498
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## intercept ~~
## slope -0.036 0.131 -0.272 0.786 -0.078 -0.078
## quadratic 0.002 0.033 0.060 0.952 0.039 0.039
## slope ~~
## quadratic -0.056 0.037 -1.515 0.130 -1.378 -1.378
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .read_t0 2.516 0.059 42.631 0.000 2.516 2.868
## .read_t1 4.041 0.067 59.977 0.000 4.041 4.034
## .read_t2 5.021 0.074 67.815 0.000 5.021 4.562
## .read_t3 5.803 0.082 71.074 0.000 5.803 4.781
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .read_t0 0.191 0.118 1.620 0.105 0.191 0.248
## .read_t1 0.237 0.047 4.996 0.000 0.237 0.236
## .read_t2 0.130 0.050 2.611 0.009 0.130 0.107
## .read_t3 0.456 0.165 2.762 0.006 0.456 0.310
## intercept 0.579 0.129 4.489 0.000 1.000 1.000
## slope 0.362 0.150 2.412 0.016 1.000 1.000
## quadratic 0.005 0.012 0.380 0.704 1.000 1.000
# check fit measures of each of the four models
fitMeasures(fit_lgcm_ri, c("aic", "bic", "rmsea", "cfi", "tli"))
## aic bic rmsea cfi tli
## 2087.886 2118.470 0.280 0.854 0.825
fitMeasures(fit_lgcm_slope, c("aic", "bic", "rmsea", "cfi", "tli"))
## aic bic rmsea cfi tli
## 2017.647 2055.027 0.148 0.976 0.951
fitMeasures(fit_lgcm_quadratic, c("aic", "bic", "rmsea", "cfi", "tli"))
## aic bic rmsea cfi tli
## 2004.205 2048.382 0.000 1.000 1.009
fitMeasures(fit_lgcm_quadratic_latent, c("aic", "bic", "rmsea", "cfi", "tli"))
## aic bic rmsea cfi tli
## 2006.063 2053.638 0.000 1.000 1.000
The quadratic latent growth model with a fixed coefficient for age
squared is the preferred solution.
# specify the LGCM
lgcm_quadratic_covariates <- '
# Latent intercept factor
intercept =~ 1*read_t0 + 1*read_t1 + 1*read_t2 + 1*read_t3
# Latent slope factor
slope =~ 0*read_t0 + 1*read_t1 + 2*read_t2 + 3*read_t3
# Latent quadratic factor
quadratic =~ 0*read_t0 + 1*read_t1 + 4*read_t2 + 9*read_t3
# Fix variance of quad to zero (fixed effect only)
quadratic ~~ 0*quadratic
# Latent variables allowed to covary
intercept ~~ slope
# Regress growth factors on time-invariant covariates
intercept ~ male + homecog
slope ~ male + homecog
# If we had, this is how we would regress observed outcomes on time-varying covariates
# read_t0 ~ x_t0
# read_t1 ~ x_t1
# read_t2 ~ x_t2
# read_t3 ~ x_t3
'
# estimate the LGCM
fit_lgcm_quadratic_covariates <- sem(lgcm_quadratic_covariates, data = reading_wide, missing = "fiml")
# check results
summary(fit_lgcm_quadratic_covariates, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 37 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 15
##
## Number of observations 221
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 19.084
## Degrees of freedom 7
## P-value (Chi-square) 0.008
##
## Model Test Baseline Model:
##
## Test statistic 614.316
## Degrees of freedom 14
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.980
## Tucker-Lewis Index (TLI) 0.960
##
## Robust Comparative Fit Index (CFI) 0.980
## Robust Tucker-Lewis Index (TLI) 0.960
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -992.222
## Loglikelihood unrestricted model (H1) -982.680
##
## Akaike (AIC) 2014.444
## Bayesian (BIC) 2065.416
## Sample-size adjusted Bayesian (SABIC) 2017.881
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.088
## 90 Percent confidence interval - lower 0.042
## 90 Percent confidence interval - upper 0.137
## P-value H_0: RMSEA <= 0.050 0.080
## P-value H_0: RMSEA >= 0.080 0.659
##
## Robust RMSEA 0.088
## 90 Percent confidence interval - lower 0.042
## 90 Percent confidence interval - upper 0.137
## P-value H_0: Robust RMSEA <= 0.050 0.080
## P-value H_0: Robust RMSEA >= 0.080 0.659
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.057
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## intercept =~
## read_t0 1.000 0.744 0.789
## read_t1 1.000 0.744 0.792
## read_t2 1.000 0.744 0.691
## read_t3 1.000 0.744 0.594
## slope =~
## read_t0 0.000 0.000 0.000
## read_t1 1.000 0.269 0.286
## read_t2 2.000 0.539 0.500
## read_t3 3.000 0.808 0.644
## quadratic =~
## read_t0 0.000 0.000 0.000
## read_t1 1.000 0.000 0.000
## read_t2 4.000 0.000 0.000
## read_t3 9.000 0.000 0.000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## intercept ~
## male -0.143 0.116 -1.239 0.215 -0.193 -0.096
## homecog 0.045 0.024 1.910 0.056 0.061 0.149
## slope ~
## male 0.019 0.047 0.392 0.695 0.069 0.034
## homecog 0.019 0.010 1.999 0.046 0.072 0.176
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .intercept ~~
## .slope 0.020 0.024 0.831 0.406 0.101 0.101
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .read_t0 2.181 0.232 9.388 0.000 2.181 2.312
## .read_t1 3.520 0.230 15.277 0.000 3.520 3.745
## .read_t2 4.314 0.265 16.285 0.000 4.314 4.005
## .read_t3 4.911 0.323 15.187 0.000 4.911 3.918
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## quadratic 0.000 NaN NaN
## .read_t0 0.336 0.054 6.247 0.000 0.336 0.377
## .read_t1 0.209 0.029 7.258 0.000 0.209 0.236
## .read_t2 0.220 0.033 6.743 0.000 0.220 0.189
## .read_t3 0.220 0.054 4.034 0.000 0.220 0.140
## .intercept 0.537 0.072 7.436 0.000 0.969 0.969
## .slope 0.070 0.014 5.046 0.000 0.968 0.968
Download the read_long.dta
data file from the website. Download the
file and store it in a sensible folder in your computer (preferably the
folder of your R Project). Then, we can use the
read_stata()
function to import the dataset into
R
.
Again, because the data is available online, we can also use the
following function to import data into R
, which is more
efficient.
# import data into R
physfunc <- read_stata("https://thiagoroliveira.com/LDA-2025/data/physfunc.dta")
The data file physfunc.dta
contains repeated
measurements (for up to six occasions) on health functioning from a
study of British civil servants, the Whitehall II study. Health
functioning was assessed by the SF-36, a 36 item instrument that
comprises eight subscales covering physical, psychological and social
functioning. These eight scales can be summarised into physical and
mental health components. These are scaled using general US population
norms to have mean values of 50 and low scores imply poor functioning.
We will analyse physical health functioning (phf
).
The variables available in the physfunc
dataset are the
following:
Name | Description |
---|---|
id |
Identifier for each respondent |
occ |
Occasion (1, 2, 3, 4, 5, 6) |
female |
Gender (1 = female) |
grade |
Employment grade at baseline (1 = high grade, 2 = intermediate grade, 3 = low grade) |
age50 |
Mean age at baseline |
phf |
Physical health functioning |
The aim of this exercise is to investigate the following questions:
Considering the subset of female respondents only, carry out the following analysis and interpret the results:
Obtain summary statistics for phf
and
age50
at each occasion
Plot physical health functioning trajectories by age for any five individuals in the dataset.
Fit a simple random intercept growth model for phf
with no explanatory variables.
Fit a random slope growth model for phf
which allows
the rate of change in physical health functioning to vary among
individuals.
Fit a quadratic growth model for phf
which allows
for non-linear trajectories.
Which model is preferred?