Longitudinal Data Analysis

In this section, we will analyse the same data as we did this morning: children of female respondents to the US National Longitudinal Survey of Youth (a subsample from the National Longitudinal Survey of Youth). Starting in 1986, children of the female respondents of the original NLSY Youth sample were assessed in 1986, and again in 1988, 1990 and 1992. For inclusion in the subsample considered here, children had to be between 6 and 8 years old at the first measurement. We also restrict the analysis to 221 children who were assessed at all four occasions.

The difference is that we will now fit latent growth curve models under an SEM approach. As usual, let’s start by installing and/or loading the packages we will need for the session. In this session, we will use three packages: haven, tidyverse, and lavaan.

## We are also going to use the tidyverse package, so let's load it now

# Install the tidyverse package; you only need to do that once
## install.packages("tidyverse") # (uncomment if you need to install)

# Load the tidyverse package
library(tidyverse)

## We also need the haven package to import datasets into R
## install.packages("haven") # (uncomment if you need to install)
library(haven)

# Finally, we also need the lavaan package---the SEM package for R
## install.packages("lavaan") # (uncomment if you need to install)

# Load the lavaan package
library(lavaan)

You should already have downloaded the read_long.dta data file from the website and stored it in your computer. Now, use the read_stata() function to import the dataset into R.

Because the data is available online, we can also use the following function to import data into R, which is more efficient. (Though do note that most datasets are not readily available online).

# import data into R
reading <- read_stata("https://thiagoroliveira.com/LDA-2025/data/read_long.dta")

The variables available in the reading dataset are the following:

Name	Description
`childid`	Identifier for each respondent
`t`	Occasion (0 = 1986, 1 = 1988, 2 = 1990, 3 = 1992)
`male`	Gender (1 = male)
`homecog`	Amount of cognitive support at home
`read`	Reading scores
`year`	Year

First part

This morning, we produced some analyses fitting growth curve models under a multilevel approach. Now, let’s do the same under an SEM approach.

Under an SEM framework, longitudinal datasets need to be set as wide. Therefore, let’s start reshaping our dataset into a wide format.

Show me how

# reshaping the dataset into a wider format
reading_wide <- 
  reading %>%
  pivot_wider(
    id_cols = c(childid, male, homecog),             # variables that do NOT vary with time
    names_from = t,                         # time variable
    values_from = c(read, year),   # time-varying variables
    names_sep = "_t"                        # e.g., read_t0, read_t1, etc.
  )

# double check it worked
head(reading_wide)

## # A tibble: 6 × 11
##   childid  male homecog read_t0 read_t1 read_t2 read_t3 year_t0 year_t1 year_t2
##     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1       1     1       9    2.10    2.90    4.5     4.5     1986    1988    1990
## 2       2     0       9    2.30    4.5     4.20    4.60    1986    1988    1990
## 3       3     0      10    2.30    3.80    4.30    6.20    1986    1988    1990
## 4       4     1       8    1.80    2.60    4.10    4       1986    1988    1990
## 5       5     1      10    3.5     4.80    5.80    7.5     1986    1988    1990
## 6       6     0       9    3.5     5.70    7       6.90    1986    1988    1990
## # ℹ 1 more variable: year_t3 <dbl>

To perform any analysis under the SEM framework using lavaan, we can use the sem() function. We essentially need to type out all possible relationships theorised by our structural model. For example:

# example of how to use lavaan
my_model <- "latent_variable_1 =~ x1 + x2 + x3 + x4          # specify freely estimated measurement model
             latent_variable_2 =~ 1*y1 + 1*y2 + 1*y3 + 1*y4  # specify measurement model with loadings constrained to 1
             latent_variable_3 =~ 0*z1 + 1*z2 + 2*z3 + 3*z4  # specify measurement model with loadings constrained to 0, 1, 2, 3
             
             latent_variable_1 ~~ latent_variable_2          # variables are allowed to correlate
             latent_variable_3 ~ a1 + b1 + c1                # specify a regression model
            "

The =~ specification implies “is defined by” and is used to specify measurement models. The ~~ specification implies “is allowed to correlate with” and is used to specify covariances. The ~ specification implies “is regressed by” and is used to specify regression models. Finally, we use the * to constrain paramaters to certain values.

Now, use the sem function to estimate four models:

Fit a linear random intercept latent growth model

Show me how

# specify the LGCM
lgcm_ri <- '
  # Latent intercept factor
  intercept =~ 1*read_t0 + 1*read_t1 + 1*read_t2 + 1*read_t3
'

# estimate the LGCM
fit_lgcm_ri <- sem(lgcm_ri, data = reading_wide, missing = "fiml")

# check results
summary(fit_lgcm_ri, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-19 ended normally after 15 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         9
## 
##   Number of observations                           221
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                                       
##   Test statistic                                91.823
##   Degrees of freedom                                 5
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               601.612
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.854
##   Tucker-Lewis Index (TLI)                       0.825
##                                                       
##   Robust Comparative Fit Index (CFI)             0.854
##   Robust Tucker-Lewis Index (TLI)                0.825
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1034.943
##   Loglikelihood unrestricted model (H1)       -989.032
##                                                       
##   Akaike (AIC)                                2087.886
##   Bayesian (BIC)                              2118.470
##   Sample-size adjusted Bayesian (SABIC)       2089.949
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.280
##   90 Percent confidence interval - lower         0.232
##   90 Percent confidence interval - upper         0.332
##   P-value H_0: RMSEA <= 0.050                    0.000
##   P-value H_0: RMSEA >= 0.080                    1.000
##                                                       
##   Robust RMSEA                                   0.280
##   90 Percent confidence interval - lower         0.232
##   90 Percent confidence interval - upper         0.332
##   P-value H_0: Robust RMSEA <= 0.050             0.000
##   P-value H_0: Robust RMSEA >= 0.080             1.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.267
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   intercept =~                                                          
##     read_t0           1.000                               0.903    0.769
##     read_t1           1.000                               0.903    0.909
##     read_t2           1.000                               0.903    0.873
##     read_t3           1.000                               0.903    0.801
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .read_t0           2.516    0.079   31.840    0.000    2.516    2.142
##    .read_t1           4.041    0.067   60.419    0.000    4.041    4.064
##    .read_t2           5.021    0.070   72.123    0.000    5.021    4.852
##    .read_t3           5.803    0.076   76.534    0.000    5.803    5.148
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .read_t0           0.564    0.063    9.012    0.000    0.564    0.409
##    .read_t1           0.173    0.028    6.104    0.000    0.173    0.175
##    .read_t2           0.255    0.035    7.235    0.000    0.255    0.238
##    .read_t3           0.455    0.053    8.625    0.000    0.455    0.358
##     intercept         0.816    0.086    9.495    0.000    1.000    1.000

Fit a linear random slope latent growth model

Show me how

# specify the LGCM
lgcm_slope <- '
  # Latent intercept factor
  intercept =~ 1*read_t0 + 1*read_t1 + 1*read_t2 + 1*read_t3
  
  # Latent slope factor
  slope =~ 0*read_t0 + 1*read_t1 + 2*read_t2 + 3*read_t3
  
  # Latent variables allowed to covary
  intercept ~~ slope
'

# estimate the LGCM
fit_lgcm_slope <- sem(lgcm_slope, data = reading_wide, missing = "fiml")

# check results
summary(fit_lgcm_slope, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-19 ended normally after 23 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        11
## 
##   Number of observations                           221
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                                       
##   Test statistic                                17.584
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.001
## 
## Model Test Baseline Model:
## 
##   Test statistic                               601.612
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.976
##   Tucker-Lewis Index (TLI)                       0.951
##                                                       
##   Robust Comparative Fit Index (CFI)             0.976
##   Robust Tucker-Lewis Index (TLI)                0.951
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -997.824
##   Loglikelihood unrestricted model (H1)       -989.032
##                                                       
##   Akaike (AIC)                                2017.647
##   Bayesian (BIC)                              2055.027
##   Sample-size adjusted Bayesian (SABIC)       2020.168
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.148
##   90 Percent confidence interval - lower         0.086
##   90 Percent confidence interval - upper         0.219
##   P-value H_0: RMSEA <= 0.050                    0.006
##   P-value H_0: RMSEA >= 0.080                    0.964
##                                                       
##   Robust RMSEA                                   0.148
##   90 Percent confidence interval - lower         0.086
##   90 Percent confidence interval - upper         0.219
##   P-value H_0: Robust RMSEA <= 0.050             0.006
##   P-value H_0: Robust RMSEA >= 0.080             0.964
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.077
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   intercept =~                                                          
##     read_t0           1.000                               0.744    0.790
##     read_t1           1.000                               0.744    0.791
##     read_t2           1.000                               0.744    0.691
##     read_t3           1.000                               0.744    0.594
##   slope =~                                                              
##     read_t0           0.000                               0.000    0.000
##     read_t1           1.000                               0.270    0.287
##     read_t2           2.000                               0.540    0.501
##     read_t3           3.000                               0.810    0.646
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   intercept ~~                                                          
##     slope             0.024    0.024    0.994    0.320    0.119    0.119
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .read_t0           2.516    0.063   39.713    0.000    2.516    2.671
##    .read_t1           4.041    0.063   63.875    0.000    4.041    4.297
##    .read_t2           5.021    0.072   69.294    0.000    5.021    4.661
##    .read_t3           5.803    0.084   68.840    0.000    5.803    4.631
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .read_t0           0.334    0.054    6.209    0.000    0.334    0.376
##    .read_t1           0.210    0.029    7.274    0.000    0.210    0.238
##    .read_t2           0.220    0.033    6.689    0.000    0.220    0.189
##    .read_t3           0.218    0.055    3.966    0.000    0.218    0.139
##     intercept         0.554    0.074    7.507    0.000    1.000    1.000
##     slope             0.073    0.014    5.140    0.000    1.000    1.000

Fit a quadratic latent growth model with a fixed quadratic term

Show me how

# specify the LGCM
lgcm_quadratic <- '
  # Latent intercept factor
  intercept =~ 1*read_t0 + 1*read_t1 + 1*read_t2 + 1*read_t3
  
  # Latent slope factor
  slope =~ 0*read_t0 + 1*read_t1 + 2*read_t2 + 3*read_t3
  
  # Latent quadratic factor
  quadratic =~ 0*read_t0 + 1*read_t1 + 4*read_t2 + 9*read_t3
  
  # Fix variance of quad to zero (fixed effect only)
  quadratic ~~ 0*quadratic
  
  # Latent variables allowed to covary
  intercept ~~ slope
'

# estimate the LGCM
fit_lgcm_quadratic <- sem(lgcm_quadratic, data = reading_wide, missing = "fiml")

## Warning: lavaan->lav_object_post_check():  
##    covariance matrix of latent variables is not positive definite ; use 
##    lavInspect(fit, "cov.lv") to investigate.

# check results
summary(fit_lgcm_quadratic, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-19 ended normally after 34 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        13
## 
##   Number of observations                           221
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 0.142
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.706
## 
## Model Test Baseline Model:
## 
##   Test statistic                               601.612
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.009
##                                                       
##   Robust Comparative Fit Index (CFI)             1.000
##   Robust Tucker-Lewis Index (TLI)                1.009
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -989.103
##   Loglikelihood unrestricted model (H1)       -989.032
##                                                       
##   Akaike (AIC)                                2004.205
##   Bayesian (BIC)                              2048.382
##   Sample-size adjusted Bayesian (SABIC)       2007.184
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.130
##   P-value H_0: RMSEA <= 0.050                    0.774
##   P-value H_0: RMSEA >= 0.080                    0.150
##                                                       
##   Robust RMSEA                                   0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.130
##   P-value H_0: Robust RMSEA <= 0.050             0.774
##   P-value H_0: Robust RMSEA >= 0.080             0.150
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.003
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   intercept =~                                                          
##     read_t0           1.000                               0.753    0.858
##     read_t1           1.000                               0.753    0.751
##     read_t2           1.000                               0.753    0.687
##     read_t3           1.000                               0.753    0.619
##   slope =~                                                              
##     read_t0           0.000                               0.000    0.000
##     read_t1           1.000                               0.574    0.573
##     read_t2           2.000                               1.148    1.046
##     read_t3           3.000                               1.722    1.415
##   quadratic =~                                                          
##     read_t0           0.000                               0.000    0.000
##     read_t1           1.000                               0.000    0.000
##     read_t2           4.000                               0.000    0.000
##     read_t3           9.000                               0.000    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   intercept ~~                                                          
##     slope            -0.022    0.127   -0.173    0.863   -0.051   -0.051
##     quadratic        -0.002    0.032   -0.053    0.957     -Inf     -Inf
##   slope ~~                                                              
##     quadratic        -0.044    0.021   -2.134    0.033     -Inf     -Inf
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .read_t0           2.516    0.059   42.622    0.000    2.516    2.867
##    .read_t1           4.041    0.067   59.912    0.000    4.041    4.030
##    .read_t2           5.021    0.074   68.019    0.000    5.021    4.575
##    .read_t3           5.803    0.082   70.878    0.000    5.803    4.768
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     quadratic         0.000                                 NaN      NaN
##    .read_t0           0.203    0.116    1.743    0.081    0.203    0.263
##    .read_t1           0.244    0.044    5.483    0.000    0.244    0.242
##    .read_t2           0.126    0.049    2.563    0.010    0.126    0.105
##    .read_t3           0.494    0.134    3.682    0.000    0.494    0.334
##     intercept         0.568    0.126    4.492    0.000    1.000    1.000
##     slope             0.330    0.124    2.654    0.008    1.000    1.000

Fit a quadratic latent growth model with a latent quadratic term

Show me how

# specify the LGCM
lgcm_quadratic_latent <- '
  # Latent intercept factor
  intercept =~ 1*read_t0 + 1*read_t1 + 1*read_t2 + 1*read_t3
  
  # Latent slope factor
  slope =~ 0*read_t0 + 1*read_t1 + 2*read_t2 + 3*read_t3
  
  # Latent quadratic factor
  quadratic =~ 0*read_t0 + 1*read_t1 + 4*read_t2 + 9*read_t3
  
  # Latent variables allowed to covary
  intercept ~~ slope
  intercept ~~ quadratic
  slope ~~ quadratic
'

# estimate the LGCM
fit_lgcm_quadratic_latent <- sem(lgcm_quadratic_latent, data = reading_wide, missing = "fiml")

## Warning: lavaan->lav_object_post_check():  
##    covariance matrix of latent variables is not positive definite ; use 
##    lavInspect(fit, "cov.lv") to investigate.

# check results
summary(fit_lgcm_quadratic_latent, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-19 ended normally after 65 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        14
## 
##   Number of observations                           221
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Model Test Baseline Model:
## 
##   Test statistic                               601.612
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
##                                                       
##   Robust Comparative Fit Index (CFI)             1.000
##   Robust Tucker-Lewis Index (TLI)                1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -989.032
##   Loglikelihood unrestricted model (H1)       -989.032
##                                                       
##   Akaike (AIC)                                2006.063
##   Bayesian (BIC)                              2053.638
##   Sample-size adjusted Bayesian (SABIC)       2009.271
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value H_0: RMSEA <= 0.050                       NA
##   P-value H_0: RMSEA >= 0.080                       NA
##                                                       
##   Robust RMSEA                                   0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value H_0: Robust RMSEA <= 0.050                NA
##   P-value H_0: Robust RMSEA >= 0.080                NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   intercept =~                                                          
##     read_t0           1.000                               0.761    0.867
##     read_t1           1.000                               0.761    0.760
##     read_t2           1.000                               0.761    0.691
##     read_t3           1.000                               0.761    0.627
##   slope =~                                                              
##     read_t0           0.000                               0.000    0.000
##     read_t1           1.000                               0.602    0.601
##     read_t2           2.000                               1.204    1.094
##     read_t3           3.000                               1.805    1.487
##   quadratic =~                                                          
##     read_t0           0.000                               0.000    0.000
##     read_t1           1.000                               0.067    0.067
##     read_t2           4.000                               0.269    0.244
##     read_t3           9.000                               0.605    0.498
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   intercept ~~                                                          
##     slope            -0.036    0.131   -0.272    0.786   -0.078   -0.078
##     quadratic         0.002    0.033    0.060    0.952    0.039    0.039
##   slope ~~                                                              
##     quadratic        -0.056    0.037   -1.515    0.130   -1.378   -1.378
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .read_t0           2.516    0.059   42.631    0.000    2.516    2.868
##    .read_t1           4.041    0.067   59.977    0.000    4.041    4.034
##    .read_t2           5.021    0.074   67.815    0.000    5.021    4.562
##    .read_t3           5.803    0.082   71.074    0.000    5.803    4.781
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .read_t0           0.191    0.118    1.620    0.105    0.191    0.248
##    .read_t1           0.237    0.047    4.996    0.000    0.237    0.236
##    .read_t2           0.130    0.050    2.611    0.009    0.130    0.107
##    .read_t3           0.456    0.165    2.762    0.006    0.456    0.310
##     intercept         0.579    0.129    4.489    0.000    1.000    1.000
##     slope             0.362    0.150    2.412    0.016    1.000    1.000
##     quadratic         0.005    0.012    0.380    0.704    1.000    1.000

Let’s compare all four models. Which model specification is the preferred solution?

Show me how

# check fit measures of each of the four models
fitMeasures(fit_lgcm_ri, c("aic", "bic", "rmsea", "cfi", "tli"))

##      aic      bic    rmsea      cfi      tli 
## 2087.886 2118.470    0.280    0.854    0.825

fitMeasures(fit_lgcm_slope, c("aic", "bic", "rmsea", "cfi", "tli"))

##      aic      bic    rmsea      cfi      tli 
## 2017.647 2055.027    0.148    0.976    0.951

fitMeasures(fit_lgcm_quadratic, c("aic", "bic", "rmsea", "cfi", "tli"))

##      aic      bic    rmsea      cfi      tli 
## 2004.205 2048.382    0.000    1.000    1.009

fitMeasures(fit_lgcm_quadratic_latent, c("aic", "bic", "rmsea", "cfi", "tli"))

##      aic      bic    rmsea      cfi      tli 
## 2006.063 2053.638    0.000    1.000    1.000

The quadratic latent growth model with a fixed coefficient for age squared is the preferred solution.

Finally, using the model specification with the preferred solution, fit a latent growth curve model adjusting for gender and amount of cognitive support at home. How can we interpret this growth curve model with covariates?

Show me how

# specify the LGCM
lgcm_quadratic_covariates <- '
  # Latent intercept factor
  intercept =~ 1*read_t0 + 1*read_t1 + 1*read_t2 + 1*read_t3
  
  # Latent slope factor
  slope =~ 0*read_t0 + 1*read_t1 + 2*read_t2 + 3*read_t3
  
  # Latent quadratic factor
  quadratic =~ 0*read_t0 + 1*read_t1 + 4*read_t2 + 9*read_t3
  
  # Fix variance of quad to zero (fixed effect only)
  quadratic ~~ 0*quadratic
  
  # Latent variables allowed to covary
  intercept ~~ slope
  
  # Regress growth factors on time-invariant covariates
  intercept ~ male + homecog
  slope     ~ male + homecog
  
  # If we had, this is how we would regress observed outcomes on time-varying covariates
  # read_t0 ~ x_t0
  # read_t1 ~ x_t1
  # read_t2 ~ x_t2
  # read_t3 ~ x_t3
'

# estimate the LGCM
fit_lgcm_quadratic_covariates <- sem(lgcm_quadratic_covariates, data = reading_wide, missing = "fiml")

# check results
summary(fit_lgcm_quadratic_covariates, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-19 ended normally after 37 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        15
## 
##   Number of observations                           221
##   Number of missing patterns                         1
## 
## Model Test User Model:
##                                                       
##   Test statistic                                19.084
##   Degrees of freedom                                 7
##   P-value (Chi-square)                           0.008
## 
## Model Test Baseline Model:
## 
##   Test statistic                               614.316
##   Degrees of freedom                                14
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.980
##   Tucker-Lewis Index (TLI)                       0.960
##                                                       
##   Robust Comparative Fit Index (CFI)             0.980
##   Robust Tucker-Lewis Index (TLI)                0.960
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -992.222
##   Loglikelihood unrestricted model (H1)       -982.680
##                                                       
##   Akaike (AIC)                                2014.444
##   Bayesian (BIC)                              2065.416
##   Sample-size adjusted Bayesian (SABIC)       2017.881
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.088
##   90 Percent confidence interval - lower         0.042
##   90 Percent confidence interval - upper         0.137
##   P-value H_0: RMSEA <= 0.050                    0.080
##   P-value H_0: RMSEA >= 0.080                    0.659
##                                                       
##   Robust RMSEA                                   0.088
##   90 Percent confidence interval - lower         0.042
##   90 Percent confidence interval - upper         0.137
##   P-value H_0: Robust RMSEA <= 0.050             0.080
##   P-value H_0: Robust RMSEA >= 0.080             0.659
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.057
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   intercept =~                                                          
##     read_t0           1.000                               0.744    0.789
##     read_t1           1.000                               0.744    0.792
##     read_t2           1.000                               0.744    0.691
##     read_t3           1.000                               0.744    0.594
##   slope =~                                                              
##     read_t0           0.000                               0.000    0.000
##     read_t1           1.000                               0.269    0.286
##     read_t2           2.000                               0.539    0.500
##     read_t3           3.000                               0.808    0.644
##   quadratic =~                                                          
##     read_t0           0.000                               0.000    0.000
##     read_t1           1.000                               0.000    0.000
##     read_t2           4.000                               0.000    0.000
##     read_t3           9.000                               0.000    0.000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   intercept ~                                                           
##     male             -0.143    0.116   -1.239    0.215   -0.193   -0.096
##     homecog           0.045    0.024    1.910    0.056    0.061    0.149
##   slope ~                                                               
##     male              0.019    0.047    0.392    0.695    0.069    0.034
##     homecog           0.019    0.010    1.999    0.046    0.072    0.176
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .intercept ~~                                                          
##    .slope             0.020    0.024    0.831    0.406    0.101    0.101
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .read_t0           2.181    0.232    9.388    0.000    2.181    2.312
##    .read_t1           3.520    0.230   15.277    0.000    3.520    3.745
##    .read_t2           4.314    0.265   16.285    0.000    4.314    4.005
##    .read_t3           4.911    0.323   15.187    0.000    4.911    3.918
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     quadratic         0.000                                 NaN      NaN
##    .read_t0           0.336    0.054    6.247    0.000    0.336    0.377
##    .read_t1           0.209    0.029    7.258    0.000    0.209    0.236
##    .read_t2           0.220    0.033    6.743    0.000    0.220    0.189
##    .read_t3           0.220    0.054    4.034    0.000    0.220    0.140
##    .intercept         0.537    0.072    7.436    0.000    0.969    0.969
##    .slope             0.070    0.014    5.046    0.000    0.968    0.968

Second part

Download the read_long.dta data file from the website. Download the file and store it in a sensible folder in your computer (preferably the folder of your R Project). Then, we can use the read_stata() function to import the dataset into R.

Again, because the data is available online, we can also use the following function to import data into R, which is more efficient.

# import data into R
physfunc <- read_stata("https://thiagoroliveira.com/LDA-2025/data/physfunc.dta")

The data file physfunc.dta contains repeated measurements (for up to six occasions) on health functioning from a study of British civil servants, the Whitehall II study. Health functioning was assessed by the SF-36, a 36 item instrument that comprises eight subscales covering physical, psychological and social functioning. These eight scales can be summarised into physical and mental health components. These are scaled using general US population norms to have mean values of 50 and low scores imply poor functioning. We will analyse physical health functioning (phf).

The variables available in the physfunc dataset are the following:

Name	Description
`id`	Identifier for each respondent
`occ`	Occasion (1, 2, 3, 4, 5, 6)
`female`	Gender (1 = female)
`grade`	Employment grade at baseline (1 = high grade, 2 = intermediate grade, 3 = low grade)
`age50`	Mean age at baseline
`phf`	Physical health functioning

The aim of this exercise is to investigate the following questions:

How does women’s physical functioning change as they get older?
To what extent does this vary between women?

Considering the subset of female respondents only, carry out the following analysis and interpret the results:

Obtain summary statistics for phf and age50 at each occasion
Plot physical health functioning trajectories by age for any five individuals in the dataset.
Fit a simple random intercept growth model for phf with no explanatory variables.
Fit a random slope growth model for phf which allows the rate of change in physical health functioning to vary among individuals.
Fit a quadratic growth model for phf which allows for non-linear trajectories.
Which model is preferred?

Longitudinal Data Analysis

Day 2: Latent Growth Curve Models—an SEM approach

Thiago R Oliveira

First part

Second part