Today we learn how to use panel data to assess reverse causality, temporal ordering, and reciprocal relationships by fitting cross-lagged panel models. We will reshape data between wide and long formats, fit the traditional cross-lagged panel model, and fit Allison’s dynamic panel model with fixed effects.

Police misconduct and violent crime in Chicago

The relationship between policing and crime is a tricky one. Some researchers argue that heavy-handed policing can increase crime by fostering legal cynicism and strain. However, more police resources, including use of force, are often deployed in high-crime areas, so an association between the two could run in either direction. Here, we examine evidence from Chicago, using publicly available data on violent crimes (i.e., homicides and robberies) and complaints about police misuse of force across all 343 Chicago neighbourhood clusters between 1991 and 2016. We will use cross-lagged panel models to examine this relationship.

The chicago.Rdata file contains data on 343 Chicago neighbourhood clusters over 26 years (1991 through 2016). The data.frame, named chicago, includes the following variables:

Name	Description
`NC_NUM`	an ID variable indicating Chicago neighbourhood clusters
`year`	the year, ranging from 1991 to 2016
`complaints`	(logged) number of complaints about police misuse of force in Chicago
`violent`	(logged) number of violent crimes reported
`population_density`	(logged) population density (measured in 1990)
`FAC_disadv`	factor scores indicating concentrated disadvantage (measured in 1990)

Please download the chicago.Rdata file from this link, then use the load() function to load it into R.

load("chicago.Rdata")

Question 1

Is the dataset set up in a wide format or in a long format?

Question 2

To fit a cross-lagged panel model, we need to have a wide dataset. Let’s use the pivot_wider() function from the tidyr package. To avoid tedious coding in the next question, let’s filter the dataset so that it includes data from 2011 to 2014 only.

# load the dplyr and tidyr packages
library(dplyr)

library(tidyr)

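# keep 2011-2014 and reshape so that each neighbourhood cluster is one row;
# the time-constant variables (FAC_disadv, population_density) stay as ID columns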
chicago_wide <- chicago %>%
  filter(year > 2010 & year < 2015) %>%
  pivot_wider(names_from = year,
              values_from = -c(NC_NUM, year, FAC_disadv, population_density))

# check columns of the wide dataset
names(chicago_wide)

##  [1] "NC_NUM"             "population_density" "FAC_disadv"        
##  [4] "complaints_2011"    "complaints_2012"    "complaints_2013"   
##  [7] "complaints_2014"    "violent_2011"       "violent_2012"      
## [10] "violent_2013"       "violent_2014"
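For reference, pivot_longer() reverses this reshaping. The minimal sketch below uses a made-up toy data frame (toy_long, with hypothetical values) instead of the Chicago data, so that it runs on its own. With names_to = c(".value", "year") and names_sep = "_", a column such as violent_2011 is split into a variable name and a year; this relies on complaints and violent containing no underscore themselves, and on keeping the ID columns (NC_NUM, FAC_disadv) out of cols.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy panel: 2 clusters x 2 years, long format (made-up values)
toy_long <- tibble(
  NC_NUM     = c(1, 1, 2, 2),
  FAC_disadv = c(0.3, 0.3, -0.8, -0.8),   # time-constant, like in chicago
  year       = c(2011, 2012, 2011, 2012),
  complaints = c(0.5, 0.7, 1.2, 1.1),
  violent    = c(3.1, 3.0, 4.2, 4.4)
)

# long -> wide: one row per cluster, one column per variable-year
toy_wide <- toy_long %>%
  pivot_wider(names_from = year,
              values_from = c(complaints, violent))

# wide -> long: ".value" keeps "complaints" and "violent" as column names,
# while the part after "_" becomes the year column
toy_back <- toy_wide %>%
  pivot_longer(cols = -c(NC_NUM, FAC_disadv),
               names_to = c(".value", "year"),
               names_sep = "_",
               names_transform = list(year = as.numeric))

toy_back
```

Applied to chicago_wide (excluding NC_NUM, population_density and FAC_disadv from cols), the same call should return the 2011 to 2014 rows of chicago in long format.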

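Question 3

Using the sem() function from the lavaan package, let’s fit our first cross-lagged panel model assessing the reciprocal relationship between complaints and violent crime. The shared labels (a, b, c, d) constrain each autoregressive and cross-lagged path to be equal across waves, which is why each estimate appears three times in the output.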

# load the lavaan package
library(lavaan)

# specify the model
clpm_model <- '
  # violent crime: autoregressive path (a) and cross-lagged path from complaints (b)
  violent_2014 ~ a * violent_2013 + b * complaints_2013
  violent_2013 ~ a * violent_2012 + b * complaints_2012
  violent_2012 ~ a * violent_2011 + b * complaints_2011
  
  # complaints: autoregressive path (c) and cross-lagged path from violent crime (d)
  complaints_2014 ~ c * complaints_2013 + d * violent_2013
  complaints_2013 ~ c * complaints_2012 + d * violent_2012
  complaints_2012 ~ c * complaints_2011 + d * violent_2011
'

# fit the CLPM
clpm_fit <- sem(clpm_model, estimator = "ML", data = chicago_wide)

# summary results
summary(clpm_fit)

## lavaan 0.6-18 ended normally after 21 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        19
##   Number of equality constraints                     8
## 
##   Number of observations                           342
## 
## Model Test User Model:
##                                                       
##   Test statistic                               260.675
##   Degrees of freedom                                22
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                     Estimate  Std.Err  z-value  P(>|z|)
##   violent_2014 ~                                       
##     vilnt_2013 (a)     0.886    0.014   65.178    0.000
##     cmpln_2013 (b)     0.056    0.014    3.871    0.000
##   violent_2013 ~                                       
##     vilnt_2012 (a)     0.886    0.014   65.178    0.000
##     cmpln_2012 (b)     0.056    0.014    3.871    0.000
##   violent_2012 ~                                       
##     vilnt_2011 (a)     0.886    0.014   65.178    0.000
##     cmpln_2011 (b)     0.056    0.014    3.871    0.000
##   complaints_2014 ~                                    
##     cmpln_2013 (c)     0.264    0.030    8.925    0.000
##     vilnt_2013 (d)     0.283    0.028   10.107    0.000
##   complaints_2013 ~                                    
##     cmpln_2012 (c)     0.264    0.030    8.925    0.000
##     vilnt_2012 (d)     0.283    0.028   10.107    0.000
##   complaints_2012 ~                                    
##     cmpln_2011 (c)     0.264    0.030    8.925    0.000
##     vilnt_2011 (d)     0.283    0.028   10.107    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .violent_2014 ~~                                     
##    .complants_2014    0.041    0.015    2.716    0.007
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .violent_2014      0.137    0.010   13.077    0.000
##    .violent_2013      0.141    0.011   13.077    0.000
##    .violent_2012      0.104    0.008   13.077    0.000
##    .complants_2014    0.557    0.043   13.077    0.000
##    .complants_2013    0.480    0.037   13.077    0.000
##    .complants_2012    0.563    0.043   13.077    0.000
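Holding the paths equal across waves is an assumption (stable autoregressive and cross-lagged effects) that can be tested by comparing the model with one in which every path is estimated freely, using a likelihood-ratio (chi-square difference) test from lavaan’s lavTestLRT(). So that the sketch runs without chicago.Rdata, it fits both models to data simulated with simulateData() from made-up population values; pop_model, sim_wide and the fit names are hypothetical stand-ins, not part of the tutorial.

```r
library(lavaan)

# Hypothetical population values, used only to simulate a stand-in for chicago_wide
pop_model <- '
  violent_2012    ~ 0.8*violent_2011 + 0.05*complaints_2011
  violent_2013    ~ 0.8*violent_2012 + 0.05*complaints_2012
  violent_2014    ~ 0.8*violent_2013 + 0.05*complaints_2013
  complaints_2012 ~ 0.3*complaints_2011 + 0.3*violent_2011
  complaints_2013 ~ 0.3*complaints_2012 + 0.3*violent_2012
  complaints_2014 ~ 0.3*complaints_2013 + 0.3*violent_2013
  violent_2011    ~~ 0.4*complaints_2011
'
sim_wide <- simulateData(pop_model, sample.nobs = 342, seed = 2024)

# Constrained CLPM (same syntax as above): shared labels equate paths across waves
clpm_constrained <- '
  violent_2014    ~ a*violent_2013 + b*complaints_2013
  violent_2013    ~ a*violent_2012 + b*complaints_2012
  violent_2012    ~ a*violent_2011 + b*complaints_2011
  complaints_2014 ~ c*complaints_2013 + d*violent_2013
  complaints_2013 ~ c*complaints_2012 + d*violent_2012
  complaints_2012 ~ c*complaints_2011 + d*violent_2011
'

# Unconstrained CLPM: same paths without labels, so each wave gets its own estimate
clpm_free <- '
  violent_2014    ~ violent_2013 + complaints_2013
  violent_2013    ~ violent_2012 + complaints_2012
  violent_2012    ~ violent_2011 + complaints_2011
  complaints_2014 ~ complaints_2013 + violent_2013
  complaints_2013 ~ complaints_2012 + violent_2012
  complaints_2012 ~ complaints_2011 + violent_2011
'

fit_constrained <- sem(clpm_constrained, estimator = "ML", data = sim_wide)
fit_free        <- sem(clpm_free, estimator = "ML", data = sim_wide)

# Chi-square difference test with 8 df (one per dropped equality constraint);
# a small p-value suggests the paths are not stable across waves
lavTestLRT(fit_free, fit_constrained)
```

With the real data, pass data = chicago_wide; clpm_fit from above can serve as the constrained model.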

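Question 4

Now, let’s add two time-constant control variables, population_density and FAC_disadv, to the cross-lagged panel model. In the specification below, they predict only the first-wave (2011) measures; later waves depend on them indirectly, through the autoregressive and cross-lagged paths.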

# specify the model
clpm_model_covariates <- '
  violent_2014 ~ a * violent_2013 + b * complaints_2013
  violent_2013 ~ a * violent_2012 + b * complaints_2012
  violent_2012 ~ a * violent_2011 + b * complaints_2011
  
  complaints_2014 ~ c * complaints_2013 + d * violent_2013
  complaints_2013 ~ c * complaints_2012 + d * violent_2012
  complaints_2012 ~ c * complaints_2011 + d * violent_2011
  
  complaints_2011 ~ population_density + FAC_disadv
  violent_2011 ~ population_density + FAC_disadv
'

# fit the CLPM
clpm_fit_cov <- sem(clpm_model_covariates, estimator = "ML", data = chicago_wide)

# summary results
summary(clpm_fit_cov)

## lavaan 0.6-18 ended normally after 21 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        25
##   Number of equality constraints                     8
## 
##   Number of observations                           342
## 
## Model Test User Model:
##                                                       
##   Test statistic                               380.718
##   Degrees of freedom                                35
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                     Estimate  Std.Err  z-value  P(>|z|)
##   violent_2014 ~                                       
##     vilnt_2013 (a)     0.886    0.013   69.806    0.000
##     cmpln_2013 (b)     0.056    0.013    4.140    0.000
##   violent_2013 ~                                       
##     vilnt_2012 (a)     0.886    0.013   69.806    0.000
##     cmpln_2012 (b)     0.056    0.013    4.140    0.000
##   violent_2012 ~                                       
##     vilnt_2011 (a)     0.886    0.013   69.806    0.000
##     cmpln_2011 (b)     0.056    0.013    4.140    0.000
##   complaints_2014 ~                                    
##     cmpln_2013 (c)     0.264    0.028    9.426    0.000
##     vilnt_2013 (d)     0.283    0.026   10.703    0.000
##   complaints_2013 ~                                    
##     cmpln_2012 (c)     0.264    0.028    9.426    0.000
##     vilnt_2012 (d)     0.283    0.026   10.703    0.000
##   complaints_2012 ~                                    
##     cmpln_2011 (c)     0.264    0.028    9.426    0.000
##     vilnt_2011 (d)     0.283    0.026   10.703    0.000
##   complaints_2011 ~                                    
##     ppltn_dnst        -0.202    0.075   -2.708    0.007
##     FAC_disadv         0.206    0.042    4.956    0.000
##   violent_2011 ~                                       
##     ppltn_dnst         0.117    0.076    1.536    0.124
##     FAC_disadv         0.244    0.042    5.750    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .violent_2014 ~~                                     
##    .complants_2014    0.041    0.015    2.716    0.007
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .violent_2014      0.137    0.010   13.077    0.000
##    .violent_2013      0.141    0.011   13.077    0.000
##    .violent_2012      0.104    0.008   13.077    0.000
##    .complants_2014    0.557    0.043   13.077    0.000
##    .complants_2013    0.480    0.037   13.077    0.000
##    .complants_2012    0.563    0.043   13.077    0.000
##    .complants_2011    0.726    0.056   13.077    0.000
##    .violent_2011      0.752    0.058   13.077    0.000
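One thing to watch in the output above: once complaints_2011 and violent_2011 are regressed on the controls, they become endogenous, and because they also predict later waves, lavaan does not add a covariance between their residuals (only violent_2014 ~~ complaints_2014 is listed). In the model without controls their covariance was still accounted for, because as exogenous variables it was taken from the sample. The same default leaves the 2012 and 2013 residuals uncorrelated. The sketch below adds these within-wave residual covariances explicitly and compares the two specifications with lavTestLRT(). As before, the data are a simulated stand-in built from made-up population values (pop_model, sim_wide and the fit names are hypothetical), not the Chicago data.

```r
library(lavaan)

# Hypothetical population values (including the two controls), used only to
# simulate a stand-in for chicago_wide
pop_model <- '
  violent_2011    ~ 0.1*population_density + 0.25*FAC_disadv
  complaints_2011 ~ -0.2*population_density + 0.2*FAC_disadv
  violent_2011    ~~ 0.3*complaints_2011
  violent_2012    ~ 0.8*violent_2011 + 0.05*complaints_2011
  violent_2013    ~ 0.8*violent_2012 + 0.05*complaints_2012
  violent_2014    ~ 0.8*violent_2013 + 0.05*complaints_2013
  complaints_2012 ~ 0.3*complaints_2011 + 0.3*violent_2011
  complaints_2013 ~ 0.3*complaints_2012 + 0.3*violent_2012
  complaints_2014 ~ 0.3*complaints_2013 + 0.3*violent_2013
  violent_2012    ~~ 0.1*complaints_2012
  violent_2013    ~~ 0.1*complaints_2013
'
sim_wide <- simulateData(pop_model, sample.nobs = 342, seed = 2024)

# Regression part of the model with controls (paths equal across waves)
clpm_paths <- '
  violent_2014    ~ a*violent_2013 + b*complaints_2013
  violent_2013    ~ a*violent_2012 + b*complaints_2012
  violent_2012    ~ a*violent_2011 + b*complaints_2011
  complaints_2014 ~ c*complaints_2013 + d*violent_2013
  complaints_2013 ~ c*complaints_2012 + d*violent_2012
  complaints_2012 ~ c*complaints_2011 + d*violent_2011
  complaints_2011 ~ population_density + FAC_disadv
  violent_2011    ~ population_density + FAC_disadv
'

# Within-wave residual covariances; lavaan already adds the 2014 one by default,
# it is listed only to make the specification explicit
within_wave <- '
  violent_2011 ~~ complaints_2011
  violent_2012 ~~ complaints_2012
  violent_2013 ~~ complaints_2013
  violent_2014 ~~ complaints_2014
'

fit_default <- sem(clpm_paths, estimator = "ML", data = sim_wide)
fit_within  <- sem(paste(clpm_paths, within_wave, sep = "\n"),
                   estimator = "ML", data = sim_wide)

# Chi-square difference test with 3 df (the 2011, 2012 and 2013 covariances)
lavTestLRT(fit_within, fit_default)
```

With the real data, adding the same four lines to clpm_model_covariates would show whether the 2011 covariance, which the model above fixes at zero, matters for the cross-lagged estimates.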

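Question 5

As we discussed, the traditional cross-lagged panel model does not properly handle unobserved heterogeneity: stable, unmeasured differences between neighbourhood clusters that affect both complaints and violent crime can show up as spurious cross-lagged effects. Let’s fit Allison et al.’s dynamic panel model with fixed effects, using the dpm() function from the dpm package.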

This function requires the dataset to be in long format. Because we do not have to write out an equation for each wave, we can use all 26 years of data! We just need to tell R that this is a panel dataset with the panel_data() function, giving it the ID variable (NC_NUM) and a wave variable.

# load the dpm package
library(dpm)

# number the waves 1 (1991) to 26 (2016) and declare the panel structure
chicago.panel <- chicago %>%
  mutate(wave = year - 1990) %>%
  panel_data(id = NC_NUM,
             wave = wave)
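As we understand panelr (the package that provides panel_data()), a panel_data object stays grouped by the ID and sorted by wave, so lag(complaints) inside dpm() should refer to the previous wave of the same neighbourhood cluster, never to another cluster’s row. A rough dplyr equivalent on a hypothetical toy data frame (toy, made-up values) makes this explicit. The first wave of each cluster has no lagged value, which is consistent with the dpm output below reporting time periods 2 to 26.

```r
library(dplyr)

# Hypothetical toy panel in long format: 2 clusters x 3 years (made-up values)
toy <- tibble(
  NC_NUM     = rep(c(1, 2), each = 3),
  year       = rep(1991:1993, times = 2),
  complaints = c(0.2, 0.4, 0.3, 1.0, 1.5, 1.2)
)

toy_lagged <- toy %>%
  mutate(wave = year - 1990) %>%              # 1991 -> wave 1, as above
  group_by(NC_NUM) %>%                         # lag within each cluster...
  arrange(wave, .by_group = TRUE) %>%          # ...in time order
  mutate(complaints_lag = dplyr::lag(complaints)) %>%
  ungroup()

toy_lagged
```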

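The DPM approach, as implemented in dpm(), models one outcome at a time, so it cannot estimate the reciprocal relationship in a single model. Instead, we model each direction separately, starting with the effects of complaints about police misuse of force on violent crime. Wrapping lag(complaints) in pre() marks it as predetermined rather than strictly exogenous: it may correlate with earlier errors of violent crime, which lets past violence shape later complaints and so accounts for reverse causality.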

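# fit a DPM with FEs; dpm() includes the lagged outcome, violent (t - 1), by default
# error.inv = TRUE holds the error variance equal across waves;
# information and missing are passed on to lavaan
dpm_violent <- dpm(violent ~ pre(lag(complaints)),
                   data = chicago.panel, error.inv = TRUE,
                   information = "observed", missing = "ML")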

# print results
summary(dpm_violent)

## MODEL INFO:
## Dependent variable: violent 
## Total observations: 342 
## Complete observations: 342 
## Time periods: 2 - 26 
## 
## MODEL FIT:
## 𝛘²(669) = 2430.269
## RMSEA = 0.088, 90% CI [0.084, 0.092]
## p(RMSEA < .05) = 0
## SRMR = 0.045 
## 
## |                        |  Est. |  S.E. | z val. |     p |
## |:-----------------------|------:|------:|-------:|------:|
## | complaints (t - 1)     | 0.020 | 0.005 |  3.922 | 0.000 |
## | violent (t - 1)        | 0.539 | 0.010 | 52.503 | 0.000 |
## 
## Model converged after 412 iterations
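Because both variables are logged, the complaints coefficient can be read approximately as an elasticity, and because the model includes the lagged outcome, a change in complaints keeps feeding into later years through violent (t - 1). The short calculation below uses only the two point estimates reported above to express them as the implied percentage change in violent crime for a hypothetical 10% rise in complaints, both in the following year and in the long run, using beta / (1 - rho), the standard long-run multiplier for a first-order dynamic model. It ignores sampling uncertainty, and if the variables were logged as log(x + 1) the elasticity reading is only approximate; pct_change() is a helper written for this illustration.

```r
# Point estimates copied from the summary(dpm_violent) output above
beta <- 0.020  # complaints (t - 1) -> violent (t)
rho  <- 0.539  # violent (t - 1) -> violent (t)

# Long-run effect of a permanent one-unit change in logged complaints
long_run <- beta / (1 - rho)

# Hypothetical helper: implied % change in violent crime for a given % rise in complaints
pct_change <- function(coef, rise = 0.10) 100 * ((1 + rise)^coef - 1)

round(c(short_run = pct_change(beta), long_run = pct_change(long_run)), 2)
```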

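Question 6

Now, let’s add the two time-constant control variables, population_density and FAC_disadv, to the dynamic panel model. In dpm(), time-constant variables go after the vertical bar (|). A conventional fixed-effects regression could not estimate their effects, because they would be absorbed by the fixed effects; the ML-SEM approach can, at the cost of assuming that the fixed effects are uncorrelated with these time-constant variables.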

# fit a DPM with FEs; time-constant controls go after the vertical bar
dpm_violent_cov <- dpm(violent ~ pre(lag(complaints)) | FAC_disadv + population_density,
                       data = chicago.panel, error.inv = TRUE,
                       information = "observed", missing = "ML")

# print results
summary(dpm_violent_cov)

## MODEL INFO:
## Dependent variable: violent 
## Total observations: 342 
## Complete observations: 342 
## Time periods: 2 - 26 
## 
## MODEL FIT:
## 𝛘²(717) = 2558.631
## RMSEA = 0.087, 90% CI [0.083, 0.09]
## p(RMSEA < .05) = 0
## SRMR = 0.047 
## 
## |                        |  Est. |  S.E. | z val. |     p |
## |:-----------------------|------:|------:|-------:|------:|
## | complaints (t - 1)     | 0.021 | 0.005 |  4.050 | 0.000 |
## | FAC_disadv             | 0.129 | 0.017 |  7.662 | 0.000 |
## | population_density     | 0.089 | 0.030 |  3.001 | 0.003 |
## | violent (t - 1)        | 0.539 | 0.010 | 52.492 | 0.000 |
## 
## Model converged after 510 iterations

Question 7

Repeat the procedures above to model the effects of violent crime on complaints about police misuse of force, again accounting for reverse causality.

Longitudinal Data Analysis

Cross-lagged panel models

Thiago R Oliveira
