Today we are learning how to use panel data to assess reverse causality, temporal ordering, and reciprocal relationships, implementing some cross-lagged panel models. We will learn how to reshape data between wide and long formats, fit the traditional cross-lagged panel model, and fit Allison’s dynamic panel model with fixed effects.
The relationship between policing and crime is a tricky one. Some researchers argue that heavy-handed policing can contribute to increased crime via legal cynicism and strain. However, more police resources, including use of force, are often deployed in high-crime areas, so the association could run in either direction. Here, we examine evidence from Chicago. We use publicly available data on violent crimes (i.e., homicides and robberies) and complaints about police misuse of force across all 343 Chicago neighbourhoods between 1991 and 2016. We will use cross-lagged panel models to examine this relationship.
The chicago.Rdata file contains data on 343 Chicago neighbourhood clusters over 26 years (1991 through 2016). The data.frame includes the following variables:
Name | Description |
---|---|
NC_NUM | an ID variable indicating Chicago neighbourhood clusters |
year | the year, ranging from 1991 to 2016 |
complaints | (logged) number of complaints about police misuse of force in Chicago |
violent | (logged) number of violent crimes reported |
population_density | (logged) population density (measured in 1990) |
FAC_disadv | factor scores indicating concentrated disadvantage (measured in 1990) |
Please download chicago.Rdata from this link. Then use the load() function to load the downloaded data into R.
load("chicago.Rdata")
Question 1
Is the dataset set up in a wide format or in a long format?
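One quick way to tell the two formats apart is to count how many rows each unit occupies. The sketch below uses a small made-up data frame (toy_long, with hypothetical columns id, year and y), not the Chicago data; the same check could be run on chicago with NC_NUM as the ID.

```r
# Toy long-format panel: 3 hypothetical units observed in 2 years
# (illustrative values only, not the Chicago data)
toy_long <- data.frame(
  id   = rep(1:3, each = 2),
  year = rep(c(2011, 2012), times = 3),
  y    = c(0.5, 0.7, 1.2, 1.1, 0.3, 0.4)
)

# In long format each unit appears once per wave, so IDs repeat across rows
rows_per_id <- table(toy_long$id)
rows_per_id

# TRUE if any unit occupies more than one row, i.e. the data are long
is_long <- any(rows_per_id > 1)
is_long
```

In a wide dataset every unit would occupy exactly one row, with one column per variable and wave.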
Question 2
To fit a cross-lagged panel model, we need to have a wide dataset. Let’s use the pivot_wider() function from the tidyr package. To avoid tedious coding in the next question, let’s filter the dataset so that it includes data from 2011 to 2014 only.
# load the dplyr and tidyr packages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
# filter and reshape the dataset
chicago_wide <- chicago %>%
  filter(year > 2010 & year < 2015) %>%
  pivot_wider(names_from = year,
              values_from = -c(NC_NUM, year, FAC_disadv, population_density))
# check columns of the wide dataset
names(chicago_wide)
## [1] "NC_NUM" "population_density" "FAC_disadv"
## [4] "complaints_2011" "complaints_2012" "complaints_2013"
## [7] "complaints_2014" "violent_2011" "violent_2012"
## [10] "violent_2013" "violent_2014"
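Going the other way, from wide back to long, can be done with pivot_longer(). The sketch below is a minimal illustration on a made-up wide data frame (toy_wide and its values are hypothetical); it assumes column names follow the variable_year pattern produced above.

```r
library(tidyr)

# Toy wide data: one row per unit, one column per variable and year
# (illustrative values only, not the Chicago data)
toy_wide <- data.frame(
  id = 1:2,
  complaints_2011 = c(0.1, 0.4),
  complaints_2012 = c(0.2, 0.5),
  violent_2011 = c(1.0, 2.0),
  violent_2012 = c(1.5, 2.5)
)

# Split each column name at "_": the first part becomes a column of its own
# (the special ".value" keyword), the second part goes into a "year" column
toy_long <- pivot_longer(
  toy_wide,
  cols = -id,
  names_to = c(".value", "year"),
  names_sep = "_"
)

# The year is extracted from column names as text, so convert it to a number
toy_long$year <- as.integer(toy_long$year)
toy_long
```

For chicago_wide, the time-constant columns (population_density and FAC_disadv) would need to be excluded from cols as well, since their names also contain an underscore.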
Question 3
Using the sem() function from the lavaan package, let’s fit our first cross-lagged panel model assessing the reciprocal relationship between complaints and violent crime.
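Written out, the cross-lagged panel model for neighbourhood cluster i can be sketched as the pair of equations below. This is a simplified statement that omits intercepts (by default sem() models the covariance structure only) and assumes the coefficients are held equal across years, which is what the labels a, b, c and d in the model syntax do.

```latex
\begin{aligned}
\text{violent}_{i,t} &= a\,\text{violent}_{i,t-1} + b\,\text{complaints}_{i,t-1} + \varepsilon_{i,t} \\
\text{complaints}_{i,t} &= c\,\text{complaints}_{i,t-1} + d\,\text{violent}_{i,t-1} + \upsilon_{i,t}
\end{aligned}
\qquad t = 2012, 2013, 2014
```

Here a and c are the autoregressive (stability) paths, while b and d are the cross-lagged paths of interest.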
# load the lavaan package
library(lavaan)
## Warning: package 'lavaan' was built under R version 4.3.3
## This is lavaan 0.6-18
## lavaan is FREE software! Please report any bugs.
# specify the model
clpm_model <- '
violent_2014 ~ a * violent_2013 + b * complaints_2013
violent_2013 ~ a * violent_2012 + b * complaints_2012
violent_2012 ~ a * violent_2011 + b * complaints_2011
complaints_2014 ~ c * complaints_2013 + d * violent_2013
complaints_2013 ~ c * complaints_2012 + d * violent_2012
complaints_2012 ~ c * complaints_2011 + d * violent_2011
'
# fit the CLPM
clpm_fit <- sem(clpm_model, estimator = "ML", data = chicago_wide)
# summary results
summary(clpm_fit)
## lavaan 0.6-18 ended normally after 21 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 19
## Number of equality constraints 8
##
## Number of observations 342
##
## Model Test User Model:
##
## Test statistic 260.675
## Degrees of freedom 22
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## violent_2014 ~
## vilnt_2013 (a) 0.886 0.014 65.178 0.000
## cmpln_2013 (b) 0.056 0.014 3.871 0.000
## violent_2013 ~
## vilnt_2012 (a) 0.886 0.014 65.178 0.000
## cmpln_2012 (b) 0.056 0.014 3.871 0.000
## violent_2012 ~
## vilnt_2011 (a) 0.886 0.014 65.178 0.000
## cmpln_2011 (b) 0.056 0.014 3.871 0.000
## complaints_2014 ~
## cmpln_2013 (c) 0.264 0.030 8.925 0.000
## vilnt_2013 (d) 0.283 0.028 10.107 0.000
## complaints_2013 ~
## cmpln_2012 (c) 0.264 0.030 8.925 0.000
## vilnt_2012 (d) 0.283 0.028 10.107 0.000
## complaints_2012 ~
## cmpln_2011 (c) 0.264 0.030 8.925 0.000
## vilnt_2011 (d) 0.283 0.028 10.107 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .violent_2014 ~~
## .complants_2014 0.041 0.015 2.716 0.007
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .violent_2014 0.137 0.010 13.077 0.000
## .violent_2013 0.141 0.011 13.077 0.000
## .violent_2012 0.104 0.008 13.077 0.000
## .complants_2014 0.557 0.043 13.077 0.000
## .complants_2013 0.480 0.037 13.077 0.000
## .complants_2012 0.563 0.043 13.077 0.000
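Beyond summary(), lavaan offers helper functions for pulling out specific pieces of a fitted model. The sketch below fits a three-wave model of the same form to simulated data (the variables x1 to x3 and y1 to y3 and all coefficient values are made up for illustration), then extracts the regression paths with confidence intervals and a few fit indices. The same two calls could be applied to clpm_fit.

```r
library(lavaan)

set.seed(123)
n <- 300

# Simulated three-wave toy data (hypothetical values, not the Chicago data)
x1 <- rnorm(n)
y1 <- 0.3 * x1 + rnorm(n)
x2 <- 0.5 * x1 + 0.2 * y1 + rnorm(n)
y2 <- 0.6 * y1 + 0.1 * x1 + rnorm(n)
x3 <- 0.5 * x2 + 0.2 * y2 + rnorm(n)
y3 <- 0.6 * y2 + 0.1 * x2 + rnorm(n)
toy <- data.frame(x1, y1, x2, y2, x3, y3)

# Same structure as the model above: shared labels impose equality over time
toy_model <- '
y3 ~ a * y2 + b * x2
y2 ~ a * y1 + b * x1
x3 ~ c * x2 + d * y2
x2 ~ c * x1 + d * y1
'
toy_fit <- sem(toy_model, data = toy, estimator = "ML")

# Regression paths only, with 95% confidence intervals
est <- parameterEstimates(toy_fit, ci = TRUE)
est[est$op == "~", c("lhs", "rhs", "label", "est", "se", "ci.lower", "ci.upper")]

# Fit indices that summary() prints only when fit.measures = TRUE
fit_idx <- fitMeasures(toy_fit, c("chisq", "df", "cfi", "rmsea", "srmr"))
fit_idx
```

Because each label appears in more than one equation, the extracted table repeats the same estimate for every path that shares a label.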
Question 4
Now, let’s add two time-constant control variables to the cross-lagged panel model: population_density and FAC_disadv.
# specify the model
clpm_model_covariates <- '
violent_2014 ~ a * violent_2013 + b * complaints_2013
violent_2013 ~ a * violent_2012 + b * complaints_2012
violent_2012 ~ a * violent_2011 + b * complaints_2011
complaints_2014 ~ c * complaints_2013 + d * violent_2013
complaints_2013 ~ c * complaints_2012 + d * violent_2012
complaints_2012 ~ c * complaints_2011 + d * violent_2011
complaints_2011 ~ population_density + FAC_disadv
violent_2011 ~ population_density + FAC_disadv
'
# fit the CLPM
clpm_fit_cov <- sem(clpm_model_covariates, estimator = "ML", data = chicago_wide)
# summary results
summary(clpm_fit_cov)
## lavaan 0.6-18 ended normally after 21 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 25
## Number of equality constraints 8
##
## Number of observations 342
##
## Model Test User Model:
##
## Test statistic 380.718
## Degrees of freedom 35
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## violent_2014 ~
## vilnt_2013 (a) 0.886 0.013 69.806 0.000
## cmpln_2013 (b) 0.056 0.013 4.140 0.000
## violent_2013 ~
## vilnt_2012 (a) 0.886 0.013 69.806 0.000
## cmpln_2012 (b) 0.056 0.013 4.140 0.000
## violent_2012 ~
## vilnt_2011 (a) 0.886 0.013 69.806 0.000
## cmpln_2011 (b) 0.056 0.013 4.140 0.000
## complaints_2014 ~
## cmpln_2013 (c) 0.264 0.028 9.426 0.000
## vilnt_2013 (d) 0.283 0.026 10.703 0.000
## complaints_2013 ~
## cmpln_2012 (c) 0.264 0.028 9.426 0.000
## vilnt_2012 (d) 0.283 0.026 10.703 0.000
## complaints_2012 ~
## cmpln_2011 (c) 0.264 0.028 9.426 0.000
## vilnt_2011 (d) 0.283 0.026 10.703 0.000
## complaints_2011 ~
## ppltn_dnst -0.202 0.075 -2.708 0.007
## FAC_disadv 0.206 0.042 4.956 0.000
## violent_2011 ~
## ppltn_dnst 0.117 0.076 1.536 0.124
## FAC_disadv 0.244 0.042 5.750 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .violent_2014 ~~
## .complants_2014 0.041 0.015 2.716 0.007
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .violent_2014 0.137 0.010 13.077 0.000
## .violent_2013 0.141 0.011 13.077 0.000
## .violent_2012 0.104 0.008 13.077 0.000
## .complants_2014 0.557 0.043 13.077 0.000
## .complants_2013 0.480 0.037 13.077 0.000
## .complants_2012 0.563 0.043 13.077 0.000
## .complants_2011 0.726 0.056 13.077 0.000
## .violent_2011 0.752 0.058 13.077 0.000
Question 5
As we discussed, the traditional cross-lagged panel model does not properly handle unobserved heterogeneity. Let’s fit Allison et al.’s dynamic panel model with fixed effects. We can use the dpm() function from the dpm package. This function requires the dataset to be in a long format. Coding is simple, so we can use all 26 years of data! We just need to tell R that this is a panel dataset using the panel_data() function.
# load the dpm package
library(dpm)
# treat the dataset as a panel dataset
chicago.panel <- chicago %>%
  mutate(wave = year - 1990) %>%
  panel_data(id = NC_NUM,
             wave = wave)
The DPM approach does not permit modelling reciprocal relationships in a single model. So, let’s start by modelling the effects of complaints about police misuse of force on violent crime while accounting for reverse causality.
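As a rough sketch, and following the general form described by Allison et al., the model for violent crime can be written as below. The symbols are our own notation: the term with subscript t alone is a year-specific intercept, rho and beta are the autoregressive and cross-lagged coefficients, and alpha is the neighbourhood fixed effect that absorbs all time-constant unobserved heterogeneity.

```latex
\text{violent}_{i,t} = \mu_t + \rho\,\text{violent}_{i,t-1} + \beta\,\text{complaints}_{i,t-1} + \alpha_i + \varepsilon_{i,t}
```

Treating complaints as predetermined means, under this reading, that the error in violent crime at time t is allowed to correlate with complaints at time t and later, which is how the model accommodates reverse causality.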
# fit a DPM with FEs
dpm_violent <- dpm(violent ~ pre(lag(complaints)),
                   data = chicago.panel, error.inv = TRUE, information = "observed", missing = "ML")
# print results
summary(dpm_violent)
## MODEL INFO:
## Dependent variable: violent
## Total observations: 342
## Complete observations: 342
## Time periods: 2 - 26
##
## MODEL FIT:
## 𝛘²(669) = 2430.269
## RMSEA = 0.088, 90% CI [0.084, 0.092]
## p(RMSEA < .05) = 0
## SRMR = 0.045
##
## | | Est. | S.E. | z val. | p |
## |:-----------------------|------:|------:|-------:|------:|
## | complaints (t - 1) | 0.020 | 0.005 | 3.922 | 0.000 |
## | violent (t - 1) | 0.539 | 0.010 | 52.503 | 0.000 |
##
## Model converged after 412 iterations
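The lag(complaints) term in the formula above refers to each neighbourhood's own value from the previous wave. The sketch below illustrates what a within-unit lag looks like, using dplyr on a made-up panel (toy and its values are hypothetical); it is only an illustration of the idea, not how dpm() builds the lag internally.

```r
library(dplyr)

# Toy panel: 2 hypothetical units observed over 3 waves
toy <- data.frame(
  id         = rep(1:2, each = 3),
  wave       = rep(1:3, times = 2),
  complaints = c(1, 2, 3, 10, 20, 30)
)

# Lag within each unit so that values never leak from one unit to the next
toy_lagged <- toy %>%
  group_by(id) %>%
  arrange(wave, .by_group = TRUE) %>%
  mutate(complaints_lag1 = dplyr::lag(complaints)) %>%
  ungroup()

toy_lagged
```

The first wave of each unit has no previous value, so its lag is missing. This matches the output above, where the modelled time periods start at wave 2.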
Question 6
Now, let’s add two time-constant control variables to the dynamic panel model: population_density and FAC_disadv.
# fit a DPM with FEs
dpm_violent_cov <- dpm(violent ~ pre(lag(complaints)) | FAC_disadv + population_density,
                       data = chicago.panel, error.inv = TRUE, information = "observed", missing = "ML")
# print results
summary(dpm_violent_cov)
## MODEL INFO:
## Dependent variable: violent
## Total observations: 342
## Complete observations: 342
## Time periods: 2 - 26
##
## MODEL FIT:
## 𝛘²(717) = 2558.631
## RMSEA = 0.087, 90% CI [0.083, 0.09]
## p(RMSEA < .05) = 0
## SRMR = 0.047
##
## | | Est. | S.E. | z val. | p |
## |:-----------------------|------:|------:|-------:|------:|
## | complaints (t - 1) | 0.021 | 0.005 | 4.050 | 0.000 |
## | FAC_disadv | 0.129 | 0.017 | 7.662 | 0.000 |
## | population_density | 0.089 | 0.030 | 3.001 | 0.003 |
## | violent (t - 1) | 0.539 | 0.010 | 52.492 | 0.000 |
##
## Model converged after 510 iterations
Question 7
Repeat the procedures above to model the effects of violent crime on complaints about police misuse of force while accounting for reverse causality.