Multilevel Data Analysis and Longitudinal Data Analysis

2025-09-30

Prerequisites

Required Packages

Let’s start by loading the necessary packages for our analysis:

# Load required packages
library(lme4)      # For linear mixed-effects models
library(tidyverse) # For data manipulation and visualization
library(haven)     # For reading Stata files

Additional Useful Packages

# Additional packages for analysis
library(lmerTest)  # Adds p-values to lme4 output
library(geepack)   # For GEE models
library(sjPlot)    # For model visualization
library(performance) # For model diagnostics

Part 1 - Linear Mixed Effect Models

Introduction to Mixed Effect Models and Linear Mixed Effect Models

What are Mixed Effect Models?

  • Mixed-effects models incorporate both fixed effects and random effects
  • Fixed effects: Parameters associated with the entire population or with repeatable levels of experimental factors

  • Random effects: Associated with individual experimental units drawn at random from a population
  • Also known as hierarchical linear models, multilevel models, or random effects models

Linear Mixed Effect Models

  • Extension of linear regression that accounts for:

    • Non-independence of observations
    • Hierarchical data structures
    • Repeated measures data
    • Individual variability across participants/items

Motivations for Linear Mixed Effect Models

Why Use Mixed Effects Models?

  1. Account for clustering in data
    • Observations within groups are more similar than across groups
    • Violates independence assumption of standard regression
  2. Handle missing data better than ANOVA
    • Uses all available data rather than listwise deletion
    • Precision-weighted parameter estimates

  3. Model both participant and item variability simultaneously
    • Accounts for individual differences
    • Generalizable across both subjects and stimuli
  4. Flexible with unbalanced designs
    • No requirement for equal group sizes
    • Handles irregular measurement occasions

Data Suitable for Linear Mixed Effect Models

Types of Data Structures

  1. Multilevel/Hierarchical Data
    • Students nested within schools
    • Patients within hospitals
    • Employees within companies

  2. Longitudinal/Repeated Measures Data
    • Multiple measurements per participant over time
    • Growth curve analysis
    • Panel studies

  3. Crossed Random Effects
    • Participants respond to multiple items
    • Multiple raters evaluate multiple subjects

Data Format Requirements

  • Long format: Each row = one observation (one measurement on one subject or item)
  • Variables: Grouping identifiers (e.g., subject ID, item ID), dependent variable, predictors
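Data often arrive in wide format, with one column per measurement occasion. The sketch below shows one way to convert such data to long format with base R's reshape(). The data frame `wide` and its columns `score_t1` to `score_t3` are made up for illustration; tidyr::pivot_longer() is a common alternative.

```r
# Hypothetical wide data: one row per subject, one column per occasion
wide <- data.frame(
  subject  = c("s1", "s2", "s3"),
  score_t1 = c(10, 12, 9),
  score_t2 = c(11, 15, 9),
  score_t3 = c(13, 16, 12)
)

# Stack the three score columns into one, adding a time index
long <- reshape(
  wide,
  direction = "long",
  varying   = c("score_t1", "score_t2", "score_t3"),
  v.names   = "score",
  timevar   = "time",
  times     = 1:3,
  idvar     = "subject"
)

# Sort so that each subject's rows sit together, in time order
long <- long[order(long$subject, long$time), ]
rownames(long) <- NULL
long
```

The result has one row per subject-by-occasion combination (9 rows here), which is the layout lmer() expects.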

Assumptions for Linear Mixed Effect Models

Key Assumptions

  1. Linearity
    • Linear relationship between predictors and outcome
  2. Independence of Residuals
    • After accounting for random effects structure

  3. Normality of Random Effects
    • Random intercepts and slopes normally distributed
    • Mean = 0, variance estimated by model
  4. Homoscedasticity
    • Constant variance of residuals

  5. No Perfect Multicollinearity
    • Among fixed effects predictors

Less Restrictive Than Traditional ANOVA

  • Fixed-effect estimates are often fairly robust to mild violations of the normality assumption
  • Handles unbalanced designs naturally

R Functions and Packages for Linear Mixed Effect Models

Primary Packages

lme4 Package (most common)

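```{r}
#| echo: true
#| eval: false

library(lme4)

# Basic syntax: outcome, fixed_effects, random_effects, grouping_factor,
# and dataset are placeholders, so this chunk is shown but not run
lmer(outcome ~ fixed_effects + (random_effects | grouping_factor),
     data = dataset)

# Example: random intercepts and condition slopes by participant,
# plus random intercepts by item (crossed random effects)
lmer(RT ~ condition + (1 + condition | participant) + (1 | item),
     data = mydata)
```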

nlme Package (alternative with more covariance structures)

```{r}
#| echo: true
#| eval: false

library(nlme)
lme(outcome ~ fixed_effects, random = ~ random_effects|group, 
    data = dataset)
```

Additional Useful Packages

  • lmerTest: Adds p-values to lme4 output
  • afex: Simplified ANOVA-type analysis with mixed models
  • sjPlot: Easy visualization of mixed model results
  • performance: Model diagnostics and assumption checking

Loading and Exploring Our Dataset

Let’s read the Stata dataset and explore its structure:

# Read Stata data file
mydata <- read_dta("5.1.dta")
# First few rows
head(mydata)
# A tibble: 6 × 9
  caseid schoolid score cohort90 female sclass schtype schurban schdenom
   <dbl>    <dbl> <dbl>    <dbl>  <dbl>  <dbl>   <dbl>    <dbl>    <dbl>
1     18        1     0       -6      1      2       0        1        0
2     17        1    10       -6      1      2       0        1        0
3     19        1     0       -6      1      4       0        1        0
4     20        1    40       -6      1      3       0        1        0
5     21        1    42       -6      1      2       0        1        0
6     13        1     4       -6      1      2       0        1        0

Practical Demonstration with 5.1.dta Dataset

Data Exploration

# Summary of key variables
summary(mydata$score)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00   19.00   33.00   31.09   45.00   75.00 

# Basic visualization - score by schoolid
mydata %>%
  ggplot(aes(x = as.factor(schoolid), y = score)) +
  geom_boxplot() +
  labs(title = "Distribution of Score by School ID",
       x = "School ID", y = "Score") +
  theme_minimal()

Fitting Linear Mixed Effect Models

Random Intercept Model

  • Model 1: Random intercept only (no predictors), with schoolid as grouping factor
model1 <- lmer(score ~ 1 + (1|schoolid), data = mydata)
summary(model1)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: score ~ 1 + (1 | schoolid)
   Data: mydata

REML criterion at convergence: 286539.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.9764 -0.7011  0.1017  0.7391  3.0819 

Random effects:
 Groups   Name        Variance Std.Dev.
 schoolid (Intercept)  61.17    7.821  
 Residual             258.36   16.073  
Number of obs: 33988, groups:  schoolid, 508

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  30.6006     0.3698 450.7146   82.74   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
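The two variance components in this output can be combined into an intraclass correlation (ICC), the share of total score variance that lies between schools. A minimal sketch, using the values copied by hand from the Random effects table above rather than extracted from the fitted object:

```r
# Variance components copied from the model1 output above
var_school   <- 61.17   # between-school (random intercept) variance
var_residual <- 258.36  # within-school (residual) variance

# ICC: proportion of total variance attributable to schools
icc <- var_school / (var_school + var_residual)
round(icc, 3)
```

An ICC of roughly 0.19 suggests that about a fifth of the variation in scores is between schools, which supports modeling schoolid as a grouping factor. With the fitted model available, performance::icc(model1) should report a comparable value.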

Random Slope Model with cohort90

  • Model 2: Random intercept and random slope for cohort90, with schoolid as grouping factor
  • This model allows the effect of cohort90 to vary by school
# Fit random slope model with cohort90
model2 <- lmer(score ~ cohort90 + (1 + cohort90|schoolid), data = mydata)
summary(model2)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: score ~ cohort90 + (1 + cohort90 | schoolid)
   Data: mydata

REML criterion at convergence: 280692.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.1010 -0.7203  0.0387  0.7264  3.5217 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 schoolid (Intercept)  42.9666  6.5549       
          cohort90      0.1614  0.4017  -0.39
 Residual             215.7368 14.6880       
Number of obs: 33988, groups:  schoolid, 508

Fixed effects:
             Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)  30.60958    0.31381 425.94021   97.54   <2e-16 ***
cohort90      1.23388    0.02535 315.70740   48.67   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
         (Intr)
cohort90 -0.266
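One way to read the random slope is to ask how much the cohort90 effect differs from school to school. The sketch below assumes the school-specific slopes are normally distributed around the fixed effect (one of the model assumptions listed earlier) and uses numbers copied by hand from the model2 output above.

```r
# Values copied from the model2 output above
slope_mean <- 1.23388  # fixed effect of cohort90
slope_sd   <- 0.4017   # Std.Dev. of the cohort90 random slope

# Approximate range covering 95% of school-specific slopes
slope_range <- slope_mean + c(-1, 1) * qnorm(0.975) * slope_sd
round(slope_range, 2)
```

Under that assumption, roughly 95% of schools have a cohort90 slope between about 0.45 and 2.02, so the effect is positive in nearly all schools but varies considerably in size.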

Part 2 - Longitudinal Data Analysis

Introduction to Longitudinal Data Analysis

What is Longitudinal Data?

  • Longitudinal data: Multiple observations per subject over time
  • Also called: panel data, repeated measures data
  • Captures individual change trajectories over time

Characteristics

  • Within-subject correlation: Observations from same person are correlated
  • Time-varying predictors: Can include variables that change over time
  • Missing data patterns: Common due to dropout or missed visits
  • Individual differences: People may have different starting points and rates of change

Research Questions Addressed

  • How does the outcome change over time on average?
  • Do individuals differ in their rates of change?
  • What predicts individual differences in change?
  • Do interventions affect the rate of change?

Different Approaches to Longitudinal Data Analysis

Traditional Approaches

  1. Repeated Measures ANOVA

    • Assumes complete data
    • Limited flexibility with time
    • Requires balanced designs

  2. MANOVA (Multivariate ANOVA)

    • Treats each time point as separate variable
    • Requires complete data
    • Not suitable for many time points

Modern Robust Approaches

  1. Generalized Estimating Equations (GEE)

    • Population-averaged effects
    • Coefficient estimates remain consistent even if the working correlation structure is misspecified, provided the mean model is correct

  2. Linear Mixed Effect Models

    • Subject-specific effects
    • Flexible with missing data and unbalanced designs
    • Can model complex growth patterns

Generalized Estimating Equations (GEE)

What is GEE?

  • Population-averaged approach to longitudinal data
  • Estimates marginal effects averaged across the population
  • Focuses on mean response as function of covariates
  • Treats within-subject correlation as a nuisance to be accounted for, not as a quantity of interest

Key Features

Working Correlation Structure:

  • Independence: No correlation assumed
  • Exchangeable: Same correlation between every pair of time points
  • Autoregressive: Correlation decreases with time separation
  • Unstructured: Different correlation for each pair of time points
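To make the structures above concrete, the sketch below builds the exchangeable and first-order autoregressive working correlation matrices by hand for four time points. The correlation value `alpha = 0.5` is arbitrary and chosen only for illustration; in practice geeglm() estimates it from the data.

```r
n_times <- 4
alpha   <- 0.5  # illustrative correlation parameter (an assumption)

# Exchangeable: every pair of time points shares the same correlation
exch <- matrix(alpha, nrow = n_times, ncol = n_times)
diag(exch) <- 1

# AR(1): correlation decays as alpha^|lag| with time separation
lag <- abs(outer(seq_len(n_times), seq_len(n_times), "-"))
ar1 <- alpha^lag

# Unstructured: one free correlation per pair of time points
n_unstructured <- n_times * (n_times - 1) / 2

exch
ar1
n_unstructured
```

The exchangeable matrix needs one parameter, AR(1) also one, and the unstructured matrix needs six here, which is why unstructured correlation becomes hard to estimate with many time points.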

R Implementation with sleepstudy Data

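  • Load the sleepstudy data from the lme4 package
data(sleepstudy)
  • Data structure: 180 observations of Reaction (average reaction time in ms), one per Subject per day (Days 0 to 9) for 18 subjects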

Fit GEE model using sleepstudy data

library(geepack)

gee_model <- geeglm(Reaction ~ Days, 
                    data = sleepstudy,
                    id = Subject,
                    family = gaussian,
                    corstr = "exchangeable")
summary(gee_model)

Call:
geeglm(formula = Reaction ~ Days, family = gaussian, data = sleepstudy, 
    id = Subject, corstr = "exchangeable")

 Coefficients:
            Estimate Std.err    Wald Pr(>|W|)    
(Intercept)  251.405   6.632 1436.89  < 2e-16 ***
Days          10.467   1.502   48.55 3.22e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = exchangeable 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)     2251   536.6
  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha    0.576  0.1114
Number of clusters:   18  Maximum cluster size: 10 

Interpretation of GEE Results

  • Days coefficient (10.467): Population-averaged increase in reaction time, in ms, per additional day of sleep deprivation
  • Intercept (251.405): Population-averaged reaction time, in ms, at baseline (Day 0)
  • Robust (sandwich) standard errors account for within-subject correlation
  • Results represent marginal (population-level) effects
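The Wald statistic and p-value in the coefficient table follow directly from the estimate and its robust standard error. A minimal sketch that reproduces them for the Days coefficient and adds an approximate 95% confidence interval, using values copied by hand from the output above (so small rounding differences are expected):

```r
# Values copied from the geeglm output above
estimate  <- 10.467
robust_se <- 1.502

# Wald statistic: squared ratio of estimate to standard error
wald    <- (estimate / robust_se)^2
p_value <- pchisq(wald, df = 1, lower.tail = FALSE)

# Approximate 95% confidence interval based on the normal distribution
ci <- estimate + c(-1, 1) * qnorm(0.975) * robust_se

round(wald, 2)
signif(p_value, 3)
round(ci, 2)
```

The interval runs from roughly 7.5 to 13.4 ms per day, so the population-averaged slowdown is clearly positive.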

Linear Mixed Effect Model for Longitudinal Data

Growth Curve Models: Random Intercept Model

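```{r}
#| eval: false
# Starting points vary by subject; growth rate is common to all subjects
# (outcome, time, subject, and longdata are placeholder names)
model1 <- lmer(outcome ~ time + (1 | subject), data = longdata)
```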

Growth Curve Models: Random Slope Model

```{r}
#| eval: false
# Both starting points and growth rates vary by subject
# (outcome, time, subject, and longdata are placeholder names)
model2 <- lmer(outcome ~ time + (1 + time | subject), data = longdata)
```

Growth Curve Models: With Treatment Effects

```{r}
#| eval: false
# time * treatment expands to time + treatment + time:treatment, so
# treatment can shift both the intercept and the slope
model3 <- lmer(outcome ~ time * treatment + (1 + time | subject),
               data = longdata)
```
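The growth curve templates above refer to a data frame called `longdata` that is not defined in these slides. For trying them out, one option is to simulate a small dataset with the same column names. Everything in the sketch below (sample sizes, effect sizes, standard deviations) is an arbitrary assumption chosen only so that the data contain random intercepts, random slopes, and a treatment effect on the slope.

```r
set.seed(2025)

n_subjects <- 40
n_times    <- 5

# One row per subject per occasion (long format)
subject_id <- rep(seq_len(n_subjects), each = n_times)
time       <- rep(seq_len(n_times) - 1, times = n_subjects)

# First half of subjects are controls, second half treated
group     <- rep(c("control", "treated"), each = n_subjects / 2)
treatment <- group[subject_id]

# Subject-specific deviations: random intercepts and random slopes
u0 <- rnorm(n_subjects, mean = 0, sd = 5)
u1 <- rnorm(n_subjects, mean = 0, sd = 1)

# Treated subjects grow 1.5 units per occasion faster (illustrative)
slope   <- 2 + 1.5 * (treatment == "treated") + u1[subject_id]
outcome <- 50 + u0[subject_id] + slope * time +
  rnorm(length(time), mean = 0, sd = 3)

longdata <- data.frame(
  subject   = factor(subject_id),
  time      = time,
  treatment = factor(treatment),
  outcome   = outcome
)
head(longdata)
```

With lme4 loaded, the three template models can then be fitted to this simulated `longdata`; the time:treatment coefficient of the third model should land in the vicinity of the simulated value of 1.5.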

R Code Example with sleepstudy Data

# Load required packages
library(lme4)
library(lmerTest)  # for p-values

Fit random intercept model

model1 <- lmer(Reaction ~ Days + (1|Subject), data = sleepstudy)
summary(model1)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: Reaction ~ Days + (1 | Subject)
   Data: sleepstudy

REML criterion at convergence: 1786

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-3.226 -0.553  0.011  0.519  4.251 

Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 1378     37.1    
 Residual              960     31.0    
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error      df t value Pr(>|t|)    
(Intercept)  251.405      9.747  22.810    25.8   <2e-16 ***
Days          10.467      0.804 161.000    13.0   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
     (Intr)
Days -0.371

Fit random intercept and slope model

model2 <- lmer(Reaction ~ Days + (1 + Days|Subject), data = sleepstudy)
summary(model2)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: Reaction ~ Days + (1 + Days | Subject)
   Data: sleepstudy

REML criterion at convergence: 1744

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-3.954 -0.463  0.023  0.463  5.179 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 612.1    24.74        
          Days         35.1     5.92    0.07
 Residual             654.9    25.59        
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error     df t value Pr(>|t|)    
(Intercept)   251.41       6.82  17.00   36.84  < 2e-16 ***
Days           10.47       1.55  17.00    6.77  3.3e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
     (Intr)
Days -0.138

# Model comparison: anova() refits both REML fits with ML, then runs a likelihood ratio test
anova(model1, model2)
Data: sleepstudy
Models:
model1: Reaction ~ Days + (1 | Subject)
model2: Reaction ~ Days + (1 + Days | Subject)
       npar  AIC  BIC logLik -2*log(L) Chisq Df Pr(>Chisq)    
model1    4 1802 1815   -897      1794                        
model2    6 1764 1783   -876      1752  42.1  2    7.1e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
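The likelihood ratio test in this table can be reproduced by hand from the printed deviances. The sketch below uses the rounded values shown above, so the difference comes out as 42 rather than the reported 42.1; the p-value is computed from the reported statistic.

```r
# Values copied from the anova() table above (rounded as printed)
deviance_model1 <- 1794  # -2*log(L), random intercept model
deviance_model2 <- 1752  # -2*log(L), random intercept and slope model

# Test statistic: drop in deviance from adding the random slope
chisq_rounded <- deviance_model1 - deviance_model2

# Degrees of freedom: slope variance plus intercept-slope correlation
df_diff <- 6 - 4

# p-value from the chi-square statistic reported in the table
p_value <- pchisq(42.1, df = df_diff, lower.tail = FALSE)

chisq_rounded
df_diff
signif(p_value, 2)
```

The random slope model fits clearly better. Because the null hypothesis puts a variance on the boundary of its parameter space (zero), this chi-square p-value is usually regarded as conservative, which only strengthens the conclusion here.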

Interpretation of Linear Mixed Model Results

  • Fixed Effects: Population-average effects
    • (Intercept): Average reaction time at Day 0 across all subjects (about 251 ms)
    • Days: Average increase in reaction time per day of sleep deprivation (about 10.5 ms)

  • Random Effects: Individual variability
    • (Intercept): Variability in individual baseline reaction times (SD 24.74 ms in model2)
    • Days: Variability in individual slopes, i.e., how much each person’s reaction time increases per day (SD 5.92 ms per day in model2)
    • Correlation: Relationship between individual baselines and slopes (0.07 in model2, so nearly unrelated)
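The random effects also determine how strongly two measurements on the same subject are correlated. The sketch below computes the covariance matrix implied by the random intercept and slope model for Days 0 to 9, using the standard mixed model form Z G Z' + sigma^2 I with estimates copied by hand from the model2 output, so the results are approximate.

```r
# Estimates copied from the sleepstudy model2 output
sd_intercept <- 24.74
sd_slope     <- 5.92
rho          <- 0.07    # intercept-slope correlation
resid_var    <- 654.9

days <- 0:9
Z <- cbind(1, days)     # random effects design: intercept and Days

# Covariance matrix of the random intercept and slope
cov_is <- rho * sd_intercept * sd_slope
G <- matrix(c(sd_intercept^2, cov_is,
              cov_is,         sd_slope^2), nrow = 2)

# Implied covariance of one subject's 10 measurements
V <- Z %*% G %*% t(Z) + diag(resid_var, length(days))
implied_cor <- cov2cor(V)

round(diag(V))               # variance grows over the study
round(implied_cor[1, ], 2)   # correlation of Day 0 with each day
```

Unlike the exchangeable structure used in the GEE fit, the implied variance increases with Days and the correlation with Day 0 weakens as the days move further apart.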

Model Diagnostics

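# Residuals vs. fitted values: look for nonlinearity or non-constant variance
plot(model2)

# Normal Q-Q plot of the residuals: look for departures from normality
qqnorm(resid(model2))
qqline(resid(model2))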

Model Comparison and Selection

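```{r}
#| eval: false
# Compare models using a likelihood ratio test
# (outcome, time, treatment, subject, and longdata are placeholder names)
model_simple <- lmer(outcome ~ time + (1 | subject), data = longdata)
model_complex <- lmer(outcome ~ time + treatment + time:treatment +
                      (1 + time | subject), data = longdata)

# anova() refits both models with ML, since REML fits with different
# fixed effects are not comparable
anova(model_simple, model_complex)

# Check model assumptions
library(performance)
check_model(model_complex)
```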

Summary

When to Use Each Approach

GEE:

  • Interest in population-averaged effects
  • Robust inference about mean trends
  • Less concern about individual predictions

Linear Mixed Models:

  • Interest in individual trajectories
  • Need to predict individual outcomes
  • Want to model complex growth patterns
  • Handle irregular measurement occasions

Best Practices

  1. Explore data graphically before modeling
  2. Start with simple models and build complexity
  3. Check model assumptions and fit
  4. Compare multiple models using information criteria (AIC, BIC) or likelihood ratio tests
  5. Interpret results in context of research questions

Thank You

Questions and Answers

Resources:

  • Pinheiro & Bates (2000). Mixed-Effects Models in S and S-PLUS
  • Brown (2021). An Introduction to Linear Mixed-Effects Modeling in R
  • Twisk (2013). Applied Longitudinal Data Analysis for Epidemiology