Multilevel Data Analysis and Longitudinal Data Analysis

2025-09-30

Prerequisites

Required Packages

Let’s start by loading the necessary packages for our analysis:

# Load required packages
library(lme4)      # For linear mixed-effects models
library(tidyverse) # For data manipulation and visualization
library(haven)     # For reading Stata files

Additional Useful Packages

# Additional packages for analysis
library(lmerTest)  # Adds p-values to lme4 output
library(geepack)   # For GEE models
library(sjPlot)    # For model visualization
library(performance) # For model diagnostics

Part 1 - Linear Mixed Effect Models

Introduction to Mixed Effect Models and Linear Mixed Effect Models

What are Mixed Effect Models?

Mixed-effects models incorporate both fixed effects and random effects
Fixed effects: Parameters associated with an entire population or with repeatable levels of experimental factors

Random effects: Associated with individual experimental units drawn at random from a population
Also known as hierarchical linear models, multilevel models, or random effects models

Linear Mixed Effect Models

Extension of linear regression that accounts for:
- Non-independence of observations
- Hierarchical data structures
- Repeated measures data
- Individual variability across participants/items
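
The points above can be summarized in one equation. The following is the standard matrix form of a linear mixed model for group (or subject) j; the symbols are notation introduced here for illustration and do not appear elsewhere in these slides.

```latex
\[
\mathbf{y}_j = \mathbf{X}_j \boldsymbol{\beta} + \mathbf{Z}_j \mathbf{u}_j + \boldsymbol{\varepsilon}_j,
\qquad
\mathbf{u}_j \sim N(\mathbf{0}, \mathbf{G}),
\qquad
\boldsymbol{\varepsilon}_j \sim N(\mathbf{0}, \sigma^2 \mathbf{I})
\]
```

Here the fixed effects are collected in β (shared by all groups), the random effects in u_j (one draw per group), and the residuals in ε_j. Setting u_j = 0 for every group recovers ordinary linear regression.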

Motivations for Linear Mixed Effect Models

Why Use Mixed Effects Models?

Account for clustering in data

Observations within groups are more similar than across groups
Violates independence assumption of standard regression

Handle missing data better than repeated measures ANOVA

Use all available observations rather than listwise deletion (valid when data are missing at random)
Precision-weighted parameter estimates

Model both participant and item variability simultaneously

Accounts for individual differences
Generalizable across both subjects and stimuli

Flexible with unbalanced designs

No requirement for equal group sizes
Handles irregular measurement occasions

Data Suitable for Linear Mixed Effect Models

Types of Data Structures

Multilevel/Hierarchical Data

Students nested within schools
Patients within hospitals
Employees within companies

Longitudinal/Repeated Measures Data

Multiple measurements per participant over time
Growth curve analysis
Panel studies

Crossed Random Effects

Participants respond to multiple items
Multiple raters evaluate multiple subjects

Data Format Requirements

Long format: Each row = one observation
Variables: Subject ID, Item ID, dependent variable, predictors
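
Data often arrive in wide format (one row per subject, one column per occasion). The sketch below shows one way to reshape such data into the long format described above, using `tidyr::pivot_longer()`. The data frame `wide` and its column names are made up for illustration.

```r
library(tidyr)

# Hypothetical wide data: one row per subject, one column per time point
wide <- data.frame(
  subject  = 1:3,
  score_t1 = c(10, 12, 9),
  score_t2 = c(11, 15, 9),
  score_t3 = c(13, 16, 10)
)

# Reshape to long format: one row per subject-by-time observation
long <- pivot_longer(
  wide,
  cols            = starts_with("score_t"),
  names_to        = "time",
  names_prefix    = "score_t",                 # strip the prefix, keep 1, 2, 3
  names_transform = list(time = as.integer),   # store time as an integer
  values_to       = "score"
)

long
```

The result has 9 rows (3 subjects x 3 time points) and the columns `subject`, `time`, and `score`, which is the layout `lmer()` and `geeglm()` expect.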

Assumptions for Linear Mixed Effect Models

Key Assumptions

Linearity

Linear relationship between predictors and outcome

Independence of Residuals

After accounting for random effects structure

Normality of Random Effects

Random intercepts and slopes normally distributed
Mean = 0, variance estimated by model

Homoscedasticity

Constant variance of residuals

No Perfect Multicollinearity

Among fixed effects predictors

Less Restrictive Than Traditional ANOVA

Fixed-effect estimates are often fairly robust to moderate violations of the normality assumption
Handles unbalanced designs naturally

R Functions and Packages for Linear Mixed Effect Models

Primary Packages

lme4 Package (most common)

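```{r}
#| echo: true
#| eval: false

library(lme4)

# Basic syntax (template only: outcome, fixed_effects, random_effects,
# grouping_factor, and dataset are placeholders)
lmer(outcome ~ fixed_effects + (random_effects | grouping_factor),
     data = dataset)

# Example: by-participant random intercepts and condition slopes,
# plus by-item random intercepts (crossed random effects).
# RT, condition, participant, and item are placeholder column names.
lmer(RT ~ condition + (1 + condition | participant) + (1 | item),
     data = mydata)
```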

nlme Package (alternative with more covariance structures)

```{r}
#| echo: true
#| eval: false

library(nlme)
lme(outcome ~ fixed_effects, random = ~ random_effects|group, 
    data = dataset)
```

Additional Useful Packages

lmerTest: Adds p-values to lme4 output
afex: Simplified ANOVA-type analysis with mixed models
sjPlot: Easy visualization of mixed model results
performance: Model diagnostics and assumption checking

Loading and Exploring Our Dataset

Let’s read the Stata dataset and explore its structure:

# Read Stata data file
mydata <- read_dta("5.1.dta")

# First few rows
head(mydata)

# A tibble: 6 × 9
  caseid schoolid score cohort90 female sclass schtype schurban schdenom
   <dbl>    <dbl> <dbl>    <dbl>  <dbl>  <dbl>   <dbl>    <dbl>    <dbl>
1     18        1     0       -6      1      2       0        1        0
2     17        1    10       -6      1      2       0        1        0
3     19        1     0       -6      1      4       0        1        0
4     20        1    40       -6      1      3       0        1        0
5     21        1    42       -6      1      2       0        1        0
6     13        1     4       -6      1      2       0        1        0

Practical Demonstration with 5.1.dta Dataset

Data Exploration

# Summary of key variables
summary(mydata$score)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00   19.00   33.00   31.09   45.00   75.00

# Basic visualization - score by schoolid
mydata %>%
  ggplot(aes(x = as.factor(schoolid), y = score)) +
  geom_boxplot() +
  labs(title = "Distribution of Score by School ID",
       x = "School ID", y = "Score") +
  theme_minimal()

Fitting Linear Mixed Effect Models

Random Intercept Model

Model 1: Random intercept only with schoolid as grouping factor

model1 <- lmer(score ~ 1 + (1|schoolid), data = mydata)

summary(model1)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: score ~ 1 + (1 | schoolid)
   Data: mydata

REML criterion at convergence: 286539.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.9764 -0.7011  0.1017  0.7391  3.0819 

Random effects:
 Groups   Name        Variance Std.Dev.
 schoolid (Intercept)  61.17    7.821  
 Residual             258.36   16.073  
Number of obs: 33988, groups:  schoolid, 508

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  30.6006     0.3698 450.7146   82.74   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
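
A common follow-up to a random intercept model is the intraclass correlation (ICC): the share of total variance that lies between schools. As a minimal sketch, the calculation below uses the two variance components copied by hand from the output above instead of extracting them from the fitted object.

```r
# Variance components copied from the summary(model1) output above
var_school   <- 61.17   # between-school (random intercept) variance
var_residual <- 258.36  # within-school (residual) variance

# ICC = between-group variance / total variance
icc <- var_school / (var_school + var_residual)
round(icc, 3)  # 0.191
```

So roughly 19% of the variation in score is attributable to differences between schools. With the fitted object available, `performance::icc(model1)` should report a comparable value.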

Random Slope Model with cohort90

Model 2: Random intercept and random slope model, with a random slope for cohort90
This model allows both the baseline score and the effect of cohort90 to vary by school

# Fit random slope model with cohort90
model2 <- lmer(score ~ cohort90 + (1 + cohort90|schoolid), data = mydata)

summary(model2)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: score ~ cohort90 + (1 + cohort90 | schoolid)
   Data: mydata

REML criterion at convergence: 280692.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.1010 -0.7203  0.0387  0.7264  3.5217 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 schoolid (Intercept)  42.9666  6.5549       
          cohort90      0.1614  0.4017  -0.39
 Residual             215.7368 14.6880       
Number of obs: 33988, groups:  schoolid, 508

Fixed effects:
             Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)  30.60958    0.31381 425.94021   97.54   <2e-16 ***
cohort90      1.23388    0.02535 315.70740   48.67   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
         (Intr)
cohort90 -0.266
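
Written out as an equation, the fitted random slope model can be read as follows. The indices (i for student, j for school) and the symbols are notation assumed here for illustration; the numeric values are the rounded estimates from the output above.

```latex
\[
\text{score}_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,\text{cohort90}_{ij} + e_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix}
\sigma^2_{u0} & \rho\,\sigma_{u0}\sigma_{u1} \\
\rho\,\sigma_{u0}\sigma_{u1} & \sigma^2_{u1}
\end{pmatrix}
\right),
\qquad
e_{ij} \sim N(0, \sigma^2_e)
\]
\[
\hat\beta_0 \approx 30.61,\quad
\hat\beta_1 \approx 1.23,\quad
\hat\sigma^2_{u0} \approx 42.97,\quad
\hat\sigma^2_{u1} \approx 0.161,\quad
\hat\rho \approx -0.39,\quad
\hat\sigma^2_e \approx 215.74
\]
```

The negative correlation suggests that schools with higher-than-average intercepts tend to have somewhat flatter cohort90 slopes.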

Part 2 - Longitudinal Data Analysis

Introduction to Longitudinal Data Analysis

What is Longitudinal Data?

Longitudinal data: Multiple observations per subject over time
Also called: panel data, repeated measures data
Captures individual change trajectories over time

Characteristics

Within-subject correlation: Observations from same person are correlated
Time-varying predictors: Can include variables that change over time
Missing data patterns: Common due to dropout or missed visits
Individual differences: People may have different starting points and rates of change

Research Questions Addressed

How does the outcome change over time on average?
Do individuals differ in their rates of change?
What predicts individual differences in change?
Do interventions affect the rate of change?

Different Approaches to Longitudinal Data Analysis

Traditional Approaches

Repeated Measures ANOVA
- Assumes complete data
- Limited flexibility with time
- Requires balanced designs

MANOVA (Multivariate ANOVA)
- Treats each time point as separate variable
- Requires complete data
- Not suitable for many time points

Modern Robust Approaches

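Generalized Estimating Equations (GEE)
- Population-averaged effects
- Coefficient estimates remain consistent under a misspecified working correlation structure (given a correct mean model); robust (sandwich) standard errors keep inference valid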

Linear Mixed Effect Models
- Subject-specific effects
- Flexible with missing data and unbalanced designs
- Can model complex growth patterns

Generalized Estimating Equations (GEE)

What is GEE?

Population-averaged approach to longitudinal data
Estimates marginal effects averaged across the population
Focuses on mean response as function of covariates
Treats within-subject correlation as “nuisance”

Key Features

Working Correlation Structure:

Independence: No correlation assumed
Exchangeable: Constant correlation across all time points
Autoregressive: Correlation decreases with time separation
Unstructured: Different correlation for each pair of time points
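
To make the structures above concrete, the sketch below builds exchangeable and first-order autoregressive (AR(1)) working correlation matrices for four time points in base R. The correlation value 0.5 and the number of time points are arbitrary choices for illustration.

```r
n_times <- 4    # number of measurement occasions (illustrative)
rho     <- 0.5  # assumed correlation parameter (illustrative)

# Exchangeable: the same correlation for every pair of time points
exchangeable <- matrix(rho, nrow = n_times, ncol = n_times)
diag(exchangeable) <- 1

# AR(1): correlation rho^|s - t| decays with the gap between time points
lag <- abs(outer(seq_len(n_times), seq_len(n_times), "-"))
ar1 <- rho^lag

exchangeable
ar1
```

In the exchangeable matrix the first and last occasions are correlated at 0.5, while under AR(1) that correlation drops to 0.5^3 = 0.125. An independence structure is simply `diag(n_times)`, and an unstructured matrix has a separate free parameter in each off-diagonal position.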

R Implementation with sleepstudy Data

Load sleepstudy data from lme4 package

data(sleepstudy)

Explore the data structure
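
One possible way to carry out this exploration step is sketched below. It uses only base R functions on the `sleepstudy` data frame shipped with lme4; the object name `obs_per_subject` is introduced here for illustration.

```r
library(lme4)
data("sleepstudy", package = "lme4")

# Variable types and dimensions
str(sleepstudy)

# First few rows: Reaction (ms), Days, Subject
head(sleepstudy)

# Number of observations per subject (a balanced design gives equal counts)
obs_per_subject <- table(sleepstudy$Subject)
obs_per_subject

# Range of the time variable
range(sleepstudy$Days)
```

The data contain 180 observations: 18 subjects, each measured on Days 0 through 9.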

Fit GEE model using sleepstudy data

library(geepack)

gee_model <- geeglm(Reaction ~ Days, 
                    data = sleepstudy,
                    id = Subject,
                    family = gaussian,
                    corstr = "exchangeable")

summary(gee_model)


Call:
geeglm(formula = Reaction ~ Days, family = gaussian, data = sleepstudy, 
    id = Subject, corstr = "exchangeable")

 Coefficients:
            Estimate Std.err    Wald Pr(>|W|)    
(Intercept)  251.405   6.632 1436.89  < 2e-16 ***
Days          10.467   1.502   48.55 3.22e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = exchangeable 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)     2251   536.6
  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha    0.576  0.1114
Number of clusters:   18  Maximum cluster size: 10

Interpretation of GEE Results

Days coefficient: Population-averaged increase in reaction time per day of sleep deprivation
Intercept: Population-averaged reaction time at baseline (Day 0)
Robust standard errors account for within-subject correlation
Results represent marginal (population-level) effects
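
The Wald statistic and an approximate confidence interval for the Days effect can be reproduced by hand. This sketch copies the estimate and robust standard error from the output above; it assumes a normal approximation for the interval.

```r
# Estimate and robust standard error for Days, copied from the GEE output
est <- 10.467
se  <- 1.502

# Wald statistic: squared ratio of the estimate to its standard error
wald    <- (est / se)^2
p_value <- pchisq(wald, df = 1, lower.tail = FALSE)

# Approximate 95% confidence interval (normal approximation)
ci <- est + c(-1, 1) * qnorm(0.975) * se

round(wald, 2)
signif(p_value, 3)
round(ci, 2)
```

The Wald statistic agrees with the printed value of 48.55 up to rounding of the inputs, and the interval runs from roughly 7.5 to 13.4 ms per day.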

Linear Mixed Effect Model for Longitudinal Data

Growth Curve Models: Random Intercept Model

```{r}
#| eval: false
# Individual starting points vary, same growth rate
# (template: outcome, time, subject, and longdata are placeholders)
model1 <- lmer(outcome ~ time + (1|subject), data = longdata)
```

Growth Curve Models: Random Slope Model

```{r}
#| eval: false
# Both starting points and growth rates vary
# (template: outcome, time, subject, and longdata are placeholders)
model2 <- lmer(outcome ~ time + (1 + time|subject), data = longdata)
```

Growth Curve Models: With Treatment Effects

```{r}
#| eval: false
# Treatment affects both intercept and slope
# (template: outcome, time, treatment, subject, and longdata are placeholders)
model3 <- lmer(outcome ~ time * treatment + (1 + time|subject),
               data = longdata)
```
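
The growth curve templates use placeholder names. To try them out, one option is to simulate a small data set with the same column names. Everything below (sample sizes, effect sizes, the seed) is an arbitrary assumption chosen for illustration, not real data.

```r
library(lme4)
set.seed(1)

n_subjects <- 40
n_times    <- 5

# Subject-level information: treatment arm plus random intercept and slope
subject_info <- data.frame(
  subject   = factor(seq_len(n_subjects)),
  treatment = rep(c(0, 1), each = n_subjects / 2),
  u0        = rnorm(n_subjects, sd = 2),    # random intercept deviations
  u1        = rnorm(n_subjects, sd = 0.5)   # random slope deviations
)

# Expand to long format: one row per subject per time point
longdata <- subject_info[rep(seq_len(n_subjects), each = n_times), ]
longdata$time <- rep(0:(n_times - 1), times = n_subjects)

# Outcome: treatment adds 0.5 to the slope; residual SD is 1
longdata$outcome <- with(
  longdata,
  10 + u0 + (1 + u1 + 0.5 * treatment) * time + rnorm(nrow(longdata), sd = 1)
)

# Growth curve model with a treatment-by-time interaction
model3 <- lmer(outcome ~ time * treatment + (1 + time | subject),
               data = longdata)
fixef(model3)
```

The `time:treatment` coefficient estimates how much the slope differs between arms, which is the quantity of interest when asking whether an intervention changes the rate of change.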

R Code Example with sleepstudy Data

# Load required packages
library(lme4)
library(lmerTest)  # for p-values

Fit random intercept model

model1 <- lmer(Reaction ~ Days + (1|Subject), data = sleepstudy)

summary(model1)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: Reaction ~ Days + (1 | Subject)
   Data: sleepstudy

REML criterion at convergence: 1786

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-3.226 -0.553  0.011  0.519  4.251 

Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 1378     37.1    
 Residual              960     31.0    
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error      df t value Pr(>|t|)    
(Intercept)  251.405      9.747  22.810    25.8   <2e-16 ***
Days          10.467      0.804 161.000    13.0   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
     (Intr)
Days -0.371

Fit random intercept and slope model

model2 <- lmer(Reaction ~ Days + (1 + Days|Subject), data = sleepstudy)

summary(model2)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: Reaction ~ Days + (1 + Days | Subject)
   Data: sleepstudy

REML criterion at convergence: 1744

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-3.954 -0.463  0.023  0.463  5.179 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 612.1    24.74        
          Days         35.1     5.92    0.07
 Residual             654.9    25.59        
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error     df t value Pr(>|t|)    
(Intercept)   251.41       6.82  17.00   36.84  < 2e-16 ***
Days           10.47       1.55  17.00    6.77  3.3e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
     (Intr)
Days -0.138

# Model comparison via likelihood ratio test
# (anova() refits both REML models with ML before comparing them)
anova(model1, model2)

Data: sleepstudy
Models:
model1: Reaction ~ Days + (1 | Subject)
model2: Reaction ~ Days + (1 + Days | Subject)
       npar  AIC  BIC logLik -2*log(L) Chisq Df Pr(>Chisq)    
model1    4 1802 1815   -897      1794                        
model2    6 1764 1783   -876      1752  42.1  2    7.1e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
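
The likelihood ratio test in this table can be reproduced by hand from the printed values. The sketch below uses the rounded deviances, so the result differs slightly from the printed 42.1.

```r
# -2 * log-likelihood for each model, copied from the anova() output
deviance_model1 <- 1794  # random intercept only (4 parameters)
deviance_model2 <- 1752  # random intercept and slope (6 parameters)

# Test statistic: drop in deviance; df: difference in parameter count
chisq <- deviance_model1 - deviance_model2
df    <- 6 - 4

p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
chisq
signif(p_value, 2)
```

The two extra parameters are the random slope variance and the intercept-slope correlation. Because a variance cannot be negative, the null hypothesis sits on the boundary of the parameter space, so the chi-square reference distribution is generally considered conservative here; the very small p-value makes that caveat immaterial in this case.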

Interpretation of Linear Mixed Model Results

Fixed Effects: Population-average effects
- (Intercept): Average reaction time at Day 0 across all subjects
- Days: Average increase in reaction time per day of sleep deprivation

Random Effects: Individual variability
- (Intercept): Variability in individual baseline reaction times
- Days: Variability in individual slopes (how much each person’s reaction time increases per day)
- Correlation: Relationship between individual baselines and slopes
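
The quantities listed above can be pulled out of a fitted model directly. The sketch below refits the random intercept and slope model so that it runs on its own; `subject_lines` is a name introduced here for illustration.

```r
library(lme4)
data("sleepstudy", package = "lme4")

model2 <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# Fixed effects: the average intercept and slope
fixef(model2)

# Random effects: each subject's deviation from the average intercept and slope
head(ranef(model2)$Subject)

# Subject-specific intercepts and slopes (fixed effect + random deviation)
subject_lines <- coef(model2)$Subject
head(subject_lines)

# Variance components and the intercept-slope correlation
VarCorr(model2)
```

Each row of `subject_lines` gives one person's estimated baseline reaction time and daily increase, which is what makes individual-level prediction possible with mixed models.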

Model Diagnostics

# Residuals vs. fitted values: look for curvature or non-constant spread
plot(model2)

# Normal Q-Q plot of the residuals
qqnorm(resid(model2))
qqline(resid(model2))
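
The residual plots above address linearity, homoscedasticity, and residual normality. The normality assumption for the random effects can be inspected in a similar way. This is a minimal sketch that refits the model so it runs on its own; with only 18 subjects the plots give a rough impression at best.

```r
library(lme4)
data("sleepstudy", package = "lme4")

model2 <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# Estimated random effects (conditional modes), one row per subject
re <- ranef(model2)$Subject

# Side-by-side normal Q-Q plots for random intercepts and random slopes
op <- par(mfrow = c(1, 2))
qqnorm(re[["(Intercept)"]], main = "Random intercepts")
qqline(re[["(Intercept)"]])
qqnorm(re[["Days"]], main = "Random slopes")
qqline(re[["Days"]])
par(op)  # restore the previous plotting layout
```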

Model Comparison and Selection

```{r}
#| eval: false
# Compare nested models using a likelihood ratio test
# (template: outcome, time, treatment, subject, and longdata are placeholders)
model_simple <- lmer(outcome ~ time + (1|subject), data = longdata)
model_complex <- lmer(outcome ~ time + treatment + time:treatment +
                      (1 + time|subject), data = longdata)

# anova() refits both models with ML before comparing them
anova(model_simple, model_complex)

# Check model assumptions
library(performance)
check_model(model_complex)
```

Summary

When to Use Each Approach

GEE:

Interest in population-averaged effects
Robust inference about mean trends
Less concern about individual predictions

Linear Mixed Models:

Interest in individual trajectories
Need to predict individual outcomes
Want to model complex growth patterns
Handle irregular measurement occasions

Best Practices

Explore data graphically before modeling
Start with simple models and build complexity
Check model assumptions and fit
Compare multiple models using information criteria
Interpret results in context of research questions

Thank You

Questions and Answers

Resources:

Pinheiro & Bates (2000). Mixed-Effects Models in S and S-PLUS
Brown (2021). An Introduction to Linear Mixed-Effects Modeling in R
Twisk (2013). Applied Longitudinal Data Analysis for Epidemiology