Non-Causal Association in Epidemiological Studies

Confounding and DAG, GDT 508: Public Health Epidemiology

Dr. Kamarul Imran Musa

Professor in Epidemiology and Statistics, Universiti Sains Malaysia

Learning Objectives

  1. Understand ‘Confounding’ in epidemiological studies
  2. Identify ‘Confounding’ using Directed Acyclic Graphs (DAG)
  3. Control ‘Confounding’ in epidemiological studies
  4. Distinguish confounding from mediation and suppression

Introduction to Confounding

Confounding refers to a situation where a noncausal association between an exposure and outcome is observed due to the influence of a third variable (confounder).

Key Point: If our goal is primary prevention, distinguishing causal from noncausal associations is crucial.

  • More common in observational studies
  • Less common in experimental studies (randomization helps!)

Why Does Confounding Matter?

Consider studying the effect of chronic kidney disease (CKD) on mortality:

  • CKD patients are older on average
  • Older people have higher mortality
  • Age affects both CKD status and mortality
  • Without adjustment, we mix the effect of CKD with the effect of age

Result: Confounding by age distorts the true causal effect of CKD on mortality

The Counterfactual Model

Fundamental question: Would the outcome in exposed individuals differ if they had not been exposed?

  • The exposed group’s experience had it not been exposed is the “counterfactual” (unobservable)
  • In practice, we substitute a separate unexposed group for comparison
  • Confounding exists when this comparison group is not truly comparable to the exposed

Traditional Definition of a Confounder

A confounder must satisfy three criteria:

  1. Associated with the outcome (risk factor for outcome)
  2. Associated with the exposure
  3. NOT in the causal pathway between exposure and outcome (not an intermediate variable)

Visual Representation

The confounder is associated with both exposure and outcome, creating a backdoor path

Example 1: Age, CKD, and Mortality

Research Question: Does CKD increase mortality risk?

Age qualifies as a confounder because:

  • Age → CKD (older people more likely to have CKD)
  • Age → Mortality (older people have higher mortality)
  • Age is not caused by CKD

Example 2: Gender, Outdoor Work, and Malaria

Gender    Cases   Controls   Odds Ratio
Males     88      68         1.71
Females   62      82         (reference)

But is this the true effect of gender on malaria?

Stratifying by Work Environment

Outdoor workers:

  • Males: 53 cases, 15 controls (OR = 1.06)

Indoor workers:

  • Males: 35 cases, 53 controls (OR = 1.00)

Conclusion: Gender effect disappears when we account for outdoor occupation. Gender was confounded by work environment!
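
The crude odds ratio above can be verified directly from the 2×2 table; a minimal sketch in Python:

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)

# Males vs. females (crude), from the malaria table above
crude_or = odds_ratio(88, 68, 62, 82)
print(round(crude_or, 2))  # 1.71
```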

Exceptions to the Definition

  1. Random association: Sometimes sampling variability creates imbalance in case-control studies

  2. Surrogate confounders: Variables may represent complex constructs

    • Education as proxy for socioeconomic status
    • Gender as marker for behaviors/exposures

Directed Acyclic Graphs (DAGs)

DAGs provide a structured visual framework for:

  • Mapping causal assumptions explicitly
  • Identifying confounding pathways
  • Determining adjustment strategies

Key features:

  • Directed: arrows show causal direction
  • Acyclic: no closed loops (the future cannot cause the past)

DAG Terminology

  • Node/Variable: Each box or circle in the DAG
  • Arrow/Edge: Represents direct causal effect
  • Path: Sequence of arrows connecting variables
  • Directed path: All arrows point same direction
  • Backdoor path: Path from exposure to outcome starting with arrow pointing TO exposure
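
To make the backdoor-path definition concrete, here is a minimal sketch that lists candidate backdoor paths in a small DAG, using the age/CKD/mortality example from earlier. Note this toy enumerator ignores whether a path is open or blocked (d-separation), which tools like DAGitty handle properly.

```python
def all_paths(edges, start, end):
    """All simple paths from start to end, treating directed edges as undirected."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    paths, stack = [], [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            paths.append(path)
            continue
        for n in nbrs.get(node, ()):
            if n not in path:
                stack.append((n, path + [n]))
    return paths

def backdoor_paths(edges, exposure, outcome):
    """Paths whose first edge points INTO the exposure."""
    return [p for p in all_paths(edges, exposure, outcome)
            if (p[1], exposure) in edges]

# Age -> CKD, Age -> Mortality, CKD -> Mortality
dag = {("Age", "CKD"), ("Age", "Mortality"), ("CKD", "Mortality")}
print(backdoor_paths(dag, "CKD", "Mortality"))
# [['CKD', 'Age', 'Mortality']]: the confounding path through Age
```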

Example DAG: Aspirin and CHD

Consider aspirin reducing coronary heart disease (CHD) through decreased platelet aggregation:

Mediation model:

  • Aspirin → Platelet aggregation
  • Platelet aggregation → CHD

With genetic confounder:

  • Genetic variant → Platelet aggregation
  • Platelet aggregation → CHD
  • Creates a collider at platelet aggregation (two arrows point into it)!

Confounding vs. Mediation

Confounding:

  • Third variable explains spurious association
  • Not necessarily causal
  • Remove to get unbiased estimate

Mediation:

  • Third variable in causal pathway
  • X → M → Y
  • Part of the causal mechanism

Example: Obesity, Ethnicity, and Kidney Function

Research question: Does obesity cause decline in kidney function?

Known facts:

  • African Americans have higher obesity rates
  • African Americans progress faster to kidney disease

DAG for Obesity and Kidney Function

Ethnicity is a confounder:

  • Ethnicity → Obesity (associated with exposure)
  • Ethnicity → Kidney decline (risk factor for outcome)
  • Ethnicity NOT caused by obesity

Action: Adjust for ethnicity to remove confounding

Residual Confounding

Even with measurement and adjustment, confounding may remain:

  1. Unmeasured confounders: Variables not collected
  2. Measurement error: Imperfect measurement of confounders
  3. Self-reported data: Race/ethnicity may not capture true biological background

Result: Adjustments only partially successful

Example 3: Lead, PKD, and GFR

Question: Does lead poisoning cause polycystic kidney disease (PKD)?

DAG shows:

  • Lead poisoning → GFR (affects kidney function)
  • PKD → GFR (affects kidney function)
  • GFR is a common effect (collider)

Critical insight: GFR is NOT a confounder; it’s a collider!

Colliders in DAGs

Collider: A variable caused by two other variables (arrows point TO it)

Conditioning on a collider:

  • Opens a path between its causes
  • Can introduce bias (collider-stratification bias)
  • Should generally NOT be adjusted for

Lead, PKD, GFR: The Problem

If we restrict study to patients with low GFR:

  • Creates inverse association between lead and PKD
  • Not a true causal relationship
  • Result of selection bias (conditioning on collider)

Lesson: Understanding causal structure prevents analytical errors
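
A small simulation can make this concrete. The prevalences and noise rate below are hypothetical, chosen only to illustrate the mechanism: lead and PKD are generated independently, yet restricting the sample to low GFR induces an inverse association between them.

```python
import random

random.seed(42)
n = 50_000
pop = []
for _ in range(n):
    lead = random.random() < 0.2   # independent exposures (hypothetical rates)
    pkd = random.random() < 0.2
    # GFR is low if either cause is present, plus some background noise
    low_gfr = lead or pkd or (random.random() < 0.1)
    pop.append((lead, pkd, low_gfr))

def odds_ratio(rows):
    a = sum(1 for l, p, _ in rows if l and p)
    b = sum(1 for l, p, _ in rows if l and not p)
    c = sum(1 for l, p, _ in rows if not l and p)
    d = sum(1 for l, p, _ in rows if not l and not p)
    return (a * d) / (b * c)

print(round(odds_ratio(pop), 2))       # ~1.0: no association in the full population
selected = [r for r in pop if r[2]]    # condition on low GFR (the collider)
print(round(odds_ratio(selected), 2))  # well below 1.0: induced inverse association
```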

Building DAGs: The ESC-DAG Approach

Evidence Synthesis for Constructing DAGs involves:

  1. Mapping: Identify variables from literature
  2. Translation: Convert to DAG format
  3. Integration 1 - Synthesis: Combine evidence
  4. Integration 2 - Recombination: Finalize causal structure

Using DAGitty

DAGitty is a browser-based tool for:

  • Creating causal diagrams
  • Analyzing DAGs
  • Identifying minimal adjustment sets
  • Available at: dagitty.net

Also available: R package ‘dagitty’ on CRAN

Assessing Confounding Presence

Three key questions:

  1. Is the confounder related to both exposure and outcome?
  2. Does the crude association differ from stratified associations?
  3. Does the crude association differ from the adjusted association?

Comparing Crude vs. Adjusted Estimates

The presence of confounding is assessed by comparing:

  • τ: crude (unadjusted) effect
  • τ’: adjusted effect (controlling for confounder)

Confounding present when: τ ≠ τ’

Measure: Can use percent excess risk explained

Percent Excess Risk Explained

\[\text{% Excess Risk Explained} = \frac{RR_U - RR_A}{RR_U - 1.0} \times 100\]

Where:

  • RR_U = unadjusted relative risk
  • RR_A = adjusted relative risk

This tells us what proportion of the crude association is attributable to confounding

COPD Treatment Example

Study of long-term oxygen therapy in COPD patients:

Treatment         Crude HR   Adjusted HR   % Explained
Oxygen therapy    2.36       1.38          72%

Interpretation: Disease severity markers explained 72% of the paradoxical adverse effect
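
Applying the percent-excess-risk formula from the previous slide reproduces the 72% figure:

```python
def pct_excess_risk_explained(rr_unadjusted, rr_adjusted):
    """% excess risk explained = (RR_U - RR_A) / (RR_U - 1.0) * 100."""
    return (rr_unadjusted - rr_adjusted) / (rr_unadjusted - 1.0) * 100

# COPD oxygen-therapy example: crude HR 2.36, adjusted HR 1.38
print(round(pct_excess_risk_explained(2.36, 1.38)))  # 72
```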

Mediation, Confounding, and Suppression

All three involve a third variable affecting X-Y relationship:

Effect                   Sign of τ − τ’   Sign of τ’   Interpretation
Mediation/Confounding    +                +            Consistent direction
Suppression              +                −            Opposite signs

Suppression Effect

Suppression occurs when including a third variable increases the association magnitude

Example: Intelligence and assembly line errors

  • Direct effect: more intelligence → fewer errors (negative)
  • Indirect effect: intelligence → boredom → more errors (positive)
  • The two effects may cancel out!

Types of Confounding

  1. Negative confounding: Underestimates true effect
  2. Positive confounding: Overestimates true effect
  3. Qualitative confounding: Reverses direction of association

Qualitative Confounding Example

US vs. Venezuela mortality rates:

  • Crude rate ratio: 8.7/4.4 = 1.98
  • Age-adjusted ratio: 3.6/4.6 = 0.78

Direction reversed! This is due to striking differences in age distribution between the two countries

Controlling Confounding: Restriction

Restriction: Limit study to specific values of confounder

  • Example: Study only males to eliminate gender confounding
  • Advantages: Simple, complete control
  • Disadvantages: Limits generalizability, reduces sample size

Controlling Confounding: Stratification

Stratification: Analyze exposure-outcome relationship within strata of confounder

When confounding is present:

  • Stratified estimates are similar to each other
  • Both differ from the crude estimate

Mantel-Haenszel Method

Combines stratum-specific estimates into single adjusted measure:

\[RR_{MH} = \frac{\sum_{i} a_i N_{0i}/N_i}{\sum_{i} c_i N_{1i}/N_i}\]

Provides: Single summary estimate adjusted for confounding
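
A sketch of the Mantel-Haenszel estimator exactly as defined above, where a_i = exposed cases, N1_i = exposed total, c_i = unexposed cases, and N0_i = unexposed total in stratum i. The two strata shown are hypothetical, not taken from the lecture's data.

```python
def rr_mh(strata):
    """Mantel-Haenszel RR: sum(a_i * N0_i / N_i) / sum(c_i * N1_i / N_i)."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

# Each stratum: (a, N1, c, N0) -- hypothetical counts for illustration
strata = [
    (30, 100, 10, 100),
    (40, 200, 15, 150),
]
print(round(rr_mh(strata), 2))  # 2.37: one summary RR adjusted for the stratifier
```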

Multivariable Regression

Multiple regression adjusts for confounders simultaneously:

\[Y = \beta_0 + \beta_1 X + \beta_2 C_1 + \beta_3 C_2 + ... + \epsilon\]

Advantages:

  • Handles continuous confounders
  • Adjusts for multiple confounders
  • Provides adjusted effect estimates
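
A hedged illustration of regression adjustment: in simulated data where a confounder C drives both exposure X and outcome Y, the crude slope for X is inflated, while the multivariable fit recovers the true coefficient. All data-generating values are hypothetical, and the solver uses plain normal equations to stay self-contained.

```python
import random

random.seed(1)
n = 5000
rows = []
for _ in range(n):
    c = random.gauss(0, 1)                 # confounder
    x = 0.8 * c + random.gauss(0, 1)       # exposure influenced by C
    y = 1.0 + 2.0 * x + 3.0 * c + random.gauss(0, 1)  # true X effect = 2.0
    rows.append((x, c, y))

def ols(rows, use_confounder):
    """OLS via normal equations (X'X) beta = X'y, Gaussian elimination."""
    preds = [([1.0, x] if not use_confounder else [1.0, x, c])
             for x, c, _ in rows]
    k = len(preds[0])
    xtx = [[sum(p[i] * p[j] for p in preds) for j in range(k)] for i in range(k)]
    xty = [sum(p[i] * y for p, (_, _, y) in zip(preds, rows)) for i in range(k)]
    for i in range(k):                      # forward elimination
        for j in range(i + 1, k):
            f = xtx[j][i] / xtx[i][i]
            xtx[j] = [xtx[j][m] - f * xtx[i][m] for m in range(k)]
            xty[j] -= f * xty[i]
    beta = [0.0] * k                        # back substitution
    for i in reversed(range(k)):
        beta[i] = (xty[i] - sum(xtx[i][j] * beta[j]
                                for j in range(i + 1, k))) / xtx[i][i]
    return beta

crude = ols(rows, use_confounder=False)[1]   # mixes the X and C effects
adjusted = ols(rows, use_confounder=True)[1] # ~2.0, the true effect
print(round(crude, 1), round(adjusted, 1))
```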

Alternative Methods: Instrumental Variables

Instrumental variable (IV): A variable that:

  • Affects exposure
  • Does NOT directly affect outcome
  • Not associated with confounders

Use: when confounding cannot be measured or controlled

Example: Mendelian randomization using genetic variants

Alternative Methods: Propensity Scores

Propensity score: Probability of receiving exposure given covariates

  1. Model probability of exposure
  2. Use scores for matching, stratification, or weighting
  3. Balance confounders between exposed/unexposed
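
The three steps above can be sketched with inverse-probability weighting, using a propensity score estimated by stratifying on a single binary covariate. All data-generating values below are hypothetical, chosen so that the crude contrast is confounded but the weighted contrast recovers the true effect.

```python
import random

random.seed(7)
n = 20_000
data = []
for _ in range(n):
    c = random.random() < 0.5                    # binary covariate
    x = random.random() < (0.7 if c else 0.3)    # treatment depends on C
    y = 1.0 * x + 2.0 * c + random.gauss(0, 1)   # true treatment effect = 1.0
    data.append((x, c, y))

# Step 1: model P(X = 1 | C), here by simple stratification on C
def propensity(c):
    grp = [x for x, ci, _ in data if ci == c]
    return sum(grp) / len(grp)

e = {True: propensity(True), False: propensity(False)}

# Steps 2-3: weight each subject by 1/e(C) or 1/(1 - e(C)) to balance C
def ipw_mean(treated):
    num = den = 0.0
    for x, c, y in data:
        if x == treated:
            w = 1.0 / (e[c] if treated else 1.0 - e[c])
            num += w * y
            den += w
    return num / den

print(round(ipw_mean(True) - ipw_mean(False), 1))  # ~1.0, the true effect
```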

Incomplete Adjustment

Residual confounding remains when adjustment is incomplete due to:

  • Unmeasured confounders
  • Measurement error in confounders
  • Misspecification of confounder-outcome relationship
  • Categorizing continuous confounders

Overadjustment

Overadjustment bias occurs when adjusting for:

  • Variables in causal pathway (mediators)
  • Colliders
  • Descendants of outcome

Result: Can introduce bias or mask true effects

Statistical Significance ≠ Confounding

Warning

Important: Do NOT rely solely on p-values to identify confounding

  • Confounding is about change in the effect estimate, not statistical significance
  • Consider using lenient p-value cutoffs (e.g., p < 0.20) if testing
  • Better: Assess based on a priori knowledge and effect magnitude

Primary vs. Secondary Prevention

Goal                    Evidence Needed
Primary prevention      Causal association required; confounding must be removed
Secondary prevention    Association sufficient (may be confounded); prediction focus

DAGs: Advantages

  1. Make causal assumptions explicit
  2. Identify all confounding pathways
  3. Determine minimal sufficient adjustment sets
  4. Prevent collider-stratification bias
  5. Facilitate communication among researchers

DAGs: Limitations

  1. Require strong prior knowledge
  2. Cannot be validated by data alone
  3. Assumptions may be wrong
  4. Different researchers may draw different DAGs

Solution: Transparency and sensitivity analyses

Practical Steps for Analysis

  1. Draw your DAG based on prior knowledge
  2. Identify all backdoor paths
  3. Determine minimal adjustment set
  4. Check for colliders (do NOT adjust!)
  5. Perform stratified and adjusted analyses
  6. Calculate percent change in estimate
  7. Report both crude and adjusted estimates

Common Mistakes to Avoid

  • Adjusting for variables on causal pathway
  • Conditioning on colliders
  • Over-relying on statistical significance
  • Ignoring unmeasured confounding
  • Not considering residual confounding
  • Assuming all third variables are confounders

Key Takeaways

  1. Confounding obscures true causal effects
  2. DAGs help identify and control confounding
  3. Colliders should NOT be adjusted for
  4. Mediation ≠ Confounding (conceptually different)
  5. Multiple methods available for control
  6. Residual confounding often remains
  7. Transparency about assumptions is critical

Recommendations for Research

  • Draw DAGs before data analysis
  • Consider multiple possible causal structures
  • Report crude AND adjusted estimates
  • Acknowledge potential unmeasured confounding
  • Use sensitivity analyses
  • Consider alternative explanations

Resources

Main readings:

  • Szklo & Nieto, Epidemiology: Beyond the Basics (Chapter 5)
  • Ferguson et al. (2020): the ESC-DAGs method

Online tools:

  • DAGitty: dagitty.net
  • R packages: dagitty, ggdag

Additional:

  • Hernán & Robins, Causal Inference: What If (free online)

Summary

Confounding is a fundamental challenge in observational epidemiology that requires:

  • Clear thinking about causal relationships
  • Explicit assumptions via DAGs
  • Appropriate methods for control
  • Transparency about limitations

Remember: The goal is valid causal inference for effective prevention!

Questions?

Contact information and additional resources available through course materials

Thank you for your attention!