Non-Causal Association in Epidemiological Studies

Confounding and DAG, GDT 508: Public Health Epidemiology

Dr. Kamarul Imran Musa

Professor in Epidemiology and Statistics, Universiti Sains Malaysia

Learning Objectives

  1. Understand ‘Confounding’ in epidemiological studies
  2. Identify ‘Confounding’ using Directed Acyclic Graphs (DAG)
  3. Control ‘Confounding’ in epidemiological studies
  4. Distinguish confounding from mediation and suppression

Introduction to Confounding

Confounding refers to a situation where a noncausal association between an exposure and outcome is observed due to the influence of a third variable (confounder).

Key Point: If our goal is primary prevention, distinguishing causal from noncausal associations is crucial.

  • More common in observational studies
  • Less common in experimental studies (randomization helps!)

Why Does Confounding Matter?

Consider studying the effect of chronic kidney disease (CKD) on mortality:

  • CKD patients are older on average
  • Older people have higher mortality
  • Age affects both CKD status and mortality
  • Without adjustment, we mix the effect of CKD with the effect of age

Result: Confounding by age distorts the true causal effect of CKD on mortality

The Counterfactual Model

Fundamental question: Would the outcome in exposed individuals differ if they had not been exposed?

  • The exposed group’s experience had it not been exposed is the “counterfactual” (unobservable)
  • In practice, we substitute a separate unexposed group for comparison
  • Confounding exists when this comparison group is not truly comparable to the exposed

Traditional Definition of a Confounder

A confounder must satisfy three criteria:

  1. Associated with the outcome (risk factor for outcome)
  2. Associated with the exposure
  3. NOT in the causal pathway between exposure and outcome (not an intermediate variable)

Visual Representation

The confounder is associated with both exposure and outcome, creating a backdoor path

Example 1: Age, CKD, and Mortality

Research Question: Does CKD increase mortality risk?

Age qualifies as a confounder because:

  • Age → CKD (older people more likely to have CKD)
  • Age → Mortality (older people have higher mortality)
  • Age is not caused by CKD

Example 2: Gender, Outdoor Work, and Malaria

Gender    Cases   Controls   Odds Ratio
Males     88      68         1.71
Females   62      82         (reference)

But is this the true effect of gender on malaria?

Stratifying by Work Environment

Outdoor workers:

  • Males: 53 cases, 15 controls (OR = 1.06)

Indoor workers:

  • Males: 35 cases, 53 controls (OR = 1.00)

Conclusion: Gender effect disappears when we account for outdoor occupation. Gender was confounded by work environment!
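
The crude odds ratio above can be verified directly from the 2×2 table; a minimal sketch in Python:

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)

# Males vs. females (crude), from the malaria table above
crude_or = odds_ratio(88, 68, 62, 82)
print(round(crude_or, 2))  # 1.71
```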

Exceptions to the Definition

  1. Random association: Sometimes sampling variability creates imbalance in case-control studies

  2. Surrogate confounders: Variables may represent complex constructs

    • Education as proxy for socioeconomic status
    • Gender as marker for behaviors/exposures

Directed Acyclic Graphs (DAGs)

DAGs provide a structured visual framework for:

  • Mapping causal assumptions explicitly
  • Identifying confounding pathways
  • Determining adjustment strategies

Key features:

  • Directed: arrows show causal direction
  • Acyclic: no closed loops (the future cannot cause the past)

DAG Terminology

  • Node/Variable: Each box or circle in the DAG
  • Arrow/Edge: Represents direct causal effect
  • Path: Sequence of arrows connecting variables
  • Directed path: All arrows point same direction
  • Backdoor path: Path from exposure to outcome starting with arrow pointing TO exposure
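
To make the backdoor-path definition concrete, here is a minimal sketch that lists candidate backdoor paths in a small DAG, using the age/CKD/mortality example from earlier. Note this toy enumerator ignores whether a path is open or blocked (d-separation), which tools like DAGitty handle properly.

```python
def all_paths(edges, start, end):
    """All simple paths from start to end, treating directed edges as undirected."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    paths, stack = [], [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            paths.append(path)
            continue
        for n in nbrs.get(node, ()):
            if n not in path:
                stack.append((n, path + [n]))
    return paths

def backdoor_paths(edges, exposure, outcome):
    """Paths whose first edge points INTO the exposure."""
    return [p for p in all_paths(edges, exposure, outcome)
            if (p[1], exposure) in edges]

# Age -> CKD, Age -> Mortality, CKD -> Mortality
dag = {("Age", "CKD"), ("Age", "Mortality"), ("CKD", "Mortality")}
print(backdoor_paths(dag, "CKD", "Mortality"))
# [['CKD', 'Age', 'Mortality']]: the confounding path through Age
```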

Example DAG: Aspirin and CHD

Consider aspirin reducing coronary heart disease (CHD) through decreased platelet aggregation:

Mediation model:

  • Aspirin → Platelet aggregation
  • Platelet aggregation → CHD

With genetic confounder:

  • Genetic variant → Platelet aggregation
  • Platelet aggregation → CHD
  • Creates a collider at platelet aggregation (two arrows point into it)!

Confounding vs. Mediation

Confounding:

  • Third variable explains spurious association
  • Not necessarily causal
  • Remove to get unbiased estimate

Mediation:

  • Third variable in causal pathway
  • X → M → Y
  • Part of the causal mechanism

Example: Obesity, Ethnicity, and Kidney Function

Research question: Does obesity cause decline in kidney function?

Known facts:

  • African Americans have higher obesity rates
  • African Americans progress faster to kidney disease

DAG for Obesity and Kidney Function

Ethnicity is a confounder:

  • Ethnicity → Obesity (associated with exposure)
  • Ethnicity → Kidney decline (risk factor for outcome)
  • Ethnicity NOT caused by obesity

Action: Adjust for ethnicity to remove confounding

Residual Confounding

Even with measurement and adjustment, confounding may remain:

  1. Unmeasured confounders: Variables not collected
  2. Measurement error: Imperfect measurement of confounders
  3. Self-reported data: Race/ethnicity may not capture true biological background

Result: Adjustments only partially successful

Example 3: Lead, PKD, and GFR

Question: Does lead poisoning cause polycystic kidney disease (PKD)?

DAG shows:

  • Lead poisoning → GFR (affects kidney function)
  • PKD → GFR (affects kidney function)
  • GFR is a common effect (collider)

Critical insight: GFR is NOT a confounder; it’s a collider!

Colliders in DAGs

Collider: A variable caused by two other variables (arrows point TO it)

Conditioning on a collider:

  • Opens a path between its causes
  • Can introduce bias (collider-stratification bias)
  • Should generally NOT be adjusted for

Lead, PKD, GFR: The Problem

If we restrict study to patients with low GFR:

  • Creates inverse association between lead and PKD
  • Not a true causal relationship
  • Result of selection bias (conditioning on collider)

Lesson: Understanding causal structure prevents analytical errors
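
A small simulation can make this concrete. The prevalences and noise rate below are hypothetical, chosen only to illustrate the mechanism: lead and PKD are generated independently, yet restricting the sample to low GFR induces an inverse association between them.

```python
import random

random.seed(42)
n = 50_000
pop = []
for _ in range(n):
    lead = random.random() < 0.2   # independent exposures (hypothetical rates)
    pkd = random.random() < 0.2
    # GFR is low if either cause is present, plus some background noise
    low_gfr = lead or pkd or (random.random() < 0.1)
    pop.append((lead, pkd, low_gfr))

def odds_ratio(rows):
    a = sum(1 for l, p, _ in rows if l and p)
    b = sum(1 for l, p, _ in rows if l and not p)
    c = sum(1 for l, p, _ in rows if not l and p)
    d = sum(1 for l, p, _ in rows if not l and not p)
    return (a * d) / (b * c)

print(round(odds_ratio(pop), 2))       # ~1.0: no association in the full population
selected = [r for r in pop if r[2]]    # condition on low GFR (the collider)
print(round(odds_ratio(selected), 2))  # well below 1.0: induced inverse association
```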

Building DAGs: The ESC-DAG Approach

Evidence Synthesis for Constructing DAGs involves:

  1. Mapping: Identify variables from literature
  2. Translation: Convert to DAG format
  3. Integration 1 - Synthesis: Combine evidence
  4. Integration 2 - Recombination: Finalize causal structure

Using DAGitty

DAGitty is a browser-based tool for:

  • Creating causal diagrams
  • Analyzing DAGs
  • Identifying minimal adjustment sets
  • Available at: dagitty.net

Also available: R package ‘dagitty’ on CRAN

Assessing Confounding Presence

Three key questions:

  1. Is the confounder related to both exposure and outcome?
  2. Does the crude association differ from stratified associations?
  3. Does the crude association differ from the adjusted association?

Comparing Crude vs. Adjusted Estimates

The presence of confounding is assessed by comparing:

  • τ: crude (unadjusted) effect
  • τ’: adjusted effect (controlling for confounder)

Confounding present when: τ ≠ τ’

Measure: Can use percent excess risk explained

Percent Excess Risk Explained

\[\text{% Excess Risk Explained} = \frac{RR_U - RR_A}{RR_U - 1.0} \times 100\]

Where:

  • RR_U = unadjusted relative risk
  • RR_A = adjusted relative risk

This tells us what proportion of the crude association is attributable to confounding

COPD Treatment Example

Study of long-term oxygen therapy in COPD patients:

Treatment         Crude HR   Adjusted HR   % Explained
Oxygen therapy    2.36       1.38          72%

Interpretation: Disease severity markers explained 72% of the paradoxical adverse effect
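
Applying the percent-excess-risk formula from the previous slide reproduces the 72% figure:

```python
def pct_excess_risk_explained(rr_unadjusted, rr_adjusted):
    """% excess risk explained = (RR_U - RR_A) / (RR_U - 1.0) * 100."""
    return (rr_unadjusted - rr_adjusted) / (rr_unadjusted - 1.0) * 100

# COPD oxygen-therapy example: crude HR 2.36, adjusted HR 1.38
print(round(pct_excess_risk_explained(2.36, 1.38)))  # 72
```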

Mediation, Confounding, and Suppression

All three involve a third variable affecting X-Y relationship:

Effect                   Sign of τ − τ’   Sign of τ’   Interpretation
Mediation/Confounding    +                +            Consistent direction
Suppression              +                −            Opposite signs

Suppression Effect

Suppression occurs when including a third variable increases the association magnitude

Example: Intelligence and assembly line errors

  • Direct effect: more intelligence → fewer errors (negative)
  • Indirect effect: intelligence → boredom → more errors (positive)
  • The two effects may cancel out!

Types of Confounding

  1. Negative confounding: Underestimates true effect
  2. Positive confounding: Overestimates true effect
  3. Qualitative confounding: Reverses direction of association

Qualitative Confounding Example

US vs. Venezuela mortality rates:

  • Crude rate ratio: 8.7/4.4 = 1.98
  • Age-adjusted ratio: 3.6/4.6 = 0.78

Direction reversed! This is due to striking differences in age distribution between the two countries

Controlling Confounding: Restriction

Restriction: Limit study to specific values of confounder

  • Example: Study only males to eliminate gender confounding
  • Advantages: Simple, complete control
  • Disadvantages: Limits generalizability, reduces sample size

Controlling Confounding: Stratification

Stratification: Analyze exposure-outcome relationship within strata of confounder

When confounding is present:

  • Stratified estimates are similar to each other
  • Both differ from the crude estimate

Mantel-Haenszel Method

Combines stratum-specific estimates into single adjusted measure:

\[RR_{MH} = \frac{\sum_{i} a_i N_{0i}/N_i}{\sum_{i} c_i N_{1i}/N_i}\]

Provides: Single summary estimate adjusted for confounding
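
A sketch of the Mantel-Haenszel estimator exactly as defined above, where a_i = exposed cases, N1_i = exposed total, c_i = unexposed cases, and N0_i = unexposed total in stratum i. The two strata shown are hypothetical, not taken from the lecture's data.

```python
def rr_mh(strata):
    """Mantel-Haenszel RR: sum(a_i * N0_i / N_i) / sum(c_i * N1_i / N_i)."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

# Each stratum: (a, N1, c, N0) -- hypothetical counts for illustration
strata = [
    (30, 100, 10, 100),
    (40, 200, 15, 150),
]
print(round(rr_mh(strata), 2))  # 2.37: one summary RR adjusted for the stratifier
```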

Multivariable Regression

Multiple regression adjusts for confounders simultaneously:

\[Y = \beta_0 + \beta_1 X + \beta_2 C_1 + \beta_3 C_2 + ... + \epsilon\]

Advantages:

  • Handles continuous confounders
  • Adjusts for multiple confounders
  • Provides adjusted effect estimates
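
A hedged illustration of regression adjustment: in simulated data where a confounder C drives both exposure X and outcome Y, the crude slope for X is inflated, while the multivariable fit recovers the true coefficient. All data-generating values are hypothetical, and the solver uses plain normal equations to stay self-contained.

```python
import random

random.seed(1)
n = 5000
rows = []
for _ in range(n):
    c = random.gauss(0, 1)                 # confounder
    x = 0.8 * c + random.gauss(0, 1)       # exposure influenced by C
    y = 1.0 + 2.0 * x + 3.0 * c + random.gauss(0, 1)  # true X effect = 2.0
    rows.append((x, c, y))

def ols(rows, use_confounder):
    """OLS via normal equations (X'X) beta = X'y, Gaussian elimination."""
    preds = [([1.0, x] if not use_confounder else [1.0, x, c])
             for x, c, _ in rows]
    k = len(preds[0])
    xtx = [[sum(p[i] * p[j] for p in preds) for j in range(k)] for i in range(k)]
    xty = [sum(p[i] * y for p, (_, _, y) in zip(preds, rows)) for i in range(k)]
    for i in range(k):                      # forward elimination
        for j in range(i + 1, k):
            f = xtx[j][i] / xtx[i][i]
            xtx[j] = [xtx[j][m] - f * xtx[i][m] for m in range(k)]
            xty[j] -= f * xty[i]
    beta = [0.0] * k                        # back substitution
    for i in reversed(range(k)):
        beta[i] = (xty[i] - sum(xtx[i][j] * beta[j]
                                for j in range(i + 1, k))) / xtx[i][i]
    return beta

crude = ols(rows, use_confounder=False)[1]   # mixes the X and C effects
adjusted = ols(rows, use_confounder=True)[1] # ~2.0, the true effect
print(round(crude, 1), round(adjusted, 1))
```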

Alternative Methods: Instrumental Variables

Instrumental variable (IV): A variable that:

  • Affects exposure
  • Does NOT directly affect outcome
  • Not associated with confounders

Use: when confounding cannot be measured or controlled

Example: Mendelian randomization using genetic variants

Alternative Methods: Propensity Scores

Propensity score: Probability of receiving exposure given covariates

  1. Model probability of exposure
  2. Use scores for matching, stratification, or weighting
  3. Balance confounders between exposed/unexposed
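
The three steps above can be sketched with inverse-probability weighting, using a propensity score estimated by stratifying on a single binary covariate. All data-generating values below are hypothetical, chosen so that the crude contrast is confounded but the weighted contrast recovers the true effect.

```python
import random

random.seed(7)
n = 20_000
data = []
for _ in range(n):
    c = random.random() < 0.5                    # binary covariate
    x = random.random() < (0.7 if c else 0.3)    # treatment depends on C
    y = 1.0 * x + 2.0 * c + random.gauss(0, 1)   # true treatment effect = 1.0
    data.append((x, c, y))

# Step 1: model P(X = 1 | C), here by simple stratification on C
def propensity(c):
    grp = [x for x, ci, _ in data if ci == c]
    return sum(grp) / len(grp)

e = {True: propensity(True), False: propensity(False)}

# Steps 2-3: weight each subject by 1/e(C) or 1/(1 - e(C)) to balance C
def ipw_mean(treated):
    num = den = 0.0
    for x, c, y in data:
        if x == treated:
            w = 1.0 / (e[c] if treated else 1.0 - e[c])
            num += w * y
            den += w
    return num / den

print(round(ipw_mean(True) - ipw_mean(False), 1))  # ~1.0, the true effect
```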

Incomplete Adjustment

Residual confounding remains when adjustment is incomplete due to:

  • Unmeasured confounders
  • Measurement error in confounders
  • Misspecification of confounder-outcome relationship
  • Categorizing continuous confounders

Overadjustment

Overadjustment bias occurs when adjusting for:

  • Variables in causal pathway (mediators)
  • Colliders
  • Descendants of outcome

Result: Can introduce bias or mask true effects

Statistical Significance ≠ Confounding

Warning

Important: Do NOT rely solely on p-values to identify confounding

  • Confounding is about change in the effect estimate, not statistical significance
  • Consider using lenient p-value cutoffs (e.g., p < 0.20) if testing
  • Better: Assess based on a priori knowledge and effect magnitude

Primary vs. Secondary Prevention

Goal                    Evidence Needed
Primary prevention      Causal association required; confounding must be removed
Secondary prevention    Association sufficient (may be confounded); prediction focus

DAGs: Advantages

  1. Make causal assumptions explicit
  2. Identify all confounding pathways
  3. Determine minimal sufficient adjustment sets
  4. Prevent collider-stratification bias
  5. Facilitate communication among researchers

DAGs: Limitations

  1. Require strong prior knowledge
  2. Cannot be validated by data alone
  3. Assumptions may be wrong
  4. Different researchers may draw different DAGs

Solution: Transparency and sensitivity analyses

Practical Steps for Analysis

  1. Draw your DAG based on prior knowledge
  2. Identify all backdoor paths
  3. Determine minimal adjustment set
  4. Check for colliders (do NOT adjust!)
  5. Perform stratified and adjusted analyses
  6. Calculate percent change in estimate
  7. Report both crude and adjusted estimates

Common Mistakes to Avoid

  • Adjusting for variables on causal pathway
  • Conditioning on colliders
  • Over-relying on statistical significance
  • Ignoring unmeasured confounding
  • Not considering residual confounding
  • Assuming all third variables are confounders

Key Takeaways

  1. Confounding obscures true causal effects
  2. DAGs help identify and control confounding
  3. Colliders should NOT be adjusted for
  4. Mediation ≠ Confounding (conceptually different)
  5. Multiple methods available for control
  6. Residual confounding often remains
  7. Transparency about assumptions is critical

Recommendations for Research

  • Draw DAGs before data analysis
  • Consider multiple possible causal structures
  • Report crude AND adjusted estimates
  • Acknowledge potential unmeasured confounding
  • Use sensitivity analyses
  • Consider alternative explanations

Resources

Main readings:

  • Szklo & Nieto, Epidemiology: Beyond the Basics (Chapter 5)
  • Ferguson et al. (2020): the ESC-DAGs method

Online tools:

  • DAGitty: dagitty.net
  • R packages: dagitty, ggdag

Additional:

  • Hernán & Robins, Causal Inference: What If (free online)

Summary

Confounding is a fundamental challenge in observational epidemiology that requires:

  • Clear thinking about causal relationships
  • Explicit assumptions via DAGs
  • Appropriate methods for control
  • Transparency about limitations

Remember: The goal is valid causal inference for effective prevention!

Questions?

Contact information and additional resources available through course materials

Thank you for your attention!