Understanding Bias in Epidemiological Studies

GDT 508: Public Health Epidemiology

Dr. Kamarul Imran Musa

Professor in Epidemiology and Statistics, Universiti Sains Malaysia

Learning Objectives

By the end of this session, you should be able to:

  1. Understand the concept of lack of validity or bias in epidemiological studies
  2. Identify different types of bias in epidemiological research
  3. Prevent and control bias in study design and analysis

What is Bias?

Definition:

Bias is the result of a systematic error in the design or conduct of a study

  • Results from flaws in:
    • Selection of study participants
    • Procedures for gathering exposure/disease information
  • Observed results tend to differ from true results

Key Concept:

When there is lack of validity, there is bias

Important

Bias relates to the process (design and procedures), not the results of any particular study

Bias vs Random Error

Bias (Systematic Error)

  • Consistent deviation from truth
  • Affects internal validity
  • Cannot be reduced by increasing sample size
  • Must be prevented by design

Random Error

  • Fluctuation around truth
  • Affects precision
  • Reduced by larger samples
  • Addressed by statistics
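
Illustration (simulated): The contrast above can be sketched with a small simulation. The numbers are hypothetical and not from the lecture: a true exposure prevalence of 30%, measured with an instrument that detects only 70% of truly exposed people and never gives a false positive. The function name `estimate_prevalence` is also an assumption made for this sketch.

```python
import random

def estimate_prevalence(n, true_prev=0.30, sensitivity=0.70, seed=1):
    """Estimate prevalence with an instrument that misses some exposed people."""
    rng = random.Random(seed)
    positives = 0
    for _ in range(n):
        truly_exposed = rng.random() < true_prev
        # Systematic error: an exposed person is detected only with
        # probability `sensitivity` (hypothetical value).
        if truly_exposed and rng.random() < sensitivity:
            positives += 1
    return positives / n

small = estimate_prevalence(100)
large = estimate_prevalence(200_000)
print(f"n = 100:     {small:.3f}")
print(f"n = 200000:  {large:.3f}")
print("true value 0.300, expected value under this bias 0.210")
```

The larger sample gives a more stable estimate, but it settles near 0.21 rather than 0.30: a bigger sample reduces random error and leaves the systematic error untouched.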

Effect of Bias

Bias can move estimates in different directions:

  1. Toward the null (negative bias)
    • Estimates closer to null value (OR closer to 1)
    • Underestimates true association
  2. Away from the null (positive bias)
    • Estimates further from null value
    • Overestimates true association
  3. Switch-over bias (extreme case)
    • Changes direction of association
    • True OR > 1 becomes < 1, or vice versa
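
Illustration: The three directions can be expressed as a comparison on the log odds ratio scale, where the null value is 0. This is a minimal sketch; the function name `bias_direction` and the example values are assumptions for illustration only.

```python
import math

def bias_direction(true_or, observed_or, tol=1e-9):
    """Classify how an observed odds ratio departs from the true one."""
    true_log, obs_log = math.log(true_or), math.log(observed_or)
    if abs(true_log - obs_log) <= tol:
        return "no bias"
    if true_log * obs_log < 0:
        # The estimate lies on the other side of OR = 1.
        return "switch-over"
    if abs(obs_log) < abs(true_log):
        return "toward the null"
    return "away from the null"

print(bias_direction(4.0, 3.3))  # toward the null
print(bias_direction(4.0, 6.0))  # away from the null
print(bias_direction(2.0, 0.8))  # switch-over
```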

Classification of Bias

Three main categories:

  1. Selection Bias
    • Different probabilities of study inclusion based on exposure/outcome
  2. Information Bias
    • Systematic errors in measuring exposure/outcome
    • Leads to misclassification
  3. Confounding
    • Association between exposure and outcome due to third variable

Selection Bias: Definition

Occurs when:

  • Study population does not represent target population
  • Systematic error in recruitment or retention of subjects
  • Different inclusion probabilities based on exposure and outcome

Can be introduced at:

  1. Design stage
    • Inappropriate definition of eligible population
    • Lack of accuracy of sampling frame
    • Uneven diagnostic procedures
  2. Implementation stage
    • Losses to follow-up
    • Non-response bias
    • Missing information

Example: Selection Bias

Case-Control Study without Selection Bias

                   Cases   Controls
Exposed              500       1800
Unexposed            500       7200
Total               1000       9000
Exposure odds    1.0:1.0    1.0:4.0

Odds Ratio = 4.0 (True value)
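
Worked calculation: The odds ratio is the exposure odds in cases divided by the exposure odds in controls, which equals the cross-product of the table. A minimal sketch (the function name `odds_ratio` is chosen here for illustration):

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Cross-product odds ratio for a 2x2 case-control table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)

# Counts from the table above.
print(odds_ratio(500, 500, 1800, 7200))  # 4.0
```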

Example: Sampling Independent of Exposure

50% Sample of Cases, 10% Sample of Controls (same fraction for exposed and unexposed)

                   Cases   Controls
Exposed              250        180
Unexposed            250        720
Total                500        900
Exposure odds    1.0:1.0    1.0:4.0

Odds Ratio = 4.0 (Unbiased: sampling fractions differ between cases and controls, but not by exposure)

Example: Differential Selection

Cases Sampled Differentially by Exposure (60% of exposed, 40% of unexposed); 10% Sample of Controls

                   Cases   Controls
Exposed              300        180
Unexposed            200        720
Total                500        900
Exposure odds    1.5:1.0    1.0:4.0

Odds Ratio = 6.0 (Biased - differential sampling)

Consequence: Biased exposure odds in cases, unbiased in controls → Biased odds ratio
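
Worked calculation: The three tables in this example can be reproduced by applying a sampling fraction to each cell of the source population. The sketch below assumes that the differential scenario corresponds to sampling 60% of exposed cases and 40% of unexposed cases, which matches the counts 300 and 200; the function and variable names are illustrative.

```python
def odds_ratio(t):
    """Cross-product odds ratio for a 2x2 table stored as a dict."""
    return (t["exp_cases"] * t["unexp_controls"]) / (t["unexp_cases"] * t["exp_controls"])

def apply_sampling(source, fractions):
    """Expected cell counts when each cell is sampled with its own fraction."""
    return {cell: source[cell] * fractions[cell] for cell in source}

def selection_bias_factor(f):
    """Ratio of observed OR to true OR implied by the four sampling fractions."""
    return (f["exp_cases"] * f["unexp_controls"]) / (f["unexp_cases"] * f["exp_controls"])

source_pop = {"exp_cases": 500, "unexp_cases": 500,
              "exp_controls": 1800, "unexp_controls": 7200}

# Same fraction for exposed and unexposed within cases and within controls.
equal = {"exp_cases": 0.5, "unexp_cases": 0.5,
         "exp_controls": 0.1, "unexp_controls": 0.1}

# Exposed cases over-sampled relative to unexposed cases (assumed 60% vs 40%).
differential = {"exp_cases": 0.6, "unexp_cases": 0.4,
                "exp_controls": 0.1, "unexp_controls": 0.1}

for label, fractions in (("equal", equal), ("differential", differential)):
    observed = odds_ratio(apply_sampling(source_pop, fractions))
    factor = selection_bias_factor(fractions)
    print(f"{label:>12}: OR = {observed:.1f}, bias factor = {factor:.2f}")
```

The bias factor is 1.0 whenever the cross-product of the sampling fractions is 1, which is why very different sampling fractions for cases and controls do no harm as long as they do not depend on exposure.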

Types of Selection Bias

  1. Inappropriate definition of eligible population
    • Berkson’s bias (hospital-based studies)
    • Healthy worker effect
    • Neyman bias (incidence-prevalence bias)
    • Competing risks
  2. Lack of accuracy of sampling frame
    • Publication bias
    • Citation bias
  3. Uneven diagnostic procedures
    • Detection bias
  4. During implementation
    • Losses to follow-up
    • Non-response bias

Information Bias: Definition

Results from:

  • Imperfect definitions of study variables
  • Flawed data collection procedures
  • Systematic errors in measurement

Leads to:

  • Misclassification of exposure and/or outcome
  • Exposed classified as unexposed (or vice versa)
  • Diseased classified as non-diseased (or vice versa)

Note

Most studies must assume some degree of misclassification since perfect measurement tools are uncommon

Types of Misclassification

Non-Differential

  • Same misclassification across groups
  • Usually biases toward the null
  • For binary variables: expected bias is toward the null (assuming errors are independent of other errors and Se + Sp > 1)
  • For polytomous variables: can be either direction

Differential

  • Different misclassification between groups
  • Can bias in either direction
  • More serious problem
  • Examples: recall bias, observer bias

Example: Non-Differential Misclassification

No Misclassification: OR = 4.0

Exposure           Cases   Controls
Yes                   50         20
No                    50         80

30% of Exposed Misclassified as Unexposed in Each Group (Se = 0.70, Sp = 1.00)

Exposure           Cases   Controls
Yes                   35         14
No                    65         86

OR = 3.3 (diluted toward null value of 1.0)
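
Worked calculation: The misclassified counts follow from applying the same sensitivity (Se) and specificity (Sp) to cases and controls. The sketch below assumes that the 30% figure means Se = 0.70 with Sp = 1.00 (no false positives), which reproduces the counts shown; the function names are illustrative.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure classification."""
    obs_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    obs_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return obs_exposed, obs_unexposed

def odds_ratio(cases, controls):
    """Cross-product OR from (exposed, unexposed) pairs."""
    return (cases[0] * controls[1]) / (cases[1] * controls[0])

# Non-differential: identical Se and Sp in cases and controls (assumed values).
se, sp = 0.70, 1.00
cases = misclassify(50, 50, se, sp)      # about (35, 65)
controls = misclassify(20, 80, se, sp)   # about (14, 86)

print(f"Observed OR = {odds_ratio(cases, controls):.2f}")  # 3.31
```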

Example: Differential Misclassification

True Distribution

Exposure       Cases (100)   Controls (100)
Yes                     50               20
No                      50               80

Misclassified (High Se in cases, Low Sp in controls)

Exposure           Cases   Controls
Yes                   48         30
No                    52         70

True OR = 4.0 → Misclassified OR ≈ 2.2
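
Worked calculation: With differential misclassification, cases and controls each get their own sensitivity and specificity. The slide does not state the exact values, so the ones below are assumptions chosen because they reproduce the misclassified counts: Se = 0.96 and Sp = 1.00 in cases, Se = 1.00 and Sp = 0.875 in controls.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure classification."""
    obs_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    obs_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return obs_exposed, obs_unexposed

# Assumed, group-specific error rates (not given in the lecture).
cases = misclassify(50, 50, sensitivity=0.96, specificity=1.00)      # about (48, 52)
controls = misclassify(20, 80, sensitivity=1.00, specificity=0.875)  # about (30, 70)

observed_or = (cases[0] * controls[1]) / (cases[1] * controls[0])
print(f"Observed OR = {observed_or:.2f}")  # 2.15
```

Here the false positives among controls inflate their exposure odds, so the estimate moves toward the null; with other combinations of error rates the same mechanism can push it away from the null.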

Common Information Biases

  1. Recall Bias
    • Disease status influences memory of exposure
    • Common in case-control studies
  2. Observer/Interviewer Bias
    • Knowledge of hypothesis affects data collection
    • Observer expectation influences recording
  3. Reporting Bias
    • Participants give socially desirable answers
    • Underreporting of sensitive behaviors
  4. Detection Bias
    • Exposure influences disease diagnosis

Confounding

Definition:

A confounder is a variable that:

  • Is a risk factor for the outcome among the non-exposed
  • Is associated with the exposure of interest
  • Is NOT affected by exposure or disease
  • Is NOT an intermediate step in the causal pathway

Result:

  • Observed association is distorted
  • Can create spurious associations
  • Can mask true associations

Controlling Bias: Design Stage

Prevention strategies:

  1. Appropriate study design
    • Careful definition of study population
    • Proper sampling procedures
    • Randomization (when feasible)
  2. Blinding/Masking
    • Participants blind to intervention
    • Observers blind to exposure/outcome status
    • Analysts blind to group labels
  3. Standardized procedures
    • Valid and reliable data collection
    • Objective measurements when possible
    • Use of biological markers

Controlling Bias: Analysis Stage

Strategies:

  1. For Selection Bias
    • Imputation methods for missing data
    • Sensitivity analyses
    • Inverse probability weighting
  2. For Information Bias
    • Correction formulas (if sensitivity/specificity known)
    • Sensitivity analysis with plausible misclassification rates
  3. For Confounding
    • Stratification
    • Regression adjustment
    • Propensity score methods
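
Worked calculation: When sensitivity (Se) and specificity (Sp) of the exposure measurement are known, the expected true number exposed can be back-calculated from the observed count as (observed exposed − (1 − Sp) × N) / (Se + Sp − 1). The sketch below applies this to hypothetical observed counts of 35/65 in cases and 14/86 in controls, assuming Se = 0.70 and Sp = 1.00; the function name is illustrative.

```python
def correct_exposed(obs_exposed, total, sensitivity, specificity):
    """Back-calculate expected true (exposed, unexposed) counts."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("Se + Sp must exceed 1 for the correction to be usable")
    true_exposed = (obs_exposed - (1 - specificity) * total) / denom
    return true_exposed, total - true_exposed

# Assumed error rates, applied equally to cases and controls.
se, sp = 0.70, 1.00
cases = correct_exposed(35, 100, se, sp)      # about (50, 50)
controls = correct_exposed(14, 100, se, sp)   # about (20, 80)

corrected_or = (cases[0] * controls[1]) / (cases[1] * controls[0])
print(f"Corrected OR = {corrected_or:.2f}")  # 4.00
```

In practice Se and Sp are rarely known exactly, so the same calculation is usually repeated over a range of plausible values as a sensitivity analysis.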

Stratification for Confounding

Mantel-Haenszel Method:

  • Adjusts for confounding variables
  • Estimates common odds ratio across strata
  • Pair with a test for homogeneity of effect (e.g. Breslow-Day) before pooling

Requirements:

  • Confounder must be measured
  • Sufficient data in each stratum
  • No interaction (effect modification)
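
Worked calculation: The Mantel-Haenszel odds ratio is the sum over strata of a×d/n divided by the sum of b×c/n, where a and b are exposed and unexposed cases, c and d are exposed and unexposed controls, and n is the stratum total. The two strata below are hypothetical data invented for this sketch, constructed so that the stratum-specific OR is 2.0 in both while the crude OR is inflated by the confounder.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across 2x2 strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical strata of a binary confounder: (a, b, c, d).
strata = [
    (10, 40, 20, 160),   # confounder absent,  stratum OR = 2.0
    (80, 20, 40, 20),    # confounder present, stratum OR = 2.0
]

# Collapse the strata into one table to get the crude (unadjusted) OR.
crude = odds_ratio(*(sum(cells) for cells in zip(*strata)))
adjusted = mantel_haenszel_or(strata)

print(f"Crude OR = {crude:.2f}")     # 4.50
print(f"MH OR    = {adjusted:.2f}")  # 2.00
```

Pooling is only meaningful here because the stratum-specific odds ratios agree; if they differed materially, reporting them separately would be more appropriate than a single summary.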

Prevention and Control Summary

Three levels of control:

  1. Study Design
    • Appropriate selection procedures
    • Address study hypotheses properly
  2. Data Collection
    • Valid and reliable procedures
    • Careful monitoring of processes
  3. Analysis
    • Appropriate analytical procedures
    • Adjustment for measured confounders
    • Sensitivity analyses

Bias in Clinical Trials

Specific trial-related biases:

  1. Allocation of intervention bias
    • Non-concealed randomization
    • Predictable allocation sequence
  2. Compliance bias
    • Differential adherence to intervention
  3. Contamination bias
    • Intervention activities reach control group
  4. Lack of intention-to-treat analysis
    • Excluding non-compliant participants

Assessing Bias in Published Studies

Critical questions:

  1. How were participants selected?
  2. Were there losses to follow-up?
  3. How were exposures measured?
  4. How were outcomes ascertained?
  5. Were observers blinded?
  6. What confounders were controlled?
  7. What is the potential magnitude of bias?

Sensitivity Analysis

Purpose:

Assess impact of potential biases on results

Approaches:

  1. Selection bias:
    • Vary assumptions about non-respondents
    • Different imputation methods
  2. Information bias:
    • Range of plausible misclassification rates
    • Different sensitivity/specificity values
  3. Unmeasured confounding:
    • External adjustment methods
    • Quantitative bias analysis
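
Worked calculation: One classic form of external adjustment for an unmeasured binary confounder divides the observed risk ratio by a bias factor, (p1 × (RRcd − 1) + 1) / (p0 × (RRcd − 1) + 1), where RRcd is the confounder-outcome risk ratio and p1 and p0 are the confounder prevalences among exposed and unexposed. The sketch below assumes no effect modification and applies to risk ratios (or odds ratios when the outcome is rare); all numbers and names are hypothetical.

```python
def confounding_bias_factor(rr_cd, p_exposed, p_unexposed):
    """Bias factor from an unmeasured binary confounder."""
    return (p_exposed * (rr_cd - 1) + 1) / (p_unexposed * (rr_cd - 1) + 1)

def externally_adjusted_rr(observed_rr, rr_cd, p_exposed, p_unexposed):
    """Observed RR divided by the bias factor."""
    return observed_rr / confounding_bias_factor(rr_cd, p_exposed, p_unexposed)

observed_rr = 2.0  # hypothetical observed estimate

# Sensitivity analysis: vary the assumed confounder strength and imbalance.
for rr_cd in (1.5, 2.0, 3.0):
    for p1, p0 in ((0.3, 0.2), (0.5, 0.2)):
        adjusted = externally_adjusted_rr(observed_rr, rr_cd, p1, p0)
        print(f"RRcd = {rr_cd:.1f}, p1 = {p1:.1f}, p0 = {p0:.1f} -> adjusted RR = {adjusted:.2f}")
```

If the adjusted estimate stays clearly away from the null across the plausible range, unmeasured confounding of that strength is unlikely to explain the finding on its own.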

Practical Implications

For researchers:

  • Design studies to minimize bias
  • Measure potential sources of bias
  • Report limitations transparently

For readers:

  • Critically evaluate study validity
  • Consider magnitude of potential biases
  • Assess generalizability

For policy makers:

  • Weight evidence quality
  • Consider consistency across studies
  • Account for methodological limitations

Case Study Discussion

Scenario:

A case-control study found aspirin use associated with reduced risk of colorectal cancer (OR = 0.6)

Consider:

  1. What selection biases might occur?
  2. What information biases are possible?
  3. What confounders should be controlled?
  4. How would you assess validity of findings?

Key Takeaways

  1. Bias is systematic error that affects internal validity
  2. Three main types: Selection, Information, Confounding
  3. Prevention is better than correction
  4. Design and conduct are crucial
  5. Critical evaluation essential for all studies
  6. No study is perfect - assess magnitude of potential biases

Summary

Remember:

  • Bias ≠ Random error
  • Affects internal validity
  • Can be toward or away from null
  • Prevented by good design
  • Partially corrected in analysis

Action items:

  • Learn to identify biases
  • Design studies to minimize bias
  • Use appropriate analytical methods
  • Report limitations honestly
  • Critically evaluate published research

References

  1. Szklo M, Nieto FJ. Epidemiology: Beyond the Basics. 4th ed.
  2. Delgado-Rodríguez M, Llorca J. Bias. J Epidemiol Community Health. 2004;58(8):635-641.
  3. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd ed.
  4. Greenland S. Validity and bias in epidemiological research. Oxford Textbook of Public Health. 5th ed.

Questions?

Contact:

Dr. Kamarul, Universiti Sains Malaysia

Next session:

Non-causal associations and confounding