Understanding Bias in Epidemiological Studies

GDT 508: Public Health Epidemiology

Dr. Kamarul Imran Musa

drki.musa@gmail.com

Professor in Epidemiology and Statistics, Universiti Sains Malaysia

Learning Objectives

By the end of this session, you should be able to:

Understand the concept of lack of validity or bias in epidemiological studies
Identify different types of bias in epidemiological research
Prevent and control bias in study design and analysis

What is Bias?

Definition:

Bias is the result of a systematic error in the design or conduct of a study

Results from flaws in:
- Selection of study participants
- Procedures for gathering exposure/disease information
Observed results differ systematically from the true values

Key Concept:

When there is lack of validity, there is bias

Important

Bias relates to the process (design and procedures), not the results of any particular study

Bias vs Random Error

Bias (Systematic Error)

Consistent deviation from truth
Affects internal validity
Cannot be reduced by increasing sample size
Best prevented by design (only partly correctable in analysis)

Random Error

Fluctuation around truth
Affects precision
Reduced by larger samples
Quantified by statistical methods (e.g., confidence intervals)
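A small simulation can make the contrast concrete. The sketch below assumes a hypothetical true exposure prevalence of 30% measured with an instrument of sensitivity 0.80 and specificity 1.00 (illustrative values, not from this lecture). Larger studies shrink the spread of the estimates (random error) but leave them centred near 0.80 × 0.30 = 0.24 rather than the true 0.30 (bias).

```python
import random
import statistics

TRUE_PREVALENCE = 0.30  # hypothetical true exposure prevalence
SENSITIVITY = 0.80      # hypothetical: 20% of truly exposed recorded as unexposed
SPECIFICITY = 1.00      # hypothetical: no unexposed recorded as exposed


def one_study(n, rng):
    """Recorded exposure prevalence in one study of n subjects."""
    recorded_exposed = 0
    for _ in range(n):
        truly_exposed = rng.random() < TRUE_PREVALENCE
        if truly_exposed:
            recorded_exposed += rng.random() < SENSITIVITY    # true positive
        else:
            recorded_exposed += rng.random() >= SPECIFICITY   # false positive
    return recorded_exposed / n


def repeat_studies(n, reps=100, seed=2024):
    """Mean and SD of the estimate over many repeated studies of size n."""
    rng = random.Random(seed)
    estimates = [one_study(n, rng) for _ in range(reps)]
    return statistics.mean(estimates), statistics.stdev(estimates)


for n in (100, 1_000, 10_000):
    mean, sd = repeat_studies(n)
    print(f"n={n:>6}: mean estimate={mean:.3f}, SD={sd:.4f}")
```

The SD column falls roughly tenfold from n = 100 to n = 10,000, while the mean stays near 0.24 at every sample size.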

Effect of Bias

Bias can move estimates in different directions:

Toward the null (negative bias when the true OR > 1)
- Estimates closer to null value (OR closer to 1)
- Underestimates true association
Away from the null (positive bias when the true OR > 1)
- Estimates further from null value
- Overestimates true association
Switch-over bias (extreme case)
- Changes direction of association
- True OR > 1 becomes < 1, or vice versa
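Because the odds ratio is a ratio, direction is easiest to judge on the log scale, where the null value 1 maps to 0. The hypothetical helper below (its name and labels are illustrative) classifies an observed OR against a known true OR. For a protective association (true OR < 1), bias toward the null raises the OR.

```python
import math


def bias_direction(true_or, observed_or):
    """Classify how an observed OR departs from the true OR (null value = 1)."""
    true_log = math.log(true_or)
    obs_log = math.log(observed_or)
    if true_log == 0:
        # No true association: any departure from 1 is away from the null
        return "no bias" if obs_log == 0 else "away from the null"
    if true_log * obs_log < 0:
        return "switch-over"          # association changed direction
    if abs(obs_log) < abs(true_log):
        return "toward the null"
    if abs(obs_log) > abs(true_log):
        return "away from the null"
    return "no bias"


print(bias_direction(4.0, 3.3))  # toward the null
print(bias_direction(4.0, 6.0))  # away from the null
print(bias_direction(0.6, 0.8))  # toward the null (protective OR moved up toward 1)
print(bias_direction(2.0, 0.7))  # switch-over
```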

Classification of Bias

Three main categories:

Selection Bias
- Different probabilities of study inclusion based on exposure/outcome
Information Bias
- Systematic errors in measuring exposure/outcome
- Leads to misclassification
Confounding
- Distortion of the exposure-outcome association by a third variable

Selection Bias: Definition

Occurs when:

Study population does not represent target population
Systematic error in recruitment or retention of subjects
Different inclusion probabilities based on exposure and outcome

Can be introduced at:

Design stage
- Inappropriate definition of eligible population
- Lack of accuracy of sampling frame
- Uneven diagnostic procedures
Implementation stage
- Losses to follow-up
- Non-response bias
- Missing information

Example: Selection Bias

Case-Control Study without Selection Bias

	Cases	Controls
Exposed	500	1800
Unexposed	500	7200
Total	1000	9000
Exposure odds	1.0:1.0	1.0:4.0

Odds Ratio = 4.0 (True value)
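The odds ratio here is the exposure odds in cases divided by the exposure odds in controls (equivalently the cross-product ratio). A minimal Python sketch; the function names are illustrative, not from any library:

```python
def exposure_odds(exposed, unexposed):
    """Odds of exposure within one group (cases or controls)."""
    return exposed / unexposed


def odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp):
    """Exposure odds ratio: exposure odds in cases / exposure odds in controls."""
    return exposure_odds(cases_exp, cases_unexp) / exposure_odds(controls_exp, controls_unexp)


print(exposure_odds(500, 500))           # 1.0  (1.0:1.0)
print(exposure_odds(1800, 7200))         # 0.25 (1.0:4.0)
print(odds_ratio(500, 500, 1800, 7200))  # 4.0
```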

Example: Different Sampling Fractions, No Selection Bias

50% Sample of Cases, 10% Sample of Controls

	Cases	Controls
Exposed	250	180
Unexposed	250	720
Total	500	900
Exposure odds	1.0:1.0	1.0:4.0

Odds Ratio = 4.0 (Unbiased: sampling fractions differ between cases and controls but not by exposure)
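Sampling a whole column (all cases, or all controls) at a fixed fraction multiplies both of its cells by the same constant, which cancels in the odds ratio. A sketch with a hypothetical helper `sample_groups` that returns expected counts:

```python
def odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp):
    """Exposure odds ratio from a 2x2 case-control table (cross-product ratio)."""
    return (cases_exp * controls_unexp) / (cases_unexp * controls_exp)


def sample_groups(table, case_fraction, control_fraction):
    """Expected counts when cases and controls are sampled at different fractions,
    each fraction applied equally to exposed and unexposed subjects."""
    cases_exp, cases_unexp, controls_exp, controls_unexp = table
    return (cases_exp * case_fraction, cases_unexp * case_fraction,
            controls_exp * control_fraction, controls_unexp * control_fraction)


full = (500, 500, 1800, 7200)
sampled = sample_groups(full, case_fraction=0.5, control_fraction=0.1)
print(round(odds_ratio(*sampled), 6))  # 4.0
```

Any other pair of fractions gives the same OR, as long as neither depends on exposure.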

Example: Differential Selection

50% Sample of Cases Overall (60% of Exposed, 40% of Unexposed Cases), 10% Sample of Controls

	Cases	Controls
Exposed	300	180
Unexposed	200	720
Total	500	900
Exposure odds	1.5:1.0	1.0:4.0

Odds Ratio = 6.0 (Biased - differential sampling)

Consequence: Biased exposure odds in cases, unbiased in controls → Biased odds ratio
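When sampling fractions s depend on both case status and exposure, the expected observed OR equals the true OR multiplied by (s_case_exp × s_ctrl_unexp) / (s_case_unexp × s_ctrl_exp). The fractions below (0.6 and 0.4 for cases, 0.1 for controls) are inferred from the table (300/500, 200/500, 180/1800, 720/7200); the helper names are illustrative.

```python
def odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp):
    """Exposure odds ratio (cross-product ratio) from a 2x2 case-control table."""
    return (cases_exp * controls_unexp) / (cases_unexp * controls_exp)


def selection_bias_factor(s_case_exp, s_case_unexp, s_ctrl_exp, s_ctrl_unexp):
    """Factor by which exposure- and outcome-dependent sampling multiplies the true OR."""
    return (s_case_exp * s_ctrl_unexp) / (s_case_unexp * s_ctrl_exp)


full = (500, 500, 1800, 7200)     # source population (true OR = 4.0)
fractions = (0.6, 0.4, 0.1, 0.1)  # exposed cases, unexposed cases, exposed controls, unexposed controls

sampled = tuple(n * s for n, s in zip(full, fractions))
observed = odds_ratio(*sampled)
factor = selection_bias_factor(*fractions)

print(round(observed, 6))                    # 6.0
print(round(factor, 6))                      # 1.5
print(round(odds_ratio(*full) * factor, 6))  # 6.0
```

A factor of 1 means no selection bias; equal fractions within each column (as in the previous example) always give 1.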

Types of Selection Bias

Inappropriate definition of eligible population
- Berkson’s bias (hospital-based studies)
- Healthy worker effect
- Neyman bias (incidence-prevalence bias)
- Competing risks
Lack of accuracy of sampling frame
- Publication bias
- Citation bias
Uneven diagnostic procedures
- Detection bias
During implementation
- Losses to follow-up
- Non-response bias

Information Bias: Definition

Results from:

Imperfect definitions of study variables
Flawed data collection procedures
Systematic errors in measurement

Leads to:

Misclassification of exposure and/or outcome
Exposed classified as unexposed (or vice versa)
Diseased classified as non-diseased (or vice versa)

Note

Most studies must assume some degree of misclassification since perfect measurement tools are uncommon

Types of Misclassification

Non-Differential

Same misclassification across groups
Usually biases toward the null
For binary variables (errors independent of other variables): toward the null on average
For polytomous variables: can bias in either direction

Differential

Different misclassification between groups
Can bias in either direction
More serious problem
Examples: recall bias, observer bias

Example: Non-Differential Misclassification

No Misclassification: OR = 4.0

Exposure	Cases	Controls
Yes	50	20
No	50	80

30% of Truly Exposed Classified as Unexposed in Each Group (Se = 70%, Sp = 100%)

Exposure	Cases	Controls
Yes	35	14
No	65	86

OR = 3.3 (diluted toward null value of 1.0)
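The misclassified table follows from applying the same sensitivity (Se) and specificity (Sp) to cases and controls: recorded exposed = Se × truly exposed + (1 − Sp) × truly unexposed. With Se = 0.7 and Sp = 1.0 this reproduces the table. A sketch with illustrative helper names:

```python
def misclassify(exposed, unexposed, se, sp):
    """Expected recorded (exposed, unexposed) counts given true counts and
    the sensitivity (se) and specificity (sp) of the exposure measurement."""
    recorded_exposed = se * exposed + (1 - sp) * unexposed
    recorded_unexposed = (1 - se) * exposed + sp * unexposed
    return recorded_exposed, recorded_unexposed


def odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp):
    """Exposure odds ratio (cross-product ratio)."""
    return (cases_exp * controls_unexp) / (cases_unexp * controls_exp)


# Same Se and Sp in cases and controls: non-differential misclassification
cases = misclassify(50, 50, se=0.7, sp=1.0)
controls = misclassify(20, 80, se=0.7, sp=1.0)

print([round(x, 1) for x in cases + controls])  # [35.0, 65.0, 14.0, 86.0]
print(round(odds_ratio(*cases, *controls), 2))  # 3.31
```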

Example: Differential Misclassification

True Distribution

Exposure	Cases (100)	Controls (100)
Yes	50	20
No	50	80

Misclassified (High Se in cases, Low Sp in controls)

Exposure	Cases	Controls
Yes	48	30
No	52	70

True OR = 4.0 → Misclassified OR ≈ 2.15
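The slide does not state the exact accuracy values. One combination that reproduces the misclassified table is assumed below: Se = 0.96 and Sp = 1.00 in cases, Se = 0.70 and Sp = 0.80 in controls (other combinations, e.g. Se = 1.00 and Sp = 0.875 in controls, fit equally well). The helper is the same illustrative `misclassify` formula, applied with different values per group.

```python
def misclassify(exposed, unexposed, se, sp):
    """Expected recorded (exposed, unexposed) counts for given sensitivity and specificity."""
    return (se * exposed + (1 - sp) * unexposed,
            (1 - se) * exposed + sp * unexposed)


def odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp):
    """Exposure odds ratio (cross-product ratio)."""
    return (cases_exp * controls_unexp) / (cases_unexp * controls_exp)


# ASSUMED accuracy values: they differ between cases and controls (differential)
cases = misclassify(50, 50, se=0.96, sp=1.00)
controls = misclassify(20, 80, se=0.70, sp=0.80)

print([round(x, 1) for x in cases + controls])  # [48.0, 52.0, 30.0, 70.0]
print(round(odds_ratio(*cases, *controls), 2))  # 2.15
```

Here false positives inflate recorded exposure among controls, pulling the OR toward the null; other patterns (e.g. cases over-reporting exposure) would push it away from the null.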

Common Information Biases

Recall Bias
- Disease status influences memory of exposure
- Common in case-control studies
Observer/Interviewer Bias
- Knowledge of hypothesis affects data collection
- Observer expectation influences recording
Reporting Bias
- Participants give socially desirable answers
- Underreporting of sensitive behaviors
Detection Bias
- Exposure influences disease diagnosis

Confounding

Definition:

A variable that:

Is a risk factor for the outcome among non-exposed
Associated with the exposure of interest in the source population
NOT affected by exposure or disease
NOT an intermediate step in causal pathway

Result:

Observed association is distorted
Can create spurious associations
Can mask true associations
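The first two criteria can be checked numerically. The data below are hypothetical (not from this lecture): within each level of a binary confounder C the exposure-outcome OR is 1.0, yet C is associated with the outcome among the unexposed and with exposure among controls, so the crude table shows a spurious association.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio a*d / (b*c)."""
    return (a * d) / (b * c)


# Hypothetical case-control data stratified by C. Each stratum:
# (exposed cases, unexposed cases, exposed controls, unexposed controls)
strata = {
    "C present": (40, 10, 20, 5),
    "C absent": (5, 20, 20, 80),
}

# Crude table: collapse over the levels of C
crude = tuple(sum(cells) for cells in zip(*strata.values()))
print("crude OR:", odds_ratio(*crude))  # 3.1875

for name, cells in strata.items():
    print(name, "OR:", odds_ratio(*cells))  # 1.0 in both strata

pos, neg = strata["C present"], strata["C absent"]
# Criterion 1: C associated with the outcome among the unexposed
# (unexposed cases and unexposed controls, by C)
print("C-outcome OR among unexposed:", odds_ratio(pos[1], neg[1], pos[3], neg[3]))  # 8.0
# Criterion 2: C associated with exposure among controls
# (exposed and unexposed controls, by C)
print("C-exposure OR among controls:", odds_ratio(pos[2], neg[2], pos[3], neg[3]))  # 16.0
```

The remaining criteria (C not caused by exposure or disease, not on the causal pathway) cannot be checked from counts; they rest on subject-matter knowledge.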

Controlling Bias: Design Stage

Prevention strategies:

Appropriate study design
- Careful definition of study population
- Proper sampling procedures
- Randomization (when feasible)
Blinding/Masking
- Participants blind to intervention
- Observers blind to exposure/outcome status
- Analysts blind to group labels
Standardized procedures
- Valid and reliable data collection
- Objective measurements when possible
- Use of biological markers

Controlling Bias: Analysis Stage

Strategies:

For Selection Bias
- Imputation methods for missing data
- Sensitivity analyses
- Inverse probability weighting
For Information Bias
- Correction formulas (if sensitivity/specificity known)
- Sensitivity analysis with plausible misclassification rates
For Confounding
- Stratification
- Regression adjustment
- Propensity score methods
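Inverse probability weighting gives each sampled subject a weight of 1 / (probability of being selected). The sketch applies this at the cell level to the differential-selection example, assuming the selection probabilities (0.6, 0.4, 0.1, 0.1) are known; in practice they must be estimated, and the weighted OR is only as good as those estimates.

```python
def odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp):
    """Exposure odds ratio (cross-product ratio)."""
    return (cases_exp * controls_unexp) / (cases_unexp * controls_exp)


# Sampled counts: exposed cases, unexposed cases, exposed controls, unexposed controls
sampled = (300, 200, 180, 720)
# Selection probabilities, ASSUMED known for this illustration
selection_prob = (0.6, 0.4, 0.1, 0.1)

# Each subject stands in for 1 / p subjects of the source population
weighted = tuple(n / p for n, p in zip(sampled, selection_prob))

print(round(odds_ratio(*sampled), 2))   # 6.0 (biased)
print(round(odds_ratio(*weighted), 2))  # 4.0 (weighted)
```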

Stratification for Confounding

Mantel-Haenszel Method:

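Adjusts for confounding variables
Estimates a common (summary) odds ratio across strata
Homogeneity of effect is tested separately (e.g., Breslow-Day test)

Requirements:

Confounder must be measured
Enough data across strata (the method tolerates many sparse strata)
No important interaction (effect modification); otherwise report stratum-specific estimates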
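The Mantel-Haenszel summary odds ratio is OR_MH = Σ(a_i d_i / n_i) / Σ(b_i c_i / n_i), where in stratum i, a_i = exposed cases, b_i = exposed controls, c_i = unexposed cases, d_i = unexposed controls and n_i is the stratum total. The sketch below uses hypothetical data in which confounding masks an association (crude OR 1.4, stratum ORs 2.0). It does not implement a homogeneity test.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Cross-product odds ratio for one 2x2 table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata.

    Each stratum is (exposed cases, unexposed cases, exposed controls, unexposed controls).
    """
    numerator = 0.0    # sum of a_i * d_i / n_i
    denominator = 0.0  # sum of b_i * c_i / n_i
    for exp_cases, unexp_cases, exp_controls, unexp_controls in strata:
        n = exp_cases + unexp_cases + exp_controls + unexp_controls
        numerator += exp_cases * unexp_controls / n
        denominator += unexp_cases * exp_controls / n
    return numerator / denominator


# Hypothetical strata of a binary confounder
strata = [
    (20, 10, 40, 40),  # stratum 1
    (15, 30, 10, 40),  # stratum 2
]
crude = tuple(sum(cells) for cells in zip(*strata))

print([odds_ratio(*s) for s in strata])      # [2.0, 2.0]
print(round(odds_ratio(*crude), 2))          # 1.4
print(round(mantel_haenszel_or(strata), 2))  # 2.0
```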

Prevention and Control Summary

Three levels of control:

Study Design
- Appropriate selection procedures
- Address study hypotheses properly
Data Collection
- Valid and reliable procedures
- Careful monitoring of processes
Analysis
- Appropriate analytical procedures
- Adjustment for measured confounders
- Sensitivity analyses

Bias in Clinical Trials

Specific trial-related biases:

Allocation of intervention bias
- Non-concealed randomization
- Predictable allocation sequence
Compliance bias
- Differential adherence to intervention
Contamination bias
- Intervention activities reach control group
Lack of intention-to-treat analysis
- Excluding non-compliant participants

Assessing Bias in Published Studies

Critical questions:

How were participants selected?
Were there losses to follow-up?
How were exposures measured?
How were outcomes ascertained?
Were observers blinded?
What confounders were controlled?
What is the potential magnitude of bias?

Sensitivity Analysis

Purpose:

Assess impact of potential biases on results

Approaches:

Selection bias:
- Vary assumptions about non-respondents
- Different imputation methods
Information bias:
- Range of plausible misclassification rates
- Different sensitivity/specificity values
Unmeasured confounding:
- External adjustment methods
- Quantitative bias analysis
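One simple quantitative bias analysis for exposure misclassification back-calculates the true counts under assumed Se and Sp, using true exposed = (recorded exposed − (1 − Sp) × N) / (Se + Sp − 1), and recomputes the OR across a range of plausible values. The sketch assumes non-differential misclassification and reuses the recorded data from the non-differential example; the grid of Se values and the helper names are illustrative.

```python
def correct_counts(recorded_exposed, total, se, sp):
    """Back-calculate expected true (exposed, unexposed) counts in one group."""
    exposed = (recorded_exposed - (1 - sp) * total) / (se + sp - 1)
    if not 0 < exposed < total:
        raise ValueError("assumed Se/Sp are incompatible with the recorded data")
    return exposed, total - exposed


def odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp):
    """Exposure odds ratio (cross-product ratio)."""
    return (cases_exp * controls_unexp) / (cases_unexp * controls_exp)


def bias_adjusted_or(cases_rec_exp, cases_total, controls_rec_exp, controls_total, se, sp):
    """OR after correcting both groups with the same (non-differential) Se and Sp."""
    cases = correct_counts(cases_rec_exp, cases_total, se, sp)
    controls = correct_counts(controls_rec_exp, controls_total, se, sp)
    return odds_ratio(*cases, *controls)


# Recorded: 35 of 100 cases and 14 of 100 controls classified as exposed
for se in (0.7, 0.8, 0.9, 1.0):
    adjusted = bias_adjusted_or(35, 100, 14, 100, se=se, sp=1.0)
    print(f"Se={se:.1f}, Sp=1.0: adjusted OR = {adjusted:.2f}")
```

With Se = 0.7 the correction recovers the true OR of 4.0; with Se = 1.0 it returns the recorded OR unchanged.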

Practical Implications

For researchers:

Design studies to minimize bias
Measure potential sources of bias
Report limitations transparently

For readers:

Critically evaluate study validity
Consider magnitude of potential biases
Assess generalizability

For policy makers:

Weigh evidence quality
Consider consistency across studies
Account for methodological limitations

Case Study Discussion

Scenario:

A case-control study found aspirin use associated with reduced risk of colorectal cancer (OR = 0.6)

Consider:

What selection biases might occur?
What information biases are possible?
What confounders should be controlled?
How would you assess validity of findings?

Key Takeaways

Bias is systematic error that affects internal validity
Three main types: Selection, Information, Confounding
Prevention is better than correction
Design and conduct are crucial
Critical evaluation essential for all studies
No study is perfect: assess the magnitude of potential biases

Summary

Remember:

Bias ≠ Random error
Affects internal validity
Can be toward or away from null
Prevented by good design
Partially corrected in analysis

Action items:

Learn to identify biases
Design studies to minimize bias
Use appropriate analytical methods
Report limitations honestly
Critically evaluate published research


Questions?

Contact:

Dr. Kamarul Imran Musa, Universiti Sains Malaysia (drki.musa@gmail.com)

Next session:

Non-causal associations and confounding