By the end of this session, students will be able to:
Define Analysis of Variance (ANOVA) and explain its purpose in statistical analysis
Identify appropriate situations for using different types of ANOVA in epidemiological studies
Distinguish between one-way ANOVA, two-way ANOVA, and ANCOVA designs
Apply ANOVA techniques using R programming with tidyverse principles
Evaluate ANOVA assumptions and implement appropriate diagnostic procedures
Interpret ANOVA results including F-statistics, p-values, and effect sizes
Conduct post-hoc analyses when significant differences are detected
Address violations of ANOVA assumptions using appropriate statistical methods
2 Definition
Analysis of Variance (ANOVA) is a statistical method used to test whether there are statistically significant differences between the means of three or more groups. Despite its name, ANOVA compares means: it does so by comparing the variance between groups with the variance within groups.
2.1 Mathematical Foundation
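ANOVA partitions the total variation in the data into components. In the one-way case these are the variation between group means and the variation within groups (residual variation).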
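For the one-way case, the standard partition can be written as follows. The notation is chosen here for illustration and is not taken from the original session: $k$ is the number of groups, $n_j$ the size of group $j$, $N$ the total sample size, $\bar{y}_j$ the mean of group $j$, and $\bar{y}$ the grand mean.

```latex
\begin{aligned}
SS_{\text{total}} &= SS_{\text{between}} + SS_{\text{within}} \\
\sum_{j=1}^{k}\sum_{i=1}^{n_j} (y_{ij} - \bar{y})^2
  &= \sum_{j=1}^{k} n_j (\bar{y}_j - \bar{y})^2
   + \sum_{j=1}^{k}\sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2 \\
F &= \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)},
\qquad
\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}}
\end{aligned}
```

A large F means the group means vary more than would be expected from the within-group scatter alone.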
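Interpretation:
- The F-statistic tests the null hypothesis that all group means are equal
- A significant p-value indicates that at least one group mean differs from the others; it does not identify which one
- The effect size (η²) is the proportion of total variance explained by the factor and indicates practical significance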
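The three quantities above can be pulled out of a fitted model directly. The following is a minimal sketch in base R on simulated data; the data frame `sim_bp`, its group names, and the chosen means and standard deviations are assumptions made for illustration, not the session's own dataset.

```r
# Simulated data: group names, means, and SDs are illustrative assumptions
set.seed(123)
sim_bp <- data.frame(
  treatment = factor(rep(c("Placebo", "Drug_A", "Drug_B", "Drug_C"), each = 25)),
  systolic_bp = c(
    rnorm(25, mean = 140, sd = 10),
    rnorm(25, mean = 132, sd = 10),
    rnorm(25, mean = 128, sd = 10),
    rnorm(25, mean = 135, sd = 10)
  )
)

# Fit the one-way ANOVA and extract the ANOVA table
fit_oneway <- aov(systolic_bp ~ treatment, data = sim_bp)
anova_tab <- summary(fit_oneway)[[1]]

# F-statistic and p-value for the treatment factor (first row of the table)
f_stat <- anova_tab[["F value"]][1]
p_val <- anova_tab[["Pr(>F)"]][1]

# Eta squared = SS_between / SS_total
ss <- anova_tab[["Sum Sq"]]
eta_sq <- ss[1] / sum(ss)

print(anova_tab)
cat("F =", round(f_stat, 2), " p =", signif(p_val, 3),
    " eta^2 =", round(eta_sq, 3), "\n")
```

Computing η² by hand from the sums of squares keeps the example free of extra packages and makes the link to the variance partition explicit.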
3.2 Two-Way ANOVA
Purpose: Examine the effects of two factors and their interaction.
Research Question: “Do blood pressure levels differ by treatment and gender, and is there an interaction?”
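A research question of this kind maps onto a model with two main effects and an interaction term. The sketch below uses base R and simulated, balanced data; the data frame `sim_2way`, the effect sizes, and the group labels are hypothetical.

```r
# Simulated balanced design: 15 subjects per treatment-by-gender cell
set.seed(456)
sim_2way <- expand.grid(
  id = 1:15,
  treatment = c("Placebo", "Drug_A", "Drug_B"),
  gender = c("Female", "Male")
)

# Illustrative effects: both drugs lower BP, males are slightly higher
sim_2way$systolic_bp <- 135 +
  ifelse(sim_2way$treatment == "Drug_A", -6, 0) +
  ifelse(sim_2way$treatment == "Drug_B", -10, 0) +
  ifelse(sim_2way$gender == "Male", 4, 0) +
  rnorm(nrow(sim_2way), mean = 0, sd = 8)

# treatment * gender expands to both main effects plus their interaction
fit_twoway <- aov(systolic_bp ~ treatment * gender, data = sim_2way)
twoway_tab <- summary(fit_twoway)[[1]]
print(twoway_tab)

# Cell means help interpret any interaction
cell_means <- aggregate(systolic_bp ~ treatment + gender,
                        data = sim_2way, FUN = mean)
print(cell_means)
```

`aov()` reports sequential (Type I) sums of squares. In a balanced design like this one the order of terms does not matter, but with unbalanced data the main-effect tests depend on the order in which the factors are entered.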
3.3 ANCOVA
Purpose: Compare group means while controlling for a continuous covariate.
Research Question: “Do treatment effects on blood pressure remain significant after controlling for age?”
Code
# Assumes library(tidyverse) is loaded and bp_data was created earlier
# Add age as a covariate to the original data
set.seed(789)
bp_data_ancova <- bp_data %>%
  mutate(
    age = rnorm(n(), 55, 12),
    # Adjust BP based on age (positive correlation)
    systolic_bp_adj = systolic_bp + 0.3 * (age - 55) + rnorm(n(), 0, 3)
  )

# Visualize relationship
bp_data_ancova %>%
  ggplot(aes(x = age, y = systolic_bp_adj, color = treatment)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Blood Pressure vs Age by Treatment (with regression lines)",
    x = "Age (years)",
    y = "Adjusted Systolic BP (mmHg)",
    color = "Treatment"
  ) +
  theme_minimal()
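Fitting the ANCOVA itself could look like the sketch below. It is self-contained and uses simulated data, so `sim_ancova`, the group shifts, and the age slope of 0.3 are illustrative assumptions rather than the session's `bp_data_ancova`.

```r
# Simulated data: three treatment groups with an age effect on BP
set.seed(789)
n_per_group <- 30
sim_ancova <- data.frame(
  treatment = factor(rep(c("Placebo", "Drug_A", "Drug_B"), each = n_per_group)),
  age = rnorm(3 * n_per_group, mean = 55, sd = 12)
)
group_shift <- c(Placebo = 0, Drug_A = -6, Drug_B = -10)
sim_ancova$systolic_bp <- 138 +
  unname(group_shift[as.character(sim_ancova$treatment)]) +
  0.3 * (sim_ancova$age - 55) +
  rnorm(nrow(sim_ancova), mean = 0, sd = 8)

# Covariate first, so the sequential (Type I) test of treatment is adjusted for age
fit_ancova <- aov(systolic_bp ~ age + treatment, data = sim_ancova)
ancova_tab <- summary(fit_ancova)[[1]]
print(ancova_tab)

# Homogeneity of regression slopes: the age-by-treatment interaction
# should be negligible for the ANCOVA adjustment to be meaningful
fit_slopes <- aov(systolic_bp ~ age * treatment, data = sim_ancova)
slopes_tab <- summary(fit_slopes)[[1]]
p_interaction <- slopes_tab[["Pr(>F)"]][3]
cat("p-value for age:treatment interaction =", signif(p_interaction, 3), "\n")
```

Entering `age` before `treatment` matters because `aov()` tests terms sequentially; with the opposite order the treatment test would not be adjusted for age.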
Kruskal-Wallis rank sum test
data: systolic_bp by treatment
Kruskal-Wallis chi-squared = 15.265, df = 3, p-value = 0.001604
Code
# 4. Robust ANOVA (using WRS2 package if available)
# install.packages("WRS2")
# library(WRS2)
# t1way(systolic_bp ~ treatment, data = bp_data)
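The Kruskal-Wallis output shown earlier has the form produced by base R's `kruskal.test()`. The sketch below runs that test on simulated, right-skewed data and, as an additional option not taken from the source, Welch's one-way test via `oneway.test()` for unequal variances. The data frame `sim_skew` and its distributions are assumptions for illustration, so the numbers will not match the output above.

```r
# Simulated skewed outcomes with unequal spread across groups
set.seed(101)
sim_skew <- data.frame(
  treatment = factor(rep(c("Placebo", "Drug_A", "Drug_B", "Drug_C"), each = 20)),
  systolic_bp = c(
    130 + rexp(20, rate = 1 / 12),
    122 + rexp(20, rate = 1 / 8),
    118 + rexp(20, rate = 1 / 5),
    125 + rexp(20, rate = 1 / 15)
  )
)

# Rank-based alternative when normality is doubtful
kw <- kruskal.test(systolic_bp ~ treatment, data = sim_skew)
print(kw)

# Welch's ANOVA: does not assume equal variances across groups
welch <- oneway.test(systolic_bp ~ treatment, data = sim_skew, var.equal = FALSE)
print(welch)
```

Kruskal-Wallis compares rank distributions rather than means, so it answers a slightly different question from ANOVA; Welch's test still compares means but relaxes the equal-variance assumption.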
5 Post-Hoc Analysis
5.1 Why Post-Hoc Tests Are Important
When ANOVA indicates significant differences, post-hoc tests:
- Identify which specific groups differ from each other
- Estimate the magnitude of those differences
- Control the family-wise error rate across the multiple comparisons
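As one concrete way to obtain all three, base R's `TukeyHSD()` returns pairwise differences, confidence intervals, and adjusted p-values from a fitted `aov` model. This is a minimal sketch on simulated data; `sim_post` and its group means are hypothetical.

```r
# Simulated three-group comparison
set.seed(202)
sim_post <- data.frame(
  treatment = factor(rep(c("Placebo", "Drug_A", "Drug_B"), each = 25)),
  systolic_bp = c(rnorm(25, 140, 10), rnorm(25, 132, 10), rnorm(25, 126, 10))
)

fit_post <- aov(systolic_bp ~ treatment, data = sim_post)

# Tukey's Honest Significant Difference for all pairwise comparisons
tukey <- TukeyHSD(fit_post, which = "treatment", conf.level = 0.95)
print(tukey)

# Matrix with one row per pair: diff (estimate), lwr/upr (95% CI), p adj
tukey_tab <- tukey$treatment
```

The `diff` column answers "how large", the interval answers "how precisely", and `p adj` is already corrected for the three comparisons in the family.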
exercise = High:
contrast estimate SE df t.ratio p.value
Control - Low_fat 3.76 5.25 125 0.717 0.7542
Control - Mediterranean 11.52 5.23 125 2.201 0.0749
Low_fat - Mediterranean 7.76 5.23 125 1.485 0.3017
exercise = Low:
contrast estimate SE df t.ratio p.value
Control - Low_fat 5.68 5.23 125 1.087 0.5239
Control - Mediterranean 11.85 5.22 125 2.269 0.0640
Low_fat - Mediterranean 6.17 5.23 125 1.180 0.4670
exercise = Moderate:
contrast estimate SE df t.ratio p.value
Control - Low_fat 10.58 5.23 125 2.023 0.1108
Control - Mediterranean 20.75 5.24 125 3.964 0.0004
Low_fat - Mediterranean 10.17 5.26 125 1.935 0.1332
P value adjustment: tukey method for comparing a family of 3 estimates
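Output of this shape comes from comparing the levels of one factor separately within each level of another. The sketch below shows one way to request such comparisons with `emmeans`. The source does not show the model or data behind the table, so the data frame `sim_diet`, the column names `diet` and `outcome`, and the simulated effects are all assumptions; the emmeans step runs only if the package is installed.

```r
# Simulated 3 x 3 design; 'diet' and 'outcome' are hypothetical names
set.seed(303)
sim_diet <- expand.grid(
  id = 1:15,
  diet = c("Control", "Low_fat", "Mediterranean"),
  exercise = c("High", "Low", "Moderate")
)
sim_diet$outcome <- 200 +
  ifelse(sim_diet$diet == "Low_fat", -6, 0) +
  ifelse(sim_diet$diet == "Mediterranean", -12, 0) +
  rnorm(nrow(sim_diet), mean = 0, sd = 20)

fit_diet <- aov(outcome ~ diet * exercise, data = sim_diet)

if (requireNamespace("emmeans", quietly = TRUE)) {
  # Estimated marginal means of diet within each exercise level
  emm <- emmeans::emmeans(fit_diet, ~ diet | exercise)
  # Tukey-adjusted pairwise contrasts, one family of 3 per exercise level
  print(pairs(emm, adjust = "tukey"))
} else {
  message("Package 'emmeans' is not installed; skipping simple-effects comparisons.")
}
```

Because the Tukey adjustment is applied within each exercise level, each block of three contrasts is its own family, which is what the "family of 3 estimates" note in the output refers to.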
7 Summary and Best Practices
7.1 When to Use Each Type of ANOVA
| Design | Use When | Example |
|---|---|---|
| One-way ANOVA | Comparing 3+ groups on one factor | Treatment efficacy across multiple drugs |
| Two-way ANOVA | Two factors, interested in main effects and interactions | Treatment × Gender effects |
| One-way ANCOVA | One factor + continuous covariate | Treatment effects controlling for age |
| Two-way ANCOVA | Two factors + continuous covariate | Treatment × Gender controlling for baseline |
7.2 Recommendations
Always check assumptions before interpreting results
Use post-hoc tests only when overall ANOVA is significant
Report effect sizes alongside p-values
Consider practical significance, not just statistical significance
Use appropriate corrections for multiple comparisons
Visualize your data before and after analysis
7.3 R Code Best Practices
Code
# Packages used below; data, model, factor, outcome, and condition are placeholders
library(tidyverse)
library(broom)    # tidy()
library(knitr)    # kable()
library(emmeans)  # emmeans(), pairs()
library(car)      # leveneTest()

# 1. Use tidyverse for data manipulation
data %>%
  filter(condition) %>%
  group_by(factor) %>%
  summarise(mean_outcome = mean(outcome), .groups = "drop")

# 2. Use broom for tidy model outputs
model %>%
  tidy() %>%
  kable()

# 3. Use emmeans for post-hoc analysis
emmeans(model, "factor") %>%
  pairs(adjust = "tukey")

# 4. Always visualize
ggplot(data, aes(x = factor, y = outcome)) +
  geom_boxplot() +
  theme_minimal()

# 5. Check assumptions systematically
# Normality
shapiro.test(residuals(model))
# Homogeneity
leveneTest(outcome ~ factor, data = data)
# Independence (by design)
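The assumption checks in step 5 can be tried end to end on simulated data. The sketch below uses only base R, so it swaps Levene's test for Bartlett's test (`bartlett.test()`), which is more sensitive to non-normality; `sim_check` and its group means are assumptions for illustration.

```r
# Simulated three-group data with equal variances
set.seed(404)
sim_check <- data.frame(
  treatment = factor(rep(c("Placebo", "Drug_A", "Drug_B"), each = 30)),
  systolic_bp = c(rnorm(30, 140, 10), rnorm(30, 132, 10), rnorm(30, 126, 10))
)
fit_check <- aov(systolic_bp ~ treatment, data = sim_check)

# Normality of residuals (Shapiro-Wilk)
normality <- shapiro.test(residuals(fit_check))

# Homogeneity of variance (Bartlett's test, used here in place of Levene's)
homogeneity <- bartlett.test(systolic_bp ~ treatment, data = sim_check)

# Collect the results in one small table
checks <- data.frame(
  assumption = c("Normality of residuals", "Homogeneity of variance"),
  test = c("Shapiro-Wilk", "Bartlett"),
  p_value = c(normality$p.value, homogeneity$p.value)
)
print(checks)
```

Small p-values here flag a possible violation; they are best read alongside residual plots rather than used as a strict pass or fail rule.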
8 Conclusion
ANOVA is a powerful statistical tool in epidemiology and public health research. Understanding when to apply different types of ANOVA, how to check assumptions, and how to interpret results is crucial for making valid statistical inferences. Always remember that statistical significance should be accompanied by practical significance and proper effect size reporting.
The combination of R programming with tidyverse principles provides an efficient and reproducible approach to conducting ANOVA analyses in epidemiological research.