Diagnostic Test Validity and Accuracy

Understanding Screening and Diagnostic Tests

Author

Kamarul Imran Musa

Published

August 12, 2025

1 Introduction

1.1 Learning Objectives

By the end of this presentation, you will be able to:

  • Define key concepts in diagnostic test accuracy
  • Calculate sensitivity, specificity, and predictive values
  • Understand the impact of disease prevalence on test performance
  • Interpret ROC curves and likelihood ratios
  • Apply these concepts using R programming with tidyverse principles

2 Part 1: Fundamentals of Diagnostic Tests

2.1 Gold Standard Tests

Definition: A test that provides the definitive diagnosis against which other tests are compared

Characteristics:

  • High accuracy (ideally 100% sensitivity and specificity)
  • Often invasive, expensive, or time-consuming
  • May not be practical for routine screening

Examples:

  • Biopsy for cancer diagnosis
  • Angiography for coronary artery disease
  • Culture for infectious diseases

2.2 The 2×2 Table: Foundation of Test Evaluation

2×2 Contingency Table for Diagnostic Tests

                    True Disease Status
                Disease Present       Disease Absent        Total
Test Positive   True Positive (TP)    False Positive (FP)   TP + FP
Test Negative   False Negative (FN)   True Negative (TN)    FN + TN
Total           TP + FN               FP + TN               N

2.3 Key Performance Measures

Sensitivity (True Positive Rate)

\[Sensitivity = \frac{TP}{TP + FN}\]

Proportion of diseased individuals correctly identified

Specificity (True Negative Rate)

\[Specificity = \frac{TN}{TN + FP}\]

Proportion of healthy individuals correctly identified

2.4 Predictive Values

Positive Predictive Value (PPV)

\[PPV = \frac{TP}{TP + FP}\]

Probability that a positive test indicates disease

Negative Predictive Value (NPV)

\[NPV = \frac{TN}{TN + FN}\]

Probability that a negative test indicates no disease
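All four measures fall directly out of the 2×2 cell counts. A minimal base-R sketch (the helper name `test_measures` is ours, not from a package; the counts are taken from the worked example in Part 2):

```r
# Compute the four core accuracy measures from 2x2 cell counts.
# (Illustrative helper; function name is ours, not from a package.)
test_measures <- function(TP, FP, FN, TN) {
  c(
    sensitivity = TP / (TP + FN),  # P(test+ | disease)
    specificity = TN / (TN + FP),  # P(test- | no disease)
    ppv         = TP / (TP + FP),  # P(disease | test+)
    npv         = TN / (TN + FN)   # P(no disease | test-)
  )
}

# Counts from the worked example later in this document:
test_measures(TP = 45, FP = 100, FN = 8, TN = 847)
```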

2.5 Types of Errors in Testing

False Positive (Type I Error)

  • Test positive, but no disease
  • Consequences:
      • Unnecessary anxiety
      • Additional testing
      • Inappropriate treatment

False Negative (Type II Error)

  • Test negative, but disease present
  • Consequences:
      • Delayed diagnosis
      • Disease progression
      • Missed treatment opportunity

2.6 Common Biases in Screening Tests

Verification Bias - Non-random selection for confirmatory testing

Spectrum Bias - Study population not representative of target population

Lead Time Bias - Earlier detection appears to lengthen survival even when the disease course is unchanged

Length Bias - Slowly progressing cases are overrepresented among screen-detected disease

Overdiagnosis Bias - Detection of disease that wouldn’t cause clinical problems

2.7 The Critical Role of Prevalence

Key Concept: Predictive values depend heavily on disease prevalence in the population
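Bayes' theorem makes this dependence explicit: holding sensitivity and specificity fixed, PPV and NPV can be recomputed at any prevalence. A small sketch (the accuracy values match the simulation parameters used later in Part 2):

```r
# PPV and NPV from sensitivity, specificity, and prevalence (Bayes' theorem)
predictive_values <- function(sens, spec, prev) {
  ppv <- (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
  npv <- (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)
  c(ppv = ppv, npv = npv)
}

# Same test (85% sensitive, 90% specific) at two prevalences:
predictive_values(0.85, 0.90, prev = 0.05)  # low prevalence: PPV ~ 0.31
predictive_values(0.85, 0.90, prev = 0.20)  # higher prevalence: PPV = 0.68
```

The same test's PPV more than doubles when prevalence rises from 5% to 20%, which is why screening targeted at high-risk groups yields more trustworthy positive results.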

2.8 Positive Likelihood Ratio

Definition: Ratio of the probability of a positive test in diseased vs. non-diseased individuals

Formula: \[LR+ = \frac{Sensitivity}{1 - Specificity} = \frac{True\ Positive\ Rate}{False\ Positive\ Rate}\]

Interpretation:

  • LR+ > 10: Strong evidence for disease
  • LR+ 5-10: Moderate evidence for disease
  • LR+ 2-5: Weak evidence for disease
  • LR+ = 1: No diagnostic value
  • LR+ < 1: Evidence against disease
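Likelihood ratios are applied by converting the pre-test probability to odds, multiplying by the LR, and converting back. A base-R sketch of that update:

```r
# Update disease probability with a likelihood ratio:
# pre-test odds * LR = post-test odds
post_test_prob <- function(pre_prob, lr) {
  pre_odds  <- pre_prob / (1 - pre_prob)
  post_odds <- pre_odds * lr
  post_odds / (1 + post_odds)
}

# 5% pre-test probability with an LR+ of 8
# (close to the LR+ found in the worked example below):
post_test_prob(0.05, 8)   # post-test probability rises to about 0.30
```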

3 Part 2: Worked Example with Simulated Data

3.1 Setting Up Our Example

Let’s create a hypothetical screening scenario using tidyverse:

Code
# Set seed for reproducibility
set.seed(123)

# Create simulated data for breast cancer screening using tidyverse
n <- 1000
prevalence <- 0.05  # 5% prevalence
sensitivity <- 0.85
specificity <- 0.90

# Generate dataset using tidyverse approach
screening_data <- tibble(
  id = 1:n,
  true_disease = sample(c("Diseased", "Healthy"), 
                        size = n, 
                        prob = c(prevalence, 1-prevalence), 
                        replace = TRUE)
) %>%
  mutate(
    test_result = case_when(
      true_disease == "Diseased" ~ 
        sample(c("Positive", "Negative"), 
               size = n(), 
               prob = c(sensitivity, 1-sensitivity),
               replace = TRUE),
      true_disease == "Healthy" ~ 
        sample(c("Positive", "Negative"), 
               size = n(), 
               prob = c(1-specificity, specificity),
               replace = TRUE)
    )
  ) %>%
  mutate(
    true_disease = factor(true_disease, levels = c("Healthy", "Diseased")),
    test_result = factor(test_result, levels = c("Negative", "Positive"))
  )
Code
# Create contingency table using tidyverse
contingency_summary <- screening_data %>%
  count(test_result, true_disease, .drop = FALSE) %>%
  pivot_wider(names_from = true_disease, values_from = n) %>%
  mutate(Total = Healthy + Diseased)

contingency_summary %>%
  gt() %>%
  tab_header(
    title = "Contingency Table: Test Results vs True Disease Status"
  ) %>%
  tab_spanner(
    label = "True Disease Status",
    columns = c("Healthy", "Diseased")
  )
Contingency Table: Test Results vs True Disease Status

                  True Disease Status
Test Result       Healthy    Diseased    Total
Negative          847        8           855
Positive          100        45          145

3.2 Calculating Performance Measures Using Tidyverse

Code
# Calculate performance measures using tidyverse approach
performance_metrics <- screening_data %>%
  summarise(
    TP = sum(test_result == "Positive" & true_disease == "Diseased"),
    FN = sum(test_result == "Negative" & true_disease == "Diseased"),
    FP = sum(test_result == "Positive" & true_disease == "Healthy"),
    TN = sum(test_result == "Negative" & true_disease == "Healthy"),
    .groups = "drop"
  ) %>%
  mutate(
    sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP),
    ppv = TP / (TP + FP),
    npv = TN / (TN + FN),
    accuracy = (TP + TN) / (TP + TN + FP + FN),
    youden_index = sensitivity + specificity - 1,
    lr_positive = sensitivity / (1 - specificity),
    lr_negative = (1 - sensitivity) / specificity
  )
Code
# Display results in a formatted table
performance_summary <- performance_metrics %>%
  pivot_longer(
    cols = everything(),
    names_to = "Measure",
    values_to = "Value"
  ) %>%
  mutate(
    Measure = case_when(
      Measure == "TP" ~ "True Positives",
      Measure == "FN" ~ "False Negatives", 
      Measure == "FP" ~ "False Positives",
      Measure == "TN" ~ "True Negatives",
      Measure == "sensitivity" ~ "Sensitivity",
      Measure == "specificity" ~ "Specificity",
      Measure == "ppv" ~ "Positive Predictive Value",
      Measure == "npv" ~ "Negative Predictive Value",
      Measure == "accuracy" ~ "Accuracy",
      Measure == "youden_index" ~ "Youden Index",
      Measure == "lr_positive" ~ "Likelihood Ratio (+)",
      Measure == "lr_negative" ~ "Likelihood Ratio (-)"
    ),
    Formula = case_when(
      str_detect(Measure, "True|False") ~ "",
      Measure == "Sensitivity" ~ "TP/(TP+FN)",
      Measure == "Specificity" ~ "TN/(TN+FP)",
      Measure == "Positive Predictive Value" ~ "TP/(TP+FP)",
      Measure == "Negative Predictive Value" ~ "TN/(TN+FN)",
      Measure == "Accuracy" ~ "(TP+TN)/Total",
      Measure == "Youden Index" ~ "Sensitivity + Specificity - 1",
      Measure == "Likelihood Ratio (+)" ~ "Sensitivity/(1-Specificity)",
      Measure == "Likelihood Ratio (-)" ~ "(1-Sensitivity)/Specificity"
    ),
    Value = case_when(
      str_detect(Measure, "True|False") ~ as.character(Value),
      TRUE ~ sprintf("%.3f", Value)
    )
  )
Code
performance_summary %>%
  gt() %>%
  tab_header(
    title = "Diagnostic Test Performance Metrics"
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  )
Diagnostic Test Performance Metrics

Measure                      Value    Formula
True Positives               45
False Negatives              8
False Positives              100
True Negatives               847
Sensitivity                  0.849    TP/(TP+FN)
Specificity                  0.894    TN/(TN+FP)
Positive Predictive Value    0.310    TP/(TP+FP)
Negative Predictive Value    0.991    TN/(TN+FN)
Accuracy                     0.892    (TP+TN)/Total
Youden Index                 0.743    Sensitivity + Specificity - 1
Likelihood Ratio (+)         8.041    Sensitivity/(1-Specificity)
Likelihood Ratio (-)         0.169    (1-Sensitivity)/Specificity

3.3 Enhanced 2×2 Table with Calculations

Code
# Create enhanced table with actual values
enhanced_summary <- performance_metrics %>%
  select(TP, FN, FP, TN) %>%
  mutate(
    diseased_total = TP + FN,
    healthy_total = FP + TN,
    positive_total = TP + FP,
    negative_total = FN + TN,
    total = TP + FN + FP + TN
  )

enhanced_table <- tibble(
  ` ` = c("Test Positive", "Test Negative", "Total"),
  `Disease Present` = c(
    paste0("TP = ", enhanced_summary$TP), 
    paste0("FN = ", enhanced_summary$FN), 
    paste0("Total = ", enhanced_summary$diseased_total)
  ),
  `Disease Absent` = c(
    paste0("FP = ", enhanced_summary$FP), 
    paste0("TN = ", enhanced_summary$TN), 
    paste0("Total = ", enhanced_summary$healthy_total)
  ),
  `Total` = c(
    paste0("Total = ", enhanced_summary$positive_total), 
    paste0("Total = ", enhanced_summary$negative_total), 
    paste0("N = ", enhanced_summary$total)
  )
)

enhanced_table %>%
  gt() %>%
  tab_header(
    title = "Enhanced 2×2 Table with Calculated Values"
  ) %>%
  tab_spanner(
    label = "True Disease Status",
    columns = c("Disease Present", "Disease Absent")
  )
Enhanced 2×2 Table with Calculated Values

                  True Disease Status
                Disease Present   Disease Absent   Total
Test Positive   TP = 45           FP = 100         Total = 145
Test Negative   FN = 8            TN = 847         Total = 855
Total           Total = 53        Total = 947      N = 1000

Key Findings:

  • Sensitivity: 84.9% (good at detecting disease)
  • Specificity: 89.4% (good at ruling out disease)
  • PPV: 31% (moderate confidence in positive results)
  • NPV: 99.1% (high confidence in negative results)
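These point estimates carry sampling uncertainty: only 53 subjects are diseased, so the sensitivity estimate is imprecise. Exact (Clopper-Pearson) confidence intervals from base R's `binom.test()`, using the cell counts above, make this visible:

```r
# Exact 95% CIs for sensitivity (45 of 53 diseased test positive)
# and specificity (847 of 947 healthy test negative)
sens_ci <- binom.test(45, 53)$conf.int
spec_ci <- binom.test(847, 947)$conf.int

round(sens_ci, 3)  # wide interval: few diseased subjects
round(spec_ci, 3)  # narrow interval: many healthy subjects
```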

4 Part 3: Advanced Analysis with Logistic Regression

4.1 Creating Complex Simulation Dataset Using Tidyverse

Code
# Set seed for reproducibility
set.seed(456)
n <- 1000

# Generate complex dataset using tidyverse
cancer_data <- tibble(
  id = 1:n,
  age = pmax(pmin(rnorm(n, mean = 55, sd = 15), 90), 20)  # Age between 20-90
) %>%
  mutate(
    # Generate true cancer status (influenced by age)
    cancer_prob = plogis(-3 + 0.05 * age),
    cancer = rbinom(n, 1, cancer_prob),
    
    # Generate ToolA results (mammography-like)
    toolA_sensitivity = pmin(0.75 + 0.01 * pmax(age - 40, 0), 0.95),
    toolA_specificity = 0.88,
    toolA_prob = case_when(
      cancer == 1 ~ toolA_sensitivity,
      cancer == 0 ~ 1 - toolA_specificity
    ),
    toolA = rbinom(n, 1, toolA_prob),
    
    # Generate ToolB results (genetic test-like)
    toolB_sensitivity = 0.92,
    toolB_specificity = 0.82,
    toolB_prob = case_when(
      cancer == 1 ~ toolB_sensitivity,
      cancer == 0 ~ 1 - toolB_specificity
    ),
    toolB = rbinom(n, 1, toolB_prob)
  ) %>%
  mutate(
    toolA = factor(toolA, levels = c(0,1), labels = c("Negative", "Positive")),
    toolB = factor(toolB, levels = c(0,1), labels = c("Negative", "Positive")),
    cancer = factor(cancer, levels = c(0,1), labels = c("No Cancer", "Cancer"))
  ) %>%
  select(id, age, toolA, toolB, cancer)
Code
# Display first few rows
cancer_data %>%
  slice_head(n = 10) %>%
  gt() %>%
  tab_header(
    title = "Sample of Cancer Screening Dataset"
  ) %>%
  fmt_number(columns = age, decimals = 1)
Sample of Cancer Screening Dataset

id    age    toolA      toolB      cancer
1     34.8   Negative   Negative   No Cancer
2     64.3   Positive   Positive   Cancer
3     67.0   Positive   Positive   Cancer
4     34.2   Negative   Negative   No Cancer
5     44.3   Positive   Positive   Cancer
6     50.1   Positive   Positive   Cancer
7     65.4   Positive   Positive   Cancer
8     58.8   Negative   Negative   No Cancer
9     70.1   Negative   Negative   No Cancer
10    63.6   Positive   Positive   Cancer

4.2 Logistic Regression Analysis Using Broom and gtsummary

Code
# Fit logistic regression model
model <- glm(cancer ~ toolA + toolB, 
             data = cancer_data, 
             family = binomial())

# Display model summary using gtsummary
model %>%
  tbl_regression(exponentiate = TRUE) %>%
  add_glance_table() %>%
  modify_header(label = "**Variable**") %>%
  modify_footnote(ci = "CI = Confidence Interval") %>%
  bold_labels()
Variable          OR      95% CI        p-value
toolA
    Negative
    Positive      50.8    29.7, 91.6    <0.001
toolB
    Negative
    Positive      50.0    28.8, 91.8    <0.001
Null deviance     1,377
Null df           999
Log-likelihood    -209
AIC               425
BIC               439
Deviance          419
Residual df       997
No. Obs.          1,000

Abbreviations: CI = Confidence Interval, OR = Odds Ratio

4.3 Model Coefficients Using Broom

Code
# Create detailed model summary using broom
model_tidy <- model %>%
  tidy(conf.int = TRUE, exponentiate = TRUE) %>%
  mutate(
    term = case_when(
      term == "(Intercept)" ~ "Intercept",
      term == "toolAPositive" ~ "Tool A Positive",
      term == "toolBPositive" ~ "Tool B Positive"
    )
  )
Code
model_tidy %>%
  gt() %>%
  tab_header(
    title = "Logistic Regression Model Coefficients"
  ) %>%
  fmt_number(columns = c(estimate, conf.low, conf.high), decimals = 3) %>%
  fmt_scientific(columns = c(std.error, statistic, p.value), decimals = 3) %>%
  cols_label(
    term = "Term",
    estimate = "Odds Ratio", 
    std.error = "Std Error",
    statistic = "Z Statistic",
    p.value = "P-value",
    conf.low = "Lower CI",
    conf.high = "Upper CI"
  )
Logistic Regression Model Coefficients

Term              Odds Ratio   Std Error   Z Statistic   P-value     Lower CI   Upper CI
Intercept         0.011        3.101e-01   -14.53        7.345e-48   0.006      0.020
Tool A Positive   50.842       2.860e-01   13.74         5.938e-43   29.723     91.639
Tool B Positive   50.012       2.947e-01   13.28         3.226e-40   28.766     91.760

Interpretation:

  • Tool A Positive: 50.8 times higher odds of cancer than a negative Tool A result, adjusting for Tool B
  • Tool B Positive: 50.0 times higher odds of cancer than a negative Tool B result, adjusting for Tool A

4.4 Model Performance Evaluation Using Broom

Code
# Generate predictions using augment
model_predictions <- model %>%
  augment(data = cancer_data, type.predict = "response") %>%
  mutate(
    predicted_class = factor(
      ifelse(.fitted > 0.5, "Cancer", "No Cancer"),
      levels = c("No Cancer", "Cancer")
    )
  )
Code
# Calculate performance metrics using tidyverse
performance_summary <- model_predictions %>%
  summarise(
    TP = sum(predicted_class == "Cancer" & cancer == "Cancer"),
    FN = sum(predicted_class == "No Cancer" & cancer == "Cancer"),
    FP = sum(predicted_class == "Cancer" & cancer == "No Cancer"),
    TN = sum(predicted_class == "No Cancer" & cancer == "No Cancer"),
    .groups = "drop"
  ) %>%
  mutate(
    accuracy = (TP + TN) / (TP + TN + FP + FN),
    sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP),
    ppv = TP / (TP + FP),
    npv = TN / (TN + FN)
  )
Code
# Display confusion matrix
confusion_matrix <- model_predictions %>%
  count(predicted_class, cancer, .drop = FALSE) %>%
  pivot_wider(names_from = cancer, values_from = n)

confusion_matrix %>%
  gt() %>%
  tab_header(
    title = "Confusion Matrix - Logistic Regression Model"
  ) %>%
  tab_spanner(
    label = "Actual",
    columns = c("No Cancer", "Cancer")
  ) %>%
  tab_stubhead(label = "Predicted")
Confusion Matrix - Logistic Regression Model

              Actual
Predicted     No Cancer   Cancer
No Cancer     532         77
Cancer        17          374
Code
# Display performance metrics (drop the raw counts first so only the
# metric/value pairs remain after pivoting)
performance_summary %>%
  select(accuracy:npv) %>%
  pivot_longer(
    cols = everything(),
    names_to = "Metric",
    values_to = "Value"
  ) %>%
  mutate(
    Metric = case_when(
      Metric == "ppv" ~ "PPV",
      Metric == "npv" ~ "NPV",
      TRUE ~ str_to_title(Metric)
    )
  ) %>%
  gt() %>%
  tab_header(
    title = "Model Performance Metrics"
  ) %>%
  fmt_number(columns = Value, decimals = 3)
Model Performance Metrics

Metric        Value
Accuracy      0.906
Sensitivity   0.829
Specificity   0.969
PPV           0.957
NPV           0.874

4.5 ROC Curve Analysis Using Tidyverse and pROC

Code
# Create ROC curves for individual tools and combined model
roc_data <- cancer_data %>%
  mutate(
    toolA_numeric = as.numeric(toolA == "Positive"),
    toolB_numeric = as.numeric(toolB == "Positive"),
    cancer_numeric = as.numeric(cancer == "Cancer")
  )

# Calculate ROC curves
roc_toolA <- roc(roc_data$cancer_numeric, roc_data$toolA_numeric)
roc_toolB <- roc(roc_data$cancer_numeric, roc_data$toolB_numeric)
roc_model <- roc(roc_data$cancer_numeric, model_predictions$.fitted)

# Create ROC curve data for ggplot
create_roc_data <- function(roc_obj, name) {
  tibble(
    sensitivity = roc_obj$sensitivities,
    specificity = roc_obj$specificities,
    test = name,
    auc = as.numeric(auc(roc_obj))
  ) %>%
    mutate(
      fpr = 1 - specificity,
      test_auc = paste0(test, " (AUC = ", round(auc, 3), ")")
    )
}
Code
roc_plot_data <- bind_rows(
  create_roc_data(roc_toolA, "Tool A"),
  create_roc_data(roc_toolB, "Tool B"),
  create_roc_data(roc_model, "Combined Model")
)

# Create ROC curve plot
roc_plot_data %>%
  ggplot(aes(x = fpr, y = sensitivity, color = test_auc)) +
  geom_line(linewidth = 1.2) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray") +
  labs(
    title = "ROC Curves Comparison",
    subtitle = "Individual Tools vs Combined Logistic Regression Model",
    x = "False Positive Rate (1 - Specificity)",
    y = "True Positive Rate (Sensitivity)",
    color = "Test"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  scale_x_continuous(limits = c(0, 1)) +
  scale_y_continuous(limits = c(0, 1))
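The 0.5 cut-off used for the confusion matrix earlier is arbitrary; the ROC curve suggests instead choosing the threshold that maximizes the Youden index. pROC provides this via `coords(roc_model, "best")`; the underlying sweep is simple enough to sketch in base R (the helper name is ours):

```r
# Pick the probability threshold maximizing Youden's J = sens + spec - 1.
# (Base-R sketch of what pROC's coords(roc_obj, "best") computes.)
best_threshold <- function(probs, labels) {
  candidates <- sort(unique(probs))
  j <- vapply(candidates, function(t) {
    pred <- as.integer(probs >= t)
    sens <- sum(pred == 1 & labels == 1) / sum(labels == 1)
    spec <- sum(pred == 0 & labels == 0) / sum(labels == 0)
    sens + spec - 1
  }, numeric(1))
  list(threshold = candidates[which.max(j)], youden = max(j))
}

# Toy check: the classes separate cleanly between 0.3 and 0.7
best_threshold(c(0.1, 0.2, 0.3, 0.7, 0.8, 0.9), c(0, 0, 0, 1, 1, 1))
```

On the model predictions above, this would be called as `best_threshold(model_predictions$.fitted, roc_data$cancer_numeric)`.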

4.6 Comprehensive Test Comparison

Code
# Calculate individual test performance using tidyverse
calculate_test_performance <- function(test_result, true_outcome) {
  pos_test <- test_result == "Positive"
  cancer_present <- true_outcome == "Cancer"
  
  tibble(
    sensitivity = sum(pos_test & cancer_present) / sum(cancer_present),
    specificity = sum(!pos_test & !cancer_present) / sum(!cancer_present),
    auc = as.numeric(auc(roc(as.numeric(cancer_present), as.numeric(pos_test))))
  )
}

# Calculate performance for individual tools
toolA_performance <- calculate_test_performance(cancer_data$toolA, cancer_data$cancer)
toolB_performance <- calculate_test_performance(cancer_data$toolB, cancer_data$cancer)
Code
# Create comparison table
comparison_table <- tibble(
  Test = c("Tool A", "Tool B", "Combined Model"),
  AUC = c(
    toolA_performance$auc,
    toolB_performance$auc,
    as.numeric(auc(roc_model))
  ),
  Sensitivity = c(
    toolA_performance$sensitivity,
    toolB_performance$sensitivity,
    performance_summary$sensitivity
  ),
  Specificity = c(
    toolA_performance$specificity,
    toolB_performance$specificity,
    performance_summary$specificity
  )
)
Code
comparison_table %>%
  gt() %>%
  tab_header(
    title = "Test Performance Comparison"
  ) %>%
  fmt_number(columns = c(AUC, Sensitivity, Specificity), decimals = 3) %>%
  tab_style(
    style = cell_fill(color = "lightblue"),
    locations = cells_body(rows = AUC == max(AUC))
  )
Test Performance Comparison

Test             AUC     Sensitivity   Specificity
Tool A           0.891   0.902         0.880
Tool B           0.885   0.925         0.845
Combined Model   0.963   0.829         0.969

4.7 Summary

Individual Tests:

  • Each tool has different strengths
  • Tool A: Better specificity
  • Tool B: Better sensitivity

Combined Approach:

  • Logistic regression improves overall performance
  • Higher AUC indicates better discrimination
  • Balanced sensitivity and specificity

Clinical Implications:

  • Combining tests can improve diagnostic accuracy
  • Consider prevalence when interpreting results
  • Balance between false positives and false negatives
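The gain from combining tests can also be seen with the classic parallel rule (positive if either test is positive) versus serial rule (positive only if both are). A sketch under the assumption of conditional independence, with hypothetical sensitivity/specificity values:

```r
# Net accuracy of two independent tests combined in parallel vs. in series.
# (Assumes conditional independence; sens/spec values are hypothetical.)
combine_tests <- function(sensA, specA, sensB, specB) {
  list(
    parallel = c(sens = 1 - (1 - sensA) * (1 - sensB),  # either positive
                 spec = specA * specB),                  # both must be negative
    serial   = c(sens = sensA * sensB,                   # both must be positive
                 spec = 1 - (1 - specA) * (1 - specB))   # either negative
  )
}

combine_tests(sensA = 0.80, specA = 0.90, sensB = 0.90, specB = 0.80)
```

Parallel testing raises sensitivity (useful for screening), while serial testing raises specificity (useful for confirmation), mirroring the best practices listed below.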

5 Summary and Clinical Applications

5.1 Key Concepts Reviewed

  1. Test Performance Measures
       • Sensitivity and specificity are intrinsic test properties
       • Predictive values depend on disease prevalence
       • Likelihood ratios provide clinical utility information
  2. Prevalence Matters
       • Low prevalence reduces positive predictive value
       • High prevalence reduces negative predictive value
       • Target high-risk populations for screening
  3. Statistical Modeling
       • Logistic regression can combine multiple tests
       • ROC curves help compare test performance
       • AUC provides overall discrimination measure

5.2 Clinical Decision Making

When choosing diagnostic tests, consider:

  • Population characteristics and disease prevalence
  • Consequences of false positives vs. false negatives
  • Cost and availability of tests
  • Patient preferences and values

Best practices:

  • Use high-sensitivity tests for screening
  • Use high-specificity tests for confirmation
  • Consider combining tests when appropriate
  • Always interpret results in clinical context

5.3 R Packages Used in This Analysis

Code
# package_summary was not defined earlier; build it to match the table below
package_summary <- tribble(
  ~Package,        ~Purpose,                                ~`Key Functions Used`,
  "tidyverse",     "Data manipulation and visualization",   "mutate(), summarise(), pivot_longer(), ggplot()",
  "broom",         "Tidy model outputs",                    "tidy(), augment(), glance()",
  "broom.helpers", "Enhanced broom functionality",          "tidy_plus_plus()",
  "gtsummary",     "Summary tables for statistical models", "tbl_regression(), add_glance_table()",
  "pROC",          "ROC curve analysis",                    "roc(), auc()",
  "gt",            "Beautiful table formatting",            "gt(), fmt_number()",
  "kableExtra",    "Enhanced kable tables",                 "kable_styling()",
  "caret",         "Machine learning tools",                "confusionMatrix()"
)

package_summary %>%
  gt() %>%
  tab_header(
    title = "R Packages and Functions Used"
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  ) %>%
  cols_width(
    Package ~ px(120),
    Purpose ~ px(200),
    `Key Functions Used` ~ px(300)
  )
R Packages and Functions Used
Package Purpose Key Functions Used
tidyverse Data manipulation and visualization mutate(), summarise(), pivot_longer(), ggplot()
broom Tidy model outputs tidy(), augment(), glance()
broom.helpers Enhanced broom functionality tidy_plus_plus()
gtsummary Summary tables for statistical models tbl_regression(), add_glance_table()
pROC ROC curve analysis roc(), auc()
gt Beautiful table formatting gt(), fmt_number()
kableExtra Enhanced kable tables kable_styling()
caret Machine learning tools confusionMatrix()

This enhanced analysis demonstrates the power of combining tidyverse principles with specialized statistical packages for comprehensive diagnostic test evaluation.