Diagnostic Test Validity and Accuracy

Understanding Screening and Diagnostic Tests

Author

Kamarul Imran Musa

Published

August 12, 2025

1 Introduction

1.1 Learning Objectives

By the end of this presentation, you will be able to:

  • Define key concepts in diagnostic test accuracy
  • Calculate sensitivity, specificity, and predictive values
  • Understand the impact of disease prevalence on test performance
  • Interpret ROC curves and likelihood ratios
  • Apply these concepts using R programming with tidyverse principles

2 Part 1: Fundamentals of Diagnostic Tests

2.1 Gold Standard Tests

Definition: A test that provides the definitive diagnosis against which other tests are compared

Characteristics:

  • High accuracy (ideally 100% sensitivity and specificity)
  • Often invasive, expensive, or time-consuming
  • May not be practical for routine screening

Examples:

  • Biopsy for cancer diagnosis
  • Angiography for coronary artery disease
  • Culture for infectious diseases

2.2 The 2×2 Table: Foundation of Test Evaluation

2×2 Contingency Table for Diagnostic Tests

                    True Disease Status
                Disease Present       Disease Absent        Total
Test Positive   True Positive (TP)    False Positive (FP)   TP + FP
Test Negative   False Negative (FN)   True Negative (TN)    FN + TN
Total           TP + FN               FP + TN               N

2.3 Key Performance Measures

Sensitivity (True Positive Rate)

\[Sensitivity = \frac{TP}{TP + FN}\]

Proportion of diseased individuals correctly identified

Specificity (True Negative Rate)

\[Specificity = \frac{TN}{TN + FP}\]

Proportion of healthy individuals correctly identified

2.4 Predictive Values

Positive Predictive Value (PPV)

\[PPV = \frac{TP}{TP + FP}\]

Probability that a positive test indicates disease

Negative Predictive Value (NPV)

\[NPV = \frac{TN}{TN + FN}\]

Probability that a negative test indicates no disease
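All four measures fall directly out of the 2×2 cell counts. A minimal base-R sketch (the helper name `test_measures` is ours, not from a package; the counts are taken from the worked example in Part 2):

```r
# Compute the four core accuracy measures from 2x2 cell counts.
# (Illustrative helper; function name is ours, not from a package.)
test_measures <- function(TP, FP, FN, TN) {
  c(
    sensitivity = TP / (TP + FN),  # P(test+ | disease)
    specificity = TN / (TN + FP),  # P(test- | no disease)
    ppv         = TP / (TP + FP),  # P(disease | test+)
    npv         = TN / (TN + FN)   # P(no disease | test-)
  )
}

# Counts from the worked example later in this document:
test_measures(TP = 45, FP = 100, FN = 8, TN = 847)
```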

2.5 Types of Errors in Testing

False Positive (Type I Error)

  • Test positive, but no disease
  • Consequences:
      • Unnecessary anxiety
      • Additional testing
      • Inappropriate treatment

False Negative (Type II Error)

  • Test negative, but disease present
  • Consequences:
      • Delayed diagnosis
      • Disease progression
      • Missed treatment opportunity

2.6 Common Biases in Screening Tests

Verification Bias - Non-random selection for confirmatory testing

Spectrum Bias - Study population not representative of target population

Lead Time Bias - Earlier detection appears to lengthen survival even when the disease course is unchanged

Length Bias - Slowly progressing cases are overrepresented among screen-detected disease

Overdiagnosis Bias - Detection of disease that wouldn’t cause clinical problems

2.7 The Critical Role of Prevalence

Key Concept: Predictive values depend heavily on disease prevalence in the population
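Bayes' theorem makes this dependence explicit: holding sensitivity and specificity fixed, PPV and NPV can be recomputed at any prevalence. A small sketch (the accuracy values match the simulation parameters used later in Part 2):

```r
# PPV and NPV from sensitivity, specificity, and prevalence (Bayes' theorem)
predictive_values <- function(sens, spec, prev) {
  ppv <- (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
  npv <- (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)
  c(ppv = ppv, npv = npv)
}

# Same test (85% sensitive, 90% specific) at two prevalences:
predictive_values(0.85, 0.90, prev = 0.05)  # low prevalence: PPV ~ 0.31
predictive_values(0.85, 0.90, prev = 0.20)  # higher prevalence: PPV = 0.68
```

The same test's PPV more than doubles when prevalence rises from 5% to 20%, which is why screening targeted at high-risk groups yields more trustworthy positive results.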

2.8 Positive Likelihood Ratio

Definition: Ratio of the probability of a positive test in diseased vs. non-diseased individuals

Formula: \[LR+ = \frac{Sensitivity}{1 - Specificity} = \frac{True\ Positive\ Rate}{False\ Positive\ Rate}\]

Interpretation:

  • LR+ > 10: Strong evidence for disease
  • LR+ 5-10: Moderate evidence for disease
  • LR+ 2-5: Weak evidence for disease
  • LR+ = 1: No diagnostic value
  • LR+ < 1: Evidence against disease
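Likelihood ratios are applied by converting the pre-test probability to odds, multiplying by the LR, and converting back. A base-R sketch of that update:

```r
# Update disease probability with a likelihood ratio:
# pre-test odds * LR = post-test odds
post_test_prob <- function(pre_prob, lr) {
  pre_odds  <- pre_prob / (1 - pre_prob)
  post_odds <- pre_odds * lr
  post_odds / (1 + post_odds)
}

# 5% pre-test probability with an LR+ of 8
# (close to the LR+ found in the worked example below):
post_test_prob(0.05, 8)   # post-test probability rises to about 0.30
```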

3 Part 2: Worked Example with Simulated Data

3.1 Setting Up Our Example

Let’s create a hypothetical screening scenario using tidyverse:

Code
# Set seed for reproducibility
set.seed(123)

# Create simulated data for breast cancer screening using tidyverse
n <- 1000
prevalence <- 0.05  # 5% prevalence
sensitivity <- 0.85
specificity <- 0.90

# Generate dataset using tidyverse approach
screening_data <- tibble(
  id = 1:n,
  true_disease = sample(c("Diseased", "Healthy"), 
                        size = n, 
                        prob = c(prevalence, 1-prevalence), 
                        replace = TRUE)
) %>%
  mutate(
    test_result = case_when(
      true_disease == "Diseased" ~ 
        sample(c("Positive", "Negative"), 
               size = n(), 
               prob = c(sensitivity, 1-sensitivity),
               replace = TRUE),
      true_disease == "Healthy" ~ 
        sample(c("Positive", "Negative"), 
               size = n(), 
               prob = c(1-specificity, specificity),
               replace = TRUE)
    )
  ) %>%
  mutate(
    true_disease = factor(true_disease, levels = c("Healthy", "Diseased")),
    test_result = factor(test_result, levels = c("Negative", "Positive"))
  )
Code
# Create contingency table using tidyverse
contingency_summary <- screening_data %>%
  count(test_result, true_disease, .drop = FALSE) %>%
  pivot_wider(names_from = true_disease, values_from = n) %>%
  mutate(Total = Healthy + Diseased)

contingency_summary %>%
  gt() %>%
  tab_header(
    title = "Contingency Table: Test Results vs True Disease Status"
  ) %>%
  tab_spanner(
    label = "True Disease Status",
    columns = c("Healthy", "Diseased")
  )
Contingency Table: Test Results vs True Disease Status

                  True Disease Status
Test Result       Healthy    Diseased    Total
Negative          847        8           855
Positive          100        45          145

3.2 Calculating Performance Measures Using Tidyverse

Code
# Calculate performance measures using tidyverse approach
performance_metrics <- screening_data %>%
  summarise(
    TP = sum(test_result == "Positive" & true_disease == "Diseased"),
    FN = sum(test_result == "Negative" & true_disease == "Diseased"),
    FP = sum(test_result == "Positive" & true_disease == "Healthy"),
    TN = sum(test_result == "Negative" & true_disease == "Healthy"),
    .groups = "drop"
  ) %>%
  mutate(
    sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP),
    ppv = TP / (TP + FP),
    npv = TN / (TN + FN),
    accuracy = (TP + TN) / (TP + TN + FP + FN),
    youden_index = sensitivity + specificity - 1,
    lr_positive = sensitivity / (1 - specificity),
    lr_negative = (1 - sensitivity) / specificity
  )
Code
# Display results in a formatted table
performance_summary <- performance_metrics %>%
  pivot_longer(
    cols = everything(),
    names_to = "Measure",
    values_to = "Value"
  ) %>%
  mutate(
    Measure = case_when(
      Measure == "TP" ~ "True Positives",
      Measure == "FN" ~ "False Negatives", 
      Measure == "FP" ~ "False Positives",
      Measure == "TN" ~ "True Negatives",
      Measure == "sensitivity" ~ "Sensitivity",
      Measure == "specificity" ~ "Specificity",
      Measure == "ppv" ~ "Positive Predictive Value",
      Measure == "npv" ~ "Negative Predictive Value",
      Measure == "accuracy" ~ "Accuracy",
      Measure == "youden_index" ~ "Youden Index",
      Measure == "lr_positive" ~ "Likelihood Ratio (+)",
      Measure == "lr_negative" ~ "Likelihood Ratio (-)"
    ),
    Formula = case_when(
      str_detect(Measure, "True|False") ~ "",
      Measure == "Sensitivity" ~ "TP/(TP+FN)",
      Measure == "Specificity" ~ "TN/(TN+FP)",
      Measure == "Positive Predictive Value" ~ "TP/(TP+FP)",
      Measure == "Negative Predictive Value" ~ "TN/(TN+FN)",
      Measure == "Accuracy" ~ "(TP+TN)/Total",
      Measure == "Youden Index" ~ "Sensitivity + Specificity - 1",
      Measure == "Likelihood Ratio (+)" ~ "Sensitivity/(1-Specificity)",
      Measure == "Likelihood Ratio (-)" ~ "(1-Sensitivity)/Specificity"
    ),
    Value = case_when(
      str_detect(Measure, "True|False") ~ as.character(Value),
      TRUE ~ sprintf("%.3f", Value)
    )
  )
Code
performance_summary %>%
  gt() %>%
  tab_header(
    title = "Diagnostic Test Performance Metrics"
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  )
Diagnostic Test Performance Metrics

Measure                      Value    Formula
True Positives               45
False Negatives              8
False Positives              100
True Negatives               847
Sensitivity                  0.849    TP/(TP+FN)
Specificity                  0.894    TN/(TN+FP)
Positive Predictive Value    0.310    TP/(TP+FP)
Negative Predictive Value    0.991    TN/(TN+FN)
Accuracy                     0.892    (TP+TN)/Total
Youden Index                 0.743    Sensitivity + Specificity - 1
Likelihood Ratio (+)         8.041    Sensitivity/(1-Specificity)
Likelihood Ratio (-)         0.169    (1-Sensitivity)/Specificity

3.3 Enhanced 2×2 Table with Calculations

Code
# Create enhanced table with actual values
enhanced_summary <- performance_metrics %>%
  select(TP, FN, FP, TN) %>%
  mutate(
    diseased_total = TP + FN,
    healthy_total = FP + TN,
    positive_total = TP + FP,
    negative_total = FN + TN,
    total = TP + FN + FP + TN
  )

enhanced_table <- tibble(
  ` ` = c("Test Positive", "Test Negative", "Total"),
  `Disease Present` = c(
    paste0("TP = ", enhanced_summary$TP), 
    paste0("FN = ", enhanced_summary$FN), 
    paste0("Total = ", enhanced_summary$diseased_total)
  ),
  `Disease Absent` = c(
    paste0("FP = ", enhanced_summary$FP), 
    paste0("TN = ", enhanced_summary$TN), 
    paste0("Total = ", enhanced_summary$healthy_total)
  ),
  `Total` = c(
    paste0("Total = ", enhanced_summary$positive_total), 
    paste0("Total = ", enhanced_summary$negative_total), 
    paste0("N = ", enhanced_summary$total)
  )
)

enhanced_table %>%
  gt() %>%
  tab_header(
    title = "Enhanced 2×2 Table with Calculated Values"
  ) %>%
  tab_spanner(
    label = "True Disease Status",
    columns = c("Disease Present", "Disease Absent")
  )
Enhanced 2×2 Table with Calculated Values

                  True Disease Status
                Disease Present   Disease Absent   Total
Test Positive   TP = 45           FP = 100         Total = 145
Test Negative   FN = 8            TN = 847         Total = 855
Total           Total = 53        Total = 947      N = 1000

Key Findings:

  • Sensitivity: 84.9% (good at detecting disease)
  • Specificity: 89.4% (good at ruling out disease)
  • PPV: 31% (moderate confidence in positive results)
  • NPV: 99.1% (high confidence in negative results)
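These point estimates carry sampling uncertainty: only 53 subjects are diseased, so the sensitivity estimate is imprecise. Exact (Clopper-Pearson) confidence intervals from base R's `binom.test()`, using the cell counts above, make this visible:

```r
# Exact 95% CIs for sensitivity (45 of 53 diseased test positive)
# and specificity (847 of 947 healthy test negative)
sens_ci <- binom.test(45, 53)$conf.int
spec_ci <- binom.test(847, 947)$conf.int

round(sens_ci, 3)  # wide interval: few diseased subjects
round(spec_ci, 3)  # narrow interval: many healthy subjects
```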

4 Part 3: Advanced Analysis with Logistic Regression

4.1 Creating Complex Simulation Dataset Using Tidyverse

Code
# Set seed for reproducibility
set.seed(456)
n <- 1000

# Generate complex dataset using tidyverse
cancer_data <- tibble(
  id = 1:n,
  age = pmax(pmin(rnorm(n, mean = 55, sd = 15), 90), 20)  # Age between 20-90
) %>%
  mutate(
    # Generate true cancer status (influenced by age)
    cancer_prob = plogis(-3 + 0.05 * age),
    cancer = rbinom(n, 1, cancer_prob),
    
    # Generate ToolA results (mammography-like)
    toolA_sensitivity = pmin(0.75 + 0.01 * pmax(age - 40, 0), 0.95),
    toolA_specificity = 0.88,
    toolA_prob = case_when(
      cancer == 1 ~ toolA_sensitivity,
      cancer == 0 ~ 1 - toolA_specificity
    ),
    toolA = rbinom(n, 1, toolA_prob),
    
    # Generate ToolB results (genetic test-like)
    toolB_sensitivity = 0.92,
    toolB_specificity = 0.82,
    toolB_prob = case_when(
      cancer == 1 ~ toolB_sensitivity,
      cancer == 0 ~ 1 - toolB_specificity
    ),
    toolB = rbinom(n, 1, toolB_prob)
  ) %>%
  mutate(
    toolA = factor(toolA, levels = c(0,1), labels = c("Negative", "Positive")),
    toolB = factor(toolB, levels = c(0,1), labels = c("Negative", "Positive")),
    cancer = factor(cancer, levels = c(0,1), labels = c("No Cancer", "Cancer"))
  ) %>%
  select(id, age, toolA, toolB, cancer)
Code
# Display first few rows
cancer_data %>%
  slice_head(n = 10) %>%
  gt() %>%
  tab_header(
    title = "Sample of Cancer Screening Dataset"
  ) %>%
  fmt_number(columns = age, decimals = 1)
Sample of Cancer Screening Dataset

id    age    toolA      toolB      cancer
1     34.8   Negative   Negative   No Cancer
2     64.3   Positive   Positive   Cancer
3     67.0   Positive   Positive   Cancer
4     34.2   Negative   Negative   No Cancer
5     44.3   Positive   Positive   Cancer
6     50.1   Positive   Positive   Cancer
7     65.4   Positive   Positive   Cancer
8     58.8   Negative   Negative   No Cancer
9     70.1   Negative   Negative   No Cancer
10    63.6   Positive   Positive   Cancer

4.2 Logistic Regression Analysis Using Broom and gtsummary

Code
# Fit logistic regression model
model <- glm(cancer ~ toolA + toolB, 
             data = cancer_data, 
             family = binomial())

# Display model summary using gtsummary
model %>%
  tbl_regression(exponentiate = TRUE) %>%
  add_glance_table() %>%
  modify_header(label = "**Variable**") %>%
  modify_footnote(ci = "CI = Confidence Interval") %>%
  bold_labels()
Variable          OR      95% CI        p-value
toolA
    Negative
    Positive      50.8    29.7, 91.6    <0.001
toolB
    Negative
    Positive      50.0    28.8, 91.8    <0.001
Null deviance     1,377
Null df           999
Log-likelihood    -209
AIC               425
BIC               439
Deviance          419
Residual df       997
No. Obs.          1,000

Abbreviations: CI = Confidence Interval, OR = Odds Ratio

4.3 Model Coefficients Using Broom

Code
# Create detailed model summary using broom
model_tidy <- model %>%
  tidy(conf.int = TRUE, exponentiate = TRUE) %>%
  mutate(
    term = case_when(
      term == "(Intercept)" ~ "Intercept",
      term == "toolAPositive" ~ "Tool A Positive",
      term == "toolBPositive" ~ "Tool B Positive"
    )
  )
Code
model_tidy %>%
  gt() %>%
  tab_header(
    title = "Logistic Regression Model Coefficients"
  ) %>%
  fmt_number(columns = c(estimate, conf.low, conf.high), decimals = 3) %>%
  fmt_scientific(columns = c(std.error, statistic, p.value), decimals = 3) %>%
  cols_label(
    term = "Term",
    estimate = "Odds Ratio", 
    std.error = "Std Error",
    statistic = "Z Statistic",
    p.value = "P-value",
    conf.low = "Lower CI",
    conf.high = "Upper CI"
  )
Logistic Regression Model Coefficients

Term              Odds Ratio   Std Error   Z Statistic   P-value     Lower CI   Upper CI
Intercept         0.011        3.101e-01   -14.53        7.345e-48   0.006      0.020
Tool A Positive   50.842       2.860e-01   13.74         5.938e-43   29.723     91.639
Tool B Positive   50.012       2.947e-01   13.28         3.226e-40   28.766     91.760

Interpretation:

  • Tool A Positive: 50.8 times higher odds of cancer than a negative Tool A result, adjusting for Tool B
  • Tool B Positive: 50.0 times higher odds of cancer than a negative Tool B result, adjusting for Tool A

4.4 Model Performance Evaluation Using Broom

Code
# Generate predictions using augment
model_predictions <- model %>%
  augment(data = cancer_data, type.predict = "response") %>%
  mutate(
    predicted_class = factor(
      ifelse(.fitted > 0.5, "Cancer", "No Cancer"),
      levels = c("No Cancer", "Cancer")
    )
  )
Code
# Calculate performance metrics using tidyverse
performance_summary <- model_predictions %>%
  summarise(
    TP = sum(predicted_class == "Cancer" & cancer == "Cancer"),
    FN = sum(predicted_class == "No Cancer" & cancer == "Cancer"),
    FP = sum(predicted_class == "Cancer" & cancer == "No Cancer"),
    TN = sum(predicted_class == "No Cancer" & cancer == "No Cancer"),
    .groups = "drop"
  ) %>%
  mutate(
    accuracy = (TP + TN) / (TP + TN + FP + FN),
    sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP),
    ppv = TP / (TP + FP),
    npv = TN / (TN + FN)
  )
Code
# Display confusion matrix
confusion_matrix <- model_predictions %>%
  count(predicted_class, cancer, .drop = FALSE) %>%
  pivot_wider(names_from = cancer, values_from = n)

confusion_matrix %>%
  gt() %>%
  tab_header(
    title = "Confusion Matrix - Logistic Regression Model"
  ) %>%
  tab_spanner(
    label = "Actual",
    columns = c("No Cancer", "Cancer")
  ) %>%
  tab_stubhead(label = "Predicted")
Confusion Matrix - Logistic Regression Model

              Actual
Predicted     No Cancer   Cancer
No Cancer     532         77
Cancer        17          374
Code
# Display performance metrics (drop the raw counts first so only the
# metric/value pairs remain after pivoting)
performance_summary %>%
  select(accuracy:npv) %>%
  pivot_longer(
    cols = everything(),
    names_to = "Metric",
    values_to = "Value"
  ) %>%
  mutate(
    Metric = case_when(
      Metric == "ppv" ~ "PPV",
      Metric == "npv" ~ "NPV",
      TRUE ~ str_to_title(Metric)
    )
  ) %>%
  gt() %>%
  tab_header(
    title = "Model Performance Metrics"
  ) %>%
  fmt_number(columns = Value, decimals = 3)
Model Performance Metrics

Metric        Value
Accuracy      0.906
Sensitivity   0.829
Specificity   0.969
PPV           0.957
NPV           0.874

4.5 ROC Curve Analysis Using Tidyverse and pROC

Code
# Create ROC curves for individual tools and combined model
roc_data <- cancer_data %>%
  mutate(
    toolA_numeric = as.numeric(toolA == "Positive"),
    toolB_numeric = as.numeric(toolB == "Positive"),
    cancer_numeric = as.numeric(cancer == "Cancer")
  )

# Calculate ROC curves
roc_toolA <- roc(roc_data$cancer_numeric, roc_data$toolA_numeric)
roc_toolB <- roc(roc_data$cancer_numeric, roc_data$toolB_numeric)
roc_model <- roc(roc_data$cancer_numeric, model_predictions$.fitted)

# Create ROC curve data for ggplot
create_roc_data <- function(roc_obj, name) {
  tibble(
    sensitivity = roc_obj$sensitivities,
    specificity = roc_obj$specificities,
    test = name,
    auc = as.numeric(auc(roc_obj))
  ) %>%
    mutate(
      fpr = 1 - specificity,
      test_auc = paste0(test, " (AUC = ", round(auc, 3), ")")
    )
}
Code
roc_plot_data <- bind_rows(
  create_roc_data(roc_toolA, "Tool A"),
  create_roc_data(roc_toolB, "Tool B"),
  create_roc_data(roc_model, "Combined Model")
)

# Create ROC curve plot
roc_plot_data %>%
  ggplot(aes(x = fpr, y = sensitivity, color = test_auc)) +
  geom_line(linewidth = 1.2) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray") +
  labs(
    title = "ROC Curves Comparison",
    subtitle = "Individual Tools vs Combined Logistic Regression Model",
    x = "False Positive Rate (1 - Specificity)",
    y = "True Positive Rate (Sensitivity)",
    color = "Test"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  scale_x_continuous(limits = c(0, 1)) +
  scale_y_continuous(limits = c(0, 1))
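The 0.5 cut-off used for the confusion matrix earlier is arbitrary; the ROC curve suggests instead choosing the threshold that maximizes the Youden index. pROC provides this via `coords(roc_model, "best")`; the underlying sweep is simple enough to sketch in base R (the helper name is ours):

```r
# Pick the probability threshold maximizing Youden's J = sens + spec - 1.
# (Base-R sketch of what pROC's coords(roc_obj, "best") computes.)
best_threshold <- function(probs, labels) {
  candidates <- sort(unique(probs))
  j <- vapply(candidates, function(t) {
    pred <- as.integer(probs >= t)
    sens <- sum(pred == 1 & labels == 1) / sum(labels == 1)
    spec <- sum(pred == 0 & labels == 0) / sum(labels == 0)
    sens + spec - 1
  }, numeric(1))
  list(threshold = candidates[which.max(j)], youden = max(j))
}

# Toy check: the classes separate cleanly between 0.3 and 0.7
best_threshold(c(0.1, 0.2, 0.3, 0.7, 0.8, 0.9), c(0, 0, 0, 1, 1, 1))
```

On the model predictions above, this would be called as `best_threshold(model_predictions$.fitted, roc_data$cancer_numeric)`.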

4.6 Comprehensive Test Comparison

Code
# Calculate individual test performance using tidyverse
calculate_test_performance <- function(test_result, true_outcome) {
  pos_test <- test_result == "Positive"
  cancer_present <- true_outcome == "Cancer"
  
  tibble(
    sensitivity = sum(pos_test & cancer_present) / sum(cancer_present),
    specificity = sum(!pos_test & !cancer_present) / sum(!cancer_present),
    auc = as.numeric(auc(roc(as.numeric(cancer_present), as.numeric(pos_test))))
  )
}

# Calculate performance for individual tools
toolA_performance <- calculate_test_performance(cancer_data$toolA, cancer_data$cancer)
toolB_performance <- calculate_test_performance(cancer_data$toolB, cancer_data$cancer)
Code
# Create comparison table
comparison_table <- tibble(
  Test = c("Tool A", "Tool B", "Combined Model"),
  AUC = c(
    toolA_performance$auc,
    toolB_performance$auc,
    as.numeric(auc(roc_model))
  ),
  Sensitivity = c(
    toolA_performance$sensitivity,
    toolB_performance$sensitivity,
    performance_summary$sensitivity
  ),
  Specificity = c(
    toolA_performance$specificity,
    toolB_performance$specificity,
    performance_summary$specificity
  )
)
Code
comparison_table %>%
  gt() %>%
  tab_header(
    title = "Test Performance Comparison"
  ) %>%
  fmt_number(columns = c(AUC, Sensitivity, Specificity), decimals = 3) %>%
  tab_style(
    style = cell_fill(color = "lightblue"),
    locations = cells_body(rows = AUC == max(AUC))
  )
Test Performance Comparison

Test             AUC     Sensitivity   Specificity
Tool A           0.891   0.902         0.880
Tool B           0.885   0.925         0.845
Combined Model   0.963   0.829         0.969

4.7 Summary

Individual Tests:

  • Each tool has different strengths
  • Tool A: Better specificity
  • Tool B: Better sensitivity

Combined Approach:

  • Logistic regression improves overall performance
  • Higher AUC indicates better discrimination
  • Balanced sensitivity and specificity

Clinical Implications:

  • Combining tests can improve diagnostic accuracy
  • Consider prevalence when interpreting results
  • Balance between false positives and false negatives
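The gain from combining tests can also be seen with the classic parallel rule (positive if either test is positive) versus serial rule (positive only if both are). A sketch under the assumption of conditional independence, with hypothetical sensitivity/specificity values:

```r
# Net accuracy of two independent tests combined in parallel vs. in series.
# (Assumes conditional independence; sens/spec values are hypothetical.)
combine_tests <- function(sensA, specA, sensB, specB) {
  list(
    parallel = c(sens = 1 - (1 - sensA) * (1 - sensB),  # either positive
                 spec = specA * specB),                  # both must be negative
    serial   = c(sens = sensA * sensB,                   # both must be positive
                 spec = 1 - (1 - specA) * (1 - specB))   # either negative
  )
}

combine_tests(sensA = 0.80, specA = 0.90, sensB = 0.90, specB = 0.80)
```

Parallel testing raises sensitivity (useful for screening), while serial testing raises specificity (useful for confirmation), mirroring the best practices listed below.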

5 Summary and Clinical Applications

5.1 Key Concepts Reviewed

  1. Test Performance Measures
       • Sensitivity and specificity are intrinsic test properties
       • Predictive values depend on disease prevalence
       • Likelihood ratios provide clinical utility information
  2. Prevalence Matters
       • Low prevalence reduces positive predictive value
       • High prevalence reduces negative predictive value
       • Target high-risk populations for screening
  3. Statistical Modeling
       • Logistic regression can combine multiple tests
       • ROC curves help compare test performance
       • AUC provides overall discrimination measure

5.2 Clinical Decision Making

When choosing diagnostic tests, consider:

  • Population characteristics and disease prevalence
  • Consequences of false positives vs. false negatives
  • Cost and availability of tests
  • Patient preferences and values

Best practices:

  • Use high-sensitivity tests for screening
  • Use high-specificity tests for confirmation
  • Consider combining tests when appropriate
  • Always interpret results in clinical context

5.3 R Packages Used in This Analysis

Code
# package_summary was not defined earlier; build it to match the table below
package_summary <- tribble(
  ~Package,        ~Purpose,                                ~`Key Functions Used`,
  "tidyverse",     "Data manipulation and visualization",   "mutate(), summarise(), pivot_longer(), ggplot()",
  "broom",         "Tidy model outputs",                    "tidy(), augment(), glance()",
  "broom.helpers", "Enhanced broom functionality",          "tidy_plus_plus()",
  "gtsummary",     "Summary tables for statistical models", "tbl_regression(), add_glance_table()",
  "pROC",          "ROC curve analysis",                    "roc(), auc()",
  "gt",            "Beautiful table formatting",            "gt(), fmt_number()",
  "kableExtra",    "Enhanced kable tables",                 "kable_styling()",
  "caret",         "Machine learning tools",                "confusionMatrix()"
)

package_summary %>%
  gt() %>%
  tab_header(
    title = "R Packages and Functions Used"
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  ) %>%
  cols_width(
    Package ~ px(120),
    Purpose ~ px(200),
    `Key Functions Used` ~ px(300)
  )
R Packages and Functions Used
Package Purpose Key Functions Used
tidyverse Data manipulation and visualization mutate(), summarise(), pivot_longer(), ggplot()
broom Tidy model outputs tidy(), augment(), glance()
broom.helpers Enhanced broom functionality tidy_plus_plus()
gtsummary Summary tables for statistical models tbl_regression(), add_glance_table()
pROC ROC curve analysis roc(), auc()
gt Beautiful table formatting gt(), fmt_number()
kableExtra Enhanced kable tables kable_styling()
caret Machine learning tools confusionMatrix()

This enhanced analysis demonstrates the power of combining tidyverse principles with specialized statistical packages for comprehensive diagnostic test evaluation.