Semi Parametric and Parametric Survival Analysis

Kamarul Imran

2025-09-03

Prerequisites

Required Packages

Let’s start by loading the necessary packages for our analysis:

# Load required packages
library(tidyverse) # For data manipulation and visualization
library(survival)  # For survival analysis functions
library(flexsurv)  # For parametric survival models
library(haven)     # For reading Stata files

Additional Useful Packages

# Additional packages for analysis
library(survminer) # For enhanced survival plots
library(broom)     # For tidying model outputs

Introduction to Survival Analysis

What is Survival Analysis?

Survival analysis is a branch of statistics for analyzing the expected duration of time until one or more events happen, such as death in biological organisms and failure in mechanical systems.

Key Concepts

  • Event: The outcome of interest (death, disease recurrence, equipment failure)
  • Censoring: When we don’t observe the event for some subjects
  • Hazard Function: Instantaneous risk of event occurrence
  • Survival Function: Probability of surviving beyond time t

Types of Survival Analysis

  1. Non-parametric: Makes no assumptions about survival distribution
    • Strengths: Robust, no distributional assumptions
    • Weaknesses: Limited predictive capability, cannot handle complex relationships
  2. Semi-parametric: Baseline hazard unspecified, covariate effects parametric
    • Strengths: Flexible baseline hazard, handles covariates well
    • Weaknesses: Proportional hazard assumption required
  3. Parametric: Assumes specific distribution for survival times
    • Strengths: Efficient estimates, extrapolation possible, handles complex models
    • Weaknesses: Distributional assumptions must be met

Part 1: Non-Parametric Survival Analysis

Kaplan-Meier Survival Analysis

Introduction

The Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function from lifetime data. It provides a way to measure the fraction of subjects living for a certain amount of time after treatment.

Motivations

  • No distributional assumptions: Works without knowing the underlying survival distribution
  • Handles censoring: Appropriately accounts for incomplete observations
  • Descriptive analysis: Provides clear visualization of survival patterns
  • Group comparisons: Can compare survival between different groups

Suitable Data

  • Time-to-event data: Death, disease progression, equipment failure
  • Right-censored observations: Some subjects not observed until event
  • Binary or categorical grouping variables: Treatment vs control, disease stages
  • Public health applications: Disease surveillance, treatment effectiveness

R Code Example

# Load lung cancer data from survival package
data(lung)

# Create survival object
lung_surv <- Surv(time = lung$time, event = lung$status == 2)

# Kaplan-Meier survival curves
km_fit <- survfit(lung_surv ~ sex, data = lung)
# Summary of survival curves
summary(km_fit)
Call: survfit(formula = lung_surv ~ sex, data = lung)

                sex=1 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   11    138       3   0.9783  0.0124       0.9542        1.000
   12    135       1   0.9710  0.0143       0.9434        0.999
   13    134       2   0.9565  0.0174       0.9231        0.991
   15    132       1   0.9493  0.0187       0.9134        0.987
   26    131       1   0.9420  0.0199       0.9038        0.982
   30    130       1   0.9348  0.0210       0.8945        0.977
   31    129       1   0.9275  0.0221       0.8853        0.972
   53    128       2   0.9130  0.0240       0.8672        0.961
   54    126       1   0.9058  0.0249       0.8583        0.956
   59    125       1   0.8986  0.0257       0.8496        0.950
   60    124       1   0.8913  0.0265       0.8409        0.945
   65    123       2   0.8768  0.0280       0.8237        0.933
   71    121       1   0.8696  0.0287       0.8152        0.928
   81    120       1   0.8623  0.0293       0.8067        0.922
   88    119       2   0.8478  0.0306       0.7900        0.910
   92    117       1   0.8406  0.0312       0.7817        0.904
   93    116       1   0.8333  0.0317       0.7734        0.898
   95    115       1   0.8261  0.0323       0.7652        0.892
  105    114       1   0.8188  0.0328       0.7570        0.886
  107    113       1   0.8116  0.0333       0.7489        0.880
  110    112       1   0.8043  0.0338       0.7408        0.873
  116    111       1   0.7971  0.0342       0.7328        0.867
  118    110       1   0.7899  0.0347       0.7247        0.861
  131    109       1   0.7826  0.0351       0.7167        0.855
  132    108       2   0.7681  0.0359       0.7008        0.842
  135    106       1   0.7609  0.0363       0.6929        0.835
  142    105       1   0.7536  0.0367       0.6851        0.829
  144    104       1   0.7464  0.0370       0.6772        0.823
  147    103       1   0.7391  0.0374       0.6694        0.816
  156    102       2   0.7246  0.0380       0.6538        0.803
  163    100       3   0.7029  0.0389       0.6306        0.783
  166     97       1   0.6957  0.0392       0.6230        0.777
  170     96       1   0.6884  0.0394       0.6153        0.770
  175     94       1   0.6811  0.0397       0.6076        0.763
  176     93       1   0.6738  0.0399       0.5999        0.757
  177     92       1   0.6664  0.0402       0.5922        0.750
  179     91       2   0.6518  0.0406       0.5769        0.736
  180     89       1   0.6445  0.0408       0.5693        0.730
  181     88       2   0.6298  0.0412       0.5541        0.716
  183     86       1   0.6225  0.0413       0.5466        0.709
  189     83       1   0.6150  0.0415       0.5388        0.702
  197     80       1   0.6073  0.0417       0.5309        0.695
  202     78       1   0.5995  0.0419       0.5228        0.687
  207     77       1   0.5917  0.0420       0.5148        0.680
  210     76       1   0.5839  0.0422       0.5068        0.673
  212     75       1   0.5762  0.0424       0.4988        0.665
  218     74       1   0.5684  0.0425       0.4909        0.658
  222     72       1   0.5605  0.0426       0.4829        0.651
  223     70       1   0.5525  0.0428       0.4747        0.643
  229     67       1   0.5442  0.0429       0.4663        0.635
  230     66       1   0.5360  0.0431       0.4579        0.627
  239     64       1   0.5276  0.0432       0.4494        0.619
  246     63       1   0.5192  0.0433       0.4409        0.611
  267     61       1   0.5107  0.0434       0.4323        0.603
  269     60       1   0.5022  0.0435       0.4238        0.595
  270     59       1   0.4937  0.0436       0.4152        0.587
  283     57       1   0.4850  0.0437       0.4065        0.579
  284     56       1   0.4764  0.0438       0.3979        0.570
  285     54       1   0.4676  0.0438       0.3891        0.562
  286     53       1   0.4587  0.0439       0.3803        0.553
  288     52       1   0.4499  0.0439       0.3716        0.545
  291     51       1   0.4411  0.0439       0.3629        0.536
  301     48       1   0.4319  0.0440       0.3538        0.527
  303     46       1   0.4225  0.0440       0.3445        0.518
  306     44       1   0.4129  0.0440       0.3350        0.509
  310     43       1   0.4033  0.0441       0.3256        0.500
  320     42       1   0.3937  0.0440       0.3162        0.490
  329     41       1   0.3841  0.0440       0.3069        0.481
  337     40       1   0.3745  0.0439       0.2976        0.471
  353     39       2   0.3553  0.0437       0.2791        0.452
  363     37       1   0.3457  0.0436       0.2700        0.443
  364     36       1   0.3361  0.0434       0.2609        0.433
  371     35       1   0.3265  0.0432       0.2519        0.423
  387     34       1   0.3169  0.0430       0.2429        0.413
  390     33       1   0.3073  0.0428       0.2339        0.404
  394     32       1   0.2977  0.0425       0.2250        0.394
  428     29       1   0.2874  0.0423       0.2155        0.383
  429     28       1   0.2771  0.0420       0.2060        0.373
  442     27       1   0.2669  0.0417       0.1965        0.362
  455     25       1   0.2562  0.0413       0.1868        0.351
  457     24       1   0.2455  0.0410       0.1770        0.341
  460     22       1   0.2344  0.0406       0.1669        0.329
  477     21       1   0.2232  0.0402       0.1569        0.318
  519     20       1   0.2121  0.0397       0.1469        0.306
  524     19       1   0.2009  0.0391       0.1371        0.294
  533     18       1   0.1897  0.0385       0.1275        0.282
  558     17       1   0.1786  0.0378       0.1179        0.270
  567     16       1   0.1674  0.0371       0.1085        0.258
  574     15       1   0.1562  0.0362       0.0992        0.246
  583     14       1   0.1451  0.0353       0.0900        0.234
  613     13       1   0.1339  0.0343       0.0810        0.221
  624     12       1   0.1228  0.0332       0.0722        0.209
  643     11       1   0.1116  0.0320       0.0636        0.196
  655     10       1   0.1004  0.0307       0.0552        0.183
  689      9       1   0.0893  0.0293       0.0470        0.170
  707      8       1   0.0781  0.0276       0.0390        0.156
  791      7       1   0.0670  0.0259       0.0314        0.143
  814      5       1   0.0536  0.0239       0.0223        0.128
  883      3       1   0.0357  0.0216       0.0109        0.117

                sex=2 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    5     90       1   0.9889  0.0110       0.9675        1.000
   60     89       1   0.9778  0.0155       0.9478        1.000
   61     88       1   0.9667  0.0189       0.9303        1.000
   62     87       1   0.9556  0.0217       0.9139        0.999
   79     86       1   0.9444  0.0241       0.8983        0.993
   81     85       1   0.9333  0.0263       0.8832        0.986
   95     83       1   0.9221  0.0283       0.8683        0.979
  107     81       1   0.9107  0.0301       0.8535        0.972
  122     80       1   0.8993  0.0318       0.8390        0.964
  145     79       2   0.8766  0.0349       0.8108        0.948
  153     77       1   0.8652  0.0362       0.7970        0.939
  166     76       1   0.8538  0.0375       0.7834        0.931
  167     75       1   0.8424  0.0387       0.7699        0.922
  182     71       1   0.8305  0.0399       0.7559        0.913
  186     70       1   0.8187  0.0411       0.7420        0.903
  194     68       1   0.8066  0.0422       0.7280        0.894
  199     67       1   0.7946  0.0432       0.7142        0.884
  201     66       2   0.7705  0.0452       0.6869        0.864
  208     62       1   0.7581  0.0461       0.6729        0.854
  226     59       1   0.7452  0.0471       0.6584        0.843
  239     57       1   0.7322  0.0480       0.6438        0.833
  245     54       1   0.7186  0.0490       0.6287        0.821
  268     51       1   0.7045  0.0501       0.6129        0.810
  285     47       1   0.6895  0.0512       0.5962        0.798
  293     45       1   0.6742  0.0523       0.5791        0.785
  305     43       1   0.6585  0.0534       0.5618        0.772
  310     42       1   0.6428  0.0544       0.5447        0.759
  340     39       1   0.6264  0.0554       0.5267        0.745
  345     38       1   0.6099  0.0563       0.5089        0.731
  348     37       1   0.5934  0.0572       0.4913        0.717
  350     36       1   0.5769  0.0579       0.4739        0.702
  351     35       1   0.5604  0.0586       0.4566        0.688
  361     33       1   0.5434  0.0592       0.4390        0.673
  363     32       1   0.5265  0.0597       0.4215        0.658
  371     30       1   0.5089  0.0603       0.4035        0.642
  426     26       1   0.4893  0.0610       0.3832        0.625
  433     25       1   0.4698  0.0617       0.3632        0.608
  444     24       1   0.4502  0.0621       0.3435        0.590
  450     23       1   0.4306  0.0624       0.3241        0.572
  473     22       1   0.4110  0.0626       0.3050        0.554
  520     19       1   0.3894  0.0629       0.2837        0.534
  524     18       1   0.3678  0.0630       0.2628        0.515
  550     15       1   0.3433  0.0634       0.2390        0.493
  641     11       1   0.3121  0.0649       0.2076        0.469
  654     10       1   0.2808  0.0655       0.1778        0.443
  687      9       1   0.2496  0.0652       0.1496        0.417
  705      8       1   0.2184  0.0641       0.1229        0.388
  728      7       1   0.1872  0.0621       0.0978        0.359
  731      6       1   0.1560  0.0590       0.0743        0.328
  735      5       1   0.1248  0.0549       0.0527        0.295
  765      3       1   0.0832  0.0499       0.0257        0.270
# Plot survival curves
ggsurvplot(km_fit, data = lung, pval = TRUE, risk.table = TRUE,
          title = "Survival by Gender",
          xlab = "Time (days)",
          ylab = "Survival Probability")

Log-rank Test

# Compare survival curves between groups
survdiff(lung_surv ~ sex, data = lung)
Call:
survdiff(formula = lung_surv ~ sex, data = lung)

        N Observed Expected (O-E)^2/E (O-E)^2/V
sex=1 138      112     91.6      4.55      10.3
sex=2  90       53     73.4      5.68      10.3

 Chisq= 10.3  on 1 degrees of freedom, p= 0.001 

Part 2: Semi-Parametric Survival Analysis

Cox Proportional Hazards Regression

Introduction

The Cox proportional hazards model is a semi-parametric model that allows examination of the effect of several variables on survival time. It doesn’t assume a particular distribution for survival times but assumes that the hazard ratio between groups is constant over time.

Motivations

  • Covariate adjustment: Controls for multiple confounding variables simultaneously
  • Hazard ratios: Provides interpretable effect estimates
  • Flexible baseline: No need to specify baseline hazard distribution
  • Widely accepted: Standard method in medical research and epidemiology

Suitable Data

  • Multiple covariates: Age, sex, treatment, comorbidities
  • Proportional hazards assumption: Hazard ratios constant over time
  • Large sample sizes: More reliable with adequate events per variable
  • Public health applications: Risk factor identification, adjusted comparisons

R Code Example

cox_model <- coxph(lung_surv ~ age + sex + ph.ecog + wt.loss, data = lung)

# Model summary
summary(cox_model)
Call:
coxph(formula = lung_surv ~ age + sex + ph.ecog + wt.loss, data = lung)

  n= 213, number of events= 151 
   (15 observations deleted due to missingness)

             coef exp(coef)  se(coef)      z Pr(>|z|)    
age      0.013369  1.013459  0.009628  1.389 0.164951    
sex     -0.590775  0.553898  0.175339 -3.369 0.000754 ***
ph.ecog  0.515111  1.673824  0.125988  4.089 4.34e-05 ***
wt.loss -0.009006  0.991034  0.006658 -1.353 0.176135    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
age        1.0135     0.9867    0.9945    1.0328
sex        0.5539     1.8054    0.3928    0.7811
ph.ecog    1.6738     0.5974    1.3076    2.1427
wt.loss    0.9910     1.0090    0.9782    1.0041

Concordance= 0.647  (se = 0.026 )
Likelihood ratio test= 31.02  on 4 df,   p=3e-06
Wald test            = 29.94  on 4 df,   p=5e-06
Score (logrank) test = 30.65  on 4 df,   p=4e-06
  • Hazard ratios with confidence intervals
exp(confint(cox_model))
            2.5 %    97.5 %
age     0.9945143 1.0327643
sex     0.3928084 0.7810504
ph.ecog 1.3075807 2.1426504
wt.loss 0.9781867 1.0040508

Model Diagnostics

  • Test proportional hazards assumption
cox.zph(cox_model)
         chisq df    p
age     0.4353  1 0.51
sex     2.6731  1 0.10
ph.ecog 1.6355  1 0.20
wt.loss 0.0457  1 0.83
GLOBAL  4.7516  4 0.31
  • Schoenfeld residuals plot
plot(cox.zph(cox_model))

  • Check model assumptions
ggcoxdiagnostics(cox_model, type = "schoenfeld")

Survival Curves from Cox Model

  • Predict survival curves for different covariate patterns
new_data <- data.frame(
  age = c(65, 65),
  sex = c(1, 2), 
  ph.ecog = c(1, 1),
  wt.loss = c(0, 0)
)
cox_curves <- survfit(cox_model, newdata = new_data)
ggsurvplot(cox_curves, data = lung, legend.labs = c("Male", "Female"))

Part 3: Parametric Survival Analysis

Introduction to Parametric Models

Parametric survival models assume that survival times follow a specific statistical distribution (e.g., exponential, Weibull, log-normal). These models provide more precise estimates when the distributional assumption is correct and allow for extrapolation beyond observed data.

Proportional Hazards vs Accelerated Failure Time Models

Proportional Hazards (PH) Models

In PH models, covariates multiply the baseline hazard function: \[h(t|x) = h_0(t) \exp(\beta'x)\]

  • Hazard ratio interpretation: \(\exp(\beta)\) represents the multiplicative effect on hazard
  • Assumption: Hazard ratios are constant over time
  • Examples: Exponential PH, Weibull PH models

Accelerated Failure Time (AFT) Models

In AFT models, covariates multiply (accelerate/decelerate) the time scale: \[\log(T) = \beta'x + \sigma \epsilon\]

  • Time ratio interpretation: \(\exp(\beta)\) represents the multiplicative effect on survival time
  • Assumption: Covariates change the rate at which time passes
  • Examples: Weibull AFT, Log-normal AFT, Log-logistic AFT models

Motivations for Parametric Models

  • Efficient estimation: Maximum likelihood provides efficient parameter estimates
  • Extrapolation: Can predict beyond observed follow-up time
  • Smooth hazard functions: Estimate instantaneous risk at any time point
  • Model comparison: Information criteria (AIC, BIC) for model selection
  • Economic evaluation: Essential for cost-effectiveness analysis in health economics

Suitable Data

  • Adequate sample size: Sufficient events for stable parameter estimation
  • Known or testable distributions: Can assess distributional assumptions
  • Complete covariate information: Missing data handled appropriately

Suitable Data

  • Public health applications:
    • Disease progression modeling
    • Health technology assessment
    • Resource planning and forecasting
    • Policy evaluation with long-term horizons

R Code Examples

Data Preparation

# Use lung cancer data
data(lung)
lung_clean <- lung %>% 
  mutate(
    status_bin = ifelse(status == 2, 1, 0),  # Convert to 0/1
    sex_factor = factor(sex, labels = c("Male", "Female"))
  ) %>%
  filter(complete.cases(.))

Exponential Model (PH parameterization)

# Fit exponential model using flexsurv
exp_model <- flexsurvreg(Surv(time, status_bin) ~ age + sex_factor + ph.ecog, 
                         data = lung_clean, dist = "exp")
# Model summary
exp_model
Call:
flexsurvreg(formula = Surv(time, status_bin) ~ age + sex_factor + 
    ph.ecog, data = lung_clean, dist = "exp")

Estimates: 
                  data mean  est        L95%       U95%       se       
rate                     NA   0.001188   0.000315   0.004490   0.000806
age               62.568862   0.007058  -0.014316   0.028433   0.010906
sex_factorFemale   0.383234  -0.472364  -0.858175  -0.086552   0.196846
ph.ecog            0.958084   0.413043   0.146559   0.679528   0.135964
                  exp(est)   L95%       U95%     
rate                     NA         NA         NA
age                1.007083   0.985786   1.028841
sex_factorFemale   0.623527   0.423935   0.917088
ph.ecog            1.511411   1.157843   1.972946

N = 167,  Events: 120,  Censored: 47
Total time at risk: 51759
Log-likelihood = -839.5201, df = 4
AIC = 1687.04
# Hazard ratios (PH interpretation)
exp(exp_model$coefficients)
            rate              age sex_factorFemale          ph.ecog 
     0.001188465      1.007083418      0.623526750      1.511410731 

Weibull Model (Both PH and AFT)

# Weibull PH model
weibull_ph <- flexsurvreg(Surv(time, status_bin) ~ age + sex_factor + ph.ecog, 
                          data = lung_clean, 
                          dist = "weibullPH")

# Weibull AFT model  
weibull_aft <- flexsurvreg(Surv(time, status_bin) ~ age + sex_factor + ph.ecog, 
                           data = lung_clean, dist = 
                             "weibull")
# Compare models
print("Weibull PH Model:")
[1] "Weibull PH Model:"
weibull_ph
Call:
flexsurvreg(formula = Surv(time, status_bin) ~ age + sex_factor + 
    ph.ecog, data = lung_clean, dist = "weibullPH")

Estimates: 
                  data mean  est        L95%       U95%       se       
shape                    NA   1.38e+00   1.19e+00   1.58e+00   9.88e-02
scale                    NA   1.27e-04   2.17e-05   7.48e-04   1.15e-04
age                6.26e+01   6.10e-03  -1.54e-02   2.75e-02   1.09e-02
sex_factorFemale   3.83e-01  -5.07e-01  -8.93e-01  -1.22e-01   1.97e-01
ph.ecog            9.58e-01   4.67e-01   1.97e-01   7.37e-01   1.38e-01
                  exp(est)   L95%       U95%     
shape                    NA         NA         NA
scale                    NA         NA         NA
age                1.01e+00   9.85e-01   1.03e+00
sex_factorFemale   6.02e-01   4.09e-01   8.85e-01
ph.ecog            1.59e+00   1.22e+00   2.09e+00

N = 167,  Events: 120,  Censored: 47
Total time at risk: 51759
Log-likelihood = -831.1249, df = 5
AIC = 1672.25
print("Weibull AFT Model:")
[1] "Weibull AFT Model:"
weibull_aft
Call:
flexsurvreg(formula = Surv(time, status_bin) ~ age + sex_factor + 
    ph.ecog, data = lung_clean, dist = "weibull")

Estimates: 
                  data mean  est        L95%       U95%       se       
shape                    NA   1.38e+00   1.19e+00   1.58e+00   9.88e-02
scale                    NA   6.78e+02   2.56e+02   1.80e+03   3.37e+02
age                6.26e+01  -4.43e-03  -2.01e-02   1.12e-02   7.97e-03
sex_factorFemale   3.83e-01   3.69e-01   8.55e-02   6.52e-01   1.45e-01
ph.ecog            9.58e-01  -3.39e-01  -5.37e-01  -1.41e-01   1.01e-01
                  exp(est)   L95%       U95%     
shape                    NA         NA         NA
scale                    NA         NA         NA
age                9.96e-01   9.80e-01   1.01e+00
sex_factorFemale   1.45e+00   1.09e+00   1.92e+00
ph.ecog            7.12e-01   5.84e-01   8.68e-01

N = 167,  Events: 120,  Censored: 47
Total time at risk: 51759
Log-likelihood = -831.1249, df = 5
AIC = 1672.25

Model Selection and Diagnostics

# Fit multiple distributions
models <- list(
  exponential <- 
    flexsurvreg(Surv(time, status_bin) ~ age + sex_factor + ph.ecog, 
                           data = lung_clean, dist = "exp"),
  weibull <- 
    flexsurvreg(Surv(time, status_bin) ~ age + sex_factor + ph.ecog, 
                       data = lung_clean, dist = "weibull"),
  lognormal <- 
    flexsurvreg(Surv(time, status_bin) ~ age + sex_factor + ph.ecog, 
                         data = lung_clean, dist = "lnorm"),
  loglogistic <- 
    flexsurvreg(Surv(time, status_bin) ~ age + sex_factor + ph.ecog, 
                           data = lung_clean, dist = "llogis")
)
# Compare AIC values
model_comparison <- sapply(models, function(x) 
  c(AIC = x$AIC, BIC = -2*x$loglik + 
      log(nrow(lung_clean))*length(x$coefficients)))
print(model_comparison)
        [,1]    [,2]     [,3]     [,4]
AIC 1687.040 1672.25 1693.777 1678.256
BIC 1699.512 1687.84 1709.367 1693.846
# Best fitting model
best_model <- models[[which.min(model_comparison[1,])]]
print(paste("Best model by AIC:", names(models)[which.min(model_comparison[1,])]))
[1] "Best model by AIC: "

Survival and Hazard Curves

# Plot survival curves
plot(weibull_aft, type = "survival", ci = TRUE, 
     main = "Weibull AFT Model - Survival Curves")
# Plot hazard curves  
plot(weibull_aft, type = "hazard", ci = TRUE,
     main = "Weibull AFT Model - Hazard Curves")
# Plot cumulative hazard
plot(weibull_aft, type = "cumhaz", ci = TRUE,
     main = "Weibull AFT Model - Cumulative Hazard")

Prediction and Extrapolation

# Predict survival for new individuals
newdata <- data.frame(
  age = c(60, 70),
  sex_factor = factor(c("Male", "Female"), levels = c("Male", "Female")),
  ph.ecog = c(1, 2)
)
# Predict median survival times
pred_medians <- summary(weibull_aft, newdata = newdata, 
                        type = "quantile", quantiles = 0.5)
print(pred_medians)
age=60,sex_factor=Male,ph.ecog=1 
  quantile      est      lcl      ucl
1      0.5 283.7862 240.9047 335.3861

age=70,sex_factor=Female,ph.ecog=2 
  quantile      est      lcl      ucl
1      0.5 279.6649 212.9405 374.2195
# Predict survival probabilities at specific times
pred_survival <- summary(weibull_aft, newdata = newdata, 
                         type = "survival", t = c(365, 730))
print(pred_survival)
age=60,sex_factor=Male,ph.ecog=1 
  time        est        lcl       ucl
1  365 0.37535397 0.28548871 0.4594446
2  730 0.07867094 0.03667703 0.1368841

age=70,sex_factor=Female,ph.ecog=2 
  time        est        lcl       ucl
1  365 0.36795171 0.22935532 0.5085544
2  730 0.07470848 0.01975916 0.1757239

Interpretation for Public Health

# Extract AFT coefficients (time ratios)
aft_coefs <- weibull_aft$coefficients
time_ratios <- exp(aft_coefs)

print("Time Ratios (AFT interpretation):")
[1] "Time Ratios (AFT interpretation):"
print(time_ratios)
           shape            scale              age sex_factorFemale 
       1.3755518      678.4855095        0.9955772        1.4461953 
         ph.ecog 
       0.7123120 
print("Values > 1 indicate longer survival times")
[1] "Values > 1 indicate longer survival times"
print("Values < 1 indicate shorter survival times")
[1] "Values < 1 indicate shorter survival times"
# Clinical interpretation example
cat("\nClinical Interpretation:\n")

Clinical Interpretation:
cat("- One year increase in age multiplies survival time by", 
    round(time_ratios["age"], 3), "\n")
- One year increase in age multiplies survival time by 0.996 
cat("- Females have", 
    round(time_ratios["sex_factorFemale"], 2), "times the survival time of males\n")
- Females have 1.45 times the survival time of males
cat("- One unit increase in ECOG performance status multiplies survival time by", round(time_ratios["ph.ecog"], 3), "\n")
- One unit increase in ECOG performance status multiplies survival time by 0.712 

Summary

When to Use Each Approach

Non-Parametric (Kaplan-Meier)

  • Initial data exploration and descriptive analysis
  • Small sample sizes or limited events
  • No covariate adjustment needed
  • Comparing two or few groups (log-rank test)

Semi-Parametric (Cox Regression)

  • Multiple covariates need adjustment
  • Hazard ratios are of primary interest
  • Proportional hazards assumption holds
  • Standard practice in medical research

Parametric Models

  • Extrapolation beyond observed data needed
  • Smooth hazard estimates required
  • Health economic evaluation applications
  • Distributional assumptions can be validated
  • Model-based predictions essential

Best Practices for Public Health Workers

  1. Start with Kaplan-Meier for data exploration
  2. Use Cox regression for standard adjusted analyses
  3. Consider parametric models when:
    • Planning long-term health services
    • Conducting economic evaluations
    • Modeling disease progression

Best Practices for Public Health Workers

  1. Always check assumptions using diagnostic plots
  2. Compare multiple models using AIC/BIC
  3. Interpret results clinically not just statistically
  4. Report confidence intervals alongside point estimates
  5. Consider practical significance beyond statistical significance

Closings

  • Each method has its place in survival analysis
  • Non-parametric methods are robust but limited
  • Semi-parametric methods balance flexibility and interpretability
  • Parametric methods enable prediction and extrapolation
  • Choose the method based on research question and data characteristics
  • Model assumptions must always be checked and validated

Thank You

Questions, Discussion and Hands-on

Resources:

  • R packages: survival, flexsurv, survminer
  • Practice datasets: lung (survival package)
  • Further reading: Collett (2015), Kleinbaum & Klein (2012)