2025-09-19
Load the necessary packages for our analysis:
For count outcome: \[P(Y = k) = \frac{e^{-\lambda}\lambda^k}{k!}\]
where λ is the rate parameter
\[ln(count) = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ\]
\[ln(\frac{count}{denominator}) = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ\]
Rows: 120
Columns: 4
$ gender <chr> "female", "male", "male", "female", "male", "male", "female", …
$ res_inf <chr> "yes", "no", "yes", "yes", "yes", "yes", "yes", "yes", "yes", …
$ ghq12 <dbl> 21, 17, 30, 22, 27, 33, 24, 23, 25, 28, 14, 20, 24, 5, 32, 17,…
$ attack <dbl> 6, 4, 8, 5, 2, 3, 2, 1, 2, 2, 4, 4, 3, 0, 2, 2, 4, 1, 5, 0, 2,…
Call:
glm(formula = attack ~ gender, family = "poisson", data = asthma)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.02105 0.07332 13.925 <2e-16 ***
gendermale -0.30000 0.12063 -2.487 0.0129 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 229.56 on 119 degrees of freedom
Residual deviance: 223.23 on 118 degrees of freedom
AIC: 500.3
Number of Fisher Scoring iterations: 5
Call:
glm(formula = attack ~ res_inf, family = "poisson", data = asthma)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.2877 0.1213 2.372 0.0177 *
res_infyes 0.9032 0.1382 6.533 6.44e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 229.56 on 119 degrees of freedom
Residual deviance: 180.49 on 118 degrees of freedom
AIC: 457.56
Number of Fisher Scoring iterations: 5
Call:
glm(formula = attack ~ ghq12, family = "poisson", data = asthma)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.230923 0.159128 -1.451 0.147
ghq12 0.059500 0.006919 8.599 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 229.56 on 119 degrees of freedom
Residual deviance: 145.13 on 118 degrees of freedom
AIC: 422.2
Number of Fisher Scoring iterations: 5
Call:
glm(formula = attack ~ res_inf + ghq12, family = "poisson", data = asthma)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.34051 0.16823 -2.024 0.04296 *
res_infyes 0.42816 0.15282 2.802 0.00508 **
ghq12 0.04989 0.00779 6.404 1.51e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 229.56 on 119 degrees of freedom
Residual deviance: 136.80 on 117 degrees of freedom
AIC: 415.86
Number of Fisher Scoring iterations: 5
pois_interaction <- glm(attack ~ res_inf * ghq12, data = asthma, family = "poisson")
summary(pois_interaction)
Call:
glm(formula = attack ~ res_inf * ghq12, family = "poisson", data = asthma)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.63436 0.23408 -2.710 0.00673 **
res_infyes 1.01927 0.32822 3.105 0.00190 **
ghq12 0.06834 0.01186 5.763 8.29e-09 ***
res_infyes:ghq12 -0.03135 0.01531 -2.048 0.04056 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 229.56 on 119 degrees of freedom
Residual deviance: 132.67 on 116 degrees of freedom
AIC: 413.74
Number of Fisher Scoring iterations: 5
# A tibble: 6 × 9
attack res_inf ghq12 .fitted .resid .hat .sigma .cooksd .std.resid
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 6 yes 21 3.11 1.45 0.0147 1.08 0.0136 1.46
2 4 no 17 1.66 1.53 0.0248 1.08 0.0287 1.55
3 8 yes 30 4.88 1.29 0.0346 1.08 0.0248 1.32
4 5 yes 22 3.27 0.886 0.0148 1.08 0.00463 0.893
5 2 yes 27 4.20 -1.20 0.0219 1.08 0.00879 -1.21
6 3 yes 33 5.66 -1.23 0.0571 1.08 0.0268 -1.27
# Present results with IRRs
pois_multi |>
tbl_regression() |>
as_gt() |>
gt::tab_source_note(gt::md("*constant is -0.34*"))| Characteristic | log(IRR) | 95% CI | p-value |
|---|---|---|---|
| res_inf | |||
| no | — | — | |
| yes | 0.43 | 0.13, 0.73 | 0.005 |
| ghq12 | 0.05 | 0.03, 0.07 | <0.001 |
| Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio | |||
| constant is -0.34 | |||
Respiratory Infection: Those with recurrent respiratory infection have a rate ratio of 1.53 for asthmatic attacks (95% CI: 1.14, 2.08)
GHQ12 Score: Each one-point increase in GHQ12 score increases the rate of attacks by 5% (RR = 1.05, 95% CI: 1.04, 1.07)
# Load the smoking dataset
smoke <- read_csv(here("data", "smoke.csv"))
# Explore data structure
glimpse(smoke)Rows: 63
Columns: 4
$ smoke_yrs <chr> "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-4…
$ cigar_day <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.2, 5.2, 5.2,…
$ person_yrs <dbl> 10366, 8162, 5969, 4496, 3512, 2201, 1421, 1121, 826, 3121,…
$ case <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 2, 0, 3, 0, 0, 1,…
# A tibble: 10 × 5
smoke_yrs cigar_day person_yrs case rate
<chr> <dbl> <dbl> <dbl> <dbl>
1 15-19 0 10366 1 0.0001
2 20-24 0 8162 0 0
3 25-29 0 5969 0 0
4 30-34 0 4496 0 0
5 35-39 0 3512 0 0
6 40-44 0 2201 0 0
7 45-49 0 1421 0 0
8 50-54 0 1121 0 0
9 55-59 0 826 2 0.0024
10 15-19 5.2 3121 0 0
pois_cigar <- glm(case ~ cigar_day, data = smoke, family = "poisson",
offset = log(person_yrs))
summary(pois_cigar)
Call:
glm(formula = case ~ cigar_day, family = "poisson", data = smoke,
offset = log(person_yrs))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.159268 0.175063 -46.61 <2e-16 ***
cigar_day 0.070359 0.006468 10.88 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 445.10 on 62 degrees of freedom
Residual deviance: 324.71 on 61 degrees of freedom
AIC: 448.55
Number of Fisher Scoring iterations: 6
pois_smoke_yrs <- glm(case ~ smoke_yrs, data = smoke, family = "poisson",
offset = log(person_yrs))
summary(pois_smoke_yrs)
Call:
glm(formula = case ~ smoke_yrs, family = "poisson", data = smoke,
offset = log(person_yrs))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.335 1.000 -10.335 < 2e-16 ***
smoke_yrs20-24 1.117 1.155 0.968 0.333149
smoke_yrs25-29 1.990 1.080 1.842 0.065423 .
smoke_yrs30-34 3.563 1.020 3.493 0.000477 ***
smoke_yrs35-39 3.615 1.024 3.532 0.000412 ***
smoke_yrs40-44 4.580 1.013 4.521 6.15e-06 ***
smoke_yrs45-49 4.785 1.016 4.707 2.52e-06 ***
smoke_yrs50-54 5.085 1.020 4.987 6.14e-07 ***
smoke_yrs55-59 5.384 1.024 5.261 1.44e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 445.10 on 62 degrees of freedom
Residual deviance: 170.17 on 54 degrees of freedom
AIC: 308.01
Number of Fisher Scoring iterations: 5
pois_rate_multi <- glm(case ~ cigar_day + smoke_yrs, data = smoke,
family = "poisson", offset = log(person_yrs))
summary(pois_rate_multi)
Call:
glm(formula = case ~ cigar_day + smoke_yrs, family = "poisson",
data = smoke, offset = log(person_yrs))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -11.317162 1.007492 -11.233 < 2e-16 ***
cigar_day 0.064361 0.006401 10.054 < 2e-16 ***
smoke_yrs20-24 0.962417 1.154694 0.833 0.40457
smoke_yrs25-29 1.710894 1.080420 1.584 0.11330
smoke_yrs30-34 3.207676 1.020378 3.144 0.00167 **
smoke_yrs35-39 3.242776 1.024187 3.166 0.00154 **
smoke_yrs40-44 4.208361 1.013726 4.151 3.30e-05 ***
smoke_yrs45-49 4.448972 1.017054 4.374 1.22e-05 ***
smoke_yrs50-54 4.893674 1.019945 4.798 1.60e-06 ***
smoke_yrs55-59 5.372600 1.023404 5.250 1.52e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 445.099 on 62 degrees of freedom
Residual deviance: 68.015 on 53 degrees of freedom
AIC: 207.86
Number of Fisher Scoring iterations: 5
pois_rate_interaction <- glm(case ~ cigar_day * smoke_yrs, data = smoke,
family = "poisson", offset = log(person_yrs))
summary(pois_rate_interaction)
Call:
glm(formula = case ~ cigar_day * smoke_yrs, family = "poisson",
data = smoke, offset = log(person_yrs))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.2463 1.0000 -9.246 < 2e-16 ***
cigar_day -2.8137 319.6351 -0.009 0.992976
smoke_yrs20-24 -0.7514 1.5114 -0.497 0.619084
smoke_yrs25-29 -0.2539 1.3518 -0.188 0.851030
smoke_yrs30-34 1.2493 1.1032 1.132 0.257489
smoke_yrs35-39 0.5856 1.1660 0.502 0.615498
smoke_yrs40-44 1.9706 1.0790 1.826 0.067800 .
smoke_yrs45-49 2.1070 1.0982 1.919 0.055044 .
smoke_yrs50-54 3.0710 1.0761 2.854 0.004318 **
smoke_yrs55-59 3.5704 1.0706 3.335 0.000853 ***
cigar_day:smoke_yrs20-24 2.8609 319.6351 0.009 0.992859
cigar_day:smoke_yrs25-29 2.8736 319.6351 0.009 0.992827
cigar_day:smoke_yrs30-34 2.8736 319.6351 0.009 0.992827
cigar_day:smoke_yrs35-39 2.8997 319.6351 0.009 0.992762
cigar_day:smoke_yrs40-44 2.8845 319.6351 0.009 0.992800
cigar_day:smoke_yrs45-49 2.8886 319.6351 0.009 0.992790
cigar_day:smoke_yrs50-54 2.8669 319.6351 0.009 0.992844
cigar_day:smoke_yrs55-59 2.8641 319.6351 0.009 0.992851
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 445.099 on 62 degrees of freedom
Residual deviance: 60.597 on 45 degrees of freedom
AIC: 216.44
Number of Fisher Scoring iterations: 18
pois_rate_augmented <- augment(pois_rate_multi, type.predict = "response")
head(pois_rate_augmented)# A tibble: 6 × 10
case cigar_day smoke_yrs `(offset)` .fitted .resid .hat .sigma .cooksd
<dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 15-19 9.25 0.126 1.55 0.128 1.12 0.102
2 0 0 20-24 9.01 0.260 -0.721 0.0918 1.14 0.00289
3 0 0 25-29 8.69 0.402 -0.896 0.0766 1.14 0.00361
4 0 0 30-34 8.41 1.35 -1.64 0.0897 1.12 0.0146
5 0 0 35-39 8.16 1.09 -1.48 0.0816 1.12 0.0106
6 0 0 40-44 7.70 1.80 -1.90 0.0959 1.11 0.0211
# ℹ 1 more variable: .std.resid <dbl>
| Characteristic | IRR | 95% CI | p-value |
|---|---|---|---|
| cigar_day | 1.07 | 1.05, 1.08 | <0.001 |
| smoke_yrs | |||
| 15-19 | — | — | |
| 20-24 | 2.62 | 0.34, 52.9 | 0.4 |
| 25-29 | 5.53 | 0.94, 105 | 0.11 |
| 30-34 | 24.7 | 5.23, 442 | 0.002 |
| 35-39 | 25.6 | 5.35, 459 | 0.002 |
| 40-44 | 67.2 | 14.6, 1,195 | <0.001 |
| 45-49 | 85.5 | 18.3, 1,525 | <0.001 |
| 50-54 | 133 | 28.3, 2,385 | <0.001 |
| 55-59 | 215 | 45.1, 3,862 | <0.001 |
| Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio | |||
Resources:
Poisson Regression Workshop