glm_grouped {parsnip}    R Documentation
Fit a grouped binomial outcome from a data set with case weights
Description
stats::glm() treats the case weights in a tabular data set as indicating that "different observations have different dispersions" (see ?glm).
In some cases, the case weights reflect that the same covariate pattern was observed multiple times (i.e., frequency weights). In this case, stats::glm() expects the data to be formatted as the number of events for each factor level so that the outcome can be given to the formula as cbind(events_1, events_2).
glm_grouped() converts data with integer case weights to the expected "number of events" format for binomial data.
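To make the "number of events" format concrete, here is a minimal base-R sketch of that kind of conversion, assuming a two-level factor outcome and whole-number frequency weights. group_binomial() is a hypothetical helper written only for illustration; it is not how parsnip implements glm_grouped(). The last lines fit the grouped binomial model and the frequency-weighted Bernoulli model on UCBAdmissions, which should give the same coefficients.

```r
# Hypothetical helper, NOT parsnip's implementation: collapse rows that carry
# integer frequency weights into one row per covariate pattern, with one count
# column per level of a two-level factor outcome.
group_binomial <- function(data, outcome, predictors, weights) {
  lvls <- levels(data[[outcome]])
  if (length(lvls) != 2) stop("`outcome` must be a two-level factor.")
  data$.freq <- as.integer(weights)
  # Total weight for every combination of predictor values and outcome level
  totals <- stats::aggregate(
    data[".freq"],
    by = data[c(predictors, outcome)],
    FUN = sum
  )
  # One table of counts per outcome level, with the count column named after it
  per_level <- lapply(lvls, function(lvl) {
    counts <- totals[totals[[outcome]] == lvl, c(predictors, ".freq")]
    names(counts)[names(counts) == ".freq"] <- lvl
    counts
  })
  # Join on the predictors; a pattern never observed with a level gets 0
  grouped <- merge(per_level[[1]], per_level[[2]], by = predictors, all = TRUE)
  grouped[lvls][is.na(grouped[lvls])] <- 0L
  grouped
}

ucb <- as.data.frame(UCBAdmissions)
grouped <- group_binomial(ucb, "Admit", c("Gender", "Dept"), ucb$Freq)

# glm() treats the second factor level ("Rejected") as the event, so that
# count goes first in cbind() to match the weighted fit
fit_grouped <- glm(cbind(Rejected, Admitted) ~ Gender + Dept,
                   data = grouped, family = binomial)
fit_weighted <- glm(Admit ~ Gender + Dept, data = ucb,
                    family = binomial, weights = Freq)
```

The sketch uses a join in base R to stay dependency-free; the examples below perform the same reshaping with tidyr::pivot_wider().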
Usage
glm_grouped(formula, data, weights, ...)
Arguments
formula    A formula object with one outcome that is a two-level factor.
data       A data frame with the outcomes and predictors (but not case weights).
weights    An integer vector of weights whose length is the same as the number of rows in data.
...        Options to pass to stats::glm().
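The weights argument is expected to be an integer vector with one element per row of the data. A minimal sketch of checking that contract before fitting is shown below; check_freq_weights() is a hypothetical helper, not a parsnip function, and it only tests the two properties stated for weights (integer type and matching length).

```r
# Hypothetical helper, not part of parsnip: verify that a weight vector
# matches the documented requirements for `weights`.
check_freq_weights <- function(weights, data) {
  if (length(weights) != nrow(data)) {
    stop("`weights` must have one element per row of `data`.", call. = FALSE)
  }
  if (!is.integer(weights)) {
    stop("`weights` must be an integer vector; try as.integer().", call. = FALSE)
  }
  invisible(weights)
}

ucb <- as.data.frame(UCBAdmissions)
# Freq may be stored as a double, so convert it before checking
w <- check_freq_weights(as.integer(ucb$Freq), ucb)
```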
Value
An object produced by stats::glm().
Examples
#----------------------------------------------------------------------------
# The same data set formatted three ways
# First, with basic case weights, which (based on ?glm) glm() uses inappropriately.
ucb_weighted <- as.data.frame(UCBAdmissions)
ucb_weighted$Freq <- as.integer(ucb_weighted$Freq)
head(ucb_weighted)
nrow(ucb_weighted)
# Format when yes/no data are in individual rows (probably still inappropriate)
library(tidyr)
ucb_long <- uncount(ucb_weighted, Freq)
head(ucb_long)
nrow(ucb_long)
# Format where the outcome is given as the number of events for each level
ucb_events <-
  ucb_weighted %>%
  tidyr::pivot_wider(
    id_cols = c(Gender, Dept),
    names_from = Admit,
    values_from = Freq,
    values_fill = 0L
  )
head(ucb_events)
nrow(ucb_events)
#----------------------------------------------------------------------------
# Different model fits
# Treat data as separate Bernoulli data:
glm(Admit ~ Gender + Dept, data = ucb_long, family = binomial)
# Weights produce the same statistics
glm(
  Admit ~ Gender + Dept,
  data = ucb_weighted,
  family = binomial,
  weights = ucb_weighted$Freq
)
# Data in binomial "x events out of n trials" format. To get the same
# coefficients, the levels must be given to cbind() in reverse order (the
# second level, "Rejected", first).
glm(
  cbind(Rejected, Admitted) ~ Gender + Dept,
  data = ucb_events,
  family = binomial
)
# The new function starts from frequency weights and produces the correct
# grouped binomial fit:
glm_grouped(
  Admit ~ Gender + Dept,
  data = ucb_weighted,
  weights = ucb_weighted$Freq
)