impactPO {rms} | R Documentation |
Impact of Proportional Odds Assumpton
Description
Checks the impact of the proportional odds assumption by comparing predicted cell probabilities from a PO model with those from a multinomial or partial proportional odds logistic model that relax assumptions. For a given model formula, fits the model with both lrm
and either nnet::multinom
or VGAM::vglm
or both, and obtains predicted cell probabilities for the PO and relaxed models on the newdata
data frame. A print
method formats the output.
Usage
impactPO(
formula,
relax = if (missing(nonpo)) "multinomial" else "both",
nonpo,
newdata,
data = environment(formula),
minfreq = 15,
B = 0,
...
)
Arguments
formula |
a model formula. To work properly with |
relax |
defaults to |
nonpo |
a formula with no left hand side variable, specifying the variable or variables for which PO is not assumed. Specifying |
newdata |
a data frame or data table with one row per covariate setting for which predictions are to be made |
data |
data frame containing variables to fit; default is the frame in which |
minfreq |
minimum sample size to allow for the least frequent category of the dependent variable. If the observed minimum frequency is less than this, the |
B |
number of bootstrap resamples to do to get confidence intervals for differences in predicted probabilities for relaxed methods vs. PO model fits. Default is not to run the bootstrap. When running the bootstrap make sure that all model variables are explicitly in |
... |
other parameters to pass to |
Details
Since partial proportional odds models and especially multinomial logistic models can have many parameters, it is not feasible to use this model comparison approach when the number of levels of the dependent variable Y is large. By default, the function will use combine.levels()
to combine consecutive levels if the lowest frequency category of Y has fewer than minfreq
observations.
Value
an impactPO
object which is a list with elements estimates
, stats
, mad
, newdata
, nboot
, and boot
. estimates
is a data frame containing the variables and values in newdata
in a tall and thin format with additional variable method
("PO", "Multinomial", "PPO"), y
(current level of the dependent variable), and Probability
(predicted cell probability for covariate values and value of y
in the current row). stats
is a data frame containing Deviance
the model deviance, d.f.
the total number of parameters counting intercepts, AIC
, p
the number of regression coefficients, LR chi^2
the likelihood ratio chi-square statistic for testing the predictors, LR - p
a chance-corrected LR chi-square, LR chi^2 test for PO
the likelihood ratio chi-square test statistic for testing the PO assumption (by comparing -2 log likelihood for a relaxed model to that of a fully PO model), d.f.
the degrees of freedom for this test, Pr(>chi^2)
the P-value for this test, MCS R2
the Maddala-Cox-Snell R2 using the actual sample size, MCS R2 adj
(MCS R2
adjusted for estimating p
regression coefficients by subtracting p
from LR
), McFadden R2
, McFadden R2 adj
(an AIC-like adjustment proposed by McFadden without full justification), Mean |difference} from PO
the overall mean absolute difference between predicted probabilities over all categories of Y and over all covariate settings. mad
contains newdata
and separately by rows in newdata
the mean absolute difference (over Y categories) between estimated probabilities by the indicated relaxed model and those from the PO model. nboot
is the number of successful bootstrap repetitions, and boot
is a 4-way array with dimensions represented by the nboot
resamples, the number of rows in newdata
, the number of outcome levels, and elements for PPO
and multinomial
. For the modifications of the Maddala-Cox-Snell indexes see Hmisc::R2Measures
.
Author(s)
Frank Harrell fh@fharrell.com
References
See Also
nnet::multinom()
, VGAM::vglm()
, lrm()
, Hmisc::propsPO()
, Hmisc::R2Measures()
, Hmisc::combine.levels()
Examples
## Not run:
set.seed(1)
age <- rnorm(500, 50, 10)
sex <- sample(c('female', 'male'), 500, TRUE)
y <- sample(0:4, 500, TRUE)
d <- expand.grid(age=50, sex=c('female', 'male'))
w <- impactPO(y ~ age + sex, nonpo = ~ sex, newdata=d)
w
# Note that PO model is a better model than multinomial (lower AIC)
# since multinomial model's improvement in fit is low in comparison
# with number of additional parameters estimated. Same for PO model
# in comparison with partial PO model.
# Reverse levels of y so stacked bars have higher y located higher
revo <- function(z) {
z <- as.factor(z)
factor(z, levels=rev(levels(as.factor(z))))
}
require(ggplot2)
ggplot(w$estimates, aes(x=method, y=Probability, fill=revo(y))) +
facet_wrap(~ sex) + geom_col() +
xlab('') + guides(fill=guide_legend(title=''))
# Now vary 2 predictors
d <- expand.grid(sex=c('female', 'male'), age=c(40, 60))
w <- impactPO(y ~ age + sex, nonpo = ~ sex, newdata=d)
w
ggplot(w$estimates, aes(x=method, y=Probability, fill=revo(y))) +
facet_grid(age ~ sex) + geom_col() +
xlab('') + guides(fill=guide_legend(title=''))
## End(Not run)