serp {serp} | R Documentation |
Smooth Effects on Response Penalty for CLM
Description
Fits cumulative link models (CLMs) with the smooth-effect-on-response penalty (SERP) via a modified Newton-Raphson algorithm. SERP enables the regularization of the parameter space between the general and the restricted cumulative models, with a resultant shrinkage of all subject-specific effects to global effects. The Akaike information criterion (aic), K-fold cross-validation (cv), and other tuning approaches provide the means of arriving at an optimal tuning parameter when a user-supplied tuning value is not available.
The slope argument allows for the selection of a penalized, unparallel, parallel, or partial slope.
Usage
serp(
formula,
link = c("logit", "probit", "loglog", "cloglog", "cauchit"),
slope = c("penalize", "parallel", "unparallel", "partial"),
tuneMethod = c("aic", "cv", "finite", "user"),
reverse = FALSE,
lambdaGrid = NULL,
cvMetric = c("brier", "logloss", "misclass"),
gridType = c("discrete", "fine"),
globalEff = NULL,
data,
subset,
weights = NULL,
weight.type = c("analytic", "frequency"),
na.action = NULL,
lambda = NULL,
contrasts = NULL,
control = list(),
...)
Arguments
formula |
regression formula of the form: response ~ predictors. The response should be a factor (ordered). |
link |
sets the link function for the cumulative link model, one of: logit, probit, log-log (loglog), complementary log-log (cloglog), or cauchit. |
slope |
selects the form of coefficients used in the model, with
|
tuneMethod |
sets the method of choosing an optimal shrinkage
parameter, including: |
reverse |
FALSE by default; when TRUE, the sign of the linear predictor is reversed. |
lambdaGrid |
optional user-supplied lambda grid for the |
cvMetric |
sets the performance metric for the cv tuning, with the brier score used by default. |
gridType |
chooses whether a discrete or a continuous lambda grid should be
used to select the optimal tuning parameter. The former is used by default
and could be adjusted as desired in |
globalEff |
specifies variable(s) to be assigned global effects during
penalization or when |
data |
optional data frame containing the variables used in the formula. |
subset |
specifies which subset of the rows of the data should be used for fit. All observations are used by default. |
weights |
optional case weights in fitting. Negative weights are not allowed. Defaults to 1. |
weight.type |
chooses between analytic and frequency weights with the former used by default. The latter should be used when weights are mere case counts used to compress the data set. |
na.action |
a function to filter missing data. |
lambda |
a user-supplied single numeric value for the tuning parameter
when using the |
contrasts |
a list of contrasts to be used for some or all of the factors appearing as variables in the model formula. |
control |
A list of fit control parameters to replace default values
returned by |
... |
additional arguments. |
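As an illustration of the weights and weight.type arguments described above, the base R sketch below compresses a small data set into one row per distinct covariate/response pattern plus a count column. The data frame raw and the column name n are made up for the example, and the commented serp() call at the end is an assumption about how such counts would be supplied, not a tested recipe.

```r
# Hypothetical raw data: one row per observation.
raw <- data.frame(
  rating  = factor(c("low", "low", "mid", "mid", "mid", "high"),
                   levels = c("low", "mid", "high"), ordered = TRUE),
  contact = factor(c("no", "no", "no", "yes", "yes", "yes"))
)

# Cross-tabulate all columns and keep the observed patterns with their counts.
counts <- as.data.frame(table(raw), responseName = "n")
counts <- counts[counts$n > 0, ]

# as.data.frame() on a table drops the ordering, so restore the ordered response.
counts$rating <- factor(counts$rating, levels = levels(raw$rating), ordered = TRUE)

counts

# Assumed usage (not run): counts act as frequency weights.
# serp(rating ~ contact, data = counts, weights = n, weight.type = "frequency")
```

The counts sum to the original number of rows, which is the sense in which frequency weights only compress the data.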
Details
The serp function fits the cumulative link model (CLM) with smooth-effect-on-response penalty (SERP). The cumulative model developed by McCullagh (1980) is probably the most frequently used ordinal model. When motivated by an underlying latent variable, a simple form of the model is expressed as follows:
P(Y\leq r|x) = F(\delta_{0r} + x^T\delta)
where x
is a vector of covariates, \delta
a vector
of regression parameters and F
a continuous distribution
function. This model assumes that the effect of x
does not
depend on the category. However, with this assumption relaxed,
one obtains the following general cumulative model:
P(Y\leq r|x) = F(\delta_{0r} + x^T\delta_{r}),
where r = 1,...,k-1. This model, however, has the stochastic ordering property, which implies that P(Y\leq r-1|x) < P(Y\leq r|x) holds for all x and all categories r. Such an assumption is often problematic, resulting in unstable likelihoods with an ill-conditioned parameter space during the iterative procedure.
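To make the ordering requirement concrete, the base R sketch below evaluates both models under a logit link. All thresholds and slopes are invented for illustration, and cat_probs is a hypothetical helper; none of this is output from serp.

```r
# Illustrative values only; not estimates from any fitted model.
theta <- c(-1, 0, 1.5)   # thresholds delta_{0r}, r = 1, ..., k-1 (here k = 4)
x     <- 2               # a single covariate value

# Restricted (parallel) model: one slope shared by all categories.
delta <- 0.8
cum_parallel <- plogis(theta + x * delta)

# General model: a separate slope delta_r for each cumulative split.
delta_r <- c(1.2, 0.1, -0.9)
cum_general <- plogis(theta + x * delta_r)

# Category probabilities are successive differences of the cumulative ones.
cat_probs <- function(cum) diff(c(0, cum, 1))

cat_probs(cum_parallel)  # all entries non-negative
cat_probs(cum_general)   # some entries negative: the ordering is violated at this x
```

With category-specific slopes, the cumulative probabilities need not increase in r for every x, which is the source of the instability described above.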
SERP offers a means of arriving at stable estimates of the general model. It provides a form of regularization that is based on maximizing the penalized log-likelihood:
l_{p}(\delta)=l(\delta)-J_{\lambda}(\delta)
where l(\delta) is the log-likelihood of the general cumulative model and J_{\lambda}(\delta)=\lambda J(\delta) the penalty function weighted by the tuning parameter \lambda. Assuming an ordered categorical outcome Y \in \{1,\dots,k\}, and considering that the corresponding parameters \delta_{1j},\dots,\delta_{k-1,j} vary smoothly over the categories, the following penalty (Tutz and Gertheiss, 2016),
J_{\lambda}(\delta)= \lambda \sum_{j=1}^{p} \sum_{r=1}^{k-2} (\delta_{r+1,j}-\delta_{rj})^{2}
enables the smoothing of response categories such that all category-specific effects associated with the response shrink towards a common global effect. SERP can also be applied to a semi-parallel model with only the category-specific part of the model penalized. See Ugba (2021) and Ugba et al. (2021) for further details and applications in empirical studies.
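The penalty can be sketched in a few lines of base R. Here the slopes are arranged as a (k-1) x p matrix with one row per cumulative split; the function names serp_penalty and penalized_loglik and all numeric values are hypothetical and are not part of the serp package.

```r
# delta: (k-1) x p matrix; row r holds the slopes for the r-th cumulative split.
serp_penalty <- function(delta) {
  # diff() on a matrix takes differences between successive rows,
  # i.e. delta_{r+1,j} - delta_{rj} for every column j.
  sum(diff(delta)^2)
}

# Penalized log-likelihood l_p = l - lambda * J for a given log-likelihood value.
penalized_loglik <- function(loglik, delta, lambda) {
  loglik - lambda * serp_penalty(delta)
}

# Illustrative slopes for k = 4 categories and p = 2 predictors.
delta_general  <- matrix(c(1.2, 0.9, 0.4,
                           -0.5, -0.7, -0.6), nrow = 3)
delta_parallel <- matrix(rep(c(0.8, -0.6), each = 3), nrow = 3)

serp_penalty(delta_general)    # positive: slopes differ across categories
serp_penalty(delta_parallel)   # zero: identical slopes incur no penalty
penalized_loglik(-100, delta_general, lambda = 10)
```

Because the penalty vanishes only when every column is constant, increasing lambda pushes the fit towards the parallel model.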
Value
An object of class serp with the components listed below, depending on the type of slope modeled. Available methods include: summary, coef, predict, vcov, anova, etc.
aic |
the Akaike information criterion, with effective degrees of freedom obtained from the trace of the generalized hat matrix depending on the tuning parameter. |
bic |
the Bayesian information criterion, with effective degrees of freedom obtained from the trace of the generalized hat matrix depending on the tuning parameter. |
call |
the matched call. |
coef |
a vector of coefficients of the fitted model. |
converged |
a character vector of fit convergence status. |
contrasts |
(where relevant) the contrasts used in the model. |
control |
list of control parameters from |
cvMetric |
the performance metric used for cv tuning. |
deviance |
the residual deviance. |
edf |
the (effective) number of degrees of freedom used by the model |
fitted.values |
the fitted probabilities. |
globalEff |
variable(s) in model treated as global effect(s) |
gradient |
a column vector of gradients for the coefficients at the model convergence. |
Hessian |
the Hessian matrix for the coefficients at the model convergence. |
iter |
number of iterations before convergence or non-convergence. |
lambda |
a user-supplied single numeric value for the |
lambdaGrid |
a numeric vector of lambda values used to determine the optimum tuning parameter. |
logLik |
the realized log-likelihood at the model convergence. |
link |
character vector indicating the link function of the fit. |
message |
character vector stating the type of convergence obtained |
misc |
a list to hold miscellaneous fit information. |
model |
model.frame having variables from formula. |
na.action |
(where relevant) information on the treatment of NAs. |
nobs |
the number of observations. |
nrFold |
the number of folds used in k-fold cross-validation for the cv tuning method. Defaults to k = 5. |
rdf |
the residual degrees of freedom |
reverse |
a logical vector indicating the direction of the cumulative probabilities. Defaults to P(Y<=r). |
slope |
a character vector indicating the type of slope parameters
fitted. Default to |
Terms |
the terms structure describing the model. |
testError |
numeric value of the cross-validated test error at which the optimal tuning parameter emerged. |
tuneMethod |
a character vector specifying the method for choosing an optimal shrinkage parameter. |
value |
numeric value of AIC or logLik obtained at the optimal tuning
parameter when using |
ylev |
the number of the response levels. |
References
Ugba, E. R. (2021). serp: An R package for smoothing in ordinal regression. Journal of Open Source Software, 6(66), 3705. https://doi.org/10.21105/joss.03705
Ugba, E. R., Mörlein, D. and Gertheiss, J. (2021). Smoothing in Ordinal Regression: An Application to Sensory Data. Stats, 4, 616–633. https://doi.org/10.3390/stats4030037
Tutz, G. and Gertheiss, J. (2016). Regularized Regression for Categorical Data (With Discussion and Rejoinder). Statistical Modelling, 16, pp. 161-260. https://doi.org/10.1177/1471082X16642560
McCullagh, P. (1980). Regression Models for Ordinal Data. Journal of the Royal Statistical Society. Series B (Methodological), 42, pp. 109-142. https://doi.org/10.1111/j.2517-6161.1980.tb01109.x
See Also
anova.serp, summary.serp, predict.serp, confint.serp, vcov.serp
Examples
library(serp)
## The unpenalized non-proportional odds model returns unbounded estimates and
## is therefore not fully identifiable.
f1 <- serp(rating ~ temp + contact, slope = "unparallel",
           reverse = TRUE, link = "logit", data = wine)
coef(f1)
## The penalized non-proportional odds model with a user-supplied lambda gives
## a fully identified model with bounded estimates. A suitable tuning criterion
## (e.g., aic or cv) could be used to select lambda instead.
f2 <- serp(rating ~ temp + contact, slope = "penalize",
           link = "logit", reverse = TRUE, tuneMethod = "user",
           lambda = 1e1, data = wine)
coef(f2)
## A penalized partial proportional odds model with some variables set to
## global effects is also possible.
f3 <- serp(rating ~ temp + contact, slope = "penalize",
           reverse = TRUE, link = "logit", tuneMethod = "user",
           lambda = 2e1, globalEff = ~ temp, data = wine)
coef(f3)
## The unpenalized proportional odds model, with constrained estimates, can
## also be fit. Under extreme shrinkage, the estimates in f2 equal those in
## this model.
f4 <- serp(rating ~ temp + contact, slope = "parallel",
           reverse = FALSE, link = "logit", data = wine)
summary(f4)
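As noted in the comments above, lambda can also be chosen by a tuning criterion rather than supplied by the user. The sketch below reuses the same formula and data with tuneMethod = "aic"; it only uses arguments listed under Usage, and the guard on requireNamespace is an addition so that the snippet is skipped when the serp package is not installed. The object name f5 is arbitrary.

```r
## Let the AIC select the tuning parameter instead of supplying lambda.
if (requireNamespace("serp", quietly = TRUE)) {
  library(serp)
  f5 <- serp(rating ~ temp + contact, slope = "penalize",
             link = "logit", reverse = TRUE, tuneMethod = "aic",
             data = wine)
  coef(f5)
  summary(f5)
}
```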