ordCV {ordPens} | R Documentation |
Cross-validation for penalized regression with ordinal predictors.
Description
Performs k-fold cross-validation to evaluate the performance of a penalized regression model with ordinal predictors and/or to select an optimal smoothing parameter.
Usage
ordCV(x, y, u = NULL, z = NULL, k = 5, lambda, offset = rep(0, length(y)),
      model = c("linear", "logit", "poisson", "cumulative"),
      type = c("selection", "fusion"), ...)
Arguments
x |
matrix of integers 1,2,... giving the observed levels of the ordinal factor(s). |
y |
the vector of response values. |
u |
a matrix (or |
z |
a matrix (or |
k |
number of folds. |
lambda |
vector of penalty parameters (in decreasing order). |
offset |
vector of offset values. |
model |
the model which is to be fitted. Possible choices are "linear" (default), "logit", "poisson" or "cumulative". See details below. |
type |
penalty to be applied. If "selection", a group lasso penalty for smoothing and selection is used. If "fusion", a fused lasso penalty for fusion and selection is used. |
... |
additional arguments to |
Details
The method assumes that categorical covariates (contained in x and u) take values 1,2,...,max, where max denotes the (columnwise) highest
level observed in the data. If any level between 1 and max is not observed for an ordinal predictor,
a corresponding (dummy) coefficient is fitted anyway. If any level > max is
not observed but possible in principle, and a corresponding coefficient is to
be fitted, the easiest way is to add a corresponding row to x (and u, z) with the corresponding y value set to NA.
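The NA-row device described above can be sketched in base R. This is only an illustration on toy data: the object names (x_ext, y_ext) are made up here, and the value placed in the second column of the added row is an arbitrary but valid level, chosen on the assumption that the extra row serves only to register the unobserved level.

```r
# Toy data: two ordinal predictors with observed levels 1..3 and 1..4
x <- cbind(x1 = c(1, 2, 3, 2, 1),
           x2 = c(1, 4, 2, 3, 4))
y <- c(0.5, 1.2, 2.0, 1.1, 0.3)

# Suppose x1 could in principle reach level 4, which was never observed.
# Append one row carrying level 4 for x1 and give it a missing response.
x_ext <- rbind(x, c(4, 1))   # second entry: any valid level of x2 (assumption)
y_ext <- c(y, NA)

# Columnwise highest level now includes the added level for x1
apply(x_ext, 2, max)
```

The extended objects x_ext and y_ext would then be passed on in place of x and y.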
If a linear regression model is fitted, the response vector y may contain
any numeric values; if a logit model is fitted, y has to be 0/1 coded;
if a Poisson model is fitted, y has to contain count data. If a cumulative logit model is fitted, y
takes values 1,2,...,max.
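The coding rules for y can be checked before fitting. The helper below, check_response, is hypothetical and not part of ordPens; it is a minimal sketch of the four rules as stated here, and it ignores NA responses on the assumption that such rows are permitted.

```r
# Hypothetical helper (not part of ordPens): checks the response coding
# required for each model type.
check_response <- function(y, model = c("linear", "logit", "poisson", "cumulative")) {
  model <- match.arg(model)
  y <- y[!is.na(y)]                     # rows with NA response are skipped
  is_whole <- all(y == round(y))        # TRUE if all values are whole numbers
  switch(model,
    linear     = is.numeric(y),             # any numeric values
    logit      = all(y %in% c(0, 1)),       # 0/1 coding
    poisson    = is_whole && all(y >= 0),   # non-negative counts
    cumulative = is_whole && all(y >= 1)    # categories 1, 2, ..., max
  )
}

check_response(c(0, 1, 1, NA), model = "logit")
check_response(c(0, 1, 2), model = "cumulative")
```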
For the cumulative model, the measure of performance used by the function is the Brier score, i.e., the sum of squared differences between (indicator) outcomes and predicted probabilities P(Y_i=r)=P(y_{ir})=\pi_{ir}, with observations i=1,...,n and classes r=1,...,c. Otherwise, the deviance is used.
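To make the Brier score concrete, the following sketch computes the sum of squared differences between indicator outcomes and predicted class probabilities for a toy example. The function brier_score is a hypothetical illustration written for this page, not the internal code of ordCV, and the probabilities are invented.

```r
# Hypothetical helper: Brier score as the sum over observations i and
# classes r of (indicator(y_i == r) - pi_ir)^2.
brier_score <- function(y, prob) {
  # Indicator matrix: entry (i, r) is 1 if y[i] == r and 0 otherwise
  ind <- outer(y, seq_len(ncol(prob)), "==") * 1
  sum((ind - prob)^2)
}

# Three observations, three classes; each row of prob sums to 1
y <- c(1, 3, 2)
prob <- rbind(c(0.7, 0.2, 0.1),
              c(0.1, 0.3, 0.6),
              c(0.2, 0.5, 0.3))

brier_score(y, prob)
```

A perfect prediction (probability 1 on the observed class) gives a score of 0; larger values indicate worse fit.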
Value
Returns a list containing the following components:
Train |
matrix of size ( |
Test |
matrix of Brier or deviance scores computed on the test data. |
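One way the returned scores might be used to pick a penalty parameter is sketched below. The matrix here is a made-up stand-in for the Test component, and its layout (one row per fold, one column per value of lambda) is an assumption of this sketch; check the dimensions of the actual output before relying on it.

```r
# Candidate penalty parameters, in decreasing order
lambda <- c(100, 10, 1, 0.1)

# Stand-in for the Test component (invented numbers).
# ASSUMED layout: folds in rows, lambda values in columns.
Test <- rbind(c(5.0, 3.2, 2.9, 3.5),
              c(5.4, 3.0, 3.1, 3.8),
              c(4.9, 3.4, 2.8, 3.6))

# Average the test score over folds and take the lambda with the smallest mean
mean_test   <- colMeans(Test)
best_lambda <- lambda[which.min(mean_test)]
best_lambda
```

Since both the Brier score and the deviance are loss measures, smaller averages indicate better out-of-sample performance.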
Author(s)
Aisouda Hoshiyar
References
Hoshiyar, A., Gertheiss, L.H., and Gertheiss, J. (2023). Regularization and Model Selection for Item-on-Items Regression with Applications to Food Products' Survey Data. Preprint, available from https://arxiv.org/abs/2309.16373.