ordFusion {ordPens} | R Documentation |
Fusion and selection of dummy coefficients of ordinal predictors
Description
Fits dummy coefficients of ordinally scaled independent variables
with a fused lasso penalty on differences of adjacent dummy coefficients. The ordinalNet
algorithm is used if a cumulative logit model is fitted; otherwise, the glmpath
algorithm is used.
Usage
ordFusion(x, y, u = NULL, z = NULL, offset = rep(0,length(y)), lambda,
model = c("linear", "logit", "poisson", "cumulative"),
restriction = c("refcat", "effect"), scalex = TRUE, nonpenx = NULL,
frac.arclength = NULL, ...)
Arguments
x |
the matrix of ordinal predictors, with each column corresponding to one predictor and containing numeric values from {1,2,...}; for each covariate, category 1 is taken as reference category with zero dummy coefficient. |
y |
the response vector. |
u |
a matrix (or |
z |
a matrix (or |
offset |
vector of offset values. |
lambda |
vector of penalty parameters, i.e., lambda values. |
model |
the model which is to be fitted. Possible choices are "linear" (default), "logit", "poisson" or "cumulative". See details below. |
restriction |
identifiability restriction for dummy coding. "refcat" takes category 1 as reference category (default), while with "effect" dummy coefficients sum up to 0 (known as effect coding). |
scalex |
logical. Should (split-coded) design matrix corresponding to
|
nonpenx |
vectors of indices indicating columns of
|
frac.arclength |
just in case the corresponding |
... |
additional arguments to |
Details
The method assumes that categorical covariates (contained in x and u) take values 1,2,...,max, where max denotes the (columnwise) highest
level observed in the data. If any level between 1 and max is not observed for an ordinal predictor,
a corresponding (dummy) coefficient is fitted anyway (by linear interpolation, due to some additional but small quadratic penalty, see glmpath for details). If any level > max is
not observed but possible in principle, and a corresponding coefficient is to
be fitted, the easiest way is to add a corresponding row to x (and u, z) with corresponding y value being NA.
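The placeholder-row approach described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, the names x_aug and y_aug are made up for this sketch, and the value 1 entered for x2 in the extra row is an arbitrary valid level.

```r
set.seed(1)
# x1 is measured on a 1..5 scale, but level 5 is never drawn here
x1 <- sample(1:4, 50, replace = TRUE)
x2 <- sample(1:3, 50, replace = TRUE)
x  <- cbind(x1, x2)
y  <- rnorm(50)

# append one placeholder row: level 5 for x1, an arbitrary valid level for x2,
# and a missing response
x_aug <- rbind(x, c(5, 1))
y_aug <- c(y, NA)

# columnwise highest level, i.e. the 'max' referred to above; x1 now reaches 5
apply(x_aug, 2, max)

# x_aug and y_aug would then be passed on in place of x and y, e.g.
# ordFusion(x = x_aug, y = y_aug, lambda = lambda)
```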
If a linear regression model is fitted, response vector y may contain
any numeric values; if a logit model is fitted, y has to be 0/1 coded;
if a poisson model is fitted, y has to contain count data. If a cumulative logit model is fitted, y
takes values 1,2,...,max.
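The coding requirements above can be checked before fitting. The helper below is hypothetical (check_response is not part of ordPens) and is only a minimal sketch of such a check, ignoring missing responses.

```r
# hypothetical helper, not part of ordPens
check_response <- function(y, model = c("linear", "logit", "poisson", "cumulative")) {
  model <- match.arg(model)
  y <- y[!is.na(y)]                       # missing responses are not checked
  is_whole <- function(v) all(v == round(v))
  switch(model,
    linear     = is.numeric(y),                                  # any numeric values
    logit      = all(y %in% c(0, 1)),                            # 0/1 coded
    poisson    = is.numeric(y) && is_whole(y) && all(y >= 0),    # counts
    cumulative = is.numeric(y) && is_whole(y) && all(y >= 1)     # levels 1,2,...
  )
}

check_response(c(0, 1, 1, 0), "logit")     # TRUE
check_response(c(2, 0, 5), "poisson")      # TRUE
check_response(c(0, 1, 2), "cumulative")   # FALSE, since levels start at 1
```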
If scalex is TRUE, the (split-coded) design matrix constructed from x is scaled to have
unit variance over columns (see the standardize argument of glmpath and/or ordinalNet).
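To illustrate what split coding and columnwise scaling mean, here is a minimal base R sketch. It is not the package's internal code: split_code is a hypothetical helper, and whether glmpath or ordinalNet additionally center the columns is not assumed here. With split coding, the dummy coefficient of level k is the cumulative sum of the split-coded coefficients, so differences of adjacent dummy coefficients are exactly the split-coded coefficients.

```r
# hypothetical helper, not taken from ordPens
split_code <- function(x, K = max(x)) {
  # column k-1 holds the indicator I(x >= k), for k = 2, ..., K
  out <- outer(x, 2:K, ">=") * 1
  colnames(out) <- paste0("ge", 2:K)
  out
}

x1 <- c(1, 2, 3, 4, 2, 4, 1, 3)
S  <- split_code(x1)

# scale each column to unit variance (no centering in this sketch)
S_scaled <- scale(S, center = FALSE, scale = apply(S, 2, sd))
apply(S_scaled, 2, var)

# split-coded coefficients delta correspond to dummy coefficients beta
# (reference category 1 has coefficient 0)
delta <- c(0.5, 0, 0.2)
beta  <- c(0, cumsum(delta))
all.equal(diff(beta), delta)   # adjacent differences recover delta
```

In this toy example the second entry of delta is zero, which means levels 2 and 3 share the same dummy coefficient, i.e. they are fused.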
Value
An ordPen object, which is a list containing:
fitted |
the matrix of fitted response values of the training data. Columns correspond to different lambda values. |
coefficients |
the matrix of fitted coefficients with respect to dummy-coded (ordinal or nominal) categorical input variables (including the reference category) as well as metric predictors. Columns correspond to different lambda values. |
model |
the type of the fitted model: "linear", "logit", "poisson", or "cumulative". |
restriction |
the type of restriction used for identifiability. |
lambda |
the used lambda values. |
xlevels |
a vector giving the number of levels of the ordinal predictors. |
ulevels |
a vector giving the number of levels of the nominal predictors (if any). |
zcovars |
the number of metric covariates (if any). |
Author(s)
Jan Gertheiss, Aisouda Hoshiyar
References
Gertheiss, J. and G. Tutz (2010). Sparse modeling of categorial explanatory variables. The Annals of Applied Statistics, 4, 2150-2180.
Hoshiyar, A., Gertheiss, L.H., and Gertheiss, J. (2023). Regularization and Model Selection for Item-on-Items Regression with Applications to Food Products' Survey Data. Preprint, available from https://arxiv.org/abs/2309.16373.
Park, M.Y. and T. Hastie (2007). L1 regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society B, 69, 659-677.
Tutz, G. and J. Gertheiss (2014). Rating scales as predictors – the old question of scale level and some answers. Psychometrika, 79, 357-376.
Tutz, G. and J. Gertheiss (2016). Regularized regression for categorical data. Statistical Modelling, 16, 161-200.
See Also
plot.ordPen, predict.ordPen, ICFCoreSetCWP
Examples
# fusion and selection of ordinal covariates on a simulated dataset
set.seed(123)
# generate (ordinal) predictors
x1 <- sample(1:8,100,replace=TRUE)
x2 <- sample(1:6,100,replace=TRUE)
x3 <- sample(1:7,100,replace=TRUE)
# the response
y <- -1 + log(x1) + sin(3*(x2-1)/pi) + rnorm(100)
# x matrix
x <- cbind(x1,x2,x3)
# lambda values
lambda <- c(80,70,60,50,40,30,20,10,5,1)
# fusion and selection
ofu <- ordFusion(x = x, y = y, lambda = lambda)
# results
round(ofu$coefficients,digits=3)
plot(ofu)
# If for a certain plot the x-axis should be annotated in a different way,
# this can (for example) be done as follows:
plot(ofu, whx = 1, xlim = c(0,9), xaxt = "n")
axis(side = 1, at = c(1,8), labels = c("no agreement","total agreement"))