R: Selection and/or Smoothing and Principal Components Analysis...

ordPens-package {ordPens}

R Documentation

Selection and/or Smoothing and Principal Components Analysis for Ordinal Variables

Description

Selection, and/or smoothing/fusing of ordinally scaled independent variables using a group lasso or generalized ridge penalty. Nonlinear principal components analysis for ordinal variables using a second-order difference penalty.

Details

Package:	ordPens
Type:	Package
Version:	1.1.0
Date:	2023-07-10
Depends:	grplasso, mgcv, RLRsim, quadprog, glmpath
Imports:	ordinalNet
Suggests:	psy
License:	GPL-2
LazyLoad:	yes

Smoothing and selection of ordinal predictors is done by the function ordSelect; smoothing only, by ordSmooth; fusion and selection of ordinal predictors by ordFusion. For ANOVA with ordinal factors, use ordAOV. Nonlinear PCA, performance evaluation and selection of an optimal penalty parameter can be done using ordPCA.

Author(s)

Authors: Jan Gertheiss jan.gertheiss@hsu-hh.de, Aisouda Hoshiyar aisouda.hoshiyar@hsu-hh.de.

Contributors: Fabian Scheipl

Maintainer: Aisouda Hoshiyar aisouda.hoshiyar@hsu-hh.de

References

Gertheiss, J. (2014). ANOVA for factors with ordered levels, Journal of Agricultural, Biological and Environmental Statistics, 19, 258-277.

Gertheiss, J., S. Hogger, C. Oberhauser and G. Tutz (2011). Selection of ordinally scaled independent variables with applications to international classification of functioning core sets. Journal of the Royal Statistical Society C (Applied Statistics), 60, 377-395.

Gertheiss, J. and F. Oehrlein (2011). Testing relevance and linearity of ordinal predictors, Electronic Journal of Statistics, 5, 1935-1959.

Gertheiss, J., F. Scheipl, T. Lauer, and H. Ehrhardt (2022). Statistical inference for ordinal predictors in generalized linear and additive models with application to bronchopulmonary dysplasia. BMC research notes, 15, 112.

Gertheiss, J. and G. Tutz (2009). Penalized regression with ordinal predictors. International Statistical Review, 77, 345-365.

Gertheiss, J. and G. Tutz (2010). Sparse modeling of categorial explanatory variables. The Annals of Applied Statistics, 4, 2150-2180.

Hoshiyar, A., H.A.L. Kiers, and J. Gertheiss (2021). Penalized non-linear principal components analysis for ordinal variables with an application to international classification of functioning core sets, British Journal of Mathematical and Statistical Psychology, 76, 353-371.

Hoshiyar, A., Gertheiss, L.H., and Gertheiss, J. (2023). Regularization and Model Selection for Item-on-Items Regression with Applications to Food Products' Survey Data. Preprint, available from https://arxiv.org/abs/2309.16373.

Tutz, G. and J. Gertheiss (2014). Rating scales as predictors – the old question of scale level and some answers. Psychometrica, 79, 357-376.

Tutz, G. and J. Gertheiss (2016). Regularized regression for categorical data. Statistical Modelling, 16, 161-200.

Examples

## Not run: 
### smooth modeling of a simulated dataset
set.seed(123)

# generate (ordinal) predictors
x1 <- sample(1:8,100,replace=TRUE)
x2 <- sample(1:6,100,replace=TRUE)
x3 <- sample(1:7,100,replace=TRUE)

# the response
y <- -1 + log(x1) + sin(3*(x2-1)/pi) + rnorm(100)

# x matrix
x <- cbind(x1,x2,x3)

# lambda values
lambda <- c(1000,500,200,100,50,30,20,10,1)

# smooth modeling
o1 <- ordSmooth(x = x, y = y, lambda = lambda)

# results
round(o1$coef,digits=3)
plot(o1)

# If for a certain plot the x-axis should be annotated in a different way,
# this can (for example) be done as follows:
plot(o1, whx = 1, xlim = c(0,9), xaxt = "n")
axis(side = 1, at = c(1,8), labels = c("no agreement","total agreement"))


### nonlinear PCA on chronic widespread pain data 
# load example data 
data(ICFCoreSetCWP)

# adequate coding to get levels 1,..., max 
H <- ICFCoreSetCWP[, 1:67] + matrix(c(rep(1, 50), rep(5, 16), 1),
                                    nrow(ICFCoreSetCWP), 67, 
                                    byrow = TRUE)

# nonlinear PCA
ordPCA(H, p = 2, lambda = 0.5, maxit = 1000,
       Ks = c(rep(5, 50), rep(9, 16), 5), 
       constr = c(rep(TRUE, 50), rep(FALSE, 16), TRUE))


# k-fold cross-validation
set.seed(1234)
lambda <- 10^seq(4,-4, by = -0.1) 
cvResult1 <- ordPCA(H, p = 2, lambda = lambda, maxit = 100,
       Ks = c(rep(5, 50), rep(9, 16), 5), 
       constr = c(rep(TRUE, 50), rep(FALSE, 16), TRUE),
       CV = TRUE, k = 5)
            
# optimal lambda                    
lambda[which.max(apply(cvResult1$VAFtest,2,mean))]                      

## End(Not run)