R: Penalized Orthogonal-Components Regression (POCRE) with...

sipocre {POCRE}

R Documentation

Penalized Orthogonal-Components Regression (POCRE) with Significance Inference

Description

Applying POCRE to select variables and evaluate the significance of selected variables using the multiple splitting method by Meinshausen et al. (2009). The tuning parameter may be selected based on either an information criterion or k-fold cross-validation. The tuning parameter can also be fixed at a prespecified value.

Usage

sipocre(y, x, n.splits=10, delta=0.1, crit=c('ic','cv','fixed'),
        ptype=c('ebtz','ebt','l1','scad','mcp'), maxvar=dim(x)[1]/2,
        msc=NA, maxit=100, maxcmp=50, gamma=3.7, tol=1e-6,
        n.folds=10, lambda=1, n.train=round(nrow(x)/2))

Arguments

`y`	n*q matrix, values of q response variables (allow for multiple response variables).
`x`	n*p matrix, values of p predicting variables (excluding the intercept).
`n.splits`	number of random splits (=10 by default).
`delta`	step size to increase or decrase from current tuning parameter.
`crit`	character indicating the criterion to choose the tuning parameter: `'ic'` (information criteria such as AIC, AICc, BIC, EBIC), `'cv'` (k-folds cross-valdiation) or `'fixed'` (a pre-specified value).
`ptype`	a character to indicate the type of penalty: `'ebtz'` (emprical Bayes thresholding after Fisher's z-transformation, by default), `'ebt'` (emprical Bayes thresholding by Johnstone & Silverman (2004)), `'l1'` (L_1 penalty), `'scad'` (SCAD by Fan & Li (2001)), `'mcp'` (MCP by Zhang (2010)).
`maxvar`	maximum number of selected variables.
`msc`	value(s) to indicate the penalty related to the information criterion: 0~1 for (E)BIC, 2 for AIC, 3 for AICc, used when `crit='ic'`.
`maxit`	maximum number of iterations to be allowed.
`maxcmp`	maximum number of components to be constructed.
`gamma`	a parameter used by SCAD and MCP (=3.7 by default).
`tol`	tolerance of precision in iterations.
`n.folds`	number of folds in k-folds cross-validation, used when `crit='cv'`.
`lambda`	pre-sepcified value for the tuning parameter, used when `crit='fixed'`.
`n.train`	sample size of the training data set.

Value

a list consisting of the following components,

`cpv`	component-based p-values which are calculated by testing the constructed components, either a matrix (when `crit='ic'`, in this case each column corresponds to one value in msc) or a vector (when `crit='cv'` or `crit='fixed'`).
`xpv`	traditional p-values, either a matrix (when `crit='ic'`, in this case each column corresponds to one value in msc) or a vector (when `crit='cv'` or `crit='fixed'`).

Author(s)

Dabao Zhang, Zhongli Jiang, Zeyu Zhang, Department of Statistics, Purdue University

References

Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96:1348-1360

Johnstone IM and Silverman BW (2004). Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences. Annals of Statistics, 32: 1594-1649.

Meinshausen N, Meier L, and Buhlmann P (2009) p-Values for High-Dimensional Regression. Journal of the American Statistical Association, 104: 1671-1681.

Zhang C-H (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38: 894-942.

Zhang D, Lin Y, and Zhang M (2009). Penalized orthogonal-components regression for large p small n data. Electronic Journal of Statistics, 3: 781-796.

Examples

## Not run: 
data(simdata)
xx <- simdata[,-1]
yy <- simdata[,1]

sipres <- sipocre(yy,xx)

## End(Not run)

[Package POCRE version 0.6.0 Index]