optPenaltyGLMmultiT.kCVauto {porridge} | R Documentation
Automatic search for optimal penalty parameters of the targeted ridge GLM estimator.
Description
Function that finds the optimal penalty parameters of the targeted ridge regression estimator of the generalized linear model parameter. The optimum is defined as the minimizer of the cross-validated loss associated with the estimator.
Usage
optPenaltyGLMmultiT.kCVauto(Y, X, lambdaInit, model="linear", targetMat,
folds=makeFoldsGLMcv(min(10, length(X)), Y, model=model),
loss="loglik", lambdaMin=10^(-5),
minSuccDiff=10^(-5), maxIter=100)
Arguments
Y
A numeric, the response vector.

X
The design matrix. Its number of rows should match the number of elements of Y.

lambdaInit
A numeric, the initial values of the penalty parameters (one per column of targetMat) from which the search starts.

model
A character, either "linear" or "logistic", indicating which generalized linear model is fitted.

targetMat
A matrix with the targets of the regression parameter as its columns.

folds
A list, with each item an integer vector indexing the samples that constitute one fold; by default it is generated with makeFoldsGLMcv.

loss
A character, either "loglik" or "sos", specifying the loss criterion to be used in the cross-validation (the sum-of-squares option applies to the linear model only).

lambdaMin
A positive numeric, the lower bound of the penalty parameters in the search.

minSuccDiff
A numeric, the minimum difference between the loglikelihoods of two successive iterations for the iterative estimation to be considered converged.

maxIter
A numeric, the maximum number of iterations allowed in the iterative estimation.
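The folds argument may be left at its default but can also be supplied directly. A minimal sketch of both options, assuming a response Y as in the Examples below and using five folds purely for illustration (the makeFoldsGLMcv call mirrors the default shown under Usage):

# via the package's fold-generating helper, as in the Usage default
folds <- makeFoldsGLMcv(5, Y, model="logistic")
# or manually: a list of integer vectors, each indexing the samples of one fold
folds <- split(sample(seq_along(Y)), rep(1:5, length.out=length(Y)))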
Value
The function returns an all-positive numeric, the cross-validated optimal penalty parameters. The average loglikelihood over the left-out samples is used as the cross-validation criterion. If model="linear", the average sum-of-squares over the left-out samples is also offered as cross-validation criterion.
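Put differently, the criterion that is minimized averages the left-out (negative) loglikelihood over the folds. A minimal sketch of this criterion for the logistic model, assuming estimates bHat, data Y and X, and a folds list as described above; cvLoss is an illustrative name, not a function of the package, and in the actual routine the estimator is refitted on the remaining samples for every fold:

cvLoss <- function(bHat, Y, X, folds){
  perFold <- sapply(folds, function(idx){
    eta <- as.vector(X[idx, , drop=FALSE] %*% bHat)
    # negative loglikelihood of the left-out samples, averaged within the fold
    -mean(Y[idx] * eta - log(1 + exp(eta)))
  })
  # average over the folds
  mean(perFold)
}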
Author(s)
W.N. van Wieringen.
References
van Wieringen, W.N. and Binder, H. (2022), "Sequential learning of regression models by penalized estimation", accepted.
Examples
# set the sample size
n <- 50
# set the true parameter
betas <- (c(0:100) - 50) / 20
# generate covariate data
X <- matrix(rnorm(length(betas)*n), nrow=n)
# sample the response
lp    <- tcrossprod(betas, X)[1,]
probs <- exp(lp) / (1 + exp(lp))
Y <- numeric()
for (i in 1:n){
Y <- c(Y, sample(c(0,1), 1, prob=c(1-probs[i], probs[i])))
}
# create targets
targets <- cbind(betas/2, rep(0, length(betas)))
# tune the penalty parameter
### folds      <- makeFoldsGLMcv(5, Y, model="logistic")
### optLambdas <- optPenaltyGLMmultiT.kCVauto(Y, X, c(50,0.1), folds=folds,
###                                           targetMat=targets, model="logistic",
###                                           minSuccDiff=10^(-3))
# estimate the logistic regression parameter
### bHat <- ridgeGLMmultiT(Y, X, lambdas=optLambdas, targetMat=targets, model="logistic")
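# a sketch of the linear-model variant of the above (not part of the original
# example; the continuous response Ylin is an illustrative assumption)
### Ylin          <- tcrossprod(betas, X)[1,] + rnorm(n)
### optLambdasLin <- optPenaltyGLMmultiT.kCVauto(Ylin, X, c(50,0.1),
###                                              targetMat=targets, model="linear",
###                                              loss="sos")
### bHatLin       <- ridgeGLMmultiT(Ylin, X, lambdas=optLambdasLin,
###                                 targetMat=targets, model="linear")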