R: Fit a group-adaptive elastic net penalised linear or logistic...

squeezy {squeezy}

R Documentation

Fit a group-adaptive elastic net penalised linear or logistic model

Description

Estimate group-specific elastic net penalties and fit a linear or logistic regression model.

Usage

squeezy(Y, X, groupset, alpha = 1, model = NULL, X2 = NULL, 
        Y2 = NULL, unpen = NULL, intrcpt = TRUE, 
        method = c("ecpcEN", "MML", "MML.noDeriv", "CV"), 
        fold = 10, compareMR = TRUE, selectAIC = FALSE, fit.ecpc = NULL, 
        lambdas = NULL, lambdaglobal = NULL, lambdasinit = NULL, 
        sigmasq = NULL, ecpcinit = TRUE, SANN = FALSE, minlam = 10^-3, 
        standardise_Y = NULL, reCV = NULL, opt.sigma = NULL, 
        resultsAICboth = FALSE, silent=FALSE)

Arguments

`Y`	Response data; n-dimensional vector (n: number of samples) for linear and logistic outcomes.
`X`	Observed data; (nxp)-dimensional matrix (p: number of covariates) with each row the observed high-dimensional feature vector of a sample.
`groupset`	Co-data group set; list with G groups. Each group is a vector containing the indices of the covariates in that group.
`alpha`	Elastic net penalty mixing parameter.
`model`	Type of model for the response; linear or logistic.
`X2`	(optional) Independent observed data for which response is predicted.
`Y2`	(optional) Independent response data to compare with predicted response.
`unpen`	Unpenalised covariates; vector with indices of covariates that should not be penalised.
`intrcpt`	Should an intercept be included? Included by default for linear and logistic, excluded for Cox for which the baseline hazard is estimated.
`method`	Which method should be used to estimate the group-specific penalties? Default MML.
`fold`	Number of folds used in inner cross-validation to estimate (initial) global ridge penalty lambda (if not given).
`compareMR`	TRUE/FALSE to fit the multi-ridge model and return results for comparison.
`selectAIC`	TRUE/FALSE to select the single-group model or multi-group model.
`fit.ecpc`	(optional) Model fit obtained by the function ecpc (from the ecpc R-package)
`lambdas`	(optional) Group-specific ridge penalty parameters. If given, these are transformed to elastic net penalties.
`lambdaglobal`	(optional) Global ridge penalty parameter used for initialising the optimisation.
`lambdasinit`	(optional) Group-specific ridge penalty parameters used for initialising the optimisation.
`sigmasq`	(linear model only) If given, noise level is fixed (Y~N(X*beta,sd=sqrt(sigmasq))).
`ecpcinit`	TRUE/FALSE for using group-specific ridge penalties as given in ‘fit.ecpc’ for initialising the optimisation.
`SANN`	('method'=MML.noDeriv only) TRUE/FALSE to use simulated annealing in optimisation of the ridge penalties.
`minlam`	Minimal value of group-specific ridge penalty used in the optimisation.
`standardise_Y`	TRUE/FALSE should Y be standardised?
`reCV`	TRUE/FALSE should the elastic net penalties be recalibrated by cross-validation of a global rescaling penalty?
`opt.sigma`	(linear model only) TRUE/FALSE to optimise sigmasq jointly with the ridge penalties.
`resultsAICboth`	(selectAIC=TRUE only) TRUE/FALSE to return results of both the single-group and multi-group model.
`silent`	Should output messages be suppressed (default FALSE)?

Value

`betaApprox`	Estimated regression coefficients of the group-adaptive elastic net model; p-dimensional vector.
`a0Approx`	Estimated intercept of the group-adaptive elastic net model; scalar.
`lambdaApprox`	Estimated group penalty parameters of the group-adaptive elastic net model; G-dimensional vector.
`lambdapApprox`	Estimated elastic net penalty parameter of the group-adaptive elastic net model for all covariates; p-dimensional vector.
`tauMR`	Estimated group variances of the multi-ridge model; G-dimensional vector.
`lambdaMR`	Estimated group penalties of the multi-ridge model; G-dimensional vector.
`lambdaglobal`	Estimated global ridge penalty; scalar. Note: only optimised if selectAIC=TRUE or compareMR=TRUE, else the returned crude estimate is sufficient for initialisation of squeezy.
`sigmahat`	(linear model) Estimated sigma^2; scalar.
`MLinit`	Min log marginal likelihood value at initial group penalties; scalar.
`MLfinal`	Min log marginal likelihood value at estimated group penalties; scalar.
`alpha`	Value used for the elastic net mixing parameter alpha; scalar.
`glmnet.fit`	Fit of the ‘glmnet’ function to obtain the regression coefficients.

If ‘compareMR’=TRUE, multi-ridge model is returned as well:

`betaMR`	Estimated regression coefficients of the multi-ridge model; p-dimensional vector.
`a0MR`	Estimated intercept of the multi-ridge model; scalar.

If independent test set ‘X2’ is given, predictions and MSE are returned:

`YpredApprox`	Predictions for the test set of the estimated group-adaptive elastic net model.
`MSEApprox`	Mean squared error on the test set of the estimated group-adaptive elastic net model.
`YpredMR`	Predictions for the test set of the estimated group-adaptive multi-ridge model.
`MSEMR`	Mean squared error on the test set of the estimated group-adaptive multi-ridge model.

If ‘selectAIC’=TRUE, the multi-group or single-group model with best AIC is selected. Results in ‘betaApprox’, ‘a0Approx’, ‘lambdaApprox’ contain those results of the best model. Summary results of both models are included as well:

AICmodels

List with elements “multigroup" and “onegroup".- Each element is a list with results of the multi-group or single-group model, containing the group penalties (‘lambdas’), sigma^2 (‘sigmahat’, linear model only), and AIC (‘AIC’).

If besides ‘selectAIC’=TRUE, also ‘resultsAICboth’=TRUE, the fit of both the single-group model and multi-group model as obtained with squeezy are returned (‘fit’).

modelbestAIC

Either “onegroup" or “multigroup" for the selected model.

Author(s)

Mirrelijn M. van Nee, Tim van de Brug, Mark A. van de Wiel

References

Mirrelijn M. van Nee, Tim van de Brug, Mark A. van de Wiel, "Fast marginal likelihood estimation of penalties for group-adaptive elastic net", arXiv preprint, arXiv:2101.03875 (2021).

Examples


#####################
# Simulate toy data #
#####################
p<-100 #number of covariates
n<-50 #sample size training data set
n2<-100 #sample size test data set
G<- 5 #number of groups

taugrp <- rep(c(0.05,0.1,0.2,0.5,1),each=p/G) #ridge prior variance
groupIndex <- rep(1:G,each=p/G) #groups for co-data
groupset <- lapply(1:G,function(x){which(groupIndex==x)}) #group set with each element one group
sigmasq <- 2 #linear regression noise
lambda1 <- sqrt(taugrp/2) #corresponding lasso penalty
#A Laplace(0,b) variate can also be generated as the difference of two i.i.d.
#Exponential(1/b) random variables
betas <-   rexp(p, 1/lambda1) -  rexp(p, 1/lambda1) #regression coefficients
X <- matrix(rnorm(n*p),n,p) #simulate training data
Y <- rnorm(n,X%*%betas,sd=sqrt(sigmasq))
X2 <- matrix(rnorm(n*p),n,p) #simulate test data
Y2 <- rnorm(n,X2%*%betas,sd=sqrt(sigmasq))

###############
# Fit squeezy #
###############
#may be fit directly..
res.squeezy <- squeezy(Y,X,groupset=groupset,Y2=Y2,X2=X2,
                       model="linear",alpha=0.5)


  #..or with ecpc-fit as initialisation
  if(requireNamespace("ecpc")){
    res.ecpc <- ecpc::ecpc(Y,X, #observed data and response to train model
                     groupsets=list(groupset), #informative co-data group set
                     Y2=Y2,X2=X2, #test data
                     model="linear",
                     hypershrinkage="none",postselection = FALSE)
    res.squeezy <- squeezy(Y,X, #observed data and response to train model
                           groupset=groupset, #informative co-data group set
                           Y2=Y2,X2=X2, #test data
                           fit.ecpc = res.ecpc, #ecpc-fit for initial values
                           model="linear", #type of model for the response
                           alpha=0.5) #elastic net mixing parameter
  }



summary(res.squeezy$betaApprox) #estimated elastic net regression coefficients
summary(res.squeezy$betaMR) #estimated multi-ridge regression coefficients
res.squeezy$lambdaApprox #estimated group elastic net penalties
res.squeezy$tauMR #multi-ridge group variances
res.squeezy$MSEApprox #MSE group-elastic net model
res.squeezy$MSEMR #MSE group-ridge model

#once fit, quickly find model fit for different values of alpha:
res.squeezy2 <- squeezy(Y,X, #observed data and response to train model
                        groupset=groupset, #informative co-data groupset
                        Y2=Y2,X2=X2, #test data
                        lambdas = res.squeezy$lambdaMR, #fix lambdas at multi-ridge estimate
                        model="linear", #type of model for the response
                        alpha=0.9) #elastic net mixing parameter


  #Select single-group model or multi-group model based on best mAIC
  res.squeezy <- squeezy(Y,X, #observed data and response to train model
                         groupset=groupset, #informative co-data group set
                         Y2=Y2,X2=X2, #test data
                         fit.ecpc = res.ecpc, #ecpc-fit for initial values
                         model="linear", #type of model for the response
                         alpha=0.5, #elastic net mixing parameter
                         selectAIC = TRUE,resultsAICboth = TRUE)
  
  res.squeezy$modelbestAIC #selected model
  res.squeezy$AICmodels$multigroup$fit$MSEApprox #MSE on test set of multi-group model
  res.squeezy$AICmodels$onegroup$fit$MSEApprox #MSE on test set of single-group model

[Package squeezy version 1.1-1 Index]