R: Stacked Penalized Logistic Regression

StaPLR {mvs}

R Documentation

Stacked Penalized Logistic Regression

Description

Fit a two-level stacked penalized (logistic) regression model with a single base-learner and a single meta-learner. Stacked penalized regression models with a Gaussian or Poisson outcome can be fitted using the family argument.
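The two-level idea can be illustrated directly with glmnet. The following is a minimal sketch, not the package's internal code: it assumes that each view gets its own penalized base-learner, that the cross-validated predictions of those base-learners become the features of the meta-learner, and that the settings mirror the defaults shown under Usage (ridge base level, non-negative lasso meta level, predictions on the response scale at lambda.min). The objects X, y, view, folds and Z are made up for illustration.

```r
library(glmnet)

set.seed(1)
n <- 200
X <- matrix(rnorm(n * 4), nrow = n)      # 4 features in total
y <- rbinom(n, 1, plogis(X[, 1]))        # binary outcome driven by feature 1
view <- c(1, 1, 2, 2)                    # two views of two features each
nviews <- length(unique(view))
folds <- sample(rep(1:5, length.out = n))

# Base level: out-of-fold predictions from a ridge model fitted per view
Z <- matrix(NA_real_, nrow = n, ncol = nviews)
for (v in seq_len(nviews)) {
  cols <- which(view == v)
  for (k in 1:5) {
    test <- folds == k
    base <- cv.glmnet(X[!test, cols], y[!test], family = "binomial", alpha = 0)
    Z[test, v] <- predict(base, newx = X[test, cols, drop = FALSE],
                          s = "lambda.min", type = "response")
  }
}

# Meta level: lasso on the cross-validated predictions, coefficients >= 0
meta <- cv.glmnet(Z, y, family = "binomial", alpha = 1, lower.limits = 0)
coef(meta, s = "lambda.min")
```

A view whose meta-level coefficient is shrunk to zero contributes nothing to the final prediction, which is how the stacked model can select views.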

Usage

StaPLR(
  x,
  y,
  view,
  view.names = NULL,
  family = "binomial",
  correct.for = NULL,
  alpha1 = 0,
  alpha2 = 1,
  nfolds = 10,
  seed = NULL,
  std.base = FALSE,
  std.meta = FALSE,
  ll1 = -Inf,
  ul1 = Inf,
  ll2 = 0,
  ul2 = Inf,
  cvloss = "deviance",
  metadat = "response",
  cvlambda = "lambda.min",
  cvparallel = FALSE,
  lambda.ratio = 1e-04,
  fdev = 0,
  penalty.weights = NULL,
  parallel = FALSE,
  skip.version = TRUE,
  skip.meta = FALSE,
  skip.cv = FALSE,
  progress = TRUE
)

staplr(
  x,
  y,
  view,
  view.names = NULL,
  family = "binomial",
  correct.for = NULL,
  alpha1 = 0,
  alpha2 = 1,
  nfolds = 10,
  seed = NULL,
  std.base = FALSE,
  std.meta = FALSE,
  ll1 = -Inf,
  ul1 = Inf,
  ll2 = 0,
  ul2 = Inf,
  cvloss = "deviance",
  metadat = "response",
  cvlambda = "lambda.min",
  cvparallel = FALSE,
  lambda.ratio = 1e-04,
  fdev = 0,
  penalty.weights = NULL,
  parallel = FALSE,
  skip.version = TRUE,
  skip.meta = FALSE,
  skip.cv = FALSE,
  progress = TRUE
)

Arguments

`x`	input matrix of dimension nobs x nvars
`y`	outcome vector of length nobs
`view`	a vector of length nvars, where each entry is an integer indicating the view to which the corresponding feature belongs.
`view.names`	(optional) a character vector of length nviews specifying a name for each view.
`family`	Either a character string representing one of the built-in families, or else a `glm()` family object. For more information, see the documentation of the `family` argument in `glmnet`. Note that "multinomial", "mgaussian", "cox", and 2-column responses with the "binomial" family are not yet supported.
`correct.for`	(optional) a matrix with nrow = nobs, where each column is a feature which should be included directly in the meta-learner. By default these features are not penalized (see penalty.weights) and appear at the top of the coefficient list.
`alpha1`	(base) alpha parameter for glmnet: 1 gives the lasso penalty, 0 gives the ridge penalty.
`alpha2`	(meta) alpha parameter for glmnet: 1 gives the lasso penalty, 0 gives the ridge penalty.
`nfolds`	number of folds to use for all cross-validation.
`seed`	(optional) numeric value specifying the seed. Setting the seed this way ensures the results are reproducible even when the computations are performed in parallel.
`std.base`	should features be standardized at the base level?
`std.meta`	should cross-validated predictions be standardized at the meta level?
`ll1`	lower limit(s) for each coefficient at the base-level. Defaults to -Inf.
`ul1`	upper limit(s) for each coefficient at the base-level. Defaults to Inf.
`ll2`	lower limit(s) for each coefficient at the meta-level. Defaults to 0 (non-negativity constraints). Does not apply to correct.for features.
`ul2`	upper limit(s) for each coefficient at the meta-level. Defaults to Inf. Does not apply to correct.for features.
`cvloss`	loss to use for cross-validation.
`metadat`	which attribute of the base-learners should be used as input for the meta-learner? Allowed values are "response", "link", and "class".
`cvlambda`	value of lambda at which cross-validated predictions are made. Defaults to the value giving minimum internal cross-validation error.
`cvparallel`	whether to use 'foreach' to fit each CV fold. Do not use this option; use the parallel argument instead.
`lambda.ratio`	the ratio between the largest and smallest lambda value.
`fdev`	sets the minimum fractional change in deviance for stopping the path to the specified value, ignoring the value of fdev set through glmnet.control. Setting fdev=NULL will use the value set through glmnet.control instead. It is strongly recommended to use the default value of zero.
`penalty.weights`	(optional) a vector of length nviews, containing different penalty factors for the meta-learner. Defaults to rep(1,nviews). The penalty factor is set to 0 for correct.for features.
`parallel`	whether to use foreach to fit the base-learners and obtain the cross-validated predictions in parallel. Executes sequentially unless a parallel backend is registered beforehand.
`skip.version`	whether to skip checking the version of the glmnet package.
`skip.meta`	whether to skip training the meta-learner.
`skip.cv`	whether to skip generating the cross-validated predictions.
`progress`	whether to show a progress bar (only supported when parallel = FALSE).
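The relationship between x, view and view.names can be checked before fitting. The base R sketch below assumes a made-up 7-feature input with three views; the view names are hypothetical.

```r
set.seed(1)
X <- matrix(rnorm(10 * 7), nrow = 10)             # nobs = 10, nvars = 7
view <- c(1, 1, 1, 2, 2, 3, 3)                    # one integer per column of X
view.names <- c("clinical", "imaging", "genetic") # hypothetical names, one per view

# view must have one entry per feature, view.names one entry per view
stopifnot(length(view) == ncol(X),
          length(view.names) == length(unique(view)))

# Column indices of X that make up each view
features_by_view <- split(seq_len(ncol(X)), view)
names(features_by_view) <- view.names
features_by_view
```

Here columns 1 to 3 form the first view, columns 4 and 5 the second, and columns 6 and 7 the third.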

Value

An object with S3 class "StaPLR".
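As a brief illustration of working with the returned object, the sketch below fits a small model on simulated data and uses the coef and predict methods that also appear in the Examples. The data and object names are made up, and the code assumes the mvs package is installed.

```r
library(mvs)

set.seed(1)
n <- 200
X <- matrix(rnorm(n * 4), nrow = n)
y <- rbinom(n, 1, plogis(X[, 1]))

fit <- StaPLR(X, y, view = c(1, 1, 2, 2), seed = 1, progress = FALSE)

inherits(fit, "StaPLR")   # check the S3 class
coefs <- coef(fit)        # coefficients of both levels
coefs$meta                # meta-level coefficients
coefs$base                # base-level coefficients

# Predictions for two new observations with the same four features
pred <- predict(fit, matrix(rnorm(8), nrow = 2))
```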

Author(s)

Wouter van Loon <w.s.van.loon@fsw.leidenuniv.nl>

Examples


set.seed(012)
n <- 1000
cors <- seq(0.1,0.7,0.1)
X <- matrix(NA, nrow=n, ncol=length(cors)+1)
X[,1] <- rnorm(n)

for(i in seq_along(cors)){
  X[,i+1] <- X[,1]*cors[i] + rnorm(n, 0, sqrt(1-cors[i]^2))
}

beta <- c(1,0,0,0,0,0,0,0)
eta <- X %*% beta
p <- exp(eta)/(1+exp(eta))
y <- rbinom(n, 1, p) ## create binary response
view_index <- rep(1:(ncol(X)/2), each=2)

# Stacked penalized logistic regression
fit <- StaPLR(X, y, view_index)
coef(fit)$meta

new_X <- matrix(rnorm(16), nrow=2)
predict(fit, new_X)

# Stacked penalized linear regression
y <- eta + rnorm(n) ## create continuous response
fit <- StaPLR(X, y, view_index, family = "gaussian")
coef(fit)$meta
coef(fit)$base
new_X <- matrix(rnorm(16), nrow=2)
predict(fit, new_X)

# Stacked penalized Poisson regression
y <- ceiling(eta + 4) ## create count response
fit <- StaPLR(X, y, view_index, family = "poisson")
coef(fit)$meta
coef(fit)$base
new_X <- matrix(rnorm(16), nrow=2)
predict(fit, new_X)
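The parallel argument only has an effect once a foreach backend is registered. The sketch below assumes the doParallel package as that backend (any registered foreach backend should serve the same purpose) and uses made-up data. progress is switched off because the progress bar is only supported when parallel = FALSE, and seed is set so that results are reproducible across parallel runs.

```r
library(mvs)
library(doParallel)

set.seed(1)
n <- 200
X <- matrix(rnorm(n * 4), nrow = n)
y <- rbinom(n, 1, plogis(X[, 1]))
view_index <- c(1, 1, 2, 2)

# Register a two-worker backend (assumption: doParallel is installed)
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

fit_par <- StaPLR(X, y, view_index, parallel = TRUE, seed = 123,
                  progress = FALSE)

# Release the workers when done
parallel::stopCluster(cl)

coef(fit_par)$meta
```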

[Package mvs version 1.0.2 Index]