StaPLR {mvs}    R Documentation

Stacked Penalized Logistic Regression

Description

Fit a two-level stacked penalized (logistic) regression model with a single base-learner and a single meta-learner. Stacked penalized regression models with a Gaussian or Poisson outcome can be fitted using the family argument.

Usage

StaPLR(
  x,
  y,
  view,
  view.names = NULL,
  family = "binomial",
  correct.for = NULL,
  alpha1 = 0,
  alpha2 = 1,
  nfolds = 10,
  seed = NULL,
  std.base = FALSE,
  std.meta = FALSE,
  ll1 = -Inf,
  ul1 = Inf,
  ll2 = 0,
  ul2 = Inf,
  cvloss = "deviance",
  metadat = "response",
  cvlambda = "lambda.min",
  cvparallel = FALSE,
  lambda.ratio = 1e-04,
  fdev = 0,
  penalty.weights = NULL,
  parallel = FALSE,
  skip.version = TRUE,
  skip.meta = FALSE,
  skip.cv = FALSE,
  progress = TRUE
)

staplr(
  x,
  y,
  view,
  view.names = NULL,
  family = "binomial",
  correct.for = NULL,
  alpha1 = 0,
  alpha2 = 1,
  nfolds = 10,
  seed = NULL,
  std.base = FALSE,
  std.meta = FALSE,
  ll1 = -Inf,
  ul1 = Inf,
  ll2 = 0,
  ul2 = Inf,
  cvloss = "deviance",
  metadat = "response",
  cvlambda = "lambda.min",
  cvparallel = FALSE,
  lambda.ratio = 1e-04,
  fdev = 0,
  penalty.weights = NULL,
  parallel = FALSE,
  skip.version = TRUE,
  skip.meta = FALSE,
  skip.cv = FALSE,
  progress = TRUE
)

Arguments

x

input matrix of dimension nobs x nvars

y

outcome vector of length nobs

view

a vector of length nvars, where each entry is an integer indicating the view to which the corresponding feature (column of x) belongs.

view.names

(optional) a character vector of length nviews specifying a name for each view.

family

Either a character string representing one of the built-in families, or a glm() family object. For more information, see the documentation of the family argument in glmnet. Note that "multinomial", "mgaussian", "cox", and 2-column responses with the "binomial" family are not yet supported.

correct.for

(optional) a matrix with nrow = nobs, where each column is a feature that should be included directly in the meta-learner. By default these features are not penalized (see penalty.weights) and appear at the top of the coefficient list.

alpha1

alpha (elastic net mixing) parameter passed to glmnet for the base-learner: 1 gives the lasso, 0 gives ridge.

alpha2

alpha (elastic net mixing) parameter passed to glmnet for the meta-learner: 1 gives the lasso, 0 gives ridge.

nfolds

number of folds to use for all cross-validation.

seed

(optional) numeric value specifying the seed. Setting the seed this way ensures the results are reproducible even when the computations are performed in parallel.

std.base

should features be standardized at the base level?

std.meta

should cross-validated predictions be standardized at the meta level?

ll1

lower limit(s) for each coefficient at the base-level. Defaults to -Inf.

ul1

upper limit(s) for each coefficient at the base-level. Defaults to Inf.

ll2

lower limit(s) for each coefficient at the meta-level. Defaults to 0 (non-negativity constraints). Does not apply to correct.for features.

ul2

upper limit(s) for each coefficient at the meta-level. Defaults to Inf. Does not apply to correct.for features.

cvloss

loss to use for cross-validation.

metadat

which attribute of the base learners should be used as input for the meta learner? Allowed values are "response", "link", and "class".

cvlambda

value of lambda at which cross-validated predictions are made. Defaults to the value giving minimum internal cross-validation error.

cvparallel

whether to use foreach to fit each CV fold. Not recommended; use the parallel argument instead.

lambda.ratio

the ratio of the smallest to the largest lambda value in the lambda sequence.

fdev

sets the minimum fractional change in deviance for stopping the path to the specified value, ignoring the value of fdev set through glmnet.control. Setting fdev=NULL will use the value set through glmnet.control instead. It is strongly recommended to use the default value of zero.

penalty.weights

(optional) a vector of length nviews, containing different penalty factors for the meta-learner. Defaults to rep(1,nviews). The penalty factor is set to 0 for correct.for features.

parallel

whether to use foreach to fit the base-learners and obtain the cross-validated predictions in parallel. Executes sequentially unless a parallel backend is registered beforehand.

skip.version

whether to skip checking the version of the glmnet package.

skip.meta

whether to skip training the meta-learner.

skip.cv

whether to skip generating the cross-validated predictions.

progress

whether to show a progress bar (only supported when parallel = FALSE).
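
As an illustration of how the view and view.names arguments fit together, the following minimal sketch builds both from a set of per-view feature counts. The view names and sizes (view_sizes) are hypothetical and chosen only for illustration; the sketch assumes that the columns of x are ordered so that features of the same view are adjacent.

```r
# Hypothetical layout: three views with 3, 2 and 4 features, stored in
# adjacent column blocks of x (names and sizes are illustrative assumptions).
view_sizes <- c(clinical = 3, imaging = 2, genetics = 4)

# One integer per column of x, indicating the view that column belongs to.
view <- rep(seq_along(view_sizes), times = view_sizes)

# One name per view, in the same order as the integer codes.
view.names <- names(view_sizes)

print(view)
print(view.names)

# Sanity checks: one entry per feature, one name per view.
stopifnot(length(view) == sum(view_sizes))
stopifnot(length(view.names) == length(unique(view)))
```

With an input matrix x of nine columns in this order, these objects could then be supplied as StaPLR(x, y, view = view, view.names = view.names).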

Value

An object with S3 class "StaPLR".

Author(s)

Wouter van Loon <w.s.van.loon@fsw.leidenuniv.nl>

Examples


set.seed(012)
n <- 1000
cors <- seq(0.1,0.7,0.1)
X <- matrix(NA, nrow=n, ncol=length(cors)+1)
X[,1] <- rnorm(n)

for(i in seq_along(cors)){
  X[,i+1] <- X[,1]*cors[i] + rnorm(n, 0, sqrt(1-cors[i]^2))
}

beta <- c(1,0,0,0,0,0,0,0)
eta <- X %*% beta
p <- exp(eta)/(1+exp(eta))
y <- rbinom(n, 1, p) ## create binary response
view_index <- rep(1:(ncol(X)/2), each=2)

# Stacked penalized logistic regression
fit <- StaPLR(X, y, view_index)
coef(fit)$meta

new_X <- matrix(rnorm(16), nrow=2)
predict(fit, new_X)

# Stacked penalized linear regression
y <- eta + rnorm(n) ## create continuous response
fit <- StaPLR(X, y, view_index, family = "gaussian")
coef(fit)$meta
coef(fit)$base
new_X <- matrix(rnorm(16), nrow=2)
predict(fit, new_X)

# Stacked penalized Poisson regression
y <- ceiling(eta + 4) ## create count response
fit <- StaPLR(X, y, view_index, family = "poisson")
coef(fit)$meta
coef(fit)$base
new_X <- matrix(rnorm(16), nrow=2)
predict(fit, new_X)
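
The simulated design used in the examples above can be checked with base R alone. Each additional feature is constructed as X[,1]*cors[i] plus independent noise with variance 1 - cors[i]^2, so it should have roughly unit variance and a correlation of about cors[i] with the first feature. The sketch below regenerates the design and compares the empirical correlations with their targets; the object names empirical, check and col_sd are introduced here for illustration only.

```r
# Regenerate the simulated design from the examples (base R only).
set.seed(12)
n <- 1000
cors <- seq(0.1, 0.7, 0.1)
X <- matrix(NA_real_, nrow = n, ncol = length(cors) + 1)
X[, 1] <- rnorm(n)

for (i in seq_along(cors)) {
  # Noise variance 1 - cors[i]^2 keeps the feature at unit variance
  # while giving it correlation cors[i] with the first feature.
  X[, i + 1] <- X[, 1] * cors[i] + rnorm(n, 0, sqrt(1 - cors[i]^2))
}

# Empirical correlation of each constructed feature with the first feature.
empirical <- as.numeric(cor(X[, 1], X[, -1]))
check <- data.frame(target = cors, empirical = round(empirical, 2))
print(check)

# Empirical standard deviation of every column (expected to be close to 1).
col_sd <- apply(X, 2, sd)
print(round(col_sd, 2))
```

Because only the first feature has a nonzero coefficient in beta, the remaining features are related to the outcome solely through their correlation with that feature.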


[Package mvs version 1.0.2 Index]