StaPLR {mvs} | R Documentation |
Stacked Penalized Logistic Regression
Description
Fit a two-level stacked penalized (logistic) regression model with a single base-learner and a single meta-learner. Stacked penalized regression models with a Gaussian or Poisson outcome can be fitted using the family argument.
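To make the two-level structure concrete, the following is a minimal base-R sketch of stacking with cross-validated predictions. It only illustrates the data flow and is not the package's implementation: unpenalized glm() fits stand in for the penalized glmnet learners, and the helper name stack_sketch, the fold assignment, and the simulated data are all hypothetical.

```r
# Illustrative sketch only: unpenalized glm() stands in for the penalized learners.
stack_sketch <- function(x, y, view, nfolds = 5) {
  views <- sort(unique(view))
  # Randomly assign each observation to one of nfolds folds
  folds <- sample(rep(seq_len(nfolds), length.out = nrow(x)))
  # z[i, v]: out-of-fold predicted probability for observation i from view v
  z <- matrix(NA_real_, nrow = nrow(x), ncol = length(views))
  for (v in seq_along(views)) {
    dat <- data.frame(y = y, x[, view == views[v], drop = FALSE])
    for (k in seq_len(nfolds)) {
      held_out <- folds == k
      # Base-learner for view v, trained on all folds except fold k
      base_k <- glm(y ~ ., data = dat[!held_out, ], family = binomial())
      z[held_out, v] <- predict(base_k, newdata = dat[held_out, ],
                                type = "response")
    }
  }
  # Meta-learner: regress the outcome on the cross-validated view predictions
  meta <- glm(y ~ ., data = data.frame(y = y, z), family = binomial())
  list(z = z, meta = meta)
}

set.seed(1)
n <- 200
x <- matrix(rnorm(n * 4), nrow = n)
y <- rbinom(n, 1, plogis(x[, 1]))
res <- stack_sketch(x, y, view = c(1, 1, 2, 2))
coef(res$meta)  # intercept plus one weight per view
```

StaPLR additionally applies penalization at both levels and, by default, non-negativity constraints on the meta-level coefficients (ll2 = 0), neither of which this sketch reproduces.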
Usage
StaPLR(
x,
y,
view,
view.names = NULL,
family = "binomial",
correct.for = NULL,
alpha1 = 0,
alpha2 = 1,
nfolds = 10,
seed = NULL,
std.base = FALSE,
std.meta = FALSE,
ll1 = -Inf,
ul1 = Inf,
ll2 = 0,
ul2 = Inf,
cvloss = "deviance",
metadat = "response",
cvlambda = "lambda.min",
cvparallel = FALSE,
lambda.ratio = 1e-04,
fdev = 0,
penalty.weights = NULL,
parallel = FALSE,
skip.version = TRUE,
skip.meta = FALSE,
skip.cv = FALSE,
progress = TRUE
)
staplr(
x,
y,
view,
view.names = NULL,
family = "binomial",
correct.for = NULL,
alpha1 = 0,
alpha2 = 1,
nfolds = 10,
seed = NULL,
std.base = FALSE,
std.meta = FALSE,
ll1 = -Inf,
ul1 = Inf,
ll2 = 0,
ul2 = Inf,
cvloss = "deviance",
metadat = "response",
cvlambda = "lambda.min",
cvparallel = FALSE,
lambda.ratio = 1e-04,
fdev = 0,
penalty.weights = NULL,
parallel = FALSE,
skip.version = TRUE,
skip.meta = FALSE,
skip.cv = FALSE,
progress = TRUE
)
Arguments
x |
input matrix of dimension nobs x nvars |
y |
outcome vector of length nobs |
view |
a vector of length nvars, where each entry is an integer indicating the view to which the corresponding feature belongs. |
view.names |
(optional) a character vector of length nviews specifying a name for each view. |
family |
Either a character string representing one of the built-in families, or else a |
correct.for |
(optional) a matrix with nrow = nobs, where each column is a feature which should be included directly in the meta-learner. By default these features are not penalized (see penalty.weights) and appear at the top of the coefficient list. |
alpha1 |
(base) alpha parameter for glmnet: lasso(1) / ridge(0) |
alpha2 |
(meta) alpha parameter for glmnet: lasso(1) / ridge(0) |
nfolds |
number of folds to use for all cross-validation. |
seed |
(optional) numeric value specifying the seed. Setting the seed this way ensures the results are reproducible even when the computations are performed in parallel. |
std.base |
should features be standardized at the base level? |
std.meta |
should cross-validated predictions be standardized at the meta level? |
ll1 |
lower limit(s) for each coefficient at the base-level. Defaults to -Inf. |
ul1 |
upper limit(s) for each coefficient at the base-level. Defaults to Inf. |
ll2 |
lower limit(s) for each coefficient at the meta-level. Defaults to 0 (non-negativity constraints). Does not apply to correct.for features. |
ul2 |
upper limit(s) for each coefficient at the meta-level. Defaults to Inf. Does not apply to correct.for features. |
cvloss |
loss to use for cross-validation. |
metadat |
which attribute of the base-learners should be used as input for the meta-learner? Allowed values are "response", "link", and "class". |
cvlambda |
value of lambda at which cross-validated predictions are made. Defaults to the value giving minimum internal cross-validation error. |
cvparallel |
whether to use 'foreach' to fit each CV fold (do not use this; use the parallel option instead). |
lambda.ratio |
the ratio of the smallest to the largest lambda value. |
fdev |
sets the minimum fractional change in deviance for stopping the path to the specified value, ignoring the value of fdev set through glmnet.control. Setting fdev=NULL will use the value set through glmnet.control instead. It is strongly recommended to use the default value of zero. |
penalty.weights |
(optional) a vector of length nviews, containing different penalty factors for the meta-learner. Defaults to rep(1,nviews). The penalty factor is set to 0 for correct.for features. |
parallel |
whether to use foreach to fit the base-learners and obtain the cross-validated predictions in parallel. Executes sequentially unless a parallel backend is registered beforehand. |
skip.version |
whether to skip checking the version of the glmnet package. |
skip.meta |
whether to skip training the meta-learner. |
skip.cv |
whether to skip generating the cross-validated predictions. |
progress |
whether to show a progress bar (only supported when parallel = FALSE). |
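As a small illustration of how the view, view.names and penalty.weights arguments relate to the columns of x, the following base-R sketch builds them for a hypothetical data set with three views of unequal size. The view sizes and view names are invented for the example.

```r
# Hypothetical layout: 3 views with 4, 2 and 3 features (9 columns in x)
view_sizes <- c(4, 2, 3)
view <- rep(seq_along(view_sizes), times = view_sizes)  # length nvars
view_names <- c("imaging", "genetics", "clinical")      # length nviews
penalty_weights <- rep(1, length(view_sizes))           # the documented default

set.seed(1)
x <- matrix(rnorm(50 * sum(view_sizes)), nrow = 50)

# Sanity checks matching the argument descriptions
stopifnot(length(view) == ncol(x),
          length(view_names) == length(unique(view)),
          length(penalty_weights) == length(unique(view)))

table(view)  # number of features per view
```

These objects could then be passed as StaPLR(x, y, view, view.names = view_names, penalty.weights = penalty_weights).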
Value
An object with S3 class "StaPLR".
Author(s)
Wouter van Loon <w.s.van.loon@fsw.leidenuniv.nl>
Examples
set.seed(12)
n <- 1000
cors <- seq(0.1,0.7,0.1)
X <- matrix(NA, nrow=n, ncol=length(cors)+1)
X[,1] <- rnorm(n)
for(i in seq_along(cors)){
X[,i+1] <- X[,1]*cors[i] + rnorm(n, 0, sqrt(1-cors[i]^2))
}
beta <- c(1,0,0,0,0,0,0,0)
eta <- X %*% beta
p <- exp(eta)/(1+exp(eta))
y <- rbinom(n, 1, p) ## create binary response
view_index <- rep(1:(ncol(X)/2), each=2)
# Stacked penalized logistic regression
fit <- StaPLR(X, y, view_index)
coef(fit)$meta
new_X <- matrix(rnorm(16), nrow=2)
predict(fit, new_X)
# Stacked penalized linear regression
y <- eta + rnorm(n) ## create continuous response (one noise draw per observation)
fit <- StaPLR(X, y, view_index, family = "gaussian")
coef(fit)$meta
coef(fit)$base
new_X <- matrix(rnorm(16), nrow=2)
predict(fit, new_X)
# Stacked penalized Poisson regression
y <- ceiling(eta + 4) ## create count response
fit <- StaPLR(X, y, view_index, family = "poisson")
coef(fit)$meta
coef(fit)$base
new_X <- matrix(rnorm(16), nrow=2)
predict(fit, new_X)
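The parallel argument only has an effect when a foreach backend has been registered beforehand. The following is a hedged sketch of one way to do so, assuming the doParallel package is used as the backend; the choice of doParallel, the two workers, and the smaller simulated data set are assumptions made for illustration. The fit is skipped when either package is not installed.

```r
# Simulated data with 4 views of 2 features each
set.seed(12)
n <- 200
X <- matrix(rnorm(n * 8), nrow = n)
y <- rbinom(n, 1, plogis(X[, 1]))
view_index <- rep(1:4, each = 2)

# Assumes the mvs and doParallel packages are installed; skipped otherwise
if (requireNamespace("mvs", quietly = TRUE) &&
    requireNamespace("doParallel", quietly = TRUE)) {
  cl <- parallel::makeCluster(2)       # two local worker processes
  doParallel::registerDoParallel(cl)   # register the foreach backend

  # seed is passed to StaPLR so that results are reproducible in parallel;
  # the progress bar is only supported when parallel = FALSE
  fit <- mvs::StaPLR(X, y, view_index, parallel = TRUE,
                     seed = 123, progress = FALSE)
  print(coef(fit)$meta)

  parallel::stopCluster(cl)            # release the workers
  foreach::registerDoSEQ()             # return to sequential execution
}
```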