flamCV {flam} | R Documentation
Fit the Fused Lasso Additive Model and Do Tuning Parameter Selection using K-Fold Cross-Validation
Description
Fit an additive model where each component is estimated to be piecewise constant with a small number of adaptively-chosen knots. Tuning parameter selection is done using K-fold cross-validation. In particular, this function implements the "fused lasso additive model" proposed in Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391.
Usage
flamCV(x, y, lambda.min.ratio = 0.01, n.lambda = 50, lambda.seq = NULL,
       alpha = 1, family = "gaussian", method = "BCD", fold = NULL,
       n.fold = NULL, seed = NULL, within1SE = T, tolerance = 10e-6)
Arguments
x: n x p covariate matrix. May have p > n.

y: n-vector containing the outcomes for the n observations in x.

lambda.min.ratio: smallest value for lambda.seq, as a fraction of the maximum lambda value; the default is 0.01.

n.lambda: the number of lambda values to consider; the default is 50.

lambda.seq: a user-supplied sequence of positive lambda values to consider. The typical usage is to have the sequence calculated automatically from lambda.min.ratio and n.lambda; supplying lambda.seq overrides this.

alpha: the value of the tuning parameter alpha to consider; the default is 1. The value must be in [0,1], with values near 0 prioritizing sparsity of the fitted functions and values near 1 prioritizing limiting the number of knots. Empirical evidence suggests using alpha = 1 when p < n and alpha = 0.75 when p > n.

family: specifies the loss function to use. Currently supports squared error loss (the default; family = "gaussian") and logistic loss (family = "binomial").

method: specifies the optimization algorithm to use. Options are block-coordinate descent (the default; method = "BCD"), generalized gradient descent (method = "GGD"), and generalized gradient descent with backtracking (method = "GGD.backtrack").

fold: user-supplied fold numbers for cross-validation. If supplied, fold should be an n-vector with entries in 1, ..., K indicating the fold of each observation when doing K-fold cross-validation; specifying fold overrides n.fold.

n.fold: the number of folds, K, to use for the K-fold cross-validation selection of tuning parameters. The default is 10; specification of fold overrides use of n.fold.

seed: an optional number used with set.seed() so that the random assignment of observations to folds is reproducible.

within1SE: logical (TRUE or FALSE) for how the cross-validated tuning parameter should be chosen. If within1SE = TRUE, lambda is chosen to be the value corresponding to the sparsest model with cross-validation error within one standard error of the minimum cross-validation error. If within1SE = FALSE, lambda is chosen to be the value corresponding to the minimum cross-validation error.

tolerance: specifies the convergence criterion for the objective (the default is 10e-6).
Details
Note that flamCV does not cross-validate over alpha; only a single value should be provided. However, if the user would like to cross-validate over alpha, then flamCV should be called multiple times with different values of alpha and the same seed. This ensures that the cross-validation folds (fold) remain the same across the different values of alpha. See the example below for details.
Value
An object with S3 class "flamCV".
mean.cv.error: m-vector containing the mean cross-validation error, where m is the length of lambda.seq.

se.cv.error: m-vector containing the cross-validation standard error, where m is the length of lambda.seq.

lambda.cv: optimal lambda value chosen by cross-validation.

alpha: as specified by user (or default).

index.cv: index of the model corresponding to lambda.cv.

flam.out: object of class "flam" returned by flam.

fold: as specified by user (or default).

n.folds: as specified by user (or default).

within1SE: as specified by user (or default).

tolerance: as specified by user (or default).

call: matched call.
Author(s)
Ashley Petersen
References
Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391.
See Also
flam, plot.flamCV, summary.flamCV
Examples
#See ?'flam-package' for a full example of how to use this package
#generate data
set.seed(1)
data <- sim.data(n = 50, scenario = 1, zerof = 10, noise = 1)
#fit model for a range of lambda chosen by default
#pick lambda using 2-fold cross-validation
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.out <- flamCV(x = data$x, y = data$y, alpha = 0.75, n.fold = 2)
## Not run:
#note that cross-validation is only done to choose lambda for specified alpha
#to cross-validate over alpha also, call 'flamCV' for several alpha and set seed
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.out1 <- flamCV(x = data$x, y = data$y, alpha = 0.65, seed = 100,
                      within1SE = FALSE, n.fold = 2)
flamCV.out2 <- flamCV(x = data$x, y = data$y, alpha = 0.75, seed = 100,
                      within1SE = FALSE, n.fold = 2)
flamCV.out3 <- flamCV(x = data$x, y = data$y, alpha = 0.85, seed = 100,
                      within1SE = FALSE, n.fold = 2)
#this ensures that the folds used are the same
flamCV.out1$fold; flamCV.out2$fold; flamCV.out3$fold
#compare the CV error for the optimum lambda of each alpha to choose alpha
CVerrors <- c(flamCV.out1$mean.cv.error[flamCV.out1$index.cv],
              flamCV.out2$mean.cv.error[flamCV.out2$index.cv],
              flamCV.out3$mean.cv.error[flamCV.out3$index.cv])
best.alpha <- c(flamCV.out1$alpha, flamCV.out2$alpha,
                flamCV.out3$alpha)[which(CVerrors == min(CVerrors))]
#also can generate data for logistic FLAM model
data2 <- sim.data(n = 50, scenario = 1, zerof = 10, family = "binomial")
#fit the FLAM model with cross-validation using logistic loss
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.logistic.out <- flamCV(x = data2$x, y = data2$y, family = "binomial",
                              n.fold = 2)
## End(Not run)
#'flamCV' returns an object of the class 'flamCV' that includes an object
#of class 'flam' (flam.out); see ?'flam-package' for an example using S3
#methods for the classes of 'flam' and 'flamCV'
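A brief sketch of how the returned object might be inspected and used for prediction, via the S3 methods listed under See Also and the embedded 'flam' object. This assumes the flam package is installed; the predict.flam arguments (new.x, lambda, alpha) are taken from the flam package documentation and should be checked against ?predict.flam.

```r
# self-contained sketch: simulate data and fit with 2-fold CV as above
library(flam)
set.seed(1)
data <- sim.data(n = 50, scenario = 1, zerof = 10, noise = 1)
flamCV.out <- flamCV(x = data$x, y = data$y, alpha = 0.75, n.fold = 2)

# plot the mean cross-validation error over the lambda sequence
plot(flamCV.out)

# print a summary of the cross-validation results
summary(flamCV.out)

# predict at the CV-selected tuning parameters using the embedded 'flam' fit
yhat <- predict(flamCV.out$flam.out, new.x = data$x,
                lambda = flamCV.out$lambda.cv, alpha = flamCV.out$alpha)
```

Because the fitted 'flam' object for the full lambda sequence is stored in flamCV.out$flam.out, no refitting is needed to obtain predictions at the cross-validated lambda.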