cv.cmls {CMLS} | R Documentation |
Cross-Validation for cmls
Description
Does k-fold or generalized cross-validation to tune the constraint options for cmls. Tunes the model with respect to any combination of the arguments const, df, degree, and/or intercept.
Usage
cv.cmls(X, Y, nfolds = 2, foldid = NULL, parameters = NULL,
const = "uncons", df = 10, degree = 3, intercept = TRUE,
mse = TRUE, parallel = FALSE, cl = NULL, verbose = TRUE, ...)
Arguments
X |
Matrix of dimension |
Y |
Matrix of dimension |
nfolds |
Number of folds for k-fold cross-validation. Ignored if |
foldid |
Factor or integer vector of length |
parameters |
Parameters for tuning. Data frame with columns |
const |
Parameters for tuning. Character vector specifying constraints for tuning. See Details. |
df |
Parameters for tuning. Integer vector specifying degrees of freedom for tuning. See Details. |
degree |
Parameters for tuning. Integer vector specifying polynomial degrees for tuning. See Details. |
intercept |
Parameters for tuning. Logical vector specifying intercepts for tuning. See Details. |
mse |
If |
parallel |
Logical indicating if |
cl |
Cluster created by |
verbose |
If |
... |
Additional arguments to the |
Details
The parameters for tuning can be supplied via one of two options:
(A) Using the parameters argument. In this case, the argument parameters must be a data frame with columns const, df, degree, and intercept, where each row gives a combination of parameters for the CV tuning.
(B) Using the const, df, degree, and intercept arguments. In this case, the expand.grid function is used to create the parameters data frame, which contains all combinations of the arguments const, df, degree, and intercept. Duplicates are removed before the CV tuning.
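The two options can be sketched in base R as follows. This is only an illustration of the data frames described above, not the package's internal code: the object names params and grid and the particular rows are assumptions chosen for the example, and the final commented call reuses Xmat and Ymat from the Examples section.

```r
# Option (A): write the combinations out by hand, one row per candidate.
# The rows below are illustrative, not recommended settings.
params <- data.frame(const = c("uncons", "smooth", "smooth"),
                     df = c(10, 5, 10),
                     degree = 3,
                     intercept = TRUE,
                     stringsAsFactors = FALSE)

# Option (B), reproduced by hand: cross the four arguments with expand.grid
# and drop duplicated rows. The repeated "smooth" is deliberate, to show
# that duplicates disappear.
grid <- expand.grid(const = c("uncons", "smooth", "smooth"),
                    df = 5:7,
                    degree = 3,
                    intercept = TRUE,
                    stringsAsFactors = FALSE)
grid <- unique(grid)
rownames(grid) <- NULL

nrow(params)  # number of hand-picked combinations
nrow(grid)    # number of distinct combinations left after unique()

# Either data frame could then be supplied through the parameters argument,
# e.g. (requires the CMLS package and the data from the Examples section):
# kcv <- cv.cmls(X = Xmat, Y = Ymat, nfolds = 5, parameters = params)
```

Option (A) is the better fit when only a few specific combinations are of interest; option (B) is convenient for a full grid, at the cost of fitting every combination.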
Value
best.parameters |
Best combination of parameters, i.e., the combination that minimizes the |
top5.parameters |
Top five combinations of parameters, i.e., the combinations that give the five smallest values of the |
full.parameters |
Full set of parameters. Data frame with |
Author(s)
Nathaniel E. Helwig <helwig@umn.edu>
References
Helwig, N. E. (in prep). Constrained multivariate least squares in R.
See Also
See the cmls and const functions for further details on the available constraint options.
Examples
# make X
set.seed(1)
n <- 50
m <- 20
p <- 2
Xmat <- matrix(rnorm(n*p), nrow = n, ncol = p)
# make B (which satisfies all constraints except monotonicity)
x <- seq(0, 1, length.out = m)
Bmat <- rbind(sin(2*pi*x), sin(2*pi*x+pi)) / sqrt(4.75)
struc <- rbind(rep(c(TRUE, FALSE), each = m / 2),
               rep(c(FALSE, TRUE), each = m / 2))
Bmat <- Bmat * struc
# make noisy data
Ymat <- Xmat %*% Bmat + rnorm(n*m, sd = 0.5)
# 5-fold CV: tune df (5,...,15) for const = "smooth"
kcv <- cv.cmls(X = Xmat, Y = Ymat, nfolds = 5,
               const = "smooth", df = 5:15)
kcv$best.parameters
kcv$top5.parameters
plot(kcv$full.parameters$df, kcv$full.parameters$cvloss, type = "b")
## Not run:
# sample foldid for 5-fold CV
set.seed(2)
foldid <- sample(rep(1:5, length.out = n))
# 5-fold CV: tune df (5,...,15) w/ all 20 relevant constraints (no struc)
# using sequential computation (default)
myconst <- as.character(const(print = FALSE)$label[-c(13:16)])
system.time({
  kcv <- cv.cmls(X = Xmat, Y = Ymat, foldid = foldid,
                 const = myconst, df = 5:15)
})
kcv$best.parameters
kcv$top5.parameters
# 5-fold CV: tune df (5,...,15) w/ all 20 relevant constraints (no struc)
# using parallel package for parallel computations
myconst <- as.character(const(print = FALSE)$label[-c(13:16)])
system.time({
  cl <- makeCluster(2L) # using 2 cores
  kcv <- cv.cmls(X = Xmat, Y = Ymat, foldid = foldid,
                 const = myconst, df = 5:15,
                 parallel = TRUE, cl = cl)
  stopCluster(cl)
})
kcv$best.parameters
kcv$top5.parameters
# 5-fold CV: tune df (5,...,15) w/ all 20 relevant constraints (w/ struc)
# using sequential computation (default)
myconst <- as.character(const(print = FALSE)$label[-c(13:16)])
system.time({
  kcv <- cv.cmls(X = Xmat, Y = Ymat, foldid = foldid,
                 const = myconst, df = 5:15, struc = struc)
})
kcv$best.parameters
kcv$top5.parameters
# 5-fold CV: tune df (5,...,15) w/ all 20 relevant constraints (w/ struc)
# using parallel package for parallel computations
myconst <- as.character(const(print = FALSE)$label[-c(13:16)])
system.time({
  cl <- makeCluster(2L) # using 2 cores
  kcv <- cv.cmls(X = Xmat, Y = Ymat, foldid = foldid,
                 const = myconst, df = 5:15, struc = struc,
                 parallel = TRUE, cl = cl)
  stopCluster(cl)
})
kcv$best.parameters
kcv$top5.parameters
## End(Not run)