CVR {CVR}R Documentation

Fit canonical variate regression with tuning parameters selected by cross validation.

Description

This function fits the solution path of canonical variate regression, with tuning parameters selected by cross validation. The tuning parameters include the rank, the η and the λ.

Usage

CVR(Y, Xlist, rankseq = 2, neta = 10, etaseq = NULL, nlam = 50, 
     Lamseq = NULL, family = c("gaussian", "binomial", "poisson"),  
     Wini = NULL, penalty = c("GL1", "L1"), nfold = 10, foldid = NULL,  
     opts = list(), type.measure = NULL)

Arguments

Y

A univariate response variable.

Xlist

A list of two covariate matrices as in cvrsolver.

rankseq

A sequence of candidate ranks. The default is a single value 2.

neta

Number of η values. The default is 10.

etaseq

A sequence of length neta containing candidate η values between 0 and 1. The default is 10^seq(-2, log10(0.9), length = neta).

nlam

Number of λ values. The default is 50.

Lamseq

A matrix of λ values. The column number is the number of sets in Xlist, and the row number is nlam. The default is 10^(seq(-2, 2, length = nlam)) for each column.

family

Type of response as in cvrsolver. The default is "gaussian".

Wini

A list of initial loading W's. The default is from the SparseCCA solution. See SparseCCA.

penalty

Type of penalty on loading matrices W's as in cvrsolver. The default is "GL1".

nfold

Number of folds in cross validation. The default is 10.

foldid

Specifying training and testing sets in cross validation; random generated if not supplied. It remains the same across different rank and η.

opts

A list of options for controlling the algorithm. The default of opts$spthresh is 0.4, which means we only search sparse models with at most 40% nonzero entries in W1 and W2. See the other options (standardization, maxIters and tol) in cvrsolver.

type.measure

Type of measurement used in cross validation. "mse" for Gaussian, "auc" for binomial, and "deviance" for binomial and Poisson.

Details

In this function, the rank, η and λ are tuned by cross validation. CVR then is refitted with all data using the selected tuning parameters. The plot function shows the tuning of λ, with selected rank and η.

Value

An object with S3 class "CVR" containing the following components

cverror

A matrix containing the CV errors. The number of rows is the length of etaseq and the number of columns is the length of rankseq.

etahat

Selected η.

rankhat

Selected rank.

Lamhat

Selected λ's.

Alphapath

An array containing the fitted paths of the intercept term α.

Betapath

An array containing the fitted paths of the regression coefficient β.

W1path, W2path

Arrays containing the fitted paths of W1 and W2.

foldid

foldid used in cross validation.

cvout

Cross validation results using selected η and rank.

solution

A list including the solutions of α, β, W1 and W2, by refitting all the data using selected tuning parameters.

Author(s)

Chongliang Luo, Kun Chen.

References

Chongliang Luo, Jin Liu, Dipak D. Dey and Kun Chen (2016) Canonical variate regression. Biostatistics, doi: 10.1093/biostatistics/kxw001.

See Also

cvrsolver, SparseCCA, SimulateCVR.

Examples

############## Gaussian response ###################### 
set.seed(42)   
mydata <- SimulateCVR(family = "g", n = 100, rank = 4, p1 = 50, p2 = 70,  
                  pnz = 10, beta = c(2, 1, 0, 0))
X1 <- mydata$X1;
X2 <- mydata$X2
Xlist <- list(X1 = X1, X2 = X2); 
Y <- mydata$y
## fix rank = 4, tune eta and lambda   
##out_cvr <- CVR(Y, Xlist, rankseq = 4, neta = 5, nlam = 25,  
##               family = "g", nfold = 5)
## out_cvr$solution$W[[1]];  
## out_cvr$solution$W[[2]];     
### uncomment to see plots 
## plot.CVR(out_cvr)
## 
## Distance of subspaces 
##U <- mydata$U
##Pj <- function(U) U %*% solve(t(U) %*% U, t(U)) 
##sum((Pj(U) - (Pj(X1 %*% out_cvr$sol$W[[1]]) + Pj(X2 %*% out_cvr$sol$W[[2]]))/2)^2) 
## Precision/Recall rate
## the first 10 rows of the true W1 and W2 are set to be nonzero
##W12 <- rbind(out_cvr$sol$W[[1]], out_cvr$sol$W[[1]])
##W12norm <- apply(W12, 1, function(a)sqrt(sum(a^2)))
##prec <- sum(W12norm[c(1:10, 51:60)] != 0)/sum(W12norm != 0); prec 
##rec <- sum(W12norm[c(1:10, 51:60)] != 0)/20; rec
## sequential SparseCCA, compare the Distance of subspaces and Prec/Rec
##W12s <- SparseCCA(X1, X2, 4)
## Distance larger than CVR's 
##sum((Pj(U) - (Pj(X1 %*% W12s$W1) + Pj(X2 %*% W12s$W2))/2)^2) 
##W12snorm <- apply(rbind(W12s$W1, W12s$W2), 1, function(a)sqrt(sum(a^2)))
## compare Prec/Rec 
##sum(W12snorm[c(1:10, 51:60)] != 0)/sum(W12snorm != 0);  
##sum(W12snorm[c(1:10, 51:60)] != 0)/20; 

############## binary response ########################
set.seed(12) 
mydata <- SimulateCVR(family = "binomial", n = 300, rank = 4, p1 = 50,  
                       p2 = 70, pnz = 10, beta = c(2, 1, 0, 0))
X1 <- mydata$X1; X2 <- mydata$X2
Xlist <- list(X1 = X1, X2 = X2); 
Y <- mydata$y
## out_cvr <- CVR(Y, Xlist, 4, neta = 5, nlam=25, family = "b", nfold = 5)  
## out_cvr$sol$W[[1]];  
## out_cvr$sol$W[[2]];    
## plot.CVR(out_cvr)

############## Poisson response ######################
set.seed(34)
mydata <- SimulateCVR(family = "p", n = 100, rank = 4, p1 = 50,    
                       p2 = 70, pnz = 10, beta = c(0.2, 0.1, 0, 0))
X1 <- mydata$X1; X2 <- mydata$X2
Xlist <- list(X1 = X1, X2 = X2); 
Y <- mydata$y
## etaseq <- 10^seq(-3, log10(0.95), len = 10)
## out_cvr <- CVR(Y, Xlist, 4, neta = 5, nlam = 25, family = "p", nfold = 5)
## out_cvr$sol$W[[1]];  
## out_cvr$sol$W[[2]];    
## plot.CVR(out_cvr)  

[Package CVR version 0.1.1 Index]