R: Fit canonical variate regression with tuning parameters...

CVR {CVR}

R Documentation

Fit canonical variate regression with tuning parameters selected by cross validation.

Description

This function fits the solution path of canonical variate regression, with tuning parameters selected by cross validation. The tuning parameters include the rank, the \eta and the \lambda.

Usage

CVR(Y, Xlist, rankseq = 2, neta = 10, etaseq = NULL, nlam = 50, 
     Lamseq = NULL, family = c("gaussian", "binomial", "poisson"),  
     Wini = NULL, penalty = c("GL1", "L1"), nfold = 10, foldid = NULL,  
     opts = list(), type.measure = NULL)

Arguments

`Y`	A univariate response variable.
`Xlist`	A list of two covariate matrices as in `cvrsolver`.
`rankseq`	A sequence of candidate ranks. The default is a single value 2.
`neta`	Number of `\eta` values. The default is 10.
`etaseq`	A sequence of length `neta` containing candidate `\eta` values between 0 and 1. The default is 10^seq(-2, log10(0.9), length = neta).
`nlam`	Number of `\lambda` values. The default is 50.
`Lamseq`	A matrix of `\lambda` values. The column number is the number of sets in `Xlist`, and the row number is `nlam`. The default is 10^(seq(-2, 2, length = nlam)) for each column.
`family`	Type of response as in `cvrsolver`. The default is `"gaussian"`.
`Wini`	A list of initial loading W's. The default is from the SparseCCA solution. See `SparseCCA`.
`penalty`	Type of penalty on loading matrices W's as in `cvrsolver`. The default is `"GL1"`.
`nfold`	Number of folds in cross validation. The default is 10.
`foldid`	Specifying training and testing sets in cross validation; random generated if not supplied. It remains the same across different rank and `\eta`.
`opts`	A list of options for controlling the algorithm. The default of `opts$spthresh` is 0.4, which means we only search sparse models with at most 40% nonzero entries in W1 and W2. See the other options (`standardization`, `maxIters` and `tol`) in `cvrsolver`.
`type.measure`	Type of measurement used in cross validation. `"mse"` for Gaussian, `"auc"` for binomial, and `"deviance"` for binomial and Poisson.

Details

In this function, the rank, \eta and \lambda are tuned by cross validation. CVR then is refitted with all data using the selected tuning parameters. The plot function shows the tuning of \lambda, with selected rank and \eta.

Value

An object with S3 class "CVR" containing the following components

`cverror`	A matrix containing the CV errors. The number of rows is the length of `etaseq` and the number of columns is the length of `rankseq`.
`etahat`	Selected `\eta`.
`rankhat`	Selected rank.
`Lamhat`	Selected `\lambda`'s.
`Alphapath`	An array containing the fitted paths of the intercept term `\alpha`.
`Betapath`	An array containing the fitted paths of the regression coefficient `\beta`.
`W1path`, `W2path`	Arrays containing the fitted paths of W1 and W2.
`foldid`	`foldid` used in cross validation.
`cvout`	Cross validation results using selected `\eta` and rank.
`solution`	A list including the solutions of `\alpha`, `\beta`, W1 and W2, by refitting all the data using selected tuning parameters.

Author(s)

Chongliang Luo, Kun Chen.

References

Chongliang Luo, Jin Liu, Dipak D. Dey and Kun Chen (2016) Canonical variate regression. Biostatistics, doi: 10.1093/biostatistics/kxw001.

Examples

############## Gaussian response ###################### 
set.seed(42)   
mydata <- SimulateCVR(family = "g", n = 100, rank = 4, p1 = 50, p2 = 70,  
                  pnz = 10, beta = c(2, 1, 0, 0))
X1 <- mydata$X1;
X2 <- mydata$X2
Xlist <- list(X1 = X1, X2 = X2); 
Y <- mydata$y
## fix rank = 4, tune eta and lambda   
##out_cvr <- CVR(Y, Xlist, rankseq = 4, neta = 5, nlam = 25,  
##               family = "g", nfold = 5)
## out_cvr$solution$W[[1]];  
## out_cvr$solution$W[[2]];     
### uncomment to see plots 
## plot.CVR(out_cvr)
## 
## Distance of subspaces 
##U <- mydata$U
##Pj <- function(U) U %*% solve(t(U) %*% U, t(U)) 
##sum((Pj(U) - (Pj(X1 %*% out_cvr$sol$W[[1]]) + Pj(X2 %*% out_cvr$sol$W[[2]]))/2)^2) 
## Precision/Recall rate
## the first 10 rows of the true W1 and W2 are set to be nonzero
##W12 <- rbind(out_cvr$sol$W[[1]], out_cvr$sol$W[[1]])
##W12norm <- apply(W12, 1, function(a)sqrt(sum(a^2)))
##prec <- sum(W12norm[c(1:10, 51:60)] != 0)/sum(W12norm != 0); prec 
##rec <- sum(W12norm[c(1:10, 51:60)] != 0)/20; rec
## sequential SparseCCA, compare the Distance of subspaces and Prec/Rec
##W12s <- SparseCCA(X1, X2, 4)
## Distance larger than CVR's 
##sum((Pj(U) - (Pj(X1 %*% W12s$W1) + Pj(X2 %*% W12s$W2))/2)^2) 
##W12snorm <- apply(rbind(W12s$W1, W12s$W2), 1, function(a)sqrt(sum(a^2)))
## compare Prec/Rec 
##sum(W12snorm[c(1:10, 51:60)] != 0)/sum(W12snorm != 0);  
##sum(W12snorm[c(1:10, 51:60)] != 0)/20; 

############## binary response ########################
set.seed(12) 
mydata <- SimulateCVR(family = "binomial", n = 300, rank = 4, p1 = 50,  
                       p2 = 70, pnz = 10, beta = c(2, 1, 0, 0))
X1 <- mydata$X1; X2 <- mydata$X2
Xlist <- list(X1 = X1, X2 = X2); 
Y <- mydata$y
## out_cvr <- CVR(Y, Xlist, 4, neta = 5, nlam=25, family = "b", nfold = 5)  
## out_cvr$sol$W[[1]];  
## out_cvr$sol$W[[2]];    
## plot.CVR(out_cvr)

############## Poisson response ######################
set.seed(34)
mydata <- SimulateCVR(family = "p", n = 100, rank = 4, p1 = 50,    
                       p2 = 70, pnz = 10, beta = c(0.2, 0.1, 0, 0))
X1 <- mydata$X1; X2 <- mydata$X2
Xlist <- list(X1 = X1, X2 = X2); 
Y <- mydata$y
## etaseq <- 10^seq(-3, log10(0.95), len = 10)
## out_cvr <- CVR(Y, Xlist, 4, neta = 5, nlam = 25, family = "p", nfold = 5)
## out_cvr$sol$W[[1]];  
## out_cvr$sol$W[[2]];    
## plot.CVR(out_cvr)

[Package CVR version 0.1.1 Index]