
seedCCA {seedCCA}

R Documentation

Seeded Canonical correlation analysis

Description

The function seedCCA mainly implements seeded canonical correlation analysis, proposed by Im et al. (2015). Depending on the value of type, it conducts one of four methods; the option type takes one of c("cca", "seed1", "seed2", "pls").

Usage

seedCCA(X,Y,type="seed2",ux=NULL,uy=NULL,u=10,eps=0.01,cut=0.9,d=NULL,AS=TRUE,scale=FALSE)

Arguments

`X`	numeric vector or matrix (n * p), the first set of variables
`Y`	numeric vector or matrix (n * r), the second set of variables
`type`	character, a choice of methods among `c("cca", "seed1", "seed2", "pls")`. The default is `"seed2"`.
`ux`	numeric, maximum number of projections for X. The default is NULL. If it is not NULL, it overrides the option `u` when `type="seed1"` and p>r. For `type="seed2"`, if `ux` and `uy` are not NULL, they override `u`.
`uy`	numeric, maximum number of projections for Y. The default is NULL. If it is not NULL, it overrides the option `u` when `type="seed1"` and p<r. For `type="seed2"`, if `ux` and `uy` are not NULL, they override `u`.
`u`	numeric, maximum number of projections. The default is 10. This is used for `type="seed1"`, `type="seed2"` and `type="pls"`.
`eps`	numeric, the criterion for terminating the iterative projections. The default is 0.01. If the increment from an additional projection is less than `eps`, the iterative projection is terminated.
`cut`	numeric, between 0 and 1. The default is 0.9. If `d` is NULL, `cut` controls the automatic replacement of cov(X,Y) and cov(Y,X) with their largest eigenvectors: the number of eigenvectors kept is the smallest one whose cumulative eigenvalue proportion is bigger than `cut`. If any value of `d` is given, `cut` is ignored. This only works for `type="seed2"`.
`d`	numeric, the user-selected number of largest eigenvectors of cov(X,Y) and cov(Y,X). The default is NULL. This only works for `type="seed2"`. If any value of `d` is given, `cut` is ignored.
`AS`	logical, whether the iterative projections stop automatically. The default is `TRUE`. If `TRUE`, the iterative projection is stopped automatically once the termination condition set by `eps` is satisfied. If `AS=FALSE`, the iterative projections continue until the value of `u` is reached.
`scale`	logical, whether to scale the predictors to have mean zero and standard deviation one. The default is `FALSE`. If `scale=TRUE`, each predictor is standardized to mean 0 and variance 1 before partial least squares. This option works only for `type="pls"`.
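The interplay of `cut` and `d` can be pictured with a small base-R sketch. Because cov(X,Y) is generally not square, it ASSUMES that the eigenvectors are those of the symmetric p x p matrix cov(X,Y) %*% cov(Y,X) (the Y side would use cov(Y,X) %*% cov(X,Y)); this page does not say so explicitly. The helpers d_from_eigenvalues() and choose_d() are hypothetical names, not part of seedCCA.

```r
# Hypothetical helper (not part of seedCCA): smallest number of leading
# eigenvectors whose cumulative eigenvalue proportion is bigger than cut.
d_from_eigenvalues <- function(ev, cut = 0.9) {
  ev <- sort(pmax(ev, 0), decreasing = TRUE)   # clip tiny negative round-off
  prop <- cumsum(ev) / sum(ev)
  which(prop > cut)[1]
}

# Hypothetical helper. ASSUMPTION: the eigen-decomposition is taken of the
# symmetric p x p matrix cov(X,Y) %*% cov(Y,X). A supplied d overrides cut.
choose_d <- function(X, Y, cut = 0.9, d = NULL) {
  Sxy <- cov(X, Y)
  eg <- eigen(Sxy %*% t(Sxy), symmetric = TRUE)
  if (is.null(d)) d <- d_from_eigenvalues(eg$values, cut)
  list(d = d, vectors = eg$vectors[, seq_len(d), drop = FALSE])
}

d_from_eigenvalues(c(5, 3, 1, 1), cut = 0.7)   # proportions 0.5, 0.8, 0.9, 1 -> 2
d_from_eigenvalues(c(5, 3, 1, 1), cut = 0.85)  # -> 3
```

Raising `cut` toward 1 keeps more eigenvectors, while passing `d` bypasses the proportion rule entirely, mirroring the precedence described for the two arguments.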

Details

Let p and r denote the numbers of variables in X and Y, respectively, and let n denote the sample size. The option type="cca" works only when max(p,r) < n, and seedCCA then conducts standard canonical correlation analysis (Johnson and Wichern, 2007). If type="cca" is given and either p or r is equal to one, ordinary least squares (OLS) is done instead of canonical correlation analysis. If max(p,r) >= n, either type="seed1" or type="seed2" has to be chosen; this is the main purpose of seedCCA. With type="seed1", only the set with more variables (say X, with p > r) is initially reduced by the iterative projection approach (Cook et al. 2007), and canonical correlation analysis of the initially-reduced X and the original Y follows. With type="seed2", both X and Y are initially reduced, and canonical correlation analysis of the two initially-reduced sets follows. With type="pls", partial least squares (PLS) is done; the first set of variables passed to seedCCA is treated as the predictors and the second set as the response, so the order of the arguments matters. The response can be multivariate. Depending on the value of type, the subclass of the object returned by seedCCA differs:

type="cca": subclass "finalCCA" (p > 1; r > 1; p, r < n)

type="cca": subclass "seedols" (either p or r is equal to 1.)

type="seed1" and type="seed2": subclass "finalCCA" (max(p,r)>n)

type="pls": subclass "seedpls" (p>n and r <n)

Consequently, plot(object) produces a different figure depending on the subclass of object.
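Two base-R checks relate the type="cca" behaviour to textbook results. They use only stats functions on simulated data and are not seedCCA's internal code; cca_by_eigen() is a hypothetical name. First, the canonical correlations are the square roots of the eigenvalues of Sxx^{-1} Sxy Syy^{-1} Syx (Johnson and Wichern, 2007), which requires Sxx and Syy to be invertible, hence the condition max(p,r) < n. Second, when one set has a single variable there is only one canonical pair, and the canonical direction for the other set is proportional to the OLS slope vector Sxx^{-1} Sxy, which is plausibly why OLS is reported in the "seedols" case.

```r
# Hypothetical sketch: canonical correlations from an eigen-decomposition.
cca_by_eigen <- function(X, Y) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  Sxx <- cov(X); Syy <- cov(Y); Sxy <- cov(X, Y)
  # Sxx^{-1} Sxy Syy^{-1} Syx; solve() fails if Sxx or Syy is singular.
  K <- solve(Sxx, Sxy) %*% solve(Syy, t(Sxy))
  ev <- Re(eigen(K, only.values = TRUE)$values)
  k <- min(ncol(X), ncol(Y))
  sqrt(pmax(sort(ev, decreasing = TRUE)[seq_len(k)], 0))
}

set.seed(42)
n <- 100
X <- matrix(rnorm(n * 4), n, 4)
Y <- cbind(X[, 1] + rnorm(n), X[, 2] - X[, 3] + rnorm(n), rnorm(n))
cc_eigen  <- cca_by_eigen(X, Y)
cc_cancor <- stats::cancor(X, Y)$cor   # should agree up to rounding

# Single-variable second set: the first canonical direction for X is
# proportional to the OLS slopes (intercept dropped).
y <- Y[, 1]
b <- unname(coef(lm(y ~ X))[-1])
a <- stats::cancor(X, y)$xcoef[, 1]
ratio <- a / b                         # constant across the p coordinates
```

The constant ratio shows that the single canonical direction carries no information beyond the OLS coefficients, apart from an arbitrary scale.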

The Value section is organized by type in the following order:

type="cca": standard CCA (max(p,r)<n, min(p,r)>1) / "finalCCA" subclass

type="cca": ordinary least squares (max(p,r)<n, min(p,r)=1) / "seedols" subclass

type="seed1": seeded CCA with case1 (max(p,r)>n and p>r) / "finalCCA" subclass

type="seed1": seeded CCA with case1 (max(p,r)>n and p<=r) / "finalCCA" subclass

type="seed2": seeded CCA with case2 (max(p,r)>n) / "finalCCA" subclass

type="pls": partial least squares (p>n and r<n) / "seedpls" subclass
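The initial reduction behind type="seed1" and type="seed2" rests on the iterative projection approach of Cook et al. (2007). With Sxx = cov(X) and the seed Sxy = cov(X,Y), the u-step coefficient matrix is M_u = R_u (R_u' Sxx R_u)^{-1} R_u' Sxy with R_u = (Sxy, Sxx Sxy, ..., Sxx^{u-1} Sxy). When Sxx is invertible this targets Sxx^{-1} Sxy, yet only a small (u r) x (u r) matrix is ever inverted, which is what makes p > n workable. The base-R sketch below is illustrative only: seed_projection() and suggest_u() are hypothetical names, the Frobenius-norm stopping rule is an ASSUMPTION about how the increment compared with eps is measured, and presumably type="seed2" applies the same reduction to Y with seed cov(Y,X).

```r
# Hypothetical sketch (not seedCCA's code): project the seed Sxy onto the
# space spanned by Sxy, Sxx Sxy, ..., Sxx^(u-1) Sxy (Cook et al. 2007).
seed_projection <- function(Sxx, Sxy, u) {
  block <- Sxy
  R <- Sxy
  if (u > 1) {
    for (k in 2:u) {
      block <- Sxx %*% block          # next power of Sxx applied to the seed
      R <- cbind(R, block)
    }
  }
  # Only a (u * r) x (u * r) system is solved; Sxx itself is never inverted.
  R %*% solve(t(R) %*% Sxx %*% R, t(R) %*% Sxy)
}

# ASSUMED stopping rule (analogue of AS = TRUE): stop once one more projection
# changes the coefficient matrix by less than eps in Frobenius norm.
suggest_u <- function(Sxx, Sxy, u_max = 10, eps = 0.01) {
  if (u_max < 2) return(1)
  M_prev <- seed_projection(Sxx, Sxy, 1)
  for (u in 2:u_max) {
    M <- seed_projection(Sxx, Sxy, u)
    if (norm(M - M_prev, type = "F") < eps) return(u - 1)
    M_prev <- M
  }
  u_max
}

# Usage on simulated data with p > n, where cov(X) is singular.
set.seed(3)
n <- 30; p <- 200
X <- matrix(rnorm(n * p), n, p)
Y <- cbind(X[, 1] + rnorm(n), X[, 2] + rnorm(n))
Sxx <- cov(X); Sxy <- cov(X, Y)
u <- suggest_u(Sxx, Sxy, u_max = 3)
M <- seed_projection(Sxx, Sxy, u)     # p x r coefficient matrix
newX <- X %*% M                       # n x r initially-reduced X
final_cor <- stats::cancor(newX, Y)$cor
```

The final step is an ordinary CCA on an n x r reduced X, so the p > n problem disappears once the seed projection has been applied.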

Value

`type="cca"`	Values returned with `type="cca"`: standard CCA (max(p,r)<n, min(p,r)>1) / "finalCCA" subclass
`cor`	canonical correlations
`xcoef`	the estimated canonical coefficients for X
`ycoef`	the estimated canonical coefficients for Y
`Xscores`	the estimated canonical variates for X
`Yscores`	the estimated canonical variates for Y
`type="cca"`	Values returned with `type="cca"`: ordinary least squares (max(p,r)<n, min(p,r)=1) / "seedols" subclass
`coef`	the estimated ordinary least squares coefficients
`X`	X, the first set
`Y`	Y, the second set
`type="seed1"`	Values returned with `type="seed1"`: seeded CCA with case1 (max(p,r)>n and p>r) / "finalCCA" subclass
`cor`	canonical correlations
`xcoef`	the estimated canonical coefficients for X
`ycoef`	the estimated canonical coefficients for Y
`proper.u`	a suggested proper number of projections for X
`initialMX0`	the initialized canonical coefficient matrices of X
`newX`	initially-reduced X
`Y`	the original Y
`Xscores`	the estimated canonical variates for X
`Yscores`	the estimated canonical variates for Y
`type="seed1"`	Values returned with `type="seed1"`: seeded CCA with case1 (max(p,r)>n and p<=r) / "finalCCA" subclass
`cor`	canonical correlations
`xcoef`	the estimated canonical coefficients for X
`ycoef`	the estimated canonical coefficients for Y
`proper.u`	a suggested proper number of projections for Y
`X`	the original X
`initialMY0`	the initialized canonical coefficient matrices of Y
`newY`	initially-reduced Y
`Xscores`	the estimated canonical variates for X
`Yscores`	the estimated canonical variates for Y
`type="seed2"`	Values returned with `type="seed2"`: seeded CCA with case2 (max(p,r)>n) / "finalCCA" subclass
`cor`	canonical correlations
`xcoef`	the estimated canonical coefficients for X
`ycoef`	the estimated canonical coefficients for Y
`proper.ux`	a suggested proper number of projections for X
`proper.uy`	a suggested proper number of projections for Y
`d`	suggested number of eigenvectors of cov(X,Y)
`initialMX0`	the initialized canonical coefficient matrices of X
`initialMY0`	the initialized canonical coefficient matrices of Y
`newX`	initially-reduced X
`newY`	initially-reduced Y
`Xscores`	the estimated canonical variates for X
`Yscores`	the estimated canonical variates for Y
`type="pls"`	Values returned with `type="pls"`: partial least squares (p>n and r<n) / "seedpls" subclass
`coef`	the estimated coefficients for each iterative projection up to u
`u`	the maximum number of projections
`X`	predictors
`Y`	response
`scale`	status of scaling predictors
`cases`	the number of observations
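For a single response, the u-component PLS coefficient vector can be written in iterative-projection form, b_u = R_u (R_u' Sxx R_u)^{-1} R_u' sxy with R_u = (sxy, Sxx sxy, ..., Sxx^{u-1} sxy), which yields one coefficient vector per number of projections, the kind of output described for `coef` under type="pls". The base-R sketch below ASSUMES one column per u (the actual layout of seedCCA's `coef` is not documented here), covers only a univariate response, and pls_path() is a hypothetical name.

```r
# Hypothetical sketch (not seedCCA's code): PLS coefficients for a single
# response, one column per number of projections u = 1, ..., u_max.
pls_path <- function(X, y, u_max = 10, scale = FALSE) {
  X <- as.matrix(X)
  if (scale) X <- base::scale(X)      # each predictor: mean 0, sd 1
  Sxx <- cov(X)
  sxy <- cov(X, y)
  coefs <- matrix(NA_real_, ncol(X), u_max)
  R <- NULL
  block <- sxy
  for (u in seq_len(u_max)) {
    R <- cbind(R, block)              # R_u = (sxy, Sxx sxy, ..., Sxx^(u-1) sxy)
    coefs[, u] <- R %*% solve(t(R) %*% Sxx %*% R, t(R) %*% sxy)
    block <- Sxx %*% block
  }
  coefs
}

set.seed(5)
Xp <- matrix(rnorm(80 * 4), 80, 4)
yp <- drop(Xp %*% c(1, 0.5, 0, -1)) + rnorm(80)
B <- pls_path(Xp, yp, u_max = 4)      # 4 x 4: one coefficient vector per u
```

When p < n and u reaches p, the last column coincides with the OLS coefficients, which is a quick sanity check on the path; with p > n, stopping early is what keeps the fit well defined.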

References

R. D. Cook, B. Li and F. Chiaromonte. Dimension reduction in regression without matrix inversion. Biometrika 2007; 94: 569-584.

Y. Im, H. Gang and JK. Yoo. High-throughput data dimension reduction via seeded canonical correlation analysis, J. Chemometrics 2015; 29: 193-199.

R. A. Johnson and D. W. Wichern. Applied Multivariate Statistical Analysis. Pearson Prentice Hall: New Jersey, USA; 6th edition. 2007; 539-574.

K. Lee and JK. Yoo. Canonical correlation analysis through linear modeling, Aust. N. Z. J. Stat. 2014; 56: 59-72.

Examples

######  data(cookie) ######
library(seedCCA)
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
dim(X);dim(Y)

## standard CCA
fit.cca <-seedCCA(X[,1:4], Y, type="cca")  ## standard canonical correlation analysis is done.
plot(fit.cca)

## ordinary least squares
fit.ols1 <-seedCCA(X[,1:4], Y[,1], type="cca")  ## ordinary least squares is done, because r=1.
fit.ols2 <-seedCCA(Y[,1], X[,1:4], type="cca")  ## ordinary least squares is done, because p=1.

## seeded CCA with case 1
fit.seed1 <- seedCCA(X, Y, type="seed1") ## suggested proper value of u is equal to 3.
fit.seed1.ux <- seedCCA(X, Y, ux=6, type="seed1") ## iterative projections done 6 times.
fit.seed1.uy <- seedCCA(Y, X, uy=6, type="seed1", AS=FALSE)  ## AS=FALSE: projections continue until uy=6 without automatic stopping.
plot(fit.seed1)

## partial least squares
fit.pls1 <- seedCCA(X, Y[,1], type="pls")
fit.pls.m <- seedCCA(X, Y, type="pls") ## multi-dimensional response
par(mfrow=c(1,2))
plot(fit.pls1); plot(fit.pls.m)


########  data(nutrimouse) ########
data(nutrimouse)
X<-as.matrix(nutrimouse$gene)
Y<-as.matrix(nutrimouse$lipid)
dim(X);dim(Y)

## seeded CCA with case 2
fit.seed2 <- seedCCA(X, Y, type="seed2")  ## d not specified, so cut=0.9 (default) used.
fit.seed2.99 <- seedCCA(X, Y, type="seed2", cut=0.99)  ## cut=0.99 used.
fit.seed2.d3 <- seedCCA(X, Y, type="seed2", d=3)  ## d is specified with 3.

## ux and uy specified, so proper values not suggested.
fit.seed2.uxuy <- seedCCA(X, Y, type="seed2", ux=10, uy=10)
plot(fit.seed2)

[Package seedCCA version 3.1 Index]