seedCCA {seedCCA}    R Documentation
Seeded Canonical Correlation Analysis
Description
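The function seedCCA is mainly for implementing seeded canonical correlation analysis, proposed by Im et al. (2015). Depending on the value of type, which must be one of c("cca", "seed1", "seed2", "pls"), the function conducts one of four methods: standard canonical correlation analysis (or ordinary least squares), seeded CCA with one set initially reduced, seeded CCA with both sets initially reduced, or partial least squares.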
Usage
seedCCA(X,Y,type="seed2",ux=NULL,uy=NULL,u=10,eps=0.01,cut=0.9,d=NULL,AS=TRUE,scale=FALSE)
Arguments
X: numeric vector or matrix (n * p), the first set of variables.
Y: numeric vector or matrix (n * r), the second set of variables.
type: character, the method to use; one of c("cca", "seed1", "seed2", "pls"). The default is "seed2".
ux: numeric, the maximum number of projections for X. The default is NULL. If this is not NULL, it overrides the option u.
uy: numeric, the maximum number of projections for Y. The default is NULL. If this is not NULL, it overrides the option u.
u: numeric, the maximum number of projections. The default is 10. It applies to any set whose own limit (ux or uy) is not supplied, and to type="pls".
eps: numeric, the criterion for terminating the iterative projections. The default is 0.01. If the increment of the projections is less than eps, the projections stop.
cut: numeric, between 0 and 1. The default is 0.9. If d is NULL, cut is used to select the number of largest eigenvectors of cov(X, Y).
d: numeric, the user-selected number of largest eigenvectors of cov(X, Y) and cov(Y, X). The default is NULL. This only works for type="seed2".
AS: logical, whether the iterative projections stop automatically. The default is TRUE.
scale: logical, whether to scale the predictors to have zero mean and unit standard deviation. The default is FALSE.
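When d is NULL, cut decides how many eigenvectors of cov(X, Y) are kept, but the exact rule is not stated here. A minimal sketch, assuming a cumulative-proportion rule: keep the smallest d whose leading eigenvalues of cov(X, Y) %*% cov(Y, X) reach a fraction cut of their total. Both the rule and the helper choose_d are hypothetical illustrations, not part of seedCCA.

```r
# Hypothetical helper, not part of seedCCA: picks d from cut under an ASSUMED
# cumulative-proportion rule on the eigenvalues of cov(X, Y) %*% cov(Y, X).
choose_d <- function(X, Y, cut = 0.9) {
  Sxy <- cov(X, Y)                        # p x r cross-covariance matrix
  # Squared singular values of cov(X, Y) are the eigenvalues of cov(X, Y) %*% cov(Y, X)
  ev <- svd(Sxy, nu = 0, nv = 0)$d^2
  prop <- cumsum(ev) / sum(ev)            # cumulative proportion of the eigenvalues
  which(prop >= cut)[1]                   # smallest d reaching the cut-off
}

# Usage on simulated data (n = 60, p = 8, r = 5) with a rank-2 link between X and Y
set.seed(2)
X <- matrix(rnorm(60 * 8), 60, 8)
Y <- X[, 1:2] %*% matrix(rnorm(2 * 5), 2, 5) + 0.1 * matrix(rnorm(60 * 5), 60, 5)
d90 <- choose_d(X, Y, cut = 0.9)
d99 <- choose_d(X, Y, cut = 0.99)
```

A larger cut can only keep the same or more eigenvectors, which matches the idea that cut=0.99 is a less aggressive reduction than the default 0.9.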
Details
Let p and r be the numbers of variables in the two sets and n the sample size. The option type="cca" works only when max(p,r) < n; seedCCA then conducts standard canonical correlation analysis (Johnson and Wichern, 2007). If type="cca" is given and either p or r equals one, ordinary least squares (OLS) is done instead of canonical correlation analysis. If max(p,r) >= n, either type="seed1" or type="seed2" has to be chosen; handling this case is the main purpose of seedCCA.
If type="seed1", only the set with more variables (say X, with p variables) is initially reduced by the iterative projection approach (Cook et al. 2007), and the other set (say Y, with r variables) is kept as is. Canonical correlation analysis of the initially reduced X and the original Y then completes the analysis. If type="seed2", both X and Y are initially reduced, and canonical correlation analysis of the two initially reduced sets completes the analysis.
If type="pls", partial least squares (PLS) is done. In this case the first set of variables in seedCCA is treated as the predictors and the second set as the response, so the order of the two sets matters. The response can be multivariate.
Depending on the value of type, the subclass of the object returned by seedCCA differs:
type="cca": subclass "finalCCA" (p > 1, r > 1, max(p,r) < n)
type="cca": subclass "seedols" (p = 1 or r = 1)
type="seed1" and type="seed2": subclass "finalCCA" (max(p,r) >= n)
type="pls": subclass "seedpls" (p > n and r < n)
Consequently, plot(object) produces a different figure depending on the subclass of the object.
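Falling back to OLS when p or r equals one is natural: with a single variable in one set, the only canonical correlation equals the multiple correlation from regressing that variable on the other set, and the canonical coefficients are proportional to the OLS slopes. The sketch below checks this on simulated data with base R's stats::cancor and lm; it is an illustration of the textbook identity, not of how seedCCA computes anything internally.

```r
# Base R illustration (stats::cancor and lm), not seedCCA internals:
# with a single variable in one set, CCA collapses to OLS.
set.seed(3)
n <- 100
X <- matrix(rnorm(n * 4), n, 4)
y <- drop(X %*% c(1, -0.5, 0.3, 2)) + rnorm(n)

cc  <- cancor(X, matrix(y))             # CCA; both sets are centered by default
ols <- lm(y ~ X)                        # OLS with an intercept

# The single canonical correlation equals the multiple correlation sqrt(R^2)
r_multiple <- sqrt(summary(ols)$r.squared)
cc$cor[1] - r_multiple                  # approximately 0, up to rounding

# The canonical coefficients for X are proportional to the OLS slopes
ratio <- cc$xcoef[, 1] / coef(ols)[-1]
ratio                                   # (nearly) the same number repeated four times
```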
The returned values, listed under Value in the following order, depend on type:
type="cca": standard CCA (max(p,r) < n, min(p,r) > 1) / "finalCCA" subclass
type="cca": ordinary least squares (max(p,r) < n, min(p,r) = 1) / "seedols" subclass
type="seed1": seeded CCA, case 1, with X reduced (max(p,r) >= n and p > r) / "finalCCA" subclass
type="seed1": seeded CCA, case 1, with Y reduced (max(p,r) >= n and p <= r) / "finalCCA" subclass
type="seed2": seeded CCA, case 2 (max(p,r) >= n) / "finalCCA" subclass
type="pls": partial least squares (p > n and r < n) / "seedpls" subclass
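A rough sketch of the type="seed1" idea for the case p > r. It assumes Cook et al. (2007)-style projections that start from the seed cov(X, Y) and repeatedly multiply by cov(X), with the projections stopping once the implied coefficient matrix R (R' cov(X) R)^(-1) R' cov(X, Y) changes by less than eps in Frobenius norm. The helper seed1_sketch, this stopping measure, and the QR-based basis are assumptions for illustration; seedCCA's own implementation may differ.

```r
# Hypothetical sketch of a type="seed1"-style reduction (case p > r); NOT the seedCCA code.
# Assumptions: seed = cov(X, Y); each projection adds cov(X) %*% (newest block);
# stop when the implied coefficient matrix moves by less than eps (Frobenius norm).
seed1_sketch <- function(X, Y, u_max = 10, eps = 0.01) {
  Sx  <- cov(X)                          # p x p; singular when p >= n, never inverted
  Sxy <- cov(X, Y)                       # p x r seed matrix
  r   <- ncol(Sxy)
  # Coefficients implied by the span of R: R (R' Sx R)^{-1} R' Sxy
  beta_of <- function(R) R %*% solve(crossprod(R, Sx %*% R), crossprod(R, Sxy))

  R <- qr.Q(qr(Sxy))                     # orthonormal basis of span(Sxy): u = 1
  beta_old <- beta_of(R)
  u <- 1
  while (u < u_max) {
    last <- R[, (ncol(R) - r + 1):ncol(R), drop = FALSE]  # newest block of the basis
    R_next <- qr.Q(qr(cbind(R, Sx %*% last)))            # enlarged span, u + 1 blocks
    beta_new <- beta_of(R_next)
    if (sqrt(sum((beta_new - beta_old)^2)) < eps) break  # small increment: keep u
    R <- R_next
    beta_old <- beta_new
    u <- u + 1
  }
  newX <- X %*% R                        # initially reduced X, n x (u * r)
  list(u = u, newX = newX, cca = cancor(newX, Y))
}

# Usage on simulated high-dimensional data: n = 50 < p = 100, r = 3
set.seed(1)
X <- matrix(rnorm(50 * 100), 50, 100)
Y <- X[, 1:3] + matrix(rnorm(50 * 3, sd = 0.5), 50, 3)
fit <- seed1_sketch(X, Y)
fit$u                                    # number of projection blocks kept
fit$cca$cor                              # canonical correlations of reduced X and Y
```

The QR step only changes the basis, not the span, so the coefficient matrix is unaffected while the computation stays numerically stable. The p x p matrix cov(X) is never inverted, only the small (u*r) x (u*r) matrix R' cov(X) R, which is what keeps the reduction usable when p >= n.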
Value
Values with type="cca" when min(p,r) > 1 (standard CCA):
cor: canonical correlations
xcoef: the estimated canonical coefficients for X
ycoef: the estimated canonical coefficients for Y
Xscores: the estimated canonical variates for X
Yscores: the estimated canonical variates for Y
Values with type="cca" when p = 1 or r = 1 (ordinary least squares):
coef: the estimated ordinary least squares coefficients
X: X, the first set
Y: Y, the second set
Values with type="seed1" when p > r (X initially reduced):
cor: canonical correlations
xcoef: the estimated canonical coefficients for X
ycoef: the estimated canonical coefficients for Y
proper.u: a suggested proper number of projections for X
initialMX0: the initialized canonical coefficient matrices of X
newX: initially reduced X
Y: the original Y
Xscores: the estimated canonical variates for X
Yscores: the estimated canonical variates for Y
Values with type="seed1" when p <= r (Y initially reduced):
cor: canonical correlations
xcoef: the estimated canonical coefficients for X
ycoef: the estimated canonical coefficients for Y
proper.u: a suggested proper number of projections for Y
X: the original X
initialMY0: the initialized canonical coefficient matrices of Y
newY: initially reduced Y
Xscores: the estimated canonical variates for X
Yscores: the estimated canonical variates for Y
Values with type="seed2":
cor: canonical correlations
xcoef: the estimated canonical coefficients for X
ycoef: the estimated canonical coefficients for Y
proper.ux: a suggested proper number of projections for X
proper.uy: a suggested proper number of projections for Y
d: the suggested number of eigenvectors of cov(X,Y)
initialMX0: the initialized canonical coefficient matrices of X
initialMY0: the initialized canonical coefficient matrices of Y
newX: initially reduced X
newY: initially reduced Y
Xscores: the estimated canonical variates for X
Yscores: the estimated canonical variates for Y
Values with type="pls":
coef: the estimated coefficients for each iterative projection up to u
u: the maximum number of projections
X: the predictors
Y: the response
scale: whether the predictors were scaled
cases: the number of observations
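The relation between the coefficient entries (xcoef, ycoef) and the variate entries (Xscores, Yscores) can be illustrated with base R's stats::cancor, assuming the usual definition in which canonical variates are the centered data multiplied by the canonical coefficients. The check shows that paired variates have correlations equal to cor and that variates within one set are uncorrelated. It uses cancor rather than seedCCA, so seedCCA's scaling of the coefficients may differ.

```r
# Base R illustration with stats::cancor (not seedCCA): canonical variates are the
# centered data times the canonical coefficients (ASSUMED to match Xscores/Yscores).
set.seed(4)
n <- 80
X <- matrix(rnorm(n * 4), n, 4)
Y <- cbind(X[, 1] + rnorm(n), X[, 2] - X[, 3] + rnorm(n), rnorm(n))

cc <- cancor(X, Y)
k  <- length(cc$cor)                              # number of canonical pairs
Xscores <- scale(X, center = cc$xcenter, scale = FALSE) %*% cc$xcoef[, 1:k]
Yscores <- scale(Y, center = cc$ycenter, scale = FALSE) %*% cc$ycoef[, 1:k]

# Paired variates reproduce the canonical correlations ...
diag(cor(Xscores, Yscores)) - cc$cor              # approximately 0, up to rounding
# ... and variates within one set are mutually uncorrelated
cor(Xscores)                                      # approximately the identity matrix
```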
References
R. D. Cook, B. Li and F. Chiaromonte. Dimension reduction in regression without matrix inversion. Biometrika 2007; 94: 569-584.
Y. Im, H. Gang and JK. Yoo. High-throughput data dimension reduction via seeded canonical correlation analysis. J. Chemometrics 2015; 29: 193-199.
R. A. Johnson and D. W. Wichern. Applied Multivariate Statistical Analysis. Pearson Prentice Hall: New Jersey, USA; 6th edition. 2007; 539-574.
K. Lee and JK. Yoo. Canonical correlation analysis through linear modeling. Aust. N. Z. J. Stat. 2014; 56: 59-72.
Examples
library(seedCCA)  ## provides seedCCA()
###### data(cookie) ######
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
dim(X);dim(Y)
## standard CCA
fit.cca <-seedCCA(X[,1:4], Y, type="cca") ## standard canonical correlation analysis is done.
plot(fit.cca)
## ordinary least squares
fit.ols1 <-seedCCA(X[,1:4], Y[,1], type="cca") ## ordinary least squares is done, because r=1.
fit.ols2 <-seedCCA(Y[,1], X[,1:4], type="cca") ## ordinary least squares is done, because p=1.
## seeded CCA with case 1
fit.seed1 <- seedCCA(X, Y, type="seed1") ## suggested proper value of u is equal to 3.
fit.seed1.ux <- seedCCA(X, Y, ux=6, type="seed1") ## iterative projections done 6 times.
fit.seed1.uy <- seedCCA(Y, X, uy=6, type="seed1", AS=FALSE) ## automatic stop off, so projections continue up to uy=6.
plot(fit.seed1)
## partial least squares
fit.pls1 <- seedCCA(X, Y[,1], type="pls")
fit.pls.m <- seedCCA(X, Y, type="pls") ## multi-dimensional response
par(mfrow=c(1,2))
plot(fit.pls1); plot(fit.pls.m)
######## data(nutrimouse) ########
data(nutrimouse)
X<-as.matrix(nutrimouse$gene)
Y<-as.matrix(nutrimouse$lipid)
dim(X);dim(Y)
## seeded CCA with case 2
fit.seed2 <- seedCCA(X, Y, type="seed2") ## d not specified, so cut=0.9 (default) used.
fit.seed2.99 <- seedCCA(X, Y, type="seed2", cut=0.99) ## cut=0.99 used.
fit.seed2.d3 <- seedCCA(X, Y, type="seed2", d=3) ## d is specified with 3.
## ux and uy specified, so proper values not suggested.
fit.seed2.uxuy <- seedCCA(X, Y, type="seed2", ux=10, uy=10)
plot(fit.seed2)