cancor {candisc}  R Documentation 
The function cancor generalizes and regularizes computation for canonical correlation analysis in a way conducive to visualization using methods in the heplots package.
cancor(x, ...)

## S3 method for class 'formula'
cancor(formula, data, subset, weights, na.rm = TRUE, method = "gensvd", ...)

## Default S3 method:
cancor(x, y, weights,
       X.names = colnames(x), Y.names = colnames(y),
       row.names = rownames(x),
       xcenter = TRUE, ycenter = TRUE, xscale = FALSE, yscale = FALSE,
       ndim = min(p, q),
       set.names = c("X", "Y"),
       prefix = c("Xcan", "Ycan"),
       na.rm = TRUE, use = if (na.rm) "complete" else "pairwise",
       method = "gensvd",
       ...
)

## S3 method for class 'cancor'
print(x, digits = max(getOption("digits") - 2, 3), ...)

## S3 method for class 'cancor'
summary(object, digits = max(getOption("digits") - 2, 3), ...)

## S3 method for class 'cancor'
coef(object, type = c("x", "y", "both", "list"), standardize = FALSE, ...)

scores(x, ...)

## S3 method for class 'cancor'
scores(x, type = c("x", "y", "both", "list", "data.frame"), ...)
formula 
A two-sided formula of the form 
data 
The data.frame within which the formula is evaluated 
subset 
an optional vector specifying a subset of observations to be used in the calculations. 
weights 
Observation weights. If supplied, this must be a vector of length equal to the number of
observations in X and Y, typically within [0,1]. In that case, the variance-covariance
matrices are computed using 
na.rm 
logical, determining whether observations with missing values are excluded in the computation of the variance matrix of (X,Y). See Notes for details on missing data. 
method 
the method to be used for calculation; currently only 
x 
Varies depending on method. For the 
y 
For the 
X.names, Y.names 
Character vectors of names for the X and Y variables. 
row.names 
Observation names in 
xcenter, ycenter 
logical. Center the X, Y variables? [not yet implemented] 
xscale, yscale 
logical. Scale the X, Y variables to unit variance? [not yet implemented] 
ndim 
Number of canonical dimensions to retain in the result, for scores, coefficients, etc. 
set.names 
A vector of two character strings, giving names for the collections of the X, Y variables. 
prefix 
A vector of two character strings, giving prefixes used to name the X and Y canonical variables, respectively. 
use 
argument passed to 
object 
A 
digits 
Number of digits passed to 
... 
Other arguments, passed to methods 
type 
For the 
standardize 
For the 
Canonical correlation analysis (CCA), as traditionally presented, is used to identify and measure the associations between two sets of quantitative variables, X and Y. It is often used in the same situations for which a multivariate multiple regression analysis (MMRA) would be used. However, CCA is “symmetric” in that the sets X and Y have equivalent status, and the goal is to find orthogonal linear combinations of each having maximal (canonical) correlations. On the other hand, MMRA is “asymmetric”, in that the Y set is considered as responses, each one to be explained by separate linear combinations of the Xs.
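The symmetric/asymmetric contrast can be checked numerically. The following is a minimal sketch, not part of the package examples: it assumes base R only, uses stats::cancor rather than the method documented on this page, and takes two arbitrary pairs of columns from the built-in iris data as stand-ins for the X and Y sets.

```r
# Illustrative variable sets (an assumption for this sketch, not from the package)
X <- as.matrix(iris[, 1:2])   # sepal measurements
Y <- as.matrix(iris[, 3:4])   # petal measurements

# Symmetric: swapping the roles of X and Y leaves the canonical correlations unchanged
r_xy <- stats::cancor(X, Y)$cor
r_yx <- stats::cancor(Y, X)$cor
all.equal(r_xy, r_yx)

# Asymmetric: the two multivariate regressions give different coefficient matrices
B_yx <- coef(lm(Y ~ X))   # Y set treated as responses
B_xy <- coef(lm(X ~ Y))   # X set treated as responses
B_yx
B_xy
```

The canonical correlations do not depend on which set is labelled X, whereas the regression coefficients change with the direction of prediction.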
This implementation of cancor
provides the basic computations for CCA, together with
some extractor functions and methods for working with the results in a
convenient fashion.
However, for visualization using HE plots, it is most natural to consider plots representing
the relations among the canonical variables for the Y variables in terms of a
multivariate linear model predicting the Y canonical scores, using either the X variables
or the X canonical scores as predictors. Such plots, using heplot.cancor,
provide a low-rank (1D, 2D, 3D) visualization of the relations between the two sets,
and so are useful in cases when there are more than 2 or 3 variables in each of X and Y.
The connection between CCA and HE plots for MMRA models can be developed as follows. CCA can also be viewed as a principal component transformation of the predicted values of one set of variables from a regression on the other set of variables, in the metric of the error covariance matrix.
For example, regress the Y variables on the X variables,
giving predicted values \hat{Y} = X (X'X)^{-1} X' Y
and residuals R = Y - \hat{Y}.
The error covariance matrix is E = R'R/(n-1).
Choose a transformation Q
that orthogonalizes the error covariance matrix to an identity, that is,
(RQ)'(RQ) = Q' R' R Q = (n-1) I,
and apply the same transformation to the predicted values to yield, say, Z = \hat{Y} Q.
Then, a principal component analysis on the covariance matrix of Z gives eigenvalues of
E^{-1} H, where H is the covariance matrix of the predicted values \hat{Y}, and so is equivalent to the MMRA analysis of lm(Y ~ X)
statistically,
but visualized here in canonical space.
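The steps above can be traced numerically. This is a hedged sketch under stated assumptions rather than the package's own computation: it uses base R only (stats::cancor, lm), the iris columns are an arbitrary stand-in for X and Y, and Q is taken as the inverse Cholesky factor of E, which is one of many valid choices.

```r
# Illustrative variable sets (an assumption for this sketch)
X <- as.matrix(iris[, 1:2])
Y <- as.matrix(iris[, 3:4])
n <- nrow(Y)

fit  <- lm(Y ~ X)        # regress the Y set on the X set
Yhat <- fitted(fit)      # predicted values
R    <- residuals(fit)   # residuals, R = Y - Yhat

E <- crossprod(R) / (n - 1)   # error covariance matrix
H <- cov(Yhat)                # covariance matrix of the predicted values

# One possible Q with Q'EQ = I: the inverse of the upper Cholesky factor of E
Q <- solve(chol(E))
Z <- Yhat %*% Q

# PCA of Z: eigenvalues of its covariance matrix
lambda <- eigen(cov(Z), symmetric = TRUE)$values

# The same values arise as eigenvalues of E^{-1} H
lambda_EH <- sort(Re(eigen(solve(E) %*% H)$values), decreasing = TRUE)
all.equal(lambda, lambda_EH)

# Each eigenvalue relates to a squared canonical correlation by rho^2 = lambda / (1 + lambda)
rho2 <- stats::cancor(X, Y)$cor^2
all.equal(lambda / (1 + lambda), rho2)
```

The same relation links the CanRSQ and Eigen columns of the printed output in the examples, for instance 0.44934 / (1 - 0.44934) is approximately 0.816.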
An object of class cancor, a list with the following components:
cancor 
Canonical correlations, i.e., the correlations between each canonical variate for the Y variables with the corresponding canonical variate for the X variables. 
names 
Names for various items, a list of 4 components: 
ndim 
Number of canonical dimensions extracted, 
dim 
Problem dimensions, a list of 3 components: 
coef 
Canonical coefficients, a list of 2 components: 
scores 
Canonical variate scores, a list of 2 components:

X 
The matrix X 
Y 
The matrix Y 
weights 
Observation weights, if supplied, else 
structure 
Structure correlations ("loadings"), a list of 4 components:
The formula method also returns components 
Not all features of CCA are presently implemented: standardized vs. raw scores, more flexible handling of missing data, other plot methods, ...
Michael Friendly
Gittins, R. (1985). Canonical Analysis: A Review with Applications in Ecology, Berlin: Springer.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. London: Academic Press.
Other implementations of CCA: cancor (very basic), cca in the yacca package (fairly complete, but very messy return structure), cc in the CCA package (fairly complete, very messy return structure, no longer maintained).
redundancy, for redundancy analysis; plot.cancor, for enhanced scatterplots of the canonical variates.
heplot.cancor for CCA HE plots and heplots for generic heplot methods.
candisc for related methods focused on multivariate linear models with one or more factors among the X variables.
data(Rohwer, package="heplots")
X <- as.matrix(Rohwer[,6:10])  # the PA tests
Y <- as.matrix(Rohwer[,3:5])   # the aptitude/ability variables

# visualize the correlation matrix using corrplot()
if (require(corrplot)) {
  M <- cor(cbind(X,Y))
  corrplot(M, method="ellipse", order="hclust", addrect=2, addCoef.col="black")
}

(cc <- cancor(X, Y, set.names=c("PA", "Ability")))

## Canonical correlation analysis of:
##   5   PA  variables:  n, s, ns, na, ss
##   with        3   Ability  variables:  SAT, PPVT, Raven
##
##     CanR  CanRSQ   Eigen percent    cum                          scree
## 1 0.6703 0.44934 0.81599   77.30  77.30 ******************************
## 2 0.3837 0.14719 0.17260   16.35  93.65 ******
## 3 0.2506 0.06282 0.06704    6.35 100.00 **
##
## Test of H0: The canonical correlations in the
## current row and all that follow are zero
##
##      CanR  WilksL      F df1   df2  p.value
## 1 0.67033 0.44011 3.8961  15 168.8 0.000006
## 2 0.38366 0.79923 1.8379   8 124.0 0.076076
## 3 0.25065 0.93718 1.4078   3  63.0 0.248814

# formula method
cc <- cancor(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer,
             set.names=c("PA", "Ability"))

# using observation weights
set.seed(12345)
wts <- sample(0:1, size=nrow(Rohwer), replace=TRUE, prob=c(.05, .95))
(ccw <- cancor(X, Y, set.names=c("PA", "Ability"), weights=wts) )

# show correlations of the canonical scores
zapsmall(cor(scores(cc, type="x"), scores(cc, type="y")))

# standardized coefficients
coef(cc, type="both", standardize=TRUE)

plot(cc, smooth=TRUE)

##################
data(schooldata)
##################

# fit the MMreg model
school.mod <- lm(cbind(reading, mathematics, selfesteem) ~
                 education + occupation + visit + counseling + teacher,
                 data=schooldata)
Anova(school.mod)
pairs(school.mod)

# canonical correlation analysis
school.cc <- cancor(cbind(reading, mathematics, selfesteem) ~
                    education + occupation + visit + counseling + teacher,
                    data=schooldata)
school.cc
heplot(school.cc, xpd=TRUE, scale=0.3)