ccaGrid {ccaPP}    R Documentation
(Robust) CCA via alternating series of grid searches
Description
Perform canonical correlation analysis via projection pursuit based on alternating series of grid searches in two-dimensional subspaces of each data set, with a focus on robust and nonparametric methods.
Usage
ccaGrid(x, y, k = 1, method = c("spearman", "kendall", "quadrant", "M",
"pearson"), control = list(...), nIterations = 10, nAlternate = 10,
nGrid = 25, select = NULL, tol = 1e-06, standardize = TRUE,
fallback = FALSE, seed = NULL, ...)
CCAgrid(x, y, k = 1, method = c("spearman", "kendall", "quadrant", "M",
"pearson"), maxiter = 10, maxalter = 10, splitcircle = 25,
select = NULL, zero.tol = 1e-06, standardize = TRUE,
fallback = FALSE, seed = NULL, ...)
Arguments
x, y
each can be a numeric vector, matrix or data frame.
k
an integer giving the number of canonical variables to compute.
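method
a character string specifying the correlation functional to maximize. Possible values are "spearman" for the Spearman correlation, "kendall" for the Kendall correlation, "quadrant" for the quadrant correlation, "M" for a correlation based on an M-estimator, or "pearson" for the classical Pearson correlation (see corFunctions).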
control
a list of additional arguments to be passed to the specified correlation functional. If supplied, this takes precedence over additional arguments supplied via the ... argument.
nIterations, maxiter
an integer giving the maximum number of iterations.
nAlternate, maxalter
an integer giving the maximum number of alternating series of grid searches in each iteration.
nGrid, splitcircle
an integer giving the number of equally spaced grid points on the unit circle to use in each grid search.
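select
optional; either an integer vector of length two or a list containing two index vectors. In the first case, the first integer gives the number of variables of x to be randomly selected for determining the order of the variables of y in the corresponding series of grid searches, and vice versa for the second integer. In the second case, the first list element gives the indices of the variables of x to be used for determining the order of the variables of y, and vice versa for the second list element (see Details).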
tol, zero.tol
a small positive numeric value to be used for determining convergence.
standardize
a logical indicating whether the data should be (robustly) standardized.
fallback
a logical indicating whether a fallback mode for robust standardization should be used. If a correlation functional other than the Pearson correlation is maximized, the first attempt for standardizing the data is via median and MAD. In the fallback mode, variables whose MADs are zero (e.g., dummy variables) are standardized via mean and standard deviation. Note that if the Pearson correlation is maximized, standardization is always done via mean and standard deviation.
seed
optional initial seed for the random number generator.
...
additional arguments to be passed to the specified correlation functional. Currently, this is only relevant for the M-estimator. For Spearman, Kendall and quadrant correlation, consistency at the normal model is always forced.
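The standardization rule described for standardize and fallback can be illustrated with a minimal base R sketch. The helper robStandardize() and its arguments are hypothetical and not part of ccaPP; the sketch only assumes what the argument descriptions state: median and MAD by default, mean and standard deviation for the Pearson correlation, and mean and standard deviation for zero-MAD variables in fallback mode.

```r
## Hypothetical helper, not the ccaPP implementation: standardize the columns
## of x following the rules described for 'standardize' and 'fallback'.
robStandardize <- function(x, method = "spearman", fallback = FALSE) {
  x <- as.matrix(x)
  if (method == "pearson") {
    # Pearson correlation: always mean and standard deviation
    ctr <- colMeans(x)
    scl <- apply(x, 2, sd)
  } else {
    # other correlation functionals: median and MAD first
    ctr <- apply(x, 2, median)
    scl <- apply(x, 2, mad)
    zero <- scl == 0
    if (fallback && any(zero)) {
      # fallback mode: zero-MAD variables (e.g., dummies) use mean and sd
      ctr[zero] <- colMeans(x[, zero, drop = FALSE])
      scl[zero] <- apply(x[, zero, drop = FALSE], 2, sd)
    }
  }
  # subtract the centers column-wise, then divide by the scales
  z <- sweep(sweep(x, 2, ctr), 2, scl, "/")
  list(z = z, center = ctr, scale = scl)
}

## Usage: the second column behaves like a dummy variable with zero MAD
x <- cbind(c(1, 2, 3, 4, 100), c(0, 0, 0, 0, 1))
robStandardize(x)$z                   # second column contains NaN and Inf
robStandardize(x, fallback = TRUE)$z  # second column uses mean and sd
```

Without the fallback, a zero MAD leads to division by zero, which is exactly the situation the fallback mode is meant to handle.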
Details
The algorithm is based on alternating series of grid searches in two-dimensional subspaces of each data set. In each grid search, nGrid grid points on the unit circle in the corresponding plane are obtained, and the directions from the center to each of the grid points are examined. In the first iteration, equispaced grid points in the interval $[-\pi/2, \pi/2)$ are used. In each subsequent iteration, the angles are halved such that the interval $[-\pi/4, \pi/4)$ is used in the second iteration and so on. If only one data set is multivariate, the algorithm simplifies to iterative grid searches in two-dimensional subspaces of the corresponding data set.
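A single grid search can be sketched in base R as follows. The helpers gridAngles() and gridSearchStep() are hypothetical; in particular, taking the plane to be spanned by the current direction a and the j-th coordinate axis, and maximizing the absolute correlation with a fixed score z of the other data set, are assumptions made for illustration. Since the sketch uses the absolute correlation, a direction and its negative are equivalent, so half a circle of angles suffices.

```r
## Hypothetical helpers, not the ccaPP implementation.
## Angles examined in a given iteration: nGrid equispaced points in
## [-pi/2, pi/2) in the first iteration, with the interval halved each time.
gridAngles <- function(nGrid, iteration) {
  width <- pi / 2^(iteration - 1)
  -width / 2 + width * (seq_len(nGrid) - 1) / nGrid
}

## One grid search in the plane spanned by the current unit direction 'a'
## and the j-th coordinate axis, maximizing |cor(x %*% a, z)| for fixed z.
gridSearchStep <- function(x, z, a, j, nGrid = 25, iteration = 1,
                           method = "spearman") {
  objective <- function(dir) abs(cor(drop(x %*% dir), z, method = method))
  e <- numeric(length(a))
  e[j] <- 1
  best <- list(a = a, value = objective(a))
  for (theta in gridAngles(nGrid, iteration)) {
    cand <- cos(theta) * a + sin(theta) * e
    len <- sqrt(sum(cand^2))
    if (len < 1e-12) next  # zero vector, only possible if a = +/- e_j
    cand <- cand / len
    value <- objective(cand)
    if (value > best$value) best <- list(a = cand, value = value)
  }
  best
}

## Usage: one step starting from the first coordinate axis
set.seed(1)
x <- matrix(rnorm(300), ncol = 3)
z <- x[, 2] + rnorm(100, sd = 0.1)
step <- gridSearchStep(x, z, a = c(1, 0, 0), j = 2)
step$value  # larger than the starting value |cor(x[, 1], z)|
```

In the full algorithm, such steps would be repeated over the variables of each data set, alternating between x and y, with the angle interval shrinking from one iteration to the next.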
In the basic algorithm, the order of the variables in a series of grid searches for each of the data sets is determined by the average absolute correlations with the variables of the respective other data set. Since this requires computing the full $p \times q$ matrix of absolute correlations, where $p$ denotes the number of variables of x and $q$ the number of variables of y, a faster modification is available as well. In this modification, the average absolute correlations are computed over only a subset of the variables of the respective other data set. It is thereby possible to use randomly selected subsets of variables, or to specify the subsets of variables directly.
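The ordering rule, and its cheaper variant based on a subset of the other data set's variables, can be sketched as follows. The helper orderByAvgAbsCor() and its subset argument are hypothetical illustrations of the idea, not the package's code; a decreasing order (strongest average correlation first) is assumed.

```r
## Hypothetical helper, not the ccaPP implementation: order the variables of x
## by their average absolute correlation with the variables of y, optionally
## using only the columns of y given in 'subset'.
orderByAvgAbsCor <- function(x, y, subset = NULL, method = "spearman") {
  x <- as.matrix(x)
  y <- as.matrix(y)
  if (!is.null(subset)) y <- y[, subset, drop = FALSE]
  r <- abs(cor(x, y, method = method))  # p x q (or p x length(subset))
  order(rowMeans(r), decreasing = TRUE)
}

set.seed(1)
y <- matrix(rnorm(200), ncol = 2)
x <- cbind(rnorm(100), rnorm(100), y[, 1] + y[, 2])
orderByAvgAbsCor(x, y)                         # full p x q matrix
orderByAvgAbsCor(x, y, subset = sample(2, 1))  # one randomly selected column
orderByAvgAbsCor(x, y, subset = 2)             # user-specified column
```

The subset variant only needs a $p \times m$ correlation matrix with $m < q$, which is where the speed-up comes from.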
Note that the data sets themselves are also ordered according to the maximum average absolute correlation with the respective other data set to ensure symmetry of the algorithm.
For higher-order canonical correlations, the data are first transformed into suitable subspaces. Then the alternating grid algorithm is applied to the reduced data, and the results are back-transformed to the original space.
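The transform, solve and back-transform pattern can be sketched in base R. This is an assumption for illustration only: the sketch restricts the next canonical vector to the Euclidean orthogonal complement of the previously found vectors, whereas ccaPP may use a different notion of "suitable subspace" (for example one that makes the canonical variables uncorrelated). The helper complementBasis() is hypothetical, and the reduced-space result aReduced is a placeholder rather than output of the grid algorithm.

```r
## Hypothetical sketch, not necessarily the transformation used by ccaPP:
## restrict the search for the next canonical vector of x to the orthogonal
## complement of the previously found canonical vectors (columns of A).
complementBasis <- function(A) {
  A <- as.matrix(A)
  qrA <- qr(A)
  Q <- qr.Q(qrA, complete = TRUE)        # p x p orthogonal matrix
  Q[, -seq_len(qrA$rank), drop = FALSE]  # columns spanning the complement
}

set.seed(1)
x <- matrix(rnorm(300), ncol = 3)
a1 <- c(1, 1, 0) / sqrt(2)   # stands in for a first canonical vector
Q <- complementBasis(a1)     # 3 x 2 orthonormal basis of the complement
xReduced <- x %*% Q          # reduced data (100 x 2)
## ... the grid algorithm would be run on xReduced here; placeholder result:
aReduced <- c(0.6, 0.8)
a2 <- drop(Q %*% aReduced)   # back-transform to the original space
```

Because the columns of Q are orthonormal, a unit vector in the reduced space maps back to a unit vector in the original space.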
Value
An object of class "cca" with the following components:
cor
a numeric vector giving the canonical correlation measures.
A
a numeric matrix in which the columns contain the canonical vectors for x.
B
a numeric matrix in which the columns contain the canonical vectors for y.
centerX
a numeric vector giving the center estimates used in standardization of x.
centerY
a numeric vector giving the center estimates used in standardization of y.
scaleX
a numeric vector giving the scale estimates used in standardization of x.
scaleY
a numeric vector giving the scale estimates used in standardization of y.
call
the matched function call.
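The components above can be combined to compute canonical variables. The sketch below assumes, without confirmation from this page, that canonical variables are obtained by standardizing the data with centerX and scaleX (centerY and scaleY) and multiplying by A (B). The helper ccaScores() is hypothetical, and the usage runs on a mock object carrying only the documented components so that it does not depend on a fitted model; with a real fit from ccaGrid(x, y), the same call could be applied to the original data.

```r
## Hypothetical helper, not part of ccaPP: compute canonical variables from
## the components of a "cca" object, assuming standardized data times A (B).
ccaScores <- function(fit, x, y) {
  u <- scale(as.matrix(x), center = fit$centerX, scale = fit$scaleX) %*% fit$A
  v <- scale(as.matrix(y), center = fit$centerY, scale = fit$scaleY) %*% fit$B
  list(u = u, v = v)
}

## Usage with a mock object that has the documented components
fit <- list(A = matrix(c(1, 0), ncol = 1), B = matrix(c(0, 1), ncol = 1),
            centerX = c(0, 0), scaleX = c(1, 1),
            centerY = c(1, 1), scaleY = c(2, 2))
x <- matrix(1:6, ncol = 2)
y <- matrix(c(3, 5, 7, 1, 1, 1), ncol = 2)
ccaScores(fit, x, y)
```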
Note
CCAgrid is a simple wrapper function for ccaGrid for better compatibility with package pcaPP concerning function and argument names.
Author(s)
Andreas Alfons
See Also
ccaProj, maxCorGrid, corFunctions
Examples
library("ccaPP")
data("diabetes")
x <- diabetes$x
y <- diabetes$y
## Spearman correlation
ccaGrid(x, y, method = "spearman")
## Pearson correlation
ccaGrid(x, y, method = "pearson")