fastCor {HiClimR} | R Documentation |
Fast correlation for large matrices
Description
fastCor is a helper function that computes the Pearson correlation matrix for
the HiClimR and validClimR functions. It is similar to the cor function in R
but uses a faster implementation on 64-bit machines (an optimized BLAS library
is highly recommended). fastCor also uses a memory-efficient algorithm that
allows splitting the data matrix and computing only the upper-triangular part
of the correlation matrix. It can be used to compute the correlation matrix
for the columns of any data matrix.
Usage
fastCor(xt, nSplit = 1, upperTri = FALSE, optBLAS = FALSE, verbose = TRUE)
Arguments
xt |
an ( |
nSplit |
integer number greater than or equal to one, to split the data matrix into
|
upperTri |
logical to compute only the upper-triangular half of the correlation
matrix if |
optBLAS |
logical to use optimized BLAS library if installed and |
verbose |
logical to print processing information if |
Details
The fastCor function computes the correlation matrix by calling the cross
product function in the Basic Linear Algebra Subprograms (BLAS) library used
by R. A significant performance improvement can be achieved when building R on
64-bit machines with an optimized BLAS library, such as ATLAS, OpenBLAS, or
the commercial Intel MKL.
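The cross-product approach described above can be sketched in base R. This is an illustration only, not the HiClimR source: the function name cor_crossprod is hypothetical, and the sketch assumes a numeric matrix with no missing values and no constant columns.

```r
# Sketch: Pearson correlation of the columns of a matrix via one cross product.
# cor_crossprod is a hypothetical name, not part of HiClimR.
cor_crossprod <- function(xt) {
  xt <- as.matrix(xt)
  # Centre each column on its mean
  xc <- sweep(xt, 2, colMeans(xt), "-")
  # Scale each centred column to unit Euclidean norm
  xc <- sweep(xc, 2, sqrt(colSums(xc^2)), "/")
  # crossprod(xc) is t(xc) %*% xc, evaluated by the BLAS library R is linked to
  crossprod(xc)
}

set.seed(1)
xt <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)
r <- cor_crossprod(xt)
# Compare with the built-in cor function
all.equal(r, cor(xt), check.attributes = FALSE)
```

Because the heavy lifting is a single matrix product, any speed-up from an optimized BLAS library applies directly to this step.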
For big data, the memory required to allocate the square matrix of correlations
may exceed the total amount of physical memory available, resulting in
“Error: cannot allocate vector of size...”. fastCor allows for splitting the
data matrix into nSplit splits and computes only the upper-triangular part of
the correlation matrix when upperTri = TRUE.
This almost halves memory use, which can be very important for big data.
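A rough back-of-the-envelope estimate shows why the saving matters. The helper cor_mem_gib below is hypothetical, and it assumes 8-byte double values and an upper triangle stored without the diagonal as N(N-1)/2 values; the actual storage layout used by fastCor may differ.

```r
# Rough memory estimate for the correlations of N columns (8-byte doubles).
# cor_mem_gib is a hypothetical helper for illustration only.
cor_mem_gib <- function(n, upper_tri = FALSE) {
  # Number of stored values: full square matrix, or strict upper triangle
  n_values <- if (upper_tri) n * (n - 1) / 2 else n^2
  # Convert bytes to GiB
  n_values * 8 / 2^30
}

n <- 50000
full <- cor_mem_gib(n)
upper <- cor_mem_gib(n, upper_tri = TRUE)
cat(sprintf("full: %.2f GiB, upper triangle: %.2f GiB\n", full, upper))
```

Under these assumptions, 50,000 columns need roughly 18.6 GiB for the full matrix and about 9.3 GiB for the upper triangle alone.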
If nSplit > 1, the correlation matrix (or the upper-triangular part if
upperTri = TRUE) will be allocated and filled with the correlation sub-matrix
computed for each split. The first nSplit - 1 splits have equal size, while
the last split may include any remaining columns.
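The splitting scheme can be sketched as follows. This is a minimal base R illustration under stated assumptions, not the HiClimR implementation: cor_by_splits is a hypothetical name, the result is kept as a full N by N matrix whose blocks below the block diagonal stay NA, and nSplit is assumed not to exceed the number of columns.

```r
# Sketch of split-wise correlation; cor_by_splits is a hypothetical name and
# this is not the HiClimR source code.
cor_by_splits <- function(xt, nSplit = 1) {
  xt <- as.matrix(xt)
  n <- ncol(xt)
  # Standardise columns so that a cross product yields correlations
  xc <- sweep(xt, 2, colMeans(xt), "-")
  xc <- sweep(xc, 2, sqrt(colSums(xc^2)), "/")
  # First nSplit - 1 splits have equal size; the last takes the remainder
  len <- floor(n / nSplit)
  starts <- (seq_len(nSplit) - 1) * len + 1
  ends <- c(seq_len(nSplit - 1) * len, n)
  idx <- Map(seq, starts, ends)
  # Allocate the result once, then fill it block by block
  r <- matrix(NA_real_, n, n)
  for (i in seq_len(nSplit)) {
    for (j in i:nSplit) {
      # Only blocks on or above the block diagonal are computed
      r[idx[[i]], idx[[j]]] <- crossprod(xc[, idx[[i]], drop = FALSE],
                                         xc[, idx[[j]], drop = FALSE])
    }
  }
  r
}

set.seed(42)
xt <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
r <- cor_by_splits(xt, nSplit = 3)
ut <- upper.tri(r, diag = TRUE)
# The upper triangle should agree with the built-in cor function
all.equal(r[ut], cor(xt)[ut])
```

Each cross product only touches two column blocks at a time, so the temporary memory per step shrinks as nSplit grows, at the cost of more (smaller) BLAS calls.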
Value
An (N rows by N columns) correlation matrix.
Author(s)
Hamada S. Badr <badr@jhu.edu>, Benjamin F. Zaitchik <zaitchik@jhu.edu>, and Amin K. Dezfuli <amin.dezfuli@nasa.gov>.
References
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 8(4), 949-958, doi: 10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=HiClimR.
See Also
HiClimR, HiClimR2nc, validClimR, geogMask, coarseR, fastCor, grid2D and minSigCor.
Examples
require(HiClimR)
## Load test case data
x <- TestCase$x
## Use fastCor function to compute the correlation matrix
t0 <- proc.time() ; xcor <- fastCor(t(x)) ; proc.time() - t0
## compare with cor function
t0 <- proc.time() ; xcor0 <- cor(t(x)) ; proc.time() - t0
## Not run:
## Split the data into 10 splits and return upper-triangular half only
xcor10 <- fastCor(t(x), nSplit = 10, upperTri = TRUE)
## End(Not run)