big_randomSVD {bigstatsr}  R Documentation 
An algorithm for partial SVD (or PCA) of a Filebacked Big Matrix based on the
algorithm in RSpectra (by Yixuan Qiu and Jiali Mei).
This algorithm is linear in time in all dimensions and is very
memoryefficient. Thus, it can be used on very large big.matrices.
big_randomSVD( X, fun.scaling = big_scale(center = FALSE, scale = FALSE), ind.row = rows_along(X), ind.col = cols_along(X), k = 10, tol = 1e04, verbose = FALSE, ncores = 1, fun.prod = big_prodVec, fun.cprod = big_cprodVec )
X 
An object of class FBM. 
fun.scaling 
A function that returns a named list of
\frac{X_{i,j}  mean_j}{sd_j}. Default doesn't use any scaling. 
ind.row 
An optional vector of the row indices that are used. If not specified, all rows are used. Don't use negative indices. 
ind.col 
An optional vector of the column indices that are used. If not specified, all columns are used. Don't use negative indices. 
k 
Number of singular vectors/values to compute. Default is 
tol 
Precision parameter of svds. Default is 
verbose 
Should some progress be printed? Default is 
ncores 
Number of cores used. Default doesn't use parallelism. You may use nb_cores. 
fun.prod 
Function that takes 6 arguments (in this order):

fun.cprod 
Same as 
A named list (an S3 class "big_SVD") of
d
, the singular values,
u
, the left singular vectors,
v
, the right singular vectors,
niter
, the number of the iteration of the algorithm,
nops
, number of MatrixVector multiplications used,
center
, the centering vector,
scale
, the scaling vector.
Note that to obtain the Principal Components, you must use predict on the result. See examples.
The idea of using this Implicitly Restarted Arnoldi Method algorithm
comes from G. Abraham, Y. Qiu, and M. Inouye,
FlashPCA2: principal component analysis of biobankscale genotype datasets,
bioRxiv: doi: 10.1101/094714.
It proved to be faster than our implementation of the "blanczos" algorithm
in Rokhlin, V., Szlam, A., & Tygert, M. (2010).
A Randomized Algorithm for Principal Component Analysis.
SIAM Journal on Matrix Analysis and Applications, 31(3), 11001124.
doi: 10.1137/080736417.
set.seed(1) X < big_attachExtdata() K < 10 # Using only half of the data for "training" n < nrow(X) ind < sort(sample(n, n/2)) test < big_randomSVD(X, fun.scaling = big_scale(), ind.row = ind, k = K) str(test) pca < prcomp(X[ind, ], center = TRUE, scale. = TRUE) # same scaling all.equal(test$center, pca$center) all.equal(test$scale, pca$scale) # use this function to predict scores class(test) scores < predict(test) # scores and loadings are the same or opposite plot(scores, pca$x[, 1:K]) plot(test$v, pca$rotation[, 1:K]) plot(test$u) plot(test, type = "scores") # projecting on new data ind2 < setdiff(rows_along(X), ind) scores.test2 < predict(test, X, ind.row = ind2) scores.test3 < predict(pca, X[ind, ]) plot(scores.test2, scores.test3[, 1:K])