EsaBcv {esaBcv} R Documentation

Estimate Latent Factor Matrix

Description

Estimate the best number of factors using Bi-Cross-Validation (BCV) with Early-Stopping-Alternation (ESA), and then estimate the factor matrix.

Usage

EsaBcv(Y, X = NULL, r.limit = 20, niter = 3, nRepeat = 12, only.r = F,
  svd.method = "fast", center = F)

Arguments

Y

the observed data matrix of dimension c(n, p), where n is the sample size and p is the number of variables.

X

the known predictors of size c(n, k) if any. Default is NULL (no known predictors). k is the number of known covariates.

r.limit

the maximum number of factors to try. Default is 20. Can be set to Inf.

niter

the number of iterations for ESA. Default is 3.

nRepeat

the number of repeats of BCV. In other words, the random partition of Y is repeated nRepeat times. Default is 12.

only.r

logical, whether to estimate and return only the number of factors. Default is FALSE.

svd.method

one of "fast", "propack" or "standard". "fast" uses the fast.svd function in package corpcor to compute the SVD, "propack" uses propack.svd, and "standard" uses the svd function in the base package. Because of PROPACK issues, "propack" fails for some matrices; when that happens, the function falls back to "fast" to compute the SVD of that matrix. Default is "fast".

center

logical, whether to add an intercept term to the model. Default is FALSE.

Details

The model is

Y = 1 \mu' + X \beta + n^{1/2}U D V' + E \Sigma^{1/2}

where D and \Sigma are diagonal matrices, U and V are orthogonal, and \mu' and V' denote the transposes of \mu and V respectively. The entries of E are assumed to be i.i.d. standard Gaussian. The model allows heteroscedastic noise and works especially well for high-dimensional data. The method is based on Owen and Wang (2015).

When a non-NULL X is given or centering is requested (which is essentially adding a known covariate of all 1s), identifiability requires that <X, U> = 0 or <1, U> = 0 respectively. The method first rotates the data matrix to remove the known predictors or centers, and then uses the last n - k (or n - k - 1 if centering is requested) samples to estimate the latent factors. The rotation idea first appears in Sun et al. (2012).
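The rotation step can be illustrated in base R. The following is only a sketch of the general idea under simplifying assumptions (no latent factors, unit noise variance, made-up dimensions); it is not the package's internal code. A full QR decomposition of X gives an orthogonal Q whose last n - k columns are orthogonal to the columns of X, so the last n - k rows of Q'Y no longer depend on X \beta.

```r
set.seed(1)
n <- 8; p <- 5; k <- 2                      # illustrative sizes (assumed)
X     <- matrix(rnorm(n * k), n, k)         # known predictors
beta  <- matrix(rnorm(k * p), k, p)
noise <- matrix(rnorm(n * p), n, p)
Y     <- X %*% beta + noise

# Full QR of X: Q is n x n orthogonal, its first k columns span col(X)
Q <- qr.Q(qr(X), complete = TRUE)

# Rotate the data and keep the last n - k rows
Y.rot  <- crossprod(Q, Y)                   # t(Q) %*% Y
Y.free <- Y.rot[(k + 1):n, , drop = FALSE]

# The kept rows equal the rotated noise alone: X %*% beta has been removed
noise.rot <- crossprod(Q, noise)[(k + 1):n, , drop = FALSE]
max.diff  <- max(abs(Y.free - noise.rot))
```

When centering is requested, the same sketch applies with cbind(1, X) in place of X, which leaves n - k - 1 rows.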

Value

EsaBcv returns an object of class "esabcv". The function plot plots the cross-validation results and marks the estimated number of factors. An object of class "esabcv" is a list containing the following components:

best.r

the estimated best number of factors

estSigma

the diagonal entries of the estimated \Sigma, a vector of length p

estU

the estimated U. Dimension is c(n, r)

estD

the estimated diagonal entries of D, a vector of length r

estV

the estimated V. Dimension is c(p, r)

beta

the estimated \beta, a matrix of size c(k, p). NULL if the argument X is NULL.

estS

the estimated signal (factor) matrix S, where

S = 1 \mu' + X \beta + n^{1/2}U D V'

mu

the sample centers of each variable, a vector of length p. It is an estimate of \mu. NULL if the argument center is FALSE.

max.r

the actual maximum number of factors used. For details of how this is decided, see Owen and Wang (2015).

result.list

a matrix of dimension c(nRepeat, max.r + 1) storing the detailed BCV entrywise MSE of each repeat for r from 0 to max.r
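As a sketch of how result.list might be summarised, the code below averages the entrywise MSE over repeats and picks the rank with the smallest mean. The matrix here is a hypothetical stand-in filled with random numbers, not output from EsaBcv, and treating the column-mean minimiser as the selected rank is an assumption about how best.r relates to result.list.

```r
set.seed(2)
nRepeat <- 12; max.r <- 4                   # illustrative values (assumed)

# Hypothetical stand-in for fit$result.list:
# rows are BCV repeats, columns correspond to r = 0, 1, ..., max.r
result.list <- matrix(runif(nRepeat * (max.r + 1)), nrow = nRepeat)

# Average BCV error for each candidate rank
mean.mse <- colMeans(result.list)
names(mean.mse) <- 0:max.r

# Rank with the smallest average error (column j corresponds to r = j - 1)
r.hat <- as.integer(names(which.min(mean.mse)))
```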

References

Art B. Owen and Jingshu Wang (2015), Bi-cross-validation for factor analysis, http://arxiv.org/abs/1503.03515

Yunting Sun, Nancy R. Zhang and Art B. Owen (2012), Multiple hypothesis testing adjusted for latent variables, with an application to the AGEMAP gene expression data. The Annals of Applied Statistics, 6(4): 1664-1688

See Also

ESA, plot.esabcv

Examples

Y <- matrix(rnorm(100), nrow = 10)
EsaBcv(Y)
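A slightly richer test input can be simulated from the model in Details using base R only. This is a sketch: the sizes, the true rank r, the diagonal of D and the range of the noise variances are all arbitrary choices made for illustration, and no intercept or known predictors are included.

```r
set.seed(3)
n <- 50; p <- 30; r <- 2                    # illustrative sizes and true rank (assumed)

# Orthonormal columns for U (n x r) and V (p x r) via QR of Gaussian matrices
U <- qr.Q(qr(matrix(rnorm(n * r), n, r)))
V <- qr.Q(qr(matrix(rnorm(p * r), p, r)))
D <- diag(c(3, 2))                          # factor strengths (assumed)

# Heteroscedastic noise: one variance per variable (diagonal of Sigma)
sigma <- runif(p, 0.5, 1.5)
E     <- matrix(rnorm(n * p), n, p)

# Signal S = n^{1/2} U D V' and data Y.sim = S + E Sigma^{1/2}
S     <- sqrt(n) * U %*% D %*% t(V)
Y.sim <- S + E %*% diag(sqrt(sigma))
```

With the package loaded, EsaBcv(Y.sim, r.limit = 5) can then be run and its best.r compared with the true rank r used in the simulation.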

[Package esaBcv version 1.2.1.1 Index]