dimLRT {FMradio}    R Documentation

Assess the latent dimensionality using a likelihood ratio test

Description

dimLRT performs likelihood ratio tests (LRTs) on the factor model for a range of factor dimensions. It can be used to choose the number of latent factors.

Usage

dimLRT(R, X, maxdim, rankDOF = TRUE, graph = TRUE, 
       alpha = .05, Bartlett = FALSE, verbose = TRUE)

Arguments

R

(Regularized) correlation matrix.

X

A (possibly centered and scaled and possibly subsetted) data matrix.

maxdim

A numeric (integer-valued) or integer scalar indicating the maximum factor dimension to be assessed.

rankDOF

A logical indicating if the degrees of freedom should be based on the rank of the raw correlation matrix.

graph

A logical indicating if the results should be visualized.

alpha

A numeric scalar representing the alpha level. Only used when graph = TRUE.

Bartlett

A logical indicating if the Bartlett correction should be applied.

verbose

A logical indicating if the function should print output to the console.
Runs silently when verbose = FALSE.

Details

The most formal approach to factor-analytic dimensionality assessment is likelihood ratio (LR) testing. The basic idea is to test the m-factor model against the saturated model. Given the sample correlation matrix and the maximum-likelihood parameter estimates under the m-factor model, the LR criterion reduces to (n - 1) times the maximum-likelihood discrepancy function evaluated at those estimates. Under certain regularity conditions this quantity is approximately \chi^{2}-distributed (Amemiya & Anderson, 1990). The general strategy is then to sequentially test solutions of increasing dimensionality m = 1, \ldots, \mbox{maxdim} until the null hypothesis (stating that the m-factor model holds) is not rejected at Type-I error level alpha.
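To make the LR criterion concrete, the following is a minimal base-R sketch, not the FMradio implementation. It assumes the standard maximum-likelihood discrepancy F = log|\Sigma| - log|R| + tr(R \Sigma^{-1}) - p, where \Sigma = LL' + \Psi is built from a p x m loading matrix L and a vector of uniquenesses psi. The helper names ml_discrepancy and lr_statistic are hypothetical, and L and psi are assumed to come from an ML fit (for example stats::factanal).

```r
# Sketch only (not the FMradio implementation): ML discrepancy function and
# LR statistic for an m-factor model with loadings L (p x m) and
# uniquenesses psi (length p), evaluated against a correlation matrix R.
ml_discrepancy <- function(R, L, psi) {
  p <- nrow(R)
  # Model-implied matrix Sigma = L L' + Psi
  Sigma <- tcrossprod(L) + diag(psi, nrow = p)
  # Log-determinant computed stably via determinant()
  logdet <- function(A) as.numeric(determinant(A, logarithm = TRUE)$modulus)
  # F = log|Sigma| - log|R| + tr(R Sigma^{-1}) - p
  logdet(Sigma) - logdet(R) + sum(diag(R %*% solve(Sigma))) - p
}

lr_statistic <- function(R, L, psi, n) {
  # LR criterion: (n - 1) times the discrepancy at the ML estimates
  (n - 1) * ml_discrepancy(R, L, psi)
}

## Toy check: a one-factor model that reproduces R exactly
L <- matrix(c(0.8, 0.7, 0.6), ncol = 1)
psi <- 1 - rowSums(L^2)           # uniquenesses giving a unit diagonal
R <- tcrossprod(L) + diag(psi)
ml_discrepancy(R, L, psi)
```

When the model reproduces R exactly the discrepancy is zero (up to rounding); any misfit makes it strictly positive, which is what drives the test statistic.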

The degrees of freedom for the LRT under the m-factor model equal the number of parameters in the saturated model (i.e., the unstructured sample correlation matrix) minus the number of freely estimable parameters in the m-factor model. Note that the general strategy above relies on asymptotic results. In our setting, however, the observation dimension (n) is usually small relative to the feature dimension (p). Hence, the standard test will, in a sense, overestimate the degrees of freedom. One simple way to deal with this is to adapt the degrees of freedom to account for the rank deficiency of the raw (unregularized) correlation matrix. This road is taken when rankDOF = TRUE. Bartlett (1950) proposed a small-sample correction factor that makes the test statistic behave more like a \chi^{2} variate. This correction factor is used when Bartlett = TRUE.
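The classical (full-rank) degrees of freedom and Bartlett's multiplier can be sketched as follows. This assumes the textbook formulas df = ((p - m)^2 - (p + m))/2 and n - 1 - (2p + 5)/6 - 2m/3 (the latter is also the form used by stats::factanal). Whether dimLRT uses exactly these forms is an assumption, and its rankDOF adjustment is not reproduced here. The function names lrt_df, bartlett_multiplier and lrt_pvalue are hypothetical.

```r
# Sketch: classical degrees of freedom and Bartlett's small-sample multiplier.
# Assumed textbook formulas; FMradio's rankDOF adjustment is NOT reproduced.
lrt_df <- function(p, m) {
  # p(p - 1)/2 free correlations minus pm - m(m - 1)/2 free loadings
  # (uniquenesses are fixed by the unit diagonal); simplifies to:
  ((p - m)^2 - (p + m)) / 2
}

bartlett_multiplier <- function(n, p, m) {
  # Replaces (n - 1) in the LR statistic (assumed form for Bartlett = TRUE)
  n - 1 - (2 * p + 5) / 6 - 2 * m / 3
}

lrt_pvalue <- function(discrepancy, n, p, m, bartlett = FALSE) {
  mult <- if (bartlett) bartlett_multiplier(n, p, m) else n - 1
  stat <- mult * discrepancy
  # Upper-tail chi-square probability
  pchisq(stat, df = lrt_df(p, m), lower.tail = FALSE)
}
```

For the setting used in the Examples (p = 50, n = 500), lrt_df(50, 5) gives 985 degrees of freedom, and bartlett_multiplier(500, 50, 5) is about 478.17 instead of 499, so the corrected statistic is smaller and its p-value larger.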

When graph = TRUE, the LRT results are visualized: the LRT p-values are plotted against the consecutive dimensions of the factor solution, and a horizontal line is drawn at the value provided in the alpha argument.
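A base-R sketch of this kind of plot, combined with the sequential stopping rule from the Details above, might look as follows. plot_lrt_pvalues is a hypothetical helper that takes a vector of p-values for m = 1, 2, ...; it is not the plotting code of dimLRT.

```r
# Hypothetical helper (not part of FMradio): plot LRT p-values by dimension
# and return the smallest dimension whose null hypothesis is not rejected.
plot_lrt_pvalues <- function(pvals, alpha = 0.05) {
  dims <- seq_along(pvals)
  plot(dims, pvals, type = "b", ylim = c(0, 1),
       xlab = "Number of factors", ylab = "LRT p-value")
  abline(h = alpha, lty = 2)  # reference line at the alpha level
  accepted <- which(pvals > alpha)
  # NA when every tested dimension is rejected
  invisible(if (length(accepted) > 0) accepted[1] else NA_integer_)
}
```

Returning the selected dimension invisibly keeps the call usable purely for its plot while still exposing the sequential decision.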

In general, the LRT is not recommended for inference unless the number of observations is much larger than the number of features. In Peeters et al. (2019) the LRT was assessed in a comparative setting involving high-dimensional factor models.

Value

The function returns an object of class data.frame. The first column represents the assessed dimensions running from 1 to maxdim. The second column represents the observed values of the LRT statistic. The third column represents the corresponding p-values.

Author(s)

Carel F.W. Peeters <cf.peeters@vumc.nl>, Caroline Ubelhor

References

Amemiya, Y., & Anderson, T.W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. The Annals of Statistics, 18:1453–1463.

Bartlett, M.S. (1950). Tests of significance in factor analysis. British Journal of Psychology (Statistics Section), 3:77–85.

Ledermann, W. (1937). On the rank of the reduced correlational matrix in multiple factor analysis. Psychometrika, 2:85–93.

Peeters, C.F.W. et al. (2019). Stable prediction with radiomics data. arXiv:1903.11696 [stat.ML].

See Also

dimGB, FAsim

Examples

## Load the package
library(FMradio)

## Simulate some data according to the factor model
## simDAT$cormatrix holds the correlation matrix of the generated data
simDAT <- FAsim(p = 50, m = 5, n = 500)
simDAT$cormatrix

## Calculate the LRT for models of factor dimension 1 to 20
LRT <- dimLRT(simDAT$cormatrix, simDAT$data, maxdim = 20, rankDOF = FALSE)
print(LRT)

[Package FMradio version 1.1.1 Index]