
dimLRT {FMradio}

R Documentation

Assess the latent dimensionality using a likelihood ratio test

Description

dimLRT is a function that evaluates a likelihood ratio test on the factor model. It can be used to choose the number of latent factors.

Usage

dimLRT(R, X, maxdim, rankDOF = TRUE, graph = TRUE, 
       alpha = .05, Bartlett = FALSE, verbose = TRUE)

Arguments

`R`	(Regularized) correlation `matrix`.
`X`	A (possibly centered and scaled and possibly subsetted) data `matrix`.
`maxdim`	A `numeric` integer or `integer` indicating the maximum factor dimension to be assessed.
`rankDOF`	A `logical` indicating if the degrees of freedom should be based on the rank of the raw correlation matrix.
`graph`	A `logical` indicating if the results should be visualized.
`alpha`	A `numeric` scalar representing the alpha level. Only used when `graph = TRUE`.
`Bartlett`	A `logical` indicating if the Bartlett correction should be applied.
`verbose`	A `logical` indicating if the function should print messages. Runs silently when `verbose = FALSE`.

Details

The most formal approach to factor-analytic dimensionality assessment is likelihood ratio (LR) testing. The basic idea is to test the m-factor model against the saturated model. When the LR criterion is evaluated at the standard sample correlation matrix and the corresponding maximum-likelihood parameter estimates under m factors, it reduces to (n - 1) times a certain discrepancy function evaluated at the maximum-likelihood parameters of the m-factor model. Under certain regularity conditions, this quantity is approximately $\chi^{2}$-distributed (Amemiya & Anderson, 1990). The general strategy is then to sequentially test solutions of increasing dimensionality $m = 1, \ldots, \mathrm{maxdim}$ until the null hypothesis (stating that the m-factor model holds) is not rejected at Type-I error level alpha.
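To make the statistic concrete, here is a minimal base-R sketch of the uncorrected test. It is not FMradio's implementation. It assumes n > p, so that the correlation matrix has full rank, fits the m-factor model by maximum likelihood with `stats::factanal`, and takes the discrepancy to be the usual ML discrepancy $F = \log|\hat{\Sigma}| - \log|R| + \mathrm{tr}(R\hat{\Sigma}^{-1}) - p$ with $\hat{\Sigma} = \hat{\Lambda}\hat{\Lambda}^{\top} + \hat{\Psi}$. The helper name `lrt_sketch` is hypothetical.

```r
## Hypothetical helper (not part of FMradio): uncorrected LR test of an
## m-factor model against the saturated model, using stats::factanal.
lrt_sketch <- function(R, n, m) {
  p   <- ncol(R)
  fit <- stats::factanal(covmat = R, factors = m, n.obs = n,
                         rotation = "none")
  L     <- unclass(fit$loadings)                 # p x m ML loadings
  Sigma <- L %*% t(L) + diag(fit$uniquenesses)   # fitted m-factor matrix
  ## ML discrepancy between R and the fitted m-factor matrix
  disc <- log(det(Sigma)) - log(det(R)) +
          sum(diag(R %*% solve(Sigma))) - p
  stat <- (n - 1) * disc
  df   <- ((p - m)^2 - (p + m)) / 2              # standard (full-rank) DOF
  c(dim = m, stat = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

## Usage: simulated two-factor data with n > p, testing m = 1, 2, 3
set.seed(1)
n <- 200; p <- 8
Lambda <- matrix(runif(2 * p, 0.4, 0.8), p, 2)
X <- matrix(rnorm(n * 2), n, 2) %*% t(Lambda) +
     matrix(rnorm(n * p, sd = 0.6), n, p)
res <- t(sapply(1:3, function(m) lrt_sketch(cor(X), n, m)))
res
```

Because the sketch works on a full-rank correlation matrix, it does not cover the p > n setting that motivates the rankDOF option.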

The degrees of freedom for the LRT under the m-factor model equal the number of parameters in the saturated model (i.e., the unstructured sample correlation matrix) minus the number of freely estimable parameters in the m-factor model. Note that the general strategy above relies on asymptotic results. In our setting, however, the number of observations (n) is usually small relative to the number of features (p). Hence, the standard test will, in a sense, overestimate the degrees of freedom. One simple way to deal with this is to adapt the degrees of freedom to account for the rank deficiency of the raw correlation matrix. This is done when rankDOF = TRUE. Bartlett (1950) proposed a small-sample correction factor that makes the test statistic behave more like a $\chi^{2}$ variate. This correction is applied when Bartlett = TRUE.
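The standard counting rule and Bartlett's correction can be written down directly. The sketch below assumes the usual full-rank degrees of freedom and the Bartlett multiplier in the form used by base R's `stats::factanal`. It is illustrative only and does not reproduce how FMradio adjusts the degrees of freedom when `rankDOF = TRUE`. The names `lr_df` and `bartlett_mult` are hypothetical.

```r
## Hypothetical helpers (not FMradio's code).
## Standard DOF: saturated-model parameters minus free m-factor parameters,
## p(p + 1)/2 - (p*m + p - m(m - 1)/2) = ((p - m)^2 - (p + m)) / 2
lr_df <- function(p, m) ((p - m)^2 - (p + m)) / 2

## Bartlett (1950) small-sample multiplier that replaces (n - 1);
## this is the form used by stats::factanal
bartlett_mult <- function(n, p, m) n - 1 - (2 * p + 5) / 6 - 2 * m / 3

## Setting of the Examples section: p = 50, n = 500, m = 5
lr_df(50, 5)               # 985
bartlett_mult(500, 50, 5)  # 478.1667, versus n - 1 = 499
```

The corrected statistic is then `bartlett_mult(n, p, m)` times the discrepancy instead of `(n - 1)` times it, so the correction shrinks the statistic slightly when p is not small relative to n.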

When graph = TRUE the LRT results are visualized. The graph plots the LRT p-values against the consecutive dimensions of the factor solution. A horizontal line is plotted at the value provided in the alpha argument.
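A comparable plot can be drawn with base R graphics from the returned `data.frame`. This is a hypothetical re-implementation, not FMradio's plotting code. It assumes only the column layout described under Value (column 1 holds the dimension, column 3 the p-value), and its default `alpha = 0.05` mirrors the default of dimLRT.

```r
## Hypothetical re-implementation of the diagnostic plot (not FMradio's code).
## Assumes `lrt` is laid out as described under Value:
## column 1 = dimension, column 3 = p-value.
plot_lrt <- function(lrt, alpha = 0.05) {
  plot(lrt[, 1], lrt[, 3], type = "b", pch = 19, ylim = c(0, 1),
       xlab = "Factor dimension", ylab = "LRT p-value")
  abline(h = alpha, lty = 2, col = "red")  # horizontal reference line at alpha
  invisible(lrt)
}
```

For example, `plot_lrt(dimLRT(simDAT$cormatrix, simDAT$data, maxdim = 20, rankDOF = FALSE, graph = FALSE))` would redraw the results with the built-in graph switched off.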

In general, the LRT is not recommended for inference unless the number of observations is much larger than the number of features. In Peeters et al. (2019), the LRT was assessed in a comparative setting involving high-dimensional factor models.

Value

The function returns an object of class data.frame. The first column represents the assessed dimensions running from 1 to maxdim. The second column represents the observed values of the LRT statistic. The third column represents the corresponding p-values.
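The sequential rule from Details can be applied to this output in a few lines. The helper `select_dim` below is hypothetical and not part of FMradio. It relies only on column positions (column 1 for the dimension, column 3 for the p-value), since the column names are not documented here, and it treats the m-factor null as not rejected when its p-value exceeds alpha.

```r
## Hypothetical helper (not part of FMradio): smallest dimension whose
## m-factor null hypothesis is not rejected at level alpha.
select_dim <- function(lrt, alpha = 0.05) {
  ok <- which(lrt[, 3] > alpha)     # dimensions with p-value above alpha
  if (length(ok) == 0) return(NA)   # null rejected for every dimension tested
  lrt[ok[1], 1]                     # first (smallest) such dimension
}
```

With the object from the Examples section, `select_dim(LRT)` would return the chosen dimension, or `NA` if every model up to maxdim is rejected, which suggests increasing maxdim (within the bounds in the Note).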

Note

Note that, for argument X, the observations are expected to be in the rows and the features are expected to be in the columns.
The argument maxdim cannot exceed the Ledermann bound (Ledermann, 1937): $\lfloor [2p + 1 - (8p + 1)^{1/2}]/2 \rfloor$, where p denotes the number of observed features. Usually, one wants to set maxdim much lower than this bound.
Note that, if p > n, the maximum rank of the raw correlation matrix is n - 1. In this case there is an alternative Ledermann bound when rankDOF = TRUE: the number of information points in the correlation matrix is then $n(n - 1)/2$, and this number must exceed $p \times \mathrm{maxdim} + p - \mathrm{maxdim}(\mathrm{maxdim} - 1)/2$, which places further restrictions on maxdim.
Other functions for factor-analytic dimensionality assessment are dimGB and dimIC. In high-dimensional situations, using dimGB on the regularized correlation matrix is recommended.
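Both bounds on maxdim in this Note can be computed directly. The sketch below is hypothetical (the names `ledermann` and `rank_bound` are not FMradio functions). It reads "must exceed" in the rank-based condition as a strict inequality and also caps the result at the ordinary Ledermann bound; both are assumptions about how the two restrictions combine.

```r
## Hypothetical helpers (not part of FMradio).
## Ledermann (1937) bound: floor([2p + 1 - sqrt(8p + 1)] / 2)
ledermann <- function(p) floor((2 * p + 1 - sqrt(8 * p + 1)) / 2)

## Rank-based restriction (p > n, rankDOF = TRUE): largest m such that
## n(n - 1)/2 > p*m + p - m(m - 1)/2, capped at the Ledermann bound.
## Returns 0 if even m = 1 fails.
rank_bound <- function(n, p) {
  info <- n * (n - 1) / 2
  m <- 0
  while (m < ledermann(p) &&
         info > p * (m + 1) + p - (m + 1) * m / 2) {
    m <- m + 1
  }
  m
}

ledermann(50)       # 40: the p = 50 setting of the Examples
rank_bound(20, 50)  # 2: with n = 20 only m = 1 and m = 2 satisfy the condition
```

For p = 50 and n = 20 there are 190 information points, while m = 3 would require more than 197, so the rank-based restriction is far tighter than the ordinary Ledermann bound of 40.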

Author(s)

Carel F.W. Peeters <cf.peeters@vumc.nl>, Caroline Ubelhor

References

Amemiya, Y., & Anderson, T.W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. The Annals of Statistics, 18:1453–1463.

Bartlett, M.S. (1950). Tests of significance in factor analysis. British Journal of Psychology (Statistics Section), 3:77–85.

Ledermann, W. (1937). On the rank of the reduced correlational matrix in multiple factor analysis. Psychometrika, 2:85–93.

Peeters, C.F.W. et al. (2019). Stable prediction with radiomics data. arXiv:1903.11696 [stat.ML].

Examples

## Load the package
library(FMradio)

## Simulate some data according to the factor model
## $cormatrix gives the correlation matrix of the generated data
simDAT <- FAsim(p = 50, m = 5, n = 500)
simDAT$cormatrix

## Calculate the LRT for models of factor dimension 1 to 20
LRT <- dimLRT(simDAT$cormatrix, simDAT$data, maxdim = 20, rankDOF = FALSE)
print(LRT)

[Package FMradio version 1.1.1 Index]