dimLRT {FMradio} | R Documentation |
Assess the latent dimensionality using a likelihood ratio test
Description
dimLRT is a function that evaluates a likelihood ratio test on the factor model.
It can be used to choose the number of latent factors.
Usage
dimLRT(R, X, maxdim, rankDOF = TRUE, graph = TRUE,
alpha = .05, Bartlett = FALSE, verbose = TRUE)
Arguments
R |
(Regularized) correlation |
X |
A (possibly centered and scaled and possibly subsetted) data |
maxdim |
A |
rankDOF |
A |
graph |
A |
alpha |
A |
Bartlett |
A |
verbose |
A |
Details
The most formal approach to factor analytic dimensionality assessment is through likelihood ratio (LR) testing.
The basic idea is to test the m-factor model against the saturated model.
The corresponding LR criterion then converges, under the standard correlation matrix and corresponding parameter estimates under m factors, to (n - 1) times a certain discrepancy function evaluated at the maximum-likelihood parameters under the m-factor model.
This quantity is approximately \chi^{2}-distributed under certain regularity conditions (Amemiya & Anderson, 1990).
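As a minimal sketch of the statistic described above, the following assumes the usual maximum-likelihood discrepancy, F = log|Sigma| + tr(R Sigma^{-1}) - log|R| - p, where Sigma is the correlation matrix implied by the fitted m-factor model. The text does not name the discrepancy function, so this choice and the helper names ml_discrepancy and lrt_stat are illustrative assumptions and not the internals of dimLRT.

```r
## Hypothetical helpers (not part of FMradio): ML discrepancy and LRT statistic.
## Assumes both R and Sigma are positive definite (e.g., R is regularized).
ml_discrepancy <- function(R, Sigma) {
  p <- nrow(R)
  logdet_Sigma <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  logdet_R     <- as.numeric(determinant(R, logarithm = TRUE)$modulus)
  ## tr(R Sigma^{-1}) equals tr(Sigma^{-1} R); solve() avoids an explicit inverse
  logdet_Sigma + sum(diag(solve(Sigma, R))) - logdet_R - p
}

lrt_stat <- function(R, Sigma, n) (n - 1) * ml_discrepancy(R, Sigma)

## Toy 2 x 2 correlation matrix, used for illustration only
R_toy <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)

## A model that reproduces R exactly gives a statistic of (numerically) zero
stat_perfect <- lrt_stat(R_toy, R_toy, n = 101)

## A model implying uncorrelated features gives a positive statistic
stat_indep <- lrt_stat(R_toy, diag(2), n = 101)
```

The discrepancy is zero when the model-implied matrix equals R and grows as the two diverge, which is why large values of the statistic count against the m-factor model.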
The general strategy would then be to sequentially test solutions of increasing dimensionality m = 1, \ldots, \mbox{maxdim} until the null hypothesis (stating that the m-factor model holds) is not rejected at Type-I error level alpha.
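The sequential strategy can be sketched as follows, assuming a data frame laid out like the one described under Value (dimension in the first column, p-value in the third). The helper select_dim and the numbers in toy are hypothetical and serve only as illustration; "not rejected" is taken to mean a p-value strictly greater than alpha.

```r
## Hypothetical helper (not part of FMradio): smallest dimension whose
## m-factor model is not rejected at level alpha.
select_dim <- function(lrt, alpha = 0.05) {
  pvals <- lrt[[3]]                 # third column: p-values
  keep  <- which(pvals > alpha)     # dimensions that are not rejected
  if (length(keep) == 0) return(NA_integer_)
  lrt[[1]][keep[1]]                 # first column: assessed dimension
}

## Made-up values, for illustration only
toy <- data.frame(dimension = 1:5,
                  LRT       = c(900, 400, 150, 60, 40),
                  pvalue    = c(0, 0.001, 0.02, 0.31, 0.56))

chosen <- select_dim(toy, alpha = 0.05)
```

If every model up to maxdim is rejected, the helper returns NA, which signals that a larger maxdim (within the admissible bound) may be needed.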
The degrees of freedom for the LRT under the m-factor model equals the number of parameters in the saturated model (i.e., the unstructured sample correlation) minus the number of freely estimable parameters in the m-factor model.
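The parameter count can be sketched as below. The number of free parameters in the m-factor model, pm + p - m(m - 1)/2, is the count given in the Note section. The saturated count p(p + 1)/2 is the standard textbook value and is an assumption here; the function name lrt_dof is hypothetical, and the value used internally by dimLRT (in particular when rankDOF = TRUE) may differ.

```r
## Hypothetical helper (not part of FMradio): textbook degrees of freedom
## for testing the m-factor model against the saturated model.
lrt_dof <- function(p, m) {
  saturated <- p * (p + 1) / 2              # assumed saturated-model count
  free      <- p * m + p - m * (m - 1) / 2  # free parameters, m-factor model
  saturated - free
}

## Equivalent closed form: ((p - m)^2 - (p + m)) / 2
dof_50_5 <- lrt_dof(p = 50, m = 5)
```

For p = 50 and m = 5 this gives 1275 - 290 = 985, matching the closed form ((50 - 5)^2 - 55)/2.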
Note that the general strategy above makes use of asymptotic results.
In our setting, however, the observation dimension (n) is usually small relative to the feature dimension (p).
Hence, the standard test will in a sense overestimate the degrees of freedom.
One simple option for dealing with this is to adapt the degrees of freedom to incorporate the rank deficiency of R.
This road is taken when rankDOF = TRUE.
Bartlett (1950) proposed a correction factor when the sample size is small to make the test statistic behave more \chi^{2}-like.
This correction factor is used when Bartlett = TRUE.
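The text does not spell out the correction. A commonly used form, which is also the multiplier applied by stats::factanal, replaces (n - 1) with (n - 1) - (2p + 5)/6 - 2m/3. The sketch below assumes that form; whether dimLRT uses exactly this expression is an assumption, and the function name bartlett_multiplier is hypothetical.

```r
## Hypothetical helper (not part of FMradio): Bartlett-type multiplier
## that replaces (n - 1) in front of the discrepancy function.
bartlett_multiplier <- function(n, p, m) {
  (n - 1) - (2 * p + 5) / 6 - (2 * m) / 3
}

## Compare the uncorrected and corrected multipliers for n = 500, p = 50, m = 5
uncorrected <- 500 - 1
corrected   <- bartlett_multiplier(n = 500, p = 50, m = 5)
```

The corrected multiplier is always smaller than n - 1, so the correction shrinks the test statistic, and the shrinkage becomes more pronounced as p and m grow relative to n.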
When graph = TRUE the LRT results are visualized.
The graph plots the LRT p-values against the consecutive dimensions of the factor solution.
A horizontal line is plotted at the value provided in the alpha argument.
Unless the number of observations is much larger than the number of features, the LRT is not recommended for inference in general. In Peeters et al. (2019) the LRT was assessed in a comparative setting involving high-dimensional factor models.
Value
The function returns an object of class data.frame.
The first column represents the assessed dimensions running from 1 to maxdim.
The second column represents the observed values of the LRT statistic.
The third column represents the corresponding p-values.
Note
Note that, for argument X, the observations are expected to be in the rows and the features are expected to be in the columns.
The argument maxdim cannot exceed the Ledermann-bound (Ledermann, 1937): \lfloor [2p + 1 - (8p + 1)^{1/2}]/2\rfloor, where p indicates the observed-feature dimension. Usually, one wants to set maxdim much lower than this bound.
Note that, if p > n, then the maximum rank of the raw correlation matrix is n - 1. In this case there is an alternative Ledermann-bound when rankDOF = TRUE. The number of information points in the correlation matrix is then given as n\times (n-1)/2 and this number must exceed p\times \mbox{maxdim} + p - (\mbox{maxdim} \times (\mbox{maxdim} - 1))/2, putting more restrictions on maxdim.
Other functions for factor analytic dimensionality assessment are dimGB and dimIC. In high-dimensional situations usage of dimGB on the regularized correlation matrix is recommended.
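The two bounds on maxdim stated above can be checked with a short sketch. The formulas are taken directly from this note; the function names ledermann_bound and rank_restricted_ok are hypothetical and not part of FMradio.

```r
## Hypothetical helpers (not part of FMradio).

## Ledermann bound: floor([2p + 1 - sqrt(8p + 1)] / 2)
ledermann_bound <- function(p) {
  floor((2 * p + 1 - sqrt(8 * p + 1)) / 2)
}

## Rank-restricted condition for p > n with rankDOF = TRUE:
## n(n - 1)/2 must exceed p*m + p - m(m - 1)/2
rank_restricted_ok <- function(n, p, m) {
  n * (n - 1) / 2 > p * m + p - m * (m - 1) / 2
}

## Standard bound for p = 50 features
bound_50 <- ledermann_bound(50)

## Largest admissible dimension among 1..20 when n = 30 and p = 50
max_m <- max(which(rank_restricted_ok(n = 30, p = 50, m = 1:20)))
```

With p = 50 the standard bound is 40, whereas with only n = 30 observations the rank-restricted condition already fails at maxdim = 9 (435 information points against 464 parameters), illustrating how much tighter the alternative bound can be.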
Author(s)
Carel F.W. Peeters <cf.peeters@vumc.nl>, Caroline Ubelhor
References
Amemiya, Y., & Anderson, T.W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. The Annals of Statistics, 18:1453–1463.
Bartlett, M.S. (1950). Tests of significance in factor analysis. British Journal of Psychology (Statistics Section), 3:77–85.
Ledermann, W. (1937). On the rank of the reduced correlational matrix in multiple factor analysis. Psychometrika, 2:85–93.
Peeters, C.F.W. et al. (2019). Stable prediction with radiomics data. arXiv:1903.11696 [stat.ML].
See Also
Examples
## Simulate some data according to the factor model
## $cormatrix gives the correlation matrix on the generated data
simDAT <- FAsim(p = 50, m = 5, n = 500)
simDAT$cormatrix
## Calculate the LRT for models of factor dimension 1 to 20
LRT <- dimLRT(simDAT$cormatrix, simDAT$data, maxdim = 20, rankDOF = FALSE)
print(LRT)