dimGB {FMradio} | R Documentation |
Assess the latent dimensionality using Guttman bounds
Description
dimGB
is a function that calculates the first, second, and third Guttman (lower-)bounds to the dimensionality of the latent vector.
These can be used to choose the number of latent factors.
Usage
dimGB(R, graph = TRUE, verbose = TRUE)
Arguments
R |
(Regularized) correlation |
graph |
A |
verbose |
A |
Details
The communality in factor analysis refers to the amount of variance (of feature j
) explained by the latent features.
The correlation of any feature with itself can then be decomposed into common variance (the communality) and unique variance.
This implies that unity (1) minus the unique variance for feature j
equals the communality for feature j
.
From the matrix perspective one can then construct a reduced correlation matrix: the correlation matrix with communalities in the diagonal.
This reduced correlation matrix is, by the assumptions on the factor model, Gramian and of rank m
, with m
indicating the intrinsic dimensionality of the latent vector.
The dimension of the latent vector (i.e., the number of common factors) can then be assessed by evaluating the rank of the sample correlation matrix in which the diagonal elements are replaced with appropriate communality estimates.
In our case, which is often high-dimensional, we use the regularized correlation matrix as our sample-representation of the population correlation matrix. The diagonal elements are then replaced with Guttman's lower-bound estimates for the communalities (Guttman, 1956). Guttman (1956) gives 3 (ordered) lower-bound estimates. The first estimate is the most conservative, using 0 as a lower-bound estimate of the communalities. From this perspective, every positive eigenvalue of the reduced sample correlation matrix is indicative of a latent factor whose contribution to variance-explanation is above and beyond mere unique variance. The decisonal approach would then be to retain all such factors. See Peeters et al. (2019) for additional detail.
The Guttman approach has historically been used as a lower-bound estimate of the latent dimensionality.
We consider the decisional approach stated above to give an upper-bound.
Peeters et al. (2019) contains an extensive simulation study showing that in high-dimensional situations this decisional approach provides a reliable upper-bound.
The choice of the number of factors can be further assessed with the SMC
and dimVAR
functions.
Assessments provided by these latter functions may inform if the result of the decisional rule above should be accepted or be treated as an upper-bound.
When graph = TRUE
the Guttman bounds are visualized.
It plots the consecutive eigenvalues for each of the reduced correlation matrices.
The number of positive eigenvalues for each respective reduced correlation matrix then corresponds to each of the respective Guttman bounds.
The visualization may be of limited value when the feature-dimension gets (very) large.
Value
The function returns an object of class table
.
The entries correspond to the first, second, and third Guttman bounds.
Note
Again, from a historical perspective, the decisional rule would have been used as a lower-bound to the question of the number of latent common factors. In high-dimensional situations we recommend to use it as an upper-bound.
Other functions for factor analytic dimensionality assessment are
dimIC
anddimLRT
. In high-dimensional situations usage ofdimGB
is recommended over these other functions.
Author(s)
Carel F.W. Peeters <cf.peeters@vumc.nl>
References
Guttman, L. (1956). Best possible systematic estimates of communalities. Psychometrika, 21:273–285.
Peeters, C.F.W. et al. (2019). Stable prediction with radiomics data. arXiv:1903.11696 [stat.ML].
See Also
Examples
## Simulate some data according to a factor model with 5 latent factors
## $cormatrix gives the correlation matrix on the generated data
simDAT <- FAsim(p = 50, m = 5, n = 100)
simDAT$cormatrix
## Evaluate the Guttman bounds
## First Guttman bound indicates to retain 5 latent factors
GB <- dimGB(simDAT$cormatrix)
print(GB)