dimGB {FMradio}    R Documentation

Assess the latent dimensionality using Guttman bounds

Description

dimGB is a function that calculates the first, second, and third Guttman (lower-)bounds to the dimensionality of the latent vector. These can be used to choose the number of latent factors.

Usage

dimGB(R, graph = TRUE, verbose = TRUE)

Arguments

R

(Regularized) correlation matrix.

graph

A logical indicating if the results should be visualized.

verbose

A logical indicating if the function should be verbose (print output to the console).
Runs silently when verbose = FALSE.

Details

The communality in factor analysis refers to the amount of variance (of feature j) explained by the latent features. The correlation of any feature with itself (i.e., unity) can then be decomposed into common variance (the communality) and unique variance. This implies that unity (1) minus the unique variance for feature j equals the communality for feature j. From the matrix perspective one can then construct a reduced correlation matrix: the correlation matrix with the communalities on the diagonal. By the assumptions of the factor model, this reduced correlation matrix is Gramian and of rank m, with m indicating the intrinsic dimensionality of the latent vector. The dimension of the latent vector (i.e., the number of common factors) can then be assessed by evaluating the rank of the sample correlation matrix in which the diagonal elements are replaced with appropriate communality estimates.
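The rank argument above can be checked numerically. The following base-R sketch builds a small, hypothetical population factor model (the loading values are arbitrary illustration choices, not output of FAsim or any other FMradio function), forms the reduced correlation matrix by placing the communalities on the diagonal, and counts its eigenvalues above a small round-off tolerance.

```r
## Hypothetical population factor model: p = 12 features, m = 3 factors.
## The loading values are arbitrary illustration choices.
set.seed(1)
p <- 12
m <- 3
Lambda <- matrix(runif(p * m, min = 0.2, max = 0.5), nrow = p, ncol = m)

## Communality of feature j: sum of its squared loadings (at most 0.75 here)
communality <- rowSums(Lambda^2)

## Unique variances chosen so that every feature has unit variance
Psi <- diag(1 - communality)

## Implied population correlation matrix (unit diagonal)
R <- Lambda %*% t(Lambda) + Psi

## Reduced correlation matrix: communalities on the diagonal
Rreduced <- R
diag(Rreduced) <- communality

## Count eigenvalues above a small round-off tolerance
tol <- sqrt(.Machine$double.eps)
evReduced <- eigen(Rreduced, symmetric = TRUE, only.values = TRUE)$values
evFull <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
rankReduced <- sum(evReduced > tol)   # equals m
rankFull <- sum(evFull > tol)         # equals p
c(reduced = rankReduced, full = rankFull)
```

Because the unique variances are strictly positive in this construction, the ordinary correlation matrix R has full rank p, whereas the reduced matrix collapses to rank m.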

In our setting, which is often high-dimensional, we use the regularized correlation matrix as our sample representation of the population correlation matrix. The diagonal elements are then replaced with Guttman's lower-bound estimates for the communalities (Guttman, 1956). Guttman (1956) gives three (ordered) lower-bound estimates. The first estimate is the most conservative, using 0 as a lower-bound estimate of the communalities. From this perspective, every positive eigenvalue of the reduced sample correlation matrix indicates a latent factor whose contribution to variance explanation goes above and beyond mere unique variance. The decisional approach would then be to retain all such factors. See Peeters et al. (2019) for additional detail.
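A minimal sketch of the three bounds, in base R, follows. It assumes the classical choices for the diagonal of the reduced matrix: 0 for the first bound, the squared largest absolute off-diagonal correlation of each feature for the second, and the squared multiple correlation (SMC) for the third. The function name guttmanBoundsSketch and the tolerance tol are hypothetical and introduced only for illustration; this is not the internal code of dimGB. Since n > p in the usage example, the plain sample correlation matrix is used; in high-dimensional settings a regularized correlation matrix would take its place.

```r
## Sketch of three Guttman lower bounds for a correlation matrix R.
## Assumption: classical diagonal choices (0, squared largest absolute
## off-diagonal correlation, SMC); not the internal code of FMradio::dimGB.
guttmanBoundsSketch <- function(R, tol = sqrt(.Machine$double.eps)) {
  offdiag <- R
  diag(offdiag) <- 0

  ## Lower-bound communality estimates for each feature
  h1 <- rep(0, ncol(R))                 # first bound: 0
  h2 <- apply(abs(offdiag), 2, max)^2   # second bound: max_k r_jk^2
  h3 <- 1 - 1 / diag(solve(R))          # third bound: squared multiple correlation

  ## Number of positive eigenvalues of R with its diagonal replaced by h
  countPositive <- function(h) {
    Rred <- offdiag
    diag(Rred) <- h
    ev <- eigen(Rred, symmetric = TRUE, only.values = TRUE)$values
    sum(ev > tol)
  }

  c(first = countPositive(h1),
    second = countPositive(h2),
    third = countPositive(h3))
}

## Usage: sample data from a two-factor model with simple structure
set.seed(2)
n <- 200; p <- 20; m <- 2
Lambda <- matrix(0, nrow = p, ncol = m)
Lambda[1:10, 1] <- 0.7
Lambda[11:20, 2] <- 0.7
scores <- matrix(rnorm(n * m), nrow = n, ncol = m)
noise <- matrix(rnorm(n * p), nrow = n, ncol = p) %*% diag(sqrt(1 - rowSums(Lambda^2)))
X <- scores %*% t(Lambda) + noise
GB <- guttmanBoundsSketch(cor(X))
GB
```

Because 0 <= squared largest correlation <= SMC holds feature by feature, the three counts come out ordered (first <= second <= third). The first count also equals the number of eigenvalues of R exceeding 1, since replacing the unit diagonal by 0 shifts every eigenvalue down by exactly 1.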

The Guttman approach has historically been used to obtain a lower-bound estimate of the latent dimensionality. We consider the decisional approach stated above to give an upper bound. Peeters et al. (2019) contains an extensive simulation study showing that, in high-dimensional situations, this decisional approach provides a reliable upper bound. The choice of the number of factors can be further assessed with the SMC and dimVAR functions. Assessments provided by these latter functions may indicate whether the result of the decisional rule above should be accepted or treated as an upper bound.

When graph = TRUE, the Guttman bounds are visualized. The plot shows the consecutive eigenvalues of each of the reduced correlation matrices. The number of positive eigenvalues of each reduced correlation matrix corresponds to the respective Guttman bound. The visualization may be of limited value when the feature dimension gets (very) large.
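The kind of plot described above can be approximated with base graphics. This sketch draws the eigenvalues of three reduced correlation matrices against their index, with a dashed line at zero; counting the points above that line gives each bound. It assumes 0, the squared largest absolute off-diagonal correlation, and the SMC as the diagonal entries, and the function name guttmanEigenPlotSketch is hypothetical. The styling is an illustration only and need not match the plot produced by dimGB.

```r
## Sketch of a Guttman-bound eigenvalue plot using base graphics.
## Assumption: diagonals 0, squared largest absolute off-diagonal correlation,
## and SMC; FMradio's own plot may differ in layout and styling.
guttmanEigenPlotSketch <- function(R, plot = TRUE) {
  offdiag <- R
  diag(offdiag) <- 0
  diags <- list(first  = rep(0, ncol(R)),
                second = apply(abs(offdiag), 2, max)^2,
                third  = 1 - 1 / diag(solve(R)))
  ## One column of (decreasing) eigenvalues per reduced correlation matrix
  ev <- sapply(diags, function(h) {
    Rred <- offdiag
    diag(Rred) <- h
    eigen(Rred, symmetric = TRUE, only.values = TRUE)$values
  })
  if (plot) {
    matplot(seq_len(nrow(ev)), ev, type = "b", pch = 1:3, lty = 1:3,
            col = 1:3, xlab = "Eigenvalue index", ylab = "Eigenvalue",
            main = "Eigenvalues of the reduced correlation matrices")
    abline(h = 0, lty = 2, col = "grey")   # positive eigenvalues lie above this line
    legend("topright", legend = colnames(ev), pch = 1:3, lty = 1:3, col = 1:3)
  }
  invisible(ev)
}

## Usage: 15 features, the first five sharing one common factor
set.seed(3)
X <- matrix(rnorm(100 * 15), nrow = 100, ncol = 15)
X[, 1:5] <- X[, 1:5] + rnorm(100)   # same 100 values added to each of columns 1-5
ev <- guttmanEigenPlotSketch(cor(X))
colSums(ev > 0)                     # number of positive eigenvalues per reduced matrix
```

Returning the eigenvalue matrix invisibly keeps the plot as a side effect while still allowing the counts to be inspected.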

Value

The function returns an object of class table. The entries correspond to the first, second, and third Guttman bounds.

Author(s)

Carel F.W. Peeters <cf.peeters@vumc.nl>

References

Guttman, L. (1956). Best possible systematic estimates of communalities. Psychometrika, 21:273–285.

Peeters, C.F.W. et al. (2019). Stable prediction with radiomics data. arXiv:1903.11696 [stat.ML].

See Also

SMC, dimVAR, FAsim

Examples

## Simulate some data according to a factor model with 5 latent factors
## $cormatrix gives the correlation matrix of the generated data
simDAT <- FAsim(p = 50, m = 5, n = 100)
simDAT$cormatrix

## Evaluate the Guttman bounds
## The first Guttman bound indicates that 5 latent factors should be retained
GB <- dimGB(simDAT$cormatrix)
print(GB)

[Package FMradio version 1.1.1 Index]