autoFMradio {FMradio} | R Documentation |
Wrapper for automated workflow
Description
autoFMradio is a wrapper function that automates the three main steps of the FMradio workflow.
Usage
autoFMradio(X, t = .95, fold = 5, GB = 1, type = "thomson",
verbose = TRUE, printInfo = TRUE, seed = NULL)
Arguments
X |
A data |
t |
A scalar |
fold |
A |
GB |
A |
type |
A |
verbose |
A |
printInfo |
A |
seed |
A |
Details
The autoFMradio function automates the three main steps of the workflow by providing a wrapper around all core functions.
Step 1 (regularized correlation matrix estimation) is performed using the X, t, and fold arguments.
The raw correlation matrix based on data X is redundancy-filtered using the threshold provided in t.
Subsequently, a regularized estimate of the correlation matrix (on the possibly filtered feature set) is computed with the optimal penalty value determined by cross-validation.
The number of folds is set by the fold argument.
For more information on Step 1, see RF, subSet, and regcor.
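The idea behind redundancy filtering can be sketched in base R. The function greedy_filter below is hypothetical: it is a simple greedy rule written for illustration and is not necessarily the algorithm that RF applies. It assumes that a pair of features counts as redundant when its absolute correlation is at least t.

```r
# Hypothetical greedy redundancy filter (illustration only; not RF's algorithm)
greedy_filter <- function(R, t = 0.95) {
  keep <- seq_len(ncol(R))
  repeat {
    A <- abs(R[keep, keep, drop = FALSE])
    diag(A) <- 0
    # Stop when no remaining pair reaches the threshold
    if (length(keep) < 2 || max(A) < t) break
    # Among features in an offending pair, drop the one with the
    # largest mean absolute correlation to the other retained features
    offenders <- which(apply(A, 1, max) >= t)
    drop <- offenders[which.max(rowMeans(A)[offenders])]
    keep <- keep[-drop]
  }
  keep
}

# Toy data: columns a and b are near-duplicates, c and d are independent noise
set.seed(1)
n <- 50
z <- rnorm(n)
X <- cbind(a = z, b = z + rnorm(n, sd = 0.01), c = rnorm(n), d = rnorm(n))

kept <- greedy_filter(cor(X), t = 0.95)
X_filtered <- X[, kept, drop = FALSE]   # data restricted to retained features
```

In the package itself the filtering and subsetting are handled by RF and subSet; the sketch only shows why one member of a highly correlated pair can be dropped before the correlation matrix is regularized.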
Step 2 (factor analytic data compression) is performed using the GB argument.
With this argument one can use either the first, second, or third Guttman bound to select the intrinsic dimensionality of the latent vector.
This bound, together with the regularized correlation matrix, is used in a maximum likelihood factor analysis with simple-structure rotation.
For more information on Step 2, see dimGB and mlFA.
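As a rough illustration of how a Guttman bound selects the latent dimension, the base R sketch below computes the first bound, taken here in its usual form as the number of eigenvalues of the correlation matrix that exceed 1. The loading value of 0.9 and the block layout are assumed toy values, and the second and third bounds are not shown. Whether dimGB reports the bound in exactly this way is an assumption.

```r
# Population correlation matrix of a 3-factor model with simple structure:
# 24 features, 8 per factor, each loading 0.9 on its own factor (toy values)
p <- 24
m <- 3
Lambda <- matrix(0, nrow = p, ncol = m)
for (j in seq_len(m)) Lambda[((j - 1) * 8 + 1):(j * 8), j] <- 0.9
Psi <- diag(1 - rowSums(Lambda^2))   # unique variances, chosen so diag(R) = 1
R <- Lambda %*% t(Lambda) + Psi

# First Guttman bound: count the eigenvalues of R that exceed 1
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
m_hat <- sum(ev > 1)
```

Here the three block eigenvalues are well above 1 and the remaining ones are well below it, so the bound recovers the three factors used to build R.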
Step 3 (obtaining factor scores) is performed using the type argument.
This step computes factor scores: the score each object/individual would obtain on each of the latent factors.
The type argument determines the type of factor score that is calculated.
For more information on Step 3, see facScore.
When printInfo = TRUE, additional information is printed on-screen after the full procedure has run its course.
This additional information pertains to each of the steps mentioned above.
For Step 1 it reiterates the thresholding value for redundancy filtering and gives the number of features retained after this filtering.
It also reiterates the number of folds used in determining the optimal penalty value as well as this value itself.
Moreover, it provides the value of the Kaiser-Meyer-Olkin index on the optimal regularized correlation matrix estimate (see SA).
For Step 2 it reiterates which Guttman bound was used in determining the number of latent factors as well as the number of latent factors retained.
It also gives the proportion of explained variance under the factor solution of the chosen latent dimension (see dimVAR).
For Step 3 it reiterates the type of factor score that was calculated.
Also, it prints the lowest ‘determinacy score’ amongst the latent factors (see facSMC).
The factor scores in the $Scores slot of the output (see below) can be directly used as input features in any prediction or classification procedure.
In case of external (rather than internal) validation one can use the parameter matrices in the $Loadings and $Uniqueness slots in combination with fresh data to provide a validation factor projection based on the training solution.
See Peeters et al. (2019).
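A validation projection of this kind can be sketched in base R. The matrices Lambda and Psi below are hypothetical stand-ins for the $Loadings and $Uniqueness slots (Psi is assumed to be a diagonal matrix of unique variances), and the formula is the textbook form of Thomson scores under orthogonal factors. Whether facScore computes exactly this, and how the fresh data should be scaled, are assumptions of the sketch.

```r
set.seed(42)
p <- 6
m <- 2

# Stand-ins for the training solution (hypothetical values, not FMradio output)
Lambda <- cbind(c(.8, .8, .8, 0, 0, 0),
                c(0, 0, 0, .7, .7, .7))       # loadings matrix (p x m)
Psi <- diag(1 - rowSums(Lambda^2))            # diagonal uniqueness matrix

# Fresh data, standardized here with its own means and standard deviations
X_new <- matrix(rnorm(20 * p), nrow = 20, ncol = p)
Z_new <- scale(X_new)

# Thomson-type scores: Z Psi^{-1} Lambda (I + Lambda' Psi^{-1} Lambda)^{-1}
Psi_inv_Lambda <- solve(Psi, Lambda)
M <- diag(m) + t(Lambda) %*% Psi_inv_Lambda
scores_new <- Z_new %*% Psi_inv_Lambda %*% solve(M)
```

The result has one row per fresh observation and one column per latent factor, and it uses only parameters estimated on the training data.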
Value
The function returns an object of class list:
$Scores |
An object of class |
$FilteredData |
Subsetted data |
$FilteredCor |
A correlation |
$optPen |
A |
$optCor |
A |
$m |
An |
$Loadings |
A matrix of class |
$Uniqueness |
A |
$Exvariance |
A |
$determinacy |
A |
$used.seed |
A |
Note
When seed = NULL, the starting seed is determined by drawing a single integer from the integers 1:9e5. This non-user-supplied seed is also found in the $used.seed slot of the output.
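The seed behaviour described in this note can be mimicked in base R as follows. The call to sample is an assumed equivalent of what happens when seed = NULL, and the variable names are hypothetical. The point is that storing the drawn seed lets a run be repeated exactly.

```r
# Assumed equivalent of seed = NULL: draw one integer from 1:9e5
used_seed <- sample(1:9e5, 1)

set.seed(used_seed)
draw_1 <- rnorm(3)

# Re-using the stored seed reproduces the same random draws
set.seed(used_seed)
draw_2 <- rnorm(3)
```

In the same way, passing the value stored in $used.seed back through the seed argument should allow an earlier run to be repeated.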
Author(s)
Carel F.W. Peeters <cf.peeters@vumc.nl>
References
Peeters, C.F.W. et al. (2019). Stable prediction with radiomics data. arXiv:1903.11696 [stat.ML].
See Also
RF, subSet, regcor, dimGB, mlFA, facScore
Examples
## Simulate some data according to a factor model with 3 latent factors
simDAT <- FAsim(p = 24, m = 3, n = 40, loadingvalue = .9)
X <- simDAT$data
## Perform the lot
FullMonty <- autoFMradio(X, GB = 1, seed = 303)
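The returned list can then be inspected slot by slot. The sketch below repeats the example with on-screen output switched off and runs only when FMradio is installed. The object name fit and the seed passed to set.seed are arbitrary choices, and the slot names are those listed under Value.

```r
# Runs only when the FMradio package is installed
if (requireNamespace("FMradio", quietly = TRUE)) {
  set.seed(1)  # FAsim simulates random data
  simDAT <- FMradio::FAsim(p = 24, m = 3, n = 40, loadingvalue = .9)
  fit <- FMradio::autoFMradio(simDAT$data, GB = 1, seed = 303,
                              verbose = FALSE, printInfo = FALSE)

  print(names(fit))      # slot names of the returned list
  str(fit$Scores)        # factor scores for use in downstream models
  print(fit$m)           # number of latent factors retained
  print(fit$used.seed)   # seed recorded for the run
}
```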