dispca {drclust}    R Documentation
Disjoint Principal Components Analysis
Description
Performs disjoint principal component analysis (disjoint PCA), a simplified version of PCA in which each of the Q principal components is computed from a different, non-overlapping subset of the J variables. Since every variable contributes to exactly one component, the resulting loading matrix A is simpler and easier to interpret.
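To make the structure of A concrete, the sketch below builds a disjoint loading matrix for a fixed, hand-picked partition of the iris variables: each column of A is the leading eigenvector of the covariance matrix of the standardized variables in one group, and is zero for all other variables. The helper `disjoint_loadings()` and the chosen partition are hypothetical; dispca estimates the partition itself, so this only illustrates the disjoint structure and is not the package's code.

```r
# Hypothetical helper, not part of drclust: builds a disjoint loading
# matrix A from a FIXED membership vector (one component per group).
disjoint_loadings <- function(X, groups) {
  X <- scale(X)                        # z-scores (cf. prep = 1)
  J <- ncol(X)
  Q <- max(groups)
  A <- matrix(0, J, Q,
              dimnames = list(colnames(X), paste0("PC", seq_len(Q))))
  for (q in seq_len(Q)) {
    idx <- which(groups == q)          # variables assigned to group q
    S <- cov(X[, idx, drop = FALSE])   # covariance of those variables only
    a <- eigen(S, symmetric = TRUE)$vectors[, 1]  # leading eigenvector
    A[idx, q] <- a * sign(sum(a))      # fix the arbitrary sign
  }
  A
}

# Assumed partition: Sepal.Width alone, the other three together
A <- disjoint_loadings(as.matrix(iris[, -5]), groups = c(1, 2, 1, 1))
print(round(A, 3))
```

Each row of A has a single non-zero entry, which is exactly what makes a disjoint solution easier to read than ordinary PCA loadings.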
Usage
dispca(X, Q, Rndstart, verbose, maxiter, tol, prep, print, constr)
Arguments
X
Units x variables numeric data matrix.
Q
Number of factors (components).
Rndstart
Number of runs to be performed (default is 20).
verbose
Outputs basic summary statistics for each run (1 = enabled; 0 = disabled, the default).
maxiter
Maximum number of iterations allowed per run if convergence has not been reached earlier (default is 100).
tol
Convergence tolerance: convergence is assumed when the value of the objective function changes by less than this amount between two consecutive iterations (default is 1e-6).
prep
Pre-processing of the data: 1 applies the z-score transform (the default); 2 applies the min-max transform; 0 leaves the data unchanged.
print
Prints summary statistics of the results (1 = enabled; 0 = disabled, the default).
constr
A vector of length J (the number of variables) pre-specifying the cluster to which some of the variables must be assigned. Each element is either an integer from 1 to Q, fixing the variable to that cluster, or 0 if no constraint is imposed on the variable (i.e., it is assigned by the unconstrained algorithm). See the examples for details.
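The three prep options correspond to standard column-wise transforms. The sketch below reproduces them with base R; the helper `preprocess()` is hypothetical, and drclust's own implementation may differ in details (for instance, `scale()` divides by the n - 1 standard deviation, which is an assumption here).

```r
# Hypothetical helper mirroring the three prep options described above;
# not drclust's code.
preprocess <- function(X, prep = 1) {
  X <- as.matrix(X)
  if (prep == 1) {
    scale(X)                                  # z-scores: mean 0, sd 1
  } else if (prep == 2) {
    rng <- apply(X, 2, range)                 # 2 x J: column min and max
    shifted <- sweep(X, 2, rng[1, ])          # subtract column minima
    sweep(shifted, 2, rng[2, ] - rng[1, ], "/")  # rescale to [0, 1]
  } else {
    X                                         # prep = 0: data unchanged
  }
}

Z <- preprocess(iris[, -5], prep = 1)  # z-score transform
M <- preprocess(iris[, -5], prep = 2)  # min-max transform
```

With z-scores every variable has unit variance, so no variable dominates a component merely because of its measurement scale.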
Value
Returns a list of estimates and some descriptive quantities of the final solution:
V
Variables x factors membership matrix (binary and row-stochastic). Each row is a dummy vector indicating the cluster to which the corresponding variable has been assigned.
A
Variables x components loading matrix.
betweenss
Amount of deviance captured by the model (scalar).
totss
Total amount of deviance (scalar).
size
Number of variables assigned to each column-cluster (vector).
loop
Index of the (best) run from which the results have been taken.
it
Number of iterations performed during the (best) run.
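Because V is binary and row-stochastic, it is fully determined by a vector giving each variable's cluster, and size is simply its column sums. The sketch below shows this relationship with a hypothetical helper `membership_to_V()` and an assumed membership; it is not part of drclust.

```r
# Hypothetical helper (not part of drclust): turns a membership vector
# into the binary, row-stochastic matrix V and the cluster sizes.
membership_to_V <- function(groups, Q = max(groups)) {
  V <- matrix(0, nrow = length(groups), ncol = Q)
  V[cbind(seq_along(groups), groups)] <- 1   # exactly one 1 per row
  V
}

V <- membership_to_V(c(1, 2, 1, 1))  # assumed membership of 4 variables
size <- colSums(V)                   # variables per cluster
```

For a fitted object such as `out` in the examples, the same reasoning suggests that `colSums(out$V)` should match `out$size`, and `out$betweenss / out$totss` gives the share of total deviance captured by the model.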
Author(s)
Ionel Prunila, Maurizio Vichi
References
Vichi M., Saporta G. (2009) "Clustering and disjoint principal component analysis" <doi:10.1016/j.csda.2008.05.028>
Examples
# Iris data
library(drclust)
# Keep only the four numeric variables of the iris data
X <- as.matrix(iris[, -5])
# No constraint on the variables
out <- dispca(X, Q = 2)
# Constraint: the first two variables (Sepal.Length, Sepal.Width) must both
# be assigned to cluster 1; the other two (0) are assigned freely
outc <- dispca(X, Q = 2, constr = c(1, 1, 0, 0))
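For a problem as small as iris (J = 4, Q = 2), a disjoint-PCA partition can also be found by exhaustive search, which makes the effect of constr easy to see. The sketch below is a hypothetical `dpca_brute()` written for illustration only; it is not drclust's algorithm (dispca runs an iterative procedure controlled by Rndstart, maxiter and tol), and it assumes the quantity being maximized is the deviance captured by the components, ||XA||^2, with one leading eigenvector per group.

```r
# Hypothetical brute-force disjoint PCA for small J (not drclust's method):
# tries every assignment of the J variables to Q non-empty groups and keeps
# the one whose components capture the most deviance.
dpca_brute <- function(X, Q, constr = rep(0, ncol(X))) {
  X <- scale(X)
  J <- ncol(X)
  grid <- as.matrix(expand.grid(rep(list(seq_len(Q)), J)))
  best <- list(fit = -Inf)
  for (r in seq_len(nrow(grid))) {
    g <- grid[r, ]
    if (length(unique(g)) < Q) next          # every group must be non-empty
    if (any(constr > 0 & g != constr)) next  # honour the constraints
    # Deviance captured = sum of each group's largest eigenvalue
    fit <- sum(sapply(seq_len(Q), function(q) {
      Xq <- X[, g == q, drop = FALSE]
      eigen(crossprod(Xq), symmetric = TRUE, only.values = TRUE)$values[1]
    }))
    if (fit > best$fit) best <- list(fit = fit, groups = unname(g))
  }
  best$totss <- sum(X^2)                     # total deviance of scaled data
  best
}

X <- as.matrix(iris[, -5])
free <- dpca_brute(X, Q = 2)
fixed <- dpca_brute(X, Q = 2, constr = c(1, 1, 0, 0))
```

Since the constrained search explores only a subset of the assignments, `fixed$fit` can never exceed `free$fit`. The number of assignments grows as Q^J, so this approach is only practical for a handful of variables.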