R: Disjoint Principal Components Analysis

dispca {drclust}

R Documentation

Disjoint Principal Components Analysis

Description

Performs disjoint PCA, a simplified version of PCA in which each of the Q principal components is computed from a different subset of the J variables. The resulting loading matrix A is therefore simpler and easier to interpret.
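
To illustrate what a disjoint loading matrix looks like, the base R sketch below takes a fixed, made-up partition of the four iris variables and computes one component per subset. It is not the algorithm used by dispca, which also has to find the partition; the vector `cluster_of` and the use of the first eigenvector of each sub-covariance matrix are assumptions made for illustration.

```r
# Standardised numeric iris variables (units x variables)
X <- scale(as.matrix(datasets::iris[, -5]))

# Hypothetical partition: variables 1-2 in subset 1, variables 3-4 in subset 2
cluster_of <- c(1, 1, 2, 2)
J <- ncol(X)
Q <- max(cluster_of)

# Build A so that each component loads only on its own subset of variables
A <- matrix(0, nrow = J, ncol = Q)
for (q in seq_len(Q)) {
  idx <- which(cluster_of == q)
  S <- cov(X[, idx, drop = FALSE])
  # First eigenvector of the sub-covariance matrix (unit length)
  A[idx, q] <- eigen(S, symmetric = TRUE)$vectors[, 1]
}

# Component scores: each column uses only the variables of its subset
scores <- X %*% A
```

Each row of A has a single non-zero entry, and because the columns have disjoint supports they are automatically orthogonal.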

Usage

dispca(X, Q, Rndstart, verbose, maxiter, tol, prep, print, constr)

Arguments

`X`	Units x variables numeric data matrix.
`Q`	Number of factors.
`Rndstart`	Number of runs to be performed (default is 20).
`verbose`	Outputs basic summary statistics for each run (1 = enabled; 0 = disabled, default option).
`maxiter`	Maximum number of iterations allowed if convergence has not been reached (default is 100).
`tol`	Tolerance threshold (maximum difference between the values of the objective function of two consecutive iterations such that convergence is assumed). Default is 1e-6.
`prep`	Pre-processing of the data. 1 performs the z-score transform (default choice); 2 performs the min-max transform; 0 leaves the data unprocessed.
`print`	Prints summary statistics of the results (1 = enabled; 0 = disabled, default option).
`constr`	Vector of length J (the number of variables) pre-specifying the cluster to which some of the variables must be assigned. Each element takes an integer value from 1 to Q (see Examples), or 0 if no constraint is imposed on the variable (i.e., it is assigned by the plain algorithm).
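
The two transforms selectable through `prep` can be sketched in base R as follows. This shows the standard textbook definitions, not the package's internal code: the helper names `zscore` and `minmax` are hypothetical, and whether dispca divides by n - 1 or n in the z-score is an assumption not settled by this page.

```r
X <- as.matrix(datasets::iris[, -5])

# z-score: centre each column and divide by its standard deviation.
# scale() uses the n - 1 denominator; the package may differ.
zscore <- function(X) scale(X, center = TRUE, scale = TRUE)

# min-max: rescale each column to the interval [0, 1]
minmax <- function(X) {
  rng <- apply(X, 2, range)              # row 1 = minima, row 2 = maxima
  centred <- sweep(X, 2, rng[1, ], "-")  # subtract column minima
  sweep(centred, 2, rng[2, ] - rng[1, ], "/")
}

Z <- zscore(X)  # analogous to prep = 1
M <- minmax(X)  # analogous to prep = 2
```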

Value

Returns a list of estimates and some descriptive quantities of the final results.

`V`	Variables x factors membership matrix (binary and row-stochastic). Each row is a dummy vector indicating the cluster to which the corresponding variable has been assigned.
`A`	Variables x components loading matrix.
`betweenss`	Amount of deviance captured by the model (scalar).
`totss`	Total amount of deviance (scalar).
`size`	Number of variables assigned to each column-cluster (vector).
`loop`	The index of the (best) run from which the results have been chosen.
`it`	The number of iterations performed during the (best) run.
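
To make the structure of `V` and `size` concrete, the sketch below builds a membership matrix from a made-up assignment of four variables to two clusters. The vector `cluster_of` is hypothetical and is not output produced by dispca.

```r
J <- 4
Q <- 2

# Hypothetical assignment: variables 1-2 in cluster 1, variables 3-4 in cluster 2
cluster_of <- c(1, 1, 2, 2)

# Binary membership matrix: V[j, q] = 1 if variable j belongs to cluster q
V <- matrix(0, nrow = J, ncol = Q)
V[cbind(seq_len(J), cluster_of)] <- 1

# Row-stochastic: every row sums to 1, so each variable is in exactly one cluster
row_totals <- rowSums(V)

# Number of variables per column-cluster, analogous to the `size` element
size <- colSums(V)
```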

Author(s)

Ionel Prunila, Maurizio Vichi

References

Vichi M., Saporta G. (2009) "Clustering and disjoint principal component analysis" <doi:10.1016/j.csda.2008.05.028>

Examples

# Iris data 
# Loading the numeric variables of iris data
iris <- as.matrix(iris[,-5]) 

# No constraint on variables
out <- dispca(iris, Q = 2)

# Constraint: the first two variables must contribute to the same factor.
outc <- dispca(iris, Q = 2, constr = c(1,1,0,0))

[Package drclust version 0.1 Index]