CDpca {biplotbootGUI} R Documentation

## Clustering and Disjoint Principal Component Analysis

### Description

CDpca performs a clustering and disjoint principal component analysis (CDPCA) on the given numeric data matrix and returns a list of results. Given an (I x J) real data matrix X = [xij], the CDPCA methodology clusters the I objects into P nonempty and nonoverlapping clusters Cp, p = 1,...,P, which are identified by their centroids, and simultaneously partitions the J attributes into Q disjoint components PCq, q = 1,...,Q. The CDpca function fits this model to X, estimating its parameters with an Alternating Least Squares (ALS) procedure originally proposed by Vichi and Saporta (2009) and described in two steps by Macedo and Freitas (2015).
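
The quantities named in this description can be sketched in base R. The sketch assumes the formulation in Vichi and Saporta (2009), in which X is approximated by U Ybar A', with binary membership matrices U (I x P) and V (J x Q) and a loading matrix A that has at most one nonzero entry per row. The partitions are fixed by hand here rather than estimated, and `make_membership` is a hypothetical helper, so this is not the package's ALS implementation.

```r
# Toy data: I = 6 objects, J = 4 attributes (values are illustrative only).
set.seed(1)
X <- scale(matrix(rnorm(24), nrow = 6, ncol = 4))

# Hypothetical helper: turn a vector of labels into a binary membership matrix.
make_membership <- function(labels, k) {
  M <- matrix(0, nrow = length(labels), ncol = k)
  M[cbind(seq_along(labels), labels)] <- 1
  M
}

U <- make_membership(c(1, 1, 2, 2, 3, 3), 3)  # objects    -> P = 3 clusters
V <- make_membership(c(1, 1, 2, 2), 2)        # attributes -> Q = 2 components

# Replace every object by the centroid of its cluster.
H  <- U %*% solve(crossprod(U)) %*% t(U)
XB <- H %*% X

# Disjoint loadings: for each component, the leading eigenvector computed
# from its own attributes only; every other loading stays zero.
A <- matrix(0, nrow = ncol(X), ncol = ncol(V))
for (q in seq_len(ncol(V))) {
  idx <- which(V[, q] == 1)
  S <- crossprod(XB[, idx, drop = FALSE])
  A[idx, q] <- eigen(S, symmetric = TRUE)$vectors[, 1]
}

Y    <- X %*% A                                  # component scores (I x Q)
Ybar <- solve(crossprod(U), crossprod(U, Y))     # centroids in the reduced space (P x Q)
Fval <- sum((U %*% Ybar)^2)                      # between-cluster deviance for this U and V
```

Because the columns of A have disjoint supports and unit length, `crossprod(A)` is the identity matrix, which is what makes the components "disjoint" and orthonormal at the same time.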

### Usage

```
CDpca(data, class = NULL, P, Q, SDPinitial = FALSE, tol = 10^(-5), maxit, r, cdpcaplot = TRUE)
```

### Arguments

| Argument | Description |
|---|---|
| `data` | A numeric matrix or data frame which provides the data for the CDPCA. |
| `class` | A numeric vector containing the real classification of the objects in the data, or NULL if the class of the objects is unknown. |
| `P` | An integer value indicating the number of clusters of objects. |
| `Q` | An integer value indicating the number of clusters of variables. |
| `SDPinitial` | A logical value indicating whether the initial assignment matrices U and V are randomly generated (FALSE, the default) or obtained from an algorithmic framework based on a semidefinite programming approach (TRUE). |
| `tol` | A small positive value giving the convergence tolerance: the maximum allowed difference between two consecutive values of the objective function. The default is 10^(-5). |
| `maxit` | The maximum number of iterations of one run of the ALS algorithm. |
| `r` | The number of runs of the ALS algorithm for the final solution. |
| `cdpcaplot` | A logical value indicating whether an additional graphic is created (showing the data projected on the first two CDPCA principal components). |
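
How `tol`, `maxit` and `r` interact can be illustrated with a generic multi-start loop. The sketch below is a stand-in, not the CDPCA ALS algorithm: it clusters objects only, using k-means-style alternating updates, and the names `between_dev` and `one_run` are hypothetical. It shows a run stopping when the objective changes by less than `tol` or after `maxit` iterations, and the best of `r` runs being kept.

```r
# Stand-in objective: between-cluster deviance of a partition of the rows of X.
between_dev <- function(X, labels) {
  sizes   <- as.vector(table(labels))
  centers <- rowsum(X, labels) / sizes
  sum(sizes * rowSums(centers^2))
}

# One run: random start, then alternate centroid and assignment updates.
one_run <- function(X, P, tol, maxit) {
  labels <- sample(rep_len(seq_len(P), nrow(X)))   # random, nonempty clusters
  f_old  <- -Inf
  for (iter in seq_len(maxit)) {
    centers <- rowsum(X, labels) / as.vector(table(labels))
    # Squared distance of every object to every centroid (I x P matrix).
    d <- sapply(seq_len(P), function(p) rowSums(sweep(X, 2, centers[p, ])^2))
    new_labels <- max.col(-d, ties.method = "first")
    if (length(unique(new_labels)) < P) break      # keep all clusters nonempty
    labels <- new_labels
    f_new  <- between_dev(X, labels)
    if (abs(f_new - f_old) < tol) break            # converged within tol
    f_old <- f_new
  }
  list(labels = labels, F = between_dev(X, labels), iter = iter)
}

set.seed(42)
X <- scale(rbind(matrix(rnorm(40, mean = 0), 20), matrix(rnorm(40, mean = 4), 20)))

r    <- 5                                          # number of runs
runs <- lapply(seq_len(r), function(i) one_run(X, P = 2, tol = 1e-5, maxit = 50))
best <- which.max(sapply(runs, function(res) res$F))
```

Several runs from different random starts are a common guard against local optima in alternating procedures; a smaller `tol` or a larger `maxit` lets each run continue longer before stopping.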

### Value

CDpca returns a list of results containing the following components:

| Component | Description |
|---|---|
| `Iter` | The total number of iterations used in the best loop for computing the best solution |
| `loop` | The best loop number |
| `timebestloop` | The computation time on the best loop |
| `timeallloops` | The computation time for all loops |
| `Y` | The component score matrix |
| `Ybar` | The object centroids matrix in the reduced space |
| `A` | The component loading matrix |
| `U` | The partition of objects |
| `V` | The partition of variables |
| `F` | The value of the objective function to maximize |
| `bcdev` | The between cluster deviance |
| `bcdevTotal` | The between cluster deviance over the total variability |
| `tableclass` | The CDPCA classification |
| `pseudocm` | The pseudo confusion matrix concerning the true (given by class) and CDPCA classifications |
| `Enorm` | The error norm for the obtained CDPCA model |
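
Two of these summaries can be illustrated with base R on made-up labels and scores. This is a sketch under assumptions: the cross-tabulation is one plausible reading of a "pseudo" confusion matrix (cluster labels are arbitrary, so rows and columns need not line up), and the ratio is one plausible reading of "between cluster deviance over the total variability". The package may compute or scale either quantity differently.

```r
# Made-up labels for 8 objects: a known classification and a clustering result.
true_class  <- c(1, 1, 1, 2, 2, 2, 3, 3)
cdpca_class <- c(2, 2, 2, 1, 1, 3, 3, 3)

# Cross-tabulation of true classes (rows) against cluster labels (columns).
pseudo_cm <- table(true = true_class, cdpca = cdpca_class)

# Made-up, column-centered scores for the same 8 objects.
set.seed(7)
Y <- scale(matrix(rnorm(16), nrow = 8), scale = FALSE)

sizes     <- as.vector(table(cdpca_class))        # cluster sizes
centroids <- rowsum(Y, cdpca_class) / sizes       # cluster centroids
bcdev     <- sum(sizes * rowSums(centroids^2))    # between-cluster deviance
bcdev_ratio <- bcdev / sum(Y^2)                   # share of the total variability
```

A ratio close to 1 would mean that the clusters account for almost all of the variability in the scores; a ratio near 0 would mean the clusters are barely separated.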