CDpca {biplotbootGUI}    R Documentation

Clustering and Disjoint Principal Component Analysis

Description

CDpca performs a clustering and disjoint principal component analysis (CDPCA) on the given numeric data matrix and returns a list of results. Given an (I x J) real data matrix X = [xij], the CDPCA methodology clusters the I objects into P nonempty, nonoverlapping clusters Cp, p = 1,...,P, each identified by its centroid, and simultaneously partitions the J attributes into Q disjoint components PCq, q = 1,...,Q. The CDpca function fits this model to X, estimating its parameters with an Alternating Least-Squares (ALS) procedure originally proposed by Vichi and Saporta (2009) and described as a two-step procedure by Macedo and Freitas (2015).
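The model quantities can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code: it assumes the formulation of Vichi and Saporta (2009), in which U (I x P) and V (J x Q) are binary membership matrices, the centroid matrix is Xbar = (U'U)^(-1) U'X, each column of the loading matrix A is nonzero only on the variables of one block of V, and the objective F = ||U Xbar A||^2 is maximized. For fixed U and V, F splits into one term per block, so each column of A can be taken as the leading eigenvector of that block's between-cluster cross-product matrix. The helpers membership() and update_loadings() are hypothetical, and U is set to the true iris species only for illustration (CDpca estimates it).

```r
# Hypothetical helper: binary membership matrix from a label vector
# (rows = items, columns = groups 1..k, exactly one 1 per row).
membership <- function(labels, k) {
  M <- matrix(0, nrow = length(labels), ncol = k)
  M[cbind(seq_along(labels), labels)] <- 1
  M
}

# Hypothetical helper: loading update for fixed U and V (sketch).
# Column q of A is the leading eigenvector of
# t(U %*% Xbar_q) %*% (U %*% Xbar_q), placed on the variables of block q.
update_loadings <- function(X, U, V) {
  Xbar <- solve(crossprod(U), crossprod(U, X))        # P x J centroid matrix
  A <- matrix(0, nrow = ncol(X), ncol = ncol(V))
  for (q in seq_len(ncol(V))) {
    idx <- which(V[, q] == 1)
    B <- crossprod(U %*% Xbar[, idx, drop = FALSE])   # between-cluster cross-products
    A[idx, q] <- eigen(B, symmetric = TRUE)$vectors[, 1]
  }
  A
}

X <- scale(as.matrix(iris[, 1:4]))                # I = 150 objects, J = 4 variables
U <- membership(as.integer(iris$Species), 3)      # P = 3 (true species, illustration only)
V <- membership(c(1, 1, 2, 2), 2)                 # Q = 2 variable blocks (arbitrary)
A <- update_loadings(X, U, V)
Y    <- X %*% A                                   # component scores
Ybar <- solve(crossprod(U), crossprod(U, Y))      # centroids in the reduced space
obj  <- sum((U %*% Ybar)^2)                       # objective ||U Ybar||^2
```

The variable is named obj rather than F so that it does not mask R's F (FALSE) shortcut.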

Usage

CDpca(data, class = NULL, P, Q, SDPinitial = FALSE, tol = 10^(-5), maxit, r, cdpcaplot = TRUE)
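A call on the iris data might look like the following sketch. The values P = 3, Q = 2, maxit = 100 and r = 10 are arbitrary choices rather than recommendations, class is passed as the numeric vector the argument expects, and the call is guarded so that it only runs when biplotbootGUI is installed.

```r
# Illustrative call; argument values are arbitrary, not recommendations.
if (requireNamespace("biplotbootGUI", quietly = TRUE)) {
  res <- biplotbootGUI::CDpca(
    data = iris[, 1:4],
    class = as.numeric(iris$Species),   # true classes, used for pseudocm
    P = 3, Q = 2,
    SDPinitial = FALSE, tol = 10^(-5),
    maxit = 100, r = 10,
    cdpcaplot = FALSE
  )
  print(res$pseudocm)                   # true vs CDPCA classification
}
```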

Arguments

data

A numeric matrix or data frame containing the data to be analysed by CDPCA

class

A numeric vector containing the true classification of the objects, or NULL (the default) if the classes are unknown

P

An integer value indicating the number of clusters of objects

Q

An integer value indicating the number of clusters of variables, i.e. the number of disjoint principal components

SDPinitial

A logical value indicating how the initial assignment matrices U and V are obtained: if FALSE (the default), they are generated randomly; if TRUE, they are obtained from an algorithmic framework based on semidefinite programming (SDP)
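With SDPinitial = FALSE the starting partitions are random. One simple way to draw a random partition with no empty cluster, shown here as a hypothetical helper rather than the package's own code, is to give each group one item first and assign the remaining items uniformly at random:

```r
# Hypothetical helper: random partition of n items into k nonempty groups,
# returned as an n x k binary membership matrix (one 1 per row).
random_membership <- function(n, k) {
  stopifnot(n >= k)
  labels <- c(seq_len(k), sample.int(k, n - k, replace = TRUE))  # every group used once
  labels <- labels[sample.int(n)]                                # shuffle the positions
  M <- matrix(0, nrow = n, ncol = k)
  M[cbind(seq_len(n), labels)] <- 1
  M
}

set.seed(123)
U0 <- random_membership(150, 3)   # initial object partition (I = 150, P = 3)
V0 <- random_membership(4, 2)     # initial variable partition (J = 4, Q = 2)
```

This scheme does not sample partitions uniformly; it only guarantees that no cluster starts empty.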

tol

A small positive value giving the convergence tolerance: a run of the ALS algorithm stops when the difference between two consecutive values of the objective function falls below tol. The default is 10^(-5)

maxit

The maximum number of iterations of one run of the ALS algorithm

r

The number of runs of the ALS algorithm; the final solution is the best one (largest objective function value) found over the r runs
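The interplay of tol, maxit and r can be sketched as a generic multi-start loop. In this hypothetical driver, update() stands in for one full ALS cycle (which for CDPCA would update U, A and V in turn); each run stops when the objective improves by less than tol or after maxit iterations, and the run with the largest objective is kept. The returned names Iter and loop mirror the Value section; everything else, including the toy objective, is illustrative.

```r
# Hypothetical driver: 'init' returns a starting state; 'update' performs one
# cycle and returns list(state = <new state>, obj = <objective value>).
best_of_runs <- function(init, update, tol = 1e-5, maxit = 100, r = 10) {
  best <- list(obj = -Inf)
  for (loop in seq_len(r)) {
    state <- init()
    obj_old <- -Inf
    for (iter in seq_len(maxit)) {
      step <- update(state)
      state <- step$state
      done <- (step$obj - obj_old) < tol   # improvement below tolerance
      obj_old <- step$obj
      if (done) break
    }
    if (obj_old > best$obj)                # keep the run with the largest objective
      best <- list(obj = obj_old, state = state, Iter = iter, loop = loop)
  }
  best
}

# Toy example: an "objective" that increases monotonically towards 1.
set.seed(42)
res <- best_of_runs(init = function() runif(1),
                    update = function(s) { s <- s + (1 - s) / 2; list(state = s, obj = s) },
                    tol = 1e-5, maxit = 100, r = 5)
```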

cdpcaplot

A logical value indicating whether an additional graphic is created (showing the data projected on the first two CDPCA principal components)

Value

CDpca returns a list of results containing the following components:

Iter

The number of iterations performed in the best loop, i.e. the run that produced the final solution

loop

The number (index) of the best loop

timebestloop

The computation time of the best loop

timeallloops

The computation time for all loops

Y

The component score matrix

Ybar

The object centroids matrix in the reduced space

A

The component loading matrix

U

The partition of objects

V

The partition of variables
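Assuming U and V are binary membership matrices, as in the CDPCA model (this page does not state their format), they can be turned into label vectors and group sizes with plain matrix algebra. The small U below is made up for illustration; V is handled the same way.

```r
# Assumes U is an I x P binary membership matrix (one 1 per row).
U <- rbind(c(1, 0, 0),
           c(0, 0, 1),
           c(0, 1, 0),
           c(0, 0, 1))
object_cluster <- drop(U %*% seq_len(ncol(U)))   # cluster label of each object
cluster_size   <- colSums(U)                     # number of objects per cluster
```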

F

The value of the objective function (to be maximized) at the final solution

bcdev

The between-cluster deviance

bcdevTotal

The ratio of the between-cluster deviance to the total variability
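The two deviance measures can be sketched as the classical split of a sum of squares. This assumes bcdev is the between-cluster sum of squares ||U Mbar||^2 and bcdevTotal divides it by the total sum of squares; the page does not say whether the package computes these on X or on the component scores Y, nor whether the ratio is reported as a fraction or a percentage. The helper deviance_split() is hypothetical.

```r
# Hypothetical helper: between/within split of ||M||^2 for a given partition.
deviance_split <- function(M, labels) {
  U <- matrix(0, nrow(M), max(labels))
  U[cbind(seq_along(labels), labels)] <- 1
  centroids <- solve(crossprod(U), crossprod(U, M))  # cluster means
  fitted <- U %*% centroids                          # each row replaced by its centroid
  between <- sum(fitted^2)
  within  <- sum((M - fitted)^2)
  total   <- sum(M^2)
  list(between = between, within = within, total = total, ratio = between / total)
}

M <- scale(as.matrix(iris[, 1:4]))   # centred, so ||M||^2 is the total deviance
d <- deviance_split(M, as.integer(iris$Species))
```

Because U (U'U)^(-1) U' is an orthogonal projection, between + within equals total exactly, whatever partition is used.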

tableclass

The CDPCA classification of the objects

pseudocm

The pseudo confusion matrix cross-tabulating the true classification (given by class) against the CDPCA classification
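Cluster numbers are arbitrary, so a table of true classes against CDPCA clusters is a confusion matrix only up to a relabelling of its columns, which is presumably why it is called a pseudo confusion matrix. The sketch below builds such a table with table() and finds the relabelling with the most agreement by brute force over permutations (practical only for small P). The labels and the perms() helper are hypothetical, and the layout of the package's pseudocm may differ.

```r
# Hypothetical helper: all permutations of a vector, as a list.
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}

true_class <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
cluster    <- c(2, 2, 2, 3, 3, 1, 1, 1, 1)   # hypothetical CDPCA labels
tab <- table(true = true_class, cdpca = cluster)

# Agreement after the best column relabelling (sum of the permuted diagonal).
agree <- max(sapply(perms(seq_len(ncol(tab))),
                    function(p) sum(diag(tab[, p]))))
```

Here the best relabelling maps clusters 2, 3, 1 to classes 1, 2, 3, and 8 of the 9 objects agree.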

Enorm

The error norm of the fitted CDPCA model
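The error norm can be sketched under the assumed model X = U Ybar A' + E, where A has orthonormal, disjoint columns and Ybar = (U'U)^(-1) U'XA. This assumes Enorm is the Frobenius norm of E; the package may scale it differently. The sketch also checks the identity ||E||^2 = ||X||^2 - ||U Ybar||^2, which shows that minimizing the error and maximizing the objective F are equivalent. The loading matrix used here is a hypothetical fixed choice, not an estimate.

```r
# Sketch of the residual under the assumed model X = U %*% Ybar %*% t(A) + E.
X <- scale(as.matrix(iris[, 1:4]))
labels <- as.integer(iris$Species)
U <- matrix(0, nrow(X), 3)
U[cbind(seq_len(nrow(X)), labels)] <- 1
A <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1)) / sqrt(2)   # hypothetical disjoint loadings
Ybar <- solve(crossprod(U), crossprod(U, X %*% A))    # centroids in the reduced space
E <- X - U %*% Ybar %*% t(A)
Enorm <- sqrt(sum(E^2))                                # Frobenius norm of the residual
obj   <- sum((U %*% Ybar)^2)                           # objective ||U Ybar||^2
```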

Author(s)

Eloisa Macedo macedo@ua.pt, Adelaide Freitas adelaide@ua.pt, Maurizio Vichi maurizio.vichi@uniroma1.it

References


[Package biplotbootGUI version 1.3]