GPDC {FPDclustering}    R Documentation

Gaussian PD-Clustering

Description

An implementation of Gaussian PD-Clustering (GPDC), an extension of PD-clustering adjusted for cluster size that uses a dissimilarity measure based on the Gaussian density.
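
To make the idea of a density-based dissimilarity concrete, the following base R sketch computes one plausible form for a single cluster: d_i = log(M / f(x_i)), where f is the multivariate Gaussian density and M is its largest value over the observations. This form, the simulated data, and the helper gauss_density are assumptions made for illustration only; see the references for the definition actually used by GPDC.

```r
# Illustrative only: a Gaussian-density-based dissimilarity to one cluster,
# assuming the form d_i = log(M / f(x_i)) with M = max_i f(x_i).
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)   # 20 hypothetical observations, 2 variables
mu <- colMeans(x)                  # assumed cluster center
sigma <- cov(x)                    # assumed cluster covariance matrix

# Multivariate Gaussian density using base R only (hypothetical helper)
gauss_density <- function(x, mu, sigma) {
  p <- ncol(x)
  md2 <- mahalanobis(x, center = mu, cov = sigma)  # squared Mahalanobis distances
  exp(-0.5 * md2) / sqrt((2 * pi)^p * det(sigma))
}

f <- gauss_density(x, mu, sigma)
d <- log(max(f) / f)               # zero at the densest point, larger further out
```

Under this assumed form the dissimilarity is non-negative and grows as the density of a point under the cluster's Gaussian decreases.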

Usage

GPDC(data=NULL,k=2,ini="kmedoids", nr=5,iter=100)

Arguments

data

A matrix or data frame such that rows correspond to observations and columns correspond to variables.

k

A numerical parameter giving the number of clusters.

ini

A parameter that selects the starting centers. Available options are random ("random"), k-medoids ("kmedoids", the default), and PDC ("PDclust").

nr

Number of random starts when ini is set to "random".

iter

Maximum number of iterations.
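
As a sketch of how these arguments fit together, the call below uses random starts on simulated data. The data set sim, the seed, and the values chosen for nr and iter are hypothetical; the call is guarded so that the snippet still runs when FPDclustering is not installed.

```r
# Sketch: GPDC with random starts on simulated data (all settings are hypothetical)
set.seed(123)
sim <- rbind(
  matrix(rnorm(200, mean = 0), ncol = 2),  # 100 points around (0, 0)
  matrix(rnorm(200, mean = 6), ncol = 2)   # 100 points around (6, 6)
)
colnames(sim) <- c("x1", "x2")

if (requireNamespace("FPDclustering", quietly = TRUE)) {
  # nr only matters here because ini = "random"
  fit <- FPDclustering::GPDC(data = sim, k = 2, ini = "random", nr = 10, iter = 200)
  print(table(fit$label))
}
```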

Value

A list of class FPDclustering with the following components:

label

A vector of integers indicating the cluster membership of each unit.

centers

A matrix of cluster means.

sigma

A list of k elements, one variance-covariance matrix per cluster.

probability

A matrix of the probabilities of each point belonging to each cluster.

JDF

The value of the joint distance function (JDF).

iter

The number of iterations.

data

The data set.
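
The components above can be explored with base R. The sketch below uses a small hypothetical membership matrix in place of the probability component and shows one way to derive hard assignments and a per-unit uncertainty from it. It assumes that rows index units, that columns index clusters, and that a hard label corresponds to the largest membership probability; none of this is confirmed here as GPDC's exact labelling rule.

```r
# Hypothetical membership matrix: 3 units (rows), 2 clusters (columns)
prob <- matrix(c(0.9, 0.1,
                 0.2, 0.8,
                 0.4, 0.6),
               ncol = 2, byrow = TRUE)

# Each row should sum to 1 if the entries are membership probabilities
row_sums_ok <- all(abs(rowSums(prob) - 1) < 1e-8)

# Hard assignment: column index of the largest probability in each row
hard_label <- max.col(prob, ties.method = "first")

# Uncertainty: 1 minus the largest membership probability in each row
uncertainty <- 1 - apply(prob, 1, max)
```

Units with a high uncertainty value lie between clusters and may deserve a closer look before the hard labels are interpreted.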

Author(s)

Cristina Tortora and Francesco Palumbo

References

Tortora C., McNicholas P.D., and Palumbo F. A probabilistic distance clustering algorithm using Gaussian and Student-t multivariate density distributions. SN Computer Science, 1:65, 2020.

Rainey C., Tortora C., and Palumbo F. A parametric version of probabilistic distance clustering. In: Greselin F., Deldossi L., Bagnato L., Vichi M. (eds) Statistical Learning of Complex Data. CLADAG 2017. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham, 33-43, 2019. doi.org/10.1007/978-3-030-21140-0_4

See Also

PDC, PDQ

Examples

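# Load the package and the data
library(FPDclustering)
data(ais)
# Select four variables by column position
dataSEL <- ais[, c(10, 3, 5, 8)]

# Clustering
res <- GPDC(dataSEL, k = 2, ini = "kmedoids")

# Results: cross-tabulate cluster labels against sex
table(res$label, ais$sex)
plot(res)
summary(res)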

[Package FPDclustering version 2.3.1 Index]