calcDissimMat {DisimForMixed}    R Documentation

Calculate Dissimilarity Matrix for Mixed Attributes.

Description

Takes two data frames: the first contains only qualitative (categorical) attributes and the second only quantitative (numeric) attributes. The function calculates the dissimilarity matrix using the method proposed by Ahmad & Dey (2007).

Usage

calcDissimMat(myDataQuali, myDataQuant)

Arguments

myDataQuali

A data frame whose columns are all qualitative (categorical) variables. Its rows must describe the same observations, in the same order, as the rows of myDataQuant.

myDataQuant

A data frame whose columns are all quantitative (numeric) variables. Its rows must describe the same observations, in the same order, as the rows of myDataQuali.

Details

calcDissimMat is an implementation of the method proposed by Ahmad & Dey (2007) for calculating a dissimilarity matrix in the presence of both qualitative and quantitative attributes. The approach computes the dissimilarities contributed by the qualitative and by the quantitative attributes separately, then combines them into the final dissimilarity matrix. See Ahmad & Dey (2007) for more details.
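For the qualitative part, Ahmad & Dey (2007) measure the distance between two values x and y of a categorical attribute A_i through their co-occurrence with the values of each other attribute A_j: the maximum over subsets w of A_j's values of P(w | x) + P(not w | y), minus 1. That maximum equals the sum over the values u of A_j of max(P(u | x), P(u | y)), so the distance lies in [0, 1]; the per-attribute distances are then averaged over the other attributes. The sketch below illustrates only this idea. The helpers cooccurDist and valueDist are hypothetical, are not part of DisimForMixed, and are not the package's actual code; for simplicity they average over the other categorical columns only, and the quantitative part and the final combination are not shown.

```r
# Hypothetical helpers sketching the Ahmad & Dey (2007) co-occurrence
# distance between two values of one categorical attribute.
# NOT the DisimForMixed implementation.

# Distance between values x and y of attribute ai, seen through attribute aj:
# sum over the values u of aj of max(P(u | x), P(u | y)), minus 1.
cooccurDist <- function(ai, aj, x, y) {
  ai <- as.character(ai)
  aj <- as.character(aj)
  lev <- unique(aj)
  # Conditional distributions of aj given ai == x and given ai == y
  px <- as.vector(table(factor(aj[ai == x], levels = lev))) / sum(ai == x)
  py <- as.vector(table(factor(aj[ai == y], levels = lev))) / sum(ai == y)
  sum(pmax(px, py)) - 1
}

# Distance between values x and y of column i of df: the average of
# cooccurDist over every other column (all assumed categorical here).
valueDist <- function(df, i, x, y) {
  others <- setdiff(seq_along(df), i)
  mean(vapply(others, function(j) cooccurDist(df[[i]], df[[j]], x, y),
              numeric(1)))
}

QualiVars <- data.frame(Qlvar1 = c("A","B","A","C","C","A"),
                        Qlvar2 = c("Q","Q","R","Q","R","Q"))
valueDist(QualiVars, 1, "A", "B")  # about 0.333 (= 1/3)
valueDist(QualiVars, 1, "B", "C")  # 0.5
```

With the example data, "A" co-occurs with "Q" two times out of three while "B" always co-occurs with "Q", so the two values are judged fairly similar (1/3); "B" and "C" differ more (1/2).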

Value

A dissimilarity matrix. It can be passed, with diss = TRUE, to the pam, fanny, agnes and diana functions of the cluster package.
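The Examples section below demonstrates agnes and pam. The following minimal sketch covers the other two functions named here, fanny (fuzzy clustering) and diana (divisive hierarchical clustering), both from the cluster package. To keep it self-contained, a small symmetric matrix D built with dist() stands in for the output of calcDissimMat; that stand-in is an assumption made only for illustration.

```r
library(cluster)

# Stand-in dissimilarity matrix (assumption for illustration); in practice:
# D <- calcDissimMat(myDataQuali, myDataQuant)
D <- as.matrix(dist(c(1.0, 1.2, 0.8, 5.0, 5.3, 4.9)))

# Fuzzy clustering: each observation gets a membership degree per cluster
fannyFit <- fanny(D, k = 2, diss = TRUE)
fannyFit$membership
fannyFit$clustering   # nearest crisp clustering

# Divisive hierarchical clustering, cut into 2 groups
dianaFit <- diana(D, diss = TRUE)
cutree(as.hclust(dianaFit), k = 2)
```

Note that fanny requires the number of clusters k to be less than half the number of observations.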

References

Ahmad, A., & Dey, L. (2007). A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering, 63(2), 503-527.

Examples

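library(DisimForMixed)

# Six observations described by two qualitative and two quantitative variables
QualiVars <- data.frame(Qlvar1 = c("A","B","A","C","C","A"), Qlvar2 = c("Q","Q","R","Q","R","Q"))
QuantVars <- data.frame(Qnvar1 = c(1.5,3.2,4.9,5,2.8,3.1), Qnvar2 = c(4.8,2,1.1,5.8,3.1,2.2))
DisSimMatCalcd <- calcDissimMat(QualiVars, QuantVars)

# Agglomerative (Ward) clustering on the precomputed dissimilarities
agnesClustering <- cluster::agnes(DisSimMatCalcd, diss = TRUE, method = "ward")
# Cut the tree into 2 clusters and compute the average silhouette width
silWidths <- cluster::silhouette(cutree(as.hclust(agnesClustering), k = 2), DisSimMatCalcd)
mean(silWidths[, 3])
plot(agnesClustering)

# Partitioning around medoids with k = 2, followed by its silhouette plot
PAMClustering <- cluster::pam(DisSimMatCalcd, k = 2, diss = TRUE)
silWidths <- cluster::silhouette(PAMClustering, DisSimMatCalcd)
plot(silWidths)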

[Package DisimForMixed version 0.2 Index]