calcDissimMat {DisimForMixed} | R Documentation
Calculate Dissimilarity Matrix for Mixed Attributes.
Description
Takes two data frames, the first containing only qualitative attributes and the second containing only quantitative attributes, and calculates the dissimilarity matrix using the method proposed by Ahmad & Dey (2007).
Usage
calcDissimMat(myDataQuali, myDataQuant)
Arguments
myDataQuali
A data frame which includes only qualitative variables in columns.
myDataQuant
A data frame which includes only quantitative variables in columns.
Details
calcDissimMat is an implementation of the method proposed by Ahmad & Dey (2007) to calculate the dissimilarity matrix in the presence of both qualitative and quantitative attributes. This approach computes the dissimilarity of the qualitative and the quantitative attributes separately, and the final dissimilarity matrix is formed by combining the two. See Ahmad & Dey (2007) for more details.
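To give a feel for the qualitative part only, the sketch below follows one reading of Ahmad & Dey (2007): two values of a qualitative attribute are considered far apart when they co-occur differently with the values of another attribute. The helper coOccurDist is hypothetical and not part of DisimForMixed; it handles a single pair of columns, ignores the quantitative attributes and the final combination step, and may differ from what calcDissimMat does internally.

```r
# Hypothetical helper (NOT part of DisimForMixed): co-occurrence distance
# between the values of one qualitative column, measured against another
# qualitative column. Assumes both inputs are vectors of equal length.
coOccurDist <- function(target, other) {
  # Row-wise conditional distribution P(other = u | target = x)
  condProb <- prop.table(table(target, other), margin = 1)
  vals <- rownames(condProb)
  d <- matrix(0, length(vals), length(vals), dimnames = list(vals, vals))
  for (x in vals) {
    for (y in vals) {
      # Sum over u of max(P(u | x), P(u | y)), minus 1
      d[x, y] <- sum(pmax(condProb[x, ], condProb[y, ])) - 1
    }
  }
  d
}

Qlvar1 <- c("A", "B", "A", "C", "C", "A")
Qlvar2 <- c("Q", "Q", "R", "Q", "R", "Q")
valueDist <- coOccurDist(Qlvar1, Qlvar2)
valueDist
```

For these two columns, "A" and "B" end up 1/3 apart, "A" and "C" 1/6 apart, and "B" and "C" 1/2 apart. With more than two qualitative attributes, such per-attribute distances would presumably be averaged over the other attributes; that step is left out here.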
Value
A dissimilarity matrix. This can be used as input to the pam, fanny, agnes and diana functions of the cluster package.
References
Ahmad, A., & Dey, L. (2007). A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering, 63(2), 503-527.
Examples
library(DisimForMixed)

# Qualitative and quantitative attributes for the same six observations
QualiVars <- data.frame(Qlvar1 = c("A","B","A","C","C","A"), Qlvar2 = c("Q","Q","R","Q","R","Q"))
QuantVars <- data.frame(Qnvar1 = c(1.5,3.2,4.9,5,2.8,3.1), Qnvar2 = c(4.8,2,1.1,5.8,3.1,2.2))
DisSimMatCalcd <- calcDissimMat(QualiVars, QuantVars)

# Agglomerative clustering on the dissimilarity matrix, cut into two clusters
agnesClustering <- cluster::agnes(DisSimMatCalcd, diss = TRUE, method = "ward")
silWidths <- cluster::silhouette(cutree(agnesClustering, k = 2), DisSimMatCalcd)
mean(silWidths[,3])  # average silhouette width
plot(agnesClustering)

# Partitioning around medoids with two clusters
PAMClustering <- cluster::pam(DisSimMatCalcd, k=2, diss = TRUE)
silWidths <- cluster::silhouette(PAMClustering, DisSimMatCalcd)
plot(silWidths)