
cluster.predict {WCluster}

R Documentation

Predict the closest clusters for a new dataset.

Description

Given observations with weights and known cluster assignments, this function assigns each observation in a new dataset to the closest cluster, i.e., the cluster whose weighted center is nearest in Euclidean distance.

Usage

cluster.predict(x, w = rep(1, nrow(x)), cl, newx)

Arguments

`x`	A data matrix (data frame, data table, matrix, etc.) containing only entries of class numeric.
`w`	Vector of length nrow(x) giving the weight of each observation in x. Must be of class numeric or integer. The default, rep(1, nrow(x)), gives every observation weight 1; passing NULL has the same effect.
`cl`	Vector of length nrow(x) giving the cluster to which each observation in x is assigned. Must be of class integer.
`newx`	The new observations to be assigned, with the same variables as the learning dataset x. Must be of class data.frame.

Details

To obtain cluster assignments for a new dataset, the function first computes the weighted center of each cluster from the observations in x, their weights w, and their known cluster assignments cl. Each new observation is then assigned to the cluster whose weighted center has the smallest Euclidean distance to it. The result is one cluster assignment per row of newx.
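In symbols, writing $x_i$, $w_i$ and $cl_i$ for the i-th row of x, its weight and its cluster label, and $y$ for a row of newx, the procedure above can be written as follows. This notation is ours, and it assumes that "weighted cluster center" means the weighted mean of the cluster's members:

```latex
\bar{x}_k = \frac{\sum_{i:\,cl_i = k} w_i \, x_i}{\sum_{i:\,cl_i = k} w_i},
\qquad
\widehat{cl}(y) = \arg\min_{k} \, \lVert y - \bar{x}_k \rVert_2
```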

Value

Vector of length nrow(newx) containing the cluster assignments for each observation in the new dataset.

Author(s)

Yajie Duan, Javier Cabrera, Ge Cheng

References

Cherasia, K. E., Cabrera, J., Fernholz, L. T., & Fernholz, R. (2022). Data Nuggets in Supervised Learning. In Robust and Multivariate Statistical Methods: Festschrift in Honor of David E. Tyler (pp. 429-449). Cham: Springer International Publishing.

Beavers, T., Cheng, G., Duan, Y., Cabrera, J., Lubomirski, M., Amaratunga, D., & Teigler, J. (2023). Data Nuggets: A Method for Reducing Big Data While Preserving Data Structure. Submitted for publication.

Examples


    library(WCluster)
    library(cluster)

    # The Ruspini data set from the package "cluster"
    data = as.matrix(ruspini)

    # use the first 70 observations for clustering
    # and the last 5 observations for prediction
    x = data[1:70,]
    test.x = data[71:75,]

    # assign random weights to observations (seeded for reproducibility)
    set.seed(1)
    w = sample(1:20, nrow(x), replace = TRUE)

    # k-means clustering with observational weights
    cl = Wkmeans(dataset = x, k = 4, obs.weights = w, num.init = 3)

    # predict the cluster assignments for the test data
    cluster.predict(x, w, cl = cl$`Cluster Assignments`, newx = as.data.frame(test.x))
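To make the procedure in Details concrete without the package, here is a minimal base-R sketch. The function name predict_closest_cluster is hypothetical and is not part of WCluster; it assumes the weighted center is the weighted mean of each cluster's members and that newx has the same columns as x, in the same order. It is an illustration, not the package's implementation.

```r
# Hypothetical helper (NOT part of WCluster): assign each row of newx to the
# cluster whose weighted center (weighted mean of its members) is nearest.
predict_closest_cluster <- function(x, w = rep(1, nrow(x)), cl, newx) {
  x <- as.matrix(x)
  newx <- as.matrix(newx)
  ids <- sort(unique(cl))
  # weighted center of cluster k: sum(w_i * x_i) / sum(w_i) over its members;
  # one row per cluster, one column per variable
  centers <- matrix(vapply(ids, function(k) {
    idx <- cl == k
    colSums(x[idx, , drop = FALSE] * w[idx]) / sum(w[idx])
  }, numeric(ncol(x))), nrow = length(ids), byrow = TRUE)
  # squared Euclidean distance from every new row to every center
  # (the minimiser is the same as for the unsquared distance)
  d <- vapply(seq_along(ids), function(j) {
    rowSums(sweep(newx, 2, centers[j, ])^2)
  }, numeric(nrow(newx)))
  d <- matrix(d, nrow = nrow(newx))
  # column of the smallest distance in each row -> cluster label
  ids[max.col(-d, ties.method = "first")]
}

# Weights move the center of cluster 1 from (5, 0) to (9, 0),
# which changes the assignment of the point (14, 0).
x2 <- rbind(c(0, 0), c(10, 0), c(20, 0))
cl2 <- c(1L, 1L, 2L)
predict_closest_cluster(x2, w = c(1, 9, 1), cl = cl2, newx = data.frame(a = 14, b = 0))  # returns 1
predict_closest_cluster(x2, cl = cl2, newx = data.frame(a = 14, b = 0))                  # returns 2
```

Taking max.col of the negated distance matrix picks the nearest center for each new row; ties go to the lowest cluster label, which may differ from how cluster.predict breaks ties.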

[Package WCluster version 1.2.0 Index]