R: Assign clusters to a new vector of categories

assign.cluster {greenclust}

R Documentation

Assign clusters to a new vector of categories

Description

Maps a named vector of cluster numbers onto a categorical vector, yielding a new vector that gives the matching cluster number for each element. This is useful for distributing cluster numbers back out to the original observations when the clustering was performed on a table of unique levels rather than directly on the observations (as with greenclust).

Usage

assign.cluster(x, clusters, impute = FALSE)

Arguments

`x`	a factor or character vector representing a categorical variable
`clusters`	a named numeric vector of cluster numbers, such as an object returned by `greencut` or `cutree`
`impute`	a logical value controlling the behavior when a value in `x` is not found in `names(clusters)` (see Details)
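
To illustrate the shape expected for `clusters`, the sketch below builds one with base R's `hclust` and `cutree` instead of `greencut`. Clustering the feed types on mean chick weight with `k = 3` is an arbitrary choice made only for this illustration. The point is that the result is a numeric vector whose names are the category levels.

```r
# Illustration only: any named vector of cluster numbers has this shape.
# Mean weight per feed type, named by feed level
feed.means <- tapply(chickwts$weight, chickwts$feed, mean)
# Hierarchical clustering of the six feed types (one row per level)
hc <- hclust(dist(as.matrix(feed.means)))
# cutree() returns cluster numbers named by the clustered levels
clusters <- cutree(hc, k = 3)   # k = 3 is an arbitrary choice
names(clusters)
```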

Details

Any categories in x that do not exist in names(clusters) are given a cluster of NA or, if impute is TRUE, are assigned the cluster number that is most frequently used for the other existing categories, with ties going to the lowest cluster number. If none of the categories in x has a matching cluster, imputation simply uses the first cluster number in clusters.

If there are duplicate names in clusters, the first occurrence takes precedence.
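
The rules above can be sketched in base R as follows. This is a hypothetical re-creation for illustration, not the package's source: the name `assign.cluster.sketch` is invented here, and it assumes that "most frequently used" is counted over the matched elements of x.

```r
# Hypothetical re-creation of the documented rules; not the package source.
assign.cluster.sketch <- function(x, clusters, impute = FALSE) {
  # match() returns the first position of each name, so duplicate names
  # in `clusters` resolve to their first occurrence
  idx <- match(as.character(x), names(clusters))
  out <- unname(clusters[idx])        # NA wherever no name matched
  if (impute && anyNA(out)) {
    if (all(is.na(out))) {
      fill <- unname(clusters[1])     # nothing matched: use first cluster
    } else {
      # Assumption: frequency is counted over matched elements of x.
      # table() drops NA and sorts the cluster numbers in ascending order;
      # which.max() takes the first maximum, so ties go to the lowest number.
      counts <- table(out)
      fill <- as.numeric(names(counts)[which.max(counts)])
    }
    out[is.na(out)] <- fill
  }
  factor(out)
}

clusters <- c(a = 1, b = 2, c = 2, a = 3)   # note the duplicated name "a"
x <- c("a", "b", "c", "c", "z")             # "z" has no cluster

assign.cluster.sketch(x, clusters)                 # 1 2 2 2 <NA>
assign.cluster.sketch(x, clusters, impute = TRUE)  # 1 2 2 2 2
```

In this sketch "a" maps to cluster 1 rather than 3 because its first occurrence wins, and "z" is imputed as cluster 2, the most frequent cluster among the matched elements.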

Value

A factor of the same length as x, representing assigned cluster numbers.
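
Because the result is a factor, recovering the cluster numbers as integers takes one extra step in standard R. The short sketch below uses a hand-made factor as a stand-in for a returned value.

```r
f <- factor(c(2, 5, 5, 2))      # stand-in for a returned factor
as.integer(f)                   # internal level codes: 1 2 2 1
as.integer(as.character(f))     # the cluster numbers: 2 5 5 2
```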

Examples

# Cluster feed types based on number of "underweight" chicks
grc <- greenclust(table(chickwts$feed,
                        ifelse(chickwts$weight < 200, "Y", "N")))
# Assign clusters to each original observation
feed.clustered <- assign.cluster(chickwts$feed, greencut(grc))
table(chickwts$feed, feed.clustered)

[Package greenclust version 1.1.1 Index]