kplus_moment_variables {anticlust} | R Documentation |
Compute k-plus variables
Description
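Compute the k-plus variables that kplus_anticlustering internally appends to the data in order to equalize means, variances, and possibly additional distribution moments between groups.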
Usage
kplus_moment_variables(x, T, standardize = TRUE)
Arguments
x |
A vector, matrix or data.frame of data points. Rows correspond to elements and columns correspond to features. A vector represents a single feature. |
T |
The number of distribution moments for which variables are generated. |
standardize |
Logical, should all columns of the output be standardized (defaults to TRUE). |
Details
The k-plus criterion is an extension of the k-means criterion (i.e., the "variance", see variance_objective). In kplus_anticlustering, equalizing means and variances simultaneously (and possibly additional distribution moments) is accomplished by internally appending new variables to the data input x. When the variance is the only additional criterion, the new variables represent the squared difference of each data point to the mean of the respective column. All columns, the new ones in addition to the original data, are then included in standard k-means anticlustering. The logic is readily extended to higher-order moments, see Papenberg (2024). This function lets users generate k-plus variables themselves, which offers some additional flexibility when conducting k-plus anticlustering.
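The variance-only case described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper name squared_deviation_variables is hypothetical, and using scale() for standardization is an assumption about how standardizing the columns might be done.

```r
# Hypothetical helper, not part of anticlust: append the squared deviations
# from the column means to the original data (the variance-only case).
squared_deviation_variables <- function(x, standardize = TRUE) {
  x <- as.matrix(x)
  centered <- sweep(x, 2, colMeans(x))  # subtract each column's mean
  out <- cbind(x, centered^2)           # original columns, then squared deviations
  if (standardize) out <- scale(out)    # assumption: z-standardize every column
  out
}

x <- cbind(a = c(1, 2, 3, 6), b = c(10, 20, 40, 50))
kp <- squared_deviation_variables(x, standardize = FALSE)
kp[, 3]  # squared deviations of column a from its mean (3): 4 1 0 9
dim(kp)  # 4 rows and 2 * 2 = 4 columns
```

Passing such a matrix to k-means anticlustering makes the group means of the squared-deviation columns part of the objective, which is how similarity in variances enters the criterion.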
Value
A matrix containing all columns of x and all additional columns of k-plus variables. If x has M columns, the output matrix has M * T columns.
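The column count can be checked directly. The following is a small sketch that assumes the anticlust package is installed; the random input matrix and the choice T = 3 are arbitrary illustration values.

```r
# Runs only if anticlust is available; the input data are arbitrary.
if (requireNamespace("anticlust", quietly = TRUE)) {
  x <- matrix(rnorm(40), ncol = 4)  # N = 10 elements, M = 4 features
  out <- anticlust::kplus_moment_variables(x, T = 3)
  dim(out)  # per the description above: 10 rows and M * T = 12 columns
}
```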
Author(s)
Martin Papenberg martin.papenberg@hhu.de
References
Papenberg, M. (2024). K-plus Anticlustering: An Improved k-means Criterion for Maximizing Between-Group Similarity. British Journal of Mathematical and Statistical Psychology, 77(1), 80–102. https://doi.org/10.1111/bmsp.12315
Examples
# Use Schaper data set for example
data(schaper2019)
features <- schaper2019[, 3:6]
K <- 3
N <- nrow(features)
# Some equivalent ways of doing k-plus anticlustering,
# all starting from the same random initial grouping:
init_groups <- sample(rep_len(1:K, N))
table(init_groups)
kplus_groups1 <- anticlustering(
  features,
  K = init_groups,
  objective = "kplus",
  standardize = TRUE,
  method = "local-maximum"
)
kplus_groups2 <- anticlustering(
  kplus_moment_variables(features, T = 2), # standardization included by default
  K = init_groups,
  objective = "variance", # (!)
  method = "local-maximum"
)
# kplus_anticlustering() standardizes by default, unlike anticlustering():
kplus_groups3 <- kplus_anticlustering(
  features,
  K = init_groups,
  method = "local-maximum"
)
all(kplus_groups1 == kplus_groups2)
all(kplus_groups1 == kplus_groups3)
all(kplus_groups2 == kplus_groups3)