skm_mls {skm} | R Documentation |
skm_mls
Description
a selective k-means problem solver - wrapper over skm_mls_cpp
Usage
skm_mls(x, k = 1L, s_colname = "s", t_colname = "t", d_colname = "d",
w_colname = NULL, s_ggrp = integer(0L), s_must = integer(0L),
max_it = 100L, max_at = 100L, auto_create_ggrp = TRUE,
extra_immaculatism = TRUE, extra_at = 10L)
Arguments
x |
data.table with s - t - d(s, t): s<source> - t<target> - d<distance>, where s<source> and t<target> must be character and d<distance> must be numeric. be aware that d<distance> need not be euclidean or any true distance, and need not even be symmetric - d(s, t) can be unequal to d(t, s) - view d as a measure of the cost of assigning one to the other. |
k |
number of centers |
s_colname |
s<source> |
t_colname |
t<target> |
d_colname |
d<distance> - view d as the cost of assigning t to s. a problem with a different fixed cost for using each s as a source can be solved either by modifying the input data or by building the cost into the algorithm - i prefer to modify the data so that the algorithm stays clean and clear - i will show a how-to in the vignette |
w_colname |
w<weighting> - optional: when not NULL, the objective minimized is d * w, i.e., the weighted cost of assigning t to s |
s_ggrp |
s_init will be drawn by stratified sampling from s w.r.t. s_ggrp. |
s_must |
s that must be in the result, of length <= k - 1: conditional optimization. |
max_it |
max number of iterations that can be run when optimizing the result. |
max_at |
max number of attempts/repeats when searching for the optimal result. |
auto_create_ggrp |
boolean indicator of whether to auto-create the group structure using the first letter of s when s_ggrp is integer(0). |
extra_immaculatism |
boolean indicator of whether to make extra runs to improve result consistency when multiple successive k are specified, e.g., k = c(9L, 10L). |
extra_at |
an integer specifying the number of extra runs when argument extra_immaculatism is TRUE. |
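To make the expected shape of x concrete, the sketch below builds a long-format s - t - d table from a small asymmetric cost matrix, using base R only. The matrix values, the source and target names, and the per-target weights are all made up for illustration. Treating w as a per-target weight is an assumption; the page only says that w is a column of x that multiplies d.

```r
# Hypothetical 3 x 4 cost matrix: rows are sources (s), columns are targets (t).
# d need not be symmetric or a true distance, so arbitrary costs are fine.
cost <- matrix(
  c(1, 4, 6, 9,
    5, 2, 3, 7,
    8, 6, 2, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(s = c("a1", "a2", "b1"), t = c("t1", "t2", "t3", "t4"))
)

# Long format: one row per (s, t) combination, with s and t kept as character.
x_long <- as.data.frame(as.table(cost), responseName = "d",
                        stringsAsFactors = FALSE)

# Hypothetical per-target weights (an assumption for this example only).
w_by_target <- c(t1 = 10, t2 = 1, t3 = 1, t4 = 5)
x_long$w <- unname(w_by_target[x_long$t])

# The weighted cost d * w described under w_colname, shown as its own column.
x_long$dw <- x_long$d * x_long$w
```

Since x is documented as a data.table, a table like x_long would presumably be converted with data.table::as.data.table() before being passed to skm_mls; that step is left out here so the sketch runs without extra packages.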
Details
a selective k-means problem is defined as finding a subset of k rows from an m x n matrix such that the sum of the column minima over the selected rows is minimized.
skm_mls takes a data.table (data.frame) as input rather than a matrix: given a data.table of s - t - d(s, t) for all combinations of s and t, it chooses k of s that minimize sum(min(d(s, t) over selected k of s) over t).
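The objective above can be checked on tiny inputs with an exhaustive search. The sketch below is not the algorithm used by skm_mls or skm_mls_cpp; it is a brute-force reference in base R that enumerates every size-k subset of sources. The function name skm_brute_force and the example data are hypothetical.

```r
# Exhaustive reference solver for tiny inputs (NOT the skm_mls algorithm).
# x: data.frame with character columns s, t and numeric column d,
#    one row per (s, t) combination.
skm_brute_force <- function(x, k) {
  # Wide m x n cost matrix: rows are sources, columns are targets.
  cost <- tapply(x$d, list(x$s, x$t), sum)
  # Every size-k subset of sources, one subset per column.
  subsets <- combn(rownames(cost), k)
  # Objective of a subset: sum over targets of the cheapest selected source.
  objective <- apply(subsets, 2, function(sel) {
    sum(apply(cost[sel, , drop = FALSE], 2, min))
  })
  best <- which.min(objective)
  list(s = subsets[, best], o = objective[best])
}

# Made-up example: 3 sources, 4 targets, asymmetric costs.
x <- expand.grid(s = c("a1", "a2", "b1"),
                 t = c("t1", "t2", "t3", "t4"),
                 stringsAsFactors = FALSE)
x$d <- c(1, 5, 8,  4, 2, 6,  6, 3, 2,  9, 7, 1)

res <- skm_brute_force(x, k = 2L)
res$s  # "a1" "b1"
res$o  # 8
```

The number of subsets grows as choose(m, k), so this is only practical as a correctness check on small problems.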
Value
data.table
o - objective - based on d_colname
w - weighting - based on w_colname
k - k<k-list> - based on k - input
s - s<source> - based on s_colname
d - weighted average value of d_colname, weighted by w_colname, when s are selected.