RJclust {RJcluster} | R Documentation |
RJclust
Description
This is a high-dimensional clustering algorithm for data in matrix form. Two penalty methods are available,
depending on the size of the data and the desired accuracy. The first is the default method: the hockey stick penalty. The second is the BIC penalty.
For large n, the scaled method can be used, which uses the approximation method of RJclust. The scaleRJ method
takes a parameter n_bins (usually sqrt(p)) that splits the data into different buckets.
For all methods, a C_max variable is needed that sets an upper limit on the possible
number of clusters.
Usage
RJclust(
data,
penalty = "hockey_stick",
scaleRJ = FALSE,
C_max = 10,
criterion = "VVI",
n_bins = NULL,
seed = 1,
verbose = FALSE
)
Arguments
data |
Data input, must be in matrix form. Currently no support for missing values |
penalty |
A string naming the penalty method. Options include: "bic" and "hockey_stick" (default = "hockey_stick") |
scaleRJ |
Should the scaled version of RJ be used? Suggested for data where n > 1000 (default = FALSE) |
C_max |
Maximum number of clusters to look for (default is 10) |
criterion |
Model of covariance structure (default = "VVI") |
n_bins |
Number of cuts for the scaled RJ algorithm, used if scaleRJ = TRUE (default = sqrt(p)) |
seed |
Seed (default = 1) |
verbose |
Should progress be printed? (default = FALSE) |
Details
All implementations use a C++ backend to reduce runtime.
criterion controls the type of covariance structure. See the Mclust documentation for more information. Note that criterion "kmeans" is the same as "EEI". Using "kmeans" is not recommended if the classes are suspected to be imbalanced.
Value
Returns the RJ algorithm result for "aic", "bic" ("mclust" and "scale" will return an mclust object):
K | number of clusters found |
class | Class labels |
penalty | Penalty values at each iteration |
mean | Mean matrix |
prob | Probability values |
z | Z values from mclust (NULL if penalty = "full_covariance") |
Examples
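# Simulate high-dimensional data and extract the data matrix
X = simulate_HD_data()
X = X$X
# Cluster with the default hockey stick penalty, searching up to 10 clusters
clust = RJclust(X, penalty = "hockey_stick", C_max = 10)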
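
A further sketch, not taken from the package's own examples, of how the BIC penalty might be selected and the result inspected. It assumes the RJcluster package is installed and that the returned object exposes the K and class fields under the names listed in the Value section. The requireNamespace() guard and the variable name clust_bic are illustrative choices, not part of the package.

```r
# Hedged sketch: run RJclust() with the BIC penalty and inspect the result.
# The guard lets the script run without error when RJcluster is absent.
if (requireNamespace("RJcluster", quietly = TRUE)) {
  library(RJcluster)

  # Same simulated data as in the example above
  X <- simulate_HD_data()$X

  # Use the BIC penalty instead of the default hockey stick penalty
  clust_bic <- RJclust(X, penalty = "bic", C_max = 10)

  # Fields accessed here are assumed from the Value section
  print(clust_bic$K)            # number of clusters found
  print(table(clust_bic$class)) # cluster sizes from the class labels
}
```

Comparing clust_bic$K with the K from the hockey stick run is one simple way to see whether the two penalties agree on the number of clusters for a given data set.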