RJclust {RJcluster}    R Documentation

RJclust

Description

This is a high-dimensional clustering algorithm for data in matrix form. Two penalty methods are available, depending on the size of the data and the desired accuracy: the hockey stick penalty (the default) and the BIC penalty. For large n, the scaled method (scaleRJ) can be used, which uses the approximation method of RJclust. The scaled method requires a parameter n_bins (usually sqrt(p)) that splits the data into different buckets. All methods require C_max, an upper limit on the possible number of clusters.

Usage

RJclust(
  data,
  penalty = "hockey_stick",
  scaleRJ = FALSE,
  C_max = 10,
  criterion = "VVI",
  n_bins = NULL,
  seed = 1,
  verbose = FALSE
)

Arguments

data

Data input, must be in matrix form. Currently no support for missing values

penalty

A string naming the penalty. Options are "bic" and "hockey_stick" (default = "hockey_stick")

scaleRJ

Should the scaled version of RJ be used? Suggested for data where n > 1000 (default = FALSE)

C_max

Maximum number of clusters to look for (default is 10)

criterion

Model of covariance structure (default = "VVI")

n_bins

Number of cuts for the scaled RJ algorithm, used if scaleRJ = TRUE (default = sqrt(p))

seed

Seed (default = 1)

verbose

Should progress be printed? (default = FALSE)

Details

All implementations use a C++ backend to reduce runtime.

criterion controls the type of covariance structure. See the Mclust documentation for more information. Note that criterion "kmeans" is the same as "EEI". Using "kmeans" is not recommended if the classes are suspected to be imbalanced.

Value

Returns the RJ algorithm result for "aic" and "bic" ("mclust" and "scale" will return an mclust object):

K        Number of clusters found
class    Class labels
penalty  Penalty values at each iteration
mean     Mean matrix
prob     Probability values
z        Z values from mclust (NULL if penalty = "full_covariance")
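
The fields listed above can be read with ordinary list access. The following is a minimal sketch, assuming the returned object behaves like a named list with the documented K and class fields. The object fit is a hand-built stand-in used only for illustration, not real RJclust output, and the names cluster_sizes and members_2 are hypothetical.

```r
# Hand-built stand-in with the documented field names (not real RJclust output)
fit <- list(K = 2, class = c(1, 1, 2, 2, 2))

# Number of clusters found
fit$K

# Cluster sizes, tabulated from the class labels
cluster_sizes <- table(fit$class)

# Indices of the observations assigned to cluster 2
members_2 <- which(fit$class == 2)
```

Tabulating the class labels this way is also a quick check for imbalanced clusters, which is the case where the "kmeans" criterion is discouraged.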

Examples

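# Simulate high-dimensional data and keep the data matrix
X <- simulate_HD_data()
X <- X$X
# Cluster with the default hockey stick penalty, searching up to 10 clusters
clust <- RJclust(X, penalty = "hockey_stick", C_max = 10)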
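
A further sketch, using only the arguments documented above, shows the BIC penalty on the same simulated data. It assumes the RJcluster package is installed and skips the call otherwise. The names fit_bic and k_bic are hypothetical.

```r
# Run only if the RJcluster package is available
if (requireNamespace("RJcluster", quietly = TRUE)) {
  library(RJcluster)

  # Same simulated data as in the example above
  X <- simulate_HD_data()$X

  # Use the BIC penalty instead of the default hockey stick penalty
  fit_bic <- RJclust(X, penalty = "bic", C_max = 10, seed = 1)

  # Number of clusters selected under the BIC penalty
  k_bic <- fit_bic$K
  print(k_bic)
}
```

Fixing seed keeps repeated runs comparable when switching between penalties.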

[Package RJcluster version 3.2.4 Index]