idaDivCluster {ibmdbR} | R Documentation |
Hierarchical (divisive) clustering
Description
This function generates a hierarchical (divisive) clustering model
based on the contents of an IDA data frame (ida.data.frame)
by recursively applying the K-means algorithm.
Usage
idaDivCluster(
data,
id,
distance="euclidean",
maxiter=5,
minsplit=5,
maxdepth=3,
randseed=12345,
outtable=NULL,
modelname=NULL
)
## S3 method for class 'idaDivCluster'
print(x,...)
## S3 method for class 'idaDivCluster'
predict(object, newdata, id,...)
Arguments
data |
An IDA data frame that contains the input data for the function. The input IDA data frame must include a column that contains a unique ID for each row. |
id |
The name of the column that contains a unique ID for each row of the input data. |
distance |
The distance function that is to be used. This can be set to |
maxiter |
The maximum number of iterations to perform in the base K-means clustering algorithm. |
minsplit |
The minimum number of instances per cluster that can be split. |
maxdepth |
The maximum number of cluster levels (including leaves). |
randseed |
The seed for the random number generator. |
outtable |
The name of the output table that is to contain the results of the operation. When NULL is specified, a table name is generated automatically. |
modelname |
The name under which the model is stored in the database.
This is the name that is specified when using functions such as |
object |
An object of the class |
x |
An object of the class |
newdata |
An IDA data frame that contains the data to which to apply the model. |
... |
Additional parameters to pass to the print or predict method. |
Details
The idaDivCluster clustering function builds a hierarchical clustering model by applying the K-means algorithm recursively in a top-down fashion. The hierarchy of clusters is represented as a binary tree (each parent node has exactly two child nodes). The leaves of the cluster tree are identified by negative numbers.
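The top-down procedure can be illustrated locally with base R. The following is only a minimal sketch, assuming an in-memory numeric matrix and the kmeans function from the stats package; it is not the in-database implementation used by idaDivCluster. The helper names divisive_kmeans, count_leaves and leaf_sizes are hypothetical, and the way maxdepth and minsplit are applied here is one plausible reading of the argument descriptions.

```r
# Local illustration only: recursive 2-means on an in-memory matrix.
# This does not call ibmdbR and does not reproduce its node numbering.
divisive_kmeans <- function(x, maxiter = 5, minsplit = 5, maxdepth = 3, depth = 1) {
  node <- list(leaf = TRUE, size = nrow(x), center = colMeans(x))
  # Assumed stopping rules: depth limit reached, too few rows to split,
  # or fewer than two distinct rows (kmeans cannot form two clusters).
  if (depth >= maxdepth || nrow(x) < minsplit || nrow(unique(x)) < 2) {
    return(node)
  }
  # Split the current cluster into exactly two children.
  km <- suppressWarnings(kmeans(x, centers = 2, iter.max = maxiter))
  node$leaf <- FALSE
  node$left <- divisive_kmeans(x[km$cluster == 1, , drop = FALSE],
                               maxiter, minsplit, maxdepth, depth + 1)
  node$right <- divisive_kmeans(x[km$cluster == 2, , drop = FALSE],
                                maxiter, minsplit, maxdepth, depth + 1)
  node
}

# Number of leaf clusters in the tree.
count_leaves <- function(node) {
  if (node$leaf) 1L else count_leaves(node$left) + count_leaves(node$right)
}

# Sizes of the leaf clusters, left to right.
leaf_sizes <- function(node) {
  if (node$leaf) node$size else c(leaf_sizes(node$left), leaf_sizes(node$right))
}

set.seed(12345)
tree <- divisive_kmeans(as.matrix(iris[, 1:4]))
count_leaves(tree)  # at most 4 leaves when maxdepth = 3
leaf_sizes(tree)    # the leaf sizes add up to nrow(iris)
```

Because every split produces exactly two children, the result is a binary tree, and the leaves form a partition of the input rows.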
Models are stored persistently in the database under the name modelname. Model names cannot have more than 64 characters and
cannot contain white spaces. They need to be quoted like table names; otherwise they are treated as uppercase by default. Only one
model with a given name is allowed in the database at a time. If a model named modelname
already exists, you need to drop it with idaDropModel
before you can create another one with the same name. The model name can be used to retrieve the model later (idaRetrieveModel).
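The drop-then-create pattern implied by the one-model-per-name rule might be wrapped as follows. This is a sketch under assumptions: rebuild_div_cluster is a hypothetical helper, a database connection is assumed to have been initialised already, and the tryCatch guards against the case where dropping a non-existent model raises an error (whether it does is not stated here).

```r
# Hypothetical helper: replace a stored model that has the same name.
# Defining the function needs no database; calling it requires ibmdbR
# and an initialised connection.
rebuild_div_cluster <- function(idf, id, modelname, ...) {
  # Drop any existing model with this name; ignore a failure, which is
  # assumed to mean that no such model was stored.
  tryCatch(ibmdbR::idaDropModel(modelname), error = function(e) invisible(NULL))
  # Create the model again under the same name.
  ibmdbR::idaDivCluster(idf, id = id, modelname = modelname, ...)
}

## Not run:
# idf <- ida.data.frame("IRIS")
# dcm <- rebuild_div_cluster(idf, id = "ID", modelname = "DIV_CLUSTER_DEMO")
# idaListModels()                             # list the stored models
# dcm2 <- idaRetrieveModel("DIV_CLUSTER_DEMO")  # fetch the model by name
## End(Not run)
```

The model name DIV_CLUSTER_DEMO is an illustrative choice written in uppercase without spaces, so that it reads the same whether or not it is quoted.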
The output of the print function for an idaDivCluster object is:
A vector containing a list of centers
A vector containing a list of cluster sizes
A vector containing a list of the number of elements in each cluster
A data frame or the name of the table containing the calculated cluster assignments
The within-cluster sum of squares (which indicates cluster density)
The names of the slots that are available in the idaDivCluster object.
Value
The idaDivCluster function returns an object of class idaDivCluster.
See Also
idaRetrieveModel, idaDropModel, idaListModels
Examples
## Not run:
#Create an IDA data frame
idf <- ida.data.frame("IRIS")
#Create a DivCluster model stored in the database as DivClusterMODEL
dcm <- idaDivCluster(idf, id="ID", modelname="DivClusterMODEL")
#Print the model
print(dcm)
#Apply the model to the data to obtain cluster assignments
pred <- predict(dcm, idf, id="ID")
#Inspect the predictions
head(pred)
## End(Not run)