idaKMeans {ibmdbR}    R Documentation
k-means clustering
Description
This function generates a k-means clustering model based on the contents of an IDA data frame (ida.data.frame).
Usage
idaKMeans(
data,
id,
k=3,
maxiter=5,
distance="euclidean",
outtable=NULL,
randseed=12345,
statistics=NULL,
modelname=NULL
)
## S3 method for class 'idaKMeans'
print(x,...)
## S3 method for class 'idaKMeans'
predict(object, newdata, id,...)
Arguments
data
An IDA data frame that contains the input data for the function. The input IDA data frame must include a column that contains a unique ID for each row.
id
The name of the column that contains a unique ID for each row of the input data.
k
The number of clusters to be calculated.
maxiter
The maximum number of iterations used to calculate the k-means clusters. Allowing more iterations can increase the precision of the results, but it also increases the time required to calculate them.
distance
The distance function that is to be used. This can be set to
outtable
The name of the output table that is to contain the results of the operation. When NULL is specified, a table name is generated automatically.
randseed
The seed for the random number generator.
statistics
Denotes which statistics to calculate. Allowed values are
modelname
The name under which the model is stored in the database. This is the name that is specified when using functions such as idaRetrieveModel or idaDropModel.
object
An object of the class idaKMeans.
x
An object of the class idaKMeans.
newdata
An IDA data frame that contains the data to which the model is to be applied.
...
Additional parameters to pass to the print or predict method.
Details
The idaKMeans function calculates the squared Euclidean distance between rows and groups the rows into clusters. The initial clusters are chosen randomly, based on the seed given by randseed, and the results are then adjusted iteratively until either the maximum number of iterations (maxiter) is reached or two successive iterations return identical results. Missing values of a variable are set to zero for the distance calculation.
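To make the procedure above concrete, here is a small in-memory sketch in base R. It is an illustration under stated assumptions, not the code that ibmdbR runs in the database: it uses a plain Lloyd-style loop, takes k random rows as the initial centers, and the function name kmeans_sketch is hypothetical. Like the description, it stops at maxiter or as soon as two consecutive iterations produce the same cluster assignments, and it replaces missing values with zero before computing distances.

```r
# Hypothetical base-R sketch of the k-means loop described above
# (an illustration only, not ibmdbR's in-database implementation).
kmeans_sketch <- function(x, k = 3, maxiter = 5, randseed = 12345) {
  x <- as.matrix(x)
  x[is.na(x)] <- 0                                  # missing values count as zero
  set.seed(randseed)                                # reproducible random start
  centers <- x[sample(nrow(x), k), , drop = FALSE]  # k random rows as initial centers
  cl <- integer(0)
  for (iter in seq_len(maxiter)) {
    # Squared Euclidean distance from every row to every center (n x k matrix)
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    d <- matrix(d, nrow = nrow(x))
    new_cl <- max.col(-d, ties.method = "first")    # index of the nearest center
    if (identical(new_cl, cl)) break                # two iterations agree: stop
    cl <- new_cl
    # Move each center to the mean of its members (an empty cluster keeps its center)
    for (j in seq_len(k)) {
      members <- x[cl == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  # Within-cluster sum of squared distances to the final centers
  withinss <- vapply(seq_len(k), function(j)
    sum(sweep(x[cl == j, , drop = FALSE], 2, centers[j, ])^2), numeric(1))
  list(cluster = cl, centers = centers, iterations = iter, withinss = withinss)
}

res <- kmeans_sketch(iris[, 1:4], k = 3, maxiter = 10)
table(res$cluster)   # number of rows in each cluster
res$withinss         # within-cluster sum of squares per cluster
```

Because the initial centers are random, a different randseed can lead to a different (and possibly worse) local optimum; fixing the seed is what makes repeated runs reproducible.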
Models are stored persistently in the database under the name modelname. Model names cannot have more than 64 characters and cannot contain white space. They must be quoted like table names; otherwise they are converted to upper case by default. Only one model with a given name can exist in the database at a time. If a model with the name modelname already exists, you must drop it with idaDropModel before you can create another model with the same name. The model name can be used to retrieve the model later (idaRetrieveModel).
The output of the print function for an idaKMeans object is:
A vector containing a list of centers
A vector containing a list of cluster sizes
A vector containing a list of the number of elements in each cluster
A data frame or the name of the table containing the calculated cluster assignments
The within-cluster sum of squares (which indicates cluster density)
The names of the slots that are available in the idaKMeans object
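Because an idaKMeans object also has class kmeans, its contents are presumably comparable to those of a model fitted locally with base R's stats::kmeans. The sketch below fits such a local model on R's built-in iris data only to show where the centers, cluster sizes, cluster assignments and within-cluster sums of squares live in a kmeans object. It needs no database, and the component names of an actual idaKMeans object may differ, so check them with names() on your own model.

```r
# Local, in-memory illustration with base R (no database connection needed).
set.seed(12345)                                  # same default seed as idaKMeans
fit <- stats::kmeans(iris[, 1:4], centers = 3)   # k = 3, as in idaKMeans

fit$centers        # cluster centers: one row per cluster
fit$size           # number of rows assigned to each cluster
head(fit$cluster)  # cluster assignment of each input row
fit$withinss       # within-cluster sum of squares per cluster
names(fit)         # components available in the object
```

Note that stats::kmeans uses the Hartigan-Wong algorithm by default, so its clusters need not match those computed in the database even for the same k and seed.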
Value
The idaKMeans function returns an object of class idaKMeans and kmeans.
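Carrying both classes means that S3 dispatch tries the idaKMeans method first and can fall back to a kmeans method for anything that is not specialized. The sketch below shows this with a hypothetical generic describe() and an empty stand-in object rather than a real model; the class order c("idaKMeans", "kmeans") is an assumption.

```r
# Stand-in object carrying both classes (order assumed; not a real model).
obj <- structure(list(), class = c("idaKMeans", "kmeans"))

# Hypothetical generic with one method per class.
describe <- function(x, ...) UseMethod("describe")
describe.kmeans <- function(x, ...) "kmeans method"
describe.idaKMeans <- function(x, ...) {
  # Handle the idaKMeans-specific part, then delegate to the kmeans method
  paste("idaKMeans method, then:", NextMethod())
}

inherits(obj, "kmeans")   # TRUE: code written for kmeans objects accepts it
describe(obj)             # "idaKMeans method, then: kmeans method"
```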
See Also
idaRetrieveModel, idaDropModel, idaListModels
Examples
## Not run:
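library(ibmdbR)
# Assumes a database connection has already been initialized with idaInit()
# Create an IDA data frame that points to the IRIS table
idf <- ida.data.frame("IRIS")
# Create a k-means model stored in the database as KMEANSMODEL
km <- idaKMeans(idf, id="ID", modelname="KMEANSMODEL")
# Print the model
print(km)
# Apply the model to the data to obtain cluster assignments
pred <- predict(km, idf, id="ID")
# Inspect the predictions
head(pred)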
## End(Not run)