getdata {miclust}    R Documentation

Creates a midata object.

Description

Creates an object of class midata to be clustered by the function miclust.

Usage

getdata(data)

Arguments

data

a list or data.frame object. If it is a data frame, it is assumed to contain just the raw data, with or without missing values. If it is a list of data frames, the first element is assumed to contain the raw data and the remaining elements the multiple imputed data sets. Since all variables are used in the clustering procedure, identifier variables must not be present in the data. In addition, all variables must be numeric (i.e. categorical variables must be coded with numeric values). See Details below.
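As an illustration of the expected input, the following base R sketch builds a list whose first element is the raw data and whose remaining elements are filled-in copies. The data, the helper names prep and fill, and the mean and median fills are all hypothetical. The fills are placeholders that stand in for properly imputed data sets and are not a multiple imputation method.

```r
# Hypothetical raw data: an identifier, two numeric variables and one factor.
raw <- data.frame(
  id  = 1:6,
  age = c(34, 51, NA, 47, 29, 62),
  bmi = c(22.1, NA, 27.5, 30.2, 24.8, 26.0),
  sex = factor(c("f", "m", "m", "f", "f", "m"))
)

# Drop the identifier and code the factor with numeric values (f = 0, m = 1).
prep <- function(d) {
  d$id <- NULL
  d$sex <- as.numeric(d$sex) - 1
  d
}

# Placeholder fill: replace missing values with a summary of the observed ones.
# This only stands in for real imputed data sets.
fill <- function(d, f) {
  d[] <- lapply(d, function(x) {
    x[is.na(x)] <- f(x, na.rm = TRUE)
    x
  })
  d
}

rawdata <- prep(raw)

# First element: raw data; remaining elements: completed data sets.
input <- list(rawdata, fill(rawdata, mean), fill(rawdata, median))

# With the miclust package loaded, this list could then be passed on:
# getdata(input)
```

Every element of input is a data frame of numeric columns with no identifier, which is the shape described above.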

Details

getdata standardizes all variables in the imputed data sets (returned in impdata), which is why categorical variables need to be coded with numeric values. Each variable is centered at its mean and then divided by its standard deviation or, for binary variables, by the difference between its maximum and minimum values. This standardization is applied only to the imputed data sets. The raw data are standardized internally by miclust when needed, that is, when only the raw data are analyzed (complete case analysis).
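The standardization rule described here can be sketched in base R as follows. This is only an illustration of the rule as stated, not the code used by getdata. The function names standardize and standardize_df are hypothetical, and treating a variable as binary when it has exactly two distinct values is an assumption of this sketch.

```r
# Center at the mean, then divide by the standard deviation, or by the
# range (max - min) when the variable is binary.
# Assumption: "binary" means exactly two distinct values.
standardize <- function(x) {
  centered <- x - mean(x)
  is_binary <- length(unique(x)) == 2
  scale_by <- if (is_binary) max(x) - min(x) else sd(x)
  centered / scale_by
}

# Apply the rule to every column of a complete (imputed) data frame.
standardize_df <- function(d) as.data.frame(lapply(d, standardize))

# Hypothetical imputed data set without missing values.
imp <- data.frame(
  age = c(34, 51, 45, 47, 29, 62),
  sex = c(0, 1, 1, 0, 0, 1)
)

z <- standardize_df(imp)
```

In this sketch, a continuous variable such as age ends up with mean 0 and standard deviation 1, while a 0/1 variable with equal group sizes such as sex is mapped to -0.5 and 0.5.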

Value

An object of classes c("list", "midata") including the following items:

rawdata

a data frame containing the raw data.

impdata

if data is an object of class list, impdata is a list containing the standardized imputed data sets.

See Also

miclust.

Examples

### data minhanes:
data(minhanes)
class(minhanes)

### number of imputed datasets:
length(minhanes) - 1

### raw data with missing values:
summary(minhanes[[1]])

### first imputed data set:
minhanes[[2]]
summary(minhanes[[2]])

### data preparation for a complete case cluster analysis:
data1 <- getdata(minhanes[[1]])
class(data1)
names(data1)

### there are no imputed data sets:
data1$impdata

### data preparation for a multiple imputation cluster analysis:
data2 <- getdata(minhanes)
class(data2)
names(data2)

### number of imputed data sets:
length(data2$impdata)

### imputed data sets are standardized:
summary(data2$rawdata)
summary(data2$impdata[[1]])

[Package miclust version 1.2.8 Index]