ClustImpute {ClustImpute}    R Documentation

K-means clustering with built-in missing data imputation

Description

Clustering algorithm that imputes missing values on the go, as part of the clustering iterations. The (local) imputation distribution is defined by the cluster an observation is currently assigned to. The first draw is a random imputation.
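The idea described above can be illustrated with a short base R sketch. It is a simplified illustration under stated assumptions, not the package's implementation: the function name sketch_clust_impute is hypothetical, the weight function is omitted, stats::kmeans stands in for the clustering step, imputed values are redrawn from the observed values of the same column within the current cluster, and every column is assumed to contain at least one observed value.

```r
# Simplified illustration only; NOT the ClustImpute implementation.
sketch_clust_impute <- function(X, nr_cluster, nr_iter = 5, seed_nr = 1) {
  set.seed(seed_nr)
  X <- as.matrix(X)
  miss <- is.na(X)  # remember which entries were missing
  # Helper: draw k values with replacement from a vector of donor values
  draw <- function(donors, k) donors[sample.int(length(donors), k, replace = TRUE)]

  # First draw: random imputation from the observed values of each column
  for (j in seq_len(ncol(X))) {
    if (any(miss[, j])) X[miss[, j], j] <- draw(X[!miss[, j], j], sum(miss[, j]))
  }

  for (it in seq_len(nr_iter)) {
    # Clustering step on the currently completed data
    cl <- stats::kmeans(X, centers = nr_cluster)$cluster
    # Imputation step: redraw missing entries from observed values
    # of the same column within the currently assigned cluster
    for (j in seq_len(ncol(X))) for (k in seq_len(nr_cluster)) {
      target <- miss[, j] & cl == k
      donors <- X[!miss[, j] & cl == k, j]
      if (any(target) && length(donors) > 0) X[target, j] <- draw(donors, sum(target))
    }
  }

  # Final cluster assignment on the last completed data set
  cl <- stats::kmeans(X, centers = nr_cluster)$cluster
  list(complete_data = as.data.frame(X), clusters = cl)
}
```

Observed entries are never modified in this sketch; only the entries that were NA in the input are redrawn in each iteration.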

Usage

ClustImpute(
  X,
  nr_cluster,
  nr_iter = 10,
  c_steps = 1,
  wf = default_wf,
  n_end = 10,
  seed_nr = 150519,
  assign_with_wf = TRUE,
  shrink_towards_global_mean = TRUE
)

Arguments

X

Data frame with only numeric values or NAs

nr_cluster

Number of clusters

nr_iter

Number of iterations of the procedure

c_steps

Number of clustering steps per iteration

wf

Weight function. By default it is linear up to n_end. Used to shrink X towards zero or towards the global mean (default); see shrink_towards_global_mean.

n_end

Number of steps until the weight function reaches 1

seed_nr

Seed passed to set.seed()

assign_with_wf

Default is TRUE. If set to FALSE, the weight function is applied only in the centroid computation and is ignored in the cluster assignment.

shrink_towards_global_mean

Default is TRUE. The weight matrix w is applied to the difference of X from the global mean m, i.e., (x-m)*w+m
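To make the roles of wf, n_end and shrink_towards_global_mean more concrete, the following sketch applies the formula (x-m)*w+m to a single value. The names linear_wf and shrink are hypothetical helpers written for this illustration; linear_wf is only assumed to behave like a weight that grows linearly and reaches 1 at n_end, and is not necessarily identical to default_wf.

```r
# Hypothetical linear weight: grows with the step n and is capped at 1 from n_end on
linear_wf <- function(n, n_end = 10) min(n / n_end, 1)

# Shrinkage of a value x towards m with weight w: (x - m) * w + m
# m = global mean corresponds to shrink_towards_global_mean = TRUE,
# m = 0 corresponds to shrinking towards zero
shrink <- function(x, m, w) (x - m) * w + m

x <- 5  # a value far from the global mean
m <- 1  # global mean of the variable

# Shrunk value at steps 1, 5, 10 and 15
shrunk <- sapply(c(1, 5, 10, 15), function(n) shrink(x, m, linear_wf(n)))
shrunk
# [1] 1.4 3.0 5.0 5.0

# Shrinking towards zero instead of the global mean
shrunk_zero <- sapply(c(1, 5, 10, 15), function(n) shrink(x, 0, linear_wf(n)))
```

With a small weight the value is pulled almost entirely to m; once the weight reaches 1 the value is left unchanged.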

Value

complete_data

Completed data without NAs

clusters

For each row of complete_data, the associated cluster

centroids

For each cluster, the coordinates of the centroids in tidy format

centroids_matrix

For each cluster, the coordinates of the centroids in matrix format

imp_values_mean

Mean of the imputed variables per draw

imp_values_sd

Standard deviation of the imputed variables per draw

Examples

# Random Dataset
set.seed(739)
n <- 750 # number of points
nr_other_vars <- 2
mat <- matrix(rnorm(nr_other_vars*n),n,nr_other_vars)
me <- 4 # controls the separation of the cluster means
x <- c(rnorm(n/3,me/2,1),rnorm(2*n/3,-me/2,1))
y <- c(rnorm(n/3,0,1),rnorm(n/3,me,1),rnorm(n/3,-me,1))
dat <- cbind(mat,x,y)
dat <- as.data.frame(scale(dat)) # scaling

# Create NAs
dat_with_miss <- miss_sim(dat,p=.1,seed_nr=120)

# Run ClustImpute
res <- ClustImpute(dat_with_miss,nr_cluster=3)

# Plot complete data set and cluster assignment
ggplot2::ggplot(res$complete_data,ggplot2::aes(x,y,color=factor(res$clusters))) +
ggplot2::geom_point()

# View centroids
res$centroids


[Package ClustImpute version 0.2.4 Index]