ClustImpute {ClustImpute} | R Documentation |
K-means clustering with built-in missing data imputation
Description
Clustering algorithm that imputes missing values on the go. The (local) imputation distribution is defined by the currently assigned cluster. The first draw is by random imputation.
Usage
ClustImpute(
X,
nr_cluster,
nr_iter = 10,
c_steps = 1,
wf = default_wf,
n_end = 10,
seed_nr = 150519,
assign_with_wf = TRUE,
shrink_towards_global_mean = TRUE
)
Arguments
X |
Data frame with only numeric values or NAs |
nr_cluster |
Number of clusters |
nr_iter |
Iterations of procedure |
c_steps |
Number of clustering steps per iteration |
wf |
Weight function. Linear up to n_end by default. Used to shrink X towards zero or the global mean (default). See shrink_towards_global_mean |
n_end |
Steps until convergence of weight function to 1 |
seed_nr |
Number for set.seed() |
assign_with_wf |
Default is TRUE. If set to FALSE, the weight function is only applied in the centroid computation and is ignored in the cluster assignment. |
shrink_towards_global_mean |
By default TRUE. The weight matrix w is applied to the difference of X from the global mean m, i.e., (x-m)*w+m. If FALSE, X is shrunk towards zero instead. |
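The shrinkage described for wf, n_end and shrink_towards_global_mean can be sketched in base R. This is only an illustration under stated assumptions: linear_wf and shrink are hypothetical helpers written for this example, not functions of the package, and the actual definition of default_wf may differ from the linear form assumed here.

```r
# Hypothetical linear weight function (an assumption, not the package's default_wf):
# the weight grows linearly with the iteration number n and is capped at 1 from n_end on.
linear_wf <- function(n, n_end = 10) pmin(n / n_end, 1)

# Hypothetical helper applying the documented formula (x - m) * w + m.
# With m = 0 this shrinks towards zero; with m = global mean it shrinks towards the mean.
shrink <- function(x, w, m = 0) (x - m) * w + m

x <- c(2, 4, 9)    # toy column
m <- mean(x)       # global mean of the toy column, here 5
w <- linear_wf(5)  # iteration 5 of n_end = 10 gives weight 0.5

shrink(x, w, m)          # shrunk towards the global mean: 3.5 4.5 7.0
shrink(x, w)             # shrunk towards zero: 1.0 2.0 4.5
linear_wf(c(1, 10, 15))  # weights 0.1 1.0 1.0, i.e. no shrinkage from n_end on
```

With a weight of 1 the formula returns x unchanged, which matches the description of n_end as the number of steps until the weight function converges to 1.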
Value
- complete_data
Completed data without NAs
- clusters
For each row of complete_data, the associated cluster
- centroids
For each cluster, the coordinates of the centroids in tidy format
- centroids_matrix
For each cluster, the coordinates of the centroids in matrix format
- imp_values_mean
Mean of the imputed variables per draw
- imp_values_sd
Standard deviation of the imputed variables per draw
Examples
# Random Dataset
set.seed(739)
n <- 750 # number of points
nr_other_vars <- 2
mat <- matrix(rnorm(nr_other_vars * n), n, nr_other_vars)
me <- 4 # mean
x <- c(rnorm(n / 3, me / 2, 1), rnorm(2 * n / 3, -me / 2, 1))
y <- c(rnorm(n / 3, 0, 1), rnorm(n / 3, me, 1), rnorm(n / 3, -me, 1))
dat <- cbind(mat, x, y)
dat <- as.data.frame(scale(dat)) # scaling
# Create NAs
dat_with_miss <- miss_sim(dat,p=.1,seed_nr=120)
# Run ClustImpute
res <- ClustImpute(dat_with_miss,nr_cluster=3)
# Plot complete data set and cluster assignment
ggplot2::ggplot(res$complete_data,ggplot2::aes(x,y,color=factor(res$clusters))) +
ggplot2::geom_point()
# View centroids
res$centroids
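The returned components listed under Value can be checked on a small data set. The following is a minimal sketch, assuming the ClustImpute package is installed. The toy data frame and the name res_toy are made up for this illustration, and the missing values are inserted by hand rather than with miss_sim.

```r
# Assumes the ClustImpute package is installed; the toy data below is made up.
library(ClustImpute)

set.seed(42)
toy <- data.frame(
  a = c(rnorm(50, -2), rnorm(50, 2)),
  b = c(rnorm(50, -2), rnorm(50, 2))
)
toy$a[sample(nrow(toy), 10)] <- NA # remove 10 values per column
toy$b[sample(nrow(toy), 10)] <- NA

res_toy <- ClustImpute(toy, nr_cluster = 2)

anyNA(res_toy$complete_data) # per the Value section, no NAs should remain
table(res_toy$clusters)      # cluster sizes, one label per row of complete_data
res_toy$centroids_matrix     # centroid coordinates in matrix format
```

The exact cluster sizes and centroid values depend on the random draws, so none are shown here.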