impute_missing {diceR} | R Documentation
Impute missing values
Description
Impute missing values from bootstrapped subsampling
Usage
impute_missing(E, data, nk)
Arguments
E | 4D array of clusterings from consensus_cluster
data | data matrix with samples as rows and genes/features as columns
nk | cluster size to extract data for (single value)
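As a rough illustration of how a single nk value selects part of a 4D array, the sketch below builds a toy array in base R. The dimension order (samples, replicates, algorithms, cluster sizes), the dimnames, and the object names E_toy and E_k4 are assumptions made for this example; it does not call diceR and is not the package's internal code.

```r
# Toy stand-in for E: 5 samples x 2 replicates x 2 algorithms x 2 cluster sizes.
# The dimension order and names are assumed for illustration only.
E_toy <- array(
  rep_len(c(1L, 2L, NA, 3L), 5 * 2 * 2 * 2),
  dim = c(5, 2, 2, 2),
  dimnames = list(
    paste0("s", 1:5),   # samples
    paste0("R", 1:2),   # replicates
    c("hc", "km"),      # algorithms
    c("3", "4")         # cluster sizes
  )
)

# Keep only the slice for k = 4; drop = FALSE preserves all four dimensions.
E_k4 <- E_toy[, , , "4", drop = FALSE]
dim(E_k4)
```

Indexing by the character label "4" rather than by position avoids picking the wrong slice when the cluster sizes do not start at 1.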
Details
The default output from consensus_cluster will undoubtedly contain NA entries
because each replicate chooses a random subset (with replacement) of all
samples. Missing values should first be imputed using impute_knn(). KNN is not
guaranteed to impute every missing value (see class::knn() for details), so any
remaining missing values are imputed using majority voting.
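The majority-voting step can be pictured with a small base R sketch. It assumes a flattened matrix with samples as rows and replicate clusterings as columns, and fills each sample's missing entries with that sample's most frequent label. The helpers majority_label and fill_by_majority are hypothetical names for this example, and the code is one plausible reading of "majority voting", not necessarily how diceR implements it.

```r
# Toy flattened matrix: rows are samples, columns are replicate clusterings.
# NA marks a sample that has no label in that replicate.
cl <- matrix(c( 1,  1, NA,
                2, NA,  2,
               NA,  1,  1,
                2,  2,  1),
             nrow = 4, byrow = TRUE)

# Hypothetical helper: most frequent non-missing label in a vector.
majority_label <- function(x) {
  votes <- table(x)  # table() ignores NA by default
  as.integer(names(votes)[which.max(votes)])
}

# Hypothetical helper: fill each row's NA entries with that row's majority label.
fill_by_majority <- function(m) {
  for (i in seq_len(nrow(m))) {
    miss <- is.na(m[i, ])
    if (any(miss) && !all(miss)) {
      m[i, miss] <- majority_label(m[i, ])
    }
  }
  m
}

filled <- fill_by_majority(cl)
sum(is.na(filled))
```

Ties are broken by which.max(), which returns the first maximum, so in this sketch the smallest tied label wins. Rows that are entirely NA are left unchanged.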
Value
If the flattened matrix consists of more than one repetition, i.e. it isn't a
column vector, then the function returns a matrix of clusterings with complete
cases imputed using majority voting, and relabelled, for the chosen k.
Author(s)
Aline Talhouk
See Also
Other imputation functions:
impute_knn()
Examples
data(hgsc)
dat <- hgsc[1:100, 1:50]
E <- consensus_cluster(dat, nk = 3:4, reps = 10,
                       algorithms = c("hc", "km", "sc"), progress = FALSE)
sum(is.na(E))  # number of missing entries before imputation
E_imputed <- impute_missing(E, dat, 4)
sum(is.na(E_imputed))  # number of missing entries after imputation