knncatimputeLarge {scrime} | R Documentation
Missing Value Imputation with kNN for High-Dimensional Data
Description
Imputes missing values in a high-dimensional matrix composed of categorical variables
using Nearest Neighbors.
Usage
knncatimputeLarge(data, mat.na = NULL, fac = NULL, fac.na = NULL,
nn = 3, distance = c("smc", "cohen", "snp1norm", "pcc"),
n.num = 100, use.weights = TRUE, verbose = FALSE)
Arguments
data |
a numeric matrix consisting of integers between 1 and Each row of |
mat.na |
a numeric matrix containing missing values. Must have the same number of
columns as |
fac |
a numeric or character vector of length |
fac.na |
a numeric or character vector of length |
nn |
an integer specifying |
distance |
character string naming the distance measure used in |
n.num |
an integer giving the number of rows of |
use.weights |
should weighted |
verbose |
should more information about the progress of the imputation be printed? |
Value
If mat.na = NULL, then a matrix of the same size as data in which the missing
values have been replaced. If mat.na has been specified, then a matrix of the
same size as mat.na in which the missing values have been replaced.
Note
While knncatimpute considers all variables/rows when replacing missing values,
knncatimputeLarge only considers the rows with no missing values when searching
for the nearest neighbors.
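The restriction described in this note can be illustrated with a minimal base R sketch. It is not the scrime implementation: the helpers smc_dist and impute_from_complete are hypothetical names introduced here, the distance is a plain simple-matching distance over the observed positions, and neighbors vote unweighted. It only shows the idea that rows with missing values are imputed from rows without any.

```r
# Simple matching distance: share of positions at which x and y differ,
# restricted to the positions that are observed in x (the row to impute).
smc_dist <- function(x, y) {
  obs <- !is.na(x)
  mean(x[obs] != y[obs])
}

# Hypothetical helper (not part of scrime): impute each row containing NAs
# using only the complete rows as candidate neighbors.
impute_from_complete <- function(mat, nn = 3) {
  has_na <- rowSums(is.na(mat)) > 0
  complete <- mat[!has_na, , drop = FALSE]   # candidate neighbors
  k <- min(nn, nrow(complete))
  for (i in which(has_na)) {
    # Distance from row i to every complete row.
    d <- apply(complete, 1, smc_dist, x = mat[i, ])
    nbrs <- complete[order(d)[seq_len(k)], , drop = FALSE]
    for (j in which(is.na(mat[i, ]))) {
      # Unweighted majority vote among the k nearest complete rows.
      votes <- table(nbrs[, j])
      mat[i, j] <- as.integer(names(votes)[which.max(votes)])
    }
  }
  mat
}

# Small usage example: 200 rows (variables), 10 columns, 20 missing entries.
set.seed(1)
mat <- matrix(sample(3, 2000, TRUE), 200)
mat[sample(2000, 20)] <- NA
mat2 <- impute_from_complete(mat)
sum(is.na(mat2))
```

Because the set of complete rows is fixed before the loop starts, rows that have just been imputed never serve as neighbors for other rows, which mirrors the behavior stated in the note.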
Author(s)
Holger Schwender, holger.schwender@udo.edu
References
Schwender, H. and Ickstadt, K. (2008). Imputing Missing Genotypes with Nearest Neighbors.
Technical Report, SFB 475, Department of Statistics, University of Dortmund. Appears soon.
See Also
knncatimpute, gknn, smc, pcc
Examples
## Not run:
# Generate a data set consisting of 100 columns and 2000 rows (actually,
# knncatimputeLarge is made for much larger data sets), where the values
# are randomly drawn from the integers 1, 2, and 3.
# Afterwards, remove 20 of the observations randomly.
mat <- matrix(sample(3, 200000, TRUE), 2000)
mat[sample(200000, 20)] <- NA
# Apply knncatimputeLarge to mat to remove the missing values.
mat2 <- knncatimputeLarge(mat)
sum(is.na(mat))
sum(is.na(mat2))
# Now assume that the first 100 rows belong to SNPs from chromosome 1,
# the second 100 rows to SNPs from chromosome 2, and so on.
chromosome <- rep(1:20, each = 100)
# Apply knncatimputeLarge to mat chromosomewise, i.e. only consider
# the SNPs that belong to the same chromosome when replacing missing
# genotypes.
mat4 <- knncatimputeLarge(mat, fac = chromosome)
## End(Not run)