searchDataPairs {wrMisc} | R Documentation |
Search for duplicated data over multiple columns, i.e. pairs of data
Description
searchDataPairs searches a matrix for columns of similar data, i.e. 'duplicate' values in separate columns, or very similar columns if realDupsOnly=FALSE.
Initial distance measures are normalized either to the diagonal of the 'window' (normRange=TRUE) or to the real maximum distance observed (equal to or less than the diagonal).
The function returns a data.frame with the names of each sample pair, the percentage of identical values (100 for a completely identical pair) and the relative (Euclidean) distance (i.e. max distance observed = 1.0).
Note that low distance values do not necessarily imply correlated data.
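The two reported quantities can be illustrated with a minimal base-R sketch. This is not the code used inside searchDataPairs; the object names (m, relDist, res, hits) are hypothetical, and the sketch assumes the simpler case where distances are scaled to the largest distance observed rather than to the diagonal.

```r
# Illustration only: a base-R sketch of the concepts, not the wrMisc implementation
set.seed(1)
m <- cbind(a = 11:20, b = c(11:19, 17), c = 11:20, d = round(runif(10) + 12, 1))

# Euclidean distance between columns: dist() compares rows, so transpose first
d <- as.matrix(dist(t(m)))

# Relative distance: scale so that the largest observed distance equals 1
relDist <- d / max(d)

# All pairs of columns and the percentage of identical values in each pair
pairs <- t(combn(colnames(m), 2))
pctIdent <- apply(pairs, 1, function(p) 100 * mean(m[, p[1]] == m[, p[2]]))

res <- data.frame(samp1 = pairs[, 1], samp2 = pairs[, 2],
                  pctIdentical = pctIdent,
                  relDist = relDist[pairs])   # index the matrix by name pairs

# Keep only pairs at or below a distance threshold (assumed analogue of disThr)
hits <- res[res$relDist <= 0.05, ]
hits
```

Columns a and c are identical, so this pair has a distance of 0 and 100 percent identical values. Columns a and b share 9 of 10 values, yet the single deviating value already moves the pair well above the threshold in this small example.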
Usage
searchDataPairs(
dat,
disThr = 0.01,
byColumn = TRUE,
normRange = TRUE,
altNa = NULL,
realDupsOnly = TRUE,
silent = FALSE,
callFrom = NULL
)
Arguments
dat |
matrix or data.frame (main input) |
disThr |
(numeric) threshold to decide when to report similar data (lower values mean fewer pairs are reported); applied on normalized distances (normalized to the diagonal of all data for the best relative 'unbiased' view) |
byColumn |
(logical) rotates main input by 90 degrees (using |
normRange |
(logical) normalize each column separately if |
altNa |
(character, default |
realDupsOnly |
(logical) if |
silent |
(logical) suppress messages |
callFrom |
(character) allows easier tracking of messages produced |
Value
This function returns a data.frame with the names of the sample pairs, the percentage of identical values (100 for a completely identical pair) and the relative (Euclidean) distance (i.e. max distance observed = 1.0).
Examples
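set.seed(2024)   # make the random columns reproducible
mat <- round(matrix(c(11:40, runif(20)+12, 11:19, 17, runif(20)+18, 11:20), nrow=10), 1)
colnames(mat) <- 1:9   # columns 1 and 9 are identical; column 6 differs from column 1 in one value
searchDataPairs(mat, disThr=0.05)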