mergeCheck {kutils} | R Documentation |
First draft of function to diagnose problems in merges and key variables
Description
This is a first effort. It works with two data frames and one key variable in each. It does not yet work if the by parameter names more than one column, though that may be supported in the future. The return value is a list that includes full copies of the rows from the data frames in which trouble is observed.
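Because only a single key column is supported, one possible workaround for multi-column keys is to build a composite key first. The sketch below uses base R only; the data frames, the column names (site, id, key), and the "_" separator are hypothetical and not part of kutils.

```r
# Hypothetical data: the pair (site, id) identifies a row.
df1 <- data.frame(site = c("a", "a", "b"), id = c(1, 2, 1), x = c(10, 20, 30))
df2 <- data.frame(site = c("a", "b", "b"), id = c(1, 1, 2), z = c(5, 6, 7))

# Combine the key columns into one character key. The separator "_" is an
# arbitrary choice and should not occur inside the key values themselves.
# Caution: paste() turns NA into the string "NA", so check the original
# columns for missing values before combining them.
df1$key <- paste(df1$site, df1$id, sep = "_")
df2$key <- paste(df2$site, df2$id, sep = "_")

# Keys present in both data frames.
shared_keys <- intersect(df1$key, df2$key)
```

The single key column can then be supplied as by = "key".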
Usage
mergeCheck(
x,
y,
by,
by.x = by,
by.y = by,
incomparables = c(NULL, NA, NaN, Inf, "\\s+", "")
)
Arguments
x |
data frame |
y |
data frame |
by |
Commonly called the "key" variable. A column name to be
used for merging (common to both x and y). |
by.x |
Column name in x. Defaults to by, as shown in Usage. |
by.y |
Column name in y. Defaults to by, as shown in Usage. |
incomparables |
Values in the key (by) variable that are ignored for matching. The default treats these values as incomparables: c(NULL, NA, NaN, Inf, "\\s+", ""). Note that this is a larger set than base R merge uses (its default is incomparables = NULL). |
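To make the default incomparables concrete, here is a minimal base R sketch of how such keys might be flagged. It is not the kutils implementation: is_bad_key is a hypothetical helper, and it assumes that "\\s+" is meant as a pattern for whitespace-only strings.

```r
# Hypothetical helper: TRUE where a key value should not be used for matching.
is_bad_key <- function(key) {
    is.na(key) |                            # NA and NaN
        is.infinite(key) |                  # Inf and -Inf (numeric keys only)
        grepl("^\\s*$", as.character(key))  # empty or whitespace-only strings
}

bad_chr <- is_bad_key(c("1", "2", NA, "", " "))
bad_num <- is_bad_key(c(1, NA, NaN, Inf))

# Base R matching treats NA as equal to NA unless told otherwise.
na_matches_na <- match(NA, NA)
na_excluded <- match(NA, NA, incomparables = NA)
```

Here bad_chr flags the last three values and bad_num flags everything except 1. The two match() calls show why NA keys deserve attention: the first returns 1, the second returns NA.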
Value
A list of data structures that are displayed for keys and
data sets. The return is list(keysBad, keysDuped, unmatched).
unmatched is a list with 2 elements, the unmatched cases from
x and y.
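The kinds of rows collected in the return value can be approximated with base R. This is a rough sketch under assumptions, not the function's actual code: the data are made up, dup_x is a hypothetical name, and the exact structure of keysBad and keysDuped in kutils may differ.

```r
# Hypothetical inputs: x has a duplicated key (2) and a missing key.
x <- data.frame(id = c(1, 2, 2, 3, NA), a = 1:5)
y <- data.frame(id = c(2, 3, 4), b = 1:3)

# Full rows of x whose key value occurs more than once.
dup_x <- x[x$id %in% x$id[duplicated(x$id)], , drop = FALSE]

# Full rows whose (non-missing) key has no partner in the other data frame.
unmatched <- list(
    x = x[!is.na(x$id) & !(x$id %in% y$id), , drop = FALSE],
    y = y[!(y$id %in% x$id), , drop = FALSE]
)
```

In this sketch dup_x holds the two rows with id 2, unmatched$x holds the row with id 1, and unmatched$y holds the row with id 4.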
Author(s)
Paul Johnson
Examples
df1 <- data.frame(id = 1:7, x = rnorm(7))
df2 <- data.frame(id = c(2:6, 9:10), x = rnorm(7))
mc1 <- mergeCheck(df1, df2, by = "id")
## Use mc1 objects mc1$keysBad, mc1$keysDuped, mc1$unmatched
## Missing, empty, and whitespace-only keys in df1; duplicated keys (5, 6) in df2
df1 <- data.frame(id = c(1:3, NA, NaN, "", " "), x = rnorm(7))
df2 <- data.frame(id = c(2:6, 5:6), x = rnorm(7))
mergeCheck(df1, df2, by = "id")
## Key columns with different names in the two data frames
df1 <- data.frame(idx = c(1:5, NA, NaN), x = rnorm(7))
df2 <- data.frame(idy = c(2:6, 9:10), x = rnorm(7))
mergeCheck(df1, df2, by.x = "idx", by.y = "idy")