cells {validate} | R Documentation
Cell counts and differences for a series of datasets
Description
Computes cell counts and cell-by-cell differences for a series of datasets, comparing each dataset either to the first one or to the one preceding it.
Usage
cells(..., .list = NULL, compare = c("to_first", "sequential"))
Arguments
... |
For |
.list |
A |
compare |
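How to compare the datasets: "to_first" (the default) compares each dataset to the first one; "sequential" compares each dataset to the previous one. |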
Value
An object of class cellComparison, which is really an array with a few extra attributes. It counts the total number of cells, the number of missing values, the number of altered values, and the changes therein as compared to the reference defined by compare.
Comparing datasets cell by cell
When comparing the contents of two datasets, the total number of cells in the current dataset can be partitioned as in the following figure.
This function computes the partition for two or more datasets, comparing the current dataset to the first (default) or to the previous one (by setting compare = "sequential").
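The idea of partitioning cells can be illustrated with a minimal base R sketch for a single pair of datasets. The helper partition_cells() and the category labels below are hypothetical and chosen for illustration; they are not part of validate and are not guaranteed to match the layout of a cellComparison object.

```r
# Hypothetical helper (not part of validate): classify every cell of
# `current` relative to `reference`. Assumes both data frames have the
# same dimensions and the same row and column order.
partition_cells <- function(reference, current) {
  stopifnot(identical(dim(reference), dim(current)))

  ref_na <- is.na(reference)  # logical matrix: missing in reference
  cur_na <- is.na(current)    # logical matrix: missing in current

  # TRUE where both values are present but differ (compared per column;
  # factors are compared by label so differing level sets do not error)
  differs <- mapply(function(a, b) {
    if (is.factor(a)) a <- as.character(a)
    if (is.factor(b)) b <- as.character(b)
    !is.na(a) & !is.na(b) & a != b
  }, reference, current)

  still_available <- sum(!ref_na & !cur_na)
  adapted <- sum(differs)

  c(
    cells           = length(cur_na),
    available       = sum(!cur_na),
    still_available = still_available,
    unadapted       = still_available - adapted,
    adapted         = adapted,
    imputed         = sum(ref_na & !cur_na),  # was missing, now present
    missing         = sum(cur_na),
    still_missing   = sum(ref_na & cur_na),
    removed         = sum(!ref_na & cur_na)   # was present, now missing
  )
}

# Made-up toy data, for illustration only
ref <- data.frame(x = c(1, NA, 3), y = c(NA, 5, 6))
cur <- data.frame(x = c(1, 2, -3), y = c(NA, NA, 6))
counts <- partition_cells(ref, cur)
counts
```

In this sketch the available cells split into those that were already available and those that were imputed, and the missing cells split into those that were already missing and those that were removed.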
Details
This function assumes that all datasets have the same dimensions and that their rows and columns are in the same order.
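When datasets arrive with shuffled rows or columns, they need to be brought into a common order first. The following base R sketch shows one way to do this, assuming each dataset has a unique key column. The helper align_to() and the column name id are hypothetical; match_cells(), listed under See Also, may be the package's own tool for this purpose.

```r
# Hypothetical helper (not part of validate): reorder the rows and columns
# of `current` so that they line up with `reference`, using a key column.
align_to <- function(reference, current, key) {
  stopifnot(
    setequal(names(reference), names(current)),       # same set of columns
    !anyDuplicated(reference[[key]]),                 # key is unique
    !anyDuplicated(current[[key]]),
    setequal(reference[[key]], current[[key]])        # same set of records
  )
  # Row positions in `current` that correspond to each reference row
  idx <- match(reference[[key]], current[[key]])
  out <- current[idx, names(reference), drop = FALSE]
  rownames(out) <- NULL
  out
}

# Made-up toy data, for illustration only
ref <- data.frame(id = c("a", "b", "c"), x = c(1, 2, 3))
cur <- data.frame(x = c(30, 10, 20), id = c("c", "a", "b"))
aligned <- align_to(ref, cur, key = "id")
aligned
```

After alignment, both data frames have the same dimensions and ordering, which is the situation this function assumes.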
References
The figure is reproduced from M.P.J. van der Loo and E. de Jonge (2018). Statistical Data Cleaning with Applications in R. John Wiley & Sons.
See Also
Other comparing: as.data.frame,cellComparison-method, as.data.frame,validatorComparison-method, barplot,cellComparison-method, barplot,validatorComparison-method, compare(), match_cells(), plot,cellComparison-method, plot,validatorComparison-method
Examples
data(retailers)
# start with raw data
step0 <- retailers
# impute turnovers
step1 <- step0
step1$turnover[is.na(step1$turnover)] <- mean(step1$turnover,na.rm=TRUE)
# flip sign of negative revenues
step2 <- step1
step2$other.rev <- abs(step2$other.rev)
# create an overview of differences, comparing to the previous step
cells(raw = step0, imputed = step1, flipped = step2, compare="sequential")
# create an overview of differences compared to raw data
out <- cells(raw = step0, imputed = step1, flipped = step2)
out
# Graphical overview of the changes
plot(out)
barplot(out)
# transform data to data.frame (easy for use with ggplot)
as.data.frame(out)