compare.synds {synthpop}    R Documentation
Compare univariate distributions of synthesised and observed data
Description
Compare a synthesised data set with the original (observed) data set using percent frequency tables and histograms. When more than one synthetic data set has been generated (object$m > 1), pooled synthetic data are used for comparison by default.
This function can also be used with synthetic data NOT created by syn(), but then an additional parameter, cont.na, might need to be provided.
Usage
## S3 method for class 'synds'
compare(object, data, vars = NULL,
msel = NULL, stat = "percents", breaks = 20,
nrow = 2, ncol = 2, rel.size.x = 1,
utility.stats = c("pMSE", "S_pMSE", "df"),
utility.for.plot = "S_pMSE",
cols = c("#1A3C5A","#4187BF"),
plot = TRUE, table = FALSE, ...)
## S3 method for class 'data.frame'
compare(object, data, vars = NULL, cont.na = NULL,
msel = NULL, stat = "percents", breaks = 20,
nrow = 2, ncol = 2, rel.size.x = 1,
utility.stats = c("pMSE", "S_pMSE", "df"),
utility.for.plot = "S_pMSE",
cols = c("#1A3C5A","#4187BF"),
plot = TRUE, table = FALSE, ...)
## S3 method for class 'list'
compare(object, data, vars = NULL, cont.na = NULL,
msel = NULL, stat = "percents", breaks = 20,
nrow = 2, ncol = 2, rel.size.x = 1,
utility.stats = c("pMSE", "S_pMSE", "df"),
utility.for.plot = "S_pMSE",
cols = c("#1A3C5A","#4187BF"),
plot = TRUE, table = FALSE, ...)
## S3 method for class 'compare.synds'
print(x, ...)
Arguments
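object: an object of class synds (typically created by syn()); alternatively, synthetic data supplied as a data frame or as a list of data frames.
data: an original (observed) data set.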
vars: variables to be compared. If vars is NULL (the default), all synthesised variables are compared.
cont.na: a named list of codes for missing values for continuous variables, if different from the R missing data code NA (for example, list(income = -8)).
msel: index or indices of synthetic data copies for which a comparison is to be made. If NULL (the default), pooled synthetic data copies are compared with the original data.
stat: determines whether tables and plots present percentages (stat = "percents", the default) or counts (stat = "counts").
breaks: the number of cells for the histogram.
nrow: the number of rows for the plotting area.
ncol: the number of columns for the plotting area.
rel.size.x: a number representing the relative size of x-axis labels.
utility.stats: a single string or a vector of strings that determines which utility measures to print. Must be a selection of supported utility measure names; the default is c("pMSE", "S_pMSE", "df").
utility.for.plot: a single string that determines which utility measure to print in facet labels of the plot (default "S_pMSE"). Set to NULL for no labels.
cols: bar colors.
plot: a logical value, TRUE by default, indicating whether plots should be produced.
table: a logical value, FALSE by default, indicating whether tables should be printed.
...: additional parameters.
x: an object of class compare.synds.
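The pMSE, S_pMSE and df statistics listed in utility.stats are propensity-score utility measures. The base-R sketch below computes them for a single categorical variable with a logistic regression that predicts whether a record is synthetic; pMSE is the mean squared difference between the fitted propensity scores and the synthetic share c. The standardisation used for S_pMSE here (dividing by df * c * (1 - c) / N, the large-sample null mean implied by the Pearson chi-squared identity for a saturated model) is an assumption, and synthpop may compute these measures differently. The helper pmse_sketch() is hypothetical.

```r
# Hypothetical helper (not synthpop's implementation): propensity-score
# utility for one variable via a logistic regression of "is synthetic".
pmse_sketch <- function(obs, syn) {
  n_obs <- length(obs)
  n_syn <- length(syn)
  N <- n_obs + n_syn
  cc <- n_syn / N  # proportion of synthetic records
  dat <- data.frame(
    x = factor(c(as.character(obs), as.character(syn))),
    t = c(rep(0, n_obs), rep(1, n_syn))
  )
  fit <- glm(t ~ x, family = binomial, data = dat)
  p <- fitted(fit)  # estimated propensity scores
  pmse <- mean((p - cc)^2)
  df <- sum(!is.na(coef(fit))) - 1  # model parameters minus the intercept
  # Assumed null mean of pMSE: df * cc * (1 - cc) / N, which follows from
  # the Pearson chi-squared identity X^2 = N * pMSE / (cc * (1 - cc))
  s_pmse <- pmse / (df * cc * (1 - cc) / N)
  c(pMSE = pmse, S_pMSE = s_pmse, df = df)
}

# Made-up observed and synthetic values of one categorical variable
obs <- rep(c("a", "b", "c"), times = c(30, 50, 20))
syn <- rep(c("a", "b", "c"), times = c(40, 35, 25))
print(pmse_sketch(obs, syn))
```

With a single factor the logistic model is saturated, so N * pMSE / (c(1 - c)) equals the Pearson chi-squared statistic of the 2 x G table; comparing against chisq.test(..., correct = FALSE) is a quick sanity check of the sketch.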
Details
Missing data categories for numeric variables are plotted on the same plot as non-missing values. They are indicated by a miss. suffix.
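The handling of missing-data codes can be sketched as follows, using the code -8 for income from the examples on this page. Non-missing values are cut into histogram cells shared by the observed and synthetic data, and each missing-data code (including NA) gets its own category. The helpers miss_label() and bin_with_missing(), the explicit cell boundaries, and the "miss." label format are assumptions for illustration, not synthpop's code.

```r
# Assumed label format: "miss." followed by the code, with NA shown as "NA"
miss_label <- function(v) {
  v <- as.character(v)
  v[is.na(v)] <- "NA"
  paste0("miss.", v)
}

# Hypothetical helper: histogram cells for non-missing values plus one
# category per missing-data code, with levels fixed by `breaks` and codes
bin_with_missing <- function(x, miss_codes, breaks) {
  is_miss <- is.na(x) | x %in% miss_codes
  cells <- levels(cut(numeric(0), breaks = breaks,
                      include.lowest = TRUE, dig.lab = 6))
  out <- character(length(x))
  out[!is_miss] <- as.character(cut(x[!is_miss], breaks = breaks,
                                    include.lowest = TRUE, dig.lab = 6))
  out[is_miss] <- miss_label(x[is_miss])
  factor(out, levels = c(cells, miss_label(miss_codes)))
}

# Made-up income values; -8 and NA are treated as missing-data codes
obs_income <- c(800, 1200, -8, 2500, 1500, NA, -8, 3000)
syn_income <- c(900, -8, 2000, 1100, 2600, 1400, NA, 3000)
codes <- c(NA, -8)
brks <- seq(0, 3000, by = 1000)  # three equal-width cells (assumed)

obs_tab <- table(bin_with_missing(obs_income, codes, brks))
syn_tab <- table(bin_with_missing(syn_income, codes, brks))
tab <- rbind(observed = as.vector(obs_tab), synthetic = as.vector(syn_tab))
colnames(tab) <- names(obs_tab)
print(tab)
```

Fixing the factor levels from the breaks and the list of codes guarantees that the observed and synthetic counts line up column by column, even when a cell or code is empty in one of the data sets.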
Numeric variables with fewer than 6 distinct values are changed to factors in order to make plots more readable.
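The rule for numeric variables with few distinct values can be sketched in base R as below. Whether synthpop counts NA as a distinct value is not stated here; the sketch ignores missing values, which is an assumption, and the helper few_values_to_factor() is hypothetical.

```r
# Hypothetical helper (not synthpop's code): convert numeric columns with
# fewer than `min_distinct` distinct non-missing values to factors.
few_values_to_factor <- function(df, min_distinct = 6) {
  for (v in names(df)) {
    x <- df[[v]]
    if (is.numeric(x) && length(unique(x[!is.na(x)])) < min_distinct) {
      df[[v]] <- factor(x)
    }
  }
  df
}

# Made-up data: 'children' has 4 distinct values, 'age' has 7
d <- data.frame(children = c(0, 1, 2, 2, 3, 0, 1),
                age = c(23, 35, 41, 52, 60, 67, 29))
d2 <- few_values_to_factor(d)
str(d2)
```

After conversion, 'children' would be shown as a bar chart with one bar per value rather than a histogram with mostly empty cells.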
Value
An object of class compare.synds, which is a list including a list of comparative frequency tables (tables) and a ggplot object (plots) with bar charts/histograms. If multiple plots are produced, the plots and their corresponding frequency tables are stored as lists.
References
Nowok, B., Raab, G.M. and Dibben, C. (2016). synthpop: Bespoke creation of synthetic data in R. Journal of Statistical Software, 74(11), 1-26. doi:10.18637/jss.v074.i11.
Examples
library(synthpop)
ods <- SD2011[ , c("sex", "age", "edu", "marital", "ls", "income")]
s1 <- syn(ods, cont.na = list(income = -8))
### synthetic data provided as a 'synds' object
compare(s1, ods, vars = "ls")
compare(s1, ods, vars = "income", stat = "counts",
table = TRUE, breaks = 10)
### synthetic data provided as 'data.frame'
compare(s1$syn, ods, vars = "ls")
compare(s1$syn, ods, vars = "income", cont.na = list(income = -8),
stat = "counts", table = TRUE, breaks = 10)