multi.compare {synthpop}R Documentation

Multivariate comparison of synthesised and observed data

Description

Graphical comparisons of a variable (var) in the synthesised data set with the original (observed) data set within subgroups defined by the variables in a vector by. var can be a factor or a continuous variable and the plots produced will depend on the class of var. The variables in by will usually be factors or variables with only a few values.

Usage

multi.compare(object, data, var = NULL, by = NULL, msel = NULL, 
  barplot.position = "fill", cont.type = "hist", y.hist = "count", 
  boxplot.point = TRUE, binwidth = NULL, ...)

Arguments

object

an object of class synds, which stands for 'synthesised data set'. It is typically created by function syn() and it includes object$m synthesised data set(s).

data

an original (observed) data set.

var

variable to be compared between observed and synthetic data within subgroups.

by

variables to be tabulated or cross-tabulated to form groups.

barplot.position

type of barplot. The default "fill" gives a single bar with the proportions in each group while "dodge" gives side-by-side bars with the numbers in each category.

cont.type

default "hist" gives histograms and "boxplot" gives boxplots.

y.hist

defines y scale for histograms - "count" is default; "density" gives proportions.

boxplot.point

default (TRUE) adds individual points to boxplots.

msel

numbers of synthetic data sets to be used - must be numbers in the range 1:object$m. If NULL pooled synthetic data copies are compared with the original data.

binwidth

sets width of a bin for histograms.

...

additional parameters that can be supplied to ggplot.

Value

Plots as specified above. A table of the numbers in the subgroups is printed to the R console.

Numeric variables with fewer than 6 distinct values are changed to factors in order to make plots more readable.

See Also

compare.synds, compare.fit.synds

Examples

### default synthesis of selected variables
vars <- c("sex", "age", "edu", "smoke")
ods  <- na.omit(SD2011[1:1000, vars])
s1 <- syn(ods)

### categorical var
multi.compare(s1, ods, var = "smoke", by = c("sex","edu"))

### numeric var
multi.compare(s1, ods, var = "age", by = c("sex"), y.hist = "density", binwidth = 5)
multi.compare(s1, ods, var = "age", by = c("sex", "edu"), cont.type = "boxplot")

[Package synthpop version 1.8-0 Index]