uni_compare {sampcompR} | R Documentation |
Compare Data Frames and Plot Differences
Description
Returns data or a plot showing the differences between two or more data frames. The differences are calculated using the metric chosen in the funct argument. All data frames used must contain at least one column that has the same name and equal values in every data frame.
Usage
uni_compare(
dfs,
benchmarks,
variables = NULL,
nboots = 2000,
funct = "rel_mean",
data = TRUE,
type = "comparison",
legendlabels = NULL,
legendtitle = NULL,
colors = NULL,
shapes = NULL,
summetric = "rmse2",
label_x = NULL,
label_y = NULL,
plot_title = NULL,
varlabels = NULL,
name_dfs = NULL,
name_benchmarks = NULL,
summet_size = 4,
silence = TRUE,
conf_level = 0.95,
conf_adjustment = NULL,
weight = NULL,
id = NULL,
strata = NULL,
weight_bench = NULL,
id_bench = NULL,
strata_bench = NULL,
adjustment_weighting = "raking",
adjustment_vars = NULL,
raking_targets = NULL,
post_targets = NULL,
ndigits = 3
)
Arguments
dfs |
A character vector containing the names of data frames to compare against the benchmarks. |
benchmarks |
A character vector containing the names of benchmarks to compare the data frames against.
The vector must either be the same length as |
variables |
A character vector containing the names of the variables for the comparison. If NULL,
all variables named similarly in both the |
nboots |
The number of bootstraps used to calculate standard errors. Must either be >2 or 0.
If >2 bootstrapping is used to calculate standard errors with |
funct |
A character string, indicating the function to calculate the difference between the data frames. Predefined functions are:
|
data |
If TRUE, a uni_compare_object is returned, containing results of the comparison. |
type |
Define the type of comparison. Can either be |
legendlabels |
A character string or vector of strings containing a label for the legend. |
legendtitle |
A character string containing the title of the legend. |
colors |
A vector of colors used in the plot for the different comparisons. |
shapes |
A vector of shapes applicable in |
summetric |
If |
label_x , label_y |
A character string or vector of character strings containing a label for the x-axis and y-axis. |
plot_title |
A character string containing the title of the plot. |
varlabels |
A character string or vector of character strings containing the new names of the variables, which are also used in the plot. |
name_dfs , name_benchmarks |
A character string or vector of character strings containing the
new names of the |
summet_size |
A number to determine the size of the displayed |
silence |
If |
conf_level |
A numeric value between zero and one to determine the confidence level of the confidence interval. |
conf_adjustment |
If |
weight , weight_bench |
A character vector determining variables to weight the |
id , id_bench |
A character vector determining |
strata , strata_bench |
A character vector determining strata variables
used to weigh the |
adjustment_weighting |
A character vector indicating if adjustment
weighting should be used. It can either be |
adjustment_vars |
Variables used to adjust the survey when using raking or post stratification. |
raking_targets |
A list of raking targets that can be given to the rake
function of |
post_targets |
A list of post-stratification targets that can be given to the
|
ndigits |
The number of digits to round the numbers in the plot. |
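The interplay of funct, nboots, and conf_level can be illustrated with a minimal base-R sketch. It assumes that a relative mean difference is the difference in means divided by the benchmark mean and that standard errors come from resampling the compared data frame only. Both are assumptions made for illustration, not a description of the internals of sampcompR. The helper names rel_mean_diff and boot_se are hypothetical, and the data are synthetic.

```r
# Hypothetical helper: relative difference in means between a sample and a benchmark
# (an assumed definition, not necessarily the one used by sampcompR).
rel_mean_diff <- function(x, bench) {
  (mean(x, na.rm = TRUE) - mean(bench, na.rm = TRUE)) / mean(bench, na.rm = TRUE)
}

# Hypothetical helper: nonparametric bootstrap standard error.
# Only the sample is resampled; the benchmark is treated as fixed.
boot_se <- function(x, bench, nboots = 2000) {
  stopifnot(nboots > 2)
  reps <- replicate(nboots, rel_mean_diff(sample(x, replace = TRUE), bench))
  sd(reps)
}

set.seed(1)
survey    <- rnorm(500,  mean = 52, sd = 10)  # synthetic sample
benchmark <- rnorm(5000, mean = 50, sd = 10)  # synthetic benchmark

conf_level <- 0.95
est <- rel_mean_diff(survey, benchmark)
se  <- boot_se(survey, benchmark, nboots = 200)

# Normal-approximation confidence interval at the chosen confidence level
ci <- est + c(-1, 1) * qnorm(1 - (1 - conf_level) / 2) * se
print(round(c(estimate = est, se = se, lower = ci[1], upper = ci[2]), 3))
```

A larger nboots gives a more stable standard error at the cost of run time, which is why the example in this help page lowers it to 200.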
Value
A plot based on ggplot2::ggplot()
(or a uni_compare_object if data = TRUE)
that shows the differences between two or more data frames on predetermined variables
that are named identically in the data frames.
References
Felderer, B., Kirchner, A., & Kreuter, F. (2019). The Effect of Survey Mode on Data Quality: Disentangling Nonresponse and Measurement Error Bias. Journal of Official Statistics, 35(1), 93–115. https://doi.org/10.2478/jos-2019-0005
Examples
## Get data for the comparison
library(wooldridge)
card  <- wooldridge::card
black <- card[card$black == 1, ]
north <- card[card$south == 0, ]
white <- card[card$black == 0, ]
south <- card[card$south == 1, ]

## Use the function to plot the data:
## "north" is compared against "south", and "white" against "black"
univar_comp <- sampcompR::uni_compare(
  dfs        = c("north", "white"),
  benchmarks = c("south", "black"),
  variables  = c("age", "educ", "fatheduc", "motheduc", "wage", "IQ"),
  funct      = "abs_rel_mean",
  nboots     = 200,
  summetric  = "rmse2",
  data       = FALSE
)

## Print the resulting plot
univar_comp
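
As a variation on the example above, the same comparison can be requested with data = TRUE, which according to the Arguments section returns a uni_compare_object instead of a plot. The following is a sketch only: it runs the comparison when both sampcompR and wooldridge are installed, and it inspects the result with str() because the internal layout of the returned object is not described here.

```r
res <- NULL  # stays NULL when the required packages are not installed

if (requireNamespace("sampcompR", quietly = TRUE) &&
    requireNamespace("wooldridge", quietly = TRUE)) {
  card  <- wooldridge::card
  north <- card[card$south == 0, ]
  south <- card[card$south == 1, ]

  # Same call pattern as the example above, but returning the results object
  res <- sampcompR::uni_compare(
    dfs        = "north",
    benchmarks = "south",
    variables  = c("age", "educ", "wage"),
    funct      = "abs_rel_mean",
    nboots     = 200,
    data       = TRUE
  )

  # Look at the top-level structure without assuming specific element names
  str(res, max.level = 1)
}
```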