biv_compare {sampcompR} | R Documentation |
Compare Multiple Data Frames on a Bivariate Level
Description
Compare multiple data frames on a bivariate level and plot them together.
Usage
biv_compare(
  dfs,
  benchmarks,
  variables = NULL,
  corrtype = "r",
  data = TRUE,
  id = NULL,
  weight = NULL,
  strata = NULL,
  id_bench = NULL,
  weight_bench = NULL,
  strata_bench = NULL,
  p_value = NULL,
  p_adjust = NULL,
  varlabels = NULL,
  plot_title = NULL,
  plots_label = NULL,
  diff_perc = TRUE,
  diff_perc_size = 4.5,
  perc_diff_transparance = 0,
  note = FALSE,
  order = NULL,
  breaks = NULL,
  colors = NULL,
  mar = c(0, 0, 0, 0),
  grid = "white",
  gradient = FALSE,
  sum_weights = NULL,
  missings_x = TRUE,
  remove_nas = "pairwise",
  ncol_facet = 3,
  nboots = 0,
  parallel = FALSE,
  adjustment_weighting = "raking",
  adjustment_vars = NULL,
  raking_targets = NULL,
  post_targets = NULL
)
Arguments
dfs |
A character vector containing the names of data frames to compare
against the |
benchmarks |
A character vector containing the names of benchmarks to
compare the |
variables |
A character vector that contains the names of the variables for
the comparison. If it is |
corrtype |
A character string indicating the type of bivariate correlation. It can be either "r" for Pearson's r or "rho" for Spearman's rho. At the moment, "rho" is only applicable to unweighted data. |
data |
If |
strata , strata_bench |
A character vector that determines strata variables
that are used to weigh the |
id_bench , id |
A character vector determining id variables used to weigh
the |
weight_bench , weight |
A character vector that determines variables to weigh
the |
p_value |
A number between zero and one that sets the maximum significance level. |
p_adjust |
Can be either |
varlabels |
A character string or vector of character strings containing the new names of the variables that are used in the plot. |
plot_title |
A character string containing the title of the plot. |
plots_label |
A character string or vector of character strings containing the new names of the data frames that are used in the plot. |
diff_perc |
If |
diff_perc_size |
A number to determine the size of the displayed percentage difference between surveys in the plot. |
perc_diff_transparance |
A number to determine the transparency of the displayed percentage difference between surveys in the plot. |
note |
If |
order |
A character vector to determine in which order the variables should be displayed in the plot. |
breaks |
A vector to label the color scheme in the legend. |
colors |
A vector to determine the colors in the plot. |
mar |
A vector that determines the margins of the plot. |
grid |
A color string that determines the color of the lines between the tiles of the heatmap. |
gradient |
If |
sum_weights |
A vector containing a weight for every variable, used in the calculation of the displayed percentage difference. It can be used if some variables are over- or underrepresented in the analysis. |
missings_x |
If |
remove_nas |
A character string that indicates how missing values should be
removed, can either be |
ncol_facet |
The number of columns used in facet_wrap() for the plots. |
nboots |
A numeric value indicating the number of bootstrap replications.
If |
parallel |
Can be either |
adjustment_weighting |
A character vector indicating if adjustment
weighting should be used. It can either be |
adjustment_vars |
Variables used to adjust the survey when using raking or post-stratification. |
raking_targets |
A list of raking targets that can be given to the rake
function of |
post_targets |
A list of post-stratification targets that can be given to
the |
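The two corrtype options and the default remove_nas = "pairwise" can be illustrated outside the package with base R's cor(). This is only a sketch on a made-up data frame (toy and its values are hypothetical), and the mapping to use = "pairwise.complete.obs" is an assumption about the analogous base R operation, not a description of how biv_compare works internally.

```r
# Hypothetical toy data with one missing value in educ
toy <- data.frame(
  age  = c(25, 31, 47, 52, 38, 60),
  educ = c(12, 16, 12, NA, 14, 10),
  wage = c(300, 520, 410, 380, 450, 290)
)

# Pearson's r (analogous to corrtype = "r"); each pair of variables
# uses all rows where both values are observed
r_mat <- cor(toy, method = "pearson", use = "pairwise.complete.obs")

# Spearman's rho (analogous to corrtype = "rho"), same handling of missings
rho_mat <- cor(toy, method = "spearman", use = "pairwise.complete.obs")

round(r_mat, 2)
round(rho_mat, 2)
```

With pairwise deletion, the age/wage correlation still uses all six rows, while any correlation involving educ uses only the five rows where educ is observed.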
Details
The plot shows a heatmap of a correlation matrix, where the colors are determined by how similar the Pearson's r values are in the data frame and the benchmark. With the default breaks and colors:
- Same (green) indicates that the Pearson's r correlation is not significantly different from 0 in the related data frame or benchmark, or that the Pearson's r correlations of data frame and benchmark are not significantly different from each other.
- Small Diff (yellow) indicates that the Pearson's r correlation is significantly different from 0 in the related data frame or benchmark, and that the Pearson's r correlations of data frame and benchmark are significantly different from each other.
- Large Diff (red) indicates that the conditions for yellow are fulfilled and, in addition, the correlations are either in opposite directions or one is double the size of the other.
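The three categories above can be sketched as a small decision rule. The helper below is hypothetical and not part of sampcompR: classify_pair, its arguments, and the default alpha = 0.05 are made up for illustration. It assumes a two-sided t-test for each correlation against 0, a Fisher z-test for the difference between two independent correlations, and reads "double the size" as "at least twice as large in absolute value". The package may use different tests, so treat this as a reading of the description rather than its implementation.

```r
# Hypothetical helper illustrating the Same / Small Diff / Large Diff rule.
# Assumes |r| < 1 and n > 3 in both samples.
classify_pair <- function(r_df, n_df, r_bench, n_bench, alpha = 0.05) {
  # Two-sided t-test of H0: correlation equals 0
  p_zero <- function(r, n) {
    t_stat <- r * sqrt((n - 2) / (1 - r^2))
    2 * pt(-abs(t_stat), df = n - 2)
  }

  # Fisher z-test for the difference of two independent correlations
  # (assumed test, chosen for illustration)
  z <- (atanh(r_df) - atanh(r_bench)) /
    sqrt(1 / (n_df - 3) + 1 / (n_bench - 3))
  p_diff <- 2 * pnorm(-abs(z))

  any_nonzero <- p_zero(r_df, n_df) < alpha || p_zero(r_bench, n_bench) < alpha
  differ <- p_diff < alpha

  # "Same": no correlation differs from 0, or the two do not differ
  if (!(any_nonzero && differ)) return("Same")

  # "Large Diff": opposite directions, or one at least double the other
  opposite <- sign(r_df) != sign(r_bench)
  doubled <- max(abs(r_df), abs(r_bench)) >= 2 * min(abs(r_df), abs(r_bench))
  if (opposite || doubled) "Large Diff" else "Small Diff"
}

classify_pair(0.50, 500, 0.48, 500)
classify_pair(0.60, 1000, 0.40, 1000)
classify_pair(0.60, 1000, 0.20, 1000)
```

The first call compares two nearly identical correlations, the second two clearly different correlations of the same sign, and the third a pair in which one correlation is three times the other.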
Value
An object generated with the help of ggplot2::ggplot() that visualizes
the differences between the data frames and benchmarks. If data = TRUE,
a list containing information from the analyses is returned instead of
the plot. This biv_compare object can be used in plot_biv_compare to
build a plot, or in biv_compare_table to get a table.
Examples
## Get data for comparison
require(wooldridge)
card <- wooldridge::card
south <- card[card$south == 1, ]
north <- card[card$south == 0, ]
black <- card[card$black == 1, ]
white <- card[card$black == 0, ]

## Use the function to plot the data
bivar_comp <- sampcompR::biv_compare(
  dfs = c("north", "white"),
  benchmarks = c("south", "black"),
  variables = c("age", "educ", "fatheduc", "motheduc", "wage", "IQ"),
  data = FALSE
)
bivar_comp