cat_all_fn {reappraised}

R Documentation

Compares observed and expected distribution of all categorical (binomial) variables

Description

Creates plots of observed to expected numbers and ratios for the binomial variables and/or compares reported and calculated p-values for the variables.
Reference: Bolland MJ, Gamble GD, Avenell A, Cooper DJ, Grey A. Distributions of baseline categorical variables were different from the expected distributions in randomized trials with integrity concerns. J Clin Epidemiol. 2023;154:117-124.
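
To make the idea of an "expected" distribution concrete, the sketch below assumes a simple binomial model in which each participant in a trial arm has the characteristic with the pooled trial probability. The trial numbers and the names n_arm, p_pooled and expected_prob are hypothetical, and this is a base R illustration rather than the package's own calculation.

```r
# Hypothetical two-arm trial: 40 participants per arm,
# 30 of the 80 participants overall have the characteristic.
n_arm <- 40
p_pooled <- 30 / 80

# Assumed model: the count in one arm follows Binomial(n_arm, p_pooled).
counts <- 0:n_arm
expected_prob <- dbinom(counts, size = n_arm, prob = p_pooled)

# Most likely count in one arm under this model
most_likely <- counts[which.max(expected_prob)]

# Probability of an arm count at least as far from the mean as 22
obs <- 22
mean_count <- n_arm * p_pooled
p_extreme <- sum(expected_prob[abs(counts - mean_count) >= abs(obs - mean_count)])
```

Observed counts gathered over many trials could then be set against expectations of this kind.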

Usage

cat_all_fn(
  df = cat_all_data,
  comp.pvals = "no",
  fisher.sim = "y",
  fish.n.sims = 10000,
  binom = "no",
  two_levels = "no",
  del.disparate = "yes",
  excl.level = "yes",
  seed = 0,
  title = "",
  verbose = TRUE
)

Arguments

`df`	data frame generated from load_clean function
`comp.pvals`	"yes" or "no" indicator whether reported and calculated p-values should be compared
`fisher.sim`	"yes" or "no" indicator whether to allow the Fisher test to simulate p-values for tables larger than 2 x 2
`fish.n.sims`	number of simulations to use in Fisher test, default 10,000
`binom`	"yes" or "no" indicator whether observed to expected distributions of binomial variables should be calculated
`two_levels`	"yes" or "no" indicator whether variables with more than 2 levels should be collapsed to 2 levels
`del.disparate`	"yes" or "no" indicator; if "yes", data in which the absolute difference between group sizes is >20% are deleted
`excl.level`	"yes" or "no" indicator whether one level of a variable should be deleted. The deleted level is chosen randomly using the seed parameter.
`seed`	seed for the random number generator; the default, 0, uses the current date and time. Specify a seed to make results repeatable.
`title`	title name for plots (optional)
`verbose`	TRUE or FALSE indicator whether the progress bar and comments are shown and the flextable, the plot, or both are printed
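
The fisher.sim and fish.n.sims arguments resemble the simulate.p.value and B arguments of base R's fisher.test. The sketch below shows that base R behaviour on a made-up table; whether cat_all_fn passes its arguments through in exactly this way is an assumption, and the counts are illustrative only.

```r
# Hypothetical 2-arm x 3-level table of baseline counts (illustrative numbers)
tab <- matrix(c(12, 15,  9,
                14, 11, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(arm = c("A", "B"),
                              level = c("low", "mid", "high")))

# Exact p-value (feasible here because the table is small)
p_exact <- fisher.test(tab)$p.value

# Monte Carlo p-value with 10,000 simulated tables; fixing the seed
# makes the simulated value repeatable between runs
set.seed(10)
p_sim_1 <- fisher.test(tab, simulate.p.value = TRUE, B = 10000)$p.value
set.seed(10)
p_sim_2 <- fisher.test(tab, simulate.p.value = TRUE, B = 10000)$p.value

identical(p_sim_1, p_sim_2)  # same seed, same simulated p-value
```

Simulation trades a small amount of Monte Carlo error for speed on larger tables, which is why a fixed seed matters when results need to be reproduced.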

Details

Returns a list containing the objects described below and, if verbose = TRUE, prints the flextable cat_all_diff_calc_rep_ft and/or the graph cat_all_graph, depending on the options chosen.

Value

list containing objects as described

if p-value comparison used:

cat_all_pvals = data frame of data for comparison of reported and calculated p-values
cat_all_diff_calc_rep_ft = flextable of comparison of reported and calculated p-values
cat_all_diff_calc_rep_data = data frame used to make flextable
cat_all_diff_thresh_ft = flextable of comparison of reported and calculated p-values when only threshold given
cat_all_diff_thresh_data = data frame used to make flextable for p-value thresholds

if comparison of categorical variables used:

cat_all_graph = plot of observed to expected numbers and differences between groups; top panels show the absolute numbers, bottom panels show the differences between trial arms in two-arm studies
cat_all_graph_pc = plot of observed to expected numbers expressed as percentages and differences between groups; top panels show the percentages, bottom panels show the differences between trial arms in two-arm studies
cat_all_data_abs = data frame of data for absolute numbers
cat_all_data_df = data frame of data for difference between groups in two-arm studies
cat_all_dataset_abs = data frame of dataset used for all trials
cat_all_dataset_df = data frame of dataset used for two-arm trials
cat_all_all_graphs = list containing:
- abs = plot for absolute numbers only
- df = plot for difference between groups in two-arm studies only
- pc = plot for percentages only
- all_pc = composite plot of percentages and absolute numbers
- individual_graphs = list of 6 individual plots making up composite figures
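
Because the function returns a list, it can be convenient to store the result once and pull out individual elements by name. The sketch below uses only the arguments and element names documented on this page; the variable name res is illustrative, and the block is guarded so that it does nothing if the reappraised package is not installed.

```r
res <- NULL  # stays NULL if the package is unavailable

if (requireNamespace("reappraised", quietly = TRUE)) {
  library(reappraised)

  # Load the bundled example data, as in the Examples section
  cat_all_data <- load_clean(import = "no", file.cat = "SI_cat_all",
                             cat_all = "yes", format.cat = "wide")$cat_all_data

  # verbose = FALSE: keep the returned objects without printing them
  res <- cat_all_fn(df = cat_all_data, comp.pvals = "yes", verbose = FALSE)

  names(res)                # which objects were returned
  head(res$cat_all_pvals)   # data behind the p-value comparison
}
```

Passing df explicitly avoids relying on an object named cat_all_data being visible where the function is called.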

Examples

# load example data
cat_all_data <- load_clean(import = "no", file.cat = "SI_cat_all", cat_all = "yes",
                           format.cat = "wide")$cat_all_data


# run function comparing p-values only (takes only a few seconds)
cat_all_fn(comp.pvals = "yes")$cat_all_diff_calc_rep_ft

# run function comparing distribution of binomial variables only

# to speed the example up, limit to 12 two-arm trials with 20 variables
# (takes close to 5 secs)

cat_all_data <- cat_all_data[1:41, c(1:8, 10:11, 13:15)]

cat_all_fn(binom = "yes", two_levels = "yes", del.disparate = "yes",
           excl.level = "yes", seed = 10)$cat_all_graph


# to import an excel spreadsheet (modify using local path,
# file and sheet name, range, and format):

# get path for example files
path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised",
                   mustWork = TRUE)
# delete file name from path, keeping only the directory
path <- dirname(path)

# load data
cat_all_data <- load_clean(import = "yes", cat_all = "yes", dir = path,
   file.name.cat = "reappraised_examples.xlsx", sheet.name.cat = "SI_cat_all",
   range.name.cat = "A:N", format.cat = "wide")$cat_all_data

[Package reappraised version 0.1.1 Index]