R: Compares differences between baseline means using Carlisle's...

anova_fn {reappraised}

R Documentation

Compares differences between baseline means using Carlisle's Monte Carlo ANOVA method

Description

Creates plots of the distribution of p-values for differences in baseline means, calculated using Carlisle's Monte Carlo ANOVA method.

Usage

anova_fn(
  df = anova_data,
  method = "alt",
  seed = 0,
  sims = -1,
  btsp = 500,
  title = "",
  verbose = TRUE
)

Arguments

`df`	data frame generated by the load_clean function
`method`	"orig" is adapted from the original published code; "alt" avoids using loops in the code (see Details)
`seed`	the seed to use for random number generation; the default, 0, uses the current date and time. Specify a seed to make results repeatable.
`sims`	number of simulations; the default, -1, lets the function select the number based on the number of variables and the sample size
`btsp`	number of bootstrap repeats used to generate the 95% confidence interval around the AUC
`title`	optional title for plots
`verbose`	TRUE or FALSE; whether to show the progress bar and comments and to print the plot
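
The seed convention described above can be illustrated with a minimal base R sketch. The helper names `resolve_seed` and `draw` are hypothetical and are not part of the package; the way a seed of 0 is mapped to the clock here is an assumption for illustration, not the package's actual implementation.

```r
# Hypothetical helper (not from the package): 0 means "seed from the clock",
# any other value is used as given.
resolve_seed <- function(seed = 0) {
  if (seed == 0) {
    # Assumption: derive a seed from the current date and time
    as.integer(Sys.time()) %% 100000L
  } else {
    as.integer(seed)
  }
}

# Hypothetical helper: draw three normal values after seeding
draw <- function(seed) {
  set.seed(resolve_seed(seed))
  rnorm(3)
}

# A fixed, non-zero seed gives repeatable draws
same <- identical(draw(10), draw(10))
```

With a non-zero seed the two calls return the same values, which is why specifying `seed` makes results repeatable.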

Details

The method is from Carlisle JB, Loadsman JA. Evidence for non-random sampling in randomised, controlled trials by Yuhji Saitoh. Anaesthesia. 2017;72:17-27.
The R code is in the appendix to the paper. This function is adapted from that code.
The function has two methods. The published code selects each variable from each study, then generates simulations for that variable using a row-wise approach with several loops; the adaptation of that code is method = "orig". method = "alt" generates all the simulations at once. It was initially expected to be considerably faster, but in practice the time savings are small.
The results from the two approaches will not be identical, even if the same random number seed is used, because they use the generated random numbers in different orders. The p-values they generate differ by less than about 0.1, and usually the differences are close to 0.01, although this depends on the number of simulations (more simulations give smaller differences). The code that generates the p-value for each variable from the simulated means is essentially the same in both methods.
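
As a rough illustration of the general idea of simulating group means all at once, here is a minimal base R sketch. It is not the code from the paper or from this package: the function name `mc_baseline_p`, its arguments, and the simplified null model (normal group means around a pooled mean, with a fixed pooled SD and no allowance for rounding) are all assumptions made for illustration.

```r
# Hypothetical helper (not from the package or the paper): Monte Carlo p-value
# for a difference in baseline means, from reported summary statistics only.
# n = group sizes, m = group means, s = group SDs.
mc_baseline_p <- function(n, m, s, sims = 10000, seed = 1) {
  set.seed(seed)
  k <- length(n)

  # Simplified null: every group shares the pooled mean and pooled SD
  grand_mean <- sum(n * m) / sum(n)
  pooled_sd  <- sqrt(sum((n - 1) * s^2) / (sum(n) - k))

  # Observed between-group sum of squares
  obs <- sum(n * (m - grand_mean)^2)

  # Simulate all group means at once, without loops:
  # one row per simulation, one column per group
  sim_means <- matrix(
    rnorm(sims * k, mean = grand_mean,
          sd = rep(pooled_sd / sqrt(n), each = sims)),
    nrow = sims, ncol = k
  )

  # Between-group sum of squares for each simulated trial
  sim_grand <- as.vector(sim_means %*% n) / sum(n)
  sim_stat  <- as.vector(((sim_means - sim_grand)^2) %*% n)

  # Proportion of simulations at least as extreme as the observed value
  mean(sim_stat >= obs)
}

p_equal     <- mc_baseline_p(n = c(50, 50), m = c(10, 10), s = c(2, 2))
p_different <- mc_baseline_p(n = c(50, 50), m = c(10, 20), s = c(2, 2))
```

Identical group means give a p-value of 1 in this sketch, and very different means give a p-value near 0. Because the p-value is a proportion of simulated draws, it changes with the seed and with the order in which random numbers are consumed, and it stabilises as `sims` increases.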

Returns a list containing 3 objects and (if verbose = TRUE) prints the plot anova_ecdf

Value

A list containing 3 objects:

anova_ecdf = plot of cumulative distribution of calculated p-values compared to the expected uniform distribution
anova_pvalues = plots of distribution of calculated p-values and AUC, as for pval_cont_fn()
anova_all_results = list containing
- anova_data = data frame of baseline data, with calculated p-values
- anova_pvals = plot of distribution of calculated p-values from anova_pvalues
- anova_auc = plot of AUC of calculated p-values from anova_pvalues
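
To show what a comparison of p-values against the expected uniform distribution looks like, here is a minimal base R sketch. It uses placeholder p-values drawn with `runif()`, not output from anova_fn(), and the `max_gap` summary is an assumption added for illustration; the package's own plot may be constructed differently.

```r
set.seed(42)
p_values <- runif(200)  # placeholder data, NOT output from anova_fn()

# Empirical cumulative proportion at each sorted p-value
p_sorted <- sort(p_values)
observed <- ecdf(p_values)(p_sorted)

# Under a uniform distribution the expected cumulative proportion at p is p,
# so the largest vertical gap summarises the departure from uniformity
max_gap <- max(abs(observed - p_sorted))

# Step plot of the empirical distribution with the uniform reference line
plot(ecdf(p_values), main = "ECDF of p-values vs uniform",
     xlab = "p-value", ylab = "Cumulative proportion")
abline(a = 0, b = 1, lty = 2)
```

If baseline p-values are consistent with randomisation, the step curve should stay close to the dashed diagonal.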

Examples

# load example data
anova_data <- load_clean(import = "no", file.cont = "SI_pvals_cont", anova = "yes",
                         format.cont = "wide")$anova_data


# run function (takes only a few seconds)
anova_fn(seed = 10, sims = 100, btsp = 100)$anova_ecdf

# to import an excel spreadsheet (modify using local path,
# file and sheet name, range, and format):

# get path for example files
path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised",
                    mustWork = TRUE)
# delete file name from path
path <- sub("/[^/]+$", "", path)

# load data
anova_data <- load_clean(import = "yes", anova = "yes", dir = path,
                         file.name.cont = "reappraised_examples.xlsx",
                         sheet.name.cont = "SI_pvals_cont",
                         range.name.cont = "A:O", format.cont = "wide")$anova_data

[Package reappraised version 0.1.1 Index]