load_clean {reappraised} | R Documentation |
Load data then clean and format it
Description
Loads data, then cleans and formats it for use by the nine analysis functions described in Details
Usage
load_clean(
import = "yes",
file.cont = "",
file.cat = "",
dir = "",
file.name = "",
pval_cont = "no",
match = "no",
cohort = "no",
anova = "no",
dir.cont = "",
file.name.cont = "",
sheet.name.cont = "Sheet1",
range.name.cont = "",
format.cont = "wide",
cat = "no",
sr = "no",
cat_all = "no",
pval_cat = "no",
cat.names = c("n"),
dir.cat = "",
file.name.cat = "",
sheet.name.cat = "Sheet1",
range.name.cat = "",
format.cat = "wide",
generic = "",
gen.vars.keep = "",
gen.vars.del = "",
verbose = TRUE
)
Arguments
import |
'yes' imports an Excel file. 'no' uses a dataset already loaded into R as a data frame |
file.cont |
If import = 'no', name of data frame containing continuous data |
file.cat |
If import = 'no', name of data frame containing categorical data |
dir |
If import = 'yes', path to location of excel file for continuous and categorical data |
file.name |
If import = 'yes', file name of excel file containing continuous and categorical data |
pval_cont |
'yes'/'no' indicating if data will be used for pval_cont_fn. Only data for 1 continuous data function can be loaded with each run of this function. |
match |
'yes'/'no' indicating if data will be used for match_fn. Only data for 1 continuous data function can be loaded with each run of this function. |
cohort |
'yes'/'no' indicating if data will be used for cohort_fn. Only data for 1 continuous data function can be loaded with each run of this function. |
anova |
'yes'/'no' indicating if data will be used for anova_fn. Only data for 1 continuous data function can be loaded with each run of this function. |
dir.cont |
If import = 'yes', path to location of excel file for continuous data |
file.name.cont |
If import = 'yes', file name of excel file containing continuous data |
sheet.name.cont |
Sheet name containing continuous data |
range.name.cont |
Range of cells containing continuous data. Can be in format 'a1:b20' or 'a:b' |
format.cont |
'wide'/'long' indicating continuous data is in wide or long format |
cat |
'yes'/'no' indicating if data will be used for cat_fn. Only data for 1 categorical data function can be loaded with each run of this function. |
sr |
'yes'/'no' indicating if data will be used for sr_fn. Only data for 1 categorical data function can be loaded with each run of this function. |
cat_all |
'yes'/'no' indicating if data will be used for cat_all_fn. Only data for 1 categorical data function can be loaded with each run of this function. |
pval_cat |
'yes'/'no' indicating if data will be used for pval_cat_fn. Only data for 1 categorical data function can be loaded with each run of this function. |
cat.names |
names of variables to be used in cat_fn and sr_fn |
dir.cat |
If import = 'yes', path to location of excel file for categorical data |
file.name.cat |
If import = 'yes', file name of excel file containing categorical data |
sheet.name.cat |
Sheet name containing categorical data |
range.name.cat |
Range of cells containing categorical data. Can be in format 'a1:b20' or 'a:b' |
format.cat |
'wide'/'long' indicating categorical data is in wide or long format |
generic |
'yes'/'no' indicating if data to be loaded for generic use |
gen.vars.keep |
Vector of variables in data to keep |
gen.vars.del |
Vector of variables in data to delete |
verbose |
TRUE/FALSE. TRUE indicates comments will be printed during loading |
Details
Function can load continuous or categorical data.
Continuous data can be used for comparison of baseline p-values (pval_cont_fn),
matching summary stats within a trial (match_fn), matching summary stats in different cohorts (cohort_fn),
or comparing means of baseline p-values (anova_fn).
Categorical data can be used for comparisons of observed with expected distributions for single variable (cat_fn),
for group numbers in trials using simple randomisation (sr_fn), for all variables (cat_all_fn), and for comparison
of baseline p-values (pval_cat_fn).
There is one function in development that allows assessment of the proportion of final digits in summary statistics (final_digit_fn). This function works using summary statistics but could be adapted for use on raw continuous or categorical data.
Only 1 continuous and/or 1 categorical data set is allowed per load, to avoid clashes.
Data can be imported from a file (import = "yes") or taken from an existing data frame (import = "no").
If loading from an existing data frame, use file.cont and file.cat.
If loading from a common directory or file, dir and file.name can be used rather than the more specific dir.cont, dir.cat, file.name.cont, or file.name.cat.
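The "one continuous data set per load" rule can be pictured with the short sketch below. It is illustrative only and is not taken from the package source; the names flags_cont, n_selected and selected are hypothetical.

```r
# Hypothetical illustration of the rule that only one continuous-data
# indicator may be "yes" in a single call (not the package's own code).
flags_cont <- c(pval_cont = "yes", match = "no", cohort = "no", anova = "no")

# Count how many continuous-data indicators have been switched on
n_selected <- sum(flags_cont == "yes")
if (n_selected > 1) {
  stop("Only 1 continuous data function can be loaded per run")
}

# Name of the single indicator that was selected
selected <- names(flags_cont)[flags_cont == "yes"]
```

The same check would apply separately to the categorical indicators (cat, sr, cat_all, pval_cat).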
Comments about each indicator:
pval_cont
loads continuous data for pval_cont_fn, outputs as list of 1 containing named data frame pval_cont_data.
format should be study, variable or var, n, m, s, p. Can be in any order. n = sample size, m = mean, s = standard deviation,
p = baseline p value (can omit if not reported)
can be in wide or long format
wide: study, var, n1, n2, n3 ..., m1, m2, m3 ... s1, s2, s3..., p
long: study, var, group, m , s, n , p
group or g or grp required for long format
separators (eg n1 n_1 n.1) are stripped and replaced
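As an illustration of the two layouts described above, the sketch below builds a small wide data frame and converts it to the long layout with base R reshape(). All values are made up for illustration, and the conversion is only a way to see how the layouts correspond; load_clean accepts either layout through format.cont.

```r
# Made-up example values in the wide layout: study, var, n1, n2, m1, m2, s1, s2, p
wide <- data.frame(
  study = c(1, 1, 2),
  var   = c("age", "weight", "age"),
  n1 = c(50, 50, 40),       n2 = c(52, 52, 41),
  m1 = c(61.2, 70.4, 58.9), m2 = c(60.8, 71.9, 59.3),
  s1 = c(8.1, 12.3, 7.4),   s2 = c(7.9, 11.8, 7.7),
  p  = c(0.71, 0.53, NA),   # p can be missing if not reported
  stringsAsFactors = FALSE
)

# Equivalent long layout: one row per study, variable and group
long <- reshape(
  wide,
  direction = "long",
  varying   = list(c("n1", "n2"), c("m1", "m2"), c("s1", "s2")),
  v.names   = c("n", "m", "s"),
  timevar   = "group",
  times     = 1:2,
  idvar     = c("study", "var")
)
long <- long[order(long$study, long$var, long$group), ]
rownames(long) <- NULL
```

Each wide row becomes one long row per group, so the 3 wide rows with 2 groups give 6 long rows.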
match
loads continuous data for match_fn, outputs as list of 1 containing named data frame match_data
remainder is same as for pval_cont above.
only difference between pval_cont and match is that match allows for missing mean or SD whereas pval_cont does not
format should be study, variable or var, n, m, s. Can be in any order. n = sample size, m = mean, s = standard deviation
can be in wide or long format
wide: study, var, n1, n2, m1, m2, s1, s2, p
long: study, var, group, m , s, n
group or g or grp required for long format
separators (eg n1 n_1 n.1) are stripped and replaced
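The separator handling mentioned above can be pictured as follows. This is a minimal sketch of one way to normalise such column names, assuming separators are "_" or "." between the letter and the group number; it is not the package's own implementation, and raw_names and clean_names are hypothetical.

```r
# Column names as they might appear in a spreadsheet (hypothetical)
raw_names <- c("study", "var", "n_1", "n.2", "m1", "m_2", "s.1", "s2", "p")

# Remove a "_" or "." that sits between a letter and a trailing group number
clean_names <- gsub("([A-Za-z])[._]([0-9]+)$", "\\1\\2", raw_names)
```

After this step n_1, n.2 and m_2 all follow the same n1, n2, m2 pattern as the other columns.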
cohort
loads continuous data for cohort_fn, outputs as list of 1 containing named data frame cohort_data
same as pval_cont but allows a lookup variable for variable names
format should be study, variable or var, n, m, s, p. Can be in any order. n = sample size, m = mean, s = standard deviation
can be in wide or long format
wide: study, var, n1, n2, n3 ..., m1, m2, m3 ... s1, s2, s3...
long: study, var, group, m , s, n
group or g or grp required for long format
separators (eg n1 n_1 n.1) are stripped and replaced
lookup table has two columns, var_name_final and var_name_orig: var_name_orig lists all variable names
from all studies and var_name_final gives the standardised name for each, allowing different names in different studies to
be standardised
has optional variable 'population' which can be used to subset the data if trials in different populations are reported
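The idea behind the lookup table can be sketched in base R as below. The data are made up, and the sketch only shows the mapping from var_name_orig to var_name_final; how the lookup table is supplied to load_clean is not described here, so cohort_raw, lookup and var_std are hypothetical names.

```r
# Made-up variable names as reported by two different studies
cohort_raw <- data.frame(
  study = c(1, 1, 2, 2),
  var   = c("Age (y)", "BMI", "age", "body mass index"),
  stringsAsFactors = FALSE
)

# Lookup table: every original name paired with its standardised name
lookup <- data.frame(
  var_name_orig  = c("Age (y)", "age", "BMI", "body mass index"),
  var_name_final = c("age", "age", "bmi", "bmi"),
  stringsAsFactors = FALSE
)

# Replace each original name with its standardised equivalent
idx <- match(cohort_raw$var, lookup$var_name_orig)
cohort_raw$var_std <- lookup$var_name_final[idx]
```

Any original name missing from the lookup table would give NA in var_std, which is a simple way to spot names that still need standardising.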
anova
loads continuous data for anova_fn, outputs as list of 1 containing named data frame anova_data
same as for pval_cont above but allows for optional value for decimal place
format should be study, variable or var, n, m, s, p. Can be in any order. n = sample size, m = mean, s = standard deviation,
d= decimal place of mean (if omitted, this is calculated automatically in anova_fn)
can be in wide or long format
wide: study, var, n1, n2, n3 ..., m1, m2, m3 ... s1, s2, s3..., d
long: study, var, group, m , s, n , d
group or g or grp required for long format
separators (eg n1 n_1 n.1) are stripped and replaced
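To show what the optional d value represents, the sketch below counts the decimal places of some reported means. It is an illustration only and is not the calculation used in anova_fn; count_decimals is a hypothetical helper.

```r
# Hypothetical helper: number of digits after the decimal point of each value
count_decimals <- function(x) {
  vapply(x, function(v) {
    s <- format(v, scientific = FALSE, drop0trailing = TRUE)
    if (grepl(".", s, fixed = TRUE)) {
      nchar(sub("^[^.]*\\.", "", s))   # characters after the decimal point
    } else {
      0L
    }
  }, integer(1))
}

d <- count_decimals(c(61.2, 70, 58.25))
```

A numeric value cannot retain trailing zeros (a mean reported as 61.20 is stored as 61.2), which is one reason to supply d explicitly when the reported precision matters.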
cat
loads categorical data for cat_fn, outputs as list of 1 containing named data frame cat_data
format should be study, n, v. Can be in any order, n= group size, v= number with characteristic
can be in wide or long format
wide: study, n1, n2, n3 ..., v1, v2, v3...
long: study, group, n, v
group or g or grp required for long format
use cat.names to name variable eg c("n", "v") , c("n", "g") ...
separators (eg n1 n_1 n.1) are stripped and replaced
sr
loads categorical data for sr_fn, outputs as list of 1 containing named data frame sr_data
as for cat but only requires study and n
format should be study, n. n= group size
can be in wide or long format
wide: study, n1, n2, n3 ...
long: study, group, n
group or g or grp required for long format
separators (eg n1 n_1 n.1) are stripped and replaced
cat_all
loads categorical data for cat_all_fn, outputs as list of 1 containing named data frame cat_all_data
format should be study, var or variable, n, N, level, stat, recode, p. Can be in any order, n = number with characteristic, N = group size,
p = baseline p value (can omit if not reported), can use "ns" for not significant or "<" or ">" to indicate threshold (eg "<0.05")
optional level - number for level of variable (eg y/n =1,2; high/med/low =1,2,3)
optional recode- for variables with >2 levels to tell how to recode into 2 groups
optional stat: statistical test used for p-value: chisq - Chi-square, chisqc - Chi-square with correction,
fisher - Fisher's exact, midp - mid-p (calculated using two different methods), lr - likelihood ratio,
mh - Mantel-Haenszel test
can be in wide or long format
wide: study, var, n1, n2, n3, ... N1, N2, N3... p, stat, level, recode
long: study, var, group, n, N, p, stat, level, recode
group or g or grp required for long format
if variable has 2 levels, only 1 required, other will be calculated.
separators (eg n1 n_1 n.1) are stripped and replaced
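The p column described above may mix plain numbers with "ns" and threshold values such as "<0.05". The sketch below shows one way such entries could be split into an operator and a numeric value. It is illustrative only and is not the package's own parsing; p_raw, p_op and p_num are hypothetical names.

```r
# Made-up p-value entries as they might be typed into a spreadsheet
p_raw <- c("0.43", "<0.05", "ns", ">0.2", NA)

# Operator: "<" or ">" for thresholds, "ns" for not significant, "=" otherwise
p_op <- ifelse(grepl("^[<>]", p_raw), substr(p_raw, 1, 1),
               ifelse(p_raw == "ns", "ns", "="))

# Numeric part: strip any leading "<" or ">"; "ns" and missing entries become NA
p_num <- suppressWarnings(as.numeric(sub("^[<>]", "", p_raw)))
```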
pval_cat
loads categorical data for pval_cat_fn, outputs as list of 1 containing named data frame pval_cat_data
as for cat_all but recode variable is not generated
format should be study, var or variable, n, N, p. Can be in any order, n = number with characteristic, N = group size,
p = baseline p value (can omit if not reported), can use "ns" for not significant or "<" or ">" to indicate threshold (eg "<0.05")
optional level - number for level of variable (eg y/n =1,2; high/med/low =1,2,3)
optional stat: statistical test used for p-value : chisq - Chisquare, fisher- Fisher's exact
can be in wide or long format
wide: study, var, n1, n2, n3, ... N1, N2, N3... p, stat, level
long: study, var, group, n, N, p, stat, level
group or g or grp required for long format
if variable has 2 levels, only 1 required, other will be calculated.
separators (eg n1 n_1 n.1) are stripped and replaced
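For a variable with 2 levels, the counts for the second level follow from the group size minus the counts for the first level. The sketch below shows that arithmetic on made-up data; it is not the package's own code, and one_level, other_level and both are hypothetical names.

```r
# Made-up data: only level 1 of a two-level variable is supplied
one_level <- data.frame(
  study = c(1, 2),
  var   = c("sex", "sex"),
  level = c(1, 1),
  n1 = c(30, 22), n2 = c(28, 25),   # number with characteristic in each group
  N1 = c(50, 40), N2 = c(52, 41),   # group sizes
  stringsAsFactors = FALSE
)

# Level 2 counts are the group size minus the level 1 counts
other_level <- one_level
other_level$level <- 2
other_level$n1 <- one_level$N1 - one_level$n1
other_level$n2 <- one_level$N2 - one_level$n2

both <- rbind(one_level, other_level)
```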
generic
loads data for generic use, outputs as list of 1 containing named data frame generic_data
use cont suffixes for file details: dir.cont (or dir), file.name.cont (or file.name), sheet.name.cont, range.name.cont
format should be study, var or variable, variable names
optional gen.vars.keep = vector of variables to keep
optional gen.vars.del = vector of variables to delete
can be in wide or long format
wide: study, var, a1, a2..., b1, b2 ...
long: study, var, group, a, b, ....
group or g or grp required for long format
separators (eg n1 n_1 n.1) are stripped and replaced
no data checking or other transformations take place
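The effect of gen.vars.keep and gen.vars.del can be pictured with base R column selection, as in the sketch below. The data are made up and the sketch is not the package's own code; generic_raw, keep, del, kept and dropped are hypothetical names.

```r
# Made-up generic data in the wide layout: study, var, a1, a2, b1, b2
generic_raw <- data.frame(
  study = c(1, 2),
  var   = c("x", "x"),
  a1 = c(1.1, 2.1), a2 = c(1.2, 2.2),
  b1 = c(10, 20),   b2 = c(11, 21),
  stringsAsFactors = FALSE
)

# Keeping a set of variables (analogous to gen.vars.keep)
keep <- c("study", "var", "a1", "a2")
kept <- generic_raw[, intersect(names(generic_raw), keep), drop = FALSE]

# Deleting a set of variables (analogous to gen.vars.del)
del <- c("b1", "b2")
dropped <- generic_raw[, setdiff(names(generic_raw), del), drop = FALSE]
```

Here both routes leave the same four columns, so either argument could be used depending on which list is shorter to write.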
Value
list containing a named data frame containing data in suitable format for appropriate function as described in Details
Examples
# examples of loading data for each function are given in the individual functions.
# Here is one- for pval_cont_fn():
pval_cont_data <- load_clean(import= "no", file.cont = "SI_pvals_cont", pval_cont= "yes",
format.cont = "wide")$pval_cont_data
# to import an excel spreadsheet (modify using local path,
# file and sheet name, range, and format):
# get path for example files
path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised",
mustWork = TRUE)
# delete file name from path, keeping only the directory
path <- dirname(path)
# load data
pval_cont_data <- load_clean(import= "yes", pval_cont = "yes", dir = path,
file.name.cont = "reappraised_examples.xlsx", sheet.name.cont = "SI_pvals_cont",
range.name.cont = "A1:O51", format.cont = "wide")$pval_cont_data