metaumbrella-package {metaumbrella} | R Documentation |
metaumbrella: An Umbrella Review Package for R
Description
The metaumbrella package offers several facilities to assist in data analysis when performing an umbrella review.
This package is built around three core functions which automatically perform the statistical analyses required for an umbrella review (the umbrella() function), stratify the evidence according to various classification criteria (the add.evidence() function) and generate a graphical presentation of the results (the forest() function).
The umbrella() function automatically performs meta-analyses and the additional calculations needed for an umbrella review. It outputs an object of class “umbrella”. Compared with standard R packages designed only for fitting a single meta-analysis, its advantages include automatically fitting several meta-analyses when input information differs, automatically extracting the information needed to stratify the evidence, and automatically performing the additional tests required (a test for excess significance, a test for publication bias and a jackknife leave-one-out analysis).
The add.evidence() function stratifies the evidence generated by the umbrella() function according to a set of pre-specified criteria (those proposed by Prof. Ioannidis or an algorithmic version of the GRADE classification), or according to a personalized classification that users specify manually. This feature allows users to rely on already developed criteria or to develop new ones that match the specific needs of their umbrella review.
The forest() function creates graphical representations of the results of an umbrella review, including a forest plot along with information on the stratification of evidence.
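The three functions are meant to be chained. The following is a minimal sketch, assuming the metaumbrella package is installed and using the bundled df.SMD example dataset mentioned on this page. The variable names (umb, strat, workflow_ran) are illustrative, and all arguments other than criteria are left at their defaults.

```r
# Minimal sketch of the umbrella review workflow.
# Assumes the metaumbrella package is installed; skipped otherwise.
workflow_ran <- FALSE
if (requireNamespace("metaumbrella", quietly = TRUE)) {
  library(metaumbrella)
  # 1. Run the meta-analyses and additional tests for every factor
  umb <- umbrella(df.SMD)
  # 2. Stratify the evidence with the Ioannidis criteria
  strat <- add.evidence(umb, criteria = "Ioannidis")
  # 3. Plot the pooled results together with their evidence classes
  forest(strat)
  workflow_ran <- TRUE
}
```

The requireNamespace() guard only keeps the sketch from failing on machines where the package is missing; in a real analysis you would simply call library(metaumbrella).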
Well-formatted dataset
One specificity of the metaumbrella package is that none of its functions has an argument to specify the names of the variables contained in the users' dataset.
Therefore, the datasets passed to the different functions of the package must respect a very precise formatting (which we will refer to as a well-formatted dataset).
We present here the rules that must be respected when creating a well-formatted dataset.
The datasets passed to the functions of the metaumbrella package should contain information on each individual study pooled in the different meta-analyses included in the umbrella review. The information about each individual study must allow for replication of the meta-analyses. It is therefore necessary that the information contained in a well-formatted dataset allows for estimating the effect size and variance of all individual studies. Ten types of effect size measures are accepted:
- "SMD": standardized mean difference (i.e., Cohen's d)
- "G": Hedges' g
- "MD": mean difference
- "SMC": standardized mean change
- "R": Pearson's correlation
- "Z": Fisher's z
- "OR" or "logOR": odds ratio or its logarithm
- "RR" or "logRR": risk ratio or its logarithm
- "HR" or "logHR": hazard ratio or its logarithm
- "IRR" or "logIRR": incidence rate ratio or its logarithm
To estimate the effect size and the variance of each individual study, the metaumbrella package allows for flexible inputs.
We detail below (A) the variables that are mandatory and must be indicated in a well-formatted dataset, (B) the variables that vary depending on the effect size measure and (C) the variables that are optional but that can be indicated to benefit from certain features of the package.
Note that the package includes examples of well-formatted datasets for each effect size measure (df.SMD, df.SMC, df.R, df.OR, df.RR, df.HR and df.IRR).
A. Mandatory variables
The following variables must be included in the dataset regardless of the effect size measure used. The names of these variables (in bold) cannot be changed.
- meta_review: a character variable that contains an identifier for the sources of the meta-analyses included in an umbrella review. Typically, this variable contains the name of the first author of the included meta-analyses.
- factor: a character variable that contains an identifier for the risk factors or the interventions whose effects are studied. Importantly, all rows in the dataset with the same factor value will be pooled together in a meta-analysis.
- author and year: character variables identifying the author and the year of publication of each individual study that is included in a meta-analysis. For a given factor, all rows with the same author and year values will be identified as having some type of dependence (see below).
- measure: a character variable describing the type of effect size measure used to quantify the effect of the factor. It must be either "SMD", "MD", "G", "SMC", "R", "Z", "OR", "logOR", "RR", "logRR", "HR", "logHR", "IRR" or "logIRR". Note that if a study reports the numbers of cases and controls in exposed and non-exposed groups but does not report an effect size value (i.e., the value of an OR or RR), we recommend specifying "OR" for case-control studies and "RR" for cohort studies.
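As an illustration of the rules above, here is a small base R sketch that builds a toy dataset with the five mandatory columns and checks it. All values are invented, and check_mandatory() is a hypothetical helper written for this example, not a function of the metaumbrella package.

```r
# A toy dataset with the five mandatory columns (all values are invented).
dat <- data.frame(
  meta_review = c("Smith", "Smith", "Jones"),
  factor      = c("Exercise", "Exercise", "Smoking"),
  author      = c("Lee", "Garcia", "Chen"),
  year        = c("2010", "2014", "2018"),
  measure     = c("SMD", "SMD", "OR"),
  stringsAsFactors = FALSE
)

mandatory <- c("meta_review", "factor", "author", "year", "measure")
accepted_measures <- c("SMD", "MD", "G", "SMC", "R", "Z",
                       "OR", "logOR", "RR", "logRR",
                       "HR", "logHR", "IRR", "logIRR")

# Hypothetical helper (not part of metaumbrella): reports missing
# mandatory columns and measure codes that are not accepted.
check_mandatory <- function(df) {
  missing_cols <- setdiff(mandatory, names(df))
  bad_measures <- if ("measure" %in% names(df)) {
    unique(df$measure[!df$measure %in% accepted_measures])
  } else character(0)
  list(missing_cols = missing_cols, bad_measures = bad_measures)
}

check_mandatory(dat)

# Rows sharing a `factor` value are pooled into one meta-analysis
table(dat$factor)
```

Measure codes are matched exactly here, so a value such as "smd" would be reported as invalid.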
B. Required information depending on the effect size measure
Depending on the effect size measure used, different information must be provided to replicate the meta-analyses. To allow users to adapt to the data available in the original articles, several combinations of information can be provided for a given effect size measure. We first detail the information that can be provided in the dataset to replicate the meta-analyses, and we then provide several summary tables displaying the various combinations of minimum information required.
- value: Value of the effect size for each individual study.
- ci_lo: Lower bound of the 95% confidence interval around the effect size for each individual study.
- ci_up: Upper bound of the 95% confidence interval around the effect size for each individual study.
- n_sample: Total number of participants in each individual study.
- n_cases: Number of cases in each individual study.
- n_controls: Number of controls in each individual study.
- n_exp: Number of exposed participants in each individual study.
- n_nexp: Number of non-exposed participants in each individual study.
- n_cases_exp: Number of cases in the exposed group in each individual study.
- n_controls_exp: Number of controls in the exposed group in each individual study.
- n_cases_nexp: Number of cases in the non-exposed group in each individual study.
- n_controls_nexp: Number of controls in the non-exposed group in each individual study.
- mean_pre_cases: Mean of the cases at baseline for each individual study.
- mean_pre_controls: Mean of the controls at baseline for each individual study.
- sd_pre_cases: Standard deviation of the cases at baseline for each individual study.
- sd_pre_controls: Standard deviation of the controls at baseline for each individual study.
- pre_post_cor: Correlation between the pre-test and post-test scores (across groups) for each individual study.
- mean_cases: Mean of the cases (at follow up) for each individual study.
- mean_controls: Mean of the controls (at follow up) for each individual study.
- sd_cases: Standard deviation of the cases (at follow up) for each individual study.
- sd_controls: Standard deviation of the controls (at follow up) for each individual study.
- time: Sum of the person-time of disease-free observation in the exposed and non-exposed groups for each individual study.
- time_exp: Person-time of disease-free observation in the exposed group for each individual study.
- time_nexp: Person-time of disease-free observation in the non-exposed group for each individual study.
We now present the summary tables indicating the minimum combinations of information that should be provided for each individual study to run the analyses.
The symbol X indicates that the information is provided in the dataset.
The symbol + between two pieces of information indicates that both are mandatory.
The symbol | between two pieces of information indicates that only one of the two is required.
For each effect size measure, users must provide the information in at least one row of the table corresponding to that measure.
Note that users can provide different combinations of information within the same factor (e.g., it is possible to include the SMD value + 95% CI + sample sizes for one study and the means/SDs + sample sizes for another study within the same factor).
1. "SMD"
mean_cases + mean_controls +
sd_cases + sd_controls        n_cases + n_controls  value  se | var  ci_lo + ci_up
X                             X                     -      -         -
-                             X                     X      -         -
-                             X                     X      X         -
-                             X                     X      -         X
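The "SMD" table can be read as a row-wise rule: sample sizes are always needed, together with either the means and SDs or an effect size value. The sketch below encodes that reading in base R. smd_row_ok() is a hypothetical helper written for this example (it is not part of metaumbrella), and the data are invented.

```r
# Hypothetical helper (not part of metaumbrella): TRUE for each row of
# `df` that matches at least one row of the "SMD" table above.
smd_row_ok <- function(df) {
  # Treat absent columns as entirely missing
  get_col <- function(nm) if (nm %in% names(df)) df[[nm]] else rep(NA, nrow(df))
  # TRUE where every column in `nms` is non-missing
  present <- function(nms) {
    Reduce(`&`, lapply(nms, function(nm) !is.na(get_col(nm))))
  }
  has_n     <- present(c("n_cases", "n_controls"))
  has_means <- present(c("mean_cases", "mean_controls", "sd_cases", "sd_controls"))
  has_value <- present("value")
  # Every row of the table needs sample sizes plus means/SDs or a value
  has_n & (has_means | has_value)
}

dat <- data.frame(
  n_cases = c(30, 25, NA), n_controls = c(32, 25, 40),
  mean_cases = c(10.2, NA, NA), mean_controls = c(8.9, NA, NA),
  sd_cases = c(2.1, NA, NA), sd_controls = c(2.4, NA, NA),
  value = c(NA, 0.45, 0.30)
)
smd_row_ok(dat)  # TRUE TRUE FALSE: the third study lacks n_cases
```

The standard error, variance and confidence interval columns are not tested because, in the table, they only appear in addition to a value and the sample sizes.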
2. "G"
n_cases + n_controls  value  se | var  ci_lo + ci_up
X                     X      -         -
X                     X      X         -
X                     X      -         X
3. "MD"
n_cases + n_controls  value  se | var  ci_lo + ci_up
X                     X      X         -
X                     X      -         X
4. "SMC"
mean_pre_cases + mean_pre_controls +
sd_pre_cases + sd_pre_controls +
mean_cases + mean_controls +
sd_cases + sd_controls +
pre_post_cor                          n_cases + n_controls  value  se | var  ci_lo + ci_up
X                                     X                     -      -         -
-                                     X                     X      X         -
-                                     X                     X      -         X

mean_change_cases + mean_change_controls +
sd_change_cases + sd_change_controls        n_cases + n_controls
X                                           X
5. "R"
n_sample  value  se | var  ci_lo + ci_up
X         X      -         -
X         X      X         -
X         X      -         X
6. "Z"
n_sample  value  se | var  ci_lo + ci_up
X         X      -         -
X         X      X         -
X         X      -         X
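The "R" and "Z" measures are linked by Fisher's r-to-z transformation, which is why a correlation and a sample size are enough to obtain an effect size and its variance. The sketch below shows the textbook formulas in base R. The function names are hypothetical, and the package's internal conversions may differ in their details.

```r
# Fisher's r-to-z transformation and its large-sample variance.
# Hypothetical helpers written for illustration (not part of metaumbrella).
r_to_z <- function(r, n) {
  stopifnot(all(abs(r) < 1), all(n > 3))
  list(z = atanh(r), var = 1 / (n - 3))
}

# Back-transformation from Fisher's z to a correlation
z_to_r <- function(z) tanh(z)

es <- r_to_z(r = 0.30, n = 103)
es$z          # Fisher's z for r = 0.30
es$var        # 1 / (103 - 3) = 0.01
z_to_r(es$z)  # recovers 0.30 up to floating-point error
```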
7. "OR" or "logOR"
n_cases_exp + n_controls_exp +
n_cases_nexp + n_controls_nexp  n_exp + n_nexp  n_cases + n_controls  value  se | var  ci_lo + ci_up
X                               -               -                     -      -         -
-                               -               X                     X      -         -
-                               -               X                     X      X         -
-                               -               X                     X      -         X
-                               X               -                     X      X         -
-                               X               -                     X      -         X
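The first row of the "OR" table works because the four cell counts of a 2x2 table fully determine the odds ratio and its variance. The sketch below uses the standard formulas in base R. or_from_counts() is a hypothetical helper, it applies no correction for zero cells, and it is not claimed to reproduce the package's internal calculations.

```r
# Odds ratio and 95% CI from the four cells of a 2x2 table.
# Hypothetical helper (not part of metaumbrella); no zero-cell correction.
or_from_counts <- function(n_cases_exp, n_controls_exp,
                           n_cases_nexp, n_controls_nexp) {
  or <- (n_cases_exp * n_controls_nexp) / (n_controls_exp * n_cases_nexp)
  # Standard error of log(OR): square root of the sum of reciprocal counts
  se_log <- sqrt(1 / n_cases_exp + 1 / n_controls_exp +
                 1 / n_cases_nexp + 1 / n_controls_nexp)
  z <- qnorm(0.975)
  data.frame(value = or, se_log = se_log,
             ci_lo = exp(log(or) - z * se_log),
             ci_up = exp(log(or) + z * se_log))
}

# 20 cases / 80 controls among exposed, 10 cases / 90 controls among non-exposed
or_from_counts(20, 80, 10, 90)  # value = (20 * 90) / (80 * 10) = 2.25
```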
8. "RR" or "logRR"
n_cases_exp + n_controls_exp +
n_cases_nexp + n_controls_nexp  n_cases + n_controls  value  se | var  ci_lo + ci_up
X                               -                     -      -         -
-                               X                     X      X         -
-                               X                     X      -         X
9. "HR" or "logHR"
n_cases + n_controls  value  se | var  ci_lo + ci_up
X                     X      X         -
X                     X      -         X
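For ratio measures such as the HR, a 95% CI can stand in for a standard error because the standard error of the log effect can be recovered from the width of the interval on the log scale, under a normal approximation. The sketch below shows this standard conversion in base R. se_from_ci() is a hypothetical helper, and whether the package performs exactly this calculation is an assumption.

```r
# Standard error of a log ratio (log HR, log OR, ...) recovered from
# the bounds of its confidence interval, assuming normality on the log scale.
# Hypothetical helper (not part of metaumbrella).
se_from_ci <- function(ci_lo, ci_up, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  (log(ci_up) - log(ci_lo)) / (2 * z)
}

# Round trip: build a 95% CI from a known SE, then recover the SE
hr <- 1.50
se <- 0.20
ci <- exp(log(hr) + c(-1, 1) * qnorm(0.975) * se)
se_from_ci(ci[1], ci[2])  # recovers 0.20 up to floating-point error
```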
10. "IRR" or "logIRR"
n_cases_exp + n_cases_nexp +
time_exp + time_nexp          n_cases  time  value  se | var  ci_lo + ci_up
X                             -        -     -      -         -
-                             X        X     X      X         -
-                             X        X     X      -         X
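The first row of the "IRR" table relies on the fact that the numbers of cases and the person-time in each group determine the incidence rate ratio and the variance of its logarithm. The sketch below uses the standard formulas in base R. irr_from_counts() is a hypothetical helper, and the package's own computation may differ in its details.

```r
# Incidence rate ratio and SE of its logarithm from cases and person-time.
# Hypothetical helper (not part of metaumbrella).
irr_from_counts <- function(n_cases_exp, n_cases_nexp, time_exp, time_nexp) {
  # Ratio of the incidence rates in the exposed and non-exposed groups
  irr <- (n_cases_exp / time_exp) / (n_cases_nexp / time_nexp)
  # Standard error of log(IRR) depends only on the numbers of cases
  se_log <- sqrt(1 / n_cases_exp + 1 / n_cases_nexp)
  c(value = irr, se_log = se_log)
}

# 30 vs. 15 cases over 1000 person-years in each group
irr_from_counts(30, 15, 1000, 1000)  # value = 2, se_log = sqrt(0.1)
```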
C. Optional variables
The following variables do not have to be included in a well-formatted dataset but they can be added to benefit from certain features of the functions. The names of these variables (in bold) cannot be changed.
- multiple_es: Reason for the presence of several effect sizes for a unique study (i.e., a study with the same author and year values within the same factor). It must be either "groups" or "outcomes". An example of a well-formatted dataset with multiple outcomes/groups is available (df.OR.multi) and an example of analysis of a dataset with dependent effect sizes is available in a vignette of the package.
  - groups: When "groups" is indicated, it is assumed that the multiple effect sizes for a unique study come from independent subgroups. A unique effect size per study is calculated using Borenstein's (2009) approach. For each study, the sample size is obtained by summing up all participants from the different groups.
  - outcomes: When "outcomes" is indicated, it is assumed that the multiple effect sizes come from multiple outcomes (or time-points) measured within the same sample. Again, a unique effect size per study is calculated using Borenstein's (2009) approach. The strength of the correlation between the outcomes (or time-points) can be indicated using either the r column in your dataset (see below) or the r argument of the umbrella() function. Indicating the strength of the correlation between the outcomes of a study in the r column allows different values to be used depending on the study. In contrast, using the r argument of the umbrella() function conveniently sets a unique correlation for all studies that do not have any value in the r column. For each study, the sample size is obtained by taking the largest sample size for one outcome/time-point.
- r: When a study reports multiple effect sizes coming from the measurement of several outcomes (or measurements of the same outcome at different time-points) in the same participants, the r column can be used to indicate the value of the correlation coefficient between the effect sizes of a given study. The r value should be (i) within the (-1, 1) range, (ii) constant within a study, and (iii) set as NA for studies which do not include multiple effect sizes coming from different outcomes/time-points.
- shared_nexp: In some situations, several studies share participants from the same non-exposed group but compare this group to various exposed groups. When several studies in the same factor share the same non-exposed group, they should be identified as such by having the same shared_nexp value. Identifying studies sharing the same non-exposed group allows the calculations to be adjusted (the size of the shared sample is divided by the number of studies sharing the sample). Studies not sharing their non-exposed group should have an NA (or a unique) value in the shared_nexp column.
- shared_controls: In some situations, several studies share participants from the same control group but compare this group to various experimental groups. When several studies in the same factor share the same control group, they should be identified as such by having the same shared_controls value. Identifying studies sharing the same control group allows the calculations to be adjusted (the size of the shared sample is divided by the number of studies sharing the sample). Studies not sharing their control group should have an NA (or a unique) value in the shared_controls column.
- pre_post_cor: The value of the correlation coefficient between baseline and follow-up scores in pre-post studies. You should indicate the mean pre-post correlation across groups. Only needed when using the SMC measure.
- reverse_es: Whether users want to reverse the effect size of a study. All rows with a "reverse" value in this column will have the direction of their effect size flipped (e.g., an OR of 0.5 will be expressed as 2). Note that the reverse_es column acts both on the direction of the value of an effect size and on the information used to calculate an effect size (e.g., if the means and SDs of experimental and control groups are reported, the mean and SD of the experimental group are used as the mean and SD of the control group and vice versa). This feature is particularly useful to facilitate the presentation of the results when several meta-analyses report the same effects in opposite directions.
- rob: The risk of bias of each individual study. Should be either "high", "low" or "unclear". These values are used to generate the "GRADE" classification and to stratify evidence according to the 'rob' criteria in the 'Personalized' classification. Studies with a missing rob are assumed to be at high risk of bias. The approach used to provide a categorical judgment ("low" vs. "unclear" vs. "high") on the risk of bias of a study is left to the user.
- amstar: The AMSTAR score of the meta-analysis. Note that the AMSTAR score should be constant for a given factor. These values are used only to stratify evidence according to the 'amstar' criteria in the 'Personalized' classification.
- analysis: Whether users want to conduct specific analyses. For now, only the "allelic" value can be specified, which multiplies by two the number of cases and controls.
- discard: Whether a particular row should be removed from the analyses (any row with a "yes" or TRUE value in the discard column will be removed).
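To make the role of the r column more concrete, the sketch below combines several outcomes from the same sample into one effect size per study, following the composite formula usually attributed to Borenstein et al. (2009): the composite is the mean of the effect sizes, and its variance accounts for the correlation between outcomes. combine_outcomes() is a hypothetical helper, a single common correlation r is assumed, and the package's actual implementation may differ.

```r
# Composite effect size for m correlated outcomes of one study.
# Hypothetical helper (not part of metaumbrella); assumes a common
# correlation `r` between all pairs of outcomes.
combine_outcomes <- function(es, v, r) {
  m <- length(es)
  sd_mat <- outer(sqrt(v), sqrt(v))  # sqrt(v_i) * sqrt(v_j) for every pair
  cor_mat <- matrix(r, m, m)
  diag(cor_mat) <- 1                 # each outcome correlates 1 with itself
  list(value = mean(es),
       var   = sum(cor_mat * sd_mat) / m^2)
}

# Two outcomes with equal variances and a correlation of 0.5
combine_outcomes(es = c(0.40, 0.60), v = c(0.04, 0.04), r = 0.5)
# value = 0.5, var = (0.04 + 0.04 + 2 * 0.5 * 0.04) / 4 = 0.03
```

With r = 1 the composite variance equals the variance of a single outcome (0.04 here), and with r = 0 it is halved, which shows why the chosen correlation matters for the weight given to the study.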