adjust_batch {batchtma} | R Documentation |
Adjust for batch effects
Description
adjust_batch generates biomarker levels for the variable(s) markers in the dataset data that are corrected (adjusted) for batch effects, i.e., differential measurement error between levels of batch.
Usage
adjust_batch(
data,
markers,
batch,
method = c("simple", "standardize", "ipw", "quantreg", "quantnorm"),
confounders = NULL,
suffix = "_adjX",
ipw_truncate = c(0.025, 0.975),
quantreg_tau = c(0.25, 0.75),
quantreg_method = "fn"
)
Arguments
data |
Data set |
markers |
Variable name(s) to batch-adjust. Select
multiple variables with tidy evaluation, e.g.,
|
batch |
Categorical variable indicating batch. |
method |
Method for batch effect correction:
|
confounders |
Optional: Confounders, i.e. determinants of
biomarker levels that differ between batches. Only used if
|
suffix |
Optional: What string to append to variable names
after batch adjustment. Defaults to
|
ipw_truncate |
Optional and used for |
quantreg_tau |
Optional and used for |
quantreg_method |
Optional and used for |
Details
If no true differences between batches are expected, because
samples have been randomized to batches, then a method
that returns adjusted values with equal means (method = simple)
or with equal rank values (method = quantnorm) for all batches
is appropriate.
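The two ideas in this paragraph, equal means and equal rank values across batches, can be sketched in base R. This is an illustrative sketch only, not the package's implementation. The column names biomarker_centered and biomarker_ranked are hypothetical. The rank-based part assumes equal-sized batches, and using the mean of all observations as the reference mean is also an assumption.

```r
# Sketch only: base-R illustration, not the batchtma source code.
set.seed(1)
df <- data.frame(
  tma = rep(1:2, times = 10),
  biomarker = rep(1:2, times = 10) + runif(n = 20, max = 5)
)

# Equal means: subtract each batch's deviation from the overall mean.
# (Assumption: the reference is the mean of all observations.)
overall_mean <- mean(df$biomarker)
batch_mean <- ave(df$biomarker, df$tma, FUN = mean)
df$biomarker_centered <- df$biomarker - (batch_mean - overall_mean)

# Both batches now share the overall mean
tapply(df$biomarker_centered, df$tma, mean)

# Equal rank values (assumption: equal-sized batches):
# the k-th smallest value of every batch is replaced by the
# average of the k-th smallest values across batches.
sorted <- sapply(split(df$biomarker, df$tma), sort)
reference <- rowMeans(sorted)
df$biomarker_ranked <- ave(
  df$biomarker, df$tma,
  FUN = function(x) reference[rank(x, ties.method = "first")]
)
```

In the rank-based variant every batch ends up with the same set of values, so any true difference between batches would be removed as well; this is why both approaches are suited to randomized batches.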
If the distribution of determinants of biomarker values
(confounders) differs between batches, then a method
that retains these "true" differences between batches
while adjusting for batch effects may be appropriate:
method = standardize and method = ipw address means;
method = quantreg addresses lower values and dynamic range separately.
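As a rough illustration of a confounder-adjusted, mean-only correction, the following base-R sketch estimates batch effects from a linear model and removes only that part, leaving confounder-driven differences in place. It is a simplified sketch in the spirit of method = standardize, not the package's code. The simulated data, the effect sizes, and the column name biomarker_std are assumptions made for the example.

```r
# Sketch only: marginal standardization with a linear model.
# Not the batchtma source code; data and names are illustrative.
set.seed(2)
df <- data.frame(
  tma = factor(rep(1:2, times = 50)),
  confounder = rep(0:1, times = 50) + runif(n = 100, max = 10)
)
# Biomarker depends on the confounder ("true" difference);
# batch 2 additionally carries a measurement offset of +2.
df$biomarker <- 0.5 * df$confounder + 2 * (df$tma == "2") + rnorm(100)

fit <- lm(biomarker ~ tma + confounder, data = df)

# Mean predicted biomarker if all observations, with their observed
# confounder values, had been measured in batch b
std_means <- sapply(levels(df$tma), function(b) {
  newdata <- df
  newdata$tma <- factor(rep(b, nrow(df)), levels = levels(df$tma))
  mean(predict(fit, newdata = newdata))
})

# Batch effect = standardized batch mean minus their average
batch_effect <- std_means - mean(std_means)
df$biomarker_std <- df$biomarker - unname(batch_effect[as.character(df$tma)])

# The batch coefficient is (numerically) zero after adjustment,
# while the confounder association is unchanged
coef(lm(biomarker_std ~ tma + confounder, data = df))
```

Unlike a simple mean adjustment, the raw batch means of biomarker_std still differ here, because batch 2 has higher confounder values.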
Which method to choose depends on the properties of
batch effects (affecting means or also variance?) and
the presence and strength of confounding. For the two
mean-only confounder-adjusted methods, the choice may depend
on whether the confounder–batch association (method = ipw)
or the confounder–biomarker association (method = standardize)
can be modeled better.
Generally, if batch effects are present, any adjustment
method tends to perform better than no adjustment in
reducing bias and increasing between-study reproducibility.
See references.
All adjustment approaches except method = quantnorm
are based on linear models. It is recommended that variables
for markers and confounders first be transformed
as necessary (e.g., log transformations or splines).
Scaling or mean centering is not necessary,
and adjusted values are returned on the original scale.
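A minimal sketch of transforming before adjusting: a right-skewed marker is log-transformed, adjusted on the log scale, and optionally back-transformed by the user. A base-R mean adjustment stands in for the package here, and the column names log_biomarker, log_biomarker_adj, and biomarker_adj are hypothetical.

```r
# Sketch only: transform first, then adjust on the transformed scale.
# A base-R mean adjustment stands in for adjust_batch() here.
set.seed(3)
df <- data.frame(
  tma = rep(1:2, times = 10),
  biomarker = exp(rep(c(0, 0.5), times = 10) + rnorm(20))
)

# Log-transform the skewed marker before adjustment
df$log_biomarker <- log(df$biomarker)

# Mean adjustment on the log scale (ave() uses FUN = mean by default)
df$log_biomarker_adj <- df$log_biomarker -
  (ave(df$log_biomarker, df$tma) - mean(df$log_biomarker))

# Optional back-transformation by the user; values stay positive
df$biomarker_adj <- exp(df$log_biomarker_adj)
```

Adjusting on the log scale corresponds to a multiplicative correction on the untransformed scale, which may be more plausible for markers with skewed distributions.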
Parameters markers, batch, and confounders support tidy evaluation.
Observations with missing values for the markers and confounders
are ignored in the estimation of adjustment
parameters, as are empty batches. Batch effect-adjusted values
for observations with existing marker values but missing
confounders are based on adjustment parameters derived from the
other observations in a batch with non-missing confounders.
Value
The data dataset with batch effect-adjusted
variable(s) added at the end. Model diagnostics, using
the attribute .batchtma of this dataset, are available
via the diagnose_models function.
Author(s)
Konrad H. Stopsack
References
Stopsack KH, Tyekucheva S, Wang M, Gerke TA, Vaselkiv JB, Penney KL, Kantoff PW, Finn SP, Fiorentino M, Loda M, Lotan TL, Parmigiani G+, Mucci LA+ (+ equal contribution). Extent, impact, and mitigation of batch effects in tumor biomarker studies using tissue microarrays. bioRxiv 2021.06.29.450369; doi: https://doi.org/10.1101/2021.06.29.450369 (This R package, all methods descriptions, and further recommendations.)
Rosner B, Cook N, Portman R, Daniels S, Falkner B.
Determination of blood pressure percentiles in
normal-weight children: some methodological issues.
Am J Epidemiol 2008;167(6):653-66. (Basis for method = standardize)
Bolstad BM, Irizarry RA, Åstrand M, Speed TP.
A comparison of normalization methods for high density
oligonucleotide array data based on variance and bias.
Bioinformatics 2003;19:185–193. (method = quantnorm)
See Also
https://stopsack.github.io/batchtma/
Examples
# Data frame with two batches
# Batch 2 has higher values of biomarker and confounder
df <- data.frame(
  tma = rep(1:2, times = 10),
  biomarker = rep(1:2, times = 10) +
    runif(max = 5, n = 20),
  confounder = rep(0:1, times = 10) +
    runif(max = 10, n = 20)
)

# Adjust for batch effects
# Using simple means, ignoring the confounder:
adjust_batch(
  data = df,
  markers = biomarker,
  batch = tma,
  method = simple
)
# Returns data set with new variable "biomarker_adj2"

# Use quantile regression, include the confounder,
# change suffix of returned variable:
adjust_batch(
  data = df,
  markers = biomarker,
  batch = tma,
  method = quantreg,
  confounders = confounder,
  suffix = "_batchadjusted"
)
# Returns data set with new variable "biomarker_batchadjusted"