tcareg {TCA} | R Documentation |
Fitting a TCA regression model
Description
TCA regression allows testing, under several types of statistical tests, for the effects of source-specific values on an outcome of interest (or on mediating components thereof). For example, in the context of tissue-level bulk DNA methylation data coming from a mixture of cell types (i.e. the input is methylation sites by individuals), tcareg allows testing for cell-type-specific effects of methylation on outcomes of interest (or on mediating components thereof).
Usage
tcareg(
X,
tca.mdl,
y,
C3 = NULL,
test = "marginal_conditional",
null_model = NULL,
alternative_model = NULL,
save_results = FALSE,
fast_mode = TRUE,
output = "TCA",
sort_results = FALSE,
parallel = FALSE,
num_cores = NULL,
log_file = "TCA.log",
features_metadata = NULL,
debug = FALSE,
verbose = TRUE
)
Arguments
X |
An |
tca.mdl |
The value returned by applying tca to |
y |
An |
C3 |
An |
test |
A character vector with the type of test to perform on each of the features in |
null_model |
A vector with a subset of the names of the sources in |
alternative_model |
A vector with a subset (or all) of the names of the sources in |
save_results |
A logical value indicating whether to save the returned results in a file. If |
fast_mode |
A logical value indicating whether to use a fast version of TCA regression, in which source-specific-values are first estimated using the |
output |
Prefix for output files (activated only if |
sort_results |
A logical value indicating whether to sort the results by their p-value (i.e. features with lower p-value will appear first in the results). This option is not available if |
parallel |
A logical value indicating whether to use parallel computing (possible when using a multi-core machine). |
num_cores |
A numeric value indicating the number of cores to use (activated only if |
log_file |
A path to an output log file. Note that if the file |
features_metadata |
A path to a csv file containing metadata about the features in |
debug |
A logical value indicating whether to set the logger to a more detailed debug level; set |
verbose |
A logical value indicating whether to print logs. |
Details
TCA models Z_{hj}^i as the source-specific value of observation i in feature j coming from source h (see tca for more details). A TCA regression model tests an outcome Y for a linear statistical relation with the source-specific values of a feature j by assuming:
Y_i = \alpha_{j,0} + \sum_{h=1}^k \beta_{hj} Z_{hj}^i + c_i^{(3)}\alpha_{j} + e_i
where \alpha_{j,0} is an intercept term, \beta_{hj} is the effect of source h, c_i^{(3)} and \alpha_j correspond to the p_3 covariate values of observation i (i.e. a row vector from C3) and their effect sizes, respectively, and e_i \sim N(0,\phi^2). In practice, if fast_mode == FALSE then tcareg fits this model using the conditional distribution Y|X, which, effectively, integrates over the random Z_{hj}^i. Statistical significance is then calculated using a likelihood ratio test (LRT).
Alternatively, if fast_mode == TRUE, the above model is fitted by first learning point estimates for Z_{hj}^i using the tensor function and then assessing statistical significance using T-tests and partial F-tests under a standard regression framework. This alternative provides a substantial boost in speed.
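As a rough illustration of the "standard regression framework" mentioned above, the following sketch runs T-tests and a partial F-test with base R on simulated data for a single feature. It is not tcareg's internal code: the matrix Zhat is a hypothetical stand-in for the point estimates that the tensor function would provide, and the names src1, src2, src3 and c3 are made up for this example.

```r
# Illustrative sketch only: Zhat is simulated and stands in for per-source
# point estimates of one feature (hypothetical; not produced by TCA here).
set.seed(1)
n <- 200
k <- 3
Zhat <- matrix(rnorm(n * k), nrow = n, ncol = k,
               dimnames = list(NULL, paste0("src", 1:k)))
c3 <- rnorm(n)  # a single covariate, playing the role of a column of C3
y <- 0.5 + 0.8 * Zhat[, "src1"] + 0.3 * c3 + rnorm(n)
dat <- data.frame(y = y, Zhat, c3 = c3)

# Full model: all k source-specific effects plus the covariate.
fit_full <- lm(y ~ src1 + src2 + src3 + c3, data = dat)

# T-tests for each source while the other sources stay in the model.
t_pvals <- summary(fit_full)$coefficients[paste0("src", 1:k), "Pr(>|t|)"]

# Partial F-test for a joint effect of all sources: compare against a
# null model that keeps only the intercept and the covariate.
fit_null <- lm(y ~ c3, data = dat)
f_test <- anova(fit_null, fit_full)
f_pval <- f_test[2, "Pr(>F)"]

print(t_pvals)
print(f_pval)
```

Only src1 has a simulated effect here, so its T-test and the joint F-test should both come out clearly significant, while src2 and src3 should not in most simulations.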
Note that the null and alternative models will be set automatically, except when test == 'custom', in which case they will be set according to the user-specified null and alternative hypotheses.
Under the TCA regression model, several statistical tests can be performed by setting the argument test according to one of the following options:
1. If test == 'marginal', tcareg will perform the following for each source l. For each feature j, \beta_{lj} will be estimated and tested for a non-zero effect, while assuming \beta_{hj}=0 for all other sources h\neq l.
2. If test == 'marginal_conditional', tcareg will perform the following for each source l. For each feature j, \beta_{lj} will be estimated and tested for a non-zero effect, while also estimating the effect sizes \beta_{hj} for all other sources h\neq l (thus accounting for covariances between the estimated effects of different sources).
3. If test == 'joint', tcareg will estimate for each feature j the effect sizes of all k sources \beta_{1j},…,\beta_{kj} and then test the set of k estimates of each feature j for a joint effect.
4. If test == 'single_effect', tcareg will estimate for each feature j the effect sizes of all k sources \beta_{1j},…,\beta_{kj}, under the assumption that \beta_{1j} = … = \beta_{kj}, and then test the set of k estimates of each feature j for a joint effect.
5. If test == 'custom', tcareg will estimate for each feature j the effect sizes of a predefined set of sources (defined by a user-specified alternative model) and then test their estimates for a joint effect, while accounting for a nested predefined set of sources (defined by a user-specified null model).
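One way to read the five options is through the null hypothesis each one tests for a feature j. The summary below restates the descriptions above in symbols. The index sets A and N (the sources in the alternative and null models, respectively) and the shared effect \beta_j are notation introduced here for illustration; they do not come from the package.

```latex
\begin{align*}
\text{marginal (source } l\text{):} \quad & H_0: \beta_{lj} = 0, \text{ with } \beta_{hj} = 0 \text{ fixed for all } h \neq l \\
\text{marginal\_conditional (source } l\text{):} \quad & H_0: \beta_{lj} = 0, \text{ with } \beta_{hj} \text{ estimated for all } h \neq l \\
\text{joint:} \quad & H_0: \beta_{1j} = \dots = \beta_{kj} = 0 \\
\text{single\_effect:} \quad & H_0: \beta_j = 0, \text{ where } \beta_{1j} = \dots = \beta_{kj} = \beta_j \\
\text{custom:} \quad & H_0: \beta_{hj} = 0 \text{ for all } h \in A \setminus N, \text{ with } N \subset A \subseteq \{1,\dots,k\}
\end{align*}
```

Under this reading, a custom test with null_model = NULL corresponds to an empty N, so all sources in A are tested jointly.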
Value
A list with the results of applying the TCA regression model to each of the features in X. If test == 'marginal' or (test == 'marginal_conditional' and fast_mode == FALSE) then a list of k such lists of results is returned, one for the results of each source.
phi |
An estimate of the standard deviation of the i.i.d. component of variation in the TCA regression model. |
beta |
A matrix of effect size estimates for the source-specific effects, such that each row corresponds to the estimated effect sizes of one feature in |
intercept |
An |
alpha |
An |
null_ll |
An |
alternative_ll |
An |
stats |
An |
df |
The degrees of freedom for deriving p-values. |
pvals |
An |
qvals |
An |
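To make the relation between the log-likelihood, statistic, and p-value fields concrete, here is a minimal base-R sketch of a likelihood ratio test on made-up numbers. It assumes the textbook LRT statistic and a chi-squared reference distribution, and it uses a Benjamini-Hochberg adjustment as one common way to obtain q-values; this page does not confirm that tcareg computes stats or qvals in exactly this way, and the input values are hypothetical.

```r
# Hypothetical per-feature log-likelihoods (made-up numbers, not TCA output).
null_ll <- c(-120.4, -98.7, -150.2)
alternative_ll <- c(-115.1, -98.5, -141.0)
df <- 3  # assumed number of extra parameters in the alternative model

# Textbook LRT statistic and its chi-squared p-value.
stats <- 2 * (alternative_ll - null_ll)
pvals <- pchisq(stats, df = df, lower.tail = FALSE)

# Benjamini-Hochberg adjustment (an assumption about how q-values are defined).
qvals <- p.adjust(pvals, method = "BH")

print(data.frame(stats = stats, pvals = pvals, qvals = qvals))
```

A larger gap between the alternative and null log-likelihoods gives a larger statistic and a smaller p-value, which is why the third feature ranks first in this toy example.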
References
Rahmani E, Schweiger R, Rhead B, Criswell LA, Barcellos LF, Eskin E, Rosset S, Sankararaman S, Halperin E. Cell-type-specific resolution epigenetics without the need for cell sorting or single-cell biology. Nature Communications 2019.
Examples
library(TCA)

# Simulate data: n observations, m features, k sources,
# with p1 and p2 covariates (returned as C1 and C2)
n <- 50
m <- 10
k <- 3
p1 <- 1
p2 <- 1
data <- test_data(n, m, k, p1, p2, 0.01)
# Fit the TCA model that tcareg takes as input
tca.mdl <- tca(X = data$X, W = data$W, C1 = data$C1, C2 = data$C2)
# Simulate an outcome whose row names match the observations
y <- matrix(rexp(n, rate = 0.1), ncol = 1)
rownames(y) <- rownames(data$W)
# marginal conditional test (the default):
res0 <- tcareg(data$X, tca.mdl, y)
# joint test:
res1 <- tcareg(data$X, tca.mdl, y, test = "joint")
# custom test, testing for a joint effect of sources 1,2 while accounting for source 3
res2 <- tcareg(data$X, tca.mdl, y, test = "custom", null_model = c("3"),
               alternative_model = c("1", "2", "3"))
# custom test, testing for a joint effect of sources 1,2 assuming no effects under the null
res3 <- tcareg(data$X, tca.mdl, y, test = "custom", null_model = NULL,
               alternative_model = c("1", "2"))