association_parametric {Unico} | R Documentation |
Performs parametric statistical testing
Description
Performs parametric statistical testing (T-test for marginal effects, partial F-test for joint effects) on (1) the marginal effect of each covariate in C1 at the source-specific level, (2) the joint effect across all sources for each covariate in C1, and (3) the non-source-specific effect of each covariate in C2. In the context of bulk genomic data containing a mixture of cell types, these correspond to the marginal effect of each covariate in C1 (potentially including the phenotype of interest) in each cell type, the joint tissue-level effect of each covariate in C1, and the tissue-level effect of each covariate in C2.
Usage
association_parametric(
X,
Unico.mdl,
slot_name = "parametric",
diag_only = FALSE,
intercept = TRUE,
X_max_stds = 2,
Q_max_stds = Inf,
XQ_max_stds = Inf,
parallel = TRUE,
num_cores = NULL,
log_file = "Unico.log",
verbose = FALSE,
debug = FALSE
)
Arguments
X |
An |
Unico.mdl |
The entire set of model parameters estimated by Unico on the 2D mixture matrix (i.e. the list returned by applying function |
slot_name |
A string indicating the key for storing the results under |
diag_only |
A logical value indicating whether to use only the estimated source-level variances (thus ignoring the estimated covariances) to control for heterogeneity in the observed mixture. If set to FALSE, Unico instead estimates the observation- and feature-specific variance in the mixture by leveraging the entire |
intercept |
A logical value indicating whether to fit the intercept term when performing the statistical testing. |
X_max_stds |
A non-negative numeric value indicating, for each feature, the portion of the data considered outliers based on the observed mixture value. Only samples whose observed mixture value falls within |
Q_max_stds |
A non-negative numeric value indicating, for each feature, the portion of the data considered outliers based on the estimated mixture variance. Only samples whose estimated mixture variance falls within |
XQ_max_stds |
A non-negative numeric value indicating, for each feature, the portion of the data considered outliers based on the weighted mixture value. Only samples whose weighted mixture value falls within |
parallel |
A logical value indicating whether to use parallel computing (possible when using a multi-core machine). |
num_cores |
A numeric value indicating the number of cores to use (activated only if |
log_file |
A path to an output log file. Note that if the file |
verbose |
A logical value indicating whether to print logs. |
debug |
A logical value indicating whether to set the logger to a more detailed debug level; set |
Details
If we assume that the source-specific values Z_{ijh} are normally distributed, then under the Unico model we have the following:
Z_{ijh} \sim \mathcal{N}\left(\mu_{jh} + (c_i^{(1)})^T \gamma_{jh}, \sigma_{jh}^2 \right)
X_{ij} \sim \mathcal{N}\left(\sum_{h} w_{ih} \left(\mu_{jh} + (c_i^{(1)})^T \gamma_{jh}\right) + (c_i^{(2)})^T \beta_j, \text{Sum}\left((w_i w_i^T ) \odot \Sigma_j\right) + \tau_j^2\right)
For a given feature j under test, the above equation corresponds to a heteroskedastic regression problem with X_{ij} as the dependent variable and \{\{w_i\}, \{w_i c_i^{(1)}\}, \{c_i^{(2)}\}\} as the set of independent variables.
This view allows us to perform parametric statistical testing (T-test for marginal effects and partial F-test for joint effects) by solving a generalized least squares problem with sample i scaled by the inverse of its estimated standard deviation.
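The regression view above can be illustrated with a minimal base R sketch for a single feature. This is not the package's implementation: the data are simulated, all variable names (`W`, `c1`, `c2`, `Sigma`, `tau2`, `q`) are hypothetical, and the true per-sample variance is used where Unico would plug in its estimate. The sketch also omits a separate intercept, because the simulated weights sum to one and the columns of `W` already span it; how the package handles `intercept` is not shown here.

```r
set.seed(1)
n <- 200; k <- 3                       # hypothetical numbers of samples and sources

# Simulated inputs for one feature (all values are made up for illustration)
W <- matrix(rexp(n * k), n, k)
W <- W / rowSums(W)                    # source weights; each row sums to 1
colnames(W) <- paste0("w", 1:k)
c1 <- rnorm(n)                         # a single covariate playing the role of C1
c2 <- rnorm(n)                         # a single covariate playing the role of C2

mu    <- c(1, 2, 3)                    # source-specific means
gamma <- c(0.5, 0, 0)                  # source-specific effects of c1
beta  <- 0.3                           # non-source-specific effect of c2
A     <- matrix(rnorm(k * k), k, k)
Sigma <- crossprod(A) / k              # source-level covariance matrix
tau2  <- 0.1                           # additional i.i.d. noise variance

# Per-sample mixture variance: Sum((w_i w_i^T) * Sigma) + tau^2,
# which equals w_i^T Sigma w_i + tau^2
q <- rowSums((W %*% Sigma) * W) + tau2

# Observed mixture values with sample-specific (heteroskedastic) noise
x <- drop(W %*% mu + (W * c1) %*% gamma) + beta * c2 + rnorm(n, sd = sqrt(q))

# Design: {w_i}, {w_i * c_i^(1)}, {c_i^(2)}
WC1 <- W * c1
colnames(WC1) <- paste0("w", 1:k, "_c1")
dat <- data.frame(x = x, W, WC1, c2 = c2)

# Weighted least squares with weights 1 / variance
full <- lm(x ~ 0 + ., data = dat, weights = 1 / q)

# The same fit obtained by scaling each sample by 1 / standard deviation
s <- 1 / sqrt(q)
scaled <- lm.fit(as.matrix(dat[, -1]) * s, x * s)

# Marginal t-tests for the source-specific effects of c1
marginal_p <- summary(full)$coefficients[colnames(WC1), "Pr(>|t|)"]

# Partial F-test for the joint effect of c1 across all sources
reduced <- lm(x ~ 0 + ., data = dat[, !(names(dat) %in% colnames(WC1))],
              weights = 1 / q)
joint_p <- anova(reduced, full)$`Pr(>F)`[2]
```

Passing `weights = 1 / q` to `lm` gives the same coefficients as ordinary least squares on rows scaled by `1 / sqrt(q)`, which is the scaling described above. Replacing `Sigma` with `diag(diag(Sigma))` in the computation of `q` would keep only the source-level variances and drop the covariances.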
Value
An updated Unico.mdl object with the following list of effect size and p-value estimates stored in an additional key specified by slot_name:
gammas_hat |
An |
betas_hat |
An |
gammas_hat_pvals |
An |
betas_hat_pvals |
An |
gammas_hat_pvals.joint |
An |
Q |
An |
masks |
An |
phi_hat |
An |
phi_se |
An |
phi_hat_pvals |
An |
Examples
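# Simulate a small dataset
data = simulate_data(n=100, m=2, k=3, p1=1, p2=1, taus_std=0, log_file=NULL)
res = list()
# Fit the Unico model on the simulated mixture
res$params.hat = Unico(data$X, data$W, data$C1, data$C2, parallel=FALSE, log_file=NULL)
# Run the parametric tests; results are stored under the default slot_name "parametric"
res$params.hat = association_parametric(data$X, res$params.hat, parallel=FALSE, log_file=NULL)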