association_asymptotic {Unico} | R Documentation |
Performs asymptotic statistical testing under no distribution assumption
Description
Performs asymptotic statistical testing on (1) the marginal effect of each covariate in C1 at the source-specific level and (2) the non-source-specific effect of each covariate in C2. In the context of bulk genomic data containing a mixture of cell types, these correspond to the marginal effect of each covariate in C1 (potentially including the phenotype of interest) in each cell type and the tissue-level effect of each covariate in C2.
Usage
association_asymptotic(
X,
Unico.mdl,
slot_name = "asymptotic",
diag_only = FALSE,
intercept = TRUE,
X_max_stds = 2,
Q_max_stds = Inf,
V_min_qlt = 0.05,
parallel = TRUE,
num_cores = NULL,
log_file = "Unico.log",
verbose = FALSE,
debug = FALSE
)
Arguments
X |
An |
Unico.mdl |
The entire set of model parameters estimated by Unico on the 2D mixture matrix (i.e. the list returned by applying function |
slot_name |
A string indicating the key for storing the results under |
diag_only |
A logical value indicating whether to use only the estimated source-level variances (thus ignoring the estimated covariances) for controlling the heterogeneity in the observed mixture. If set to FALSE, Unico instead estimates the observation- and feature-specific variance in the mixture by leveraging the entire |
intercept |
A logical value indicating whether to fit the intercept term when performing the statistical testing. |
X_max_stds |
A non-negative numeric value indicating, for each feature, the portion of the data considered outliers based on the observed mixture value. Only samples whose observed mixture value falls within |
Q_max_stds |
A non-negative numeric value indicating, for each feature, the portion of the data considered outliers based on the estimated mixture variance. Only samples whose estimated mixture variance falls within |
V_min_qlt |
A non-negative numeric value indicating, for each feature, the portion of the data considered outliers based on the estimated moment condition variance. This value should be between 0 and 1. Only samples whose estimated moment condition variance falls outside the bottom |
parallel |
A logical value indicating whether to use parallel computing (possible when using a multi-core machine). |
num_cores |
A numeric value indicating the number of cores to use (activated only if |
log_file |
A path to an output log file. Note that if the file |
verbose |
A logical value indicating whether to print logs. |
debug |
A logical value indicating whether to set the logger to a more detailed debug level; set |
Details
Under no distribution assumption, we can solve the following weighted least squares problem, which is similar to the heteroskedastic regression view described in association_parametric.
\hat{\phi_j}^{\text{asym}} = \text{argmin}_{\phi_j} (x_j - S\phi_j)^T Q_j (x_j - S\phi_j)
S denotes the design matrix formed by stacking samples in the rows and the explanatory variables \{\{w_i\}, \{w_i c_i^{(1)}\}, \{c_i^{(2)}\}\} in the columns. \phi_j denotes the corresponding effect sizes of these variables. Q_j denotes the feature-specific weighting scheme. Similar to the parametric counterpart, Q_j = \text{diag}(q_{1j}^2, ..., q_{nj}^2), where the weight of each sample i is the inverse of its estimated variance in the mixture: q_{ij}^2 = \frac{1}{\text{sum}(w_i w_i^T \odot \hat{\Sigma}_j)}.
Marginal testing can thus be carried out on each explanatory variable via the asymptotic distribution of the estimator \hat{\phi_j}^{\text{asym}}.
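The weighted least squares problem above has the closed-form solution \hat{\phi_j} = (S^T Q_j S)^{-1} S^T Q_j x_j. The following base-R sketch illustrates it for a single simulated feature. It is not Unico's implementation: the sample sizes, the simulated covariance standing in for \hat{\Sigma}_j, and in particular the heteroskedasticity-robust "sandwich" variance used for the standard errors are assumptions made for illustration only.

```r
set.seed(1)
n <- 200; k <- 3                        # hypothetical numbers of samples and sources
W <- matrix(runif(n * k), n, k)
W <- W / rowSums(W)                     # source proportions w_i (rows sum to 1)
c1 <- rnorm(n)                          # one source-specific covariate (C1)
c2 <- rnorm(n)                          # one non-source-specific covariate (C2)

# Hypothetical stand-in for the estimated k x k source covariance of feature j
A <- matrix(rnorm(k * k), k, k)
Sigma_j <- crossprod(A) + diag(k)

# Design matrix S with columns {w_i}, {w_i * c_i^(1)}, {c_i^(2)}
S <- cbind(W, W * c1, c2)

# Mixture variance per sample: sum of the elementwise product of
# w_i w_i^T and Sigma_j, which equals w_i^T Sigma_j w_i
mix_var <- rowSums((W %*% Sigma_j) * W)

# Simulated observed mixture x_j with sample-specific noise variance
phi_true <- c(1, 2, 3, 0.5, 0, -0.5, 0.3)
x_j <- as.vector(S %*% phi_true) + rnorm(n, sd = sqrt(mix_var))

# Weights q_ij^2 = 1 / mixture variance, i.e. the diagonal of Q_j
q2 <- 1 / mix_var

# Closed-form WLS estimate: (S^T Q S)^{-1} S^T Q x
StQ <- t(S * q2)                        # S^T Q_j
bread <- solve(StQ %*% S)
phi_hat <- as.vector(bread %*% (StQ %*% x_j))

# Assumed variance estimator: robust sandwich built from weighted residuals
resid <- x_j - as.vector(S %*% phi_hat)
meat <- crossprod(S * (q2 * resid))
phi_se <- sqrt(diag(bread %*% meat %*% bread))

# Two-sided p-values from the normal approximation
phi_pvals <- 2 * pnorm(-abs(phi_hat / phi_se))
```

The point estimate agrees with what `lm.wfit(S, x_j, q2)` returns; only the standard errors depend on the assumed variance estimator.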
Value
An updated Unico.mdl object with the following list of effect size and p-value estimates stored in an additional key specified by slot_name:
gammas_hat |
An |
betas_hat |
An |
gammas_hat_pvals |
An |
betas_hat_pvals |
An |
Q |
An |
masks |
An |
fphi_hat |
An |
phi_hat |
An |
phi_se |
An |
phi_hat_pvals |
An |
Examples
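# Simulate a small data set
data = simulate_data(n=100, m=2, k=3, p1=1, p2=1, taus_std=0, log_file=NULL)
res = list()
# Fit the Unico model on the simulated mixture
res$params.hat = Unico(data$X, data$W, data$C1, data$C2, parallel=FALSE, log_file=NULL)
# Run the asymptotic association test; results are stored under the default slot_name "asymptotic"
res$params.hat = association_asymptotic(data$X, res$params.hat, parallel=FALSE, log_file=NULL)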
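Because the results are stored in the returned list under the key given by slot_name, they can be read back with ordinary list indexing. The sketch below uses a hypothetical helper, `get_assoc_result`, which is not part of Unico, and a small mock list in place of a fitted model so that it runs without the package installed.

```r
# Hypothetical helper (not a Unico function): fetch one result field
# from the slot that association_asymptotic wrote to.
get_assoc_result <- function(mdl, field, slot_name = "asymptotic") {
  slot <- mdl[[slot_name]]
  if (is.null(slot)) {
    stop("no results stored under slot '", slot_name, "'")
  }
  if (is.null(slot[[field]])) {
    stop("field '", field, "' not found in slot '", slot_name, "'")
  }
  slot[[field]]
}

# Mock object standing in for an updated Unico.mdl (values are made up)
mock_mdl <- list(
  asymptotic = list(
    gammas_hat_pvals = matrix(c(0.01, 0.20, 0.50, 0.03), nrow = 2)
  )
)

pv <- get_assoc_result(mock_mdl, "gammas_hat_pvals")
```

With the example above, the equivalent call would be `get_assoc_result(res$params.hat, "gammas_hat_pvals")`.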