R: Performs asymptotic statistical testing under no distribution...

association_asymptotic {Unico}

R Documentation

Performs asymptotic statistical testing under no distribution assumption

Description

Performs asymptotic statistical testing on (1) the marginal effect of each covariate in C1 at the source-specific level and (2) the non-source-specific effect of each covariate in C2. In the context of bulk genomic data containing a mixture of cell types, these correspond to the marginal effect of each covariate in C1 (potentially including the phenotype of interest) in each cell type and the tissue-level effect of each covariate in C2.

Usage

association_asymptotic(
  X,
  Unico.mdl,
  slot_name = "asymptotic",
  diag_only = FALSE,
  intercept = TRUE,
  X_max_stds = 2,
  Q_max_stds = Inf,
  V_min_qlt = 0.05,
  parallel = TRUE,
  num_cores = NULL,
  log_file = "Unico.log",
  verbose = FALSE,
  debug = FALSE
)

Arguments

`X`	An `m` by `n` matrix of measurements of `m` features for `n` observations. Each column in `X` is assumed to be a mixture of `k` sources. Note that `X` must include row names and column names and that NA values are currently not supported. `X` should not include features that are constant across all observations. Note that `X` must be the same `X` used to learn `Unico.mdl` (i.e. the original observed 2D mixture used to fit the model).
`Unico.mdl`	The entire set of model parameters estimated by Unico on the 2D mixture matrix (i.e. the list returned by applying function `Unico` to `X`).
`slot_name`	A string indicating the key under which the results are stored in `Unico.mdl`.
`diag_only`	A logical value indicating whether to use only the estimated source-level variances (thus ignoring the estimated covariances) for controlling the heterogeneity in the observed mixture. If set to FALSE, Unico instead estimates the observation- and feature-specific variance in the mixture by leveraging the entire `k` by `k` variance-covariance matrix.
`intercept`	A logical value indicating whether to fit the intercept term when performing the statistical testing.
`X_max_stds`	A non-negative numeric value that sets, for each feature, which observations are treated as outliers based on their observed mixture values. Only samples whose observed mixture value falls within `X_max_stds` standard deviations of the mean will be used for the statistical testing of a given feature.
`Q_max_stds`	A non-negative numeric value that sets, for each feature, which observations are treated as outliers based on their estimated mixture variance. Only samples whose estimated mixture variance falls within `Q_max_stds` standard deviations of the mean will be used for the statistical testing of a given feature.
`V_min_qlt`	A numeric value between 0 and 1 that sets, for each feature, which observations are treated as outliers based on their estimated moment condition variance. Only samples whose estimated moment condition variance falls outside the bottom `V_min_qlt` quantile will be used for the statistical testing of a given feature.
`parallel`	A logical value indicating whether to use parallel computing (possible when using a multi-core machine).
`num_cores`	A numeric value indicating the number of cores to use (activated only if `parallel == TRUE`). If `num_cores == NULL` then all available cores except for one will be used.
`log_file`	A path to an output log file. Note that if the file `log_file` already exists then logs will be appended to the end of the file. Set `log_file` to `NULL` to prevent output from being saved into a file; note that if `verbose == FALSE` then no output file will be generated regardless of the value of `log_file`.
`verbose`	A logical value indicating whether to print logs.
`debug`	A logical value indicating whether to set the logger to a more detailed debug level; set `debug` to `TRUE` before reporting issues.
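
The three filtering arguments (`X_max_stds`, `Q_max_stds`, `V_min_qlt`) each exclude observations on a per-feature basis. The base R sketch below shows one plausible reading of those rules for a single feature. It is an illustration under stated assumptions, not Unico's internal code: the vectors `x`, `q` and `v` are simulated stand-ins for the observed mixture values, the estimated mixture variances and the estimated moment condition variances, and the helper `within_stds` is hypothetical.

```r
# Hypothetical per-feature inputs (simulated here, not produced by Unico).
set.seed(1)
n <- 200
x <- rnorm(n)            # stand-in for observed mixture values of one feature
q <- rchisq(n, df = 3)   # stand-in for estimated mixture variances
v <- rchisq(n, df = 3)   # stand-in for estimated moment condition variances

# Thresholds set to the defaults listed in Usage.
X_max_stds <- 2
Q_max_stds <- Inf
V_min_qlt  <- 0.05

# TRUE for values within `max_stds` standard deviations of their mean.
within_stds <- function(z, max_stds) {
  abs(z - mean(z)) <= max_stds * sd(z)
}

keep_x <- within_stds(x, X_max_stds)
keep_q <- within_stds(q, Q_max_stds)            # Inf keeps every observation
keep_v <- v >= quantile(v, probs = V_min_qlt)   # drop the bottom quantile

# An observation is used only if it passes all three filters.
mask <- keep_x & keep_q & keep_v
sum(mask)   # number of observations retained for this feature
```

Under this reading, the default `Q_max_stds = Inf` disables the second filter entirely, so only the mixture-value and moment-condition filters remove observations.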

Details

Without any distributional assumption, we can solve the following weighted least squares problem, which is similar to the heteroskedastic regression view described in association_parametric.

\hat{\phi_j}^{\text{asym}} = \text{argmin}_{\phi_j} (x_j - S\phi_j) ^T Q_j (x_j - S\phi_j)

S denotes the design matrix formed by stacking samples in the rows and explanatory variables \{\{w_i\}, \{w_i c_i^{(1)}\}, \{c_i^{(2)}\}\} in the columns. \phi_j denotes the corresponding effect sizes of these explanatory variables. Q_j denotes the feature-specific weighting scheme. Similar to the parametric counterpart, Q_j=\text{diag}(q_{1j}^2,...,q_{nj}^2), where for each sample i the weight is the inverse of the estimated variance in the mixture: q_{ij}^2 = \frac{1}{\text{sum}(w_i w_i^T \odot \hat{\Sigma}_j)}. Marginal testing can thus be carried out on each explanatory variable via the asymptotic distribution of the estimator \hat{\phi_j}^{\text{asym}}.
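
The minimizer of the weighted least squares objective above has the standard closed form (S^T Q_j S)^{-1} S^T Q_j x_j. The base R sketch below builds S and Q_j for one simulated feature and computes that estimate. All data are simulated, `Sigma` is a hypothetical source covariance, and the sandwich covariance and normal reference distribution used for the p-values are assumptions chosen for illustration; Unico's own variance estimator and test statistic may differ.

```r
# Simulated stand-ins: n samples, k sources, one C1 and one C2 covariate.
set.seed(42)
n <- 500; k <- 3
W  <- matrix(rexp(n * k), n, k)
W  <- W / rowSums(W)                 # source weights w_i (rows sum to 1)
c1 <- rnorm(n)                       # a single C1 covariate
c2 <- rnorm(n)                       # a single C2 covariate

# Design matrix S with columns {w_i}, {w_i * c_i^(1)}, {c_i^(2)}.
S <- cbind(W, W * c1, c2)

# Hypothetical k x k source covariance for one feature (Sigma_j).
Sigma <- diag(c(0.5, 1, 2))

# Mixture variance per sample: sum(w_i w_i^T * Sigma) = w_i^T Sigma w_i.
mix_var <- rowSums((W %*% Sigma) * W)
q2 <- 1 / mix_var                    # weights q_ij^2

# Simulate one feature with known effects and heteroskedastic noise.
phi_true <- c(1, 2, 3, 0.5, 0, -0.5, 0.3)
x <- as.vector(S %*% phi_true) + rnorm(n, sd = sqrt(mix_var))

# Closed-form weighted least squares solution: (S'QS)^{-1} S'Q x.
SQ <- S * q2                         # scales row i of S by q2[i], i.e. Q S
A  <- crossprod(SQ, S)               # S' Q S
phi_hat <- as.vector(solve(A, crossprod(SQ, x)))

# Sandwich covariance (an assumption; not necessarily Unico's estimator).
resid_hat <- x - as.vector(S %*% phi_hat)
meat  <- crossprod(SQ * resid_hat)   # sum_i q2_i^2 * resid_i^2 * s_i s_i'
A_inv <- solve(A)
se <- sqrt(diag(A_inv %*% meat %*% A_inv))

# Two-sided p-values from a normal reference distribution (an assumption).
pvals <- 2 * pnorm(-abs(phi_hat / se))
round(cbind(estimate = phi_hat, se = se, p = pvals), 3)
```

The point estimate matches what `lm(x ~ S - 1, weights = q2)` returns; only the standard errors differ, because the sandwich form does not rely on the weights being exactly the inverse variances.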

Value

An updated Unico.mdl object with the following list of effect size and p-value estimates stored under an additional key specified by slot_name:

`gammas_hat`	An `m` by `k*p1` matrix of the estimated effects of the `p1` covariates in `C1` on each of the `m` features in `X`, where the first `p1` columns are the source-specific effects of the `p1` covariates on the first source, the following `p1` columns are the source-specific effects on the second source and so on.
`betas_hat`	An `m` by `p2` matrix of the estimated effects of the `p2` covariates in `C2` on the mixture values of each of the `m` features in `X`.
`gammas_hat_pvals`	An `m` by `k*p1` matrix of p-values for the estimates in `gammas_hat` (based on a T-test).
`betas_hat_pvals`	An `m` by `p2` matrix of p-values for the estimates in `betas_hat` (based on a T-test).
`Q`	An `m` by `n` matrix of weights used for controlling the heterogeneity of each observation at each feature (activated only if `debug == TRUE`).
`masks`	An `m` by `n` matrix of logical values indicating whether each observation participated in the statistical testing at each feature (activated only if `debug == TRUE`).
`fphi_hat`	An `m` by `n` matrix containing the entire estimated moment condition variance for each feature. Note that observations that are considered outliers under any of the criteria will be marked as -1 in the estimated moment condition variance (activated only if `debug == TRUE`).
`phi_hat`	An `m` by `k+p1*k+p2` matrix containing the entire estimated effect sizes (including those on source weights) for each feature (activated only if `debug == TRUE`).
`phi_se`	An `m` by `k+p1*k+p2` matrix containing the estimated standard errors associated with `phi_hat` for each feature (activated only if `debug == TRUE`).
`phi_hat_pvals`	An `m` by `k+p1*k+p2` matrix containing the p-values associated with `phi_hat` for each feature (activated only if `debug == TRUE`).
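
The column ordering of `gammas_hat` and `gammas_hat_pvals` described above (all `p1` covariates for the first source, then all `p1` for the second, and so on) can be turned into an index with a small helper. The sketch below is illustrative only: `gamma_col` and the label format are hypothetical and are not part of Unico.

```r
# Hypothetical dimensions: k sources, p1 covariates in C1.
k <- 3; p1 <- 2

# Column of `gammas_hat` holding covariate `cov_idx` for source `source_idx`,
# following the source-major ordering described above.
gamma_col <- function(source_idx, cov_idx, p1) {
  (source_idx - 1) * p1 + cov_idx
}

# Label all k * p1 columns in that same order.
labels <- as.vector(outer(seq_len(p1), seq_len(k),
                          function(cv, src) paste0("source", src, ".cov", cv)))

labels[gamma_col(source_idx = 2, cov_idx = 1, p1 = p1)]   # "source2.cov1"
```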

Examples

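# Simulate a small mixture: 100 observations, 2 features, 3 sources
data = simulate_data(n=100, m=2, k=3, p1=1, p2=1, taus_std=0, log_file=NULL)
res = list()
# Fit the Unico model on the observed mixture
res$params.hat = Unico(data$X, data$W, data$C1, data$C2, parallel=FALSE, log_file=NULL)
# Run the asymptotic tests; results are stored under the default slot "asymptotic"
res$params.hat = association_asymptotic(data$X, res$params.hat, parallel=FALSE, log_file=NULL)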

[Package Unico version 0.1.0 Index]