Unico {Unico} | R Documentation |
Fitting the Unico model
Description
Fits the Unico model for an input matrix of features by observations that come from a mixture of k sources, under the assumption that each observation is a mixture of unique (unobserved) source-specific values (in each feature in the data). Specifically, for each feature, it standardizes the data and learns the source-specific mean and the full k by k variance-covariance matrix.
Usage
Unico(
X,
W,
C1,
C2,
fit_tau = FALSE,
mean_penalty = 0,
var_penalty = 0.01,
covar_penalty = 0.01,
mean_max_iterations = 2,
var_max_iterations = 3,
nloptr_opts_algorithm = "NLOPT_LN_COBYLA",
max_stds = 2,
init_weight = "default",
max_u = 1,
max_v = 1,
parallel = TRUE,
num_cores = NULL,
log_file = "Unico.log",
verbose = FALSE,
debug = FALSE
)
Arguments
X |
An |
W |
An |
C1 |
An |
C2 |
An |
fit_tau |
A logical value indicating whether to fit the standard deviation of the measurement noise (i.e. the i.i.d. component of variation in the model denoted as |
mean_penalty |
A non-negative numeric value indicating the regularization strength on the source-specific mean estimates. |
var_penalty |
A non-negative numeric value indicating the regularization strength on the diagonal entries of the full |
covar_penalty |
A non-negative numeric value indicating the regularization strength on the off-diagonal entries of the full |
mean_max_iterations |
A non-negative numeric value indicating the number of iterative updates performed on the mean estimates. |
var_max_iterations |
A non-negative numeric value indicating the number of iterative updates performed on the variance-covariance matrix. |
nloptr_opts_algorithm |
A string indicating the optimization algorithm to use. |
max_stds |
A non-negative numeric value indicating, for each feature, the portion of the data that is considered as outliers. Only samples within |
init_weight |
A string indicating the initial weights on the samples to start the iterative optimization. |
max_u |
A non-negative numeric value indicating the maximum weight/influence a sample can have on the mean estimates. |
max_v |
A non-negative numeric value indicating the maximum weight/influence a sample can have on the variance-covariance estimates. |
parallel |
A logical value indicating whether to use parallel computing (possible when using a multi-core machine). |
num_cores |
A numeric value indicating the number of cores to use (activated only if |
log_file |
A path to an output log file. Note that if the file |
verbose |
A logical value indicating whether to print logs. |
debug |
A logical value indicating whether to set the logger to a more detailed debug level; set |
Details
Unico assumes the following model:
The mixture value of a given feature in a given sample is modeled as a weighted linear combination, specified by the sample's weights, of a total of k source-specific levels.
In addition, we consider global-level covariates that systematically affect the observed mixture values, together with their effect sizes.
A further term denotes the i.i.d. measurement noise, whose variance is shared across all samples.
Weights have to be non-negative and sum up to 1 across all sources for each sample. In practice, we assume that the weights are fixed and estimated by external methods.
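The mixture level described above can be written compactly. The following is a sketch in assumed notation, since the symbols are not given in this text: i indexes samples, j features and h sources; w_i holds the k weights of sample i, Z_ij the k source-specific levels, c_i^(2) the global-level covariates (assumed here to correspond to C2, with effect sizes β_j matching betas_hat), and τ_j the standard deviation of the measurement noise (matching taus_hat). No distribution beyond i.i.d. noise is assumed.

```latex
X_{ij} = w_i^{\top} Z_{ij} + \left(c_i^{(2)}\right)^{\top} \beta_j + e_{ij},
\qquad \operatorname{Var}(e_{ij}) = \tau_j^2,
\qquad w_{ih} \ge 0, \quad \sum_{h=1}^{k} w_{ih} = 1 .
```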
Source-specific profiles are further modeled as follows.
The model includes a population-level mean for each feature at each source.
We also consider covariates that systematically affect the source-specific values, together with their effect sizes on each source.
Finally, we actively model the k by k covariance structure of a given feature across all k sources.
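The source-specific level can be sketched in the same spirit. The notation is assumed rather than taken from this text: i indexes samples, j features and h sources; μ_jh is the population-level mean (matching mus_hat), c_i^(1) the covariates acting on source-specific values (assumed to correspond to C1, with effect sizes γ_jh matching gammas_hat), and Σ_j the k by k covariance of feature j across sources (matching sigmas_hat).

```latex
Z_{ijh} = \mu_{jh} + \left(c_i^{(1)}\right)^{\top} \gamma_{jh} + \epsilon_{ijh},
\qquad
\operatorname{Cov}\!\left(\epsilon_{ij1}, \dots, \epsilon_{ijk}\right) = \Sigma_j \in \mathbb{R}^{k \times k} .
```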
Value
A list with the estimated parameters of the model. This list can then be used as the input to other functions such as tensor.
W |
An |
C1 |
An |
C2 |
An |
mus_hat |
An |
gammas_hat |
An |
betas_hat |
An |
sigmas_hat |
An |
taus_hat |
An |
scale.factor |
An |
config |
A list with the hyper-parameters used for fitting the model and the configurations of the optimization algorithm. |
Us_hat_list |
A list tracking, for each feature, the sample weights used for each iteration of the mean optimization (activated only if |
Vs_hat_list |
A list tracking, for each feature, the sample weights used for each iteration of the variance-covariance optimization (activated only if |
Ls_hat_list |
A list tracking, for each feature, the computed estimates of the upper triangular Cholesky decomposition of the variance-covariance matrix at each iteration of the variance-covariance optimization (activated only if |
sigmas_hat_list |
A list tracking, for each feature, the computed estimates of the variance-covariance matrix at each iteration of the variance-covariance optimization (activated only if |
Examples
data = simulate_data(n=100, m=2, k=3, p1=1, p2=1, taus_std=0, log_file=NULL)
res = list()
res$params.hat = Unico(data$X, data$W, data$C1, data$C2, parallel=FALSE, log_file=NULL)
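To make the model in Details concrete, the following base-R sketch draws a single feature directly from it, without calling the package. It is only an illustration under stated assumptions: all parameter values are hypothetical, the noise is taken to be Gaussian for convenience, C1 is assumed to act on the source-specific values and C2 on the mixture, and nothing here is claimed to reproduce what simulate_data does internally.

```r
set.seed(1)
n <- 100; k <- 3; p1 <- 1; p2 <- 1   # sizes mirror the simulate_data() call above

# Mixture weights (n by k): non-negative, each row sums to 1.
W <- matrix(rexp(n * k), n, k)
W <- W / rowSums(W)

# Hypothetical covariates: C1 acts on the sources, C2 on the mixture.
C1 <- matrix(rnorm(n * p1), n, p1)
C2 <- matrix(rnorm(n * p2), n, p2)

# Hypothetical parameters for one feature.
mu    <- c(1, 2, 3)                  # source-specific means (length k)
gamma <- matrix(0.5, p1, k)          # source-specific covariate effects
beta  <- rep(0.2, p2)                # global covariate effects
Sigma <- matrix(0.3, k, k)           # k by k covariance across sources
diag(Sigma) <- 1
tau   <- 0.1                         # sd of the i.i.d. measurement noise

# Source-specific values Z (n by k): mean + covariate effects + correlated noise.
L <- chol(Sigma)                     # upper triangular, t(L) %*% L equals Sigma
E <- matrix(rnorm(n * k), n, k) %*% L
Z <- matrix(mu, n, k, byrow = TRUE) + C1 %*% gamma + E

# Observed mixture values for this feature (length n).
x <- rowSums(W * Z) + as.vector(C2 %*% beta) + rnorm(n, sd = tau)
```

Stacking m such vectors as rows would give a features by observations matrix of the kind the Description refers to; only x, W, C1 and C2 would be observed, while Z stays hidden.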